Bryan Paget bryanpaget

Presentation Outline:

  • 30 Minutes total

Spend 2 Minutes

  • Introduce yourself and thank them for having you.
  • Mention that Jose is the current team lead, but that you are happy to give this presentation.
  • Mention that you gave a presentation in the summer and that those slides are still available.
  • Fall
bryanpaget / CronJobs Diagram.md
Last active November 21, 2025 00:59
November 2025: The Zone Fall Update
flowchart TD
    A["intelligent-data-monitor
    (Cron Job)
    Generates synthetic data
    Stores in its own Git repo"] --> B["Synthetic Data Git Repo"]

    B --> C["anomaly-monitoring-dashboard
   (Cron Job)
 Pulls data from repo
bryanpaget / spark.md
Last active October 29, 2025 14:58
Spark Instructions

Step-by-Step Guide to Install Spark Operator

1. Add Kubeflow Helm Chart Repository

# Add the Kubeflow Helm chart repo
helm repo add kubeflow https://charts.kubeflow.org
helm repo update
bryanpaget / StatCan Data Sovereignty Strategy.md
Last active September 8, 2025 17:26
StatCan Data Sovereignty Strategy


Core Recommendation

StatCan must implement a Canadian-controlled data platform as our primary infrastructure for sensitive data, with Microsoft Fabric used only for specific, non-sensitive applications.


Why This Matters

bryanpaget / Implement Lean Data Virtualization with Spark & Colectica.md
Last active September 2, 2025 15:03
Implement Lean Data Virtualization with Spark & Colectica

Epic: Implement Lean Data Virtualization with Spark & Colectica


Section 1: Deploy Spark on Kubernetes

Context:
No Spark backend currently exists. Adding one enables scalable federated queries and integrates with the existing Kubeflow/JupyterLab environment.

Todo:

  • Install the Spark Operator in the Kubernetes cluster via Helm.
bryanpaget / results.txt
Last active August 28, 2025 14:02
A Python package test suite for The Zone.
Python version: 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0]
Testing 63 packages for compatibility with Python 3.13.5 (Offline mode)
Testing numpy...
✅ NumPy basic functionality test passed
Testing pandas...
✅ Pandas basic functionality test passed
Testing scipy...
✅ SciPy basic functionality test passed
Testing matplotlib...
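
The log above suggests a loop that checks each package in turn and reports a result. The suite's actual code is not shown here, so the following is only a minimal sketch under stated assumptions: the function name `check_packages` and the package list are hypothetical, and the check is reduced to an import attempt rather than the per-package "basic functionality" tests the real suite reports.

```python
import importlib
import sys


def check_packages(names):
    """Try to import each named package; return {name: True or False}."""
    results = {}
    for name in names:
        print(f"Testing {name}...")
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised while importing
            print(f"❌ {name} import failed: {exc}")
            results[name] = False
        else:
            print(f"✅ {name} import test passed")
            results[name] = True
    return results


if __name__ == "__main__":
    # Hypothetical short list for illustration; the real suite tests 63 packages.
    packages = ["numpy", "pandas", "scipy", "matplotlib"]
    version = sys.version.split()[0]
    print(f"Testing {len(packages)} packages for compatibility with Python {version}")
    results = check_packages(packages)
    failed = [name for name, ok in results.items() if not ok]
    print(f"{len(results) - len(failed)} passed, {len(failed)} failed")
```

Because the sketch only imports modules and never touches the network, it is compatible with the offline mode mentioned in the log. A missing package is reported as a failure instead of stopping the run.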
bryanpaget / Common Ground.md
Last active August 22, 2025 16:48
📢 Finding Common Ground: Let's Collaborate on Our Data Science Environment

Dear Zone Friends,

I want to thank everyone for the passionate discussion about our Kubeflow environment. The diverse perspectives shared have highlighted important considerations and helped us refine our approach.

Acknowledging Different Perspectives

We've heard valuable feedback about:

bryanpaget / Packages.md
Last active August 22, 2025 14:29
📢 Help Shape Our Data Science Environment!

Dear friends of The Zone,

We are optimizing our Kubeflow environment to better meet your needs. To create a truly useful base configuration, we need your input on the packages that matter most to your daily work.

Current State and Upcoming Changes

Our environment already includes essential statistical packages (tidyverse, pandas, scikit-learn), enterprise tools (ODBC, Kubernetes), and development environments (VSCode, JupyterLab, RStudio).

bryanpaget / The Zone: A Modern Data Science Platform.md
Created August 20, 2025 19:33
Empowering Citizen Developers with Open Source
marp: true
theme: default
size: 58140
paginate: true
header: Statistics Canada | Statistique Canada
footer:
bryanpaget / analysis.md
Last active August 21, 2025 20:34
analysis of packages

1. Top 20 R Packages for Data Science

  1. tidyverse (core ecosystem)
  2. dplyr (data manipulation)
  3. ggplot2 (visualization)
  4. readr (data import)
  5. tidyr (data tidying)
  6. stringr (string manipulation)
  7. lubridate (date/time handling)
  8. forcats (factor handling)
  9. purrr (functional programming)