Bryan Paget bryanpaget

bryanpaget / Adopting an API-First Architecture for OneLake Integration.md
Last active February 6, 2026 16:13
Adopting an API-First Architecture for OneLake Integration

API-First Architecture for OneLake Integration

TL;DR

Our Recommendation: Mounting OneLake is not worth the effort. The Medium article shows that it is technically possible on a single Linux VM, but that approach is a poor fit for our Kubernetes environment and would force us to rebuild the same kind of fragile, unsupported abstraction layer that caused our past goofys headaches. The supported API approach gives us better security, stability, and sovereignty natively.
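
As a rough illustration of what the API approach means in practice: OneLake exposes an ADLS Gen2-compatible endpoint, so a file can be addressed by URL instead of through a mount. The sketch below only builds such a URL. The workspace, lakehouse, and file names are hypothetical, the `onelake_file_url` helper is not part of any SDK, and authentication and the actual request are left out.

```python
from urllib.parse import quote

# OneLake's ADLS Gen2-compatible (DFS) endpoint.
# Assumption: the global endpoint is used, not a regional one.
ONELAKE_DFS_ENDPOINT = "https://onelake.dfs.fabric.microsoft.com"


def onelake_file_url(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build the DFS URL for a file inside a OneLake item (hypothetical helper)."""
    # URL layout: <endpoint>/<workspace>/<item>.<itemtype>/<path>
    segments = [workspace, f"{item}.{item_type}", *path.strip("/").split("/")]
    # Percent-encode each segment separately so "/" stays a separator.
    return ONELAKE_DFS_ENDPOINT + "/" + "/".join(quote(s, safe="") for s in segments)


# All names below are made up for illustration.
url = onelake_file_url(
    "ExampleWorkspace", "ExampleLakehouse", "Lakehouse", "Files/raw/survey 2025.csv"
)
print(url)
```

Because the address is an ordinary HTTPS URL, a notebook pod needs no privileged FUSE mount to reach the data; it needs only network access and a valid token.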

Executive Summary

We strongly recommend against mounting Microsoft OneLake as a filesystem in our Kubeflow environment. A documented method exists for mounting to a Linux VM using BlobFuse, but its design and constraints are mismatched for our dynamic Kubernetes platform. Adapting it for The Zone would force us to build and maintain a complex, unsupported abstraction layer—**precisely the type of wor

Presentation Outline:

  • 30 Minutes total

Spend 2 Minutes

  • Introduce yourself and thank them for having you.
  • Mention that Jose is the current team lead, but that you are happy to give this presentation.
  • Mention that you gave a presentation in the summer and that those slides are still available.
  • Fall
bryanpaget / CronJobs Diagram.md
Last active November 21, 2025 00:59
November 2025: The Zone Fall Update
flowchart TD
    A["intelligent-data-monitor
    (Cron Job)
    Generates synthetic data
    Stores in its own Git repo"] --> B["Synthetic Data Git Repo"]

    B --> C["anomaly-monitoring-dashboard
    (Cron Job)
    Pulls data from repo
bryanpaget / spark.md
Last active October 29, 2025 14:58
Spark Instructions

Step-by-Step Guide to Install Spark Operator

1. Add Kubeflow Helm Chart Repository

# Add the Kubeflow Helm chart repo
helm repo add kubeflow https://charts.kubeflow.org
helm repo update
bryanpaget / StatCan Data Sovereignty Strategy.md
Last active September 8, 2025 17:26
StatCan Data Sovereignty Strategy

StatCan Data Sovereignty Strategy

Core Recommendation

StatCan must implement a Canadian-controlled data platform as our primary infrastructure for sensitive data, with Microsoft Fabric used only for specific, non-sensitive applications.


Why This Matters

bryanpaget / Implement Lean Data Virtualization with Spark & Colectica.md
Last active September 2, 2025 15:03
Implement Lean Data Virtualization with Spark & Colectica

Epic: Implement Lean Data Virtualization with Spark & Colectica


Section 1: Deploy Spark on Kubernetes

Context:
No Spark backend exists. Adding Spark enables scalable federated queries and integrates with existing Kubeflow/JupyterLab.

Todo:

  • Install Spark Operator in Kubernetes cluster via Helm.
bryanpaget / results.txt
Last active August 28, 2025 14:02
A Python package test suite for The Zone.
Python version: 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0]
Testing 63 packages for compatibility with Python 3.13.5 (Offline mode)
Testing numpy...
✅ NumPy basic functionality test passed
Testing pandas...
✅ Pandas basic functionality test passed
Testing scipy...
✅ SciPy basic functionality test passed
Testing matplotlib...
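
The log above suggests a loop that imports each package and runs a small functional check. A minimal sketch of that pattern follows; it is not the actual test suite. The `CHECKS` table and `run_checks` helper are hypothetical, and standard-library modules stand in for the real package list so the sketch runs offline.

```python
import importlib
import sys

# Hypothetical checks: module name -> tiny functional test returning True on success.
# Standard-library modules stand in for the real 63-package list.
CHECKS = {
    "json": lambda m: m.loads(m.dumps({"a": 1})) == {"a": 1},
    "math": lambda m: m.isclose(m.sqrt(16.0), 4.0),
    "statistics": lambda m: m.mean([1, 2, 3]) == 2,
    "not_a_real_package_xyz": lambda m: True,  # deliberately missing, to show a failure
}


def run_checks(checks):
    """Import each module and run its check; return {name: (passed, detail)}."""
    results = {}
    for name, check in checks.items():
        try:
            module = importlib.import_module(name)
            passed = bool(check(module))
            detail = "basic functionality test passed" if passed else "check returned False"
        except Exception as exc:  # import errors and runtime errors both count as failures
            passed, detail = False, f"{type(exc).__name__}: {exc}"
        results[name] = (passed, detail)
    return results


print(f"Python version: {sys.version.split()[0]}")
results = run_checks(CHECKS)
for name, (passed, detail) in results.items():
    print(f"Testing {name}...")
    print(f"{'PASS' if passed else 'FAIL'} {name}: {detail}")
```

Catching every exception per package keeps one broken import from aborting the run, which matters when the goal is a full compatibility report for a new Python version.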
bryanpaget / Common Ground.md
Last active August 22, 2025 16:48
📢 Finding Common Ground: Let's Collaborate on Our Data Science Environment

📢 Finding Common Ground

Let's Collaborate on Our Data Science Environment

Dear Zone Friends,

I want to thank everyone for the passionate discussion about our Kubeflow environment. The diverse perspectives shared have highlighted important considerations and helped us refine our approach.

Acknowledging Different Perspectives

We've heard valuable feedback about:

bryanpaget / Packages.md
Last active August 22, 2025 14:29
📢 Help Shape Our Data Science Environment!

📢 Help Shape Our Data Science Environment!

Dear Zone Friends,

We are optimizing our Kubeflow environment to better meet your needs. To build a truly useful baseline configuration, we need your input on the packages that matter most to your daily work.

Current State and Upcoming Changes

Our environment already includes essential statistical packages (tidyverse, pandas, scikit-learn), enterprise tools (ODBC, Kubernetes), and development environments (VSCode, JupyterLab, RStudio).

bryanpaget / The Zone: A Modern Data Science Platform.md
Created August 20, 2025 19:33
Empowering Citizen Developers with Open Source
marp: true
theme: default
size: 58140
paginate: true
header: Statistics Canada | Statistique Canada
footer: