Bryan Paget bryanpaget

@bryanpaget
bryanpaget / issues.md
Created August 6, 2025 17:49
Issues with our repo

Issue 1: Runtime Dockerfile Modification is Fragile and Non-Transparent

Title: Replace runtime Dockerfile modification with build arguments
Labels: enhancement, good-first-issue, docker

Problem: The workflow modifies Dockerfiles at runtime using sed to inject the base image:

- name: Set FROM and as in Dockerfile
  run: |
    sed -i '1i FROM ${{ env.BASE_IMAGE }} as ${{ inputs.image }}'
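
The build-argument alternative named in the title can be pictured as follows: the Dockerfile stays static, declaring `ARG BASE_IMAGE` before `FROM ${BASE_IMAGE}`, and the base image is passed on the command line with `docker build --build-arg`. The Python sketch below only assembles that command and does not run it. The function name, the `tag` parameter, and the image names are hypothetical, and the stage alias that the `sed` line injects is not reproduced here.

```python
import shlex


def docker_build_command(base_image: str, tag: str, dockerfile: str = "Dockerfile") -> list[str]:
    """Assemble a `docker build` argv that passes the base image as a build argument.

    Assumes the Dockerfile declares `ARG BASE_IMAGE` before `FROM ${BASE_IMAGE}`,
    so no file has to be rewritten at build time.
    """
    if not base_image:
        raise ValueError("base_image must not be empty")
    return [
        "docker", "build",
        "--build-arg", f"BASE_IMAGE={base_image}",
        "--file", dockerfile,
        "--tag", tag,
        ".",
    ]


# Hypothetical values, for illustration only.
cmd = docker_build_command("ubuntu:22.04", "example-image")
print(shlex.join(cmd))
```

Because the base image is now an explicit input, the Dockerfile in the repository is the file that actually gets built, which is the transparency the issue asks for.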
@bryanpaget
bryanpaget / conda.md
Created August 6, 2025 15:02
Managing Multiple Python and R Versions with Conda.

English Section

Introduction

Conda is a package and environment manager that lets you maintain multiple versions of Python and R on the same system. Because each version lives in its own environment, you can keep three versions of each language installed side by side, which reduces the migration burden and lets you move between versions gradually.

Installation

If you don't have conda installed, download Miniconda (lightweight version) from: https://docs.conda.io/en/latest/miniconda.html

Creating Environments with Specific Versions
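
As a rough sketch of what "one environment per version" looks like, the Python snippet below generates `conda create` commands for three Python and three R versions. It only prints the commands and does not call conda. The version numbers, environment names, helper name, and the choice of the `conda-forge` channel are assumptions for illustration, not the gist's own list.

```python
def conda_create_command(env_name: str, package: str, version: str,
                         channel: str = "conda-forge") -> list[str]:
    """Build the argv for creating one environment pinned to one language version."""
    return [
        "conda", "create", "--yes",
        "--name", env_name,
        "--channel", channel,
        f"{package}={version}",
    ]


# Illustrative version choices (assumptions, not taken from the original text).
python_versions = ["3.10", "3.11", "3.12"]
r_versions = ["4.2", "4.3", "4.4"]

commands = []
for v in python_versions:
    commands.append(conda_create_command(f"py{v.replace('.', '')}", "python", v))
for v in r_versions:
    # On conda-forge the R interpreter is packaged as `r-base`.
    commands.append(conda_create_command(f"r{v.replace('.', '')}", "r-base", v))

for argv in commands:
    print(" ".join(argv))
```

Each printed line can be pasted into a shell, after which `conda activate py311` (or any other generated name) switches to that version.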

@bryanpaget
bryanpaget / Practical R Data Analysis: Pokemon Dataset.md
Created August 3, 2025 23:11

Practical R Data Analysis: Pokemon Dataset

Document 1: Exploratory Data Analysis of Pokemon Dataset

Introduction

This document demonstrates a comprehensive exploratory data analysis (EDA) workflow using the Pokemon dataset. EDA is a crucial first step in any data analysis project, helping us understand the data's structure, identify patterns, and generate hypotheses.

Loading and Initial Data Examination

@bryanpaget
bryanpaget / RASK_specs_20130301_20140325_no_spec28.5.md
Created July 31, 2025 20:35
Road Attribute Search Key (RASK) Specifications

Road Attribute Search Key (RASK) Specifications

@bryanpaget
bryanpaget / MKL vs OpenBLAS.md
Last active July 21, 2025 18:27
Programmatic comparison of MKL vs OpenBLAS vs Default, in both R and Python, using the same benchmark structure across all.

MKL vs OpenBLAS

Overview of What We'll Compare

| Language | Setup | BLAS Library |
|----------|-------|--------------|
| R | Base R (default) | Reference BLAS |
| R | Conda + MKL | Intel MKL |
| R | Conda + OpenBLAS | OpenBLAS |
| Python | Conda + MKL | Intel MKL |
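
To make "the same benchmark structure across all" concrete, here is a minimal timing harness in Python. It is a sketch under stated assumptions rather than the gist's actual benchmark: the function names are hypothetical, and a pure-Python matrix product stands in for the BLAS-backed call so the snippet runs without any third-party package.

```python
import random
import statistics
import time
from typing import Callable


def benchmark(workload: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def naive_matmul(a, b):
    """Pure-Python matrix product; a stand-in for a BLAS-backed call."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Small random square matrices with a fixed seed, so every run times the same input.
n = 40
rng = random.Random(0)
a = [[rng.random() for _ in range(n)] for _ in range(n)]
b = [[rng.random() for _ in range(n)] for _ in range(n)]

median_seconds = benchmark(lambda: naive_matmul(a, b), repeats=3)
print(f"median of 3 runs: {median_seconds:.6f} s")
```

In each environment from the table, the stand-in workload would be replaced by the library call that dispatches to BLAS (for example a NumPy matrix product in the Python rows), while `benchmark` stays unchanged so the timings remain comparable.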

SAS Chaos to Reproducible Analytical Pipelines (RAP)

What We're Solving

We currently maintain 4.2 million SAS scripts, many of which:

  • Use hardcoded values (regions, thresholds, dates)
  • Are copied and edited manually, creating minor variants of the same logic
  • Live outside of version control
  • Are not modular or testable
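
The first and last bullets point at the same remedy: lift the hardcoded values into explicit parameters so that one function replaces many near-identical copies and can be tested. The Python sketch below illustrates the idea only. The class, function, field names, and sample rows are all hypothetical and are not drawn from any real script.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReportParams:
    """Values that the scripts described above would hardcode."""
    region: str
    threshold: float
    start: date
    end: date


def select_rows(rows: list[dict], params: ReportParams) -> list[dict]:
    """Keep rows for one region, above a threshold, inside a date window."""
    return [
        row for row in rows
        if row["region"] == params.region
        and row["value"] > params.threshold
        and params.start <= row["date"] <= params.end
    ]


# Made-up sample data, for illustration only.
rows = [
    {"region": "ON", "value": 12.0, "date": date(2024, 1, 15)},
    {"region": "ON", "value": 3.0, "date": date(2024, 2, 1)},
    {"region": "QC", "value": 20.0, "date": date(2024, 1, 20)},
]
params = ReportParams("ON", 10.0, date(2024, 1, 1), date(2024, 12, 31))
selected = select_rows(rows, params)
print(selected)
```

A new region or threshold then becomes a new `ReportParams` value instead of a new copy of the script, and `select_rows` can be covered by an ordinary unit test and kept under version control.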
@bryanpaget
bryanpaget / fabric.md
Last active July 9, 2025 11:15

Microsoft Fabric is very much like SAS in spirit: a tightly integrated, closed ecosystem that promises "one platform to rule them all." And switching from SAS to Fabric could absolutely be just trading one expensive dependency for another — especially if the goal is to cut costs and increase flexibility.

🔁 SAS vs Fabric – Same Playbook

| Feature/Philosophy | SAS | Microsoft Fabric |
|--------------------|-----|------------------|
| Vendor lock-in | Strong | Strong (within Microsoft stack) |
| All-in-one ecosystem | Yes | Yes |
| Designed for low-code users | Yes (SAS Studio, GUI tools) | Yes (Power BI, Dataflows, etc.) |
| Analytics + Reporting | Built-in | Built-in (Power BI) |
@bryanpaget
bryanpaget / zone-learning-path.md
Last active July 3, 2025 16:15
Structured progression of certifications and training resources for developers working with Kubernetes and cloud-native technologies.

🧭 Kubernetes Learning Path for Zone Developer Team

Recommended Course: LFS258 - Kubernetes Fundamentals
Bundle Price: $595 USD (includes LFS258 + CKA exam + exam retake)
Duration: ~35 hours
Goal: Build hands-on Kubernetes cluster management skills, including pods, networking, volumes, scheduling, Helm, and troubleshooting.

Justification:

Requesting Your Linux Cloud VM

Follow these steps to request a Linux Cloud VM:

1. Access the VM Request Portal

Go to the VM Request Portal.

2. Search for Compute Services

@bryanpaget
bryanpaget / open‑source-data-catalog.md
Last active May 20, 2025 14:16
This gist presents a turnkey blueprint for a self‑hosted, open‑source data catalog and governance platform designed to run on your existing AKS + Kubeflow cluster, leveraging best‑of‑breed projects like OpenMetadata, Apache Atlas, DataHub, and Amundsen for metadata management, lineage, and search.

Executive Summary

This proposal outlines a self‑hosted, open‑source data catalog solution—leveraging OpenMetadata, Apache Atlas, or DataHub—deployed on your existing AKS + Kubeflow environment. It layers in hardened security controls using Keycloak for identity, Vault for secrets, cert‑manager for TLS, OPA Gatekeeper for policy enforcement, Falco for runtime threat detection, and Kubernetes Network Policies for micro‑segmentation. All components are deployable via Helm charts, minimizing cost while maximizing compliance and operational efficiency.


1. Architecture Overview

1.1 Metadata Catalog Core