https://paragekbote.github.io/

Parag Ekbote ParagEkbote

@ParagEkbote
ParagEkbote / dataset_eda.py
Last active June 9, 2026 17:28
EDA script for creating visual plots.
"""
EDA Data Pipeline — Pruna vs HQQ
==================================
Loads raw benchmark datasets from HuggingFace Hub,
cleans them, and saves processed CSVs.
Outputs
-------
benchmark/eda_outputs/
├── combined_cleaned_results.parquet
@ParagEkbote
ParagEkbote / viz_memory_plots.py
Last active June 9, 2026 17:27
Script for visualization of memory plots.
"""
Memory Benchmark Plots — Pruna vs HQQ
Generates three focused memory-centric plots:
1. kv_cache_growth.png — KV cache growth (GB/token) across generation lengths
2. peak_memory_scaling.png — Peak memory (GB) across generation lengths
3. memory_stability_envelope.png — Peak memory mean ± 1 SD envelope
"""
from pathlib import Path
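The preview above is cut off before any plotting logic, so the following is only a minimal sketch of how a "mean ± 1 SD" peak-memory envelope and a rough GB/token growth figure could be computed. It is not the gist's implementation. The `runs` records, the `memory_envelope` and `growth_per_token` helpers, and the sample values are all hypothetical, and the growth figure is a simple slope of mean peak memory over generation length, used here as a stand-in for a real KV cache measurement.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (generation_length_tokens, peak_memory_gb), one per run.
runs = [
    (128, 2.0), (128, 2.2), (128, 2.1),
    (256, 2.5), (256, 2.7), (256, 2.6),
]


def memory_envelope(records):
    """Return {length: (lower, mean, upper)} describing a mean ± 1 SD band."""
    by_length = defaultdict(list)
    for length, peak_gb in records:
        by_length[length].append(peak_gb)

    envelope = {}
    for length, values in sorted(by_length.items()):
        mu = mean(values)
        # The sample standard deviation needs at least two runs; use 0.0 otherwise.
        sd = stdev(values) if len(values) > 1 else 0.0
        envelope[length] = (mu - sd, mu, mu + sd)
    return envelope


def growth_per_token(envelope):
    """Slope of mean peak memory between the shortest and longest length (GB/token)."""
    lengths = sorted(envelope)
    if len(lengths) < 2:
        raise ValueError("need at least two generation lengths to estimate growth")
    first, last = lengths[0], lengths[-1]
    return (envelope[last][1] - envelope[first][1]) / (last - first)


if __name__ == "__main__":
    env = memory_envelope(runs)
    for length, (lower, mu, upper) in env.items():
        print(f"{length} tokens: {lower:.2f} to {upper:.2f} GB (mean {mu:.2f})")
    print(f"growth: {growth_per_token(env):.5f} GB/token")
```

The lower and upper values map directly onto the shaded band of an envelope plot, for example via matplotlib's `fill_between`, with the mean drawn as the center line.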
@ParagEkbote
ParagEkbote / aggregate_values_benchmark_table.py
Last active June 9, 2026 17:35
Script to create an aggregate summary table comparing useful metrics for Pruna vs Base (HQQ+torch.compile)
"""
Aggregate Summary Table — Pruna vs Base (HQQ+torch.compile)
===========================================================
Loads raw benchmark CSVs, computes aggregate_summary.csv,
then saves it as a styled PNG table.
Input
-----
benchmark/raw_benchmark_results_hqq.csv
@ParagEkbote
ParagEkbote / fetch_contributions.py
Last active June 19, 2026 14:48
Python scripts + CI job to fetch your open-source contributions and reviews; analyze them with AI buttons.
import os
import requests
import subprocess
import httpx
import asyncio
import urllib.parse
from datetime import date
from pathlib import Path
from collections import Counter
import logging
@ParagEkbote
ParagEkbote / hf_datasets_dagster.py
Last active June 1, 2026 03:58
Example pipeline for transforming the GLUE dataset with dagster-hf-datasets
from dagster_hf_datasets import (
HFDatasetPublisher,
HFParquetIOManager,
HuggingFaceResource,
hf_dataset_asset,
)
from datasets import Dataset
from dagster import AssetExecutionContext, Definitions, MaterializeResult, asset
@ParagEkbote
ParagEkbote / hqq_torch_compile_base.py
Last active June 9, 2026 17:33
A custom benchmark runner script applying optimization algorithms (HQQ 4-bit + torch.compile) to Llama-3.2-1B-Instruct
import os
import gc
import time
import json
import platform
import torch
import pandas as pd
from tqdm.auto import tqdm
@ParagEkbote
ParagEkbote / hqq_torch_compile_pruna.py
Last active June 9, 2026 17:33
A benchmark runner script using Pruna to apply optimization algorithms (HQQ 4-bit + torch.compile) to Llama-3.2-1B-Instruct
import os
import gc
import time
import json
import platform
import torch
import numpy as np
import pandas as pd
#!/usr/bin/env python3
"""
Tokenizer Performance Comparison
"""
# ------------------------------------------------
# Dependencies
# Install with:
#
# uv pip install \
# "datasets>=2.14.0" \
@ParagEkbote
ParagEkbote / pr_insights.py
Created March 9, 2026 09:57
Open-Source Contributions Insight
import os
import json
import subprocess
import requests
from datetime import datetime
from statistics import mean, median, stdev
from collections import defaultdict
from pathlib import Path
import os
import time
import numpy as np
import matplotlib.pyplot as plt
import torch
from tqdm import tqdm
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,