```python
"""
EDA Data Pipeline — Pruna vs HQQ
================================

Loads raw benchmark datasets from HuggingFace Hub,
cleans them, and saves processed CSVs.

Outputs
-------
benchmark/eda_outputs/
├── combined_cleaned_results.parquet
```
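The preview states only that the pipeline loads, cleans, and saves the benchmark data; the cleaning rules themselves are not shown. A minimal standard-library sketch of one plausible cleaning pass is given below. It assumes hypothetical column names (`method`, `gen_length`, `latency_s`, `peak_mem_gb`) and string-valued rows such as those produced by `csv.DictReader`. It drops rows with missing, non-numeric, or non-finite metrics, normalises method labels, and removes exact duplicates before serialising to CSV.

```python
import csv
import io
import math

# Hypothetical schema: the gist preview does not show the real column names.
NUMERIC_COLS = ("gen_length", "latency_s", "peak_mem_gb")
FIELDNAMES = ("method", *NUMERIC_COLS)


def clean_rows(rows):
    """Return rows with a normalised method label and finite float metrics, deduplicated."""
    cleaned, seen = [], set()
    for row in rows:
        method = (row.get("method") or "").strip().lower()
        if not method:
            continue  # drop rows with no method label
        try:
            values = {col: float(row[col]) for col in NUMERIC_COLS}
        except (KeyError, TypeError, ValueError):
            continue  # drop rows with missing or non-numeric metrics
        if not all(math.isfinite(v) for v in values.values()):
            continue  # drop NaN / inf measurements
        key = (method, *values.values())
        if key in seen:
            continue  # drop exact duplicates (after label normalisation)
        seen.add(key)
        cleaned.append({"method": method, **values})
    return cleaned


def rows_to_csv(rows):
    """Serialise cleaned rows to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

To write straight to disk, pass a file opened with `newline=""` to `csv.DictWriter` instead of the `StringIO` buffer, as the `csv` module documentation recommends.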
```python
"""
Memory Benchmark Plots — Pruna vs HQQ

Generates three focused memory-centric plots:
1. kv_cache_growth.png           — KV cache growth (GB/token) across generation lengths
2. peak_memory_scaling.png       — Peak memory (GB) across generation lengths
3. memory_stability_envelope.png — Peak memory mean ± 1 SD envelope
"""
from pathlib import Path
```
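The docstring names two derived quantities without defining them. As a sketch under stated assumptions: the theoretical per-token KV-cache footprint is commonly estimated as 2 (keys and values) × layers × KV heads × head dimension × bytes per element, and a "mean ± 1 SD" envelope is the sample mean plus and minus one sample standard deviation of the peak-memory measurements at each generation length. Whether the gist derives KV growth from this formula or empirically from measured memory is not shown; the function names and the binary-GB (GiB) convention below are hypothetical.

```python
from statistics import mean, stdev

# Assumption: "GB" means binary gigabytes (GiB); the gist does not state its convention.
BYTES_PER_GB = 1024 ** 3


def kv_cache_gb_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Theoretical KV-cache growth per generated token: keys + values across all layers.

    bytes_per_elem=2 corresponds to an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem / BYTES_PER_GB


def envelope(samples):
    """Return (mean - 1 SD, mean, mean + 1 SD) using the sample standard deviation."""
    m = mean(samples)
    s = stdev(samples) if len(samples) > 1 else 0.0  # a single run has no spread
    return m - s, m, m + s


def envelope_by_length(peaks_by_length):
    """Map each generation length to its envelope, ordered by length for plotting."""
    return {n: envelope(v) for n, v in sorted(peaks_by_length.items())}
```

For a Llama-2-7B-sized configuration (32 layers, 32 KV heads, head dimension 128, fp16 cache), `kv_cache_gb_per_token(32, 32, 128)` gives 1/2048 GiB, i.e. 0.5 MiB per generated token. Models that use grouped-query attention have fewer KV heads, so the figure shrinks proportionally.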
```python
"""
Aggregate Summary Table — Pruna vs Base (HQQ + torch.compile)
=============================================================

Loads raw benchmark CSVs, computes aggregate_summary.csv,
then saves it as a styled PNG table.

Input
-----
benchmark/raw_benchmark_results_hqq.csv
```
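The step that turns raw per-run rows into `aggregate_summary.csv` is not shown in the preview. One straightforward way to do it is a group-by over the benchmark configuration with a per-group mean, sample standard deviation, and run count. The sketch below uses only the standard library; the grouping keys (`method`, `gen_length`) and metric columns are assumptions, not the gist's actual schema.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical schema: grouping keys and metric columns are assumptions.
GROUP_KEYS = ("method", "gen_length")
METRICS = ("latency_s", "peak_mem_gb")


def aggregate(rows, keys=GROUP_KEYS, metrics=METRICS):
    """Collapse per-run rows into one summary row per configuration."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    summary = []
    for group_key in sorted(groups):  # deterministic row order in the output table
        members = groups[group_key]
        out = dict(zip(keys, group_key))
        out["n_runs"] = len(members)
        for metric in metrics:
            values = [float(r[metric]) for r in members]
            out[f"{metric}_mean"] = mean(values)
            # Sample SD (n - 1); report 0.0 when only one run exists.
            out[f"{metric}_std"] = stdev(values) if len(values) > 1 else 0.0
        summary.append(out)
    return summary
```

Each returned dict can be written with `csv.DictWriter` to produce the summary file. If the script already uses pandas, `groupby(...).agg(...)` is the equivalent; note that pandas reports NaN rather than 0 for the standard deviation of a single run.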
```python
import asyncio
import logging
import os
import subprocess
import urllib.parse
from collections import Counter
from datetime import date
from pathlib import Path

import httpx
import requests
```
```python
from dagster import AssetExecutionContext, Definitions, MaterializeResult, asset
from dagster_hf_datasets import (
    HFDatasetPublisher,
    HFParquetIOManager,
    HuggingFaceResource,
    hf_dataset_asset,
)
from datasets import Dataset
```
```python
import gc
import json
import os
import platform
import time

import pandas as pd
import torch
from tqdm.auto import tqdm
```
```python
import gc
import json
import os
import platform
import time

import numpy as np
import pandas as pd
import torch
```
```python
#!/usr/bin/env python3
"""
Tokenizer Performance Comparison
"""
# ------------------------------------------------
# Dependencies
# Install with:
#
#   uv pip install \
#     "datasets>=2.14.0" \
```
```python
import json
import os
import subprocess
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from statistics import mean, median, stdev

import requests
```
```python
import os
import time

import matplotlib.pyplot as plt
import numpy as np
import torch
from tqdm import tqdm
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
```