
@jsam
Last active February 6, 2025 11:36
ML prep.

Basic Decoder-Only Transformer Tasks

  • Implement a basic decoder-only transformer from scratch with attention mechanisms
  • Add positional encoding to a transformer model (relative vs rotary)
  • Implement causal/masked attention (i.e. similarities.masked_fill(mask=mask, value=float("-inf")))
  • Build/use BPE tokenizer and vocabulary system
  • Implement beam search and sampling strategies for text generation
  • Explain/add multi-head attention
  • Implement/compare different attention patterns (sliding-window, linear, local, sparse)
  • Explain model parallelism elements
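The causal-masking item above can be sketched end to end. A minimal single-head scaled dot-product attention in numpy (names and shapes are illustrative; a real implementation would also carry learned projections and multiple heads), using the same -inf masking trick as masked_fill:

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: (seq_len, d) arrays. Each position may attend only to
    itself and earlier positions; future positions are set to -inf
    before the softmax, so they receive exactly zero weight.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)           # mask out the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v, weights
```

The returned weight matrix is lower-triangular with rows summing to 1, which is a quick sanity check when implementing this from scratch.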

Optimization & Model Efficiency

  • gradient checkpointing to reduce memory usage
  • mixed precision training support
  • explain/add model quantization
  • explain/add model distillation
  • explain/add model pruning
  • implement efficient inference strategies (e.g. KV caching)
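The KV-caching item above can be illustrated with a minimal sketch (class and method names are my own): during autoregressive decoding, keys and values for past tokens are appended once and reused, so each step only computes attention for the newly generated token.

```python
import numpy as np

class KVCache:
    """Minimal KV-cache sketch: append each decoded token's key/value
    once, then attend the new query over the cached history instead of
    recomputing past projections every step."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # q, k, v: (d,) vectors for the *new* token only.
        self.keys.append(k)
        self.values.append(v)
        K = np.stack(self.keys)            # (t, d) cached keys so far
        V = np.stack(self.values)          # (t, d) cached values so far
        scores = K @ q / np.sqrt(len(q))   # (t,) scores vs. history
        w = np.exp(scores - scores.max())
        w /= w.sum()                       # softmax over history
        return w @ V                       # attention output for this step
```

This turns per-step attention cost from O(t^2) recomputation into O(t) against the cache, which is the main reason decoding throughput depends so heavily on KV-cache memory layout.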

Training optimization

  • distributed training (e.g. DeepSpeed)
  • add custom learning rate schedulers
  • add gradient clipping and normalization
  • many different ways to optimize data loading pipelines (e.g. streaming vs. batching, reducing I/O, etc.)
  • checkpoint saving and loading with verification
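Two of the items above, custom LR schedules and gradient clipping, fit in a few lines. A sketch of linear warmup followed by cosine decay, plus clipping by global L2 norm (function names and hyperparameter values are illustrative, not from the list):

```python
import math

def lr_at(step, *, max_lr=3e-4, warmup=100, total=1000):
    """Linear warmup to max_lr, then cosine decay to zero -- a common
    schedule for transformer training."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * max_lr * (1 + math.cos(math.pi * min(progress, 1.0)))

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradient vectors so their combined L2 norm is at
    most max_norm; gradients below the threshold pass through unchanged."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total > max_norm:
        scale = max_norm / total
        grads = [[g * scale for g in vec] for vec in grads]
    return grads
```

Clipping by *global* norm (rather than per-tensor) preserves the relative direction of the update, which is usually what you want for training stability.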

Production readiness

  • logging and monitoring in production
  • model serving and versioning endpoints
  • distributed model serving and low-latency pipelines
  • model A/B testing
  • data drift
  • SafeAI?
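For the data-drift item, one simple, dependency-free score is the Population Stability Index, comparing a serving-time feature distribution against the training baseline. A sketch (thresholds are a common rule of thumb, not a standard):

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index for drift monitoring.
    Rule of thumb (illustrative): < 0.1 stable, 0.1-0.25 moderate
    drift, > 0.25 significant drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(bins - 1, max(0, int((x - lo) / width)))
            counts[i] += 1                 # out-of-range values clamp to edge bins
        return [c / len(xs) + eps for c in counts]

    p, q = hist(expected), hist(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

In production this would run per feature on a schedule, alerting when the score crosses a threshold; the same histogram machinery also works on model output scores to catch prediction drift.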
jsam commented Feb 6, 2025

  • Analyze properties of the generated embedding
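A few standard diagnostics for that task: norm spread and mean pairwise cosine similarity across the embedding matrix (a high mean cosine suggests anisotropy, i.e. vectors bunched in a narrow cone). A numpy sketch with an illustrative function name:

```python
import numpy as np

def embedding_stats(E):
    """Quick diagnostics for an embedding matrix E of shape (n, d)."""
    norms = np.linalg.norm(E, axis=1)
    unit = E / norms[:, None]              # row-normalize
    cos = unit @ unit.T                    # (n, n) pairwise cosines
    off_diag = cos[~np.eye(len(E), dtype=bool)]
    return {
        "norm_mean": float(norms.mean()),  # typical embedding magnitude
        "norm_std": float(norms.std()),    # spread of magnitudes
        "mean_cosine": float(off_diag.mean()),  # anisotropy signal
    }
```

For random Gaussian embeddings the mean cosine sits near zero; trained token embeddings often show a markedly positive value, which is the usual starting point for the anisotropy discussion.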
