eladroz / arrow2-graviton2.md
Last active August 12, 2022 06:09
Packaging Apache Arrow 2.0 on AWS Graviton2 (ARM64)

I'm now working on big data processing with Pandas at scale, as a lightweight alternative to Spark. Fortunately, the Apache Arrow project brings with it an excellent and very fast Parquet reader and writer.
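For a sense of the workload, here is a minimal sketch of round-tripping a Pandas DataFrame through Arrow's Parquet writer and reader; the column names and file path are placeholders, not taken from the original gist:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder data; in practice these are the large DataFrames processed at scale.
df = pd.DataFrame({"user_id": [1, 2, 3], "amount": [9.99, 14.50, 3.25]})

# Write through Arrow's columnar representation, then read it back into Pandas.
pq.write_table(pa.Table.from_pandas(df), "events.parquet", compression="snappy")
df_back = pq.read_table("events.parquet").to_pandas()
```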

With the current push to ARM in both personal computers and the data center, I was curious to check the performance of my code on ARM, running on AWS's homegrown Graviton2 processor. Their c6g instance types are 20% cheaper than the equivalent Intel-based c5s, while promising faster performance. If that's the future, why not start getting ready now?

While ARM64 Python wheels are already available for NumPy and Pandas, there is no official build for PyArrow yet. There's a pull request in the works,

Birch-san / fine-tuning.md
Last active December 27, 2023 17:24
Fine-tuning LLaMA-7B on ~12GB VRAM with QLoRA, 4-bit quantization

nvidia-smi reported that this required 11181 MiB, at least when training on the prompt sequence lengths that occur early in the alpaca dataset (prompts of roughly 337 tokens).
You can get this down to about 10.9 GB by modifying qlora.py to run torch.cuda.empty_cache() after PEFT has been applied to your loaded model and before training begins.
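As a rough illustration of where that cache flush would sit, here is a minimal sketch using the Hugging Face transformers/peft stack that qlora.py builds on; the model id and LoRA settings below are placeholders, not qlora.py's actual configuration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Placeholder model id and LoRA config; qlora.py passes its own arguments here.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))

# Release cached allocator blocks left over from loading and quantization
# before the trainer starts allocating activations and optimizer state.
torch.cuda.empty_cache()

# ... build the trainer and call .train() as qlora.py does ...
```

Note that empty_cache() only releases memory PyTorch has cached but is no longer using, so it lowers the nvidia-smi reading rather than shrinking the model's true working set.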

Setup

All instructions are written assuming your command-line shell is bash.

Clone repository: