Matej Sirovatka (S1ro1)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: false
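The fragment above is an `accelerate` config file (normally produced by `accelerate config` and consumed by `accelerate launch --config_file …`). As a rough sketch of how the two-level nesting reads, here is a tiny stdlib-only parser; it is illustrative only, since Accelerate loads the file with a real YAML parser, not this function:

```python
def parse_flat_yaml(text):
    """Parse the two-level `key: value` subset used by the config above.

    Stdlib-only illustration of how the nesting reads; Accelerate itself
    loads such files with a real YAML parser.
    """
    root = {}
    current = root
    for raw in text.splitlines():
        if not raw.strip():
            continue
        indented = raw.startswith("  ")
        key, _, value = raw.strip().partition(":")
        value = value.strip().strip("'")
        if value == "":
            # A bare `section:` line opens a nested mapping (e.g. fsdp_config:).
            current = root[key] = {}
        else:
            scalars = {"true": True, "false": False}
            (current if indented else root)[key] = scalars.get(value, value)
    return root


cfg = parse_flat_yaml(
    "distributed_type: FSDP\n"
    "fsdp_config:\n"
    "  fsdp_offload_params: false\n"
)
# cfg["fsdp_config"]["fsdp_offload_params"] parses to the boolean False
```

Note that YAML's quoted `'no'` (as in `downcast_bf16: 'no'`) stays a string, while bare `true`/`false` become booleans.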
from transformers import AutoModelForCausalLM
from accelerate import Accelerator
import torch

# Start recording CUDA allocator events before loading the model, so the
# snapshot later captures where model memory was allocated.
torch.cuda.memory._record_memory_history()

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained(model_id)
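`_record_memory_history()` in the snippet above starts recording allocator events; the usual companion step is to dump a snapshot file for the pytorch.org/memory_viz viewer. A minimal sketch using the same private (underscore-prefixed) API, with an arbitrary file name:

```python
import torch


def dump_memory_snapshot(path="memory_snapshot.pickle"):
    """Write the recorded CUDA allocator history to `path`.

    Returns True if a snapshot was written, False when no CUDA device is
    available (recording and dumping both require a CUDA build). Uses the
    private torch.cuda.memory._dump_snapshot API, so details may change
    between PyTorch versions.
    """
    if not torch.cuda.is_available():
        return False
    torch.cuda.memory._dump_snapshot(path)
    return True
```

The resulting pickle can be dragged into the memory_viz web page to inspect per-allocation stack traces.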
import torch


def print_test_end():
    print("---------------")


def test_vectors_bwd():
    print("TEST VECTORS BWD")
    a = torch.tensor([[1.0, -2.0, 3.0]], requires_grad=True)
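The test above is cut off after building `a`. As a self-contained sketch of the kind of backward check this style of test performs (the ReLU example and expected gradient are illustrative, not taken from the gist):

```python
import torch


def check_relu_backward():
    # Backward-pass check in the style of the truncated test above:
    # run a small op, call .backward(), and compare gradients by hand.
    a = torch.tensor([[1.0, -2.0, 3.0]], requires_grad=True)
    out = torch.relu(a).sum()
    out.backward()
    # d/dx relu(x) is 1 where x > 0 and 0 where x < 0
    expected = torch.tensor([[1.0, 0.0, 1.0]])
    return torch.equal(a.grad, expected)
```

Comparing against a hand-derived gradient keeps the check independent of autograd itself; `torch.autograd.gradcheck` is the heavier, finite-difference alternative.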