Luxter77 / attention.py
Created December 3, 2024 03:01 — forked from iamlemec/attention.py
Using KV cache with mixed causal/non-causal attention.
import torch
from transformers.models.roberta import RobertaConfig, RobertaModel, RobertaTokenizer
# load model and tokenizer
tokenizer = RobertaTokenizer.from_pretrained('FacebookAI/roberta-base')
model = RobertaModel.from_pretrained('FacebookAI/roberta-base', is_decoder=True).to('cuda')
# tokenize inputs
text = 'hello world, this is a test'
inputs = tokenizer(text, return_tensors='pt').to('cuda')
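
The preview of the gist stops here, so the rest of the script is not shown. As a rough sketch of how the cached keys/values might then be reused for incremental decoding, assuming the standard transformers use_cache / past_key_values interface (the continuation below is an assumption for illustration, not the gist's actual code):

# NOTE: hypothetical continuation, not part of the original gist.
with torch.no_grad():
    # Prefix pass: run the prompt once and cache its keys/values.
    prefix_out = model(**inputs, use_cache=True)
    past_key_values = prefix_out.past_key_values
    # Incremental step: feed only the new token(s) and reuse the cache,
    # so earlier positions are not recomputed.
    new_ids = tokenizer(' again', return_tensors='pt', add_special_tokens=False).input_ids.to('cuda')
    step_out = model(input_ids=new_ids, past_key_values=past_key_values, use_cache=True)
    print(step_out.last_hidden_state.shape)  # (batch, num_new_tokens, hidden_size)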