Luxter77 / attention.py
Created December 3, 2024 03:01 — forked from iamlemec/attention.py
Using KV cache with mixed causal/non-causal attention.
import torch
from transformers.models.roberta import RobertaConfig, RobertaModel, RobertaTokenizer
# load model and tokenizer
tokenizer = RobertaTokenizer.from_pretrained('FacebookAI/roberta-base')
model = RobertaModel.from_pretrained('FacebookAI/roberta-base', is_decoder=True).to('cuda')
# tokenize inputs
text = 'hello world, this is a test'
inputs = tokenizer(text, return_tensors='pt').to('cuda')
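
The preview of the gist stops here, so the rest of the script is not shown. As a rough sketch of how the cached keys/values might then be reused for incremental decoding, assuming the standard transformers use_cache / past_key_values interface (the continuation below is an assumption for illustration, not the gist's actual code):

# NOTE: hypothetical continuation, not part of the original gist.
with torch.no_grad():
    # Prefix pass: run the prompt once and cache its keys/values.
    prefix_out = model(**inputs, use_cache=True)
    past_key_values = prefix_out.past_key_values
    # Incremental step: feed only the new token(s) and reuse the cache,
    # so earlier positions are not recomputed.
    new_ids = tokenizer(' again', return_tensors='pt', add_special_tokens=False).input_ids.to('cuda')
    step_out = model(input_ids=new_ids, past_key_values=past_key_values, use_cache=True)
    print(step_out.last_hidden_state.shape)  # (batch, num_new_tokens, hidden_size)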