@freedomtowin
freedomtowin / intro-zero_hugging-face-course-tokenizers.txt
Created November 17, 2024 23:16
Example of creating a Gist using Python
{'input_ids': tensor([[ 101, 1045, 2293, 2023, 3185, 999, 102],
[ 101, 2009, 2001, 3100, 1012, 102, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 0]])}
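The second sentence is one token shorter. With `padding=True`, it is padded to the length of the longest sequence with ID 0, and that position is marked 0 in `attention_mask`. Below is a minimal pure-Python sketch of that behavior. It assumes right-side padding and pad ID 0, which is `[PAD]` in this checkpoint's vocabulary and matches the output above. `pad_batch` is a hypothetical helper, not a transformers function.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad each list of token IDs to the longest length and build an attention mask."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)            # real tokens, then padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = attend, 0 = ignore
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = [
    [101, 1045, 2293, 2023, 3185, 999, 102],  # [CLS] i love this movie ! [SEP]
    [101, 2009, 2001, 3100, 1012, 102],       # [CLS] it was okay . [SEP]
]
encoded = pad_batch(batch)
print(encoded)
```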
freedomtowin / intro-zero_hugging-face-course-tokenizers.py
Created November 17, 2024 23:16
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
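# Load the tokenizer and model from the same checkpoint so the token IDs match the model's vocabulary
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")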
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
sentences = ["I love this movie!", "It was okay."]
# Tokenize and pad
encoded_input = tokenizer(sentences, padding=True, return_tensors="pt")
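The preview stops after tokenization, and the loaded `model` is not used in the visible lines. Below is a hedged sketch of the usual next step, not the gist's hidden code. It passes the padded batch to the model, turns the logits into probabilities with softmax, and maps each argmax to a label through the checkpoint's `config.id2label`. It assumes network access to download the checkpoint.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sentences = ["I love this movie!", "It was okay."]
encoded_input = tokenizer(sentences, padding=True, return_tensors="pt")

# Inference only, so gradient tracking is disabled
with torch.no_grad():
    outputs = model(**encoded_input)  # input_ids and attention_mask are passed as keyword arguments

# logits has one row per sentence and one column per class
probs = torch.softmax(outputs.logits, dim=-1)
predicted_ids = probs.argmax(dim=-1).tolist()
labels = [model.config.id2label[i] for i in predicted_ids]

for sentence, label, p in zip(sentences, labels, probs.tolist()):
    print(f"{sentence!r}: {label} {p}")
```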
freedomtowin / intro-zero_hugging-face-course-tokenizers.txt
Created November 17, 2024 23:16
Tokens: ['i', "'", 've', 'been', 'waiting', 'for', 'a', 'hugging', '##face', 'course', 'my', 'whole', 'life', '.']
Input IDs: tensor([[ 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607,
2026, 2878, 2166, 1012]])
Final Inputs: {'input_ids': [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Decoded Inputs: [CLS] i've been waiting for a huggingface course my whole life. [SEP]
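The Final Inputs line differs from the Input IDs line in three ways:

- ID 101 (`[CLS]`) is added at the start.
- ID 102 (`[SEP]`) is added at the end.
- The `attention_mask` is all ones, because a single sequence needs no padding.

Below is a pure-Python sketch of that step for one sequence. `wrap_single_sequence` is a hypothetical helper. The IDs 101/102 are taken from the output above.

```python
CLS_ID, SEP_ID = 101, 102  # [CLS] and [SEP] in the uncased BERT/DistilBERT vocabulary

def wrap_single_sequence(token_ids):
    """Add [CLS]/[SEP] around one sequence and build an all-ones attention mask."""
    input_ids = [CLS_ID] + list(token_ids) + [SEP_ID]
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}

# IDs from the "Input IDs" line above
ids = [1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607,
       2026, 2878, 2166, 1012]
final_inputs = wrap_single_sequence(ids)
print(final_inputs)
```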
freedomtowin / intro-zero_hugging-face-course-tokenizers.py
Created November 17, 2024 23:16
import torch
from transformers import AutoTokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
sequence = "I've been waiting for a HuggingFace course my whole life."
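# tokenize() only splits the text into WordPiece tokens: it adds no [CLS]/[SEP] and returns strings, not IDs
tokens = tokenizer.tokenize(sequence)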
print("Tokens:", tokens)
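`tokenize()` splits "HuggingFace" into `hugging` and `##face` because the lowercased word is not in the vocabulary as a whole. WordPiece takes the longest matching prefix and marks continuation pieces with `##`.

Below is a simplified sketch of that greedy longest-match-first step. It uses a tiny hypothetical vocabulary. The real tokenizer also lowercases, splits on punctuation, and uses a vocabulary of about 30k entries.

```python
def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched, so the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

# Tiny hypothetical vocabulary, just enough for the example
vocab = {"hugging", "hug", "##face", "##f", "course"}
print(wordpiece("huggingface", vocab))
print(wordpiece("course", vocab))
print(wordpiece("xyz", vocab))
```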
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.txt
Created November 15, 2024 00:12
BertConfig {
"_attn_implementation_autoset": true,
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
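The printed block is a `BertConfig`; the preview shows only its first fields. Its activation and dropout values match the library defaults. You can check this offline by building a config with no arguments and reading its attributes or `to_dict()`. This needs `transformers` installed but downloads nothing.

```python
from transformers import BertConfig

# No arguments: library defaults for a BERT-base-sized model; nothing is downloaded
config = BertConfig()

# Attribute access mirrors the keys in the printed config
print(config.hidden_act)                    # activation in the feed-forward layers
print(config.hidden_dropout_prob)           # dropout for fully connected layers (embeddings, encoder, pooler)
print(config.attention_probs_dropout_prob)  # dropout on attention probabilities

config_dict = config.to_dict()  # plain dict with the same fields as the printed repr
print(config_dict["num_hidden_layers"])
```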
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.py
Created November 15, 2024 00:12
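from transformers import BertConfig, BertModel
# Load bert-base-cased's configuration but override the number of encoder layers
bert_config = BertConfig.from_pretrained("bert-base-cased", num_hidden_layers=10)
# A model built from a config (rather than with from_pretrained) starts with random weights
bert_model = BertModel(bert_config)
# ... training code goes here ...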
bert_model.save_pretrained("my-bert-model")
bert_model = BertModel.from_pretrained("my-bert-model")
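The snippet above builds a model from a config, trains it, and round-trips it through `save_pretrained`/`from_pretrained`. Below is a small offline sketch of the same round trip. It uses hypothetical tiny sizes so it runs quickly on CPU, and a temporary directory in place of `my-bert-model`. It needs `transformers` and `torch` but downloads nothing.

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Hypothetical tiny sizes so the sketch builds quickly on CPU (not the gist's values)
config = BertConfig(
    num_hidden_layers=2,
    hidden_size=64,
    num_attention_heads=4,  # hidden_size must be divisible by the number of heads
    intermediate_size=128,
)
model = BertModel(config)  # random initialization; nothing is downloaded

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)  # writes config.json and the weights file
    reloaded = BertModel.from_pretrained(save_dir)

# The reloaded model keeps the saved architecture and weights
print(reloaded.config.num_hidden_layers)
same_weights = torch.equal(
    model.embeddings.word_embeddings.weight,
    reloaded.embeddings.word_embeddings.weight,
)
print(same_weights)
```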
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.txt
Created November 15, 2024 00:12
Inputs: {'input_ids': tensor([[ 101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172,
2607, 2026, 2878, 2166, 1012, 102],
[ 101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0,
0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
}
Logits: tensor([[-1.5607, 1.6123],
[ 4.1692, -3.3464]], grad_fn=<AddmmBackward0>)
Predictions: tensor([[4.0195e-02, 9.5980e-01],
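Each row of Predictions is the softmax of the matching Logits row. Below is a standard-library sketch that recomputes it from the printed logits and maps each argmax to a label. It assumes this checkpoint's `id2label` is `{0: "NEGATIVE", 1: "POSITIVE"}`; check `model.config.id2label` to confirm.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the output above (one row per input sentence)
logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]
id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed label mapping for this checkpoint

predictions = [softmax(row) for row in logits]
# argmax over each probability row, then look up the label
labels = [id2label[max(range(len(p)), key=p.__getitem__)] for p in predictions]
print(predictions)
print(labels)
```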
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.py
Created November 15, 2024 00:12
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification  # , AutoModel
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# AutoModel would return only the base model's hidden states, without the classification head:
# model = AutoModel.from_pretrained(checkpoint)
model_seq_classification = AutoModelForSequenceClassification.from_pretrained(checkpoint)
raw_inputs = [
"I've been waiting for a HuggingFace course my whole life.",
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.txt
Created November 15, 2024 00:12
graph TD
Model-->Task-A-large-dataset
Task-A-large-dataset-.->Model
Model-->Task-B-small-dataset
Task-B-small-dataset-->Finetuned-Model
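The diagram shows the usual transfer-learning flow: a model pretrained on a large dataset (Task A) is fine-tuned on a smaller one (Task B). Below is a hedged sketch of the first step of Task B with transformers. It loads the pretrained body and attaches a new, randomly initialized classification head. The checkpoint and `num_labels=2` are illustrative assumptions, and the training loop is omitted. It downloads the checkpoint.

```python
from transformers import AutoModelForSequenceClassification

# Pretrained encoder weights are reused; the classification head is new and untrained
# (transformers logs a warning listing the newly initialized weights)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased",  # assumed pretrained checkpoint (the Task A model)
    num_labels=2,       # assumed number of classes for Task B
)
print(model.config.num_labels)
print(model.classifier)
```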
freedomtowin / intro-zero_hugging-face-course-automodel-pipelines.txt
Created November 15, 2024 00:11
[{'translation_text': 'This course is produced by Hugging Face.'}]
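Only the output of this snippet is shown. A list with one `{'translation_text': ...}` dict per input is the output shape of a `pipeline("translation", ...)` call. Below is a sketch under stated assumptions:

- The French-to-English model `Helsinki-NLP/opus-mt-fr-en` and the French input sentence are illustrative choices, not recovered from the gist.
- The example downloads the model and needs `sentencepiece` installed.

```python
from transformers import pipeline

# Assumed model: a French-to-English MarianMT checkpoint (not confirmed by the gist)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

# Hypothetical input sentence for illustration
result = translator("Ce cours est gratuit.")
print(result)  # a list with one {'translation_text': ...} dict per input
```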