Last active May 23, 2025 06:32
Gist: thomwolf/ecc52ea728d29c9724320b38619bd6a6
Download and load the PersonaChat JSON dataset
import json
from pytorch_pretrained_bert import cached_path, OpenAIGPTTokenizer

url = "https://s3.amazonaws.com/datasets.huggingface.co/personachat/personachat_self_original.json"

# Download the JSON dataset (cached_path reuses a local copy if one exists) and load it
personachat_file = cached_path(url)
with open(personachat_file, "r", encoding="utf-8") as f:
    dataset = json.load(f)

# The snippet expects a GPT tokenizer to be loaded already; this loads the OpenAI GPT one
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")

# Recursively tokenize and encode the dataset: strings become lists of token ids,
# dicts keep their keys with encoded values, and lists are encoded element by element
def tokenize(obj):
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return {n: tokenize(o) for n, o in obj.items()}
    return [tokenize(o) for o in obj]

dataset = tokenize(dataset)
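Because `tokenize` only cares about strings, dicts and lists, its effect can be checked offline. The sketch below is an illustration, not part of the gist: `ToyTokenizer` is a hypothetical stand-in that splits on whitespace and assigns ids in order of first appearance, exposing the same two method names the snippet calls. The toy record uses the `personality` / `utterances` / `history` / `candidates` keys that, to my knowledge, the PersonaChat JSON uses; treat that layout as an assumption.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical stand-in for the GPT tokenizer (whitespace split, growing vocab)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def tokenize(self, text: str) -> List[str]:
        return text.lower().split()

    def convert_tokens_to_ids(self, tokens: List[str]) -> List[int]:
        # An unseen token gets the next free id (len(vocab) is read before insertion).
        return [self.vocab.setdefault(t, len(self.vocab)) for t in tokens]


tokenizer = ToyTokenizer()


def tokenize(obj):
    """Same recursion as the gist: strings -> id lists, dicts keep keys, lists map."""
    if isinstance(obj, str):
        return tokenizer.convert_tokens_to_ids(tokenizer.tokenize(obj))
    if isinstance(obj, dict):
        return {n: tokenize(o) for n, o in obj.items()}
    return [tokenize(o) for o in obj]


toy = {"train": [{"personality": ["i like to go hunting ."],
                  "utterances": [{"history": ["hi , how are you ?"],
                                  "candidates": ["i like to go hunting ."]}]}]}
encoded = tokenize(toy)
print(encoded)
```

The nesting and the keys survive unchanged; only the leaf strings are replaced by id lists, which is why the gist can tokenize the whole dataset with a single call.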
Hi, could you explain the data format of a file like this one?
train_self_original.txt file:
1 your persona: i like to remodel homes.
2 your persona: i like to go hunting.
3 your persona: i like to shoot a bow.
4 your persona: my favorite holiday is halloween.
5 hi , how are you doing ? i am getting ready to do some cheetah chasing to stay in shape . \t you must be very fast . hunting is one of my favorite hobbies . \t my mom was single with 3 boys , so we never left the projects .|i try to wear all black every day . it makes me feel comfortable .|well nursing stresses you out so i wish luck with sister|yeah just want to pick up nba nfl getting old|i really like celine dion . what about you ?|no . i live near farms .|i wish i had a daughter , i am a boy mom . they are beautiful boys though still lucky|yeah when i get bored i play gone with the wind my favorite movie .|hi how are you ? i am eating dinner with my hubby and 2 kids .|were you married to your high school sweetheart ? i was .|that is great to hear ! are you a competitive rider ?|hi , i am doing ok . i am a banker . how about you ?|i am 5 years old|hi there . how are you today ?|i totally understand how stressful that can be .|yeah sometimes you do not know what you are actually watching|mother taught me to cook ! we are looking for an exterminator .|i enjoy romantic movie . what is your favorite season ? mine is summer .|editing photos takes a lot of work .|you must be very fast . hunting is one of my favorite hobbies .
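One reading of this layout, inferred only from the sample above and not from a documented spec: every line starts with a turn index, and index 1 opens a new dialog; `your persona:` lines list the persona; each remaining line holds the partner's message, the gold reply, an optional (possibly empty) field, and `|`-separated candidate replies, tab-separated (shown as `\t` in the sample). In the sample the last candidate equals the gold reply. The hypothetical `parse_fbdialog` helper below groups lines into dicts loosely modeled on the JSON's `personality` / `utterances` keys; the extra `reply` key is added here for clarity and is an assumption.

```python
from typing import Dict, List


def parse_fbdialog(lines: List[str]) -> List[Dict]:
    """Hypothetical parser for numbered, tab-separated persona-chat lines."""
    dialogs: List[Dict] = []
    history: List[str] = []
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue
        index_str, text = raw.split(" ", 1)
        # Turn index 1 (or the very first line) starts a new dialog.
        if int(index_str) == 1 or not dialogs:
            dialogs.append({"personality": [], "utterances": []})
            history = []
        dialog = dialogs[-1]
        if text.startswith("your persona:"):
            dialog["personality"].append(text[len("your persona:"):].strip())
            continue
        # Assumed fields: message, gold reply, optional empty field, candidates.
        fields = [f.strip() for f in text.split("\t")]
        message, reply = fields[0], fields[1]
        candidates = []
        if len(fields) > 2:
            candidates = [c.strip() for c in fields[-1].split("|") if c.strip()]
        history.append(message)
        dialog["utterances"].append(
            {"history": list(history), "reply": reply, "candidates": candidates}
        )
        # The gold reply becomes context for the next turn.
        history.append(reply)
    return dialogs


sample = [
    "1 your persona: i like to remodel homes.",
    "2 your persona: i like to go hunting.",
    "3 hi , how are you doing ?\tyou must be very fast .\t\tmy mom was single .|you must be very fast .",
]
print(parse_fbdialog(sample))
```

Each utterance's `history` accumulates the earlier messages and gold replies, so the model sees the whole conversation so far when scoring the candidates.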