@medvedev
Last active January 6, 2025 13:49
Count words in Zelensky's speeches
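from datasets import load_dataset
import pandas as pd

REPO_ID = 'slava-medvedev/zelensky-speeches'

# Download the dataset from the Hugging Face Hub (cached locally) and convert it to a DataFrame.
dataset = load_dataset(REPO_ID, split="train", cache_dir="./cache")
df = dataset.to_pandas()

# Keep only the Ukrainian-language speeches; copy so the column assignments below
# do not trigger pandas' SettingWithCopyWarning.
df = df[df['lang'] == 'uk'].copy()

# 'місяць' (month): 'date' is read as a Unix timestamp in seconds and formatted as YY-MM.
df['місяць'] = pd.to_datetime(df['date'], unit='s').dt.strftime('%y-%m')

# Count occurrences of each word stem in every speech (case-sensitive match).
texts_str = df['full_text'].str
df['незламно'] = texts_str.count('незламн')  # stem of "unbreakable"
df['потужно'] = texts_str.count('потужн')  # stem of "powerful"

# Sum the per-speech counts by month and write the result to a CSV file.
result = df.groupby('місяць')[['незламно', 'потужно']].sum().reset_index()
result.to_csv('output.csv', index=False)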
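The script above counts stems case-sensitively, so a capitalized form at the start of a sentence (for example "Незламний") is not counted. The following is a minimal sketch of a case-insensitive variant, not the gist author's method. It runs on a small made-up sample so the behavior can be checked without downloading the dataset. The helper `count_stems_by_month`, the `sample` rows, and the four-digit `%Y-%m` month format are assumptions introduced for illustration.

```python
import re

import pandas as pd

# Hypothetical sample rows standing in for the real dataset (same column names).
sample = pd.DataFrame({
    'date': [1646092800, 1646179200, 1648771200],  # Unix seconds: 2022-03-01, 2022-03-02, 2022-04-01 (UTC)
    'full_text': [
        'Незламний народ. Потужна відповідь.',
        'Ми незламні, і це потужно.',
        'Потужні рішення, потужна підтримка.',
    ],
})


def count_stems_by_month(df, stems):
    """Return per-month totals of case-insensitive stem matches (illustrative helper)."""
    out = df.copy()
    # A four-digit year keeps the month labels unambiguous and sortable.
    out['month'] = pd.to_datetime(out['date'], unit='s').dt.strftime('%Y-%m')
    for stem in stems:
        # re.escape treats the stem as literal text; IGNORECASE also matches capitalized forms.
        out[stem] = out['full_text'].str.count(re.escape(stem), flags=re.IGNORECASE)
    return out.groupby('month')[list(stems)].sum().reset_index()


monthly = count_stems_by_month(sample, ['незламн', 'потужн'])
print(monthly)
```

On the real data, the same helper could be called with the filtered `df` from the script in place of `sample`. Whether capitalized forms should be counted is a judgment call that depends on what the counts are meant to show.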