@michael-simons
Created November 9, 2024 11:31
Use DuckDB with the Twitter GDPR archive
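# tweets.js starts with a JavaScript assignment (for example
# "window.YTD.tweets.part0 = ["); strip the prefix up to "= " on line 1
# so that DuckDB receives a plain JSON array on stdin.
sed '1s/.*= //' tweets.js |
duckdb -s "
SELECT strptime(t->'tweet'->>'created_at', '%a %b %d %H:%M:%S %z %Y') AS created_at,
       t->'tweet'->>'full_text' AS text,
       -- NULLIF turns a literal 'null' string into SQL NULL before the test
       NULLIF(t->'tweet'->>'in_reply_to_status_id', 'null') IS NOT NULL AS is_reply,
       t->'tweet'->>'in_reply_to_screen_name' AS in_reply_to_screen_name,
       (t->'tweet'->>'in_reply_to_user_id_str')::bigint AS in_reply_to_user_id_str,
       (t->'tweet'->>'in_reply_to_status_id')::bigint AS in_reply_to_status_id
FROM read_json('/dev/stdin') t
-- skip retweets; LIKE needs the trailing % to match any text after 'RT @'
WHERE text NOT LIKE 'RT @%'
ORDER BY created_at"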
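To see what the sed step hands to DuckDB, the sketch below runs the same `sed '1s/.*= //'` on a small, hypothetical stand-in for tweets.js. The sample content is invented for illustration. Only its first line is modelled on the `window.YTD.tweets.part0 = [` header that archive files are assumed to start with. The sketch needs only a POSIX shell and sed, not DuckDB.

```shell
#!/bin/sh
# Hypothetical stand-in for the start of a real tweets.js (invented data);
# only the shape of line 1 is meant to resemble the archive file.
sample='window.YTD.tweets.part0 = [
  { "tweet": { "id_str": "1", "full_text": "hello" } }
]'

# Same sed expression as above: on line 1 only, delete everything up to and
# including "= ", which leaves a plain JSON array for read_json to parse.
stripped=$(printf '%s\n' "$sample" | sed '1s/.*= //')
printf '%s\n' "$stripped"
```

The printed text should begin with a bare `[` on its first line, with the remaining lines unchanged. That is the input `read_json('/dev/stdin')` sees in the pipeline.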