The following was adapted from:
- https://www.reddit.com/r/DataHoarder/comments/yy8o9w/for_everyone_using_gallerydl_to_backup_twitter/
- https://github.com/mikf/gallery-dl/blob/master/README.rst
- https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
First, I installed gallery-dl using one of the methods suggested in its README. I then created the following configuration file at $HOME/.config/gallery-dl/config.json, based on advice from the aforementioned Reddit thread and the project's configuration docs:
{
  "extractor": {
    "twitter": {
      "text-tweets": true,
      "conversations": true,
      "expand": true,
      "logout": true,
      "pinned": true,
      "quoted": true,
      "replies": true,
      "retweets": true,
      "postprocessors": [
        { "name": "metadata", "event": "post", "filename": "{tweet_id}_main.json" }
      ],
      "cookies": {
        "_twitter_sess": "<REDACTED>",
        "ct0": "<REDACTED>",
        "lang": "en"
      }
    }
  }
}
Manually copying cookies from the browser's web inspector seemed preferable to installing an extension that dumps cookies to a cookies.txt file. It wasn't clear which cookies were required, but the ones above worked for me: _twitter_sess certainly sounds relevant, and the config docs reference ct0 for generating CSRF tokens.
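To catch JSON syntax slips (a stray trailing comma, say) before gallery-dl does, the file can also be written from a heredoc and checked with Python's json.tool. This is a sketch of my own convention, not anything gallery-dl requires, with the option list abridged:

```shell
# Sketch: write an abridged version of the config above and validate it.
# The heredoc/json.tool steps are my own convention, not gallery-dl's.
mkdir -p "${HOME}/.config/gallery-dl"
cat > "${HOME}/.config/gallery-dl/config.json" <<'EOF'
{
  "extractor": {
    "twitter": {
      "text-tweets": true,
      "conversations": true,
      "cookies": {
        "_twitter_sess": "<REDACTED>",
        "ct0": "<REDACTED>",
        "lang": "en"
      }
    }
  }
}
EOF
# json.tool exits non-zero on invalid JSON, so this catches syntax errors.
python3 -m json.tool "${HOME}/.config/gallery-dl/config.json" > /dev/null \
  && echo "config OK"
```

Replace the <REDACTED> values with the real cookie strings before running gallery-dl.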
Note: the extractor.twitter.expand option is potentially very expensive. You may want to disable it if you find yourself hitting rate limits (e.g. "[twitter][info] Waiting until HH:MM:SS for rate limit reset.").
This backup.sh script can be used to dump tweets from a few URLs (as suggested in the thread). It takes a username as its first and only parameter.
#!/bin/bash
gallery-dl "https://twitter.com/${1}/tweets" --write-metadata
gallery-dl "https://twitter.com/${1}/media" --write-metadata
gallery-dl "https://twitter.com/${1}/with_replies" --write-metadata
gallery-dl "https://twitter.com/search?q=from:${1}" --write-metadata
This approach worked for me, but the extractor.twitter.timeline.strategy option may be worth reading about if you would prefer to invoke gallery-dl once on a profile URL (e.g. https://www.twitter.com/USERNAME).
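For that single-invocation approach, the configuration docs describe extractor.twitter.timeline.strategy. I haven't tried it myself, but based on those docs a fragment like the following should make a bare profile URL behave like the /with_replies timeline:

```json
{
  "extractor": {
    "twitter": {
      "timeline": {
        "strategy": "with_replies"
      }
    }
  }
}
```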
If you're worried about missing tweets, I found it helpful to run the script a second time and call du -s ./gallery-dl/twitter/$USERNAME to check the size of the output directory before and after the second execution. Assuming the account hasn't tweeted anything new, the du result should remain constant between runs.
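That before/after comparison can be scripted. Here is a sketch; the check_stable helper and the directory layout are my own, not part of gallery-dl:

```shell
#!/bin/bash
# check_stable: re-run a backup command and report whether the output
# directory's size changed (helper name and layout are assumptions).
check_stable() {
  local dir="$1"; shift
  local before after
  before=$(du -s "$dir" | cut -f1)
  "$@"                        # re-run the backup command passed as arguments
  after=$(du -s "$dir" | cut -f1)
  if [ "$before" -eq "$after" ]; then
    echo "stable"             # no growth: likely nothing was missed
  else
    echo "grew: ${before} -> ${after}"
  fi
}

# Example, using backup.sh from above:
# check_stable "./gallery-dl/twitter/${USERNAME}" ./backup.sh "$USERNAME"
```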
Additionally, this backup-follows.sh script can be used to dump the accounts a user follows; I used it primarily to back up my own follow list.
#!/bin/bash
gallery-dl "https://twitter.com/${1}/following" --dump-json > "${1}_following.json"
You can then use a tool like jq to parse that output and extract a list of usernames:
jq ".[][2].legacy.screen_name" <username>_following.json
Here are a few issues I came across in the gallery-dl project that seemed relevant: