The following was adapted from:
- https://www.reddit.com/r/DataHoarder/comments/yy8o9w/for_everyone_using_gallerydl_to_backup_twitter/
- https://github.com/mikf/gallery-dl/blob/master/README.rst
- https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst
First, I installed gallery-dl using one of the methods suggested in its README. I then created the following configuration file at $HOME/.config/gallery-dl/config.json,
based on advice from the aforementioned Reddit thread and the project's configuration docs:
{
  "extractor": {
    "twitter": {
      "text-tweets": true,
      "conversations": true,
      "expand": true,
      "logout": true,
      "pinned": true,
      "quoted": true,
      "replies": true,
      "retweets": true,
      "postprocessors": [
        { "name": "metadata", "event": "post", "filename": "{tweet_id}_main.json" }
      ],
      "cookies": {
        "_twitter_sess": "<REDACTED>",
        "ct0": "<REDACTED>",
        "lang": "en"
      }
    }
  }
}
Manually copying in cookies from the browser's web inspector tool seemed preferable to installing an extension to dump cookies to a cookies.txt
file. It wasn't clear which cookies were required, but the ones above worked for me. _twitter_sess
definitely sounds relevant and the config docs reference ct0
for generating CSRF tokens.
Note: the extractor.twitter.expand option is potentially very expensive. You may want to disable that option if you find yourself hitting rate limits (e.g. "[twitter][info] Waiting until HH:MM:SS for rate limit reset.").
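If you do hit limits, disabling it just means flipping that one key in the twitter block of the config above; the surrounding structure stays the same:

```json
{
  "extractor": {
    "twitter": {
      "expand": false
    }
  }
}
```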
This backup.sh
script can be used to dump tweets from a few URLs (as suggested in the thread). It takes a username as its first and only parameter.
#!/bin/bash
gallery-dl "https://twitter.com/${1}/tweets" --write-metadata
gallery-dl "https://twitter.com/${1}/media" --write-metadata
gallery-dl "https://twitter.com/${1}/with_replies" --write-metadata
gallery-dl "https://twitter.com/search?q=from:${1}" --write-metadata
This approach worked for me, but the extractor.twitter.timeline.strategy option may be worth reading if you would prefer invoking gallery-dl once on a profile URL (e.g. https://www.twitter.com/USERNAME
).
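Those four invocations could also be consolidated so the URL list lives in one place. A sketch (same URLs as backup.sh above; the helper functions are my own naming, and the script stops on the first failure):

```shell
#!/bin/bash
set -euo pipefail

# Print the timeline URLs worth archiving for a given username.
backup_urls() {
  printf '%s\n' \
    "https://twitter.com/$1/tweets" \
    "https://twitter.com/$1/media" \
    "https://twitter.com/$1/with_replies" \
    "https://twitter.com/search?q=from:$1"
}

# Archive each URL in turn.
backup_user() {
  backup_urls "$1" | while read -r url; do
    gallery-dl --write-metadata "$url"
  done
}

# Run only when a username was actually supplied.
if [ -n "${1:-}" ]; then
  backup_user "$1"
fi
```

Keeping the URLs in a single function makes it easy to add or drop a timeline variant later without touching the download loop.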
If you're worried about missing tweets, I found it helpful to run the script a second time and call du -s ./gallery-dl/twitter/$USERNAME
to check the size of the output directory before and after the second execution. Assuming the account doesn't tweet anything new, the du
result should remain constant between executions.
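That check can be scripted. A minimal sketch, assuming the output directory layout above (note that du reports allocated blocks, not an exact byte count):

```shell
#!/bin/bash
# Print the size (in blocks, as reported by du) of a directory.
dir_size() {
  du -s "$1" | cut -f1
}

# Sketch of the before/after comparison; backup.sh is the script above.
check_stable() {
  local dir="./gallery-dl/twitter/$1"
  local before after
  before=$(dir_size "$dir")
  ./backup.sh "$1"
  after=$(dir_size "$dir")
  [ "$before" = "$after" ] && echo "no new data since last run"
}
```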
Additionally, this backup-follows.sh
script can be used to dump followed accounts. This was primarily useful for backing up the list of accounts I follow.
#!/bin/bash
gallery-dl "https://twitter.com/${1}/following" --dump-json > "${1}_following.json"
You can then use a tool like jq
to parse that and extract a list of usernames:
cat <username>_following.json | jq ".[][2].legacy.screen_name"
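As a sanity check, here is a self-contained sketch against a fabricated dump whose shape merely mirrors what the jq filter above expects; the .[][2] index is tied to gallery-dl's output format at the time of writing and may shift between versions:

```shell
#!/bin/bash
# Fabricated sample: a top-level array whose element at index 2 in each
# entry holds the user object (this mirrors the filter, not a real dump).
cat > sample_following.json <<'EOF'
[
  ["cursor", "entry", {"legacy": {"screen_name": "alice"}}],
  ["cursor", "entry", {"legacy": {"screen_name": "bob"}}]
]
EOF

# -r prints raw strings, one screen name per line.
jq -r '.[][2].legacy.screen_name' sample_following.json
# prints:
# alice
# bob
```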
These are a few issues I came across in the gallery-dl project that seemed relevant: