Last active
April 30, 2024 01:33
t.co resolver for twitter archives
// This script replaces all t.co links in the data/tweets.js file of an unzipped Twitter archive with their resolved URLs.
// It rewrites the file in place, so be sure to make a backup before running.
// usage: deno run -A resolve_tco.js {path to data/tweets.js}
let file = Deno.args[0]
let text = await Deno.readTextFile(file)
// match() returns null when nothing matches, so fall back to an empty list
let matches = text.match(/"https:\/\/t\.co\/\w+"/g) ?? []
let unique = [...new Set(matches)]
console.log('%s urls found.', unique.length)
// resolve each distinct link once, not once per occurrence
for (let match of unique) {
  console.log('resolving %s...', match)
  let url = match.slice(1, -1)
  let res = await fetch(url, {method: 'HEAD'})
  if (!res.ok) throw new Error(`A ${res.status} error occurred, please run again.`)
  console.log('resolved: "%s".', res.url)
  // replace every occurrence; the callback keeps any "$" in the url literal
  text = text.replaceAll(match, () => `"${res.url}"`)
  // save after each link so a rerun only has to resolve what is left
  await Deno.writeTextFile(file, text)
  // pause for a second between requests
  await new Promise(cb => setTimeout(cb, 1000))
}
console.log('done.')
Thanks for sharing! I quickly ran into some URLs that were down (503, tcp-connect fails) and modded the script to deal with that:
For the failed connections I didn't bother trying to parse the error to retrieve the resolved URL. That didn't seem worth the effort.
I'm still wondering why TCP connect and DNS errors seem to reject the fetch promise, when the docs claim it will always resolve...
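The modified script from the comment above is not shown, so the following is only a minimal sketch of one way to tolerate dead links, not the commenter's actual code. It relies on the fact that fetch resolves once any HTTP response arrives (including a 503) but rejects when no response can be obtained at all, such as on a DNS or TCP connect failure. The helper name `resolveOrSkip` and its `fetchFn` parameter are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper, not part of the original script or the commenter's version.
// Returns the final URL after redirects, or null if the link could not be resolved.
// fetchFn defaults to the global fetch; it is a parameter only so it can be stubbed.
async function resolveOrSkip(url, fetchFn = fetch) {
  try {
    const res = await fetchFn(url, {method: 'HEAD'})
    if (!res.ok) {
      // HTTP-level failure (e.g. 503): the promise resolved, but the status is an error
      console.warn('skipping %s: status %s', url, res.status)
      return null
    }
    return res.url
  } catch (err) {
    // Network-level failure (DNS, TCP connect): fetch rejects instead of resolving
    console.warn('skipping %s: %s', url, err.message)
    return null
  }
}
```

In the loop, the script could call `resolveOrSkip(url)` and leave the t.co link untouched whenever the result is null. For a non-ok status, `res.url` still holds the redirect target, so returning it instead of null would be a reasonable alternative.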