@mixu
Created October 11, 2013 21:47
extract URLs from a stream (newline-separated)
#!/usr/bin/env node
/*
Installation: npm install linewise
Usage:
cat urls.json | node extract.js > urls.txt
Assuming urls.json contains zero or more URLs per line, this extracts every
URL and writes them out one per line.
You may want to tweak the terminator character class, e.g.
[^"}\s\]\)\n\\]+
which assumes that a URL ends at ", }, ], ), \ or whitespace (including \n).
*/
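// Match scheme://, then everything up to the first terminator character.
var re_weburl = /(\w+:\/\/[^"}\s\]\)\n\\]+)/gi;
// getPerLineBuffer() returns a stream that emits one 'data' event per input line.
var nls = require('linewise').getPerLineBuffer();
nls.on('data', function(data) {
  // With the g flag, match() returns every URL on the line, or null if none.
  var matches = data.match(re_weburl);
  if (matches) {
    console.log(matches.join('\n'));
  }
});
process.stdin.pipe(nls);
nls.resume();
process.stdin.resume();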
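
To see what the terminator class does in practice, here is a minimal sketch that applies the same pattern to two sample lines. The helper `extractUrls`, the sample strings, and the `re_weburl_comma` variant are assumptions made for illustration and are not part of the original script. The second sample shows why you might want to tweak the ending: a comma is not a terminator in the original class, so it stays attached to the URL.

```javascript
// Same pattern as in the script above.
var re_weburl = /(\w+:\/\/[^"}\s\]\)\n\\]+)/gi;
// Hypothetical variant (not in the original): also stop at commas.
var re_weburl_comma = /(\w+:\/\/[^"}\s\]\),\n\\]+)/gi;

// Hypothetical helper: return all URLs found in one line (empty array if none).
function extractUrls(line, re) {
  return line.match(re || re_weburl) || [];
}

var jsonLine = '{"url": "http://example.com/a", "links": ["https://example.org/b?x=1"]}';
var proseLine = 'see (ftp://host/file.txt) and http://a.b/c, ok';

console.log(extractUrls(jsonLine));
// [ 'http://example.com/a', 'https://example.org/b?x=1' ]
console.log(extractUrls(proseLine));
// [ 'ftp://host/file.txt', 'http://a.b/c,' ]  (trailing comma is kept)
console.log(extractUrls(proseLine, re_weburl_comma));
// [ 'ftp://host/file.txt', 'http://a.b/c' ]
```

Adding `,` to the class is only safe if your URLs never contain commas themselves, so treat the variant as a starting point rather than a drop-in replacement.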