
@averagesecurityguy
Last active February 16, 2016 15:56
Strip TLDs from Large Domain List

You can run the strip.py script with GNU parallel to speed up the process, but some prep work is needed first.

  1. Split the large file into 100 smaller files without breaking any line in half: split -n l/100 domains.txt domains_
  2. Make a list of the smaller files and save it: ls domains_* > dom_files.txt
  3. Run the script with parallel, 10 jobs at a time (strip.py must be executable and tlds.txt must be in the current directory): parallel -a dom_files.txt -j 10 ./strip.py
  4. Cat all of the domains_*_strip.txt files together: cat *_strip.txt > domains_stripped.txt
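As an alternative to the split/parallel/cat steps above, the same fan-out can be sketched in a single Python script with a process pool. This is only a sketch under a few assumptions: the whole domain list fits in memory, the names `load_tlds`, `strip_tld`, `strip_chunk`, `chunked`, and `main` are hypothetical, and the suffixes are sorted longest first on the assumption that tlds.txt may contain multi-label entries such as co.uk.

```python
#!/usr/bin/env python3
"""Sketch: strip TLDs from a large domain list using a process pool."""
import sys
from multiprocessing import Pool


def load_tlds(path):
    """Return '.tld' suffixes, longest first, skipping blank lines."""
    with open(path) as f:
        tlds = ['.' + line.strip() for line in f if line.strip()]
    return sorted(tlds, key=len, reverse=True)


def strip_tld(domain, tlds):
    """Remove the first matching suffix; return the domain unchanged if none match."""
    for t in tlds:
        if domain.endswith(t):
            return domain[:-len(t)]
    return domain


def strip_chunk(args):
    """Worker: strip every domain in one chunk. Takes a (domains, tlds) tuple."""
    domains, tlds = args
    return [strip_tld(d, tlds) for d in domains]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def main(in_path, tld_path, out_path, workers=10, chunk_size=100000):
    tlds = load_tlds(tld_path)
    with open(in_path) as f:
        domains = [line.rstrip() for line in f]
    jobs = [(chunk, tlds) for chunk in chunked(domains, chunk_size)]
    # imap yields results in submission order, so output order matches input order.
    with Pool(workers) as pool, open(out_path, 'w') as out:
        for result in pool.imap(strip_chunk, jobs):
            out.write(''.join(d + '\n' for d in result))


if __name__ == '__main__':
    # Hypothetical usage: ./strip_pool.py domains.txt tlds.txt domains_stripped.txt
    if len(sys.argv) == 4:
        main(*sys.argv[1:4])
```

The default of 10 workers mirrors the -j 10 used with parallel. For lists too large to hold in memory, the file-splitting approach is likely the better fit.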
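#!/usr/bin/env python3
import sys

# Load the TLD list (one per line) and prefix each entry with a dot.
with open('tlds.txt') as f:
    tlds = ['.{0}'.format(t.strip()) for t in f if t.strip()]

ifname = sys.argv[1]
ofname = '{0}_strip.txt'.format(ifname)


def process_line(l):
    # Strip the first matching TLD suffix from the domain.
    for t in tlds:
        if l.endswith(t):
            return l[:-len(t)]
    # No TLD matched: keep the domain unchanged instead of returning None.
    return l


# Write one stripped domain per input line.
with open(ifname) as f, open(ofname, 'w') as of:
    for line in f:
        s = process_line(line.rstrip())
        of.write('{0}\n'.format(s))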
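Because process_line scans the whole TLD list for every domain, the per-line cost grows with the size of tlds.txt. One possible speed-up, sketched here with a hypothetical `strip_suffix` helper and a set of suffixes stored without the leading dot, is to walk the dots in the domain from left to right and look up each candidate suffix in the set. Scanning left to right means the longest matching suffix is removed first.

```python
def strip_suffix(domain, suffixes):
    """Strip the longest suffix of `domain` found in the set `suffixes`.

    `suffixes` holds entries without a leading dot, e.g. {'com', 'co.uk'}.
    The domain is returned unchanged when no suffix matches.
    """
    pos = domain.find('.')
    while pos != -1:
        # Everything after this dot is a candidate suffix.
        if domain[pos + 1:] in suffixes:
            return domain[:pos]
        pos = domain.find('.', pos + 1)
    return domain


if __name__ == '__main__':
    suffixes = {'com', 'uk', 'co.uk'}
    for d in ('example.co.uk', 'www.example.com', 'localhost'):
        print(strip_suffix(d, suffixes))
```

The work per domain then depends on the number of dots in the domain rather than on the number of TLDs in the list.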