@ericleasemorgan
Last active August 26, 2024 15:24
some one-liners to extract urls, email addresses, and a dictionary from a text file
# extract all urls from a text file, count https and http as the same url, and trim trailing punctuation (\W requires GNU sed)
cat file.txt | grep -Eo 'https?://[^[:space:]]+' | sed -E -e 's|^https:|http:|' -e 's|\W+$||' | sort | uniq -c | sort -bnr
# extract domains from urls found in a text file
cat file.txt | grep -Eo 'https?://[^[:space:]]+' | sed -E -e 's|^https?://||' -e 's|[/?#].*$||' -e 's|\W+$||' | sort | uniq -c | sort -bnr
# extract email addresses (top-level domains of two or more letters, so longer ones such as .museum are not truncated)
cat file.txt | grep -Eio '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}' | sort | uniq -c | sort -bnr
# list all words in a text file
cat file.txt | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
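# variant of the word list that folds case and strips punctuation, so "The" and "the," count as the same word
# (a sketch, not one of the original one-liners; the function name word_freq is made up for illustration)
word_freq() { tr '[:upper:]' '[:lower:]' < "$1" | tr -cs "[:alnum:]'" '\n' | grep -v '^$' | sort | uniq -c | sort -bnr; }
# usage: word_freq file.txt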