@dwallraff
Created September 13, 2016 18:43
Extract various info from files
# Extract e-mails from text files
cat *.txt | grep -E -o '[a-zA-Z0-9.#?$*_-]+@[a-zA-Z0-9.#?$*_-]+\.[a-zA-Z0-9.-]+' > e-mails.txt
# Extract HTTP URLs from text files
cat *.txt | grep -oP 'https?://[^" >]+' > http-urls.txt
# To extract HTTPS, FTP, and other URL formats, use
cat *.txt | grep -E -o '(((https|ftp|gopher)|mailto)[.:][^ >"]*|www\.[-a-z0-9.]+)[^ .,;>"):]' > urls.txt
# Note: if grep reports "Binary file (standard input) matches", use one of the following approaches
cat *.log | tr '[\000-\011\013-\037\177-\377]' '.' | grep -E "Your_Regex"
# OR
cat -v *.log | grep -E -o "Your_Regex"
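
The e-mail one-liner above can be wrapped in a small reusable function. This is only a sketch under a few assumptions: the name `extract_emails` is hypothetical and not part of the original snippets, the pattern is narrowed so the final label must be at least two letters (which keeps a sentence-ending period out of the match), and `-a` is the GNU/BSD grep option for treating binary-looking input as text, used here as an alternative to the `tr` and `cat -v` workarounds.

```shell
# Hypothetical helper (not in the original snippets): print each unique
# e-mail address found in the given files, one per line.
extract_emails() {
    # -o prints only the matched text, -h drops file name prefixes,
    # -a treats binary-looking input as text (GNU/BSD grep, not POSIX).
    grep -E -a -o -h '[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z]{2,}' "$@" | sort -u
}

# Usage: build a small sample file and extract the addresses from it.
sample=$(mktemp)
printf 'Contact alice@example.com or bob@example.org.\nAgain: alice@example.com\n' > "$sample"
extract_emails "$sample"
rm -f "$sample"
```

Passing the files as arguments rather than through `cat` lets grep read them directly, and `sort -u` removes repeated addresses.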