githubcom13 · February 19, 2020 02:41
diff --git a/gistfile1.txt b/gistfile1.txt
 # split the file into files of 50,000 lines each (-l counts lines; -b would count bytes).
 split -l 50000 bigfile.txt
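
Here is a minimal sketch for checking the line-based split on throwaway data. It assumes GNU coreutils. The 120,000-line `bigfile.txt` is generated on the spot for illustration only.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: the numbers 1..120000, one per line.
seq 1 120000 > bigfile.txt

# -l 50000: at most 50,000 lines per piece; default names are xaa, xab, xac, ...
split -l 50000 bigfile.txt

# Line count of every piece (expect 50000, 50000 and 20000).
wc -l xa*
```

GNU split also accepts `-d` for numeric suffixes and an output prefix as a last argument. For example, `split -l 50000 -d bigfile.txt part_` produces `part_00`, `part_01`, and so on.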

 # OR split the file into files of 100 MB each (GNU split reads 100M as 100 MiB).
 split -b 100M bigfile.txt
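
Here is a similar sketch for the size-based split, scaled down to `-b 100K` so it runs quickly. It assumes GNU coreutils and a Linux-style `/dev/urandom`, and the sample file is random filler. It also shows that concatenating the pieces in name order rebuilds the original file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 256,000 random bytes.
head -c 256000 /dev/urandom > bigfile.txt

# -b 100K: pieces of at most 100 KiB (102,400 bytes), just as -b 100M gives
# pieces of 100 MiB. A piece can end in the middle of a line.
split -b 100K bigfile.txt

# Concatenating xaa, xab, xac in glob (name) order restores the original bytes.
cat xa* > rebuilt.txt
cmp bigfile.txt rebuilt.txt && echo "identical"
```

Because `-b` cuts at exact byte counts, a piece may end mid-line. If each piece should hold whole lines, GNU split's `-C 100M` (`--line-bytes`) fills each piece with complete lines up to the size limit. Only a single line longer than the limit gets cut.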

 # extract the email addresses contained in all the files of a directory (recursively)
 grep -r -E -h -o '[[:alnum:]._+-]+@[[:alnum:].-]+\.[[:alpha:]]+' /dir/ > emails.txt
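
Here is a minimal sketch for trying the extraction on invented files. It assumes GNU grep and uses a strict pattern in three ways:
- It uses `+` rather than `*`, so a lone `@` is not matched.
- It has no backslashes inside the bracket expressions, where they would be matched as literal characters.
- It ends in a dot plus letters, so a trailing period is not captured.

```shell
# Hypothetical sample directory with one nested subfolder.
dir=$(mktemp -d)
mkdir "$dir/sub"
printf 'Contact alice@example.com or bob.smith+news@mail.example.org.\n' > "$dir/a.txt"
printf 'Not an address: @ alone, or user@\n' > "$dir/b.txt"
printf 'nested: Carol_99@Example.COM\n' > "$dir/sub/c.txt"

# Write the results outside $dir so grep -r does not read its own output.
out=$(mktemp)

# -r recurse, -E extended regex, -h no filename prefix, -o print only the match.
grep -r -E -h -o '[[:alnum:]._+-]+@[[:alnum:].-]+\.[[:alpha:]]+' "$dir" > "$out"
cat "$out"
```

The order of the three matches follows directory traversal order, which grep does not guarantee.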

 # sort in ascending order and remove duplicates, ignoring case (-f)
 sort -fu emails.txt > unique_emails.txt
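
This small sketch shows what `-f` changes, using made-up addresses that differ only in letter case. It assumes GNU `sort`, and `LC_ALL=C` is set only to make the ordering predictable. `sort -fu` keeps one spelling of each case-insensitive duplicate. If it matters which spelling survives, lower-casing first with `tr` makes the result deterministic.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input with case-only duplicates.
printf 'bob@example.com\nAlice@Example.com\nalice@example.com\nbob@example.com\n' > emails.txt

# -f folds case when comparing, so -u treats Alice@Example.com and
# alice@example.com as the same line and keeps just one of them.
LC_ALL=C sort -fu emails.txt > unique_emails.txt
wc -l < unique_emails.txt

# Alternative: normalize to lower case, then a plain unique sort.
tr '[:upper:]' '[:lower:]' < emails.txt | LC_ALL=C sort -u > lower_emails.txt
cat lower_emails.txt
```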

 UPDATE

 The simplest (but probably not the quickest) way to process all the files would be to do so one by one, using a loop:

    for file in /dir/*; do
      grep -r -E -h -o '\b(pattern)\b' "$file"
    done > outs.txt
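
Here is a runnable version of the loop, as a sketch on a scratch directory. `ERROR` stands in for `pattern`, and the two log files are invented. It assumes GNU grep, which supports `\b` (word boundary) in `-E` patterns.

```shell
# Hypothetical input files.
dir=$(mktemp -d)
printf 'ok\nERROR disk full\n' > "$dir/one.log"
printf 'ERROR timeout\nERRORS are not a match\n' > "$dir/two.log"

out=$(mktemp)

# One grep process per file; \b stops "ERRORS" from matching as "ERROR".
for file in "$dir"/*; do
  grep -r -E -h -o '\b(ERROR)\b' "$file"
done > "$out"

# Expect 2 matches: one from each file.
wc -l < "$out"
```
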

 Launching one `grep` per file can add significant overhead when there are many files. You could instead use `xargs` to hand each `grep` a whole batch of files:

    find /dir/ -maxdepth 1 -type f -print0 |
      xargs -0 -n 1000 grep -r -E -h -o '\b(pattern)\b' > outs.txt

 This uses `find` to list the regular files directly inside `/dir/` and passes them safely to `xargs`, separated by a null byte `\0` (a character guaranteed not to be in a filename). `xargs` then passes the files to `grep` in batches of up to 1000.
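
You can see the two `xargs` options in isolation with a couple of one-liners. This is a sketch assuming GNU or BSD `xargs`. The file names are hypothetical, and no files are read or written.

```shell
# -n 2: at most two names per command, so echo runs twice for three names.
batches=$(printf 'a.txt\0b.txt\0c.txt\0' | xargs -0 -n 2 echo)
printf '%s\n' "$batches"

# -0: input is split only at null bytes, so a space or even a newline
# inside a name survives intact (each name is printed between brackets).
names=$(printf 'with space.txt\0new\nline.txt\0' | xargs -0 printf '[%s]\n')
printf '%s\n' "$names"
```

The first command prints `a.txt b.txt` and then `c.txt`, one line per `echo` invocation.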

 (I'm assuming that you have GNU versions of `find` and `xargs` here, for `find -print0` and `xargs -0`)
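
If GNU `xargs` is not available, POSIX `find` offers `-exec ... {} +`. It also packs many file names into each `grep` call and needs no null-separated list. Below is a minimal sketch on a scratch directory with invented files. Note that `-maxdepth` is itself not POSIX, although GNU and BSD `find` both support it. The email pattern used here is a strict one: `+` rather than `*`, so a lone `@` is not matched.

```shell
# Hypothetical sample files; the one in skip/ lies below -maxdepth 1.
dir=$(mktemp -d)
printf 'alice@example.com\n' > "$dir/a.txt"
printf 'bob@example.org\n' > "$dir/b.txt"
mkdir "$dir/skip"
printf 'carol@example.net\n' > "$dir/skip/c.txt"

out=$(mktemp)

# "{} +" appends as many file names as fit to each grep invocation.
find "$dir" -maxdepth 1 -type f \
  -exec grep -E -h -o '[[:alnum:]._+-]+@[[:alnum:].-]+\.[[:alpha:]]+' {} + > "$out"
cat "$out"
```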


 Example:

    find /dir/ -maxdepth 1 -type f -print0 |
      xargs -0 -n 1000 grep -E -h -o '[[:alnum:]._+-]+@[[:alnum:].-]+\.[[:alpha:]]+' > emails.txt
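
Here is an end-to-end sketch of the example pipeline on invented files. It is followed by the case-insensitive de-duplication from the first part of these notes. It assumes GNU tools. The email pattern requires at least one character on each side of `@` and ends in a dot followed by letters.

```shell
# Hypothetical sample files.
dir=$(mktemp -d)
printf 'alice@example.com wrote to bob@example.org\n' > "$dir/1.txt"
printf 'Alice@Example.com again\n' > "$dir/2.txt"
printf 'nothing here\n' > "$dir/3.txt"

out=$(mktemp)
uniq_out=$(mktemp)

# Null-separated file list, handed to grep in batches of up to 1000 files.
find "$dir" -maxdepth 1 -type f -print0 |
  xargs -0 -n 1000 grep -E -h -o '[[:alnum:]._+-]+@[[:alnum:].-]+\.[[:alpha:]]+' > "$out"

# Ignore case while removing duplicates: the two "alice" spellings become one line.
LC_ALL=C sort -fu "$out" > "$uniq_out"
cat "$uniq_out"
```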