This assumes we're starting with a CSV file containing the MSISDNs and other fields. If you have a file that contains only MSISDNs, with one MSISDN per line, skip to Step 4.
In a terminal, type:
head myfile.csv
This shows the first ten lines of the file.
Note which column contains the MSISDNs (for the rest of this guide, we'll assume it's column 3).
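If the file has many columns, counting commas by eye is error-prone. The following is a minimal sketch that prints the header row with one column name per line, numbered from 1. The sample contents of myfile.csv (and the column name msisdn) are hypothetical, and the sketch works in a scratch directory so that no existing file is overwritten:

```shell
# Work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)"

# Hypothetical sample file: the MSISDN is in the third column
printf 'name,city,msisdn\nAlice,Leeds,447700900001\n' > myfile.csv

# Take the header row, put each column name on its own line, and number them
head -n 1 myfile.csv | tr ',' '\n' | cat -n
```

The number printed next to the MSISDN column name is the column number to use in the next step. This assumes the header itself contains no quoted commas.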
Type:
cut -d , -f 3 myfile.csv > msisdns.txt
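The -d , option specifies that fields are separated by commas, and -f 3 selects column 3 (if your MSISDNs are in a different column, use that column's number instead). Note that cut does not understand CSV quoting, so this only works if no field before the MSISDN contains a comma inside quotes.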
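Here is a minimal sketch of the extraction on a small, hypothetical myfile.csv (the names and numbers are made up), followed by the quoting caveat in action:

```shell
cd "$(mktemp -d)"

# Hypothetical sample CSV: a header row plus three rows, MSISDN in column 3
printf 'name,city,msisdn\nAlice,Leeds,447700900001\nBob,York,447700900002\nCarol,Hull,447700900001\n' > myfile.csv

# Keep only the third comma-separated field of every line
cut -d , -f 3 myfile.csv > msisdns.txt
cat msisdns.txt
# msisdn
# 447700900001
# 447700900002
# 447700900001

# Caveat: a quoted field containing a comma shifts the columns.
# This prints Leeds, not the MSISDN.
printf '"Smith, Ann",Leeds,447700900003\n' | cut -d , -f 3
```

The header value (msisdn) is still the first line of the output, which is why the next step removes it.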
Check that the result looks good by typing:
head msisdns.txt
The CSV file probably has a header row with column names, so the first line of msisdns.txt is likely that column's name rather than an MSISDN. Remove it using:
tail -n +2 msisdns.txt > msisdns-nohdr.txt
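The +2 tells tail to start output at line 2, so exactly one line is dropped. A minimal sketch on a hypothetical msisdns.txt with a one-line header:

```shell
cd "$(mktemp -d)"

# Hypothetical extracted column: a header line followed by two MSISDNs
printf 'msisdn\n447700900001\n447700900002\n' > msisdns.txt

# -n +2 means "start at line 2", i.e. skip only the first line
tail -n +2 msisdns.txt > msisdns-nohdr.txt
cat msisdns-nohdr.txt
# 447700900001
# 447700900002
```

If your file has no header row, skip this step; otherwise the first real MSISDN would be lost.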
Type:
sort -u msisdns-nohdr.txt > msisdns-unique.txt
This sorts the MSISDNs and keeps only one copy of each (-u means unique). You now have a file of unique MSISDNs.
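The extraction, header removal and de-duplication steps above can also be chained into a single pipeline, which avoids the intermediate files. A minimal sketch, again assuming a hypothetical myfile.csv with a single header row and the MSISDN in column 3:

```shell
cd "$(mktemp -d)"

# Hypothetical sample CSV with one duplicated MSISDN
printf 'name,city,msisdn\nAlice,Leeds,447700900001\nBob,York,447700900002\nCarol,Hull,447700900001\n' > myfile.csv

# Drop the header, keep column 3, then sort and de-duplicate in one pass
tail -n +2 myfile.csv | cut -d , -f 3 | sort -u > msisdns-unique.txt
cat msisdns-unique.txt
# 447700900001
# 447700900002
```

Because the header is removed before cut runs, this gives the same result as the separate steps. Keep in mind that sort -u only merges lines that are exactly identical, so a stray space or a Windows line ending (\r) makes otherwise equal numbers count as different.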
Type:
wc msisdns-unique.txt
This outputs three numbers: the number of lines (i.e. unique MSISDNs), the number of words (which should be the same) and the number of characters (which we can ignore).
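If you only need the count, wc -l prints just the number of lines. A minimal sketch on a hypothetical two-line file:

```shell
cd "$(mktemp -d)"

# Hypothetical file of two unique MSISDNs (12 digits plus a newline each)
printf '447700900001\n447700900002\n' > msisdns-unique.txt

# All three counts: 2 lines, 2 words, 26 bytes, followed by the file name
wc msisdns-unique.txt

# Only the line count; reading from stdin means no file name is printed
wc -l < msisdns-unique.txt
```

Some systems pad the number with leading spaces, which does not matter when reading it by eye.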
To combine two files of unique MSISDNs into one, type:
cat msisdns-unique-1.txt msisdns-unique-2.txt | sort -u > msisdns-unique-combined.txt
This concatenates the two files, sorts the result keeping only unique entries, and writes it to msisdns-unique-combined.txt.
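Since sort reads every file named on its command line, the cat step is optional. A minimal sketch with two small, hypothetical files that share one MSISDN:

```shell
cd "$(mktemp -d)"

# Hypothetical inputs: 447700900002 appears in both files
printf '447700900001\n447700900002\n' > msisdns-unique-1.txt
printf '447700900002\n447700900003\n' > msisdns-unique-2.txt

# sort reads both files itself, so no cat is needed
sort -u msisdns-unique-1.txt msisdns-unique-2.txt > msisdns-unique-combined.txt
cat msisdns-unique-combined.txt
# 447700900001
# 447700900002
# 447700900003
```

The same form works for any number of input files, and running wc on the result gives the combined number of unique MSISDNs.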