A dummy example for testing
cat DATA.tsv
ID head1 head2 head3 head4
1 25.5 1364.0 22.5 13.2
2 10.1 215.56 1.15 22.2
cat LIST.TXT
ID
head1
head4
I need to extract the columns ID, head1 and head4 from DATA.tsv.
## the column numbers to be extracted
## -x and -F make grep match whole header names literally, so head1 does not also match head10
head -n 1 DATA.tsv | tr '\t' '\n' | grep -nxFf LIST.TXT | cut -d: -f1
1
2
5
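One thing the lookup above does not report is a name in LIST.TXT that has no matching header field: grep simply prints nothing for it. A minimal sketch of a sanity check follows, assuming tab-separated headers; `missing_cols` is a hypothetical helper name, not an existing tool.

```shell
# missing_cols LIST_FILE TSV_FILE
# Prints every name in LIST_FILE that is not a header field of TSV_FILE.
# (missing_cols is a hypothetical name used only for this sketch.)
missing_cols() {
    # One header name per line.
    header_names=$(head -n 1 "$2" | tr '\t' '\n')
    # -e accepts a newline-separated list of patterns; -x -F match whole
    # lines literally; -v keeps the list entries that match no header name.
    grep -vxF -e "$header_names" "$1"
}
```

Calling `missing_cols LIST.TXT DATA.tsv` on the dummy files should print nothing, because all three names are present in the header.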
## save to a variable, formatted as 1,2,5 for the cut command
cols=$(head -n 1 DATA.tsv | tr '\t' '\n' | grep -nxFf LIST.TXT | cut -d: -f1 | paste -sd, -)
## cut out the selected columns
cut -f "${cols}" DATA.tsv
ID head1 head4
1 25.5 13.2
2 10.1 22.2
Benchmarking on my 26G file:
time cut -f "${cols}" myfile.tsv > mysubset.txt
real 32m10.947s
user 31m42.511s
sys 0m26.686s
## memory usage is very low (see the RES column for cut below: 4808 KiB, under 5 MB)
top -M
top - 17:03:17 up 86 days, 4:43, 56 users, load average: 13.99, 13.72, 13.05
Tasks: 754 total, 2 running, 742 sleeping, 5 stopped, 5 zombie
Cpu(s): 13.8%us, 5.2%sy, 0.0%ni, 80.3%id, 0.0%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 31.354G total, 6535.461M used, 24.971G free, 274.668M buffers
Swap: 32.000G total, 2132.094M used, 29.918G free, 1367.434M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18042 mtang1 20 0 102m 4808 604 R 100.0 0.0 5:41.71 cut
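The header lookup and the cut call above can be wrapped into one reusable function. This is only a sketch under the same assumptions as the dummy example (tab-separated input, one column name per line in the list file); `extract_cols` is a hypothetical name. Note that cut always emits columns in file order, whatever the order of names in the list.

```shell
# extract_cols LIST_FILE TSV_FILE
# Prints the columns of TSV_FILE whose header names are listed in LIST_FILE.
# (extract_cols is a hypothetical name used only for this sketch.)
# The body runs in a subshell, ( ... ), so cols does not leak into the caller.
extract_cols() (
    # Number the header fields, keep the numbers of fields whose name is
    # exactly (-x) and literally (-F) one of the listed names, then join
    # the numbers with commas, e.g. 1,2,5.
    cols=$(head -n 1 "$2" | tr '\t' '\n' | grep -nxFf "$1" | cut -d: -f1 | paste -sd, -)
    if [ -z "$cols" ]; then
        echo "extract_cols: no header field of $2 matches a name in $1" >&2
        exit 1   # leaves only the subshell, so the function returns 1
    fi
    cut -f "$cols" "$2"
)
```

With the dummy files, `extract_cols LIST.TXT DATA.tsv` should print the same three columns as the cut command shown earlier, and `extract_cols LIST.TXT myfile.tsv > mysubset.txt` would be the equivalent of the benchmarked run.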