@NicMcPhee
Last active September 4, 2020 17:22
Using pipes to eliminate temporary files

This gist illustrates using pipes to eliminate temporary files. We start with a bash script that takes the name of a file of demographic data (see MOCK_DATA.csv) as its command line argument. The script then outputs a count of how many people come from each state. The output on the included data file is:

  49 MN
  21 IA
  20 WI
   7 ND
   3 SD

The first version generates a lot of temporary text files (one for each step); the second does the same thing but uses pipes (|) to turn the output of each command into the input of the next command. This avoids creating a ton of extra temporary files, each of which we have to name, and naming is hard. We should also delete each of the temporary files when we're done so we don't clutter up the world. Thus not having them is Very Nice.
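
To make the cost of temporary files concrete, here is a minimal sketch of what cleaning up after the first version could look like. It is not part of the original scripts: the function name `count_states_with_temp_files` is hypothetical, and it assumes `mktemp -d` is available to create a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not one of the original scripts): the
# temporary-file approach with cleanup added.
# Assumes `mktemp -d` is available on this system.
count_states_with_temp_files() {
  local data_file="$1"
  local tmp_dir
  # Put all the intermediate files in a fresh scratch directory
  # so they can't collide with anything in the current directory.
  tmp_dir="$(mktemp -d)" || return 1

  tail -n +2 "$data_file" > "$tmp_dir/no_header.txt"
  awk -F ',' '{ print $5 }' "$tmp_dir/no_header.txt" > "$tmp_dir/just_states.txt"
  sort "$tmp_dir/just_states.txt" > "$tmp_dir/sorted_states.txt"
  uniq -c "$tmp_dir/sorted_states.txt" > "$tmp_dir/state_counts.txt"
  sort -nr "$tmp_dir/state_counts.txt"

  # Remove the scratch directory and every temporary file in it.
  rm -rf "$tmp_dir"
}
```

Even in this small sketch there are four intermediate files to name plus a cleanup step to remember, and if the function is interrupted partway through, the cleanup never runs. The pipe version has none of this bookkeeping.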

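Note that the second script ends each line of the pipeline with a backslash (\), which escapes the newline so bash "ignores" it and sees the pipeline as one (very long) line. (For this to work the backslash needs to be the very last character on the line.) Strictly speaking the backslashes are optional here: bash already continues a command onto the next line when a line ends with |, and a backslash at the end of a comment is just part of the comment. We could actually write the whole pipeline on one long line, but breaking it up like this is a lot more readable.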
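
As a small sketch of how bash treats line endings in a pipeline: a line that ends with `|` is an incomplete command, so bash keeps reading on the next line, and full-line comments between the stages are skipped. The function name `count_states` is hypothetical and is used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the original scripts): the same
# pipeline, relying on the trailing `|` to continue each line.
count_states() {
  # Get rid of the header line
  tail -n +2 "$1" |
    # Extract just the state column
    awk -F ',' '{ print $5 }' |
    # Sort the states so `uniq` can count them
    sort |
    # Count each state
    uniq -c |
    # Sort by count, biggest first
    sort -nr
}
```

Calling `count_states MOCK_DATA.csv` runs the same five commands as the scripts in this gist. A backslash is still needed when a line is broken somewhere other than after a `|`, for example in the middle of a long argument list.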

The mock data was generated using Mockaroo.

See this StackOverflow answer from Jonathan Leffler for more on the advantages of pipes vs temporary files.

#!/usr/bin/env bash
# Takes some demographic data (see MOCK_DATA.csv below)
# specified as a file name as a command line argument.
# Outputs a count of how many people come from different
# states. The output on the data file below is:
#
# 49 MN
# 21 IA
# 20 WI
# 7 ND
# 3 SD
data_file="$1"
# Get rid of the header line
tail -n +2 "$data_file" > no_header.txt
# Extract just the state column
# The `-F ','` tells `awk` to use `,` as the field separator.
# That's necessary here because the fields are separated by
# commas and not spaces, which is `awk`'s default field
# separator.
awk -F ',' '{ print $5 }' no_header.txt > just_states.txt
# Sort the states so I can use `uniq` to count
sort just_states.txt > sorted_states.txt
# Now count with uniq
uniq -c sorted_states.txt > state_counts.txt
# Sort by occurrences. `-n` tells `sort` to sort numerically
# (instead of alphabetically), and `-r` tells it to reverse
# the order so the biggest values end up at the top.
# This sends the output to standard output.
sort -nr state_counts.txt
#!/usr/bin/env bash
# Takes some demographic data (see MOCK_DATA.csv below)
# specified as a file name as a command line argument.
# Outputs a count of how many people come from different
# states. The output on the data file below is:
#
# 49 MN
# 21 IA
# 20 WI
# 7 ND
# 3 SD
# This is the same as before, but we use pipes (`|`) to turn the
# output of one command into the input of the next command. This
# avoids creating a ton of extra temporary files, each of which we
# have to name, and naming is hard. We should also delete each of the
# temporary files when we're done so we don't clutter up the
# world. Thus not having them is nice.
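# Each line of the pipeline ends with a backslash (`\`), which escapes
# the newline so the shell sees this as one (very long) line. (Bash
# would also continue past a line ending in `|` on its own, and a
# backslash at the end of a comment has no effect.) We could actually
# make it one long line, but this is a _lot_ more readable.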
data_file="$1"
# Get rid of the header line
tail -n +2 "$data_file" | \
# Extract just the state column \
# The `-F ','` tells `awk` to use `,` as the field separator. \
# That's necessary here because the fields are separated by \
# commas and not spaces, which is `awk`'s default field \
# separator. \
awk -F ',' '{ print $5 }' | \
# Sort the states so I can use `uniq` to count \
sort | \
# Now count with uniq \
uniq -c | \
# Sort by occurrences. `-n` tells `sort` to sort numerically \
# (instead of alphabetically), and `-r` tells it to reverse \
# the order so the biggest values end up at the top. \
# This sends the output to standard output. \
sort -nr
id,first_name,last_name,email,State,ZIP
1,Nilson,Kurt,[email protected],WI,53790
2,Gregory,Lethby,[email protected],MN,55805
3,Wendy,Domanek,[email protected],MN,55458
4,Emmet,Peracco,[email protected],IA,52410
5,Elizabet,O'Heaney,[email protected],WI,53205
6,Randolf,Ullyott,[email protected],WI,54305
7,Costanza,Orred,[email protected],MN,55446
8,Garry,Ousby,[email protected],IA,50310
9,Gery,Kirrens,[email protected],MN,55551
10,Marquita,Gingle,[email protected],MN,55428
11,Erinn,Zanotti,[email protected],MN,55172
12,Garret,Kimbley,[email protected],IA,52410
13,Jacki,Aizkovitch,[email protected],WI,53705
14,Rozanna,Lohden,[email protected],MN,55146
15,Lib,Hellier,[email protected],WI,53405
16,Hiram,Trimme,[email protected],MN,55487
17,Lilllie,Handsheart,[email protected],MN,55805
18,Tobiah,Holsey,[email protected],MN,55585
19,Padraig,Acey,[email protected],MN,55114
20,Dulcy,Ellaway,[email protected],MN,55407
21,Karena,Costigan,[email protected],MN,55428
22,Patti,MacAllen,[email protected],MN,56372
23,Sibilla,Benion,[email protected],IA,50315
24,Allister,Player,[email protected],WI,53726
25,Cross,Shanks,[email protected],WI,53234
26,Cassy,Orris,[email protected],ND,58207
27,Concordia,Paolini,[email protected],MN,55458
28,Madlin,Sansome,[email protected],IA,50981
29,Jarrett,Raddon,[email protected],MN,55428
30,Georgette,Thorpe,[email protected],MN,55407
31,Darlene,Dowbekin,[email protected],WI,53225
32,Reggie,Roches,[email protected],ND,58207
33,Leigh,Duthy,[email protected],MN,55428
34,Kirsteni,Querrard,[email protected],MN,55166
35,Kalli,Whooley,[email protected],WI,54305
36,Dag,Cheshire,[email protected],SD,57188
37,Ferdinand,Sier,[email protected],MN,55172
38,Vanya,Bim,[email protected],MN,55551
39,Blondy,Hitchens,[email protected],MN,55590
40,Alis,Websdale,[email protected],IA,50315
41,Leeland,Windridge,[email protected],ND,58106
42,Dionysus,Maude,[email protected],IA,52809
43,Hyacintha,Helm,[email protected],IA,51110
44,Brenda,Rounsefull,[email protected],MN,55487
45,Greer,Pohlke,[email protected],MN,55572
46,Craig,Bottrill,[email protected],MN,55585
47,Zacharie,Dellow,[email protected],IA,50320
48,Maryanne,Broker,[email protected],ND,58207
49,Elvis,O'Monahan,[email protected],MN,55108
50,Felecia,Butterworth,[email protected],WI,53790
51,Brook,Hulk,[email protected],MN,55441
52,Merry,O'Caherny,[email protected],SD,57198
53,Laryssa,Sleit,[email protected],MN,55103
54,Therine,Croydon,[email protected],MN,55407
55,Ainslie,Bowne,[email protected],ND,58207
56,Tammie,Leavry,[email protected],MN,55146
57,Filmore,Cuerda,[email protected],MN,55585
58,Ronnie,Truitt,[email protected],WI,53215
59,Cyndie,Patty,[email protected],IA,51105
60,Lorelei,Handslip,[email protected],SD,57193
61,Ingra,Belasco,[email protected],IA,52410
62,Page,Stockell,[email protected],WI,53263
63,Dareen,Strevens,[email protected],IA,51105
64,Hurleigh,Kynforth,[email protected],MN,55441
65,Moira,Moggach,[email protected],MN,55108
66,Elroy,Bowerman,[email protected],WI,54313
67,Gayler,Vsanelli,[email protected],ND,58505
68,Valli,Hellyar,[email protected],WI,53215
69,Etan,Claris,[email protected],MN,55436
70,Sam,Laker,[email protected],IA,50981
71,Meier,Oliveira,[email protected],MN,55480
72,Rosalie,Fahy,[email protected],MN,55407
73,Binni,Veasey,[email protected],MN,55407
74,Chaddy,Aronin,[email protected],IA,50315
75,Nelson,Burd,[email protected],WI,53785
76,Hilda,Licence,[email protected],WI,53220
77,Ennis,Dearnaley,[email protected],IA,50369
78,Claribel,Leads,[email protected],MN,55127
79,Kiley,De Mico,[email protected],MN,55564
80,Emogene,Long,[email protected],IA,50330
81,Bellanca,Ritch,[email protected],IA,51105
82,Lana,Studdeard,[email protected],ND,58505
83,Cynthie,Dowdell,[email protected],IA,50706
84,Willy,Noore,[email protected],IA,52245
85,Luci,Barrasse,[email protected],MN,55446
86,Viv,Sowood,[email protected],WI,54313
87,Horacio,Pilmoor,[email protected],WI,53210
88,Nikita,Madine,[email protected],WI,53790
89,Kally,Klees,[email protected],MN,55417
90,Taddeusz,Christou,[email protected],MN,55123
91,Margarita,Lafont,[email protected],MN,56398
92,Sebastiano,Sarginson,[email protected],MN,55598
93,Salome,Howitt,[email protected],MN,55565
94,Hilda,Gethins,[email protected],MN,55441
95,Wayland,Ilett,[email protected],IA,50369
96,Brook,Nashe,[email protected],IA,50369
97,Corrie,Bleackley,[email protected],WI,53705
98,Cheslie,O' Markey,[email protected],MN,55811
99,Valeria,Heeron,[email protected],MN,55565
100,Rosalind,Gordon-Giles,[email protected],MN,55402