This is my best attempt at transcribing the lightning talk I gave at RUG::B in April 2016. Due to poor time management (LOL) the delivery was rushed and some examples were skipped, so I hope posting them here makes them more useful.
xargs is a small but very useful program that is installed on most if not all of your computers¹. Many of you probably know it. Those who don't will learn something really useful, and those who do will learn a couple of cool tricks, too.
You might have heard about the Unix philosophy:
- Small programs that do one thing and do it well
- Compose them with pipes
I don't need to build a scrolling UI into all my programs, I can combine them with less:
ls -l | less
I don't need to build a search feature into all my programs, I can combine them with grep:
ls -l | grep ^-rw
This is a beautiful concept, so beautiful that it's not true, or at least it's not complete. There's some other input to Unix programs that is missing in that picture: arguments. Many Unix programs don't get their input from the standard input but from arguments. File utilities such as cp, mv or rm are the most typical example, but far from the only one. Think about git, curl, and many others. And that's what xargs is for.
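A quick way to see the gap is echo, which only ever looks at its arguments. This is just a minimal sketch to illustrate the point; the text being piped is made up:

```shell
# echo builds its output from its arguments and never reads standard input,
# so the piped text is silently dropped and only an empty line comes out.
from_pipe=$(printf 'hello world\n' | echo)

# The same text passed as arguments is what echo actually works with.
from_args=$(echo hello world)

printf 'from the pipe:  [%s]\n' "$from_pipe"
printf 'from arguments: [%s]\n' "$from_args"
# from the pipe:  []
# from arguments: [hello world]
```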
In its most basic form, xargs transforms some input (its standard input) into an argument list and calls a program with it:
grep -l xargs *.md | xargs tar czf xargs.tar.gz
This example uses grep to get the list of markdown files that mention xargs and pipes it into xargs, which will call tar with those filenames as arguments, creating an archive of my articles about xargs.
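If you want to try this without touching your own files, here is a self-contained sketch of the same pipeline. The temporary directory and the three tiny articles are assumptions made up for the demo:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three hypothetical articles; only two of them mention xargs.
echo 'xargs is great'      > a.md
echo 'nothing to see here' > b.md
echo 'more about xargs'    > c.md

# grep -l prints the matching filenames, one per line;
# xargs appends them to the tar command as arguments.
grep -l xargs *.md | xargs tar czf xargs.tar.gz

# List the archive contents to confirm what went in.
listing=$(tar tzf xargs.tar.gz)
echo "$listing"
# a.md
# c.md
```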
Another example that I use every day (wrapped in an alias):
git branch --merged | grep -v '^*' | grep -v master | xargs git branch -d
We get a list of the existing branches that are already merged, we remove the current one (which git branch prefixes with *) and master, and then use xargs to call git branch -d (delete branch) on them.
This kind of thing is already very useful, and it's what most people use xargs for every day.
But maybe contradicting the aforementioned Unix philosophy, xargs has a lot of options that slightly change its behavior. Let's learn some neat tricks!
By default, xargs runs the command once with all the arguments (or as few times as possible if they don't fit in a single command line). Sometimes that's irrelevant (e.g. when deleting files) and sometimes it's what we want (e.g. when creating the tar file earlier), but sometimes it's not.
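xargs supports the -n option, to set a maximum number of arguments to give to the executed command. xargs will run it as many times as necessary. We can, for example, go back to our previous tar example and change it so that we create several archives of, say, 10 files each. The tar call is wrapped in bash -c so that $RANDOM is evaluated on every run; written directly on the xargs command line, the shell would expand it only once and every archive would overwrite the previous one:
grep -l xargs *.md | xargs -n 10 bash -c 'tar czf "xargs-$RANDOM.tar.gz" "$@"' _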
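To see the batching itself, here is a minimal sketch that swaps tar for echo (my substitution, just to make each invocation visible) and uses five made-up arguments:

```shell
# Five arguments, at most two per invocation: echo runs three times,
# and each run prints its own batch on a separate line.
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
echo "$batches"
# 1 2
# 3 4
# 5
```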
By default, xargs adds the arguments at the end of the command, which usually makes sense, but we might want something different. The -I option allows us to set a placeholder which we can use to construct the command. For example, for moving or copying files²:
grep -l xargs *.md | xargs -I FILE mv FILE target/
Or to make HTTP requests!
cat ids.txt | xargs -I ID curl -X POST -d '{"id":ID}' http://localhost:4567/data
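A handy way to check what -I will build before running anything for real is to put echo in front of the command. This is only a dry-run sketch; the filenames and ids are made up for illustration:

```shell
# Dry run of the mv example: echo prints each command instead of running it.
moves=$(printf '%s\n' intro.md tricks.md | xargs -I FILE echo mv FILE target/)
echo "$moves"
# mv intro.md target/
# mv tricks.md target/

# Dry run of the curl example. The single quotes around the JSON are consumed
# by the shell, so they don't show up in the printed command.
requests=$(printf '%s\n' 1 2 | xargs -I ID echo curl -X POST -d '{"id":ID}' http://localhost:4567/data)
echo "$requests"
# curl -X POST -d {"id":1} http://localhost:4567/data
# curl -X POST -d {"id":2} http://localhost:4567/data
```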
You didn't expect #roflscale from a Tool From The Past™? You expected wrong! The curl example above could be slow if the list is long. -P to the rescue!
cat ids.txt | xargs -P 20 -I ID curl -X POST -d '{"id":ID}' http://localhost:4567/data
This option will make xargs process arguments in parallel, with a maximum of 20 curl processes running at the same time. We have basically implemented a worker pool (of processes rather than threads) with back pressure³ in a shell one-liner!
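If you don't have a server to hammer, the effect of -P can be observed with sleep. This is a rough sketch, assuming an xargs that supports -P (both GNU and BSD do); sleep stands in for curl, and the timing is only approximate:

```shell
start=$(date +%s)

# Four one-second sleeps, each its own process, up to four at a time.
printf '%s\n' 1 1 1 1 | xargs -P 4 -n 1 sleep

parallel_elapsed=$(( $(date +%s) - start ))
echo "4 sleeps with -P 4 took about ${parallel_elapsed}s"
```

Without -P the same pipeline runs the sleeps one after another and takes about four seconds.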
So some reminders:
- Pipes are great
- Pipes are parallel by design
- Pipes implement back pressure
- With xargs we can turn into pipes things that in principle aren't pipes
On that note, as a reading exercise, I'll leave you this article that covers xargs and some additional topics, explaining how to do “““big data””” with shell tools. Really entertaining and interesting!
For more, run man xargs and read this awesome manual. Thanks for reading!
My name is Sergio and you can find me on Twitter, GitHub or my website.
¹ A short investigation led me to find out that it first appeared in PWB/UNIX in 1977. It has been present in most, if not all, Unix systems ever since. Today there are two main versions, GNU xargs (present in Linux) and BSD xargs (present in *BSD and Mac). There are subtle differences that you might want to check out, and in any case you can install both versions on any system (e.g. brew install findutils installs GNU xargs on a Mac, under the name gxargs).
² -I implies -L 1 (one command per input line), so in this example files will be moved one by one (mostly irrelevant). BSD xargs has a similar option (-J) that doesn't do that.
³ Back pressure in this example means that the file will only be read from disk as fast as the 20 curl processes can consume its lines, keeping the memory usage low.
Always remember to use the -0 flag (together with NUL-separated input, such as the output of find -print0) if there are (gasp) spaces in the filenames.
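Here is a small sketch of what goes wrong and how -0 fixes it. The two filenames are hypothetical, and printf with \0 stands in for a producer such as find -print0:

```shell
# Newline-separated names: xargs also splits on the spaces,
# so two filenames turn into four arguments.
without_0=$(printf 'my notes.md\nold draft.md\n' | xargs -n 1 echo)
echo "$without_0"
# my
# notes.md
# old
# draft.md

# NUL-separated names read with -0: each name stays in one piece.
with_0=$(printf 'my notes.md\0old draft.md\0' | xargs -0 -n 1 echo)
echo "$with_0"
# my notes.md
# old draft.md
```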