@remexre
Last active June 18, 2018 21:38
HOWTO: GNU Parallel

Installation

On Ubuntu, it's the parallel package (sudo apt install parallel).

parallel prints a citation notice on every run until you run parallel --citation once to silence it. Yeah...

Simple Example

# Serial
for f in data/*; do ./foo "$f"; done

# Parallel
parallel ./foo {} ::: data/*
# `data/{}` instead of `{}` since `ls data` will give e.g. `1.txt 2.txt` instead of `data/1.txt data/2.txt`
ls data | parallel ./foo data/{}

# "Pretty" Parallel
parallel --bar ./foo {} ::: data/*
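Piping ls into parallel breaks on filenames that contain newlines and forces the data/{} workaround. A minimal sketch of an alternative, assuming GNU parallel's -0 (NUL-delimited input) and -k (keep output in input order) options; run_on_files is a made-up helper name, not something parallel ships:

```shell
# Hypothetical helper: run a command once per regular file under a directory.
# find emits full paths separated by NUL bytes, and parallel -0 reads them
# that way, so odd filenames survive and no data/ prefix is needed.
# Usage: run_on_files DIR COMMAND [ARGS...]
run_on_files() {
  dir=$1
  shift
  find "$dir" -type f -print0 | parallel -0 -k "$@" {}
}

# Example (same job as above): run_on_files data ./foo
```

Unlike data/*, find descends into subdirectories; add -maxdepth 1 to the find call if you only want the top level.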

More Complex Example

# Serial
for f in /bin/*; do strings "$f" > "$(basename "$f").txt"; done

# Parallel
parallel 'strings {} > {/}.txt' ::: /bin/*
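The {/} above is one of parallel's replacement strings: {} is the whole input, {/} is its basename, and {/.} is the basename with the extension removed. A quick way to check what a template expands to is --dry-run, which prints the commands instead of running them. A small sketch, where preview_strings_jobs is a made-up wrapper name:

```shell
# Hypothetical wrapper: print the commands parallel would run for each input
# path, without executing anything. -k keeps the output in input order.
preview_strings_jobs() {
  parallel --dry-run -k 'strings {} > {/.}.txt' ::: "$@"
}

# Example: preview_strings_jobs /bin/*
```

Since nothing is executed, this is safe to try on any list of paths before committing to the real run.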

TIL dockerd has over 950000 strings in it. RIP, should've stripped it.

Obligatory Rant Section

GNU parallel is basically everything I see wrong with GNU in particular and software in general. Here are some choice quotes from the manpage:

DESCRIPTION
      STOP!

      Read the Reader's guide below if you are new to GNU parallel.

Not a great start...

  Reader's guide
      If you prefer reading a book buy GNU Parallel 2018 at http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
      or download it at: https://doi.org/10.5281/zenodo.1146014

      Otherwise start by watching the intro videos for a quick introduction: http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

      Then look at the EXAMPLEs after the list of OPTIONS (Use LESS=+/EXAMPLE: man parallel). That will give you an idea of what GNU parallel is
      capable of.

      Then spend an hour walking through the tutorial (man parallel_tutorial). Your command line will love you for it.

      Finally you may want to look at the rest of this manual if you have special needs not already covered.

      If you want to know the design decisions behind GNU parallel, try: man parallel_design. This is also a good intro if you intend to change
      GNU parallel.

This is the section in its entirety...

{=perl expression=}

    Replace with calculated perl expression. $_ will contain the same as {}. After evaluating perl expression $_ will be used as the
    value. It is recommended to only change $_ but you have full access to all of GNU parallel's internal functions and data
    structures. A few convenience functions and data structures have been made:

Of course this is a Perl script...

If you use --will-cite in scripts to be run by others you are making it harder for others to see the citation notice. The development of GNU parallel is indirectly financed through citations, so if your users do not know they should cite then you are making it harder to finance development. However, if you pay 10000 EUR, you have done your part to finance future development and should feel free to use --will-cite in scripts.

Uhhhhhhhhhhh

--plus
    Activate additional replacement strings: {+/} {+.} {+..} {+...} {..} {...} {/..} {/...} {##}. The idea being that '{+foo}' matches
    the opposite of '{foo}' and {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} = {...}.{+...} =
    {+/}/{/...}.{+...}

wtf

--profile profilename
-J profilename
    Use profile profilename for options. This is useful if you want to have multiple profiles. You could have one profile for running
    jobs in parallel on the local computer and a different profile for running jobs on remote computers. See the section PROFILE FILES
    for examples.

When you have so many CLI options you need people to save their favorite options in different files

--plain
    Ignore any --profile, $PARALLEL, and ~/.parallel/config to get full control on the command line (used by GNU parallel internally
    when called with --sshlogin).

When you have so many CLI options you need people to save their favorite options in different files, but manage to screw that up too

--quote
-q  Quote command. The command must be a simple command (see man bash) without redirections and without variable assignments. This will
    quote the command line and arguments so special characters are not interpreted by the shell. See the section QUOTING. Most people
    will never need this.  Quoting is disabled by default.

WTF not your job

--regexp
    Use --regexp to interpret --recstart and --recend as regular expressions. This is slow, however.

not if you implement regexes correctly...

--sqlmaster DBURL
    Submit jobs via SQL server. DBURL must point to a table, which will contain the same information as --joblog, the values from the
    input sources (stored in columns V1 .. Vn), and the output (stored in columns Stdout and Stderr).

wtf

You need to install TOR and setup a hidden service. In torrc put:

  HiddenServiceDir /var/lib/tor/hidden_service/
  HiddenServicePort 22 127.0.0.1:22

Then start TOR: /etc/init.d/tor restart

wtf wtf wtf

You can use GNU parallel to start interactive programs like emacs or vi:

  cat filelist | parallel --tty -X emacs
  cat filelist | parallel --tty -X vi

why would anyone ever want this

BUGS
  [...]
  Database with MySQL fails randomly
      The --sql* options may fail randomly with MySQL. This problem does not exist with PostgreSQL.

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
