@NicholasBallard
Last active November 1, 2019 15:02

Crontab is Beautiful

Nicholas Ballard


AWS Lambda. Azure Functions. Firebase Functions. Cloud Run. Serverless. Docker. Kubernetes. Hakuna Matata.

We're swimming in a world of great hosted options for scheduling and automating our software programs.

But crontab.

Crontab is the answer.

Crontab is a program, on just about every Linux distribution, used to drive the program cron. There are about a thousand versions of the system scheduler out there, though the most common one is Vixie Cron. The history of crontab is vast and ancient, but we don't care about any of that. All that matters is crontab just works. And is beautiful.

Actual footage of a developer trying to get a Lambda handler working with dependencies.

Crontab dates back to before you were a twinkle in your father's eye. It's of an age with vi and sh. Like these *nix programs, crontab is still around and going strong, even though there are so many "better" options out there.

And really, a lot of options are better. Because crontab is raw, fast, and dirty. Out of the box it runs your commands with /bin/sh instead of the /bin/bash goodness you're used to, and it doesn't load your shell profile, so all the PATH definitions and aliases we take for granted suddenly don't matter. Plus it runs on a server, and who likes dealing with those?

The unbeatable sales pitch for using crontab, however, is:

Crontab schedules and defines what you want to run in one line.

The Part You Should Have Skipped Down to in the First Place

Log. In. Server.

Look at some basic usage for this heckin' thing. (Language, now!)

man crontab will give you the full documentation, but reading is boring.

Looks like crontab -e will open an editor where I can read the crontab file and modify it.

crontab -e
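Before you start editing, it can be worth saving a copy of what's already there: crontab -r, one key away from -e, removes your whole crontab, and crontab FILE installs a file as your crontab, replacing the old one entirely. Here's a minimal sketch; backup_crontab and the ~/crontab-backups directory are hypothetical names, not part of crontab itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save a timestamped copy of the current user's crontab.
# The default backup directory is an assumption; pass another one as $1.
backup_crontab() {
  local dir="${1:-$HOME/crontab-backups}"
  mkdir -p "$dir"
  local file="$dir/crontab.$(date +%Y%m%d-%H%M%S)"
  # crontab -l prints the installed table; print the backup path on success.
  crontab -l > "$file" && echo "$file"
}

# Usage:
#   backup_crontab              # prints the path of the saved copy
#   crontab -e                  # edit as usual
#   crontab /path/to/that/copy  # restore: replaces the whole crontab with the file
```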

This is what I see:

The default editor in this case is Vim. If you do not know how to use Vim, then you have lived a sheltered existence up to this point. Haha! 😒 Seriously though, Vim is for sadists but is a very useful tool to know. Press i to enter insert mode so you can edit the file, and use the arrow keys to move the cursor. When you're done, press Esc and type :wq to "write and quit" Vim and get back to the Bash shell you are used to.

The first thing to do when making a crontab file is to set where cron will email the output of your jobs. Since we don't want to get emailed (and don't want to bother with an email server) and want to keep local logs instead, we are going to set that variable, MAILTO, to an empty string.

Next, explicitly state which shell to use. We are going to use the "normal" one: /bin/bash (Bourne Again SHell). If you do not do this, then you will likely need anger management soon: by default, cron runs your commands with the ancient shell, sh. Left at that default, not all of your programs will run the way they do from the command line.

Ensure this is defined before the cron commands:

MAILTO=""
SHELL=/bin/bash
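cron also skips your ~/.bashrc and ~/.profile, so PATH inside a job is minimal (on Debian and Ubuntu it defaults to /usr/bin:/bin). One option is to hard-code a PATH line in the preamble as well. The catch is that cron does not expand variables in these NAME=value lines, so PATH=$HOME/bin:$PATH would be taken literally. The sketch below uses a hypothetical helper, crontab_preamble, to print your current shell's PATH as a literal value you can paste in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a crontab preamble with PATH written out literally.
# cron does not expand $VARS in NAME=value lines, so PATH is expanded here,
# in your interactive shell, and the result gets pasted into `crontab -e`.
crontab_preamble() {
  printf 'MAILTO=""\n'
  printf 'SHELL=/bin/bash\n'
  printf 'PATH=%s\n' "$PATH"
}

crontab_preamble
```

Paste the three printed lines at the top of your crontab. If you later install tools somewhere new, regenerate the PATH line rather than editing in a $VARIABLE.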

Setting a Cron Job

*/1 * * * * echo "Hi, look at the date: $(date)" >> readthis.txt

Put that in your crontab, save, and let it run. Every minute, this will append a line to the file readthis.txt in the HOME directory, because cron starts each job there.
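To catch environment surprises without waiting a minute for cron, you can run a command under a similar stripped-down environment first. The run_like_cron function below is a hypothetical helper and only an approximation: it assumes cron's defaults are /bin/sh, a PATH of /usr/bin:/bin, and HOME as the working directory, which matches Vixie cron's usual behavior but may differ on your system and ignores anything set in your crontab's preamble.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command roughly the way cron would.
# Assumptions: empty environment except HOME/LOGNAME/SHELL/PATH,
# PATH=/usr/bin:/bin, and the job starting in $HOME. Not an exact replica of cron.
run_like_cron() {
  local cmd="$1"
  local shell="${2:-/bin/sh}"   # pass /bin/bash to mimic SHELL=/bin/bash in the crontab
  env -i \
    HOME="$HOME" \
    LOGNAME="${LOGNAME:-$(id -un)}" \
    SHELL="$shell" \
    PATH=/usr/bin:/bin \
    "$shell" -c "cd \"\$HOME\" && $cmd"
}

# Usage (single quotes so the inner shell, not yours, expands $(date) and $PATH):
#   run_like_cron 'echo "Hi, look at the date: $(date)" >> readthis.txt'
#   run_like_cron 'echo $PATH'                # prints /usr/bin:/bin
#   run_like_cron 'python3 --version' /bin/bash
```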

Here is a great reference for translating when and how often you want your program to run to cron-speak: ⏩ crontab.guru
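If you'd rather sanity-check a line locally, remember the five schedule fields always come in the same order: minute, hour, day of month, month, day of week, then the command. The explain_schedule function below is a hypothetical helper that just labels each field; it does not validate values the way crontab.guru does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: label the fields of a crontab job line.
# Field order: minute  hour  day-of-month  month  day-of-week  command
# Ranges:      0-59    0-23  1-31          1-12   0-7 (Vixie cron: 0 and 7 are both Sunday)
explain_schedule() {
  local minute hour dom month dow cmd
  # `read` splits on whitespace without glob-expanding the asterisks;
  # whatever follows the fifth field (the command) lands in $cmd.
  read -r minute hour dom month dow cmd <<< "$1"
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
  if [[ -n "$cmd" ]]; then
    printf 'command=%s\n' "$cmd"
  fi
}

explain_schedule '0 5 * * 6-7'
```

That call prints minute=0 hour=5 day-of-month=* month=* day-of-week=6-7. It doesn't expand steps like */5, so crontab.guru is still the place to check what a schedule really means. For the record, */1 in the minute field means the same thing as a plain *: every minute.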

Now, what about running real programs? The ones we're doing this all for so we can pay the bills?

The trick is the library dependencies all modern programs have. Run pip list on your laptop and see the libraries your Python program relies on. Or ls node_modules in your JavaScript program's root, and you'll know what I'm talking about.

In this server example, we run our code inside a virtual environment defined by the venv directory in our project root. The problems are:

  1. cron runs the line with sh and a minimal PATH, not the interactive /bin/bash session python3 is normally launched from
  2. without the virtual environment activated, Python won't find the installed dependencies on its module search path
  3. cron starts the job in the HOME directory, which is not the directory where the Python program's entry points are
  4. the too-familiar "we" pronoun I keep using to refer to the code

It makes for a much more complicated line than the one "we" wrote to append the date to a file every minute... but by using explicit, absolute file paths, crontab will follow our commands and execute the right files.

Check this out:

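# run this program every Saturday and Sunday at 5am in the server's time zone
0 5 * * 6-7 . $HOME/rug/venv/bin/activate && cd /home/ubuntu/rug/criteo && python3 client.py >> $HOME/logs/criteo.log 2>&1

Let's break it down to its different parts.

0 5 * * 6-7

The first part of the line of the cron command. It schedules when to run everything to the right of it on the same line. The five fields are minute, hour, day of month, month, and day of week, so this says "minute 0 of hour 5, any day of the month, any month, on days 6 and 7 of the week" (Saturday and Sunday; cron accepts both 0 and 7 for Sunday). In other words, "run at 5am every Saturday and Sunday". I know, duh! Obviously! Watch out for */5 in the hour field, though: that means "every 5 hours" (midnight, 5am, 10am, 3pm, and 8pm), not "at 5am".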

. $HOME/rug/venv/bin/activate

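This command sources the activate script located at $HOME/rug/venv/bin/, which means it runs the script inside the current shell rather than in a child process, so its changes stick around for the rest of the line. This is Python-specific: the script comes with the virtual environment and activates it, putting the environment's bin directory first on PATH so our code gets access to all the dependencies it needs. Note that . is the portable (POSIX sh) spelling of bash's source, and cron sets HOME itself, so this piece would work even under plain sh. Setting SHELL to /bin/bash still matters: the activate script is meant to be sourced from bash, and any bash-only syntax in the line would fail under sh.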

cd /home/ubuntu/rug/criteo && python3 client.py

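Navigate to the folder where the program I want to run is, and run it. Because the activate script put the virtual environment's bin directory first on PATH, python3 here is the environment's own interpreter, not the system one. You could also skip activation and call that interpreter by its absolute path, /home/ubuntu/rug/venv/bin/python3 client.py; a venv's interpreter finds the environment's installed packages either way, no PYTHONPATH needed.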

>> $HOME/logs/criteo.log 2>&1

This uses something called redirection (twice). >> takes the program's standard output and appends it to (rather than overwriting) a file called criteo.log in a folder called logs in the HOME directory, and 2>&1 sends standard error to that same place. $HOME = /home/ubuntu in this case. The logs folder has to exist already, since the shell won't create it. This is totally optional but useful for debugging.
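Order matters in that redirection: >> file points standard output at the log first, and 2>&1 then gives standard error the same destination. Swap them and standard error gets copied from wherever standard output pointed before, so errors miss the log. In this small demo, demo_job is a hypothetical stand-in for client.py and the temporary files are just for illustration.

```bash
#!/usr/bin/env bash
# Stand-in for a real job: writes one line to each output stream.
demo_job() {
  echo "to stdout"
  echo "to stderr" >&2
}

log=$(mktemp)
demo_job >> "$log" 2>&1    # both lines are appended to $log
demo_job >> "$log" 2>&1    # a second run appends again instead of overwriting

swapped=$(mktemp)
demo_job 2>&1 >> "$swapped"  # wrong order: only "to stdout" reaches the file;
                             # stderr goes where stdout pointed before (the terminal)

cat "$log"
```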


Save the crontab file and exit the editor, and crontab installs the new table. cron notices the change on its own, so there's nothing to restart on Ubuntu (our Linux distribution on this server), and it should work!

Now the trusty server running 24/7 in the datacenter will be running your programs on your schedule. Much more reliable than running programs off your own machine. And for production in business use cases, crontab is really minimum table stakes.

P.S. What's nice about a crontab file is that you can run multiple scheduled jobs from the same file. Remember, just one line per job! Much faster to set up than the "right" way of doing things, with a Dockerfile, or Lambda and Lambda Layers with CloudWatch. Not that those aren't something to aspire to.
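With several jobs in one file, a quick way to see just the job lines is to filter out blank lines, comments, and NAME=value settings. The list_jobs function below is a hypothetical helper that reads a crontab on standard input; it assumes environment settings start at the beginning of a line, which is how crontab files lay them out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the job lines of a crontab read on stdin,
# skipping blank lines, comments, and NAME=value environment settings.
list_jobs() {
  grep -Ev '^[[:space:]]*$|^[[:space:]]*#|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*='
}

# Usage on a server:
#   crontab -l | list_jobs           # show each scheduled job
#   crontab -l | list_jobs | wc -l   # count them
```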

Just ... I find most of my programs still run with crontab.

🔚
