AWS Lambda. Azure Functions. Firebase Functions. Cloud Run. Serverless. Docker. Kubernetes. Hakuna Matata.
We're swimming in a world of great hosted options for scheduling and automating our software programs.
But crontab.
Crontab is the answer.
Crontab is a program on just about every Linux distribution used to drive the program cron. There are about a thousand versions of the system scheduler out there, though the most common one is Vixie cron. The history of crontab is vast and ancient, but we don't care about any of that. All that matters is that crontab just works. And is beautiful.
Actual footage of a developer trying to get a Lambda handler working with dependencies.
Crontab dates back to before you were a sparkle in your father's eye. It's of an age with vi and sh. Like those *nix programs, crontab is still around and going strong, even though there are so many "better" options out there.
And really, a lot of those options are better. Because crontab is raw, fast, and dirty. Out of the box it uses a shell different from the /bin/bash goodness you're used to, and because of that, all the PATH definitions and aliases we take for granted suddenly don't matter. Plus it runs on a server, and who likes dealing with those?
The unbeatable sales pitch for using crontab, however, is:
Crontab schedules and defines what you want to run in one line.
Log. In. Server.
Look at some basic usage for this heckin' thing. (Language, now!)
man crontab will give you the full documentation, but reading is boring.
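The short version of the interface is just a few flags:

crontab -e    # edit your crontab (creates one if you don't have it yet)
crontab -l    # list what's currently scheduled
crontab -r    # remove your crontab entirely (careful with this one)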
Looks like crontab -e will open an editor where I can read the crontab file and modify it.
crontab -e
This is what I see:
The default editor in this case is vim. If you do not know how to use Vim, then you have lived a sheltered existence up to this point. Haha! 😒 Seriously though, Vim is for sadists but is a very useful tool to know. The i command will let you edit the file, and the arrow keys will move the cursor around. :wq will "write and quit" vim and get you back to the Bash shell you are used to.
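For the record, the bare minimum Vim needed to survive a crontab edit:

i      # enter insert mode so you can type
Esc    # leave insert mode
:wq    # write the file and quit
:q!    # bail out without saving, if things go sideways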
The first thing to do when making a crontab file is to set where the program will email its output. Since we don't want to get emailed (and don't want to bother with an email server), and we do want to keep local logs, we are going to set that variable to an empty ''.
Next is to explicitly state which shell to use. We are going to use the "normal" one: /bin/bash (Bourne Again SHell). If you do not do this, then you will likely need anger management soon. crontab defaults to the ancient shell sh, and your programs will not run the way you expect them to from the command line if you leave it that way.
Ensure these are defined before the cron commands:
MAILTO=""
SHELL=/bin/bash
*/1 * * * * echo "Hi, look at the date: $(date)" >> readthis.txt
Put that in your cron file, save, and let it run. Every minute this will append to the file readthis.txt in the HOME directory.
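To sanity-check that cron picked it up, list the installed crontab and watch the file grow (readthis.txt is the file from the example above):

crontab -l                 # print the crontab that's currently installed
tail -f ~/readthis.txt     # watch a new line appear every minute (Ctrl-C to stop)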
Here is a great reference for translating when and how often you want your program to run to cron-speak: ⏩ crontab.guru
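To give a flavor of cron-speak, the five fields are minute, hour, day-of-month, month, and day-of-week. A few examples:

*/5 * * * *     # every 5 minutes
0 9 * * 1-5     # at 9:00am, Monday through Friday
30 2 1 * *      # at 2:30am on the first day of every month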
Now, what about running real programs? The ones we're doing all this for so we can pay the bills?
The trick is the library dependencies all modern programs have. Run pip list on your laptop and see the libraries your Python program relies on. Or ls node_modules in your JavaScript program's root, and you'll know what I'm talking about.
In this server example, we run our code inside a virtual environment defined by the venv directory in our project root. The problems are:
- python3 assumes it's being run in /bin/bash
- without access to the virtual environment, Python won't look in the right PYTHONPATH for the dependencies
- cron starts the job in the HOME directory, which is not the directory where the Python program's entry point lives
- the too-familiar "we" pronoun I keep using to refer to the code
It makes for a much more complicated line than the one "we" wrote to append the date to a file every minute... but by using explicit, absolute file paths, crontab will follow our commands and execute the right files.
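For reference, here is the directory layout the example assumes, reconstructed from the paths in the cron line below (you'll want to create the logs folder yourself; redirection creates files but not directories):

/home/ubuntu/
├── rug/
│   ├── venv/                # the virtual environment ($HOME/rug/venv/bin/activate)
│   └── criteo/
│       └── client.py        # the program we want cron to run
└── logs/
    └── criteo.log           # where the output gets appended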
Check this out:
# run this program every Saturday and Sunday at 5am, server time
0 5 * * 6-7 . $HOME/rug/venv/bin/activate && cd /home/ubuntu/rug/criteo && python3 client.py >> $HOME/logs/criteo.log 2>&1
Let's break it down to its different parts.
0 5 * * 6-7
This is the first part of the cron line, and it schedules when to run everything to the right of it on the same line. In this case: "run at 5am every Saturday and Sunday". I know, duh! Obviously!
. $HOME/rug/venv/bin/activate
This sources (runs) the activate script located at $HOME/rug/venv/bin/. It's Python-specific: the script generated by the venv module activates a virtual environment that gives us access to all the dependencies our code needs. Note that this behaves the way it does in our everyday shell because we set the SHELL variable to /bin/bash at the top of the file; under cron's default sh, . still works but source does not, and anything Bash-specific later in the line would break.
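If you're curious what activation actually buys you: it puts the venv's bin/ directory at the front of PATH, so a bare python3 resolves to the virtual environment's interpreter. You can see it for yourself in a Bash shell (paths from this example):

. $HOME/rug/venv/bin/activate
which python3     # -> /home/ubuntu/rug/venv/bin/python3
echo $PATH        # the venv's bin/ directory now comes first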
cd /home/ubuntu/rug/criteo && python3 client.py
Navigate to the folder where the program I want to run lives, and run it. Again, a bare python3 resolves to the virtual environment's interpreter only because of the activation step under our Bash SHELL setting. Otherwise we would have had to run the program with /home/ubuntu/rug/venv/bin/python3 client.py, which amounts to the same thing without activating the environment first.
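In fact, if you would rather skip activation entirely, a leaner variant of the same job could look like this (same paths as above):

# alternative: call the venv's python3 directly instead of activating first
0 5 * * 6-7 cd /home/ubuntu/rug/criteo && $HOME/rug/venv/bin/python3 client.py >> $HOME/logs/criteo.log 2>&1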
>> $HOME/logs/criteo.log 2>&1
This uses something called redirection (twice). It takes all the output from the program, both standard output and standard error, and appends it (not overwriting) to a file called criteo.log located in a folder called logs in the HOME directory. $HOME = /home/ubuntu in this case. This is totally optional but useful for debugging.
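If the redirection operators are new, a quick cheat sheet (program and out.log are placeholders):

program > out.log          # overwrite out.log with standard output
program >> out.log         # append standard output to out.log
program >> out.log 2>&1    # append standard output and standard error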
Save the crontab file and exit the editor, and Ubuntu (our Linux distribution on this server) will install the crontab, and it should just work!
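If you ever doubt cron is actually firing, two commands worth knowing on an Ubuntu box like this one (assuming syslog is in its usual place):

crontab -l                    # confirm the jobs cron thinks it has
grep CRON /var/log/syslog     # see when cron actually ran them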
Now the trusty server running 24/7 in the datacenter will be running your programs on your schedule. Much more reliable than running programs off your own machine. And for production in business use cases, crontab is really minimum table stakes.
P.S. What is nice about a crontab file is that you can run multiple scheduled jobs from the same file. Remember, just one line per job! Much faster to set up than the "right" way of doing things, with a Dockerfile or Lambda and Lambda Layers with CloudWatch. Not that those aren't something to aspire to.
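Putting it all together, a full crontab with a couple of jobs is still just a handful of lines, using the examples from this post:

MAILTO=""
SHELL=/bin/bash

# job 1: append the date every minute
*/1 * * * * echo "Hi, look at the date: $(date)" >> readthis.txt

# job 2: the weekend Python job from above
0 5 * * 6-7 . $HOME/rug/venv/bin/activate && cd /home/ubuntu/rug/criteo && python3 client.py >> $HOME/logs/criteo.log 2>&1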
Just ... I find most of my programs still run with crontab.
🔚