@jennifersmith
Last active December 22, 2015 10:58
Bootstrapping scripts

I am writing some bootstrapping scripts to manage downloading a bunch of data from s3, creating a postgres db, and dumping the data into it. Dependencies are on s3cmd and postgres being available from your local package manager.

The idea behind these scripts is largely to guide someone new to the project in setting up their machine with the right data in the right location. Where they need to get access keys and other credentials (from me by magic secure pixie delivery, not in source control), I want them to know what they need and where to plug these things into the script where possible. It's about getting a smart person up and running quickly, avoiding many manual steps etc. I don't need to go full, all-out puppet - happy to make the script stop with a message "Now go and install foreman".

The fact I am concerned with databases etc. is more to do with the nature of our current project. It's a technique I have used/seen used in the past to set up AWS permissions, local dev servers and the like. I originally saw the idea in a presentation by GitHub about some of their dev practices, which I cannot find now. An example here: https://github.com/github/developer.github.com/blob/master/script/bootstrap

The pain I am currently experiencing is with scripting. These scripts normally start out as shell scripts, but become progressively more complicated as you start to do stuff like check whether files exist, programs are installed, services are running etc. I have lost count of the number of times that I have googled "bash test whether a file exists". This is only the pain spent writing the scripts; maintaining them is another problem. As application developers, we don't spend a whole lot of time writing bash scripts, and as a language it's not the easiest to get your head around when it comes to error handling/flow control/environment vars etc. I have lost count of the number of scripts that badly report errors, fail to exit with the right codes or similarly irritate me.

Has anyone solved this problem in a way other than "learn the language"? I think maybe I am looking for a set of primitive commands I could use for bootstrapping script purposes that let me use the only flow control that makes sense to me in bash land:

package-available --name="postgres,s3cmd"
service-running --name="postgres"  --fail-message="Postgres is not running!"
file-exists /tmp/something-important  --fail-message="You need to have the something-important file - contact your local friendly AWS person"

What would separate these from just a straight up "ps aux | grep bar" or "which foo" is that they would a) fail with nice messages that tell the expert user what has happened, b) be a bit more explanatory in the script and c) work across different environments.
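
A minimal sketch of what such primitives might look like as plain shell functions, assuming nothing beyond a POSIX shell. The names (fail, command_available, file_exists, process_running) and the default messages are hypothetical, not an existing tool. command_available checks for a command on PATH rather than querying a package manager, and process_running assumes pgrep is installed:

```shell
#!/bin/sh
# Hypothetical bootstrap helpers; names and messages are illustrative only.

fail() {
  # Print a message to stderr and return non-zero so the caller can stop.
  printf 'ERROR: %s\n' "$*" >&2
  return 1
}

command_available() {
  # Usage: command_available NAME [MESSAGE]
  # Checks PATH only; it does not ask the package manager.
  ca_msg=${2:-"'$1' was not found on PATH; install it with your package manager"}
  command -v "$1" >/dev/null 2>&1 || fail "$ca_msg"
}

file_exists() {
  # Usage: file_exists PATH [MESSAGE]
  fe_msg=${2:-"Expected file '$1' does not exist"}
  [ -e "$1" ] || fail "$fe_msg"
}

process_running() {
  # Usage: process_running NAME [MESSAGE]
  # Assumption: pgrep is available; it matches the exact process name.
  pr_msg=${2:-"No process named '$1' is running"}
  pgrep -x "$1" >/dev/null 2>&1 || fail "$pr_msg"
}
```

A bootstrap script could then source these and write lines such as `file_exists /tmp/something-important "Contact your local friendly AWS person" || exit 1`, which keeps the only flow control in the script to "check, or stop with a message".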

Of course I am basing this on staying with shell scripting, rather than moving up to ruby or Rake or something. Would prefer to keep to shell scripts which tend to stay pretty neutral - both for humans and for operating systems.

Is this a crazy idea? Does anything exist that solves the problem of writing better bootstrap scripts?

@aterreno

aterreno commented Sep 6, 2013

If I understand correctly, you want something like capistrano or, even better, mina http://nadarei.co/mina/ to set up your box? Or puppet/chef?

@sw1nn

sw1nn commented Sep 6, 2013

My advice is to use something like Vagrant to manage VM instances and provision those instances using Chef or Puppet (Vagrant supports this). I know you say you don't need the full puppet, but you do :-). Telling the user to 'now go and install foreman' isn't enough.

e.g.

  • Should I put it in the default install location?
  • Which Options should I choose?
  • But that conflicts (in a non-obvious way) with tool X that I also need installed for project Y
  • That doesn't start on my machine.

It's a real pain to maintain these sorts of scripts, and if people aren't using them every day they bit rot and become useless.

The only realistic way to get something that works every time is to actively use it. There is definitely some pain in moving to this, but it's worth it in the end. The Continuous Delivery people say something along the lines of 'If something is hard/painful, change your workflow to make it happen more often, not less, then you'll learn about it better'.

@jbrechtel

Either Vagrant+Chef/Puppet or Ruby.

Skip bash....really. Knowing the language won't make the language any better for you or anyone else.

Ruby is everywhere.

Watch as I transform anecdote to data:....since you know Ruby better than Bash the other devs on your project are more likely to know Ruby than Bash. The reasoning might not be sound but the conclusion is.... :)

Also, in the context of your dev environment Ruby is just as portable as bash.

I don't think the reason to stick with shell scripting over Ruby holds water. If you accept this then the answer to your question is obvious. Hack together some Rake tasks and let your instincts tell you when your project outgrows your hacked together Rake tasks and needs Vagrant+Chef/Puppet.

@martinfowler

Many years (> 20) ago I did sysadmin work and got very familiar with shell scripts (Bourne and csh, that was pre bash/zsh). I concluded that writing complex shell scripts was a bad idea and moved to scripting languages (first Perl, then Python, then Ruby). Now I use Ruby for anything more than half-a-dozen lines of a shell script. I think Python is just as good as Ruby, but I'm more familiar with Ruby.

So my suggestion would be don't try to learn shell scripting (unless you have to maintain something done by someone else). Instead pick a good scripting language (ruby or python) and use that.

@jennifersmith
Author

Thanks all!

I would say that 99% of the scripts that I see used for getting things started are shell scripts - I guess that I perpetuate that myself :)

Vagrant and fobbing stuff off onto VMs is a great approach but I still have the gap between "precise64" and my working dev environment - the vagrant config gets me so far but I still have to do provisioning and setup tasks. Actually, as I do more work with AWS and maintaining various keys in various env vars, having an isolated VM is probably the only thing stopping me from uploading my personal website onto a client's S3 bucket :).

Re puppet: I appreciate the manifest-style of puppet (and chef I guess) but I guess I feel there is a lot of complexity around the fact it supports incremental changes etc. that I don't need. Also, my current setup task involves: configuring s3cmd, downloading a large gzipped file, creating a postgres database + table, dumping in said data. I think puppet/chef will get me half the way there, but I am back to writing my own scripts for the rest of it. I think if you are blowing away your VM with regularity, you don't really need the incremental approach. Obv depends how long setup takes.

Re ruby: Yeah I guess ruby is everywhere these days and scripting is one thing it does well. The problem is that I know there are basic file manipulation tasks in core ruby, but if I am not mistaken calling out to other processes is still a PITA. There are some gems out there that do make things easier but then you just have one more thing to install.

One thing that it is not clear to me how to do in ruby is the whole pipes/lazy io bit - I dig how commands like head/grep/wc etc. work by allowing input to flow between them. For example, I had to split three gz-ed files into 6 parts, which I can express fairly nicely as:

gzcat data/data-*.txt.gz | split -l $chunksize "data/parts/part"

My only way to achieve that in the same way in ruby is to write lots of io handling stuff, or to wrap it in backticks, substituting file paths where necessary. It feels just as icky and unnecessary as creating SQL statements in code.
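
Staying in shell, the split pipeline above could be wrapped in a small function so that the output directory is created first and the paths are quoted. This is only a sketch: split_gzipped and its argument order are hypothetical, and `gzip -dc` is used in place of gzcat because gzcat is not present under that name on every system:

```shell
#!/bin/sh
# Hypothetical wrapper around the decompress-and-split pipeline.

split_gzipped() {
  # Usage: split_gzipped LINES_PER_PART OUTPUT_PREFIX FILE.gz [FILE.gz ...]
  sg_lines=$1
  sg_prefix=$2
  shift 2

  # Make sure the directory that will hold the parts exists.
  mkdir -p "$(dirname "$sg_prefix")" || return 1

  # Decompress all inputs to stdout and split the stream every sg_lines lines.
  # Note: in plain sh the pipeline's exit status is split's; a gzip failure
  # is only caught if the calling script enables `set -o pipefail` (bash).
  gzip -dc "$@" | split -l "$sg_lines" - "$sg_prefix"
}
```

For the example above, the call would be `split_gzipped "$chunksize" data/parts/part data/data-*.txt.gz`, producing data/parts/partaa, data/parts/partab and so on.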

I guess now I have had to think about it more, I want a better way to get from wet-to-set for a developer work station (for now), to have a better way to automate the things we need to do (like downloading data dumps, setting up keys etc.), that is:

  • Fast to get going with
  • Minimal dependencies
  • Easy to understand, extend and change - avoiding being an alien artefact script
  • Provides primitives for dealing with file-based tasks (like the example above)
  • Clear error system allowing the smart user to figure out what line went wrong, what it was trying to do and what the error was
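
For the last point, one possible shape is a wrapper that names each step before running it and, on failure, reports the step, the command and the exit code. The sketch below is an assumption about how that could look, not an existing tool; run_step is a hypothetical name. It reports the step rather than a line number; in bash, a `trap ... ERR` handler that prints $LINENO is another option.

```shell
#!/bin/sh
# Hypothetical step runner: says what it is about to do, then reports
# exactly which step and command failed, and with what exit code.

run_step() {
  # Usage: run_step "description" command [args...]
  rs_description=$1
  shift

  printf '==> %s\n' "$rs_description"

  # Capture the exit code without tripping `set -e` in the calling script.
  rs_status=0
  "$@" || rs_status=$?

  if [ "$rs_status" -ne 0 ]; then
    printf 'FAILED: %s\n  command: %s\n  exit code: %s\n' \
      "$rs_description" "$*" "$rs_status" >&2
    return "$rs_status"
  fi
}
```

A script would then read as a list of described steps, for example `run_step "Create the parts directory" mkdir -p data/parts || exit 1`.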

@sragu

sragu commented Sep 7, 2013

Heroku uses a similar idea to the one you mentioned (GitHub). They have a structure for scripts, and the content of each script varies based on application type. https://devcenter.heroku.com/articles/buildpacks

Vagrant with lxc should let you destroy and create environments easily, even if you are going with bootstrap-style scripts.

With FPM it's easy to build packages even for datasets and have a post-install script within the package which does the splitting operation for you. Having the post-install script inside the package also lets you break the bootstrap script into manageable chunks (each holding only the script related to that package). You could choose to use zip instead of native packages if that simplifies things.

Packages with scripts for post-install configuration would be the simplest approach. And upstart to keep things running.

Salt Stack is an alternative to chef/puppet. Again, it's not as simple as bash functions.

@scottmuc

scottmuc commented Sep 7, 2013

Not sure if this helps, but this is what I'm working on right now to configure my laptop: https://bitbucket.org/scottmuc/workstation/src

@martinfowler

When something can be done well in a pipeline, I leave it in a pipeline. I have no problem with using plenty of system or sh commands to use the shell for things it's good for (pipelines) while using the scripting language to handle logic and coordination.

For Ruby, the file handling stuff is in FileUtils. I also find rake's pathmap awesome for file path manipulations.
