On 17 Jul 2014, at 11:41, Matthew Westcott wrote:
So, as a compromise, how about we keep the packages in there, but use a '>=' rule so that they pick up the latest version? The downside is that new versions can be released at any time, and we can no longer guarantee that local and live installations of a site are running the same version - which is bad news if the packages introduce breaking changes (hellooo elasticsearch!). But then again, that already happens to some extent (e.g. packages that define their own dependencies using >=) and it doesn't seem to have been a problem up to now.
Having pondered the '>=' versus '==' conundrum some more, I'm starting to think that the "best practice" for requirements files as codified by the Two Scoops book has some room for improvement. The root of the problem is that the requirements file has two conflicting purposes:
1. a description of the range of package versions that our app is able to work with, presented as a list of constraints which pip has to solve: e.g. Django >=1.6 but less than 1.7; psycopg2 2.5.2 or later; elasticsearch 0.4 exactly
2. a list of specific package versions that satisfies our constraints, and has been tested and verified to work, and can be confidently deployed to a production server.
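In pip requirement syntax, those purpose-1 constraints would be spelled something like:

    Django>=1.6,<1.7      # 1.6.x only
    psycopg2>=2.5.2       # 2.5.2 or later
    elasticsearch==0.4    # exactly 0.4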
Purpose 2 is exactly what 'pip freeze' is designed to do. But the output of 'pip freeze' is useless for purpose 1, because it gives us no way to distinguish between packages that we actually directly care about, and ones which are a dependency of a dependency of a dependency of a library that we ditched three releases ago. (What the hell is kombu, and why do we need version 3.0.13 of it? What do we do if some other library wants to use kombu version 4.0? Can we really ditch lxml now, or is some other library depending on it for something obscure?)
It seems to me that we really ought to have two files, both committed to git, named something like:
requirements.txt
: A human-edited list of the packages our app directly depends on, giving as broad version ranges as possible (i.e. mostly using '>=' rules, or possibly not specifying a version at all so that it picks up the latest one)

installed_packages.txt
: A known good configuration, as produced by 'pip freeze' (which will specify exact versions using '==')
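So requirements.txt holds the kind of broad constraint lines sketched earlier, while installed_packages.txt is a full 'pip freeze' dump - something like this (package names and versions purely illustrative):

    Django==1.6.5
    elasticsearch==0.4
    kombu==3.0.13
    lxml==3.3.5
    psycopg2==2.5.3
    six==1.7.3

...with an exact '==' pin for every indirect dependency that happened to get pulled in along the way.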
Live deployments and Vagrant box provisioning would run 'pip install -r installed_packages.txt'. Whenever you add, change or remove requirements - or if you just want to give your packages a version bump because they've not been updated for a while - the procedure would be:
- Edit requirements.txt with your new requirements
- pip install --upgrade -r requirements.txt
- run unit tests and check that nothing has broken. If it has - i.e. we have an incompatible version of some package - edit requirements.txt to exclude that version and repeat
- pip freeze > installed_packages.txt
- commit both files to git
- tell your fellow developers that they'll need to run 'pip install -r installed_packages.txt' when they next pull
(NB This process doesn't allow for any "garbage collection" - removing packages, either directly or due to changes in dependencies, will not remove them from installed_packages.txt. To do that, you'd probably have to delete and recreate your virtualenv before running these steps.)
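In shell terms, the whole cycle might look roughly like this - just a sketch, and the virtualenv path and test command will obviously vary from project to project:

    # recreate the virtualenv so removed packages don't linger in the next freeze
    deactivate
    rm -rf ~/.virtualenvs/mysite
    virtualenv ~/.virtualenvs/mysite
    . ~/.virtualenvs/mysite/bin/activate

    # after editing requirements.txt:
    pip install --upgrade -r requirements.txt
    python manage.py test        # or however the project runs its unit tests

    # capture the known good configuration and commit both files
    pip freeze > installed_packages.txt
    git add requirements.txt installed_packages.txt
    git commit -m "Bump package versions"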
So what about dev.txt and production.txt? Well, following this logic, those are evil and wrong and should be avoided. Every difference between local and live environments is a chance for unforeseen version mismatches to creep in - who knows if django-debug-toolbar or gunicorn has a really weird dependency that brings in a different version of a library to the one you're expecting? I can't think of a legitimate reason to keep dev-only and production-only packages separate - there's no harm in just sticking them in the master list to be installed everywhere, even if they don't get used.
Does this sound like a plan? Should we adopt this practice for Wagtail? Should I write it up as a blog post and see how much it gets shot down by the Django community?
Matthew