brian wickman - @wickman
Pants makes the manipulation and distribution of hermetically sealed Python environments painless.
But why another system?
There are several solutions for package management in Python. Almost
everyone is familiar with running sudo easy_install PackageXYZ. This
leaves a lot to be desired. Over time, your Python installation will
collect dozens of packages, become annoyingly slow or even broken, and
reinstalling it will invariably break a number of the applications
that you were using.
A marked improvement over the sudo easy_install model is
virtualenv, which isolates Python environments on a
project-by-project basis. This is useful for development but does not
directly solve any problems related to deployment, whether it be to a production
environment or to your peers. It is also challenging to explain to a
Python non-expert.
A different solution altogether, zc.buildout, attempts to provide a framework and recipes for many common development environments. It has arguably gone the farthest in automating environment reproducibility among the popular tools, but it shares the complexity problems of the solutions mentioned above.
Most solutions leave deployment as an afterthought. Why not make the development and deployment environments the same by taking the environment along with you?
The lingua franca of Pants is the PEX file. (PEX itself does not stand for anything in particular, though in spirit you can think of it as a "Python EXecutable".)
PEX files are single-file lightweight virtual Python environments.
The only difference from a virtualenv is that there are no setup instructions and no
pip install foo bar baz. PEX files are self-bootstrapping Python
environments with no strings attached and no side-effects: just a simple
mechanism that unifies both your development and your deployment.
First it is necessary to install Pants. See installation instructions.
$ git clone git://github.com/twitter/commons
$ cd commons
$ mkdir -p src/python/twitter/my_project
$ vi src/python/twitter/my_project/BUILD
src/python/twitter/my_project/BUILD:
python_binary(
  name = 'hello_world',
  source = 'hello_world.py'
)
$ vi src/python/twitter/my_project/hello_world.py
src/python/twitter/my_project/hello_world.py:
print('Hello world!')
To run directly:
$ ./pants py src/python/twitter/my_project:hello_world
Build operating on target: PythonBinary(src/python/twitter/my_project/BUILD:hello_world)
Hello world!
To build:
$ ./pants src/python/twitter/my_project:hello_world
Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/my_project/BUILD:hello_world)])
Building PythonBinary PythonBinary(src/python/twitter/my_project/BUILD:hello_world):
Wrote /Users/wickman/clients/science-py-csl/dist/hello_world.pex
and run separately:
$ dist/hello_world.pex
Hello world!
NOTE: The first time you run ./pants, it will likely take a ridiculous amount
of time as Pants bootstraps itself inside your directory. Note that it never
installs anything in a global site-packages.
Build dependencies in Pants are managed with BUILD files that are
co-located with your source. These files are used to describe the following:
- libraries: bundles of sources and resources, that may or may not also depend on other libraries
- binaries: a single source (the executable) and libraries it depends upon
- requirements: external dependencies as resolved by dependency managers e.g. pypi in Python or ivy on the JVM
The main point of Pants is to take these BUILD files and do something useful with them.
These descriptions are stored in files named BUILD and colocated near the binaries/libraries they describe. Let's take for example the src/python/twitter/tutorial subtree in commons:
$ ls -lR src/python/twitter/tutorial/
total 16
-rw-r--r-- 1 wickman wheel 137 Apr 9 22:59 BUILD
-rw-r--r-- 1 wickman wheel 118 Apr 9 22:59 hello_world.py
Let's take a look at the BUILD file in src/python/twitter/tutorial/BUILD:
python_binary(
  name = "hello_world",
  source = "hello_world.py",
  dependencies = [
    pants("src/python/twitter/common/app"),
  ]
)
This BUILD file names one target: hello_world, which is a python_binary target. The hello_world target
contains one source file, hello_world.py and depends upon one other
target, the format of which will be described shortly.
It should be noted that sources are relative to the location of the BUILD
file itself, e.g. hello_world.py inside of src/python/twitter/tutorial/BUILD actually refers to
src/python/twitter/tutorial/hello_world.py:
from twitter.common import app
def main():
  print('Hello world!')
app.main()
Dependencies, on the other hand, are relative to the source root of the repository which is defined
by the BUILD file that sits next to the pants command:
# Define the repository layout
source_root('src/antlr', doc, page, python_antlr_library)
source_root('src/java', annotation_processor, doc, jvm_binary, java_library, page)
source_root('src/protobuf', doc, java_protobuf_library, page)
source_root('src/python', doc, page, python_binary, python_library)
source_root('src/scala', doc, jvm_binary, page, scala_library)
source_root('src/thrift', doc, java_thrift_library, page, python_thrift_library)
source_root('tests/java', doc, java_library, java_tests, page)
source_root('tests/python', doc, page, python_library, python_tests, python_test_suite)
source_root('tests/scala', doc, page, scala_library, scala_tests)
This file can be tailored to map to any source root structure such as Maven
style, Twitter style (as described above) or something more flat such as a
setup.py-based project. This however is an advanced topic that is not
covered in this document.
Within the src/python/twitter/tutorial/BUILD, only one target is defined,
specifically hello_world. This target is addressed by
src/python/twitter/tutorial:hello_world which means the target
hello_world within src/python/twitter/tutorial/BUILD. In general,
targets take the form <path>:<target name> with the special cases:
- in the case of path/to/directory/BUILD:target, the BUILD component may be elided and instead path/to/directory:target may be used
- path/to/directory is short form for path/to/directory:directory, so src/python/twitter/common/app is short form for src/python/twitter/common/app/BUILD:app
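To make the addressing rules concrete, here is a minimal sketch in plain Python. parse_address is a hypothetical helper written for this tutorial, not part of Pants, and it covers only the general form plus the two special cases listed above (relative addresses such as :name are not handled):

```python
import posixpath


def parse_address(spec):
    """Split a target address into (BUILD file path, target name).

    Hypothetical helper for illustration; this is not the Pants implementation.
    """
    path, sep, name = spec.partition(':')
    # Special case 1: the explicit form path/to/directory/BUILD:target.
    if posixpath.basename(path) == 'BUILD':
        path = posixpath.dirname(path)
    # Special case 2: path/to/directory is short for path/to/directory:directory.
    if not sep:
        name = posixpath.basename(path)
    return posixpath.join(path, 'BUILD'), name


if __name__ == '__main__':
    print(parse_address('src/python/twitter/tutorial:hello_world'))
    print(parse_address('src/python/twitter/common/app'))
```

Both spellings of the app target, src/python/twitter/common/app and src/python/twitter/common/app/BUILD:app, come out as the same pair under this sketch.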
src/python/twitter/tutorial/BUILD referenced pants('src/python/twitter/common/app') in its
dependencies. The pants() keyword is akin to a "pointer dereference" for an address. It will point
to whatever target is described at that address, in this case a python_library target:
src/python/twitter/common/app/BUILD:
python_library(
  name = "app",
  sources = globs('*.py'),
  dependencies = [
    pants('src/python/twitter/common/dirutil'),
    pants('src/python/twitter/common/lang'),
    pants('src/python/twitter/common/options'),
    pants('src/python/twitter/common/util'),
    pants('src/python/twitter/common/app/modules'),
  ]
)
which in turn includes even more dependencies. The job of Pants is to manage the transitive closure of all these dependencies and manipulate collections of these targets for you.
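As a rough illustration of what "transitive closure" means here, the sketch below walks a toy dependency mapping. The DEPS dictionary is an assumption that mirrors only the two BUILD files shown so far (the dependencies of dirutil, lang and the rest are left out), and transitive_closure is a hypothetical helper, not how Pants itself walks targets:

```python
def transitive_closure(target, deps):
    """Return every target reachable from `target`, itself included."""
    seen = []
    stack = [target]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Push in reverse so dependencies are visited in declaration order.
        stack.extend(reversed(deps.get(current, [])))
    return seen


# Toy graph: only the edges declared in the two BUILD files shown above.
DEPS = {
    'src/python/twitter/tutorial:hello_world': [
        'src/python/twitter/common/app',
    ],
    'src/python/twitter/common/app': [
        'src/python/twitter/common/dirutil',
        'src/python/twitter/common/lang',
        'src/python/twitter/common/options',
        'src/python/twitter/common/util',
        'src/python/twitter/common/app/modules',
    ],
}

if __name__ == '__main__':
    for address in transitive_closure('src/python/twitter/tutorial:hello_world', DEPS):
        print(address)
```

Naming one dependency on hello_world is enough to reach all seven targets in this toy graph.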
BUILD files themselves are just Python. The only thing magical is that the
statement from twitter.pants import * has been autoinjected. This
provides a number of Python-specific targets such as:
- python_library
- python_binary
- python_requirement
- python_thrift_library
and a whole host of other targets including Java, Scala, Python, Markdown,
the universal pants target and so forth. See
src/python/twitter/pants/__init__.py for a comprehensive list of targets.
A python_library target has a name, zero or more source files, zero or
more resource files, and zero or more dependencies. These dependencies may
include other python_library-like targets (python_library,
python_thrift_library, python_antlr_library and so forth) or
python_requirement targets.
A python_binary target is almost identical to a python_library target except instead of sources, it takes one
of two possible parameters:
- source: The source file that should be executed within the "library" otherwise defined by python_binary
- entry_point: The entry point that should be executed within the "library" otherwise defined by python_binary. Entry points take the format of pkg_resources.EntryPoint, which is something akin to some.module.name:my.attr, which means run the function pointed to by my.attr inside the module some.module.name inside the environment. The :my.attr component can be omitted and the module is executed directly (presuming it has a __main__.py).
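The entry point format can be sketched with nothing but the standard library. resolve_entry_point and run_entry_point below are hypothetical names for this illustration; the real parsing lives in pkg_resources.EntryPoint, and this is only meant to show the idea of "import the module, then walk the attribute path":

```python
import importlib
import runpy


def resolve_entry_point(spec):
    """Return the object named by 'module:attr.path', or None if no attr is given."""
    module_name, sep, attr_path = spec.partition(':')
    if not sep:
        return None
    target = importlib.import_module(module_name)
    # Walk dotted attributes, e.g. 'my.attr' means module.my.attr.
    for attr in attr_path.split('.'):
        target = getattr(target, attr)
    return target


def run_entry_point(spec):
    """Call the resolved function, or execute the module when ':attr' is omitted."""
    target = resolve_entry_point(spec)
    if target is None:
        # No ':attr' component: run the module itself, much like `python -m`.
        return runpy.run_module(spec, run_name='__main__')
    return target()


if __name__ == '__main__':
    join = resolve_entry_point('posixpath:join')
    print(join('some', 'module'))
```

Under this sketch, the fabric.main:main entry point used later in this tutorial would import fabric.main and call its main attribute.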
A python_requirement target describes an external dependency as understood by easy_install or pip. It takes only
a single non-keyword argument of the Requirement-style string, e.g.
python_requirement('django-celery')
python_requirement('tornado==2.2')
python_requirement('kombu>=2.1.1,<3.0')
This will resolve the dependency and its transitive closure. For example, django-celery pulls down the following
dependencies: celery>=2.5.1, django-picklefield>=0.2.0, ordereddict, python-dateutil,
kombu>=2.1.1,<3.0, anyjson>=0.3.1, importlib, and amqplib>=1.0.
Pants takes care of handling these dependencies for you. It will never install anything globally. Instead it will
build each dependency, cache it in .pants.d, and assemble them a la carte into an execution environment.
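To see what a Requirement-style string carries, the following simplified sketch splits one into a project name and a list of version constraints. parse_requirement and its regular expressions are assumptions made for this illustration; they handle only the plain forms shown above, not extras or other features understood by easy_install and pip:

```python
import re

# Simplified patterns: a project name, then comma-separated version clauses.
_NAME = re.compile(r'^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$')
_CLAUSE = re.compile(r'^(==|>=|<=|!=|<|>)\s*([A-Za-z0-9.*+!_-]+)$')


def parse_requirement(text):
    """Return (name, [(operator, version), ...]) for a simple requirement string."""
    match = _NAME.match(text)
    if match is None:
        raise ValueError('not a requirement string: %r' % text)
    name, rest = match.groups()
    constraints = []
    for clause in rest.split(','):
        clause = clause.strip()
        if not clause:
            continue
        parsed = _CLAUSE.match(clause)
        if parsed is None:
            raise ValueError('unsupported version clause: %r' % clause)
        constraints.append(parsed.groups())
    return name, constraints


if __name__ == '__main__':
    for spec in ('django-celery', 'tornado==2.2', 'kombu>=2.1.1,<3.0'):
        print(parse_requirement(spec))
```

A resolver then has to find a distribution whose version satisfies every constraint in the list, and repeat the process for that distribution's own requirements.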
A python_thrift_library target takes the same arguments as python_library, except that files described
in sources must be thrift files. If your library or binary depends upon this target type, Python bindings
will be autogenerated and included within your environment.
Now you're ready to build your first PEX file (technically you already have,
by building Pants itself.) By default if you specify ./pants <target>, it
assumes you mean ./pants build <target> and does precisely that:
$ PANTS_VERBOSE=1 ./pants src/python/twitter/tutorial:hello_world
Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/tutorial/BUILD:hello_world)])
Resolver: Calling environment super => 0.046ms
Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:hello_world):
Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:hello_world):
Dumping library: PythonLibrary(src/python/twitter/common/app/BUILD:app) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/dirutil/BUILD:dirutil) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/lang/BUILD:lang) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/options/BUILD:options) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/util/BUILD:util) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/app/modules/BUILD:modules) [relative module: ]
Resolver: Calling environment super => 0.016ms
Dumping binary: twitter/tutorial/hello_world.py
Wrote /private/tmp/wickman-commons/dist/hello_world.pex
You will see that despite specifying just one dependency, the transitive
closure of hello_world pulled in all of src/python/twitter/common/app
and its direct descendants. That's because those library targets depended
upon other library targets, which in turn depended on even more. At the end
of the day, we collect the closed set of all dependencies and bundle them
into hello_world.pex.
Since it uses the twitter.common.app framework, we know we can fire it up
and poke around with --help:
$ dist/hello_world.pex --help
Options:
-h, --help, --short-help
show this help message and exit.
--long-help show options from all registered modules, not just the
__main__ module.
If we specify --long-help, we can see the help of transitively included
modules, e.g. twitter.common.app itself:
$ dist/hello_world.pex --long-help
Options:
-h, --help, --short-help
show this help message and exit.
--long-help show options from all registered modules, not just the
__main__ module.
From module twitter.common.app:
--app_daemonize Daemonize this application. [default: False]
--app_profile_output=FILENAME
Dump the profiling output to a binary profiling
format. [default: None]
--app_daemon_stderr=TWITTER_COMMON_APP_DAEMON_STDERR
Direct this app's stderr to this file if daemonized.
[default: /dev/null]
--app_debug Print extra debugging information during application
initialization. [default: False]
--app_daemon_stdout=TWITTER_COMMON_APP_DAEMON_STDOUT
Direct this app's stdout to this file if daemonized .
[default: /dev/null]
--app_profiling Run profiler on the code while it runs. Note this can
cause slowdowns. [default: False]
--app_ignore_rc_file
Ignore default arguments from the rc file. [default:
False]
--app_pidfile=TWITTER_COMMON_APP_PIDFILE
The pidfile to use if --app_daemonize is specified.
[default: None]
Or we can simply execute it as intended:
$ dist/hello_world.pex
Hello world!
So far we've only discussed the "pants build" command. There's also a
dedicated "py" command that allows you to manipulate the environments
described by python_binary and python_library targets, for example by dropping into
an interpreter with the environment set up for you.
The default behavior of pants py <target> is the following:
- For python_binary targets, build the environment and execute the target.
- For one or more python_library targets, build the environment that is the transitive closure of all targets and drop into an interpreter.
- For a combination of python_binary and python_library targets, build the transitive closure of all targets and execute the first binary target.
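The three rules collapse into one decision: run the first binary if there is one, otherwise start an interpreter. The sketch below restates that rule in Python. default_py_action is a hypothetical name, and the plain strings stand in for real target objects:

```python
def default_py_action(target_types):
    """Decide what `pants py` would do for targets given in command-line order.

    `target_types` is a list of 'python_binary' / 'python_library' strings.
    Returns ('execute', index_of_first_binary) or ('interpreter', None).
    """
    for index, target_type in enumerate(target_types):
        if target_type == 'python_binary':
            return ('execute', index)
    return ('interpreter', None)


if __name__ == '__main__':
    print(default_py_action(['python_binary']))
    print(default_py_action(['python_library', 'python_library']))
    print(default_py_action(['python_library', 'python_binary', 'python_binary']))
```

In every case the environment is built from the transitive closure of all the targets named; only the final step differs.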
Let's take src/python/twitter/tutorial/BUILD and split out the dependencies from
our hello_world target into hello_world_lib and add dependencies upon
Tornado and psutil.
python_binary(
  name = "hello_world",
  source = "hello_world.py",
  dependencies = [
    pants(":hello_world_lib")
  ]
)
python_library(
  name = "hello_world_lib",
  dependencies = [
    pants("src/python/twitter/common/app"),
    python_requirement("tornado"),
    python_requirement("psutil"),
  ]
)
This uses the python_requirement target which can refer to any string in pkg_resources.Requirement format as
recognized by tools such as easy_install and pip as described above.
Now that we've created a library-only target src/python/twitter/tutorial:hello_world_lib, let's drop
into it using pants py with verbosity turned on so that we can see what's
going on in the background:
$ PANTS_VERBOSE=1 ./pants py src/python/twitter/tutorial:hello_world_lib
Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib)
Resolver: Calling environment super => 0.019ms
Building PythonBinary PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib):
Dumping library: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/app/BUILD:app) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/dirutil/BUILD:dirutil) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/lang/BUILD:lang) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/options/BUILD:options) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/util/BUILD:util) [relative module: ]
Dumping library: PythonLibrary(src/python/twitter/common/app/modules/BUILD:modules) [relative module: ]
Dumping requirement: tornado
Dumping requirement: psutil
Resolver: Calling environment super => 0.029ms
Resolver: Activating cache /private/tmp/wickman-commons/3rdparty/python => 356.432ms
Resolver: Resolved tornado => 357.219ms
Resolver: Activating cache /private/tmp/wickman-commons/.pants.d/.python.install.cache => 41.117ms
Resolver: Fetching psutil => 10144.264ms
Resolver: Building psutil => 1794.474ms
Resolver: Distilling psutil => 224.896ms
Resolver: Constructing distribution psutil => 2.855ms
Resolver: Resolved psutil => 12210.066ms
Dumping distribution: .../tornado-2.2-py2.6.egg
Dumping distribution: .../psutil-0.4.1-py2.6-macosx-10.4-x86_64.egg
Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
In the background, Pants used a cached version of tornado but fetched
psutil from pypi along with any necessary transitive dependencies (none in this
case) and built a platform-specific version for us.
You can convince yourself that the environment contains all the dependencies
by inspecting sys.path and importing libraries as you desire:
>>> import psutil
>>> help(psutil)
>>> from twitter.common import app
>>> help(app)
It should be stressed that dependencies built by Pants are never installed globally. These dependencies only exist for the duration of the Python interpreter forked by Pants.
Let us turn our hello_world.py into a basic top application using tornado:
from twitter.common import app
import psutil
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
  def get(self):
    self.write('<pre>Running pids:\n%s</pre>' % '\n'.join(map(str, psutil.get_pid_list())))
def main():
  application = tornado.web.Application([
    (r"/", MainHandler)
  ])
  application.listen(8888)
  tornado.ioloop.IOLoop.instance().start()
app.main()
We have now split our application into two parts: the hello_world binary
target and the hello_world_lib library target. If we run pants py src/python/twitter/tutorial:hello_world_lib, the default behavior is to
drop into an interpreter.
If we run pants py src/python/twitter/tutorial:hello_world, the default behavior is to run
the binary target pointed to by hello_world:
$ ./pants py src/python/twitter/tutorial:hello_world
Then point your browser to localhost:8888.
There is also a --pex option to pants py that allows you to build a PEX
file from a union of python_library targets that does not necessarily have a
python_binary target defined for it. Since there is no entry point
specified, the resulting .pex file just behaves like a Python interpreter,
but with the sys.path bootstrapped for you:
$ ./pants py --pex src/python/twitter/tutorial:hello_world_lib
Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:hello_world_lib)
Wrote /private/tmp/wickman-commons/dist/hello_world_lib.pex
$ ls -la dist/hello_world_lib.pex
-rwxr-xr-x 1 wickman wheel 1404174 Apr 10 13:00 dist/hello_world_lib.pex
Now if you use dist/hello_world_lib.pex, since it has no entry point, it will drop you into an interpreter:
$ dist/hello_world_lib.pex
Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import tornado
As mentioned before, it's like a single-file lightweight alternative to a
virtualenv. We can even use it to run our hello_world.py application:
$ dist/hello_world_lib.pex src/python/twitter/tutorial/hello_world.py
This can be an incredibly powerful and lightweight way to manage and deploy
virtual environments without using virtualenv.
As mentioned above, PEX files without default entry points behave like Python interpreters that
carry their dependencies with them. For example, let's create a target that
provides a Fabric dependency within src/python/twitter/tutorial/BUILD:
python_library(
  name = 'fabric',
  dependencies = [
    python_requirement('Fabric')
  ]
)
And let's build a fabric PEX file:
$ ./pants py --pex src/python/twitter/tutorial:fabric
Build operating on target: PythonLibrary(src/python/twitter/tutorial/BUILD:fabric)
Wrote /private/tmp/wickman-commons/dist/fabric.pex
By default it does nothing more than drop us into an interpreter:
$ dist/fabric.pex
Python 2.6.7 (r267:88850, Aug 31 2011, 15:49:05)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
But suppose we have a local script that depends upon Fabric, fabric_hello_world.py:
from fabric.api import *
def main():
  local('echo hello world')
if __name__ == '__main__':
  main()
We can now use fabric.pex as if it were a Python interpreter but with
fabric available in its environment. Note that fabric has never been
installed globally in any site-packages anywhere. It is just bundled inside
of fabric.pex:
$ dist/fabric.pex fabric_hello_world.py
[localhost] local: echo hello world
hello world
As an advanced feature of python_binary targets, you may also specify
direct entry points into PEX files rather than a source file. For example,
if we wanted to build an a la carte fab wrapper for fabric:
python_binary(name = "fab",
  entry_point = "fabric.main:main",
  dependencies = [
    python_requirement("fabric"),
  ]
)
We build:
$ ./pants src/python/twitter/tutorial:fab
Build operating on targets: OrderedSet([PythonBinary(src/python/twitter/tutorial/BUILD:fab)])
Building PythonBinary PythonBinary(src/python/twitter/tutorial/BUILD:fab):
Wrote /private/tmp/wickman-commons/dist/fab.pex
And now dist/fab.pex behaves like a standalone fab binary:
$ dist/fab.pex -h
Usage: fab [options] <command>[:arg1,arg2=val2,host=foo,hosts='h1;h2',...] ...
Options:
-h, --help show this help message and exit
-d NAME, --display=NAME
print detailed info about command NAME
-F FORMAT, --list-format=FORMAT
formats --list, choices: short, normal, nested
-l, --list print list of possible commands and exit
--set=KEY=VALUE,... comma separated KEY=VALUE pairs to set Fab env vars
--shortlist alias for -F short --list
-V, --version show program's version number and exit
-a, --no_agent don't use the running SSH agent
-A, --forward-agent forward local agent to remote end
--abort-on-prompts abort instead of prompting (for password, host, etc)
...
Pants also has excellent support for JVM-based builds and can do similar things like resolving external JARs and packaging them as standalone environments with default entry points.
Given a PEX file, it is possible to alter its default behavior during invocation.
If you have a PEX file with a prescribed executable source or entry_point specified, it may still
occasionally be useful to drop into an interpreter with the environment bootstrapped. If you
set PEX_INTERPRETER=1 in your environment, the PEX bootstrapper will skip any execution and instead
launch an interactive interpreter session.
If your environment is failing to bootstrap or simply bootstrapping very slowly, it can be useful to
set PEX_VERBOSE=1 in your environment to get debugging output printed to the console. Debugging output
includes:
- Fetched dependencies
- Built dependencies
- Activated dependencies
- Packages scrubbed out of sys.path
- The sys.path used to launch the interpreter
If you have a PEX file without a prescribed entry point, or want to change
the entry_point for the duration of a single invocation, you can set
PEX_MODULE=entry_point using the same format as described in the
python_binary Pants target.
This can be a useful tool for bundling up a number of packages together and being able to use a single file to execute scripts from each of them.
Another common pattern is to link pytest into your PEX file, and run
PEX_MODULE=pytest my_pex.pex tests/*.py to run your test suite in its
isolated environment.
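When a PEX file is launched from another Python program, these switches are just environment variables on the child process. The helper below is a small sketch: pex_environment and its parameter names are hypothetical, and only the variable names PEX_INTERPRETER, PEX_VERBOSE and PEX_MODULE come from the description above:

```python
import os


def pex_environment(interpreter=False, verbose=False, module=None, base=None):
    """Return a copy of the environment with the requested PEX_* variables set."""
    env = dict(os.environ if base is None else base)
    if interpreter:
        # Skip the prescribed entry point and open an interactive session.
        env['PEX_INTERPRETER'] = '1'
    if verbose:
        # Print bootstrap debugging output to the console.
        env['PEX_VERBOSE'] = '1'
    if module is not None:
        # Override the entry point for this one invocation.
        env['PEX_MODULE'] = module
    return env


if __name__ == '__main__':
    overrides = pex_environment(module='pytest', verbose=True, base={})
    print(sorted(overrides.items()))
```

The result can be passed as the env argument of subprocess.call, for example subprocess.call(['dist/hello_world.pex'], env=pex_environment(interpreter=True)), assuming that file has been built.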
There is nascent support for performing code coverage within PEX files by
setting PEX_COVERAGE=<suffix>. By default the coverage files will be written
into the current working directory with the file pattern .coverage.<suffix>. This
requires that the coverage Python module has been linked into your PEX.
You can then combine the coverage files by running PEX_MODULE=coverage my_pex.pex .coverage.suffix* and run a report using PEX_MODULE=coverage my_pex.pex report. Since PEX files are just zip files, coverage is able
to understand and extract source and line numbers from them in order to
produce coverage reports.
As an aside, in Python, you may not know that you can import code from directories:
$ mkdir -p foo
$ touch foo/__init__.py
$ echo "print 'spam'" > foo/bar.py
$ python -c 'import foo.bar'
spam
All that is necessary is the presence of __init__.py to signal to the importer that we
are dealing with a package. Similarly, a directory can be made "executable":
$ echo "print 'i like flowers'" > foo/__main__.py
$ python foo
i like flowers
And because the zipimport module now provides a default import hook for
Pythons >= 2.4, if the Python import framework sees a zip file, with the
inclusion of a proper __init__.py, it can be treated similarly to a
directory. But since a directory can be executable, if we just drop a
__main__.py into a zip file, it suddenly becomes executable:
$ pushd foo && zip /tmp/flower.zip __main__.py && popd
/tmp/foo /tmp
adding: __main__.py (stored 0%)
/tmp
$ python flower.zip
i like flowers
And since zip files don't actually start until the zip magic number, you can embed arbitrary strings at the beginning of them and they're still valid zips. Hence simple PEX files are born:
$ echo '#!/usr/bin/env python2.6' > flower.pex && cat flower.zip >> flower.pex
$ chmod +x flower.pex
$ ./flower.pex
i like flowers
Remember pants.pex?
$ unzip -l pants.pex | tail -2
warning [pants.pex]: 25 extra bytes at beginning or within zipfile
(attempting to process anyway)
-------- -------
7900812 543 files
$ head -c 25 pants.pex
#!/usr/bin/env python2.6
The __main__.py in a real PEX file is somewhat special:
import os
import sys
__entry_point__ = None
if '__file__' in locals() and __file__ is not None:
  __entry_point__ = os.path.dirname(__file__)
elif '__loader__' in locals():
  from pkgutil import ImpLoader
  if hasattr(__loader__, 'archive'):
    __entry_point__ = __loader__.archive
  elif isinstance(__loader__, ImpLoader):
    __entry_point__ = os.path.dirname(__loader__.get_filename())
if __entry_point__ is None:
  sys.stderr.write('Could not launch python executable!\n')
  sys.exit(2)
sys.path.insert(0, os.path.join(__entry_point__, '.bootstrap'))
from twitter.common.python.importer import monkeypatch
monkeypatch()
del monkeypatch
from twitter.common.python.pex import PEX
PEX(__entry_point__).execute()
PEX is just a class that manages requirements (often embedded within PEX
files as egg distributions in the .deps directory) and autoimports them
into the sys.path, then executes a prescribed entry point.
If you read the code closely, you'll notice that it relies upon monkeypatching
zipimport. Inside the twitter.common.python library we've provided a recursive
zip importer derived from Google's pure Python zipimport
module that allows for depending upon eggs within eggs or zips (and so forth)
so that PEX files need not extract egg dependencies to disk a priori. This even
extends to C extensions (.so and .dylib files) which are written to disk long
enough to be dlopened before being unlinked.
Strictly speaking this monkeypatching is not necessary and we may consider making that optional.
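The zip-plus-shebang construction from the flower.pex walkthrough can also be reproduced from Python itself. The sketch below is a toy, not how Pants builds PEX files: build_minimal_pex and run_with_python are hypothetical helpers, and the archive holds nothing but a __main__.py. It writes the shebang line first and then lets the zipfile module write the archive into the same file:

```python
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_minimal_pex(path, main_source, interpreter=sys.executable):
    """Write a shebang line followed by a zip archive holding __main__.py."""
    with open(path, 'wb') as out:
        # Bytes ahead of the zip data are skipped by zip readers.
        out.write(('#!%s\n' % interpreter).encode('utf-8'))
        with zipfile.ZipFile(out, 'w') as archive:
            archive.writestr('__main__.py', main_source)
    # Mark the file executable so the shebang line can take effect.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def run_with_python(path):
    """Hand the file to the current interpreter, which runs its __main__.py."""
    output = subprocess.check_output([sys.executable, path])
    return output.decode('utf-8').strip()


if __name__ == '__main__':
    target = os.path.join(tempfile.mkdtemp(), 'flower.pex')
    build_minimal_pex(target, "print('i like flowers')\n")
    print(run_with_python(target))
```

Newer Python releases (3.5 and later) ship a standard-library zipapp module built on the same idea.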
TODO: converting python_library targets to eggs
TODO: auto dependency resolution from within PEX files
TODO: dynamically self-updating PEX files
TODO: tailoring your dependency resolution environment with pants.ini, including local cheeseshop mirrors
TODO: multi-interpreter / multi-platform support with pants.multi / pants goal setup