@FRidh
Last active February 12, 2023 18:00

Python on Nix infrastructure

Introduction

Python is used throughout Nixpkgs. We use it for certain scripts, we provide Python libraries, and we provide applications. There are several ways to use Python on Nix, each with its pros and cons. An overview of all issues with the current Python infrastructure on Nix is available in the placeholder issue 1819.

Goal

While many things work well, there are definitely still issues. This document states how we intend to support Python on Nix, describes our current infrastructure, and contains a proposal for an improved infrastructure that supports the following use cases:

  • installing Python applications in a profile. These should expose the program but not the Python modules.
  • creating environments for Python development like virtualenv but with the additional possibility of including other non-Python programs.
  • temporary Python environments but also a permanent environment by installing it in a profile.
  • Python programs that call other Python programs without mangling the search path for modules. That means e.g. that a Python 2 program can call a Python 3 program without issues.
  • namespace packages
  • combine any of the above without issues.

Furthermore, we would like to support the following Python tools:

  • virtualenv for creating virtual environments. While nix-shell can do the same and more, many users still need virtualenv.
  • tox for testing against multiple environments.
  • nuitka, a Python compiler that depends on SCons. The challenge here is that SCons is a Python 2.7 tool while Nuitka can work with any CPython version.

Specific test cases

The following are test cases for each of the issues:

TODO

Issues solved

The following issues are supposed to be solved:

  • #11423: Have a Python package on PATH without adding it to PYTHONPATH
  • #16591: PYTHONPATH leaks in subprocesses.
  • #22688: Do not use --prefix PYTHONPATH because it leaks PYTHONPATH.
  • #23676: Subprocesses do not have modules on their sys.path.
  • #24128: wrapPythonPrograms should not add (propagated)BuildInputs to wrappers.
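
The PYTHONPATH leak behind #16591 and #22688 can be reproduced outside Nix. The following is a minimal sketch, not the Nixpkgs wrapper itself; the helper names (CHILD_CHECK, PARENT_VIA_ENV, PARENT_VIA_SITE, child_sees) are made up for this illustration. A parent interpreter starts a child interpreter, and the child reports whether a given directory is on its sys.path. When the parent got the directory through PYTHONPATH, the child inherits it; when the parent added it in-process with site.addsitedir, the child does not.

```python
import os
import subprocess
import sys
import tempfile

# Run in the innermost interpreter: is the directory on its sys.path?
CHILD_CHECK = "import sys; print(sys.argv[1] in sys.path)"

# Parent that relies on PYTHONPATH from its environment.
PARENT_VIA_ENV = (
    "import subprocess, sys; "
    "subprocess.run([sys.executable, '-c', %r, sys.argv[1]], check=True)"
    % CHILD_CHECK
)

# Parent that extends its own sys.path in-process before starting the child.
PARENT_VIA_SITE = (
    "import site, subprocess, sys; "
    "site.addsitedir(sys.argv[1]); "
    "subprocess.run([sys.executable, '-c', %r, sys.argv[1]], check=True)"
    % CHILD_CHECK
)


def child_sees(directory, parent_code, use_pythonpath):
    """Return True if the grandchild interpreter has `directory` on sys.path."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    if use_pythonpath:
        env["PYTHONPATH"] = directory
    result = subprocess.run(
        [sys.executable, "-c", parent_code, directory],
        env=env, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip() == "True"


def demo():
    directory = os.path.realpath(tempfile.mkdtemp())
    leaked = child_sees(directory, PARENT_VIA_ENV, use_pythonpath=True)
    contained = child_sees(directory, PARENT_VIA_SITE, use_pythonpath=False)
    return leaked, contained
```

Calling demo() returns a pair of booleans: whether the child saw the directory in the PYTHONPATH case and in the site.addsitedir case.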

Overview of Python packaging

Applications, libraries and environments

A common distinction to make is that between applications and libraries. An application is a standalone program. The application can depend on Python libraries but any libraries provided by the application (direct or indirect) shouldn't be shared or integrated in other applications or environments.

When developing one is interested in the interpreter, (Python) libraries and possibly some tools that may depend on the exact environment they're used in.

An environment provides all the programs needed. In the case of an application this typically means the only entry point provided by the environment is the application itself, whereas in the case of development environments multiple tools may be available.

Let's clarify each with an example. The e-book suite Calibre is a program that is written in Python. When using Calibre one is not interested in any libraries. One just wants to use the program and thus we call this an application.

The package numpy is a library, and is used for development. It does provide the program f2py2 but this is typically only used in conjunction with the development numpy is used in. A similar example is pytest; one typically uses pytest in the actual development environment.

In some cases this distinction may not be so clear. E.g. the Jupyter Notebook is an application that is used for development. It depends on a kernel which is chosen for the environment one uses for development, e.g. a Python 2 or 3 kernel. However, it supports multiple kernels simultaneously so one could separate the package, having the Notebook as an application and the kernels as libraries.

Distributing and installing Python code

Python code can be distributed in different ways.

The most common format is a Source Distribution or "sdist". This contains the essential source code along with some metadata for pip. Source distributions are typically installed with setuptools and can be recognized by their setup.py file. They used to be installed with python setup.py install but are nowadays commonly installed in two steps by first creating a wheel with python setup.py bdist_wheel and then installing the wheel.

The wheel isn't just an intermediate step in the building process but is also a popular distribution format. A wheel is a Built Distribution. Wheels are often pure Python but can contain binary code. Wheels are installed with pip using pip install *.whl. While setuptools is most commonly used for building wheels, there exist other tools for building wheels. One example is flit.

In some cases installation is done entirely differently, e.g. with the help of a Makefile. Libraries can sometimes also provide Python bindings.

Finally, when developing one might want to use an editable or development mode installation with pip install -e.

Finding Python libraries

Python modules are installed in lib/pythonX.X/site-packages/<pname>. Installed right next to it is the dist-info folder, lib/pythonX.X/site-packages/<pname>-<version>.dist-info. This folder is needed for pip/setuptools to determine which packages have been installed.
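
To make the role of the dist-info folder concrete, the sketch below builds a throwaway site-packages-like folder containing one module and a matching dist-info folder, then asks the standard library's importlib.metadata which distributions it finds there. This is only an illustration: the package name "demo" and the helper names are invented, and importlib.metadata stands in here for whatever pip/setuptools do internally.

```python
import importlib.metadata
import os
import tempfile


def make_fake_site_packages(name, version):
    """Create a folder holding one module plus its <name>-<version>.dist-info."""
    site_packages = tempfile.mkdtemp()
    # The importable module itself.
    with open(os.path.join(site_packages, name + ".py"), "w") as f:
        f.write("")
    # The metadata folder installed right next to it.
    dist_info = os.path.join(site_packages, "%s-%s.dist-info" % (name, version))
    os.mkdir(dist_info)
    with open(os.path.join(dist_info, "METADATA"), "w") as f:
        f.write("Metadata-Version: 2.1\nName: %s\nVersion: %s\n" % (name, version))
    return site_packages


def installed(site_packages):
    """Map distribution name to version for everything found in the folder."""
    found = importlib.metadata.distributions(path=[site_packages])
    return {dist.metadata["Name"]: dist.version for dist in found}
```

Without the dist-info folder the module would still be importable, but installed() would return an empty mapping for that folder.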

The exact Python import logic is quite extensive. What follows is a very brief summary:

  • Python modules can be imported from folders that are on sys.path.
  • During startup the interpreter tries to import the module sitecustomize (a file sitecustomize.py) from sys.path. This file can be used to add additional site-packages folders to sys.path.
  • Also during startup, it reads the environment variable PYTHONPATH, which is a list of folders. These folders are placed on sys.path ahead of the standard library and site-packages folders.
  • Entries that are added directly to sys.path are not recursed into. One can instead use site.addsitedir to add folders to sys.path. site.addsitedir does recurse by e.g. following .pth files.
  • .pth files list folders, one per line, that are to be added to sys.path.

The first entry in sys.path is special and is the directory containing the script that was used to invoke the interpreter.
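
The difference between appending to sys.path directly and calling site.addsitedir can be seen in a small experiment. The sketch below is illustrative only; the folder names and the demo.pth file are made up. It creates a folder with a .pth file that points at a second folder, then checks whether that second folder ends up on sys.path in each case.

```python
import os
import site
import sys
import tempfile


def demo_addsitedir():
    """Return (found_via_plain_append, found_via_addsitedir)."""
    root = os.path.realpath(tempfile.mkdtemp())
    site_dir = os.path.join(root, "site-packages")
    extra_dir = os.path.join(root, "extra")
    os.mkdir(site_dir)
    os.mkdir(extra_dir)
    # A .pth file listing one additional folder.
    with open(os.path.join(site_dir, "demo.pth"), "w") as f:
        f.write(extra_dir + "\n")

    saved = list(sys.path)
    try:
        # Plain append: the .pth file inside site_dir is ignored.
        sys.path.append(site_dir)
        plain = extra_dir in sys.path

        # addsitedir: site_dir is added and its .pth files are processed.
        sys.path[:] = saved
        site.addsitedir(site_dir)
        via_site = site_dir in sys.path and extra_dir in sys.path
    finally:
        sys.path[:] = saved  # leave the running interpreter untouched
    return plain, via_site
```

Only the site.addsitedir call picks up the folder listed in demo.pth.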

Another environment variable of interest is PYTHONHOME. This environment variable can be used to change the location of the standard Python libraries. By default, the libraries are searched in prefix/lib/pythonversion and exec_prefix/lib/pythonversion, where prefix and exec_prefix are installation-dependent directories, both defaulting to /usr/local.
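
The values the running interpreter actually settled on can be inspected from Python. The helper below is a small sketch (the function name is invented) that reports prefix, exec_prefix and the standard library folders derived from them.

```python
import sys
import sysconfig


def stdlib_locations():
    """Report where this interpreter looks for its standard library."""
    return {
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        # Pure-Python standard library modules.
        "stdlib": sysconfig.get_path("stdlib"),
        # Platform-specific standard library modules.
        "platstdlib": sysconfig.get_path("platstdlib"),
    }
```

Note that inside a virtual environment sys.prefix points at the environment, while the standard library is still taken from the base installation.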

Name of and path to the program

The sys.argv attribute represents the list of arguments passed to a Python program. The first value, argv[0], is the script name. It is OS-dependent whether this is a full path or not, but on Linux and Darwin systems it is. If the command was executed using the -c command line option to the interpreter, argv[0] is set to the string '-c'. If no script name was passed to the Python interpreter, argv[0] is an empty string.
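
Two of these cases are easy to check by starting a fresh interpreter. The sketch below uses invented helper names and a temporary script called show_argv0.py; the script is passed by its absolute path.

```python
import os
import subprocess
import sys
import tempfile


def argv0_with_dash_c():
    """Return sys.argv[0] as seen by code run with the -c option."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[0])"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def argv0_with_script():
    """Return (script_path, sys.argv[0] as seen by that script)."""
    folder = tempfile.mkdtemp()
    script = os.path.join(folder, "show_argv0.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.argv[0])\n")
    result = subprocess.run(
        [sys.executable, script],
        check=True, capture_output=True, text=True,
    )
    return script, result.stdout.strip()
```

A wrapper that moves or renames the original script changes the path passed to the interpreter, and therefore what the program sees in argv[0].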

The name and full path to the program are of interest because programs might want to call themselves.

Current implementation of Python on Nix

Applications, libraries and environments

Python applications are spread out throughout the Nixpkgs tree following the general guidelines.

The file pkgs/top-level/python-packages.nix contains or refers to all Python library expressions, and these packages can be accessed through pkgs.pythonXX.pkgs.<name>. Typically one creates an environment with pythonXX.withPackages or pythonXX.buildEnv.

Packaging Python packages

The main function for packaging Python packages is buildPythonPackage. Furthermore, buildPythonApplication exists for applications. The only difference is that buildPythonPackage modifies the name to include the interpreter version.

An important argument is format which is used to choose between setuptools (sdist), flit, wheel and other. The most common format is setuptools. Wheels are also increasingly used in Nixpkgs. The last option is used when none of the others apply. In this case the packager needs to provide a buildPhase and installPhase.

The goal of buildPythonPackage (and buildPythonApplication) is to guarantee that applications work and modules can be found.

Building a package

The Python interpreter provides a setup hook that recurses into the propagatedBuildInputs and adds the site-packages folder of each to the environment variable PYTHONPATH. This allows the package that is being built to find its dependencies. The hook is also run by nix-shell. While that makes sense when building/debugging the build, it is also abused for creating temporary environments with nix-shell -p python3.numpy python3.pytest.
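
The recursion performed by the setup hook can be modelled in a few lines of Python. This is a simplified model, not the actual shell hook: the dependency graph is a plain dictionary, the store paths are fake, and the function name and the site_packages default are assumptions made for the example.

```python
def pythonpath_entries(build_inputs, propagated,
                       site_packages="lib/python3.6/site-packages"):
    """Collect site-packages folders of the inputs and, recursively,
    of everything they propagate. Each folder is listed once."""
    entries = []

    def visit(package):
        entry = package + "/" + site_packages
        if entry in entries:
            return  # already handled, also guards against cycles
        entries.append(entry)
        for dependency in propagated.get(package, []):
            visit(dependency)

    for package in build_inputs:
        visit(package)
    return entries


# Hypothetical dependency graph: pytest propagates py and six, py propagates six.
EXAMPLE_GRAPH = {
    "/store/pytest": ["/store/py", "/store/six"],
    "/store/py": ["/store/six"],
}
```

Joining the result of pythonpath_entries(["/store/pytest"], EXAMPLE_GRAPH) with ":" gives the kind of value that ends up in PYTHONPATH during the build.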

The wrapPythonPrograms shell function wraps all executables in a derivation and does two things:

  • it uses site.addsitedir to update sys.path with dependencies. It recursively traverses propagatedBuildInputs and pythonPath.
  • it fixes the name, sys.argv[0], of the script. This has to be done because the wrapper moves the original script.
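
In Python terms, the two effects of the wrapper can be sketched as a small prologue that runs before the wrapped program's own code. This is an illustration of the idea only, not the code that wrapPythonPrograms generates; the function name and its arguments are hypothetical.

```python
import site
import sys


def wrapper_prologue(site_dirs, program_name):
    """Make dependencies importable and restore the program's name."""
    # Add each dependency's site-packages folder, processing its .pth files.
    for directory in site_dirs:
        site.addsitedir(directory)
    # The wrapper moved the original script, so put the public name back.
    sys.argv[:1] = [program_name]
```

Because the search path is extended inside the process rather than through an environment variable, nothing is passed on to programs that the application starts.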

The buildPythonPackage function patches the shebangs of all scripts provided. That way, the scripts can find the correct Python interpreter. It also executes wrapPythonPrograms. Python applications that are installed can now find their dependencies and will function.

Building an environment

The python.buildEnv function creates an environment that consists of symbolic links to all files that are provided by the packages that are to be included in the environment. The shebangs of the scripts have already been patched by buildPythonPackage to point to the correct Python interpreter. However, that store entry contains just the interpreter, and not other Python packages that are to be included. Therefore, python.buildEnv not only creates symbolic links but also wraps each script with a wrapper that sets PYTHONHOME to the interpreter in the newly created environment. This environment can now be installed. The python.withPackages function provides a simpler interface to python.buildEnv.
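
The symbolic-link part of this can be sketched in Python. The function below is a rough model under simplifying assumptions: it is not how python.buildEnv is implemented, it silently keeps the first file on a name collision, and it leaves out the PYTHONHOME wrappers entirely. The function name is invented for the example.

```python
import os


def link_into_env(env_dir, package_dirs):
    """Mirror the folder layout of each package inside env_dir and
    symlink every file to its original location."""
    for package in package_dirs:
        for root, _dirs, files in os.walk(package):
            relative = os.path.relpath(root, package)
            target_dir = os.path.normpath(os.path.join(env_dir, relative))
            os.makedirs(target_dir, exist_ok=True)
            for name in files:
                link = os.path.join(target_dir, name)
                # First package wins when two packages provide the same file.
                if not os.path.lexists(link):
                    os.symlink(os.path.join(root, name), link)
```

After linking, a single site-packages folder in the environment exposes the modules of all included packages, while the files themselves stay in their own store entries.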

Suggested improvements

Finding libraries

  • add a sitepackages.py to the interpreter that listens to NIX_PYTHONPATH and/or NIX_PYTHON_PTH with the latter referring to a .pth file.

Installing packages

Setting the name

  • use exec -a name program to set the name of program. Python however does not support exec -a.
  • wrap the interpreter and set the name of the program through sitecustomize.py. The attribute sys.argv is unavailable at that point.
  • patch the interpreter to listen to an environment variable, NIX_PYTHON_NAME, that defines the name. https://github.com/python/cpython/blob/3.5/Python/sysmodule.c#L2050

Wrap without leaking

  • use --set PYTHONPATH. This breaks a Python feature. See the discussion.

Proposed implementation

@bjornfor

Ok, so I guess this is a bad place to discuss then. Just wanted to let you know I read your message above.
