There are many situations where we are inclined to produce multiple variants of the same package, with each variant depending on a different set of low-level dependencies. For instance:
- A numerical package might rely on the Basic Linear Algebra Subprograms (BLAS). There are a variety of BLAS implementations we might wish to support, including MKL, OpenBLAS, ACML, Accelerate, and ATLAS.
- We might wish to compile Python with different compilers that are not link-compatible with each other; in that case, every package compiled against the CPython API must be recompiled for each compiler.

The existence of these multiple variants can pose a problem for users: how do they make sure that all of the packages in their environment are compatible with each other? That is, how do we ensure that all packages that rely on BLAS use the same BLAS variant? How do we ensure that all packages that depend on CPython use the same ABI?
If there are only two variants, then conda's `features`/`track_features` facility provides a solution. For instance, if a user installs the `nomkl` metapackage, it turns on the `nomkl` feature, which causes all packages that link to BLAS to select an OpenBLAS variant instead of an MKL variant. Unfortunately, features (or perhaps our deployment of them) have proven to be a bit fragile, and they are necessarily limited to two variants.
To address this problem, we propose to formalize an approach that relies on conda's natural dependency resolution facilities. As you might have guessed, we are calling this approach variants.
To construct a set of variants, we begin by collecting the following information:

- A name for the variant class; e.g., `blas`
- Names for the variant instances; e.g., `mkl`, `openblas`, `accelerate`, `atlas`

These names must be compatible with Windows and Unix filename conventions, and cannot contain dash (`-`) characters (underscores are fine), because conda uses dashes to separate the name, version, and build string in a package filename. Armed with this information, we proceed to build a set of packages, one for each variant instance, as follows:
- Package name: the variant class; e.g., `blas`
- Build string: the variant instance; e.g., `mkl`
- Version number: 1 for the preferred instance; 0 for all others
- Build number: 0, identical across all instances
- Dependencies: none
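The recipe above can be sketched as a small Python helper. This is a hypothetical illustration, not part of conda or conda-build: `variant_records` and `package_filename` are made-up names, and the dictionaries only mirror the fields listed above.

```python
def variant_records(variant_class, instances, preferred):
    """Build one metadata record per variant instance (hypothetical helper)."""
    if preferred not in instances:
        raise ValueError(f"preferred instance {preferred!r} not in {instances!r}")
    for name in [variant_class, *instances]:
        # A dash would be mistaken for a name-version-build separator.
        if "-" in name:
            raise ValueError(f"variant names may not contain '-': {name!r}")
    return [
        {
            "name": variant_class,  # package name = variant class
            "version": "1" if instance == preferred else "0",
            "build": instance,      # build string = variant instance
            "build_number": 0,      # identical across all instances
            "depends": [],          # no dependencies
        }
        for instance in instances
    ]


def package_filename(record):
    """Standard conda naming convention: name-version-build.tar.bz2."""
    return f"{record['name']}-{record['version']}-{record['build']}.tar.bz2"


records = variant_records("blas", ["mkl", "openblas", "accelerate", "atlas"], preferred="mkl")
for record in records:
    print(package_filename(record))
```

Run as-is, it prints `blas-1-mkl.tar.bz2` followed by `blas-0-openblas.tar.bz2`, `blas-0-accelerate.tar.bz2`, and `blas-0-atlas.tar.bz2`.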
The specific values 0 and 1 are not important in themselves. What matters is that exactly one variant instance has the highest version number, and that all other version and build numbers are identical; this is how the preference is communicated to conda. In theory, you could also express a preference hierarchy using version numbers 2, 3, etc.
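A minimal sketch of why a unique top version matters, assuming the only rule in play is "prefer the highest version" (conda's real solver weighs much more than this). `preferred_instance` is a hypothetical helper, and the 2/1/0 versions illustrate the preference hierarchy mentioned above.

```python
def preferred_instance(records):
    """Return the build string (variant instance) with the highest version.

    A simplified stand-in for the solver's preference for newer versions.
    Raises ValueError when the top version is shared, since the choice
    would then be ambiguous.
    """
    top = max(int(r["version"]) for r in records)
    winners = [r["build"] for r in records if int(r["version"]) == top]
    if len(winners) != 1:
        raise ValueError(f"ambiguous preference among {winners}")
    return winners[0]


# A three-level hierarchy: mkl, then openblas, then (tied) atlas/accelerate.
blas = [
    {"name": "blas", "version": "2", "build": "mkl"},
    {"name": "blas", "version": "1", "build": "openblas"},
    {"name": "blas", "version": "0", "build": "atlas"},
    {"name": "blas", "version": "0", "build": "accelerate"},
]

print(preferred_instance(blas))  # mkl
# If the mkl variant cannot be used, openblas is the next choice.
print(preferred_instance([r for r in blas if r["build"] != "mkl"]))  # openblas
```

The tie between `atlas` and `accelerate` is harmless until both higher levels are ruled out; at that point the sketch raises instead of guessing.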
As a result of this build process, we will obtain a set of files with names of the form `name-0-instance.tar.bz2` or `name-1-instance.tar.bz2`, assuming that the standard naming convention is employed. For instance, for BLAS, we might have the following filenames:

```
blas-1-mkl.tar.bz2
blas-0-openblas.tar.bz2
blas-0-accelerate.tar.bz2
blas-0-atlas.tar.bz2
```
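To see why instance names cannot contain dashes, here is a minimal sketch that splits a filename back into its fields, assuming the `name-version-build.tar.bz2` convention. `parse_filename` is a hypothetical helper; splitting from the right lets the package name contain dashes, but not the version or build string.

```python
SUFFIX = ".tar.bz2"


def parse_filename(filename):
    """Split 'name-version-build.tar.bz2' into (name, version, build).

    Splitting on the last two dashes allows dashes in the package name,
    but not in the version or build string.
    """
    if not filename.endswith(SUFFIX):
        raise ValueError(f"expected a {SUFFIX} file: {filename!r}")
    stem = filename[: -len(SUFFIX)]
    name, version, build = stem.rsplit("-", 2)
    return name, version, build


print(parse_filename("blas-1-mkl.tar.bz2"))        # ('blas', '1', 'mkl')
# A dash in the instance name shifts every field:
print(parse_filename("blas-0-open-blas.tar.bz2"))  # ('blas-0', 'open', 'blas')
```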
Once the variants have been built, we can build packages that rely on them. To do so, we simply include the appropriate variant metapackage as a dependency. For instance, the MKL version of a package might have this in its dependency list:

```yaml
depends:
- mkl
- blas * mkl
```

Note the use of the wildcard for the version number. This lets you build these packages without knowing which variant is preferred. In fact, you can even change the preferences after the fact without having to rebuild these packages.
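A minimal sketch of why the wildcard makes the dependency robust to a change in preference, assuming glob-style matching of the space-separated `name version build` form. `matches` is a hypothetical helper; real conda match specs support more syntax (for example `>=`, `<`, `|`) than this handles.

```python
from fnmatch import fnmatchcase


def matches(spec, record):
    """Check a record against a space-separated spec 'name [version [build]]'.

    Only glob patterns are handled here; real conda match specs also
    support operators such as >=, <, and |.
    """
    fields = spec.split()
    name = fields[0]
    version = fields[1] if len(fields) > 1 else "*"
    build = fields[2] if len(fields) > 2 else "*"
    return (
        record["name"] == name
        and fnmatchcase(record["version"], version)
        and fnmatchcase(record["build"], build)
    )


mkl_preferred = {"name": "blas", "version": "1", "build": "mkl"}
mkl_demoted = {"name": "blas", "version": "0", "build": "mkl"}
openblas = {"name": "blas", "version": "1", "build": "openblas"}

# The wildcard version means the spec survives a change in preference.
print(matches("blas * mkl", mkl_preferred))  # True
print(matches("blas * mkl", mkl_demoted))    # True
print(matches("blas * mkl", openblas))       # False
```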
One might be tempted to simplify this process by including `mkl` as a dependency of `blas-1-mkl`, `openblas` as a dependency of `blas-0-openblas`, and so forth. In some cases this should work just fine, but I would recommend this approach only if that dependency can be made completely version-free. In other words, don't make `blas-1-mkl` depend on `mkl 12.1.*`; just make it depend on `mkl`.
It is very important to avoid having to update these metapackages whenever new versions of their underlying dependencies are released. If a particular package does require a specific version of MKL, that requirement can still be specified alongside the variant metapackage; e.g.,

```yaml
depends:
- mkl >=12.1,<13
- blas * mkl
```
Having said this, in some cases a variant will naturally be tied to particular versions. For instance, suppose we used a variant approach to differentiate between incompatible C++ ABIs. In this case, the individual variant instances might be drawn from a matrix of different C++ compilers and versions; e.g., `cppabi-*-gcc5`, `cppabi-*-icc4`, etc. (These are simply examples; I have no specific knowledge of C++ ABI issues.) In this case, it would be desirable for the variant metapackages to include version specifications in their dependencies.
Now that the variants are in place, a user can begin taking advantage of them without even knowing they are present. Suppose, for instance, that NumPy and SciPy have been built against multiple BLAS variants. Then running

```shell
conda create -n newenv python=2.7 numpy scipy
```

will automatically install `blas-1-mkl.tar.bz2` and ensure that the `mkl` variants of both NumPy and SciPy are selected.
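The effect can be imitated with a toy model. conda's actual solver is SAT-based; this brute-force sketch only mimics the two constraints discussed here: every build depends on `blas * <instance>`, so all packages must share one instance, and among feasible instances the highest metapackage version wins. `BLAS_VERSIONS`, `BUILDS`, and `resolve` are hypothetical.

```python
# Hypothetical repository contents: variant instance -> metapackage version,
# and the set of (package, instance) builds that exist.
BLAS_VERSIONS = {"mkl": 1, "openblas": 0, "atlas": 0}
BUILDS = {
    ("numpy", "mkl"), ("numpy", "openblas"), ("numpy", "atlas"),
    ("scipy", "mkl"), ("scipy", "openblas"), ("scipy", "atlas"),
}


def resolve(requested, blas_instance=None):
    """Choose one blas instance plus a matching build of each requested package.

    Every build depends on 'blas * <instance>', so all packages must share
    one instance. Among the feasible instances, prefer the highest
    metapackage version, unless the user asked for a specific instance.
    """
    feasible = [
        inst for inst in BLAS_VERSIONS
        if blas_instance in (None, inst)
        and all((pkg, inst) in BUILDS for pkg in requested)
    ]
    if not feasible:
        raise RuntimeError("no consistent blas variant")
    best = max(feasible, key=BLAS_VERSIONS.__getitem__)
    return {**{pkg: best for pkg in requested}, "blas": best}


print(resolve(["numpy", "scipy"]))
# {'numpy': 'mkl', 'scipy': 'mkl', 'blas': 'mkl'}
print(resolve(["numpy", "scipy"], blas_instance="openblas"))
# {'numpy': 'openblas', 'scipy': 'openblas', 'blas': 'openblas'}
```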
If the user wishes to specify a particular variant, they can do this:

```shell
conda create -n newenv python=2.7 numpy scipy "blas=*=openblas"
```

Note the use of the wildcard to specify the version number. This will create the same environment as before, but with the `openblas` variant. To change variants, the user can simply install a new variant package; for instance,

```shell
conda install "blas=*=atlas"
```

will force NumPy and SciPy to be updated to their ATLAS variants.
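The wildcard is needed because the `name=version=build` form is positional. A minimal sketch, assuming only this three-field form: `parse_eq_spec` is a hypothetical helper that deliberately rejects the comparison operators real conda specs also accept.

```python
def parse_eq_spec(spec):
    """Split a 'name=version=build' spec into three fields.

    Missing trailing fields default to '*'. This simplified parser
    rejects '==', '>=' and other operators that real conda specs accept.
    """
    fields = spec.split("=")
    if len(fields) > 3 or "" in fields or any(c in spec for c in "<>!"):
        raise ValueError(f"unsupported spec: {spec!r}")
    fields += ["*"] * (3 - len(fields))
    return tuple(fields)


print(parse_eq_spec("blas=*=openblas"))  # ('blas', '*', 'openblas')
# Without the wildcard, the instance name would land in the version slot:
print(parse_eq_spec("blas=openblas"))    # ('blas', 'openblas', '*')
```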
Using the version number to specify the "preferred" or "default" variant introduces a problem with `conda update --all`. When this command is run, conda will select the instance of the variant class with the highest version number, switching the user to the preferred variant instance whether or not they asked for it.
Unfortunately, giving all of the variant metapackages the same version number eliminates our ability to specify one as the default, and it still runs into problems with `conda update --all`. Under this scenario, conda will see a tie across all of the variants and will break that tie in an undefined manner. Initial installs will be unpredictable unless the variant is explicitly specified.
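Both failure modes can be reproduced with a toy model of `conda update --all` that simply moves every installed package to its highest-version record; it ignores everything else conda's solver does. `update_all` is a hypothetical helper, and records are `(version, build)` pairs.

```python
def update_all(installed, available):
    """Naively move every installed package to its highest-version record.

    installed: name -> (version, build); available: name -> list of records.
    Python's max() returns the first maximal item, so ties are settled
    purely by list order, which is arbitrary from the user's point of view.
    """
    return {
        name: max(available[name], key=lambda record: int(record[0]))
        for name in installed
    }


# Preference encoded in the version: the openblas user is switched to mkl.
ranked = {"blas": [("1", "mkl"), ("0", "openblas"), ("0", "atlas")]}
print(update_all({"blas": ("0", "openblas")}, ranked))  # {'blas': ('1', 'mkl')}

# All versions equal: the mkl user is switched to whichever record is listed first.
tied = {"blas": [("0", "openblas"), ("0", "mkl"), ("0", "atlas")]}
print(update_all({"blas": ("0", "mkl")}, tied))  # {'blas': ('0', 'openblas')}
```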
So it is clear that we will need an improvement to the conda solver to achieve the full behavior we seek. I propose a simple modification: when `conda update --all` is specified, variant metapackages are not included in the list of packages to be updated. This will require some formal way to tell the solver that a package should be excluded from `conda update --all`.
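A minimal sketch of the proposed behavior in the same toy style. The proposal leaves open how a variant metapackage is marked, so `VARIANT_CLASSES` is a hypothetical stand-in for that marker; `update_all` and `version_key` are also illustrative, and each non-variant build string here is simply the blas instance it was built against.

```python
# Hypothetical marker: the proposal leaves open how a package is flagged
# as a variant metapackage, so a plain set stands in for it here.
VARIANT_CLASSES = {"blas"}


def version_key(version):
    return tuple(int(part) for part in version.split("."))


def update_all(installed, available, variant_classes=VARIANT_CLASSES):
    """Toy 'update --all' that leaves variant metapackages alone.

    installed / available map package names to (version, build) records.
    In this toy model a non-variant package's build string is simply the
    blas instance it was built against.
    """
    blas_instance = installed["blas"][1]
    updated = {}
    for name, current in installed.items():
        if name in variant_classes:
            updated[name] = current  # keep the user's chosen instance
            continue
        compatible = [r for r in available[name] if r[1] == blas_instance]
        updated[name] = max(compatible, key=lambda r: version_key(r[0]))
    return updated


available = {
    "blas": [("1", "mkl"), ("0", "openblas")],
    "numpy": [("1.10.4", "mkl"), ("1.10.4", "openblas"),
              ("1.11.0", "mkl"), ("1.11.0", "openblas")],
}
installed = {"blas": ("0", "openblas"), "numpy": ("1.10.4", "openblas")}
print(update_all(installed, available))
# {'blas': ('0', 'openblas'), 'numpy': ('1.11.0', 'openblas')}
```

NumPy still moves to its newest release, but within the variant the user chose.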
If someone wishes to use variant packages effectively with an older version of conda, they can pin the particular variant metapackage.
Consider again the following sequence of commands:

```shell
conda create -n newenv python=2.7 numpy scipy "blas=*=openblas"
conda install "blas=*=mkl"
```

The first command will install the OpenBLAS variants of NumPy and SciPy, which requires installing the `openblas` conda package. The second command will replace NumPy and SciPy with their MKL variants and install the `mkl` conda package.
The second command, however, does not remove OpenBLAS from the conda environment, even though it is no longer being used. This is a natural consequence of the way conda works, and is not necessarily a problem if `mkl` and `openblas` are properly designed. In fact, we might want both packages to be installed alongside each other. For instance, there might be an application outside of the Python ecosystem that depends on a different BLAS variant than the one we have specified for Python.
Nevertheless, it points to a potential improvement in conda: the ability to detect these "orphan" packages and, upon request (say, with a `conda clean` command), remove them from an environment. This can be accomplished by examining the install and remove history for a given environment and distinguishing between packages that were explicitly installed, those required as dependencies, and orphans.
I would add that, if preference is to be kept in this variant concept, there should be two version numbers. The reason is that the ordering may need to be adjusted, or other package metadata may need to be updated. In that case, it would be ideal to be able to bump the first version number so that everyone gets the latest working package, and to handle preference with the second one.
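A minimal sketch of the two-number scheme, assuming a version such as `2.1` is read as `(content_revision, preference)` and compared field by field (simple numeric versions order this way in conda as well). `version_key` and `preferred` are hypothetical helpers.

```python
def version_key(version):
    """Read 'content.preference' as a tuple so comparison goes field by field."""
    content, preference = (int(part) for part in version.split("."))
    return content, preference


def preferred(records):
    """Return the (version, instance) record with the highest version."""
    return max(records, key=lambda record: version_key(record[0]))


old = [("1.1", "mkl"), ("1.0", "openblas")]
# Bumping the first number for every instance ships new metadata to everyone
# while the second number still carries the preference.
print(preferred(old + [("2.1", "mkl"), ("2.0", "openblas")]))  # ('2.1', 'mkl')
# Caveat: bumping only one instance overrides the preference.
print(preferred(old + [("2.0", "openblas")]))  # ('2.0', 'openblas')
```

The caveat suggests that the first number would need to be bumped for all instances of a variant class together.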
As far as preference goes generally, the 0/1 scheme is a little too flat. That said, after a year of the `blas` package, I think conveying preference was a bad idea, and I would rather have none at all. Mechanisms such as channel priority have improved things significantly. Ultimately it is up to users to make these choices. In this case, the package would only need one version number.

Have you tested cases where a package exists for only one type of BLAS, but not the others? This case has become very common, as there are many packages in `conda-forge` built with `openblas` that simply don't exist for other variants (or in other channels). I don't expect any of the ideas presented here to cause problems for this use case, but it would be good to verify that. Happy to provide example packages if needed.
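As a first sanity check, the openblas-only case can be expressed in a toy model where the preference acts only as a tie-breaker among feasible instances. `BLAS_VERSIONS`, `BUILDS`, `feasible_instances`, `choose`, and the package `only_openblas_pkg` are all hypothetical; confirming the real behavior still requires running conda against actual packages.

```python
# Hypothetical repository: metapackage version per instance, and which
# (package, instance) builds exist. "only_openblas_pkg" stands in for a
# package that was only ever built against openblas.
BLAS_VERSIONS = {"mkl": 1, "openblas": 0}
BUILDS = {
    ("numpy", "mkl"), ("numpy", "openblas"),
    ("only_openblas_pkg", "openblas"),
}


def feasible_instances(requested):
    """Instances for which every requested package has a build."""
    return [inst for inst in BLAS_VERSIONS
            if all((pkg, inst) in BUILDS for pkg in requested)]


def choose(requested):
    """Highest-version feasible instance; None if nothing is consistent."""
    options = feasible_instances(requested)
    return max(options, key=BLAS_VERSIONS.__getitem__) if options else None


print(choose(["numpy"]))                       # mkl
print(choose(["numpy", "only_openblas_pkg"]))  # openblas
```

In this model, the openblas-only package simply pulls the whole environment onto `openblas`, despite `mkl` being preferred.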