Recently, I was building out a set of Python packages and services and needed to find a way to pull down packages from an Azure Artifacts feed into a Docker image. It was straightforward to use the tasks to package an artifact, authenticate to the feed, and publish.
I had to do a bit more digging to piece together a flow I was comfortable with for building container images. This post describes some of the challenges involved and how I solved them.
The PipAuthenticate task is great - it authenticates with your artifacts feed and, per the docs, will store the location of a config file that can be used to connect in the PYPIRC_PATH environment variable.
That said - by design, containers run in an isolated environment, so we can't directly access that config file on the host while building a container image. We need a way to get the config inside the build phase so that our calls to python -m pip install are successful. You are using a virtual environment & python -m pip install to install packages, right?
Docker doesn't currently support* mounting volumes at build time. So we can't just mount our PYPIRC_PATH file from the Azure Pipelines host into the build.
It would be much easier to pass a string as a --build-arg to Docker and then consume it. Azure Pipelines tasks are open source on GitHub, so I thought I'd take a look to see how the task worked and possibly extend it. It turns out that the PipAuthenticate task has some undocumented behavior (er, bonus features) and it already does what I want! It populates the PIP_EXTRA_INDEX_URL environment variable, which is automatically picked up by pip.
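As a rough sketch of that hand-off, a script step that runs after PipAuthenticate could assemble the docker build call like this. It relies on Docker's documented behavior that --build-arg NAME with no value copies NAME from the calling environment, so the token-bearing URL never shows up on the command line. The function name, the image tag, and the idea of driving the build from Python are assumptions for illustration, not something the task provides.

```python
import os


def docker_build_command(tag, context=".", env=None):
    """Return the argv for a docker build that forwards PIP_EXTRA_INDEX_URL."""
    env = os.environ if env is None else env
    if not env.get("PIP_EXTRA_INDEX_URL"):
        # PipAuthenticate is expected to have set this earlier in the job.
        raise RuntimeError("PIP_EXTRA_INDEX_URL is not set; did PipAuthenticate run?")
    # "--build-arg NAME" without "=value" tells Docker to read NAME from the
    # environment of the docker CLI, keeping the secret out of argv.
    return ["docker", "build", "--build-arg", "PIP_EXTRA_INDEX_URL", "--tag", tag, context]


# Hypothetical usage on a build agent with Docker installed:
#   import subprocess
#   subprocess.run(docker_build_command("my-service:latest"), check=True)
```

The Dockerfile still needs a matching ARG PIP_EXTRA_INDEX_URL line in the stage that installs packages, otherwise the build arg is ignored.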
*Well, sort of! You can solve this with --mount=type=secret when you enable BuildKit. If this were a personal project, I'd have stopped there and said #shipit! In this case, I was really looking for something that works for all users and isn't explicitly marked "experimental".
Great! We pass in our build arg, set ENV PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL and call it a day, right? Right...?
Not so fast - we want to have PIP_EXTRA_INDEX_URL available when we pull packages, but we don't want secret environment variables baked into any of the layers of a runtime image. So we'll combine what we've learned so far with a multi-stage build and we're off to the races!
In my real container build, I needed to install gcc, musl-dev, python3-dev and a bunch of other things to pull down my dependencies & build wheels - so a multi-stage build drops my final image size from >1GB down to ~100MB anyway.
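One way to convince yourself that the secret stayed behind in the build stage is to scan the final image's metadata for the variable name. The helper below is a minimal sketch, not part of my pipeline: it assumes you have already collected the environment list (for example from docker inspect --format '{{json .Config.Env}}' IMAGE) and the layer commands (for example from docker history --no-trunc --format '{{.CreatedBy}}' IMAGE). The function name and the report format are made up for illustration.

```python
def find_secret_leaks(env_entries, history_lines, secret_name="PIP_EXTRA_INDEX_URL"):
    """Return the positions in the image metadata that mention secret_name."""
    leaks = []
    for index, entry in enumerate(env_entries):
        # Config.Env entries look like "NAME=value"; compare the name only.
        if entry.split("=", 1)[0] == secret_name:
            leaks.append(f"env entry {index}")
    for index, line in enumerate(history_lines):
        # Each history line holds the instruction that created a layer.
        if secret_name in line:
            leaks.append(f"history line {index}")
    # Positions are reported instead of contents so the secret is not echoed.
    return leaks
```

An empty list for the runtime image is what you want to see; any hit means the variable was declared or used in the final stage rather than only in the builder stage.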
I've attached a few sample files that I pulled from my working pipeline to get you started with this approach. I hope this helps, and I plan for this post to be obsolete soon, once I complete a few pull requests into the Microsoft docs! :)
You can also form the extra index URL manually using a personal access token (PAT); see here.
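For local experiments, a manual URL can be put together along these lines. This is only a sketch: the URL layout assumes a project-scoped feed on pkgs.dev.azure.com with the feed name as the user name, and the function and parameter names are hypothetical, so check the result against the linked docs before relying on it.

```python
from urllib.parse import quote


def build_extra_index_url(organization, project, feed, pat):
    """Build a pip extra index URL that embeds a PAT (assumed URL layout)."""
    # Percent-encode the token so reserved characters cannot break the URL.
    token = quote(pat, safe="")
    return (
        f"https://{feed}:{token}@pkgs.dev.azure.com/"
        f"{organization}/{project}/_packaging/{feed}/pypi/simple/"
    )


# Hypothetical usage: export the result as PIP_EXTRA_INDEX_URL before
# running python -m pip install, rather than writing it to a file.
```

Treat the resulting string as a secret, exactly like the value PipAuthenticate generates.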