This document explains how to set up a Fly.io instance for personal use that runs ChaiNNer. Fly.io has several advantages, namely that
- you pay only for what you use, so it is very cheap, and if you spend less than $5 in a month, entirely free, and
- if your account has GPUs enabled (which you have to email support for), you get access to very powerful GPUs such as the L40S, which we will be using here.
Credit to the document "ChaiNNer with remote backend" for providing the basic instructions.
Requirements:
- A Fly.io account with GPUs enabled.
- A Unix system with socat, Unison and NPM installed.
Unix is only required for the shell scripts, so this could probably work on other OSes without much effort – you could, for instance, translate the scripts to Python.
Note that some package repositories – such as Ubuntu's – ship a gutted version of Unison that lacks the features these instructions require (in particular, the unison-fsmonitor binary), so installing it yourself from the official releases, rather than from your package manager, is recommended.
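For example, on a Linux machine you can grab the static binaries directly from the Unison releases on GitHub; a minimal sketch (the version number mirrors the one used in the Dockerfile below and may need updating):

curl -Lo unison.tar.gz https://github.com/bcpierce00/unison/releases/download/v2.53.7/unison-2.53.7-ubuntu-x86_64-static.tar.gz
tar -xzf unison.tar.gz
# this installs both the unison and unison-fsmonitor binaries
sudo cp ./bin/* /usr/local/bin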
Make a new directory for the server. Make sure it has a stable path.
Start by making a fly.toml
config file for our app:
app = '{YOUR USERNAME}-chainner'
primary_region = 'ord'

[[services]]
  internal_port = 1234
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [[services.ports]]
    handlers = []
    port = 1234

[[services]]
  internal_port = 8000
  protocol = "tcp"
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0

  [[services.ports]]
    handlers = []
    port = 8000

[[vm]]
  size = 'l40s'
Replace {YOUR USERNAME}
with your username.
Actually, you can call the app anything you like,
but be aware that app names are global.
We choose ord
as the primary region,
as it is the only region with l40s
support.
We register two services on ports 1234 and 8000
for Unison and ChaiNNer respectively.
Technically, the entire
services
section is unnecessary; it is only needed if you want to connect over Flycast instead of directly. The main advantage of using Flycast is that Fly.io will automatically shut down your machines when not in use, ensuring that you won’t accidentally rack up a huge bill.
If there is a set of models that you expect to use frequently,
it makes sense to put them directly in the app's image.
So make a subdirectory – here we call it Models
–
and put whatever files you like in there.
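For example (the model filename here is just a placeholder):

mkdir Models
cp ~/Downloads/4x-UltraSharp.pth Models/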
Next is the all-important Dockerfile.
FROM debian:bookworm-slim

ENV DEBIAN_FRONTEND=noninteractive
RUN rm -f /etc/apt/apt.conf.d/docker-clean; \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends git curl ca-certificates libglib2.0

RUN git clone https://github.com/chaiNNer-org/chaiNNer && \
    cd chaiNNer && \
    git checkout 4f056759bda3ef1bb0c66c88d033970667cf6947

# Manually install Unison (to sync folders) from the official release because
# Unison in the APT repository does not contain 'unison-fsmonitor'.
ARG UNISON=2.53.7
RUN mkdir /tmp/unison && cd /tmp/unison \
    && curl -Lo unison.tar.gz https://github.com/bcpierce00/unison/releases/download/v$UNISON/unison-$UNISON-ubuntu-x86_64-static.tar.gz \
    && tar -xzvf unison.tar.gz \
    && cp ./bin/* /usr/local/bin \
    && rm -rf *

COPY --from=ghcr.io/astral-sh/uv:0.5.27 /uv /uvx /bin/
ENV UV_LINK_MODE=copy

COPY /chainner.diff /
RUN --mount=type=cache,target=/root/.cache/uv cd chaiNNer && \
    git apply /chainner.diff && \
    uv venv --python 3.11 && \
    uv pip install setuptools && \
    uv pip install -r requirements.txt && \
    uv run backend/src/run.py --install-builtin-packages --close-after-start

RUN mkdir -p /path/to/your/server/directory/Models
COPY Models /path/to/your/server/directory/Models

COPY /init.sh /
ENTRYPOINT ["/init.sh"]
The Dockerfile
- downloads some basic dependencies needed in the Dockerfile itself, as well as libglib2.0, which is depended on by one of ChaiNNer's Python dependencies;
- clones the ChaiNNer repo itself, checks it out to the latest commit at the time of writing, and applies the patch given in this Gist;
- installs Unison and uv (uv isn't strictly necessary, but it is much, much faster than Pip, so I use it here);
- installs all the basic packages needed by the server (things like PyTorch, NCNN, etc. – this step will take a while); and
- copies your common models into the Models directory.
Make sure to replace /path/to/your/server/directory
with its actual path.
The patch applied to ChaiNNer does several things:
- Pip is replaced with uv (as part of this, various other changes are made, such as the deletion of pyproject.toml and the renaming of spandrel_extra_arches to its “real” PyPI name, spandrel-extra-arches);
- typing_extensions is updated, to work around a bug;
- opencv-python is replaced with opencv-python-headless, which avoids the need for several dependencies;
- a step is hacked into the server to uninstall opencv-python (which is implicitly installed as it is depended on by ncnn-vulkan and facexlib) and replace it fully with opencv-python-headless;
- most of the dependency installation code is replaced with simpler versions, since we don't get progress bars anyway with Docker – this is not strictly necessary, but it makes things much easier to debug;
- the code that auto-detects whether an NVIDIA graphics card is present is replaced with code that unconditionally uses CUDA (see the sketch after this list), which is necessary as the Fly.io builder servers do not have graphics cards, and so the Dockerfile would otherwise erroneously install CPU-only PyTorch;
- PyTorch is upgraded from 2.1.2 to 2.6.0, as 2.1.2 supports a maximum CUDA version of 12.1, while the L40S requires at least CUDA 12.2; and
- the HTTP server is modified to listen using dual-stack IPv6 instead of only IPv4, which enables connecting to the server directly if you want (and not just over Flycast).
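As an aside, forcing the CUDA build of PyTorch essentially means installing it from PyTorch's CUDA wheel index instead of relying on auto-detection; a minimal sketch of the idea (the index URL and CUDA version below are assumptions, not taken from the patch):

uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124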
The init.sh
file is the entrypoint of the image,
and is rather minimal:
#!/bin/sh
unison -socket 1234 &
cd chaiNNer
uv run backend/src/run.py 8000
We just start the Unison server listening on port 1234,
and also start the ChaiNNer backend itself.
After installing flyctl
and logging in to your Fly.io account,
use fly launch --flycast --copy-config
to create the app.
--flycast
configures the app for accessing it over Flycast,
and its inclusion is equivalent to running fly ips allocate-v6 --private
after the fact.
--copy-config
is needed because the fly.toml
has already been made.
You might now have to run fly deploy
to get the app actually up and running.
The command may decide to create two Machines for you,
in which case you can avoid paying for both by destroying one of them –
fly machine list
will show their IDs,
and fly machine destroy {ID}
will get rid of it.
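Putting those steps together, the whole launch sequence looks roughly like this (the machine ID is a placeholder):

fly launch --flycast --copy-config
fly deploy
fly machine list
fly machine destroy {ID OF THE EXTRA MACHINE}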
You’ll want to connect to your Fly.io VPN now, so you can actually access the apps. They’re not exposed to the public by default (and for good reason).
Find some directory and clone the ChaiNNer repository in there.
In this example, we assume it is present at ~/.local/src/chaiNNer
.
Put the following shell script somewhere in your $PATH
,
for example at ~/.local/bin/chainner-client
.
#!/bin/sh
set -eu

app_name={YOUR USERNAME}-chainner
machine_id={YOUR MACHINE ID}
remote_host={YOUR FLYCAST IPV6 ADDRESS}

trap "exit" INT
all_traps=
add_trap() {
    all_traps="$1;$all_traps"
    trap "$all_traps" EXIT
}

# set to `false` for debugging (see Debugging section)
if true; then
    host="[$remote_host]"

    socat TCP-LISTEN:8000,fork,reuseaddr "TCP:$host:8000" &
    proxy=$!
    add_trap "kill $proxy"

    fly machines -a $app_name start $machine_id
    add_trap "fly machines -a $app_name stop $machine_id"

    fly ssh console -a $app_name -C "sh -c \"rm -rf '$PWD' && mkdir -p '$PWD'\""
else
    host='127.0.0.1'
    docker exec chainner sh -c "rm -rf '$PWD' && mkdir -p '$PWD'"
fi

unison . socket://$host:1234$PWD -batch -auto -repeat watch -ignore 'Name *.kra' -ignorearchives &
unison=$!
add_trap "kill -s INT $unison"

(cd ~/.local/src/chaiNNer && npm run frontend || true)
Set app_name
to the same app name as above;
set machine_id
to the ID of your machine,
which can be obtained by running fly machine list
in the server directory
(or, more directly, by fly machine list -a $app_name | grep l40s | awk '{ print $1 }'
);
set remote_host
to the Flycast IP address,
which can be obtained by running fly ips list
in the server directory
(or, more directly, by fly ips list -a $app_name | grep v6 | awk '{ print $2 }'
).
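If you prefer, you can have the script fill in the last two variables itself by using those same one-liners:

app_name={YOUR USERNAME}-chainner
machine_id=$(fly machine list -a $app_name | grep l40s | awk '{ print $1 }')
remote_host=$(fly ips list -a $app_name | grep v6 | awk '{ print $2 }')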
Then go into any directory containing the images you want to process
and run the script.
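For instance (the directory name is just an example):

cd ~/Pictures/upscaling-project
chainner-client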
After some waiting while things get set up,
you should be able to use ChaiNNer as normal,
with one caveat: you cannot read or write images
outside of the directory in which you ran the script,
so make sure you stick to that directory.
You also have read-only access to the models in the Models
directory.
The script uses Socat to establish a TCP proxy between your local port 8000,
which ChaiNNer expects to connect to the server on,
and the remote port 8000.
ChaiNNer’s frontend does expose an (unstable) --remote-host
CLI option
to which a URL like http://$host
could be directly passed,
but I could never get this to work –
perhaps due to the use of IPv6.
The script uses Unison to synchronize the contents of the directory you ran the script in
with its newly-made equivalent on the remote machine.
In this example, we pass in -ignore 'Name *.kra'
to avoid transferring all Krita files to the server;
adjust this option to your liking,
and read through the Unison manual for information about the syntax.
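For illustration, here is the same invocation with a few more patterns of the kind Unison accepts (the Path and BelowPath targets are made-up examples, not part of the original script):

unison . socket://$host:1234$PWD -batch -auto -repeat watch \
    -ignore 'Name *.kra' \
    -ignore 'Path scratch' \
    -ignore 'BelowPath cache' \
    -ignorearchives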
The -repeat watch
option ensures that synchronization is continuous;
thus output files from the remote will immediately appear on your machine.
-ignorearchives
ensures that Unison doesn’t assume that the server is persistent,
since in our case its filesystem is ephemeral.
In theory, one should be able to set remote_host=$app_name.flycast
,
but I found that it does not work in practice,
and I’m not sure if it’s a Fly.io bug.
To bypass Flycast and use a direct connection
one can set remote_host
to the IP address of the machine,
which is shown in the fly machine list
table
(again, one can in theory use remote_host=$app_name.internal
for this purpose
but I found that unreliable).
Also, in theory, one does not need to explicitly start the machine as I do here,
as Flycast should handle that,
but I start it anyway.
I recommend placing the following script in the server directory for running and debugging the setup locally (requires Docker):
#!/bin/sh
set -eu
cd "$(dirname "$0")"
git --work-tree ~/.local/src/chaiNNer --git-dir ~/.local/src/chaiNNer/.git diff > chainner.diff
docker container kill chainner || true
docker container rm chainner || true
trap exit INT
trap "docker container kill chainner && docker container rm chainner" EXIT
docker buildx build -t chainner . --progress=plain
docker run --name=chainner -p 127.0.0.1:1234:1234 -p 127.0.0.1:8000:8000 chainner
Edit the client script above to use if false
instead of if true
to set it up for Docker.
You might notice that the server is quite slow to start,
and if you check the logs,
it gets stuck at the [Worker] Loading Nodes...
phase.
I don’t know the internals of ChaiNNer,
so I don’t really know what’s happening there.
The server also complains that ONNX Runtime (GPU) is missing.
I’m not sure what’s up with that.
There is an error [Worker] vkCreateInstance failed -9
that appears in the logs, so maybe it’s related.