For now we only need one Dockerfile, because the only difference between the two is `EXPOSE` and `CMD`. If we take the batch image as the default, we could launch the Prodigy server via `docker run -p 8080:8080 <image> python -m prodigy ctgov_titles tsr_worthy_dataset -F recipe.py`.

I'm getting an error when building the batch Dockerfile: an issue with the Prodigy wheel file. I don't know why, but I think the following is the best way to solve it:

- upgrade to Python 3.10-slim-buster
- since we are at it, upgrade to Prodigy 1.11.8
- include all Linux wheels in the Docker build process, just to be sure

At a later stage we could use Docker multi-stage builds to keep Docker images to a minimum.

Using `FROM python:3.10-slim-buster` will make the image smaller (see this article). The full 3.10 image is almost 1 GB; the slim version is around 165-200 MB.
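
The single-image idea above could look roughly like the following sketch: build once, run the default `CMD` for batch work, and override the command (plus publish port 8080) for the Prodigy server. The image tag `prodigy-ctgov`, the function names, and the `DOCKER` switch are assumptions made for illustration; the sketch also assumes that the image's default `CMD` is the batch job.

```shell
#!/bin/sh
# Hypothetical image tag (not from the discussion); substitute your own.
IMAGE="${IMAGE:-prodigy-ctgov}"
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

build_image() {
    # One image for both modes, built from the single Dockerfile
    # in the current directory.
    "$DOCKER" build -t "$IMAGE" .
}

run_batch() {
    # Batch mode: rely on the image's default CMD; no port is published.
    "$DOCKER" run --rm "$IMAGE"
}

run_server() {
    # Server mode: override CMD and publish Prodigy's port 8080 on the host.
    "$DOCKER" run --rm -p 8080:8080 "$IMAGE" \
        python -m prodigy ctgov_titles tsr_worthy_dataset -F recipe.py
}
```

Sourcing the file and calling `run_server` with `DOCKER=echo` set prints the `docker run` arguments without starting a container, which is a cheap way to check the command line before a real run.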
Dockerfile for spaCy Prodigy for a cloud setup using remote PostgreSQL, including changing the instructions file and overriding index.html - and the leanest I've got
FROM python:3.6-alpine
# Opted for alpine to get as lean a Docker image as possible
RUN apk add --no-cache openssl
ENV DOCKERIZE_VERSION=v0.6.1
RUN wget https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && tar -C /usr/local/bin -xzvf dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && rm dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz
# Python deps for alpine
RUN apk add --no-cache postgresql-libs && \
    apk add --no-cache --virtual .build-deps gcc musl-dev postgresql-dev g++
RUN mkdir -pv /prodigy /prodigy/src
WORKDIR /prodigy
# The Prodigy wheel file is something you get when you buy Prodigy; it's not a free package
COPY ./*.whl /prodigy/
COPY requirements.txt /prodigy/
# Install the Python deps, strip tests and bytecode, then keep only the
# shared libraries that are actually needed at runtime
RUN pip install -r requirements.txt --no-cache-dir \
    && find /usr/local \
        \( -type d -a -name test -o -name tests \) \
        -o \( -type f -a -name '*.pyc' -o -name '*.pyo' \) \
        -exec rm -rf '{}' + \
    && runDeps="$( \
        scanelf --needed --nobanner --recursive /usr/local \
            | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
            | sort -u \
            | xargs -r apk info --installed \
            | sort -u \
    )" \
    && apk add --virtual .rundeps $runDeps \
    && apk del .build-deps
ENV PRODIGY_HOME=/prodigy
# The template that dockerize uses to create the actual prodigy.json config from environment variables
COPY ./prodigy.json.tpl /prodigy/prodigy.json.tpl
# Preparing an instructions file
COPY ./instructions.txt /prodigy/instructions.txt
COPY *.sh /prodigy/
COPY src/* /prodigy/src/
# Comment out the next two lines if you don't override the index.html file
COPY static/index.html index.html
RUN PRODIGY_FILES=`python -c "import prodigy; print(prodigy.__file__.replace(\"__init__.py\",''))"`; cp index.html $PRODIGY_FILES/static/
# The actual entry point: compile the template, then run the launch script
CMD ["dockerize", "-template", "/prodigy/prodigy.json.tpl:/prodigy/prodigy.json", "./launch.sh"]
EXPOSE 8080
prodigy.json.tpl
{
  "batch_size": 5,
  "host": "0.0.0.0",
  "instructions": "/prodigy/instructions.txt",
  "hide_meta": true,
  "choice_auto_accept": true,
  "db": "postgresql",
  "db_settings": {
    "postgresql": {
      "host": "{{ .Env.DB_HOST }}",
      "dbname": "{{ .Env.DATABASE_NAME }}",
      "port": 5432,
      "user": "{{ .Env.DB_USERNAME }}",
      "password": "{{ .Env.DB_PASSWORD }}"
    }
  }
}
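
The template reads four variables from the container environment (`DB_HOST`, `DATABASE_NAME`, `DB_USERNAME`, `DB_PASSWORD`), which dockerize fills in when the container starts. Below is a minimal sketch of launching the container with those values. The image tag `prodigy-cloud`, the function name `run_prodigy`, and the `DOCKER` switch are assumptions for illustration, not part of the gist. The up-front checks are a precaution so that a missing variable fails on the host rather than producing an incomplete `prodigy.json`.

```shell
#!/bin/sh
# Hypothetical image tag (not from the gist); substitute your own.
IMAGE="${IMAGE:-prodigy-cloud}"
# Set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"

run_prodigy() {
    # Abort early if any variable referenced by prodigy.json.tpl is unset or empty.
    : "${DB_HOST:?DB_HOST must be set}"
    : "${DATABASE_NAME:?DATABASE_NAME must be set}"
    : "${DB_USERNAME:?DB_USERNAME must be set}"
    : "${DB_PASSWORD:?DB_PASSWORD must be set}"
    # Export so docker can see the values; `-e NAME` without a value forwards
    # the variable from the calling environment, which keeps the password off
    # the command line.
    export DB_HOST DATABASE_NAME DB_USERNAME DB_PASSWORD
    # No command is given, so the image's default CMD (dockerize + launch.sh) runs.
    "$DOCKER" run --rm -p 8080:8080 \
        -e DB_HOST -e DATABASE_NAME -e DB_USERNAME -e DB_PASSWORD \
        "$IMAGE"
}
```

The port is fixed at 5432 in the template, so a database listening elsewhere would need either a template change or an extra variable.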