@gabrielhuang
Created March 8, 2023 18:27
GROBID dockerfile
FROM grobid/grobid:0.7.2
# Install a JDK, needed for the Gradle build below, then clean up apt caches.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-8-jdk && \
    apt-get autoremove -y --purge && \
    apt-get clean -y && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* && \
    rm -f /etc/legal /etc/motd
# Fetch the GROBID 0.7.2 source archive and unpack it to /grobid.
WORKDIR /
RUN wget https://github.com/kermitt2/grobid/archive/0.7.2.zip && \
    unzip 0.7.2.zip && \
    mv grobid-0.7.2 grobid && \
    rm 0.7.2.zip
# Build GROBID from source; this produces the grobid-core one-jar used by CMD.
WORKDIR /grobid
RUN ./gradlew clean install
# Point GROBID's temp directory at /tmp/grobid-internal instead of the relative "tmp".
RUN sed -i 's|temp: "tmp"|temp: "/tmp/grobid-internal"|g' /grobid/grobid-home/config/grobid.yaml
# Default batch input and output directories; override with `docker run -e` if needed.
ENV IN_DIR=/data/in
ENV OUT_DIR=/data/out
# Run GROBID in batch mode: process every PDF in IN_DIR and write the results to OUT_DIR.
CMD ["sh", "-c", "java -Xmx4G -jar grobid-core/build/libs/grobid-core-0.7.2-onejar.jar -gH grobid-home -ignoreAssets -dIn ${IN_DIR} -dOut ${OUT_DIR} -exe processFullText"]
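The Dockerfile reads PDFs from /data/in and writes results to /data/out, so using it presumably means bind-mounting two host directories at those paths. The helper functions below are a minimal sketch of that workflow, not part of the original gist: the image tag `grobid-batch` and the function names `build_image` and `run_batch` are hypothetical, and the host directories are whatever you pass in.

```shell
#!/bin/sh
# Hypothetical image tag; any name works.
IMAGE="grobid-batch"

# Build the image. $1 = directory containing the Dockerfile (defaults to ".").
build_image() {
    docker build -t "$IMAGE" "${1:-.}"
}

# Run one batch job.
#   $1 = absolute host directory holding the input PDFs
#   $2 = absolute host directory that receives the output
# The container paths match the IN_DIR / OUT_DIR defaults in the Dockerfile.
run_batch() {
    docker run --rm \
        -v "$1":/data/in \
        -v "$2":/data/out \
        "$IMAGE"
}
```

For example, `build_image . && run_batch "$PWD/pdfs" "$PWD/tei"` would process everything under `./pdfs`. The host paths need to be absolute, since `docker run -v` treats a bare name as a named volume rather than a bind mount. Because the CMD expands `${IN_DIR}` and `${OUT_DIR}` at container start, different container paths can be selected with `-e IN_DIR=... -e OUT_DIR=...` as long as the mounts are changed to match.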