@usrbinkat
Last active October 27, 2024 18:59
Pandoc Markdown to PDF

Pandoc Docker Container

A Docker container for converting Markdown files to high-quality PDFs using Pandoc and XeLaTeX.

Features

  • Easy Conversion: Transform Markdown files into professional PDFs effortlessly.
  • High-Quality Output: Leverages XeLaTeX and custom fonts for superior typography and Unicode support.
  • Fully Featured: Pre-installed with Pandoc, extensive LaTeX packages, and fonts for comprehensive PDF generation.
  • Customizable: Modify the Dockerfile to suit your specific needs or extend functionality.
  • Pipeline Ready: Ideal for integration into CI/CD pipelines or automated documentation workflows.

Getting Started

Prerequisites

  • Docker: Ensure Docker is installed on your system (see "Get Docker" in the Docker documentation).

Installation

Pull the Docker image from Docker Hub:

docker pull containercraft/pandoc

Or build the image locally using the provided Dockerfile:

docker build --progress plain --tag containercraft/pandoc -f Dockerfile .

Usage

Simple Conversion

To convert a Markdown file (my_document.md) to PDF:

docker run --rm -v "$(pwd)":/convert containercraft/pandoc my_document.md

  • --rm: Automatically removes the container after execution.
  • -v "$(pwd)":/convert: Mounts the current directory into the container at /convert, the image's working directory. The quotes keep paths that contain spaces intact.
  • my_document.md: The Markdown file to convert.

The generated PDF (my_document.pdf) will be saved in your current directory.
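
The invocation above can be wrapped in a small shell function so that files outside the current directory are handled as well. This is a minimal sketch, assuming the image behaves as described in this section; the function name md2pdf and the DOCKER override variable are hypothetical and not part of the image.

```shell
# Hypothetical helper: md2pdf is not shipped with the image.
# Set DOCKER=echo to print the arguments instead of running docker.
md2pdf() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "md2pdf: no such file: $file" >&2
    return 1
  fi
  case "$file" in
    *.md) ;;
    *) echo "md2pdf: expected a .md file: $file" >&2; return 1 ;;
  esac
  # Mount the file's own directory at /convert and pass only the base name,
  # so the PDF is written next to the Markdown source.
  local dir base
  dir="$(cd "$(dirname "$file")" && pwd)"
  base="$(basename "$file")"
  "${DOCKER:-docker}" run --rm -v "${dir}:/convert" containercraft/pandoc "$base"
}
```

For example, md2pdf docs/my_document.md should leave docs/my_document.pdf beside the source, and DOCKER=echo md2pdf docs/my_document.md only prints the arguments that would be passed to docker.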

Advanced Conversion

The container uses an entrypoint script (pandoc-entrypoint) with the following Pandoc command:

pandoc my_document.md -o my_document.pdf \
    -V mainfont="Noto Serif" \
    -V monofont="Noto Mono" \
    -V geometry:margin=1in \
    --highlight-style=kate \
    --pdf-engine=xelatex \
    --toc -N

Explanation of Options:

  • -V mainfont="Noto Serif": Sets the main text font.
  • -V monofont="Noto Mono": Sets the monospaced font.
  • -V geometry:margin=1in: Sets document margins.
  • --highlight-style=kate: Applies syntax highlighting style.
  • --pdf-engine=xelatex: Uses XeLaTeX for better font and Unicode support.
  • --toc: Includes a table of contents.
  • -N: Numbers the sections.

Custom Usage

To customize the conversion process, you can:

  • Modify the Entrypoint Script: Adjust pandoc-entrypoint with your preferred options.
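  • Run Pandoc Directly: Open a shell in the container and run Pandoc commands manually. Because the image's entrypoint is pandoc-entrypoint, override it with --entrypoint; otherwise /bin/bash would be passed to the script as a file name.

docker run --rm -it -v "$(pwd)":/convert --entrypoint /bin/bash containercraft/pandoc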

Once inside the container:

pandoc my_document.md -o my_document.pdf [your options]
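
For a single non-interactive run with custom options, the entrypoint can be swapped for pandoc itself. The following is a sketch under the assumption that pandoc is on the image's PATH and /convert is the working directory, as the Dockerfile suggests; the function name pandoc_in_container and the DOCKER override variable are hypothetical.

```shell
# Hypothetical helper: bypass pandoc-entrypoint and forward every
# argument straight to pandoc inside the container.
# Set DOCKER=echo to print the arguments instead of running docker.
pandoc_in_container() {
  "${DOCKER:-docker}" run --rm \
    -v "$(pwd):/convert" \
    --entrypoint pandoc \
    containercraft/pandoc "$@"
}
```

A call such as pandoc_in_container my_document.md -o my_document.pdf --pdf-engine=xelatex -V mainfont="Noto Serif" -V geometry:margin=2cm would then produce a PDF with wider margins and no table of contents, without editing the entrypoint script.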

Examples

Batch Conversion

Convert all Markdown files in a directory:

for file in *.md; do
  docker run --rm -v "$(pwd)":/convert containercraft/pandoc "$file"
done
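
A slightly more defensive variant of this loop is sketched below. It assumes bash (nullglob is a bash option) and reports which files failed instead of stopping silently; the function name convert_all and the DOCKER override variable are hypothetical.

```shell
# Hypothetical helper: convert every *.md file in a directory.
# The body runs in a subshell, so cd and shopt do not leak to the caller.
convert_all() (
  cd "${1:-.}" || exit 1
  shopt -s nullglob   # an unmatched *.md expands to nothing, not to "*.md"
  failed=0
  for file in *.md; do
    if ! "${DOCKER:-docker}" run --rm -v "$(pwd):/convert" containercraft/pandoc "$file"; then
      echo "failed: $file" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is non-zero if any conversion failed.
  [ "$failed" -eq 0 ]
)
```

Running convert_all docs would convert each Markdown file in docs, and the non-zero exit status on failure makes the function usable as a pipeline step.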

Integration with CI/CD Pipelines

Use the container in automated workflows:

GitHub Actions Example:

jobs:
  build_pdf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Convert Markdown to PDF
        run: |
          docker run --rm -v ${{ github.workspace }}:/convert containercraft/pandoc my_document.md

GitLab CI/CD Example:

pdf_generation:
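  image:
    name: containercraft/pandoc
    # Clear the image's pandoc-entrypoint so the runner can start a shell for the script
    entrypoint: [""]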
  script:
    - pandoc my_document.md -o my_document.pdf
  artifacts:
    paths:
      - my_document.pdf
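
Either pipeline can finish with a quick sanity check on the generated file before it is published. This is a minimal sketch, relying only on the fact that PDF files begin with the bytes %PDF-; the function name assert_pdf is hypothetical.

```shell
# Hypothetical helper: fail unless the path is a non-empty file that
# starts with the PDF magic bytes.
assert_pdf() {
  local pdf="$1"
  [ -s "$pdf" ] || { echo "missing or empty: $pdf" >&2; return 1; }
  [ "$(head -c 5 "$pdf")" = "%PDF-" ] || { echo "not a PDF: $pdf" >&2; return 1; }
}
```

Adding assert_pdf my_document.pdf as the last command of the job makes the job fail when the conversion produced no output or something other than a PDF.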

Building the Docker Image

Clone the repository and build the image:

git clone https://github.com/yourusername/pandoc-docker.git
cd pandoc-docker
docker build --progress plain --tag containercraft/pandoc -f Dockerfile .

Customization

Modify the Dockerfile

  • Add Packages: Include additional LaTeX packages or fonts by editing the APT_PKGS build argument in the Dockerfile.
  • Change Entrypoint: Update pandoc-entrypoint to alter default Pandoc options.
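
Because APT_PKGS is declared with ARG, it can also be overridden at build time without editing the Dockerfile. The sketch below assumes the package list shown in the Dockerfile; note that --build-arg replaces the whole value, so the defaults are repeated. The function name build_with_extra_pkgs and the DOCKER override variable are hypothetical, and fonts-firacode in the usage note is only an illustrative extra package.

```shell
# Hypothetical helper: rebuild the image with extra apt packages appended
# to the Dockerfile's default APT_PKGS list.
# Set DOCKER=echo to print the arguments instead of running docker.
build_with_extra_pkgs() {
  # Defaults copied from the Dockerfile's ARG APT_PKGS.
  local pkgs=(
    locales pandoc
    texlive-latex-base texlive-fonts-recommended texlive-fonts-extra
    texlive-latex-extra texlive-xetex texlive-luatex texlive-science
    fonts-lmodern fonts-noto-cjk fonts-noto-core fonts-noto-color-emoji
  )
  "${DOCKER:-docker}" build --tag containercraft/pandoc \
    --build-arg APT_PKGS="${pkgs[*]} $*" -f Dockerfile .
}
```

For instance, build_with_extra_pkgs fonts-firacode would rebuild the image with one additional font package installed alongside the defaults.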

Extend Functionality

  • Install Additional Tools: Install tools like tesseract-ocr for OCR capabilities.
  • Integrate Filters: Add Pandoc filters or Lua scripts for advanced processing.

Contributing

Contributions are welcome! Please:

  1. Fork the Repository: Click the "Fork" button on GitHub.
  2. Create a Feature Branch: git checkout -b feature/your-feature
  3. Commit Your Changes: git commit -m 'Add your feature'
  4. Push to the Branch: git push origin feature/your-feature
  5. Open a Pull Request: Describe your changes and submit.

Acknowledgments

###############################################################################
# Use:
# - docker build --progress plain --tag docker.io/containercraft/pandoc -f Dockerfile .
# - docker run --rm -it --name pandoc --hostname pandoc --volume .:/convert docker.io/containercraft/pandoc my_document.md
###############################################################################
FROM docker.io/library/ubuntu:24.04
LABEL tag="pandoc"
ENV DEVCONTAINER="pandoc"
SHELL ["/bin/bash", "-c", "-e"]
#################################################################################
# Environment Variables
# Set locale to en_US.UTF-8
ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US:en
ENV LC_ALL=en_US.UTF-8
# Disable timezone prompts
ENV TZ=UTC
# Disable package manager prompts
ENV DEBIAN_FRONTEND=noninteractive
# Set default bin directory for new packages
ENV BIN="/usr/local/bin"
# Set default binary install command
ENV INSTALL="install -m 755 -o root -g root"
# Common Dockerfile Container Build Functions
ENV apt_update="apt-get update"
ENV apt_install="TERM=linux DEBIAN_FRONTEND=noninteractive apt-get install -q --yes --no-install-recommends"
ENV apt_clean="apt-get clean && apt-get autoremove -y && apt-get purge -y --auto-remove"
ENV curl="/usr/bin/curl --silent --show-error --tlsv1.2 --location"
ENV dir_clean="\
rm -rf \
/var/lib/{apt,cache,log} \
/usr/share/{doc,man,locale} \
/var/cache/apt \
/root/.cache \
/var/tmp/* \
/tmp/* \
"
#################################################################################
# Base package and user configuration
#################################################################################
# Apt Packages
ARG APT_PKGS="\
locales \
pandoc \
texlive-latex-base \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-extra \
texlive-xetex \
texlive-luatex \
texlive-science \
fonts-lmodern \
fonts-noto-cjk \
fonts-noto-core \
fonts-noto-color-emoji \
"
# Install Base Packages and Remove Unnecessary Ones
RUN echo \
&& export TEST="pandoc --version" \
&& ${apt_update} \
&& bash -c "${apt_install} ${APT_PKGS}" \
&& locale-gen en_US.UTF-8 \
&& update-locale LANG=en_US.UTF-8 \
&& bash -c "${apt_clean}" \
&& bash -c "${dir_clean}" \
&& ${TEST} \
&& true
#################################################################################
# Set the default command
#################################################################################
COPY ./rootfs /
WORKDIR /convert
ENTRYPOINT ["pandoc-entrypoint"]
#!/bin/bash -x
# pandoc-entrypoint: convert the Markdown file named by the first argument to PDF.
# Strip a trailing .md so the same base name is used for input and output.
file_name="${1%.md}"
echo "INFO >> Converting file to pdf: ${file_name}.md > ${file_name}.pdf"
pandoc "${file_name}.md" -o "${file_name}.pdf" \
-V mainfont="Noto Serif" \
-V monofont="Noto Mono" \
-V geometry:margin=1in \
--highlight-style=kate \
--pdf-engine=xelatex \
--toc -N
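
The base-name step of the entrypoint script can be looked at in isolation. The sketch below assumes the intent is to drop only a trailing .md extension and uses bash parameter expansion for that; the function name strip_md is hypothetical and does not appear in the gist.

```shell
# Hypothetical helper: print the argument without a trailing ".md".
strip_md() {
  # ${1%.md} removes one ".md" at the end of the string and leaves any
  # earlier ".md" in the name untouched.
  printf '%s\n' "${1%.md}"
}
```

With this form, strip_md notes.md.backup.md yields notes.md.backup, whereas the expression sed 's/\.md//' removes the first .md it finds and would yield notes.backup.md. Quoting the argument also keeps names with spaces, such as "my file.md", in one piece.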