Multi-stage Docker builds for Python apps

Smaller Docker images are quicker to transfer and deploy. What’s more, by including only what is absolutely required you avoid shipping security vulnerabilities in packages that aren’t even needed.

There are many examples online for applications written in Go, which deploy as a single statically linked binary. It is less obvious how to translate these examples to an application written in Python.

Images based on Alpine Linux are among the smallest, but they are not compatible with manylinux1 wheels because Alpine uses musl libc instead of glibc. This isn’t an issue for applications written in pure Python, but packages with extensions written in C/Cython have to be compiled from source, which requires a compiler (and associated toolchain). Including GCC in addition to Python quickly balloons the Alpine image from 59MiB to 216MiB.

The solution is to use multi-stage Docker builds. The dependencies (and perhaps the application itself) are built and installed in a virtual environment within the first stage. The whole environment (but none of the build tools) is then copied into a clean image. The environment needs to be activated in the second stage, which is as simple as setting the PATH and VIRTUAL_ENV variables.
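To see what "activation" amounts to, a small check like the following can be run with python3 inside either stage. This is only a sketch: the function name describe_environment is made up for illustration, and it relies on the standard library behaviour that sys.prefix differs from sys.base_prefix when the interpreter belongs to a virtual environment created with venv.

```python
import os
import shutil
import sys


def describe_environment():
    """Report whether the running interpreter belongs to a virtual environment."""
    return {
        # Inside a venv, sys.prefix points at the venv and base_prefix at the system Python
        "in_venv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
        # Set by the ENV instruction in the Dockerfile; None if it was not set
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
        # The python3 that a shell would pick up first from PATH; None if not found
        "python_on_path": shutil.which("python3"),
    }
```

If PATH has been set as described, python_on_path should point into the virtual environment's bin directory and in_venv should be true.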

In the example below we use Pipenv to install cffi, which depends on libffi.

FROM alpine:3.9 AS build
WORKDIR /opt/app
# Install Python and external dependencies, including headers and GCC
RUN apk add --no-cache python3 python3-dev py3-pip libffi libffi-dev musl-dev gcc
# Install Pipenv
RUN pip3 install pipenv
# Create a virtual environment and activate it
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH" VIRTUAL_ENV="/opt/venv"
# Install dependencies into the virtual environment with Pipenv
COPY Pipfile Pipfile.lock /opt/app/
RUN pipenv install --deploy

# Second stage: a clean image with only the runtime dependencies
FROM alpine:3.9
WORKDIR /opt/app
# Install Python and external runtime dependencies only
RUN apk add --no-cache python3 libffi
# Copy the virtual environment from the previous image
COPY --from=build /opt/venv /opt/venv
# Activate the virtual environment
ENV PATH="/opt/venv/bin:$PATH" VIRTUAL_ENV="/opt/venv"
# Copy your application
COPY . /opt/app/

In the build stage we install Python (python3 and python3-dev), pip (py3-pip), libffi (libffi and libffi-dev), GCC (gcc) and the musl development files (musl-dev). In the second stage we only need the runtime dependencies (python3 and libffi).
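A missing runtime library in the second stage usually only shows up when a compiled extension is first imported. One way to catch this early is a small smoke test run inside the final image. The sketch below is an assumption rather than part of the original setup: the helper names check_imports and main are hypothetical, and the default module _cffi_backend is assumed to be the compiled module that cffi installs.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module; return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # Covers both a missing package and a shared library that fails to load
            failures[name] = str(exc)
    return failures


def main(argv):
    """Return 0 if every module imports cleanly, 1 otherwise."""
    # Assumed default: the C extension behind cffi; pass your own module names instead
    modules = argv or ["_cffi_backend"]
    failures = check_imports(modules)
    for name, error in failures.items():
        print(f"FAILED {name}: {error}", file=sys.stderr)
    return 1 if failures else 0
```

Calling sys.exit(main(sys.argv[1:])) at the bottom of the file turns this into a script whose exit code can fail a build or a CI job.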

There isn’t much specific to Pipenv here. Note that only the Pipfile and Pipfile.lock files are copied in the first stage, rather than the entire application. This speeds up subsequent builds, because the cached dependency layers can be reused and the install step only needs to be rerun when the dependencies change. Also note the use of --deploy, which makes the install fail if Pipfile.lock is out of date, so that only the locked versions, with the checksums recorded for them, end up in the image.