
Using Docker on HTCondor

Prepared by Carlo Emilio

In this post I want to give a list of tips and tricks on how to take advantage of Docker to obtain high portability and consistency when executing code on CERN's clusters.

Docker offers huge advantages when it comes to reconstructing complex programming environments and tech stacks, especially when GPU/CUDA usage is required. Most of the fundamental elements and documentation of Docker usage on HTCondor are offered here, here, and here.

However, these advantages come at a price, as some of HTCondor's behaviours in the "Docker Universe" are rather counter-intuitive and are learned only after hours and hours of mindless debugging and trial and error.

I hope to save at least an hour or two for some of you.

What is Docker exactly?

Docker is a tool that can package software into containers that are guaranteed to work reliably and consistently on every environment that has Docker installed.

The concept of containerized software is best understood if compared to standard virtualisation:

With standard virtual machines, you virtualise every component of a computer, starting from the bare metal; on top of that you install the operating system and every software component required to obtain the exact software stack you need.

With Docker, instead, you have a standardized platform that takes the Linux kernel your host system is based on and shares it directly with images of operating systems and software, running them in frozen containers. The result is the reproducibility of a virtual machine, but with much better performance and almost no latency from the virtualisation layer.

A great summary of what Docker is and its core concepts is given in this video.

Getting started with Docker development

Core concepts

  • A Dockerfile is a text file containing a list of commands a Docker environment must execute to build a defined Docker image.
  • A Docker Image is a frozen image of a complete OS + software stack that guarantees portability and reproducibility. Each image is named with a tag system that conventionally takes the form username/imagename:version_tag.
  • A Container is a running instance of an image executing a specific task, which can be the image's default internal command or an external set of instructions/commands given by the user. After the container has finished running the requested commands, it is destroyed.
  • A Docker Registry is a storage and content delivery system holding named Docker images, available in different tagged versions. Imagine it as a sort of GitHub, but for Docker images. If you want to use Docker on HTCondor, you must create a free account on DockerHub and push your personal images to the DockerHub registry.

Installation

First things first, you need to set up your Docker environment properly on your personal machine. If you are working on Linux, the process is very straightforward, as your operating system is already based on a Linux kernel, on which the Docker environment will operate. Otherwise, you will need to install a dedicated application provided by Docker to have a Linux kernel available on your machine.

For Linux users, be sure to also follow some of the post-installation steps, such as setting up Docker for non-root users: this allows you to run Docker without prepending sudo every time, and lets fundamental tools like the Docker extension for VScode interact properly with the environment.
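For reference, the non-root setup usually boils down to two commands (a sketch of the steps from Docker's official post-installation guide; you need to log out and back in for the group change to take effect):

sudo groupadd docker
sudo usermod -aG docker $USER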

The VScode Docker plugin

With the VScode Docker plugin, developing Docker images becomes almost effortless. I highly recommend it over the plain CLI.

Pulling a ready-to-use Docker image

As a first Docker experience, you might want to pull a standard Tensorflow image and run some basic commands within it, either by starting a bash shell inside the image with the command:

docker run -it --rm tensorflow/tensorflow bash

or by directly starting a basic Python interpreter with:

docker run -it --rm tensorflow/tensorflow python

You'll see that you now have a perfectly ready-to-use Tensorflow environment, with GPU support if you have a GPU and installed the proper plugins in your Docker environment. With it, you can in principle run any Python script or Python-based application requiring Tensorflow without worrying too much about the nature of the environment it is running on.
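As a quick sanity check (a sketch, assuming the NVIDIA container toolkit is installed and using the GPU-enabled image tag), you can ask Tensorflow to list the GPUs it sees:

docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"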

In the context of surprise-rich environments like the ones on HTCondor, this translates to peace of mind.

Creating your own Docker Image

Scenario: using XSuite with GPU support.

We want to create a Docker image that is:

  • Ubuntu based;
  • provided with a precise release of the CUDA runtime and tools;
  • equipped with an essential Python environment having an exact list of packages.

A Dockerfile matching these requirements, paired with a requirements.txt file, might be the following:

# Dockerfile
# use an Ubuntu+CUDA official image as base
FROM nvidia/cuda:11.4.2-devel-ubuntu20.04

# setting locales
ENV TZ=Europe/Zurich

# installing the necessary Ubuntu and Python packages
# in a single RUN command, to reduce the number of layers
# in our image. (Every Dockerfile command creates a compressed layer)
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
    echo $TZ > /etc/timezone && \
    apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
    cmake \
    python3 python3-dev python3-pip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# copy requirements.txt file inside the image and
# pip install everything
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# set environment variables for cupy (will explain later why)
ENV CUPY_CACHE_DIR /tmp/cupy_cache

# requirements.txt
numpy==1.21.4
numba==0.53.1
scipy==1.7.1
pandas==1.3.4
tqdm==4.62.3
matplotlib==3.4.3
xsuite==0.2.1
cupy-cuda114

With the resulting image, we are guaranteed a fully functional Ubuntu OS with the CUDA 11.4 runtime, hosting a Python environment with xsuite 0.2.1 on it, regardless of whether we run it on our local machine or on a CentOS-based worker node.

This image can then be named, for example, username/myxsuite:v0.1, and pushed to your personal DockerHub registry, so that you can pull it down whenever you want on any other machine or server.
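In practice, building and publishing the image might look like the following sketch (using username/myxsuite:v0.1 as the placeholder name from above):

# build the image in the directory containing the Dockerfile and requirements.txt
docker build -t username/myxsuite:v0.1 .
# authenticate and push it to your personal DockerHub registry
docker login
docker push username/myxsuite:v0.1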

Scenario: using your own weird compiled C++/Python library

Let's assume, as a purely hypothetical scenario, that you are rushing the development of a Python library with C++ bindings, made with Pybind11, that is not exactly fully compatible with the tech stack available on the server you want it to run on.

Or, even if it is theoretically compatible, you are short on time to the point that you can't yet afford to make the library fully compatible with different OSes.

Or, most importantly, you want your code to behave in the exact same way, as much as possible, on every machine.

You can then just replicate your development environment with a reduced set of instructions condensed into a single Dockerfile, where you install all the necessary development packages and compilers, clone your personal repositories, and finally set up your code the way it has to be set up:

FROM nvidia/cuda:11.4.2-devel-ubuntu20.04

ENV TZ=Europe/Zurich

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
    echo $TZ > /etc/timezone && \
    apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
    cmake \
    libfftw3-dev \
    python3 python3-dev python3-pip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# install requirements
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# install my henon map package
RUN git clone --recurse-submodules https://github.com/carlidel/henon_map_cpp && \
    pip install -e ./henon_map_cpp

Running Docker Containers on HTCondor

The moment you have a specific image you want to use for running your job, you can write your submit file accordingly by specifying the Docker Universe.

Refer directly to the specific exercise page on how to write your submit file and consider also reading the direct source in the HTCondor docs.
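Just for orientation, a minimal Docker Universe submit file might look like the sketch below (image name, executable, and resource requests are placeholders; the pages linked above remain the authoritative reference):

universe                = docker
docker_image            = username/myxsuite:v0.1
executable              = my_job.sh
arguments               = $(ClusterId) $(ProcId)
output                  = output/job.$(ClusterId).$(ProcId).out
error                   = error/job.$(ClusterId).$(ProcId).err
log                     = log/job.$(ClusterId).log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
request_gpus            = 1
+JobFlavour             = "longlunch"
queue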

Why should I use Docker on HTCondor?

You should consider using Docker in your HTCondor Jobs if:

  • You want to use bleeding-edge tools that require modern software that is painful to set up in the standard CentOS environment of Lxplus/HTCondor.
  • You need an environment tailored to specific versions of the CUDA runtime (i.e. you want to use Cupy and you want to be sure that the CUDA runtime version installed is exactly the right one compatible with your Cupy installation).
  • You work with your own compiled code and you want to maintain a single build system, without worries about code portability on more conservative systems like CentOS.
  • You want to have your own standardized Python environment without having HTCondor grab your Miniconda environment directly from AFS.
  • You use complex and modern software stacks in highly active development (e.g. Tensorflow) that require complicated setups with GPU environments and have an official Docker release of their own.
  • You want to be able to test and run the exact same environment on both your local machine and the remote HTCondor node, for code and bug reproducibility.

You might instead want to stay in the Vanilla Universe if:

  • You are already using standard software stacks available on CVMFS (therefore, you are already working under the containerized philosophy with internal CERN tools).
  • Your code is very self-contained (e.g. it only requires a standard compiler or an essential Python environment, with no complicated components or libraries whatsoever).
  • You do not require consistent CUDA environments.

Things to be very careful about

Caching of Docker images happens in the background

If for any reason you modify one of your Docker images (e.g. you fix a bug in the internal code or you switch to an updated package), be sure to change the tag of the new image following a versioning scheme of some sort.

For example, if you change something in your own user/myxsuite:0.1 image, name the new one user/myxsuite:0.2 and then push it to your registry. Then modify all of your job submissions accordingly.

This is because HTCondor's Docker daemon does some caching of images in the background: even if you updated an image of yours in the registry, it is not guaranteed that HTCondor will pull down the updated version!

If the names are different, instead, you can be 100% sure it will.
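Concretely (a sketch using the same placeholder names), this means rebuilding under a new tag, pushing it, and pointing the submit file to the new name:

docker build -t user/myxsuite:0.2 .
docker push user/myxsuite:0.2
# and in the submit file:
# docker_image = user/myxsuite:0.2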

Non-root access in your own Docker container

When you run Docker containers on your personal machine, by default you run them as a user with root privileges. For obvious security reasons, this behaviour is different when you run the container on an HTCondor worker node (you could otherwise use super-user powers to break the Docker environment and perform privilege escalation on the host machine).

If your container is well set up and you execute only non-root tasks inside it, this will have no repercussions on your workflow.

However, this must be taken into consideration when installing particular packages like Cupy.

Cupy, by default, saves its cache inside the ${HOME}/.cupy/kernel_cache directory. But if the Docker image does not have a non-root user properly configured, a Cupy job will try to write its cache to /.cupy/kernel_cache directly, without having superuser privileges, leading to an error.

To solve this, you want either to configure a proper non-root user as the default user inside your Docker image, or to use the quick-and-dirty environment configuration:

# set environment variables for cupy
ENV CUPY_CACHE_DIR /tmp/cupy_cache

so that Cupy writes to a location for which even a non-root user has write permission.
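If you go for the first option instead, a minimal sketch of a non-root user configured directly in the Dockerfile could look like this (the user name docker_user is arbitrary, and whether this is sufficient depends on how the worker node maps users into the container):

# create an unprivileged user with a home directory and make it the default
RUN useradd -m docker_user
USER docker_user
ENV HOME /home/docker_user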

Scratch directory is mounted inside the container

In a Vanilla Universe job, the worker node is initialized with a scratch directory containing the required input files, and with AFS and EOS mounted as external filesystems (with the appropriate limitations). The job is then executed from this Initial Working Directory.

In a Docker Universe job, this behaviour is mostly replicated by bind-mounting the scratch directory into the Docker container, along with AFS and EOS. This implies that a Docker Universe job has read access to the two filesystems and (don't use this unless extremely necessary!) write access to AFS.

Writing directly to AFS within a job execution is an extremely bad practice in general, as it can lead to inconsistent behaviour when bandwidth issues occur on the worker nodes. In general, only standard I/O operations should be performed, either through the standard submit files or (as shown in the next section) with a stage-in stage-out approach using xrdcp.

The "standard" EOS tool eos cp is not available anymore

As reported in the tutorial "Working with big files", when it comes to reading/writing sizeable files from/to EOS you are supposed to use a stage-in stage-out approach with standard tools like eos cp.

Doing it differently, with either AFS or EOS, is a guaranteed recipe for having a very bad time.

Unfortunately, your standard non-CERN-related Docker image is not equipped by default with the xrootd-client package (which contains the xrdcp CLI tool), so we need to install it ourselves.

Is this a big deal? Not at all! Considering the XSuite example, it's just a matter of adding a repository and an extra package to our Dockerfile's list of instructions (following the notes here):

# Dockerfile
# Since this is regular Ubuntu20.04LTS, we have a stable
# installation procedure!
FROM nvidia/cuda:11.4.2-devel-ubuntu20.04

ENV TZ=Europe/Zurich
# Notice the addition of "curl"
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
    echo $TZ > /etc/timezone && \
    apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
    cmake \
    python3 python3-dev python3-pip \
    curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# This is necessary for installing Kerberos without interactive
# prompts (which cannot receive inputs, as this is a Docker build)
ENV DEBIAN_FRONTEND=noninteractive
# Adding the repo and installing xrootd-client...
RUN echo "deb [arch=$(dpkg --print-architecture)] http://storage-ci.web.cern.ch/storage-ci/debian/xrootd/ focal stable-4.12.x" | tee -a /etc/apt/sources.list.d/cerneos-client.list > /dev/null && \
    curl -sL http://storage-ci.web.cern.ch/storage-ci/storageci.key | apt-key add - && \
    apt-get update && apt-get install -y --no-install-recommends \
    krb5-user krb5-config \
    xrootd-client && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
ENV CUPY_CACHE_DIR /tmp/cupy_cache

With this addition, the resulting Docker image will come with the xrdcp command-line tool (on which eos cp is essentially based).

Whenever we want to stage-out a file to an EOS directory, we can do so with the command:

xrdcp my_out_file.out root://eosuser.cern.ch//eos/my/eos/directory

Likewise for the opposite direction (stage-in).
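For example, a stage-in with the same placeholder path would be a sketch like:

xrdcp root://eosuser.cern.ch//eos/my/eos/directory/my_in_file.in .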

This is possible because, in our HTCondor job, the Docker container receives (via black magic) our internal Kerberos credential token and has the environment variable KRB5CCNAME automatically set.

By using this procedure, and avoiding direct access or transfers to and from other locations like AFS, you can rely on stable I/O behaviour and top transfer speeds (~200 MB/s).

Very, very rarely, HTCondor's Docker daemon goes on strike

It happened to me once that an HTCondor job crashed in its initial stage. Discussing the event with IT, it turned out there was a very limited and temporary problem with the Docker daemon on the worker node.

In order to protect your jobs from this very rare kind of problem, it is best practice to add the following extra lines to your submission files:

on_exit_remove          = (ExitBySignal == False) && (ExitCode == 0)
max_retries             = 3
requirements = Machine =!= LastRemoteHost
queue

Going one step further: unpacking Docker in CVMFS

If you have developed a relevant production-ready Docker image that you want to share with your CERN colleagues, so that they can use it with maximum efficiency within CERN's infrastructure, you can consider distributing it via the unpacked.cern.ch utility offered within the CVMFS system.

Publishing a Docker image on CVMFS gives specific advantages in terms of fast caching and availability and, most importantly, allows it to be used with the Singularity utility (imagine a Docker daemon, but designed with scientific computing in mind and with more security concerns solved).

While you have strong limitations in executing Docker containers on lxplus (for security reasons you simply can't, as you could use nasty container dark magic to privilege-escalate on the host machine; you can only make a worker node execute your code in strictly non-root mode), Singularity allows you to execute them everywhere and benefit from the advantages of containerization, without you being a threat to everyone's security.

When it comes to executing Singularity in an HTCondor job, a Vanilla Universe job is all you need.
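For instance (a sketch; the exact /cvmfs path depends on how your image ends up being published, so check the unpacked.cern.ch documentation), running a script inside an unpacked image from a Vanilla Universe job script could look like:

singularity exec /cvmfs/unpacked.cern.ch/registry.hub.docker.com/username/myxsuite:v0.1 python3 my_script.py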

As this service is still in its early release phase, you might want to reach out to IT directly for support if you are interested in this kind of service.