If you use Docker to manage your infrastructure, you need to put time into thinking about how to build your images. Here is a quick rundown of the things to keep in mind:
Smaller is Better
By itself, Docker makes great use of filesystem space. Because each container only holds the changes relative to its image, a little bit of image bloat doesn’t directly impact the server adversely. However, this doesn’t mean that we shouldn’t worry about bloat at all. Not only should we avoid wasting space without reason, but images that are too big also cause other problems that you need to be aware of.
The most important consideration is attack surface. Every program that you have on your image is a potential hole for a hacker to exploit. Keeping unneeded software off of your container is the easiest first step to maintaining secure containers.
However, in more general terms, everything on your container will wind up needing maintenance at some point. The more software you have installed, the more maintenance you will be subject to. You might think, “If I don’t use it, how does it cause maintenance issues?” Well, most software is written by a software team, not just a single individual. I have noticed that, if something is available to use, some member of the team will eventually find an excuse to use it. So, the more software that you leave on your container, the more tools your team will eventually make use of. Additionally, those team members may not even remember to document which operating system tools they are using. Therefore, it is best to start off with the most minimal set of tools you can, and then only add when absolutely necessary. Then your team will think twice before adding something, and, more importantly, it will be added explicitly to your Dockerfile, which makes it easier to spot.
Use Alpine Builds
Alpine Linux started out as a Linux distribution for embedded systems. It was based on the original Linux Router Project, which turned so many home routers into full-blown Linux-based servers. Originally small enough to fit on a floppy drive, Alpine Linux has expanded in scope, but still focuses on the original goal of making an extremely scaled-down version of Linux that can fit almost anywhere.
For Docker images, you rarely care about the rest of the operating system. You don’t need most of the operating system tools. You don’t need to log in to your box remotely. You don’t need system logging. All that is taken care of by your container host. You just need the bare minimum operating system to get your app up and running. For this, Alpine Linux is just about perfect. The current Alpine build weighs in at under 6 megabytes of disk space. You can add what you need from there using their “apk” package management system.
Note that, in order to trim down size, Alpine works a little differently from most distributions. Rather than having a bunch of separate commands installed, Alpine has a single binary, called “busybox,” which operates all of the core functionality. All of the command names are there, but they are all symlinked to busybox, which actually performs the task (it uses the name of the binary you invoked to know which task to run).
One issue with Alpine Linux, however, is that many of the defaults are still geared towards running on extremely small machines, and may not be appropriate for your system. For instance, the default stack size on Alpine Linux is just 80kB, while it is normally 2MB on ordinary Linux machines, roughly 25 times the size. This means that, unless you change this value, deeply recursive algorithms may break.
In a similar fashion, busybox is not a perfect replica of your system toolchain and the C library doesn’t have all of the bells and whistles of GNU’s C library. But, nonetheless, usually Alpine Linux gives you everything you need and nothing you don’t.
Additionally, many of the standard Docker images have both a regular Debian version and an Alpine version. For instance, for PostgreSQL, the regular “postgres:13” Docker image weighs in at 314MB. However, the Alpine version, “postgres:13-alpine”, is only half of that. Even more dramatic, “ruby:2.7.2” is 842MB, while “ruby:2.7.2-alpine” is only around 52MB.
You can build your own Alpine-based Docker images by simply starting your Dockerfile with “FROM alpine”.
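As a minimal sketch of that starting point, the Dockerfile below pins an Alpine release and adds two packages with apk. The release tag and the package names are illustrative assumptions, not recommendations from this article.

```dockerfile
# Start from a small Alpine base (the 3.19 tag is only an example).
FROM alpine:3.19

# Add only what the app needs. --no-cache keeps the apk index out of the image.
# curl and ca-certificates are placeholder packages for illustration.
RUN apk add --no-cache curl ca-certificates
```

Every extra tool then shows up as an explicit line in the Dockerfile, which is what makes additions easy to spot in review.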
Starting from Scratch by Using Static Binaries
You can go even further than Alpine Linux builds. You can install applications that have no operating system whatsoever. Remember that, in Docker, the operating system kernel is supplied by the host operating system. Therefore, you don’t need to worry about that part. If your app doesn’t touch any other service, if it never calls out to the shell, runs another command, etc., there’s really no reason for any piece of the operating system to be on your container at all.
In order to do this, you need to use a special base image in your Dockerfile, the “scratch” image. If you start your Dockerfile with “FROM scratch”, it will start you with a completely empty image. Then, if your entire application is self-contained, you can literally just copy in your application to the root of the filesystem.
In order to get this to work, your application must have no dependencies on anything else in the image. This means that it needs to be “statically linked.” That is, instead of using shared system libraries already installed on the computer, the application has all of the library code it needs placed directly into the application file.
Most compiled languages provide some sort of support for this. Usually it is done by adjusting the options that are sent to the linker at build time. This does not work for most scripting languages, at least not without a lot of extra work. In any case, it is possible to get extremely tiny containers that literally have only the application file by itself.
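A sketch of what this can look like, assuming a statically linked executable named myapp (a hypothetical name) has already been built and sits next to the Dockerfile:

```dockerfile
# Completely empty base image: no shell, no libraries, no package manager.
FROM scratch

# Copy the statically linked binary to the root of the filesystem.
# "myapp" is a hypothetical file name.
COPY myapp /myapp

# Use the exec (JSON array) form: there is no shell in this image
# to interpret a command written as a plain string.
ENTRYPOINT ["/myapp"]
```

If the binary turns out to depend on a shared library after all, the container will fail at startup, so it is worth checking how the executable was linked before relying on this.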
Keep Up with Updates
You should always check to make sure your Dockerfile is using the latest version of its base image. However, I don’t like to use the “latest” tag to do this. That tag reduces the visibility of what exactly is on your image. Instead, check regularly to make sure that the version you are building from really is the latest.
If you need to upgrade a specific package, you can always do so with the distribution’s package manager.
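The two points above might be combined as in the following sketch. The version tag and the package name are assumptions chosen for illustration; check which release is actually current before copying them.

```dockerfile
# Pin an explicit version instead of "latest" so the base is visible in the Dockerfile.
# 3.19 is an example tag, not necessarily the newest release.
FROM alpine:3.19

# Upgrade one specific package with the distribution's package manager.
# openssl is an illustrative package name.
RUN apk add --no-cache --upgrade openssl
```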
Scan Your Images
Docker has added a new tool called “docker scan” which will scan your image for known vulnerabilities. While this tool doesn’t replace standard security practices, it is a great failsafe that can be easily implemented in your deployment toolchain. You can schedule automated scans of your Docker images and be alerted immediately about newly discovered security issues.
Only One Task
Docker images should be geared for one task and one task only. The goal of a container image is to provide only the minimal context needed to run one application. Trying to run a web service and a caching server out of the same container goes against the standard thinking about containers. Do one thing, do it well, and include only the things you need to do that task.
Don’t Overuse Layers
If you have a complicated setup, it is best to copy an install script onto the image and then run that script. This gets all of the changes into a single layer. If you have a Dockerfile with countless RUN commands, remember that every RUN command gets its own filesystem layer. By combining all of your commands into a script, you can get Docker to build a whole set of changes into a single layer.
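A rough sketch of the script approach, assuming a hypothetical install.sh in the build context that performs all of the setup steps:

```dockerfile
FROM alpine:3.19

# install.sh is a hypothetical script holding every setup step.
COPY install.sh /tmp/install.sh

# One RUN means one layer for all of the script's changes.
# Removing the script in the same command keeps it out of that layer
# (the COPY above still adds a small layer of its own).
RUN sh /tmp/install.sh && rm /tmp/install.sh
```

The trade-off is that Docker's build cache works per instruction, so any change to the script reruns the whole setup.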
Alternatively, you can use multi-stage builds in your Dockerfile. A multi-stage build allows you to build inside one container, then copy everything all at once to the final container. However, the techniques for doing this are beyond the scope of this article.
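For orientation only, the basic shape of a multi-stage build can be sketched briefly. This example assumes a hypothetical single-file C program, hello.c, and ties together the Alpine, static binary, and scratch ideas from earlier sections.

```dockerfile
# Stage 1: a build container with a compiler toolchain.
FROM alpine:3.19 AS build
RUN apk add --no-cache build-base
WORKDIR /src
# hello.c is a hypothetical source file in the build context.
COPY hello.c .
# -static links the C library into the executable itself.
RUN gcc -static -O2 -o /hello hello.c

# Stage 2: the final image receives only the finished binary.
FROM scratch
COPY --from=build /hello /hello
ENTRYPOINT ["/hello"]
```

The compiler and its layers stay behind in the first stage, so none of them ship in the final image.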
Keep them Ephemeral
Remember that the whole idea of Docker containers is that containers can be created, destroyed, brought up, taken down, all with a minimum of fuss. Be sure that you design your container applications and systems so that all this building and tearing down does not adversely affect your app.
Make them Configurable
Finally, you should make your images configurable via environment variables. Don’t hardcode configurations or secrets into your containers. It is best for those to be managed separately. The image contains the code, the environment variables contain the configuration.
For instance, you shouldn’t have separate images for your development, staging, and production environments. Instead, the same image should go through these different processes. The only thing that should change is the configuration, which should be managed through the environment. That way, you know that the very code that you are running in production is the exact same code that you ran in testing.
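One way this might look in a Dockerfile is sketched below. The variable names (APP_ENV, APP_PORT) and the app binary are hypothetical, and the application is assumed to read its settings from the environment.

```dockerfile
FROM alpine:3.19

# Non-secret defaults only. Real values are supplied when the container starts.
# APP_ENV and APP_PORT are hypothetical variable names.
ENV APP_ENV=development \
    APP_PORT=8080

# "app" is a hypothetical executable that reads the variables above.
COPY app /usr/local/bin/app
CMD ["/usr/local/bin/app"]
```

The same image can then be started in each environment with different settings, for example by passing -e APP_ENV=production (or an --env-file) to docker run. Secrets should be supplied the same way at run time and never written into an ENV line.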
Add an Init Process if Needed
If, for some reason, you are getting a lot of zombie child processes stacking up, you may need to add an init process to your container. Zombies occur when a parent process doesn’t clean up after its child processes exit. On ordinary Linux systems, the “init” program cleans up after everybody. However, in a Docker container there is no init process by default; your application runs as the first process. You can add one if needed, though. “Dumb-init” is a program that you can install and use on your image if this is a problem for you. More information is available in the dumb-init project documentation.
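A minimal sketch of wiring dumb-init into an Alpine-based image, assuming it is installed from Alpine's package repository and that app is a hypothetical application binary:

```dockerfile
FROM alpine:3.19

# Install dumb-init with the package manager.
RUN apk add --no-cache dumb-init

# "app" is a hypothetical application binary.
COPY app /usr/local/bin/app

# dumb-init runs as process 1, forwards signals, and reaps zombie children.
ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/usr/local/bin/app"]
```

With this layout the application is started as a child of dumb-init rather than as process 1 itself.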
As with anything, there are good and bad ways to use tools. Fortunately, with Docker, it is hard to go too wrong. Docker naturally enforces at least moderately good behavior on its users. However, the more you know about your tools, the better they work for you. Following these guidelines will help you master the process of Docker image creation.
You may also want to look at some of Jonathan Bartlett’s other posts on how to use Docker:
Part 1: How the Docker revolution will change your programming. Since 2013, Docker (an operating system inside your current operating system) has grown rapidly in popularity. Docker is a “container” system that wraps the application and the operating system into a single bundle that can be easily deployed anywhere. In this series, we are looking under the hood at Docker, an infrastructure management tool that has rapidly grown in popularity over the last decade.
Part 2: A peek under the covers at the new Docker technology The many advances that enable Docker significantly reduce a system’s overhead. Docker, over and above the basic container technology, also provides a well-defined system of container management.
Part 3: Working with Docker: An Interactive Tutorial Docker gives development teams more reliable, repeatable, and testable systems, deployed at massive scale with the click of a button. In this installment, we look at the commands needed to start and run Docker, beginning with containers.
Part 4: Docker—An introduction to container orchestration This tutorial will focus on Docker’s swarm because it comes installed with Docker and uses the same standard Docker files.