Docker is a technology for container-based virtualization of software applications. Docker's mainstream container-based approach has transformed application development in recent years. It has affected all areas of development, including how applications and components are built, how software services are distributed, and how applications move from development to production. With Docker, all of these processes run differently than they did before.

But more has changed than just the development processes – the software architecture has, too. It has moved away from monolithic overall solutions and toward clusters of loosely coupled, lightweight “microservices”. This has in turn rendered the resulting overall systems more complex. In recent years, software like Kubernetes has become established for managing multi-container applications.

The development of container-based virtualization is far from over, so it remains an exciting field. In this article, we will explain how Docker works as an underlying technology. Furthermore, we will look at what motivated the development of Docker.

Note

The name “Docker” has several meanings. It is used as a synonym for the software itself, to designate the open source project on which it is based, and to refer to the U.S. company that operates various products and services commercially.

A brief history of Docker

The software originally released under the name “Docker” was built on Linux Containers (LXC) technology. LXC was later replaced by Docker's own libcontainer. New software components have been added as Docker has continued to grow and become the standard for container-based virtualization. Components such as containerd, a container runtime whose default low-level runtime is runc, have emerged from Docker's development. Today, these projects are managed by the Cloud Native Computing Foundation (CNCF) and the Open Container Initiative (OCI).

In addition to the Docker team, leading technology companies such as Cisco, Google, Huawei, IBM, Microsoft, and Red Hat are involved in the development of Docker and related technologies. A more recent development is that, in addition to the Linux kernel, Windows is now also used as a native environment for Docker containers. Here are some of the major milestones in Docker's evolutionary history:

Year Docker development milestones
2007 cgroups technology integrated into the Linux kernel
2008 LXC released; builds on cgroups and Linux namespaces, as Docker did later on
2013 Docker released as open source
2014 Docker available on Amazon EC2
2015 Kubernetes released
2016 Docker available on Windows 10 Pro via Hyper-V
2019 Docker available on Windows Home via WSL2
Tip

At the end of the article, we will go into detail about what motivated the development of Docker and similar virtualization technologies.

What is Docker?

Docker's core functionality is container virtualization of applications. This is in contrast to virtualization with virtual machines (VMs). With Docker, the application code, including all dependencies, is packed into an “image”. The Docker software runs the packaged application in a Docker container. Images can be moved between systems and run on any system running Docker.

Quote

“Containers are a standardized unit of software that allows developers to isolate their app from its environment [...]” – Quote from a Docker developer, source: https://www.docker.com/why-docker

As is the case with virtual machine (VM) deployment, a primary focus of Docker containers is to isolate the application that is running. Unlike VMs, however, a complete operating system is not virtualized. Instead, Docker allocates certain operating system and hardware resources to each container. Any number of containers can be created from a Docker image and operated in parallel. This is how scalable cloud services are implemented.

Even though we talk about Docker as one piece of software, it is actually made up of multiple software components that communicate via the Docker Engine API. Furthermore, a handful of special Docker objects are used, such as the aforementioned images and containers. Docker-specific workflows are composed of the software components and Docker objects. Let's take a look at how they interact in detail.

Docker software

The basis for the Docker software is the “Docker Engine”. This is mainly used to manage and control the containers and their underlying images. Special tools are used for functionality beyond that. These are mainly needed for managing applications that consist of groups of containers.

Docker Engine

Docker Engine runs on a local system or server and consists of two components:

  1. The Docker daemon (dockerd): This is always running in the background and listens for Docker Engine API requests. dockerd responds to appropriate commands to manage Docker containers and other Docker objects.
  2. The Docker client (docker): This is a command line program. The Docker client is used to control the Docker Engine and provides commands for creating and building Docker containers, as well as creating, obtaining, and versioning Docker images.

Docker Engine API

The Docker Engine API is a REST API that interfaces with the Docker daemon. Official software development kits (SDKs) for Go and Python are available for integrating the Docker Engine API into software projects. Similar libraries also exist for more than a dozen other programming languages. You access the API from the command line using the docker command. Furthermore, you can access the API directly using cURL or similar tools.
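As a brief sketch of direct API access, assuming a local Docker daemon listening on its default Unix socket, you can query the same endpoint the CLI uses with cURL:

```shell
# List running containers by calling the Docker Engine REST API directly
# (assumes a local daemon listening on the default Unix socket)
curl --unix-socket /var/run/docker.sock http://localhost/containers/json

# The equivalent query through the Docker client, which wraps the API call
docker ps
```

Both commands talk to the same daemon; the CLI simply formats the JSON response for you.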

Docker tools

When you use virtual machines, you often use systems consisting of several software components. In contrast, container virtualization with Docker favors clusters of loosely coupled microservices. These are suitable for distributed cloud solutions that offer a high degree of modularity and high availability. However, these kinds of systems quickly become very complex. To manage containerized applications efficiently, you use special software tools known as “orchestrators”.

Docker Swarm and Docker Compose are two official Docker tools that are available for orchestrating container clusters. The docker swarm command can be used to combine multiple Docker Engines into one virtual engine. The individual engines can then be operated across multiple systems and infrastructures. The docker compose command is used to create multi-container applications known as “stacks”.
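As an illustrative sketch, a small stack could be described in a Compose file like the one below. The service names and images are examples, not prescribed by Docker:

```yaml
# docker-compose.yml -- an example stack with two services
services:
  web:
    image: nginx:alpine     # web server, reachable on host port 8080
    ports:
      - "8080:80"
    depends_on:
      - redis               # start the cache before the web service
  redis:
    image: redis:alpine     # in-memory cache used by the web service
```

Running docker compose up -d in the directory containing this file would start both containers as one stack.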

The Kubernetes orchestrator, originally developed by Google, is more user-friendly than Swarm and Compose. It has established itself as the standard and is widely used by the industry. Hosting companies and other “Software as a Service” (SaaS) and “Platform as a Service” (PaaS) solution providers are increasingly using Kubernetes as their underlying infrastructure.

Docker objects

Workflows in the Docker ecosystem are a result of how Docker objects interact with each other. They are managed by communicating with the Docker Engine API. Let's take a look at each type of object in detail.

Docker image

A Docker image is a read-only template for creating one or more identical containers. Docker images are effectively the seeds of the system; they are used to bundle and deliver applications.

Various repositories are used to share Docker images. There are both public and private repositories. At the time of writing, there are more than five million different images available for download on the popular “Docker Hub”. The commands docker pull and docker push are used to download an image from a repository or share it there.
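As a sketch of the typical round trip with Docker Hub, assuming a hypothetical account named "example" as a placeholder:

```shell
# Download an official image from Docker Hub
docker pull nginx:alpine

# Re-tag it under your own namespace ("example" is a placeholder account)
docker tag nginx:alpine example/nginx:alpine

# Share the re-tagged image in your own repository
docker push example/nginx:alpine
```

Pushing to a repository requires being logged in first with docker login.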

Docker images are built in layers. Each layer represents a specific change to the image. This results in continuous versioning of the images, which allows a rollback to a previous state. An existing image can be used as a basis to create a new image.

Dockerfile

A Dockerfile is a text file that describes the structure of a Docker image. A Dockerfile is similar to a batch processing script; the file contains commands that describe an image. When you build an image from a Dockerfile, the commands are processed one after the other. Each command creates a new layer in the Docker image. So you can also think of a Dockerfile as a kind of recipe used as the basis for creating an image.
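Here is a minimal example Dockerfile for a hypothetical Python application (the file names app.py and requirements.txt are assumptions for illustration); each instruction adds a layer to the resulting image:

```dockerfile
# Start from an existing image as the base layer
FROM python:3.12-slim

# Set the working directory inside the image
WORKDIR /app

# Copy and install the dependencies first, so this layer is
# reused from the cache as long as requirements.txt is unchanged
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code as its own layer
COPY . .

# Define the command the container runs on startup
CMD ["python", "app.py"]
```

An image would be built from this recipe with docker build -t my-app . in the directory containing the Dockerfile.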

Docker container

Now let's move on to the main concept in the Docker universe: Docker containers. While a Docker image is an inert template, a Docker container is an active, running instance of an image. A Docker image exists locally in a single copy and only takes up a bit of storage space. In contrast, multiple Docker containers can be created from the same image and run in parallel.

Each Docker container consumes a certain amount of system resources to run, such as CPU usage, RAM, network interfaces, etc. A Docker container can be created, started, stopped, and destroyed. You can also save the state of a running container as a new image.
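The life cycle described above can be sketched with the standard Docker client commands (the names "demo" and "my-nginx" are arbitrary examples):

```shell
docker create --name demo nginx:alpine   # create a container from an image
docker start demo                        # start the container
docker stop demo                         # stop it again
docker commit demo my-nginx:snapshot     # save its state as a new image
docker rm demo                           # destroy the container
```

docker run combines the create and start steps into a single command.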

Docker Volume

As we have seen, you create a running Docker container from a non-modifiable image. But what about data that is used within the container and needs to be retained beyond its lifetime? Docker volumes are used for this purpose. A Docker volume exists outside of a specific container, so several containers can share one volume. The data contained in the volume is stored on the host's file system. This means that a Docker volume is like a shared folder on a virtual machine.
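As a sketch of sharing data between containers via a volume (the volume and container names are illustrative):

```shell
# Create a named volume managed by Docker
docker volume create shared-data

# A first container writes a file into the volume
docker run --rm -v shared-data:/data alpine \
    sh -c 'echo hello > /data/greeting'

# A second container mounts the same volume and reads the file back
docker run --rm -v shared-data:/data alpine cat /data/greeting
```

The data in shared-data survives both containers and can be mounted into any number of further containers.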

How does Docker work?

The basic working principle of Docker is similar to that of LXC, the previously developed virtualization technology: both build on the Linux kernel and perform container-based virtualization. Both Docker and LXC combine two seemingly contradictory goals:

  1. Running containers share the same Linux kernel, making them more lightweight than virtual machines.
  2. Running containers are isolated from each other and have access only to a limited amount of system resources.

Both Docker and LXC make use of “kernel namespaces” and “control groups” to achieve these goals. Let's take a look at how this works in detail.

Linux kernel

The Linux kernel is the core component of the GNU/Linux open source operating system. The kernel manages the hardware and controls processes. When running Docker outside of Linux, a hypervisor or a virtual machine is needed to provide the functionality of the Linux kernel. On macOS, xhyve, a derivative of the BSD hypervisor bhyve, is used. On Windows 10, Docker uses the Hyper-V hypervisor.

Kernel namespaces

Namespaces are a feature of the Linux kernel. They partition kernel resources and thus ensure that processes remain separate from each other. A process in a namespace can only see the kernel resources belonging to that same namespace. Here is an overview of the namespaces used by Docker:

Namespace Description Explanation
UTS System identification Assigns containers their own host and domain names
PID Process IDs Each container uses its own namespace for process IDs; PIDs from other containers are not visible, so two processes in different containers can use the same PID without conflict
IPC Inter-process communication IPC namespaces isolate processes in one container so that they cannot communicate with processes in other containers
NET Network resources Assigns separate network resources such as IP addresses or routing tables to a container
MNT Mount points of the file system Restricts the host's file system to a narrowly defined section from the container's point of view
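Two of these namespaces can be observed directly from the Docker command line, as a small sketch (requires a running Docker daemon):

```shell
# UTS namespace: the container sees its own hostname,
# independent of the host's hostname
docker run --rm --hostname demo alpine hostname

# PID namespace: the container's first process sees itself as PID 1,
# regardless of its PID on the host ($$ expands to the shell's own PID)
docker run --rm alpine sh -c 'echo $$'
```

The first command prints "demo", and the second prints "1", even though the host assigns both processes ordinary, much larger PIDs in its own namespace.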

Control groups

Control groups, usually abbreviated as cgroups, are used to organize Linux processes hierarchically. A process (or group of processes) is allocated a limited amount of system resources. This includes RAM, CPU cores, mass storage, and (virtual) network devices. While namespaces isolate processes from each other, control groups limit access to system resources. This ensures that the overall system remains functional when operating multiple containers.
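With Docker, these cgroup limits are set per container through flags on docker run, as in this sketch:

```shell
# Cap the container's resource usage via control groups:
# --memory limits RAM to 256 MB, --cpus limits CPU time to 1.5 cores
docker run --rm --memory 256m --cpus 1.5 alpine echo "resource-limited container"
```

If the process inside the container exceeds the memory limit, the kernel terminates it rather than letting it starve the rest of the system.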

What are the advantages of Docker?

Let's take a look at the history of software development to understand the benefits of Docker. How is and was software built, delivered, and run? What parts of the process have changed fundamentally? Software is the counterpart to hardware, the physical computer. Without software, the computer is just a lump of matter. While hardware is fixed and unchangeable, software can be recreated and customized. The interaction of the two levels results in this wondrous digital world.

Software on a physical machine

Traditionally, software has been created to be run on a physical machine. But this approach quickly hits a wall: software can only run on certain hardware; for example, it may require a specific processor.

Furthermore, more complex software usually does not run completely autonomously, but is integrated into a software ecosystem. This includes an operating system, libraries, and dependencies. The right versions of all the components must be available for them to interact correctly. There is also a configuration, which describes how the individual components are linked to each other.

If you want to run several applications on one machine in parallel, version conflicts quickly arise. An application may require a version of a component that is incompatible with another application. In the worst-case scenario, each application would have to run on its own physical machine. On top of that, physical machines are expensive and cannot be scaled easily. So if an application's resource requirements grow, it may need to be migrated to a new physical machine.

Another problem arises from the fact that software under development is used in different environments. A developer writes code on the local system and runs it there for testing. The application goes through several test stages before going into production, including a test environment for quality assurance or a staging environment for testing by the product team.

The different environments often exist on different physical machines. There are almost always differences in the operating system, library, and configuration versions. How can you reconcile all of them? If the environments differ from each other, tests lose their meaning. Furthermore, a system must be replaced if it fails. How can you ensure consistency? It is hard to deal with these problems on physical machines.

Virtual machines as a step in the right direction

The aforementioned problems related to physical machines led to the rise in popularity of virtual machines (VMs). The basic idea is to integrate a layer between the hardware and the operating system, or between a host operating system and guest operating systems. A VM uncouples the application environment from the underlying hardware. The specific combination of an operating system, application, libraries, and configuration can be reproduced from an image. In addition to completely isolating an application, this allows developers to bundle several applications in an “appliance”.

VM images can be moved between physical machines, and multiple virtualized operating systems can be run in parallel. This ensures the application is scalable. However, operating system virtualization is resource intensive and is overkill for simple use cases.

The advantages of container virtualization with Docker

The images used in container virtualization do not include their own operating system kernel. This makes container virtualization more lightweight, while providing nearly as much isolation as VMs. A container image combines the application code with all the required dependencies and the configuration. Images are portable between systems, and the containers built on them can be reproduced. Containers can be used in various environments, such as development, production, testing, and staging. Layer and image version control also provide a good deal of modularity.

Let's summarize the key benefits of Docker-based virtualization of applications as opposed to using a VM. A Docker container:

  • does not contain its own operating system and simulated hardware
  • shares an operating system kernel with other containers hosted on the same system
  • is lightweight and compact in terms of resource usage compared to a VM-based application
  • starts up faster than a virtual machine
  • can be run in multiple instances of the same image in parallel
  • can be used together with other container-based services via orchestration
  • is ideally suited for local development