Docker 101 - Containerizing our first application.
August 30th, 2017 · Estimated reading time: 35 minutes.
In this post we will learn what Docker is, how to use it and when it is appropriate to use it, and we’ll build our first Docker container based on a real-world production scenario.
Containers can be conceptually thought of as an advanced version of chroot or a lighter alternative to virtualization.
A container is an isolated user-level instance of the OS. From the point of view of the applications running inside these instances the container behaves exactly like a real computer, and the application will have access only to the resources that are explicitly assigned to it.
Containers share the running kernel and system call interface of the host OS, which makes them significantly less demanding than traditional virtualization resource-wise. The downside is that you can’t run an application in a container that requires a different kernel than the host’s.
Pros and cons of using containers.
Using containers has a few immediate advantages:
We can package applications along with their dependencies, creating a portable version of the app and eliminating the dreaded “but it works on my machine”-scenario. This helps tear down the proverbial wall between the Operations and Development departments.
It is significantly lighter than traditional OS virtualization, enabling a higher compute density in the same hardware.
It reduces the effort required to maintain the runtime environment, along with its complexity. Since each application can be treated as an opaque, self-contained bundle, the package and its behaviour are the same in testing and in production.
Since the infrastructure is declared as code, we benefit from the advantages of IaC: Infrastructure versioning, code as reliable documentation and straightforward deployment of new environments.
Of course, we must mention the disadvantages too:
Containers run on top of an additional abstraction layer compared to bare-metal, which can introduce a small performance overhead.
Containers share the running kernel with the host. A bug/glitch in the running kernel affects all the running containers.
Docker is a software product that implements container support. Thanks to Docker, we can package applications along with their dependencies and libraries and execute them in an isolated environment.
What containers, and by extension Docker, are not.
Containers are not a general-purpose solution to the application packaging problem. While containerizing an application has many advantages, it will often not be a straightforward process and the time and effort invested might prove too high.
To use an obvious example, would it make sense to distribute Skype in a container-like package? No.
Skype is a monolithic application, and splitting it into separate container-like components makes no practical sense.
It would need access to the host OS’s audio server. With PulseAudio you could work around this by establishing a network audio sink on the host, but that means running an additional audio server.
A container cannot natively communicate with the display server. There are workarounds for this, but they involve either X11 forwarding through the network or sharing the X11 socket and breaking the isolation paradigm.
If Skype stood to benefit from the advantages of containerization (ease of deployment, isolation, incorporation into a continuous integration pipeline), then these tradeoffs might be worthwhile.
Installing Docker
We are going to be using Docker 17.06.1-ce on Ubuntu 17.04.
The first step is making sure we have the tools needed to add HTTPS repositories to our package sources.
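On a stock Ubuntu install, something along these lines does the job (the package list follows Docker’s installation instructions of the time):
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common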
Let’s add Docker’s official GPG key so we can verify packages; afterwards we double-check that the key’s fingerprint matches.
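The key can be fetched with curl and handed to apt-key; at the time of writing, Docker’s key fingerprint ends in 0EBFCD88, which is what we verify with the second command:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88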
There are three Docker release rings: stable, edge and test. Unless you really need functionality or fixes not present in the stable ring, you should stick to the stable production releases.
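Adding the stable repository for our Ubuntu release looks roughly like this:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"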
If you would like to switch release rings, add the word edge or test after the word stable in the previous command.
Now, let’s refresh the package list and install Docker.
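Assuming the repository was added as shown above, that amounts to:
sudo apt-get update
sudo apt-get install docker-ce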
Docker containers currently have to be managed by the root user (or the equivalent in your chosen platform). Due to Docker’s design and capabilities, there is no way around this requirement.
There are plans to eventually remove this restriction, but at the time of writing this post there is no actual roadmap on this matter.
As a result of the above concerns, only trusted users should be allowed to have access to the Docker daemon or any of the related utilities.
Optional but recommended post-install steps.
Removing the need for sudo.
If you don’t want to use sudo when you use the docker command, create a Unix group called docker and add users to it. When the docker daemon starts, it makes its socket read/writable by the docker group.
Warning: The docker group grants privileges equivalent to the root user. For details on how this impacts security in your system, see Docker Daemon Attack Surface.
Creating the docker group and adding your user to it is a straightforward process.
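Two commands take care of it (replace $USER if you are configuring a different account):
sudo groupadd docker
sudo usermod -aG docker $USER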
You will need to log out of the session completely and log back in for the new group membership to take effect.
Configure the Docker daemon to start on boot.
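On a systemd-based distribution such as our Ubuntu host, this is a single command:
sudo systemctl enable docker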
Auto-completion of commands for Docker.
Docker contains a variety of commands, and images and containers are identified by their IDs or, if we manually specify so, by friendly names. It is possible to integrate shell auto-completion with Docker commands and parameters, making our job a little easier.
We are going to implement this for Bash on our Ubuntu 17.04 machine.
First of all, we need the bash-completion package. In the rare case that it’s not already installed in our distribution, we can install it via the package manager.
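On Ubuntu that would be:
sudo apt-get install bash-completion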
The last step is to download the completion script and place it in /etc/bash_completion.d/.
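One possible source is the completion script shipped in the Docker CLI repository; treat the URL as an example, since its exact location may change over time:
sudo curl -L https://raw.githubusercontent.com/docker/cli/master/contrib/completion/bash/docker -o /etc/bash_completion.d/docker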
Every new instance of Bash will make use of the newly extended completion functionality.
Running our very first container, and a look behind the scenes.
Hello World.
To test our Docker installation we are going to run a container supplied by Docker, its only function being displaying a welcome message.
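The image is called hello-world, so running it is as simple as:
sudo docker run hello-world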
If everything is working, the container prints a short welcome message in the terminal explaining the steps Docker took to produce it.
To be able to understand what just happened, we need to take a step back and examine the process of creating a container.
Docker workflow and container lifecycle.
A container is composed of two parts: the base isolated environment in which the application or service will be executed (the image, in Docker parlance), and the application or service itself that we are going to run. Both of these are specified in a Dockerfile, which is nothing more than a plain-text file named Dockerfile.
For example, let’s examine the hello-world Dockerfile.
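It boils down to three instructions (reproduced here from the upstream hello-world image, which may differ slightly between versions):
FROM scratch
COPY hello /
CMD ["/hello"]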
All instructions in a Dockerfile describe how the image is composed, except the CMD instruction, which specifies the command that will be executed when the container is run.
FROM is a reserved keyword that establishes the base environment on which the image will be built. scratch is a minimal Docker environment, but we could specify another Docker image as a base, such as an Ubuntu or CentOS environment.
COPY copies the specified file to the destination, in this case hello to the root of the filesystem.
CMD executes a binary called hello when the container is run.
Let’s examine a more complex example: a runtime environment containing only Python 3.4.3, based on Debian Jessie.
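The official Dockerfile for such an image is fairly long, so here is a simplified sketch that captures its structure; the dependency list and build options are trimmed for brevity and should not be taken as the exact upstream contents.
FROM debian:jessie
# Tools needed to download and compile Python from source (trimmed list).
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates curl build-essential libssl-dev zlib1g-dev && rm -rf /var/lib/apt/lists/*
# Download, build and install Python 3.4.3, then remove the sources.
RUN curl -fsSL https://www.python.org/ftp/python/3.4.3/Python-3.4.3.tgz | tar -xz && cd Python-3.4.3 && ./configure && make && make install && cd .. && rm -rf Python-3.4.3
# Drop into an interactive Python interpreter when the container is run.
CMD ["python3"]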
Be careful not to confuse RUN with CMD. RUN is executed while building the image, and CMD is the instruction that is run when launching the container. Think of CMD as the command that launches your service, and of RUN as an image-building step.
If we build this image, Docker will send everything in the current directory to the daemon as the build context and then execute these instructions.
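Building it from the directory that contains the Dockerfile would look like this (python-3.4.3 is simply the name we chose for the resulting image):
sudo docker build -t python-3.4.3 .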
After the process finishes, we can create a container based on the image and run it.
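Assuming the name from the previous step:
sudo docker run -it python-3.4.3
The -it flags attach an interactive terminal, so we end up at the Python prompt defined by the CMD instruction.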
As stated previously, container storage is volatile by definition. That is, changes live only in the container’s writable layer and are discarded when the container is removed. If we were to run a process that needs persistent storage, like a database server, this would not be ideal to say the least.
Docker offers a solution for this problem: mounts.
Mounts
There are three kinds of mounts, two of which we can use to offer storage that will persist outside of the container.
Volumes are managed by Docker, exist outside of the application’s lifecycle and are portable. They are good for sharing data among containers and are the preferred way of storing data. Volumes are stored on the host filesystem, under /var/lib/docker/volumes.
Bind mounts are traditional folders on the host machine that are made available inside the container on a specific path. They are good for sharing data between the host system and the container.
tmpfs mounts are RAM-backed filesystems that behave exactly like traditional tmpfs mounts and are discarded on shutdown. They should only be used for temporary files, and they offer the speed advantage of being stored entirely in RAM.
Criteria for choosing between volumes, bind mounts or tmpfs.
Note: You may encounter the -v or --volume syntax in older documentation referring to data storage in Docker. This syntax has been deprecated and officially replaced by the --mount syntax, which we will use throughout this post. An extended explanation is available in the official documentation.
Volumes can be shared between containers, are portable and are easier to back up and restore. They can be populated with the appropriate contents on deployment, and can be marked read-only. A common example would be hosting a website and allowing the user to upload modifications via FTP. By sharing the volume, the apache2 process is able to read /var/www from the same place that the FTP server writes to; at the same time we keep everything isolated.
In a Dockerfile you would declare a volume with the following syntax: VOLUME /var/log.
Bind mounts work best in scenarios where we need to share existing host data with the container. For example, we may need to share source code with the container, keep the container’s /etc/resolv.conf synced with the host’s or keep API secrets up to date inside the container. They can also be used to write the container’s logs into specific folders on the host.
Bind mounts cannot be declared from a Dockerfile, since the directories specified are not portable by definition and might not exist on the host machine at runtime. To use a bind mount, you must launch the container with the --mount parameter, as we will see in the example at the end of this post.
tmpfs mounts are best used when the application generates non-persistent state data and we need to eliminate the disk as a potential bottleneck when storing and accessing these files.
Networking.
Although we can deploy our own custom networks, Docker creates three networks automatically: bridge, which is the default network for all containers and is isolated from the host; none, which means that the container will have no network connectivity; and host, which binds the container to the host’s network stack.
host networking means that the container is reachable from outside the host and that the network is not isolated. For instance, a web server that listens on port 80 inside the container is automatically reachable through the host’s port 80.
Docker Engine natively supports two kinds of networks: A bridge network, limited to a single host; and an overlay network, which can span across multiple hosts.
In this post we are going to stick to the default bridge network, to deploy our test application in an isolated fashion.
A real-world scenario: WidgetMaker Inc.
Our company, WidgetMaker Inc., ships embedded devices that run a custom Linux distribution. To deploy our application code to these devices, we use a legacy app that was custom-made for our needs a few years ago. This app requires:
Ubuntu 12.04 with a specific set of libraries and versions to function properly.
The application needs hardware access to a serial device to flash the generated binaries to our physical widgets.
It needs network access to show a web control panel, through which the application is controlled. Security was not a concern at the time of the design, which means a malicious actor could easily access the control panel and interfere with the process.
It will not work properly if multiple instances of the application are run at the same time, meaning that we can only deploy one widget at a time. The current workaround to perform this process in parallel is to have a few physical machines replicating the required configuration and to operate them manually.
Retooling this app, while technically feasible, would be impractical for both budget and time-to-market reasons; our resources are better spent elsewhere.
Integrating this application into the general DevOps pipeline would allow us to save time and money:
The specific environment the app needs is now immutable and easily recreated. If there is a sudden hardware failure, we are minutes away from having a replacement ready to work.
We can isolate the network, and only allow access following secure criteria.
We could automate the deployment process: instead of a human operator triggering the flashing procedure, we could trigger it when the operator plugs the widget into the serial adapter and notify them when the deployment is complete.
We can add automated unit testing to the deployment procedure: Through the use of tools like Jenkins we can make sure that every single widget operates properly after the deployment, and guarantee that the units we ship to our customers are working correctly.
Designing the container.
Determining the requirements.
Ubuntu 12.04, with a series of specific libraries and settings.
Passthrough of /dev/ttyACM0 from the host OS.
Isolated network, with port 80 accessible only from our development machine.
The container needs access to the folder hostOS:/workspace/widgetOS/deployment-image/ to get the latest version of the application’s files for flashing purposes.
A technically better way to do this would be to store the app code and the binaries in a repository that we could clone on launch, ensuring that every component is up to date. Since we want to learn how to inject files and folders in this example, we will not do that here.
Building the base image.
Docker Hub already contains a pre-built Ubuntu 12.04 image, but we are going to build ours manually from scratch.
If you wanted to use the Docker Hub image, you would just run docker pull ubuntu:12.04 to download it. In this particular case the image does come from Canonical themselves, but Docker Hub offers no guarantees and anyone could upload an image and pretend that it is trustworthy.
Since this is a production scenario, it is not recommended to use Docker Hub unless we can fully certify that the image comes unaltered from a trusted source.
We are going to use debootstrap to install a base Ubuntu system into /tmp/precise-pangolin, and then we’ll make Docker use it as a base image before making any modifications.
First we need to import the relevant GPG keys to be able to authenticate the packages. In this particular case, we need to install the package ubuntu-keyring and pass a parameter to debootstrap so that it uses the signing keys for older Ubuntu releases.
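On our Ubuntu host that amounts to something like the following; double-check the keyring path against your own distribution, and note that 12.04 packages have moved to the old-releases archive:
sudo apt-get install debootstrap ubuntu-keyring
sudo debootstrap --arch=amd64 --keyring=/usr/share/keyrings/ubuntu-archive-keyring.gpg precise /tmp/precise-pangolin http://old-releases.ubuntu.com/ubuntu/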
Now debootstrap will download all the base packages. After the process is finished, we import the result into Docker to use it as the base image.
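Importing is a matter of tarring the directory and piping it into docker import; ubuntu:12.04 is the name and tag we chose for the resulting image:
sudo tar -C /tmp/precise-pangolin -c . | sudo docker import - ubuntu:12.04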
If we list the images, we can see that our newly created Precise Pangolin image is ready to use.
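Listing them is a single command:
sudo docker images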
This will become our standard Ubuntu 12.04 base image. We are going to use it as a parent image, and we will customize it to our particular app needs.
Applying the necessary customizations with a Dockerfile.
We are going to write a Dockerfile that incorporates the necessary requirements for our app, to be applied on top of the base Ubuntu image.
Docker builds the image through layers. Think of layers as version control: if you make a change, Docker only has to rebuild from that change onwards, since the previous layers remain unchanged.
Two of our requirements can’t be coded into the Dockerfile: shared folders and shared devices. We will instead satisfy them at launch time.
For the purposes of recreating this scenario, here are the contents of the file requirements.txt, which sits in the same folder as the Dockerfile.
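The real file belongs to our fictional legacy app, so its exact contents are not important; a hypothetical pair of pinned Python dependencies such as the following is enough to follow along:
flask==0.10.1
pyserial==2.7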
The following are the Dockerfile contents, along with comments explaining what the different directives are and how they help us fulfil the app’s requirements.
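What follows is a sketch rather than a verbatim production file: the package list, paths and entry point (deploy-panel.py) are placeholders standing in for the real legacy application.
# Parent image: the Ubuntu 12.04 base we imported with debootstrap.
FROM ubuntu:12.04
# The debootstrap base only knows about the "main" component, so we point apt at a full mirror (12.04 now lives on old-releases).
RUN echo "deb http://old-releases.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
# Install the specific libraries the legacy app depends on (illustrative list).
RUN apt-get update && apt-get install -y python python-pip
# Copy the app into the image and install its Python dependencies.
COPY . /opt/widget-deployer
RUN pip install -r /opt/widget-deployer/requirements.txt
# The web control panel listens on port 80.
EXPOSE 80
# Launch the legacy deployment app when the container starts.
WORKDIR /opt/widget-deployer
CMD ["python", "deploy-panel.py"]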
Building and launching the container.
To build the image, we need to initiate the build process from the Dockerfile directory. We are going to call this image widget-deployer.
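From that directory, the build is a single command (the -t flag assigns the widget-deployer name):
sudo docker build -t widget-deployer .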
Note: The build process sends every file in the current directory to the Docker daemon as the build context, so keep that directory free of anything you don’t want to end up in the image.
Sharing devices and folders
Adding a shared folder between the host and the container with a bind mount.
As we have seen already, bind mounts can’t be specified inside the Dockerfile for portability reasons. Thus, we will specify a read/write mount when launching the container. In our case, we want to share the /workspace folder.
Using the following syntax we can mount a folder inside the container.
--mount type=bind,source=<absolute path on host>,target=<destination on container>
Sharing a physical device with the container.
To allow passthrough access to a physical device we need to specify it at runtime:
--device <device path>
Running the container.
Launching the container is a very straightforward process. Once built, all we have to do is:
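Putting the pieces from the previous section together, the launch command looks roughly like this; we simply publish port 80 on the host here, and restricting access to the development machine is left to binding a specific interface or a firewall rule:
sudo docker run --device /dev/ttyACM0 --mount type=bind,source=/workspace,target=/workspace -p 80:80 widget-deployer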
Once launched, we can inspect the running container’s metadata with the inspect command.
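docker ps gives us the container ID (or its auto-generated name), which we then feed to docker inspect:
sudo docker ps
sudo docker inspect <container-id>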
Our app is now ready to use.
Running more than one instance of the app.
If we wanted to run many instances of the app in parallel, we would need:
An individual serial adapter per instance, passed through with the --device parameter (/dev/ttyACM1, for example).
To forward the exposed web server port to a different port on the host for each instance. For example, we forward container port 80 to host port 1080.
The command line would look like this.
docker run --device /dev/ttyACM1 --mount type=bind,source=/workspace,target=/workspace -p 1080:80 widget-deployer
And examining the list of running containers would show that the port is forwarded.
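Again via docker ps, whose PORTS column reflects the 1080->80 mapping:
sudo docker ps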
Debugging a container: Stepping inside through a shell.
To launch a Bash shell inside the container and inspect its contents, we can run the container and specify a custom command to run.
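For example, overriding the image’s default command with /bin/bash:
sudo docker run -it widget-deployer /bin/bash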
Keep in mind that any manual changes made are lost on container termination, and they do not apply to newly launched instances.
Cleaning the images and temporary files we have created.
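One possible cleanup sequence, assuming the names used throughout this post; adjust it to whatever you actually created:
sudo docker container prune
sudo docker rmi widget-deployer python-3.4.3 ubuntu:12.04 hello-world
sudo rm -rf /tmp/precise-pangolin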
Conclusions.
Containers are a powerful and speedy addition to our DevOps workflow, and while they are not a one-size-fits-all solution, they do offer significant advantages over other application isolation alternatives.
In a future post we will explore how to use Docker Swarm to launch a complex network of containers, including web apps, load balancers and databases.