One of Docker’s most enticing features is rapid application deployment: once we generate a Docker image, it contains the application itself and all of its runtime dependencies. The heavy lifting of deployment (configuring the OS, upgrading packages, installing libraries, setting up the web server, and maybe an application server) is done only once. After the resulting Docker image is tested and verified to work, it gives us fast, reproducible deployments in any environment where Docker is running. In theory, and pretty much in practice, deployment is as easy as pulling down a Docker image and starting it, reducing wait time from minutes to mere seconds.
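In practice, that boils down to two commands (the image name and port mapping below are placeholders for illustration, not Dtk’s actual values):

# fetch the pre-built image from a registry
docker pull example/myapp:1.0
# start it as a background container, exposing the app’s port
docker run -d -p 8080:8080 example/myapp:1.0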
Dtk is one of the projects we’re working on, and it has a complex deployment procedure with a lot of moving parts; deployment-from-scratch times of over 20 minutes are not unusual. With all of that in mind, Docker seemed like a great fit for Dtk. The first obstacle we ran into while working on the ‘dockerization’ was Puppet: the whole deployment procedure was written in it. We saw this as a problem mostly because Docker the tool, and the community surrounding it, are pretty opinionated (not that there’s anything wrong with that). One of those opinions is that configuration management tools like Puppet should be used only to manage the hosts where Docker runs (if at all). Dockerfiles, which describe and build Docker images, primarily use shell commands or scripts, even though technically any technology or language, including Puppet, can be used to assemble the images.
At this point, we had two options: either invest considerable time and energy into converting our Puppet-based procedure to bash commands and testing the result, or reuse the existing, battle-tested Puppet code to build the images. Encouraged by the examples we found while researching, we went with the second option.
Preparing to build the Docker image began with identifying which parts of the application were more ‘stationary’ than others. Those are usually OS packages for databases, servers, and other runtime dependencies, along with their configurations, which don’t change very often; they go into our ‘base’ image. The base image is not built frequently, usually once per release. Its main purpose is to speed up the final image build, because the base can be cached and reused. The final application image is built on top of the base one. Luckily, our Puppet code was already separated into four logical ‘stages’: three took care of the stationary parts, and the fourth set up the application code and its dependent libraries (Ruby gems, to be more specific).
The simplified version of the Dockerfile for the base image looked something like this:
FROM ubuntu:14.04
MAINTAINER dduvnjak <[email protected]>

ADD docker/install_requirements.sh /
RUN /install_requirements.sh

# copy puppet modules and manifests to the image
RUN mkdir -p /etc/puppet/modules
COPY dtk-provisioning/modules /etc/puppet/modules
COPY docker/manifests /tmp/manifests

# apply the puppet manifests
RUN puppet apply --debug /tmp/manifests/stage1.pp
RUN puppet apply --debug /tmp/manifests/stage2.pp
RUN puppet apply --debug /tmp/manifests/stage4.pp
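Building and tagging the base image is then a once-per-release step. A rough sketch, assuming the Dockerfile above lives at docker/Dockerfile.base (the path is hypothetical; the image name and tag are the ones we use):

# build the base image from the Dockerfile above and tag it
docker build -f docker/Dockerfile.base -t getdtk/baseimage:0.15 .
# push it to a registry so later builds can pull it instead of rebuilding
docker push getdtk/baseimage:0.15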
This base image (getdtk/baseimage) includes Ruby/RVM, a configured Nginx, Passenger, and so on, but it’s missing the application code. We’ll add that part while building our final image, which takes a lot less time than the base (a good thing, because we build it every time a new commit is pushed to the git repo).
A simplified version of the final image (getdtk/dtk-server) Dockerfile:
FROM getdtk/baseimage:0.15
MAINTAINER dduvnjak <[email protected]>

# this puppet manifest will set up the application code
RUN puppet apply --debug /tmp/manifests/stage3.pp

# do some cleanup
RUN apt-get clean && apt-get autoclean && apt-get -y autoremove
RUN rm -rf /etc/puppet/modules /tmp/* /var/lib/postgresql/ /var/lib/apt/lists/* /var/tmp/*

# kick off the main startup script
CMD ["/addons/init.sh"]
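The per-commit build only has to apply stage3 on top of the cached base layers, which is what makes it fast. A minimal sketch of what a build pipeline might run (the Dockerfile location and published port are illustrative assumptions):

# build the application image on top of getdtk/baseimage:0.15
docker build -t getdtk/dtk-server:latest .
# start the server; the CMD above kicks off /addons/init.sh
docker run -d -p 80:80 getdtk/dtk-server:latest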
These Dockerfiles, with a little help from a few bash scripts, gave us a new deployment option that was a lot quicker (from over 20 minutes down to about a minute) and much more reliable. Plus, we kept our beloved Puppet code. In fact, it worked so well that it phased out our regular deployment method, and we went full-on Docker.
What have we learned?
– Docker is awesome
– Docker plays nice with Puppet
– If you’re already using configuration management tools, you might not have to throw everything away to move to Docker
In case you’d like to try some of these examples, feel free to take a look at Dtk’s GitHub repo at dtk/dtk-server. And if you want to learn more about Dtk (short for DevOps Toolkit), which is an awesome open source tool itself, you can find a getting started guide, documentation and examples over at dtk.io.