Docker for Development at nine.ch

nine Team Jul 24, 2017

At nine.ch, we develop many tools to automate our business and to efficiently interact with our customers. These services usually communicate with at least one other service. In most cases, they need to access data in a database or request data from a remote service.

Developing such services is a challenge, as, ideally, a software engineer has all dependent services running locally. Plus their dependent services, and so on.

Previously, we relied on a rather complex setup of Vagrant-based virtual machines, which were provisioned by Puppet, and on Boxen for managing local dependencies such as databases. Getting this up and running usually took quite some time and was error-prone. Not only did we often forget to launch a dependent service, but keeping the virtual machines and our Puppet scripts up to date was never a pleasure, to say the least. Things also broke quite often with macOS updates. Besides that, provisioning the VMs took quite some time.

All of this resulted in a situation where there were constantly differences between each developer’s setup. Time and time again, those differences became so significant that what worked on one developer’s machine did not work the same on another developer’s computer. (And certainly did not work the same on Jenkins.)

Our new setup is now almost completely based on Docker and Docker Compose. Even though we currently do not use these technologies for our production services, we were able to eliminate a lot of our pain points. The next few paragraphs describe in detail how our new development environment is built, so that developing our Ruby-based applications is efficient and reliable.

Although we did this with Ruby in mind, the solution at hand is not limited to Ruby development at all. It is applicable to most other software stacks as well, such as Python, Java or even PHP. The setup can be implemented on Linux with native Docker, on macOS with Docker for Mac and most probably also on Windows using Docker for Windows (though we have not tested this one yet). We use the Docker Community Edition.

Managed Environments

Our hopes for the new development setup included reducing the ramp-up time for new developers, being able to test complex dependencies between our services with ease, and reducing the friction between the (usually macOS-based) developer workstations and our staging and production systems (all of which are Linux-based).

To achieve our goal, we identified three pain-points to tackle:

  • Varying software versions and configurations across systems

  • Managing all the involved dependencies

  • Varying execution platforms

All of this was to be achieved without compromising on efficiency or on established development techniques (e.g. leveraging the hot-reload feature of frameworks that support it).

At the time we began to look for a solution to these challenges, Docker had already passed the peak of its hype phase and was slowly graduating into a reliable and recognized product. Besides personal experiments, no member of our team had worked intensively with Docker before. Most hadn’t even written a single Dockerfile. This was about a year ago and about to change quickly.

The Solution

We started by exploring and experimenting with different styles and combinations for building our Dockerfiles. Most of us started reading all those “Don’t Do This” and “Do That” blog posts about Docker and eventually read a book or two on the topic. We then had a very heated discussion about “the ultimate way” to structure our projects.

That’s when we agreed to our simple but powerful solution:

  • docker-compose build --pull must wrap the application under development in a Docker image.

  • docker-compose up app starts the service at hand and all its dependencies. No additional source code must be checked out. When “app” is up, everything the application needs is running and initialized.

  • docker-compose -f docker-compose.test.yaml run --rm app starts just the bare minimum required to run the tests and then runs all of them. An exit code of zero indicates successful test execution; a non-zero exit code indicates a failed test. (The resulting day-to-day workflow is sketched after this list.)

  • We don’t yet run our services in production with Docker, so “small” and “optimized” Docker images are not yet our priority. But every service is encapsulated in its own Dockerfile and only one process is allowed per Docker container.
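
Taken together, the day-to-day cycle under these conventions boils down to a handful of commands. A sketch (the exact invocations depend on the project):

# Typical development cycle (sketch)
docker-compose pull                                       # fetch the latest images of all dependencies
docker-compose build --pull                               # (re)build the image of the app under development
docker-compose up app                                     # launch the app together with all its dependencies
docker-compose -f docker-compose.test.yaml run --rm app   # run the test suite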

To successfully implement the basic rules above, some other rules have been agreed upon:

  • The main application, i.e. the one currently under development, shall be called app.

  • The app states its immediate dependencies. Each dependency states its dependencies in turn, and so on.

  • CI looks for a file called docker-compose.test.yaml and uses it to run the tests instead of any other mechanism.

  • If a service consists of multiple processes, run the same image multiple times with different commands (see the sketch after this list). Remember: One process per container!

  • Use environment variables for configuration.

  • If databases have to be initialized, this is done automatically.

  • When a service depends on another service in order to launch successfully, such as a database or message broker, it waits for that other service to be available before launching.
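
To illustrate the “same image, different commands” and “environment variables for configuration” rules, a service with a web process and a background worker could be described roughly like this (the worker command, the QUEUE_URL variable and the redis service are made-up examples, not part of our actual setup):

# excerpt of a docker-compose.yml (illustrative sketch)
version: '3'
services:
  app:
    build: .
    command: "bin/rails server"
    environment:
      - QUEUE_URL=redis://redis:6379
    depends_on:
      - redis
  worker:
    build: .                       # the same image as app...
    command: "bin/rake jobs:work"  # ...but a different command: still one process per container
    environment:
      - QUEUE_URL=redis://redis:6379
    depends_on:
      - redis
  redis:
    image: redis:3.2-alpine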

Versions and Configurations

Probably the biggest challenge we faced was to manage the versions of the runtime environment, libraries and external dependencies such as databases.

Previously, we were using Boxen to manage all the services that had to be running on our machines. Our own applications came with a Procfile to start everything that shipped with the application. Any further dependencies had to be started manually, be it those installed on our machines themselves or those that had to be checked out separately. And some of our applications even came with a Vagrant machine that might have had to be launched as well.

So before running any application in development, we constantly had to ask ourselves: Is my local machine up to date? Have I checked out the latest version of all dependent services, and are all of them running? Have I updated and fully started all of the dependencies further down?

Nowadays, we just run docker-compose pull, and it will take care of fetching all the required dependencies in the right version. We pin the version of our dependencies, so we always get the expected one. For example, when we require PostgreSQL 9.5, we ask for postgres:9.5-alpine and always get exactly that. And if someone decides to update to PostgreSQL 9.6, all developers will get the new PostgreSQL version as soon as they pull the latest changes to the docker-compose.yml file from the repository. Now we can even have PostgreSQL 9.5 and 9.6 running in parallel! In fact, this is something we do for the integration tests of our nine-manage-databases tool.
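
As a sketch of how two PostgreSQL versions can run side by side (the service names here are made up for illustration):

# excerpt of a docker-compose.yml (sketch)
services:
  postgres95:
    image: postgres:9.5-alpine   # pinned version, every developer gets the same one
  postgres96:
    image: postgres:9.6-alpine   # a second version simply runs in its own container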

Dependencies

The most important improvement is our “one command to launch everything” primitive, docker-compose up app. Most of our tools can be launched this way now. Exceptions are CLI tools, i.e. tools that aren’t supposed to run permanently. But even those can profit from the new command.

Let’s look at an example: We have a CLI tool called nine-manage-vhosts that enables our customers to easily create new virtual hosts and request a Let’s Encrypt certificate. Previously, we had no way to test the process of requesting such certificates other than deploying a new version of the CLI to a server that’s reachable from the internet. Now, docker-compose up will launch Boulder, the official Let’s Encrypt server, in a Docker container. All of a sudden, we were able to test the whole process of requesting and revoking Let’s Encrypt certificates right from our developer machines, even when we’re offline! We were even able to write integration tests which create a new vhost and request a certificate from the dockerized Let’s Encrypt test server. These tests now run on our CI server during every build.

With the old setup, when developing an application that depended on another application of our own, we were forced to check out the code of that other tool on our machines, because there was no other way of launching it. This also meant that we had to update all the repositories of the dependencies before starting the development of a single application. Nowadays, when we build the Docker images for the tests on our CI server, we also push those images to our private Docker registry.

When launching an application, Docker will take care of fetching all the dependencies from the registry. We still have to remember to update the dependencies because of Docker’s aggressive caching. But now it’s just a docker-compose pull instead of cd’ing into all the repos and running git pull --rebase in each of them.

Note: One important thing to get right on your CI is to always pull fresh copies of all base images from Docker Hub, so that you always get the latest security updates. Just add the --pull argument like this: docker-compose build --pull. You should also rebuild all your Docker images on a regular basis for the same reason!
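
On CI, this roughly translates into a sequence like the following (a sketch; it assumes a compose version that supports docker-compose push and an image name pointing at the private registry, as in the showcase below):

# CI build step (sketch)
docker-compose build --pull                               # rebuild, always refreshing the base images from Docker Hub
docker-compose -f docker-compose.test.yaml run --rm app   # run the tests; a non-zero exit code fails the build
docker-compose push app                                   # push the freshly built image to the private registry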

Platforms

As already mentioned in the introduction, our previous setup was very specific to macOS. By using Vagrant we tried to cover the differences between the Linux and the macOS world. Boxen installed dependencies locally that were available for macOS, such as PostgreSQL, MariaDB or Redis.

We usually just launched our applications on macOS directly, because launching a VM for any application would’ve taken a lot of time, so we only did that when it was absolutely necessary.

Often enough this way of developing, i.e. running the apps under macOS, led to surprises on our CI in the best case, or on our staging or even production environments in the worst case. All of these environments are Linux-based, after all.

Because of Docker’s aggressive caching, once an application is packed into an image, the cost of rebuilding it again and again is very low. A build usually does not take any longer than updating the Ruby dependencies took on our machines anyway. Furthermore, some dependencies required us to have huge libraries such as Qt/WebKit installed locally, which took a lot of space and had to be kept up to date as well.

Nowadays, we rarely run our applications directly on macOS anymore. It’s just too much effort making sure that every dependency is installed and running.

Efficient Development

We talked a lot about improving the ramp-up time when starting (or resuming) development on an application. But as already mentioned in the introduction, despite all the benefits in getting up and running, we must not sacrifice efficiency during the development work itself. The reason for this is simple: We usually spend more time implementing a new feature than getting ready to develop it.

One of the most crucial functionalities for us is the ability to change code and immediately see the effects in our application without having to manually reload / re-initialize the application. Ruby on Rails can automatically reload the application when certain files change. (Such concepts are also known as Hot Reload and are available in other programming languages and frameworks as well.)

When Ruby on Rails runs in a Docker container, it doesn’t have access to the original files by default, so it wouldn’t notice when they change either. To make that possible, we must understand Docker’s concept of volumes. Volumes can be mounted anywhere into a Docker container. There are several kinds of volumes, but of particular interest for this case are host-mounted volumes (bind mounts). These are basically a folder that is mounted from the host system into the container. In other words, the files in such a mounted folder are shared between the host system and the Docker container: when I change a file on the host filesystem, the change is reflected inside the Docker container, and vice versa.

Leveraging those mounted folders allows us to change the application’s code outside of the container and have the changes instantly available inside the container.
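
A quick way to see such a mount in action, using the plain ruby:2.4 image (just an illustration, not part of our setup):

# Mount the current working directory into the container under /app (sketch)
docker run --rm -it -v "$(pwd)":/app -w /app ruby:2.4 bash
# Files edited on the host now appear immediately under /app inside this shell, and vice versa.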

Look into the showcase application to see an example of such mounts.

Tripwires

It has also been mentioned that Docker changed fast in the past few years and has only just begun to stabilize. For us, this means that there are still some rough edges here and there. A few of those are worth mentioning:

Mount Carefully

We often ran into the problem that something from outside of the container was mounted into the container and overwrote key files in there.

For example, a developer launched Rails directly on his machine and later launched the same Rails application with Docker. But he forgot that the local directory is mounted into the Docker container: the Rails application in the container did not launch because it found the PID file of the application still running outside of the container. This was unexpected for that developer, and he had to adjust the mounted paths.

Lesson learned from this: Mount carefully and only what you need!
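
One common mitigation for this particular case (not necessarily what we did back then) is to remove a stale PID file in the entrypoint before launching Rails:

# entrypoint excerpt (sketch): clean up a PID file left over from a run outside of the container
rm -f /app/tmp/pids/server.pid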

Sync Problems with Volume Mounts in Docker for Mac

When a mount contained a lot of files, for example when it included the logs directory, the sync between the host system and the container was very slow. This killed the I/O performance in the container and the containerized application slowed down drastically. (A page load which usually takes a few milliseconds suddenly took seconds to complete.)

The lesson was not to blindly mount every folder of an application, but only those that are required (mostly the folders containing source code and assets).
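
One way to express this in a docker-compose.yml is to mount the source tree but mask the heavyweight directories with container-local volumes (a sketch of the idea, not necessarily our exact configuration):

# excerpt of a docker-compose.yml (sketch)
services:
  app:
    volumes:
      - .:/app     # source code and assets are shared with the host...
      - /app/log   # ...while log/ and tmp/ stay inside the container,
      - /app/tmp   # so the slow host/container sync doesn't have to handle them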

Note: This is a known problem that applies to Docker for Mac and Docker for Windows, but not to Docker on Linux.

Learning Curve

For most of the team, getting into Docker and its concepts wasn’t trivial. This argument probably won’t be valid in a few years’ time, but currently you can’t expect every software engineer to have already worked with Docker.

And even though the documentation got a lot better, it’s still hard to grasp the concepts at first.

Best Practices

It has been mentioned a few times that Docker is a young technology that still changes fast. We arrived at some “Best Practices” that work for us, most of which have been mentioned in this article. Furthermore, Docker maintains a set of best practices for writing Dockerfiles.

Not all advice you get in blog posts is valid, or still valid. Blog posts written only a few years back may give you advice that is totally invalid today. (This blog post might not be an exception!) And there are still a lot of “myths” out there! So watch out and verify what you read against the official documentation, or conduct your own experiments.

Also, keep in mind that we don’t yet deploy our applications to production using Docker! Getting there will take some additional effort and we’ll have to go over most of our current Docker practices.

Docker Registry

In the beginning, we thought that having our own Docker registry would not be necessary. Soon enough, we proved ourselves wrong. Maintaining a Docker registry is actually fairly easy, depending on your needs. As we don’t deploy from our registry, high availability is not a must, and neither are backups. (We can repopulate our registry from the sources at any time.)

Yet, Docker images can grow huge, and cleaning out your registry once in a while is highly recommended!

Tell Docker to be More Ignorant

This tip will be very helpful if you don’t know about it already! When running docker build, Docker takes the entire content of your directory and sends it to the Docker daemon as the build context. If your application contains a lot of content (e.g. node_modules, logs, compiled assets) that should not be part of the final Docker image, add those files and folders to the .dockerignore file. This will speed up your Docker builds significantly.
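
For a typical Rails project, such a .dockerignore might look roughly like this (adjust to your own project):

# .dockerignore (sketch)
.git
log/
tmp/
node_modules/
public/assets/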

Showcase

I’ve created a small showcase project which contains a simple Ruby on Rails application that connects to a database.

The first step is always to write a Dockerfile. It’s not spectacular, but there are a few tidbits already.

# /Dockerfile
FROM ruby:2.4

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -q &&  apt-get -qq install -y \
  # required by our "waitforpg" script
  postgresql-client \
  # required by rails, see https://github.com/rails/execjs
  nodejs nodejs-legacy npm \
  # always clean up after the work is done
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Copy helper scripts
COPY docker/* /usr/bin/

# Get ready for the app
RUN mkdir -p /app
WORKDIR /app

# Copy Gemfile & Gemfile.lock separately,
# so that Docker will cache the 'bundle install'
COPY Gemfile Gemfile.lock ./
RUN bundle install --jobs 2

# Copy the application
COPY . ./

# We've copied the entrypoint script to /usr/bin above
ENTRYPOINT ["/usr/bin/entrypoint"]

# Finally, we'll start the development server by default
CMD ["bin/rails server"]
  • On line 14 (COPY docker/* /usr/bin/) all our helper scripts are copied right into /usr/bin. This makes them available on the whole system without prefixing any path. Just make sure that they have the executable flag set, i.e. chmod +x scriptname!

  • On line 22 we copy the Gemfile and Gemfile.lock files, so that bundler can run (on line 23) before the whole application is copied into the Docker image. Docker caches this step, so it will not be performed again unless Gemfile or Gemfile.lock change. This saves a lot of time when rebuilding the Docker image.

  • On line 29 we put a custom entrypoint script in place. It makes sure that our main CMD only launches once the PostgreSQL server accepts connections and the database is migrated to the latest version.

There’s only one thing that is really important in the entrypoint script, and that’s the exec instruction on the last line. It makes sure that the entrypoint process is replaced with the process given in CMD. If this is missing, the process given in CMD will be launched as a subprocess and signal handling might not work as expected.

#!/bin/sh
# entrypoint.sh

waitforpg
setup_pg postgres

# Replace this shell with the process given in CMD, so that signals are handled correctly
exec "$@"
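
The waitforpg and setup_pg helpers are not shown in this post. A minimal sketch of waitforpg, assuming the postgresql-client tools installed in the Dockerfile and the usual PG* environment variables, could look like this:

#!/bin/sh
# waitforpg (sketch): block until the PostgreSQL server accepts connections
until pg_isready -h "${PGHOST:-postgres}" -q; do
  echo "waiting for postgres..."
  sleep 1
done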

Most of our “magic” happens in the docker-compose.yml file anyway:

version: '3'
services:
  app:
    build: .
    image: docker-registry.internal/ninech/docker-example
    volumes:
      - .:/app
    ports:
      - 3000
    depends_on:
      - postgres
    environment:
      - POSTGRES_PASSWORD=frankenstein
  postgres:
    image: postgres:9.5-alpine
    environment:
      - POSTGRES_PASSWORD
  • On line 4 (build: .) we tell docker-compose to build the app from the current directory.

  • On line 7 (- .:/app) we tell docker-compose to mount the current directory as the /app directory in the container. Therefore, all changes to files outside of the container are synchronized into the container.

  • On line 10 (depends_on:) we list all the dependencies of app. docker-compose will automatically launch these dependencies.

  • On line 17 (- POSTGRES_PASSWORD) we define that POSTGRES_PASSWORD shall be passed to the postgres container as an environment variable. As we don’t specify a value there, docker-compose will look the value up, which in this setup is the one set on line 13.

In comparison, the docker-compose.test.yaml file is very simple:

version: '3'
services:
  app:
    build: .
    image: docker-registry.internal/ninech/docker-example
    entrypoint: "/usr/bin/test_entrypoint"
    command: "bin/rails test"

It’s that short because the application uses SQLite for running the tests, and therefore PostgreSQL is not needed.
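
The test_entrypoint helper is not shown in the post either. Since there is no PostgreSQL to wait for, a minimal sketch could simply hand control over to the test command:

#!/bin/sh
# test_entrypoint (sketch): nothing to wait for, just replace this shell with the test command
exec "$@"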

The whole application can be forked on GitHub.

Conclusion

Our new setup allows us to quickly get to work. It enables us to onboard new hires faster, as they basically just need to get git, an editor and Docker running on their machines and then they’re ready. Additionally, development works the same across platforms, and each engineer can now choose their operating system of choice without compromise.

Getting there takes time. It’s a gradual process: Whenever we started developing on an application that was not yet Dockerized, Dockerizing it was (and still is) the first thing we did. We became fairly good at this, but in the beginning it took days for each application!

It’s crucial to establish a common understanding of how to work with Docker, because there aren’t any commonly accepted standards yet. Getting there required us to revisit previous decisions again and again. Eventually we agreed on how ‘we do Docker’, at least for now.

Tripping over the issue that mounting volumes containing a lot of files slows down the containerized application in Docker for Mac painfully reminded us that Docker is still in its early days. But things improve fast and Docker matures at a steady pace.

Further Work

We definitely want to have Docker running our production workloads soon. We’ve already started to look into container management platforms, of which OpenShift is currently our favorite.

Because Docker images are very static by nature, we’re also looking into automatically scanning our images for vulnerabilities. But for now, we simply ensure that we rebuild our images often.


 

Would you like to use Docker for your application?

At the TechTalkThursday in March 2018, David spoke about «Docker for Developers» and gave an overview of the use of Docker as a container technology. The video presentation, including the transcript, is available under the following link.

Watch the video presentation now