tl;dr: Docker offered great convenience to developers, but amplified long-standing operational challenges, especially around scaling, discovery, and managing persistent data. I focused the team on working closely with customers, using meetups as focus groups, and treating every conversation as a research opportunity to understand the problems people faced.
Based on those conversations, leveraging experience from across the company—including support and solutions engineers—and through iterations of prototyping and production releases with customers, we identified patterns to address these challenges. Some existing solutions to these challenges faced unique constraints on Joyent’s Docker platform. Those constraints forced clearer thinking about the separation of concerns between infrastructure and application operations—wherever the containers run—that we came to call the Autopilot Pattern.
I’ve been on a bit of a mission lately to simplify operations and demystify “orchestration.”
Some people make orchestration sound like magic, but when I press them on it they describe the operational activities that need to happen over the lifecycle of their applications. Examples include registering containers with the load balancer as they scale up and de-registering them as they scale down, but the list expands from there.
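To make that concrete, here's a minimal sketch (not from any of the posts linked here; the `LoadBalancerPool` and `scale` names are invented) of the register/de-register bookkeeping that orchestration performs as instances come and go:

```python
class LoadBalancerPool:
    """Minimal stand-in for a load balancer's backend pool."""

    def __init__(self):
        self.backends = set()

    def register(self, address):
        self.backends.add(address)

    def deregister(self, address):
        self.backends.discard(address)


def scale(pool, running, desired_addresses):
    """Reconcile the pool against the set of instances we want running."""
    desired = set(desired_addresses)
    for addr in desired - running:   # scaling up: add new backends
        pool.register(addr)
    for addr in running - desired:   # scaling down: drop old backends
        pool.deregister(addr)
    return desired


pool = LoadBalancerPool()
running = scale(pool, set(), {"10.0.0.1:80", "10.0.0.2:80"})  # scale up to 2
running = scale(pool, running, {"10.0.0.1:80"})               # scale down to 1
print(sorted(pool.backends))  # ['10.0.0.1:80']
```

In real deployments the pool is Nginx, HAProxy, or a cloud LB, but the lifecycle choreography is the same.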
A few slides about orchestration magic and running Docker in production from my talk on learning devops from sci-fi movies (though they really are better with the narration):
Separation of concerns
Almost everybody seems to have different workflows, processes, and touchpoints, and many of those are constrained or shaped by policies, legacy, and priorities. The Autopilot Pattern seeks to simplify that magic by putting it in context to the lifecycle of application components and drawing a bright line between the separate concerns of the infrastructure and application.
That was a key point in my post on app-centric micro-orchestration. Yeah, I choked on that title too, but it’s the shortest and most descriptive name I could think of.
By making our applications effectively stateless to the scheduler, we can make even the most complex applications simple enough to manage just by setting the scale of their components.
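A sketch of what "setting the scale" means when components are stateless to the scheduler: management reduces to reconciling a desired replica count, and any replica is as good as any other. All names here are hypothetical.

```python
def reconcile(current_ids, desired_count, next_id):
    """Return (ids_to_start, ids_to_stop) to reach desired_count replicas."""
    current = list(current_ids)
    to_start, to_stop = [], []
    while len(current) + len(to_start) < desired_count:
        to_start.append(f"web-{next_id}")  # replicas are interchangeable
        next_id += 1
    while len(current) > desired_count:
        to_stop.append(current.pop())      # so any one can be stopped
    return to_start, to_stop


print(reconcile(["web-1", "web-2"], 4, 3))  # scale up: start web-3, web-4
print(reconcile(["web-1", "web-2", "web-3"], 1, 4))  # scale down: stop two
```

The moment a component carries state the scheduler must know about, this simple loop stops being sufficient, which is what the rest of this post is about.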
Separating the concerns of infrastructure and application operations/orchestration dramatically simplifies both, but two common challenges are persistence and discovery.
How do you make stateful apps stateless, or handle all the operational bits that often get read from disk? That’s where my post on persistent storage patterns for cloud applications comes in:
The conventional wisdom is that containers are great for stateless applications, but inappropriate for stateful apps with persistent data. If this is true, it’s not because the technology isn’t up to it, it’s because the patterns for managing persistent data and stateful applications are not commonly understood. The challenge isn’t so much about how to persist state, but how to do so without compromising the agility and automation that we love about containerization in the first place.
That post goes on to dissect the different persistence needs we have in our apps—from configuration, secrets, databases, and shared blobs—and patterns for how we manage those in modern apps.
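One of the simplest of those patterns, sketched here with invented keys: treat configuration as an injected input (the environment) rather than state the container reads from its own disk, falling back to defaults baked into the image.

```python
import os

# Defaults baked into the image; nothing is written to the container's disk.
DEFAULTS = {"DB_HOST": "localhost", "DB_PORT": "3306"}


def load_config(environ=os.environ):
    """Environment variables override image defaults."""
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}


print(load_config({"DB_HOST": "db.example.com"}))
# {'DB_HOST': 'db.example.com', 'DB_PORT': '3306'}
```

Secrets, databases, and shared blobs each need different patterns (a secret store, replication, object storage), but the common thread is keeping state out of the container image itself.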
The following slides are excerpted from a talk I did at DockerCon:
One item from my narration of that talk: upgrading databases through replication exercises the same mechanisms used to recover from node or storage failure. That significantly improves operational stability and makes a lot of other operational activities easier, like moving DBs to hosts with different performance characteristics.
Those advantages are not without drawbacks, especially at scale. Features to support provisioning a new application disk image on a host without replacing any data volumes would be hugely valuable for modern operations, and it’s a proposed feature for Triton.
Much earlier, I dug into the issues of discovery in distributed applications, making the case that discovery and connection management are core concerns that affect the app’s failure modes and consistency behavior:
Passive discovery patterns are those that separate the application from these decisions, leaving the application passive in both the choice of what back ends to connect to, and passive in resolving failures that may result. Active discovery patterns move those decisions into the application so it can have an active role in choosing the backend and working around failures it may encounter.
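The contrast in that quote can be sketched in a few lines. The registry contents, addresses, and failure model here are all invented for illustration:

```python
# A stand-in for a service registry (Consul, etcd, etc.).
REGISTRY = {"db": ["10.0.0.5:3306", "10.0.0.6:3306"]}


def passive_connect(connect):
    """Passive: always dial a fixed name; something external (a load
    balancer, a VIP) chooses the backend and resolves failures."""
    return connect("db.internal:3306")


def active_connect(connect, service="db"):
    """Active: query the registry, choose a backend, and work around
    failures by moving on to the next candidate."""
    last_error = None
    for backend in REGISTRY[service]:
        try:
            return connect(backend)
        except ConnectionError as err:
            last_error = err  # this backend failed; try another
    raise last_error


def flaky(addr):
    """Simulated dialer: one backend is down."""
    if addr == "10.0.0.5:3306":
        raise ConnectionError(addr)
    return f"connected:{addr}"


print(active_connect(flaky))  # connected:10.0.0.6:3306
```

The active client sees and handles the failure itself; the passive client's failure modes depend entirely on whatever sits behind `db.internal`.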
Good patterns for solving discovery are not yet widespread, but the solutions are accessible. And an increasing number of people are working on second-order discovery problems including versioned service mesh.
Over a number of prototypes (example: automating Couchbase) and work with countless customers to automate their apps, we identified repeatable patterns that improved operational outcomes, team velocity, and scalability. To validate and demonstrate the pattern, we took on MySQL automation, and Tim Gross shared the details at Velocity. We picked MySQL specifically because of the challenges it poses to orchestration in modern cloud contexts.
The pattern also lends itself to shared tooling, and ContainerPilot makes it easier to trigger behaviors in response to application lifecycle events.
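This is not ContainerPilot's actual interface, but a sketch of the idea it implements: run user-supplied behaviors when lifecycle events fire. The event names echo ContainerPilot's preStart/health/onChange hooks; the commands are invented.

```python
import subprocess

# Hypothetical mapping of lifecycle events to shell commands.
HOOKS = {
    "preStart": "echo rendering config from the environment",
    "health": "echo checking that the app is serving",
    "onChange": "echo reloading upstreams after a backend change",
}


def fire(event):
    """Run the hook registered for this lifecycle event, if any."""
    command = HOOKS.get(event)
    if command is None:
        return None
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


print(fire("onChange"))
```

Putting those behaviors inside the container, next to the application, is what keeps the infrastructure side of the line simple.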
More selected Autopilot Pattern implementations:
- Nginx with automated SSL with Let’s Encrypt
- ELK stack on Autopilot
- Node.js microservices and video training
- Secret management using Hashicorp Vault
- Jenkins, but Mandy Hubbard did it better with presentation and video
- Consul, etcd, Redis, and MongoDB, among many others in GitHub
It’s also great to see the community participation.