The questions started around KubeCon San Diego. Maybe because we had just released Helm 3. Or, maybe because a few operator tools had been put up for adoption by CNCF. Whatever the cause, I started receiving questions about Helm and operators. And most of the questions seemed to imply that these two technologies were engaged in an epic duel.
At first, I was bewildered by this comparison. It was as if people were suggesting that this year’s Super Bowl would feature a showdown between FC Barcelona and the New York Yankees. But a few months into the new year, I am still being asked variations of the same question: “Who is going to win: Helm or operators?” This is my answer.
To start with, let’s dive into the purpose of each technology. What problems does Helm solve? What about operators? From there, we’ll look at the areas of overlap. In the end, we’ll turn back to the question, and ask which technology “wins.”
The very first version of Helm was released on Nov. 2, 2015. Kubernetes was at version 1.1.0 and the very first KubeCon was about to take place. But even in these early days, Helm proclaimed its vision:
Helm provides package management for Kubernetes
We published an architecture document that explained how Helm was like Homebrew for Kubernetes.
From the earliest days, Helm was intended to solve one big problem: How do we share reusable recipes for installing (and upgrading and uninstalling) things on Kubernetes? We looked at operating system package managers like Homebrew, Apt, RPM, and Chocolatey, and we saw parallels in Kubernetes. Even up through Helm 3, this has consistently been our vision.
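To make the package analogy concrete, here is a sketch of the metadata file that sits at the top of every chart. The chart name, description, and versions here are invented for illustration:

```yaml
# Chart.yaml -- the package metadata for a hypothetical "mycache" chart.
# Like a Homebrew formula or an RPM spec file, it names and versions the
# package so it can be shared, discovered, and upgraded.
apiVersion: v2          # the chart API version used by Helm 3
name: mycache
description: A chart that installs a simple cache service
version: 0.1.0          # the version of the chart (the package itself)
appVersion: "5.0.7"     # the version of the application the chart installs
```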
Today, Helm has over a million downloads a month, and we are aware of dozens of Helm-based tools and thousands of publicly available charts. The conclusion I draw from this is that Helm, as a package manager, has been a success.
Almost exactly one year after Helm 0.0.1, Brandon Philips, then CTO of CoreOS, posted one of the most brilliant blog posts of the Cloud Native era: Introducing Operators: Putting Operational Knowledge into Software.
Philips pointed out that we often relied upon humans to manage the runtime needs of applications. But with a system like Kubernetes, much of the material once set down in run books and user manuals could now be transformed into code.
An Operator is an application-specific controller that extends the Kubernetes API to create, configure, and manage instances of complex stateful applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts but includes domain or application-specific knowledge to automate common tasks.
CoreOS illustrated this new design pattern in an operator designed to manage an etcd cluster.
Operators contained the institutional knowledge necessary to manage some, if not all, of the operational aspects of a workload inside of Kubernetes. As Philips suggests:
An Operator builds upon the basic Kubernetes resource and controller concepts and adds a set of knowledge or configuration that allows the Operator to execute common application tasks. For example, when scaling an etcd cluster manually, a user has to perform a number of steps: create a DNS name for the new etcd member, launch the new etcd instance, and then use the etcd administrative tools (etcdctl member add) to tell the existing cluster about this new member. Instead with the etcd Operator a user can simply increase the etcd cluster size field by 1.
Philips is absolutely correct: SREs and DevOps engineers spend far too much time manually re-running sequences of commands. The operator design pattern provides a compelling solution: write per-application tooling that codifies common management tasks.
Over the last few years, the notion of an operator has matured, partly due to the stabilization of Custom Resource Definitions (CRDs) and partly due to the maturing of the Kubernetes API itself. These days, authors of operators are likely to describe how they wrote CRDs and custom controllers to manage applications. While the terminology has shifted slightly, Philips’ vision is very much still at the heart of today’s operators.
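To ground that terminology: a CRD is itself just a Kubernetes resource that registers a new resource type with the API server, and a custom controller then watches instances of that type and acts on them. A minimal sketch, with an invented API group and a deliberately tiny schema:

```yaml
# A sketch of a CRD that registers a new "EtcdCluster" resource type.
# The group (example.com) and schema are illustrative, not from a real operator.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: etcdclusters.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: EtcdCluster
    plural: etcdclusters
    singular: etcdcluster
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer    # the field a user edits to scale the cluster
```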
Today, there are more than 100 operators available, at varying degrees of stability. Leaders in the cloud-native space, including Red Hat, IBM, and Microsoft, have released operators. The operator pattern is clearly a successful part of the cloud-native ecosystem.
Helm is a package manager for Kubernetes. Operators are design-pattern-driven pieces of code that encapsulate knowledge for running an application. Yet as I noted at the article’s outset, there are questions floating around about which one is “the winner.”
With two markedly different technologies, why are we attempting to pit them against one another? Is this not like asking which sports team is better, the New York Yankees or FC Barcelona? In many discussions with a variety of people, I began to understand why people have arrived at the conclusion that Helm and operators are competitors. There are two causes: a shared but overloaded vocabulary, and a tendency to stretch each tool beyond its intended use.
For starters, let’s cover the terminology issue. A skim through the documentation for each project will turn up a number of common terms. For example, both will talk about installing or creating resources inside of a Kubernetes cluster. While the terms are the same, though, the meaning is subtly different.
When Helm users talk about installing a thing, we mainly mean something like this:
I want to find a chart and be able to pass in some configuration and have that chart installed into my cluster without ever having to edit Kubernetes YAML.
To that end, Helm has focused on a standard packaging format, a template language for parameterization, and a system designed to easily locate and install off-the-shelf packages.
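A minimal sketch of what that parameterization looks like in practice (the chart, value names, and install command are hypothetical):

```yaml
# templates/deployment.yaml (excerpt from a hypothetical chart).
# At install time, Helm substitutes user-supplied values for the
# {{ .Values.* }} placeholders, so the rendered Kubernetes YAML
# never has to be edited by hand.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}

# values.yaml -- the chart's default configuration, which a user can
# override at install time, for example:
#   helm install my-release ./mychart --set replicaCount=3
replicaCount: 1
```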
When the user of an operator talks about installing, what they mean is more like this:
I want to create a resource or resources as Kubernetes YAML and have those things spin up and maintain an application for me.
Thus, if you look at the instructions for installing the etcd operator, you will see that you are given guidelines and examples for creating your own Kubernetes YAML file, which will (when installed into the cluster by another tool) create and maintain an etcd cluster.
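For illustration, here is roughly what such a resource looks like. The shape follows the examples in the etcd operator's documentation, though the cluster name is invented. Note that scaling the cluster, as in Philips' example above, is just a matter of editing the size field:

```yaml
# A custom resource that the etcd operator watches. Creating it causes the
# operator to build and maintain an etcd cluster; changing spec.size from
# 3 to 4 causes the operator to perform the member-add steps for us.
apiVersion: etcd.database.coreos.com/v1beta2
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3            # desired number of etcd members
  version: "3.2.13"  # the etcd version the operator should run
```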
When Helm users talk about “management,” they are thinking of using a simple tool to see which applications are running and which resources belong to which application, and then perhaps to upgrade, roll back, or delete individual applications.
In contrast, when it comes to operators, “management” is often used to talk about the “day two ops” of an application: Managing data integrity, scaling an application up or down, or automatically recovering from a failure.
We could go on to other terms, but the core message is clear: We have a limited (and frequently overloaded) vocabulary that sometimes causes us conceptual headaches. But when we talk it out, we realize that we’re dealing with different ideas.
While both Helm and the operator pattern have their sweet spots, we can definitely push our tools beyond their intended use cases. In fact, we might be able to push the Helm Chart metaphor into performing some operator-like tasks or write an operator that also does some of Helm’s package management tasks for itself. In so doing, we might be able to better compare the two technologies. But this is somewhat like asking the soccer player and the baseball player to engage in a kicking contest to see who wins. One is definitely better prepared.
I have seen some truly remarkable Helm charts. One, in particular, was over one megabyte of YAML and could orchestrate hundreds of components configured in an innumerable variety of ways. The chart could not only be used to install things, but also to repair broken clusters and keep all of these systems in sync. (It also used a special chart installer to encapsulate some extra logic.) As amazed as I am by this chart, I do see it as pushing beyond the bounds of what Helm is designed to do.
I know of some operators as well that push up against Helm on the installation and upgrade story, with custom installers that execute in-cluster (sort of like CNAB packages) and provide a similar workflow to Helm’s install/upgrade/delete story. Again, there is nothing wrong with this. But it stretches well beyond Brandon Philips’ definition of an operator.
In both cases, though, we must ask ourselves: are we stretching our tools because it is the best overall strategy? Or are we stretching our tools because we have gotten dogmatic about them (or, perhaps, because we fear learning new tools)? We are doing a disservice to the Kubernetes community when we become so entrenched in our chosen tools that we start rewriting them to repel other tools.
Operators and Helm charts have been working together since the early days of these technologies. For example, there are currently around sixty operators installable via the Helm Hub. There are even a few Helm operators (notably Weaveworks’ Flux operator for Helm) in which Helm functionality is provided by an operator that links directly to the Helm APIs. These are clear indications that the two technologies can work well in concert without having to push either past its limits.
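For a sense of how that combination reads in practice, here is a sketch of the kind of custom resource the Flux Helm operator consumes: a declarative description of a Helm release, reconciled by a controller. The chart repository and values shown are invented:

```yaml
# A sketch of a HelmRelease resource in the style of the Flux Helm operator:
# an operator that installs and upgrades a Helm chart on the user's behalf.
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: my-redis
  namespace: default
spec:
  releaseName: my-redis
  chart:
    repository: https://charts.example.com/   # invented repository URL
    name: redis
    version: 1.0.0
  values:
    replicaCount: 2   # passed to the chart just as --set would be
```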
One might be tempted to look at the numbers and draw conclusions. Helm has more charts, more available tooling, and more users. But this would lead to an incorrect conclusion. After all, Helm is designed to make it easy to write charts and distribute them. Operators, in contrast, are tremendously difficult to write because by design they encapsulate complex operational knowledge. While the typical Helm chart is a few hundred lines of YAML, the typical operator is thousands of lines of code. Thus we would expect the technology to evolve slowly, with emphasis on covering the most widely used services.
Alternatively, one might be tempted to look for a winner based on a notion of “architectural purity.” On more than one occasion, I have heard fans of operators tout them as superior to Helm charts because they are based on CRDs and controllers (two central architectural features of Kubernetes). Helm is, in this estimation, “just templated YAML.” This argument isn’t decisive either. It ignores the question of whether a problem is actually solved and asserts by fiat that the solution must use controllers and CRDs. The problem Helm tries to solve is not better solved with CRDs and controllers (though there are tools that use controllers to install Helm charts). Neither CRDs nor operators are necessary pieces of the package management story. Adding them simply increases the complexity (and the attack surface) of a package management system.
Again, it feels like we’re back to the question of which sports team is best, which leads us to a concluding analogy.
In the early 2000s, artist Howard Schatz published a book of photographs of the top athletes from a multitude of sports. From gymnastics to basketball to sumo wrestling, Schatz posed them standing side-by-side. Each of these athletes had honed their bodies to achieve prowess in their chosen sport. Yet the contrast in size and shape could not have been more pronounced. Could we have looked at those athletes, picked one, and said, “you are the best athlete”? Not in any meaningful way. The best we could do is say “you are the best in your sport.”
Operators have a different focus than Helm. Each has strengths and weaknesses. And there is definitely some overlap between the two. But we should be unsurprised to see the two tools work well together. After all, they are complementary rather than opposed.
Collectively, we would be better served by forgoing the mentality of dueling technologies. Instead, we should focus on using the combination of these technologies to truly make Kubernetes an easier platform to operate.