Kubernetes Operators - introduction

Kubernetes Operators extend Kubernetes functionality to help us reduce DevOps manual work by allowing us to programmatically manage Kubernetes resources, simplify complex configurations or installations and interact with external resources.
For hands-on practice, I’d suggest the next article, Kubernetes Operators - example. If you prefer the official quickstart guide, see Operator SDK - Building Operators in Go.
Still, it’s generally a good idea to read this page first, so you have some background before starting an implementation.
What are Operators?
In the end, Operators are just applications that manage Kubernetes resources. We write them to automate tedious management tasks that involve more than creating a simple YAML file.
When an Operator starts, it checks whether the resource it manages exists. If it does not, the Operator creates it. Otherwise, the Operator confirms that the resource is in the “expected state” and, if it is not, tries to bring it to the desired state. This logic lives in the reconciliation loop (you can think of it as a sync function). Afterward, the Operator listens for events on state changes to those resources so it can repeat the reconciliation process. Communication with Kubernetes goes through its API, more precisely through the SDK.
The image below shows the interaction between these entities.

A component that is managed by an Operator is called an Operand. An Operator can have one or a few Operands, but usually it has one. The Operator can also communicate bidirectionally with the Operand. This means that it can not only install and update the Operand, but also receive status information back from the Operand and report it.
A Custom Resource Definition (CRD) is a type of Kubernetes object that allows users and administrators to extend the Kubernetes platform with their own resource objects beyond what is defined in the core API. In other words, while a Pod is a built-in native API object of Kubernetes, CRDs allow cluster administrators to define MyOperator as another API object and interact with it the same way as with native objects.
An example of a custom resource of kind NginxOperator, as defined by its CRD:
apiVersion: operator.example.com/v1alpha1
kind: NginxOperator
metadata:
  labels:
    app.kubernetes.io/name: nginx-operator
    app.kubernetes.io/managed-by: kustomize
  name: sample
spec:
  replicas: 1
  forceRedploy: ""
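To make the idea of "another API object" more concrete, here is a minimal sketch of Go types that mirror the sample resource above. It assumes nothing beyond the standard library: Metadata is a simplified stand-in for the real Kubernetes object metadata, the sample is written as JSON instead of YAML, and the helper parseNginxOperator is a hypothetical name used only for illustration. A generated Operator project would define similar types on top of the Kubernetes API machinery instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NginxOperatorSpec mirrors the spec section of the sample resource.
type NginxOperatorSpec struct {
	Replicas     int32  `json:"replicas"`
	ForceRedploy string `json:"forceRedploy,omitempty"`
}

// Metadata is a simplified stand-in for the real Kubernetes object metadata.
type Metadata struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels,omitempty"`
}

// NginxOperator is the custom resource as a Go type.
type NginxOperator struct {
	APIVersion string            `json:"apiVersion"`
	Kind       string            `json:"kind"`
	Metadata   Metadata          `json:"metadata"`
	Spec       NginxOperatorSpec `json:"spec"`
}

// sampleJSON is the sample resource from the text, written as JSON.
const sampleJSON = `{
  "apiVersion": "operator.example.com/v1alpha1",
  "kind": "NginxOperator",
  "metadata": {
    "name": "sample",
    "labels": {
      "app.kubernetes.io/name": "nginx-operator",
      "app.kubernetes.io/managed-by": "kustomize"
    }
  },
  "spec": {"replicas": 1, "forceRedploy": ""}
}`

// parseNginxOperator (hypothetical helper) decodes a resource into the Go type.
func parseNginxOperator(data []byte) (NginxOperator, error) {
	var obj NginxOperator
	err := json.Unmarshal(data, &obj)
	return obj, err
}

func main() {
	obj, err := parseNginxOperator([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %q wants %d replica(s)\n", obj.Kind, obj.Metadata.Name, obj.Spec.Replicas)
}
```

The point of the sketch is only that the spec fields become ordinary typed fields the Operator can read when deciding what the desired state is.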
Reconciliation loop
As described above, Operators work by reconciling the current state of the cluster with the desired state set by the user. They do this by listening and reacting to events triggered on the Operands (the resource or resources the Operator manages), such as creations, updates and deletions in the Operand’s namespace. It’s up to the developer of the Operator to select which events are of interest.
When the Operator catches an event of interest, it starts the reconcile loop to handle it. The steps in the diagram below are the main part of an Operator, in which your custom logic manages the Operator’s resources based on its CRD configuration.
- Check for an existing Operator custom resource object (an instance of its CRD). As we know, this object contains the configuration for the Operator and should never be changed by the Operator itself. If no object is found, the Operator should just exit with an error. That error will show up in the Operator logs and indicate that something is wrong.
- Check for the existence of the relevant resources in the cluster. In our case, if there is no Deployment resource, we create it.
- If the relevant resources already exist in the cluster, check whether they match the configuration in the Operator’s custom resource. If not, update the resources to reflect that configuration.
The functionality of these steps will differ based on your custom logic. They are not mandatory, just a general view of the process.
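The three steps above can be sketched in plain Go. This is a minimal illustration under stated assumptions, not how the Operator SDK structures its code: Cluster is an in-memory stand-in for the Kubernetes API, and OperatorConfig, Deployment, Reconcile and ErrConfigNotFound are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// OperatorConfig stands in for the Operator's custom resource (desired state).
type OperatorConfig struct {
	Replicas int
}

// Deployment stands in for the managed resource (the Operand).
type Deployment struct {
	Replicas int
}

// Cluster is an in-memory stand-in for the Kubernetes API.
type Cluster struct {
	Config     *OperatorConfig
	Deployment *Deployment
}

// ErrConfigNotFound is returned when the Operator's custom resource is missing.
var ErrConfigNotFound = errors.New("operator custom resource not found")

// Reconcile follows the three steps and reports which action it took.
func Reconcile(c *Cluster) (string, error) {
	// Step 1: the custom resource must exist; it is read but never modified.
	if c.Config == nil {
		return "", ErrConfigNotFound
	}
	// Step 2: create the managed resource if it is missing.
	if c.Deployment == nil {
		c.Deployment = &Deployment{Replicas: c.Config.Replicas}
		return "created", nil
	}
	// Step 3: correct the managed resource if it drifted from the configuration.
	if c.Deployment.Replicas != c.Config.Replicas {
		c.Deployment.Replicas = c.Config.Replicas
		return "updated", nil
	}
	return "unchanged", nil
}

func main() {
	cluster := &Cluster{Config: &OperatorConfig{Replicas: 1}}
	for i := 0; i < 2; i++ {
		action, err := Reconcile(cluster)
		fmt.Println(action, err)
	}
	cluster.Config.Replicas = 3 // the user changes the desired state
	action, err := Reconcile(cluster)
	fmt.Println(action, err)
}
```

Note that calling Reconcile repeatedly without any change does nothing. That idempotency is what lets the same function run safely on every event.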
Event trigger types
- Level-based triggering: the event only notifies the Operator that something changed, so the Operator has to query the whole state itself
- Edge-based triggering: the event carries the data the Operator needs, so it does not have to check the whole state
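The difference between the two trigger types can be shown with a toy simulation. This is a sketch under assumptions, not Kubernetes code: Event, EdgeHandler, LevelHandler and simulate are hypothetical names, and the "cluster state" is a single replica counter. The edge-based handler trusts the delta carried by each event, while the level-based handler ignores the payload and re-reads the current state on every notification.

```go
package main

import "fmt"

// Event is a change notification. Delta is the payload an edge-based
// handler relies on; a level-based handler ignores it.
type Event struct {
	Delta int
}

// EdgeHandler keeps its own view of the state and updates it from event data.
type EdgeHandler struct {
	Replicas int
}

func (h *EdgeHandler) Handle(e Event) { h.Replicas += e.Delta }

// LevelHandler treats an event as a wake-up call and re-reads the full state.
type LevelHandler struct {
	Replicas  int
	ReadState func() int
}

func (h *LevelHandler) Handle(_ Event) { h.Replicas = h.ReadState() }

// simulate applies deltas to the real state and delivers one event per delta,
// except for the event at index dropped (use -1 to drop nothing).
// It returns the real state and each handler's view of it.
func simulate(deltas []int, dropped int) (actual, edge, level int) {
	e := &EdgeHandler{}
	l := &LevelHandler{ReadState: func() int { return actual }}
	for i, d := range deltas {
		actual += d
		if i == dropped {
			continue // the notification was lost, the state change was not
		}
		ev := Event{Delta: d}
		e.Handle(ev)
		l.Handle(ev)
	}
	return actual, e.Replicas, l.Replicas
}

func main() {
	actual, edge, level := simulate([]int{3, -1, 2}, 1)
	fmt.Println("actual:", actual, "edge view:", edge, "level view:", level)
}
```

In this run the second event is lost, so the edge-based view drifts from the real state while the level-based view catches up on the next notification. The trade-off is that the level-based handler pays for a full state read every time.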
Architecture design of the client Go library
Even the simplest hello-world controller is hard to showcase, because a lot of code runs behind the scenes and the reconciliation loop is only a tiny part of the whole logic.
If you want to follow the entire logic through code, the simplest example is the sample-controller repository, specifically the files main.go and controller.go (which has 400 lines of code).
The repository uses the client-go library.
The following diagram, taken from the sample-controller documentation, shows the architecture of the client-go library that Operators build on (so don’t worry, you won’t need to implement all of that logic 😉).

You will notice that nothing on the diagram is named reconcile. That is because the diagram calls that function Process item.
This architecture reduces the load on the Kubernetes API server by minimizing API calls, keeps the local cache up to date with cluster changes, and helps controllers react to changes in real time.
From the image we can say that the main entities do the following:
- Reflector - Watches Kubernetes resources and keeps the store in sync with them
- Indexer - Provides a way of storing and retrieving objects based on keys (by namespace/resource), acting like a cache database
- Informer - Wraps the Reflector and Indexer, handles retries, and triggers user-registered functions that react to Create, Update and Delete events
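How these pieces cooperate can be sketched with a toy model in plain Go. It deliberately does not use the client-go API: Object, Indexer, Informer, WatchEvent and run are simplified, hypothetical stand-ins, and a fixed list of events plays the role of the Reflector. The sketch shows the cache being updated first, a key being queued, and the Process item step reading from the cache instead of calling the API server.

```go
package main

import "fmt"

// Object is a minimal stand-in for a Kubernetes resource.
type Object struct {
	Namespace, Name string
	Replicas        int
}

// Key builds the "namespace/name" key used by the cache and the queue.
func Key(o Object) string { return o.Namespace + "/" + o.Name }

// Indexer is the local cache, keyed by namespace/name.
type Indexer struct{ items map[string]Object }

func NewIndexer() *Indexer                        { return &Indexer{items: map[string]Object{}} }
func (ix *Indexer) Put(o Object)                  { ix.items[Key(o)] = o }
func (ix *Indexer) Delete(key string)             { delete(ix.items, key) }
func (ix *Indexer) Get(key string) (Object, bool) { o, ok := ix.items[key]; return o, ok }

// WatchEvent is what the toy Reflector delivers.
type WatchEvent struct {
	Type   string // "ADDED", "MODIFIED" or "DELETED"
	Object Object
}

// Informer updates the cache and then notifies the registered handler.
type Informer struct {
	Indexer  *Indexer
	OnChange func(key string)
}

func (inf *Informer) Deliver(ev WatchEvent) {
	key := Key(ev.Object)
	if ev.Type == "DELETED" {
		inf.Indexer.Delete(key)
	} else {
		inf.Indexer.Put(ev.Object)
	}
	if inf.OnChange != nil {
		inf.OnChange(key)
	}
}

// run replays the events, then drains the work queue ("Process item").
func run(events []WatchEvent) []string {
	var queue []string
	queued := map[string]bool{} // a key already waiting is not queued twice
	inf := &Informer{Indexer: NewIndexer(), OnChange: func(key string) {
		if !queued[key] {
			queued[key] = true
			queue = append(queue, key)
		}
	}}
	for _, ev := range events {
		inf.Deliver(ev)
	}
	var log []string
	for _, key := range queue {
		// Only the key travels through the queue; the object comes from the cache.
		if obj, ok := inf.Indexer.Get(key); ok {
			log = append(log, fmt.Sprintf("%s: reconcile to %d replicas", key, obj.Replicas))
		} else {
			log = append(log, fmt.Sprintf("%s: deleted, clean up", key))
		}
	}
	return log
}

func main() {
	for _, line := range run([]WatchEvent{
		{"ADDED", Object{"default", "sample", 1}},
		{"MODIFIED", Object{"default", "sample", 3}},
		{"DELETED", Object{"default", "other", 0}},
	}) {
		fmt.Println(line)
	}
}
```

Because only keys are queued and the worker reads the latest object from the cache, two quick changes to the same object collapse into a single reconcile against the newest state.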
Frameworks
Because building and maintaining Operators is not easy, and having a standard is always preferable, there are frameworks for building Operators, such as the Operator Framework and Kubebuilder.
The Operator Framework provides tooling, standards, boilerplate/schema generators and more to make the development and maintenance of an Operator relatively easy.
The framework also defines the Capability Model, which categorizes Operators based on their functionality and design.

Note that your Operator doesn’t need to fulfill all of these capabilities. It can be just a level 1 Operator that only installs the resource; the level is just a sort of label.
There is also a kind of “store” where you can publish your own Operators: Operator Hub.

In the next chapter, Kubernetes Operators - example, we will use the Operator Framework and its Operator SDK to build one.
Another great component of the framework is the Operator Lifecycle Manager (OLM), which makes managing the Operator easier. It is best summed up by the official description:
"OLM is a component of the Operator Framework, an open source toolkit to manage Kubernetes native applications, called Operators, in an effective, automated, and scalable way. OLM extends Kubernetes to provide a declarative way to install, manage, and upgrade Operators and their dependencies in a cluster."
Just as you would deploy a new version of your application, you will want to do the same with your Operators. Note, though, that Operators also manage their own resources, and you don’t want to disrupt those during the process.
References
To wrap up, here are links to talks that help bridge the gap in explaining precisely how Operators work, including some of the nuances. This page itself is meant as a general overview of what an Operator is, what it is for and how it works.
Writing Kube Controllers for Everyone - Maciej Szulik, Red Hat (Beginner Skill Level)
CODE4104: Let’s build a Kubernetes Operator in Go! with Michael Gasch & Rafael Brito