

Build a managed k8s cluster

👷 This is a work in progress

Build Me A Cluster - The Managed Edition

So you have been asked to build a Kubernetes cluster. What does it mean to build a cluster? At the very least, you will need a spec for your node pool (AKS and GKE)/node group (EKS), your persistent storage, and your pick of CNI provider.

Kubernetes cluster

I would argue, however, that building a cluster does not end there. In fact, what many think of as building a cluster is only the beginning, because a cluster ultimately has to run workloads, those workloads have to be deployed securely, and once up and running, they have to be monitored and observed. One good reason to think about deploying workloads right at the outset is that, to manage and secure the cluster, the cluster operator must deploy certain workloads right after the build (more on this later).

On the subject of workloads alone, you have to consider the following:

  1. Establish security baselines.
  2. Choose and configure the continuous delivery channels for workloads.
  3. Secure workload deployment to the cluster.
  4. Pre-configure for observability, monitoring and telemetry.

Security baselines

Setup logging and monitoring

You will need to set up audit logging/API server logs for the cluster upon creation, and you will likely be archiving these logs into object storage (S3/GCS/Azure Blob Storage, etc.) and perhaps even setting up analytics and alarms for them.

Kubernetes cluster with logging and monitoring
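As a concrete illustration, here is a minimal sketch of what this can look like on EKS with eksctl, where the control plane log types (including audit logs) are enabled as part of the cluster definition. The cluster name and region are hypothetical placeholders; AKS and GKE have equivalent knobs (diagnostic settings and Cloud Audit Logs, respectively), and shipping the logs onward to object storage is a separate step.

# Sketch: eksctl ClusterConfig enabling control plane log types, including
# audit logs, at cluster creation. Cluster name and region are hypothetical.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster        # hypothetical cluster name
  region: us-east-1         # hypothetical region
cloudWatch:
  clusterLogging:
    # Ship these control plane log types to CloudWatch; archiving to S3
    # and wiring up analytics/alarms would be configured separately.
    enableTypes:
      - audit
      - api
      - authenticator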

Enable PodSecurity

Enable the PodSecurity admission controller if it is not enabled by default, and configure the default policy for the cluster.
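For example, on a cluster where the PodSecurity admission plugin is active, a baseline can be applied per namespace using the Pod Security Standards labels. A minimal sketch, with a hypothetical namespace name:

# Sketch: enforce the "restricted" Pod Security Standard on a namespace,
# and also warn/audit against it. The namespace name is a hypothetical
# placeholder.
apiVersion: v1
kind: Namespace
metadata:
  name: payments                                    # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted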

Policy engine

Provision a policy engine such as Kyverno or Open Policy Agent (OPA) to ensure workloads comply with known best practices and with the standards of your organization or line of business (LOB).

Kubernetes cluster with logging and monitoring
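To make this concrete, here is a minimal Kyverno ClusterPolicy sketch that rejects Pods missing a team label. The label name and message are assumptions for illustration, not a prescribed standard; OPA/Gatekeeper can express the same rule differently.

# Sketch: a Kyverno ClusterPolicy enforcing a labeling convention.
# Requiring a "team" label is a hypothetical example of an organizational
# best practice.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must carry a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"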

Secure the workload container images

When it comes to deploying workloads, the first thing you want to do is ensure container images come from trusted sources. To that end, you are very likely to be pulling images from trusted container registries only. In environments with higher security requirements - think financial institutions or defense department contractors - you may not be allowed to reach out to public registries at all, and all requests will need to go to an internally hosted container registry such as JFrog Artifactory. Even if the requirements are not as stringent, you will likely be using a cloud provider's container registry (ECR/ACR/GCP Artifact Registry) and configuring virtual image repos to pull securely from one or more upstream repos. This configuration should become part of the cluster build process. In short, you have to ensure the images can come only from trusted sources.

Kubernetes container image sources
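One way to bake this into the cluster build is an admission policy that rejects images pulled from anywhere other than your approved registries. A minimal sketch using Kyverno again, with a hypothetical registry hostname (a fuller policy would also cover init and ephemeral containers):

# Sketch: restrict Pod container images to an approved internal registry.
# The registry hostname is a hypothetical placeholder.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: approved-registries-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.internal.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.internal.example.com/*"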

Secure access to cloud resources

Using Kubernetes Secrets to hold the credentials required to access cloud resources is a practice frowned upon. There is the ever-present danger of your secrets being exposed or stolen. Then there is the additional burden of rotating credentials periodically and recreating/syncing secrets. You want your workloads to just have access, and this is done using workload identity. Workload identity is the way forward, and it requires granting IAM roles to the cluster owner/operator so they can create Kubernetes service accounts that give their workloads access to the cloud resources they require.
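As an example, with EKS's IAM Roles for Service Accounts (IRSA), that binding is expressed as an annotation on the Kubernetes ServiceAccount; GKE Workload Identity uses an analogous iam.gke.io/gcp-service-account annotation. The names, namespace and role ARN below are hypothetical placeholders.

# Sketch: a ServiceAccount annotated for EKS IAM Roles for Service Accounts
# (IRSA). Pods using this ServiceAccount obtain temporary credentials for
# the referenced IAM role with no Kubernetes Secret involved. The names and
# role ARN are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-app
  namespace: payments
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/payments-app-role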

Kubernetes init containers - the Datadog example

EDIT - July 11 2024 - Adding explicit links to external docs

When I look at the examples of init containers and how they are used, I am left dissatisfied because the examples feel contrived. So I decided to furnish a real example. This is how Datadog (DD) does it, which you can read about in the docs linked at the end of this section. Hold off on reading that documentation if you are not yet familiar with Kubernetes operators. It will only confuse you.

The init container's job

The DD instrumentation library is injected into an application container by way of an init container. For example, suppose you have a containerized Java application. Before your application container runs, the DD init container runs, adds the required Java library - as a jar file - to the filesystem, and terminates. Then the actual application container runs and starts using the jar file.
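Stripped of Datadog's specifics, the general shape of the technique looks roughly like the sketch below: an init container copies an agent jar into a shared emptyDir volume, and the application container loads it as a Java agent via JAVA_TOOL_OPTIONS. The image names and paths are hypothetical, and this is an illustration of the pattern rather than Datadog's actual manifest.

# Sketch of the general pattern (not Datadog's exact manifest): an init
# container drops an instrumentation jar into a shared volume, and the
# application container loads it as a -javaagent. Image names and file
# paths are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  volumes:
    - name: instrumentation
      emptyDir: {}
  initContainers:
    - name: copy-tracer
      image: registry.example.com/tracer-init:latest      # hypothetical image
      command: ["cp", "/tracer/tracer-agent.jar", "/shared/tracer-agent.jar"]
      volumeMounts:
        - name: instrumentation
          mountPath: /shared
  containers:
    - name: app
      image: registry.example.com/java-app:1.0.0          # hypothetical image
      env:
        - name: JAVA_TOOL_OPTIONS                          # the JVM reads this at startup
          value: "-javaagent:/shared/tracer-agent.jar"
      volumeMounts:
        - name: instrumentation
          mountPath: /shared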

Why not add the jar file during container build?

You may be asking yourself why this could not have been done at application container build time. The simple reason: you may not be in a position to recontainerize the application. It may well be a third-party application container that needs to be instrumented.

The Admission Controller's role (optional)

To continue with the Java application example, the gist is that when the DD agent runs in the cluster, it also runs a DD Admission Controller (DDAC), which registers itself with the Kubernetes control plane and intercepts Pod creation requests to the Kubernetes API server before the Pod objects are persisted. This is where the Pod's spec is modified to insert an init container, if the Pod carries certain annotations that the DDAC is looking for. Pods without those annotations are left alone. The annotations tell the DDAC which Pods need to be modified to have the init container inserted in the spec and, furthermore, what version of the jar file will be injected.

While I have used a Java application as an example, the same technique is applicable for application containers where the application may have been written in .NET/Python/Golang, etc.
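To give a flavor of what those markers look like, the Datadog docs linked below describe an opt-in label plus an annotation naming the tracer library version, along the lines of the sketch here. Treat the keys and values as a paraphrase of that documentation (and the other names as hypothetical); check the linked docs for the currently supported form.

# Sketch: a Deployment Pod template carrying the opt-in label and the
# annotation that tells the Datadog Admission Controller which Java tracer
# version to inject. Keys/values paraphrase the linked Datadog docs; the
# image name and version are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
        admission.datadoghq.com/enabled: "true"               # opt this Pod in
      annotations:
        admission.datadoghq.com/java-lib.version: "v1.44.0"   # example tracer version
    spec:
      containers:
        - name: app
          image: registry.example.com/java-app:1.0.0          # hypothetical image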

  1. The Datadog Admission Controller - https://docs.datadoghq.com/containers/cluster_agent/admission_controller/?tab=datadogoperator
  2. Datadog automatic instrumentation with local library injection - https://docs.datadoghq.com/tracing/trace_collection/library_injection_local/?tab=kubernetes
  3. Datadog tutorial: Instrumenting a Java application - https://docs.datadoghq.com/tracing/guide/tutorial-enable-java-admission-controller/#instrument-your-app-with-datadog-admission-controller

kubectl - the underappreciated tool for the Kubernetes developer

EDIT: Grammatical errors fixed.

It's a trap

Admiral Ackbar

kubectl - a dev's perspective

For those who have gotten into the world of Kubernetes tooling, kubectl remains an essential and perhaps underrated tool - not just in how we use it directly or from a bash script, but in what it can teach us about making calls to the Kubernetes API server. This first entry in the series is not a beginner-level one, but it will set the tone for more articles down the line.

The TL;DR version

  1. kubectl ... --v=9 is the Kubernetes developer's underrated friend - it reveals which Kubernetes API server endpoints are being called, with what parameters, and shows the actual HTTP requests to and responses from the API server.
  2. kubectl transforms the JSON/YAML you submit when creating or editing Kubernetes resources, and likewise the resources it fetches and displays, in unexpected ways, and the only good way to observe these differences is by increasing the verbosity of the output. What I am saying is that what kubectl displays is not necessarily what the API server served up.

The God-is-in-the-details version

Have you ever tried to, say, get a list of all resources of a kind in a namespace? Say you would like to get a list of all ConfigMaps in a namespace.

Suppose I have 2 ConfigMaps in a namespace, the first of which is

apiVersion: v1
data:
  hello: world
  yoda: do or do not. There is no try
kind: ConfigMap
metadata:
  labels:
    app: blog
  name: hello-world

and the second of which is

apiVersion: v1
data:
  palpatine: There is a great disturbance in the force
  vader: I find your lack of faith disturbing
kind: ConfigMap
metadata:
  labels:
    app: blog
  name: sithisms

and we kubectl apply both of these to the namespace list-resources-ns to create them.

Note the .metadata.labels in both ConfigMaps: app: blog

Once the ConfigMaps have been created, let us fetch both of them simultaneously from the cluster by selecting them using the label we applied to them, app: blog.

kubectl get configmap --selector app=blog -n list-resources-ns 

which gives us the output

NAME          DATA   AGE
hello-world   2      25m
sithisms      2      22m

but now if we change the command above to

kubectl get configmap --selector app=blog -n list-resources-ns -oyaml

we get something like this (the .resourceVersion and .creationTimestamp values and the like would vary in your case)

apiVersion: v1
items:
- apiVersion: v1
  data:
    hello: world
    yoda: do or do not. There is no try
  kind: ConfigMap
  metadata:
    creationTimestamp: "2024-06-30T17:46:35Z"
    labels:
      app: blog
    name: hello-world
    namespace: list-resources-ns
    resourceVersion: "11757"
    uid: 01c03ca8-da39-44ba-991b-7dc4818440cd
- apiVersion: v1
  data:
    palpatine: There is a great disturbance in the force
    vader: I find your lack of faith disturbing
  kind: ConfigMap
  metadata:
    creationTimestamp: "2024-06-30T17:49:39Z"
    labels:
      app: blog
    name: sithisms
    namespace: list-resources-ns
    resourceVersion: "11806"
    uid: 81f99d38-3de8-460f-911f-0d954f165ed1
kind: List
metadata:
  resourceVersion: ""

So far no surprises. Except for what is going on behind the scenes.

  1. First of all, unsurprisingly, kubectl is issuing an HTTP GET to the Kubernetes API server. In fact the GET is issued to the relative URL /api/v1/namespaces/list-resources-ns/configmaps?labelSelector=app%3Dblog&limit=500
  2. Secondly, the API server is returning not YAML but JSON, and kubectl converts the JSON to YAML before showing you the output. This shouldn't surprise most devs: YAML is better suited to configuration files, but JSON is better suited for transporting data, since YAML is pretty finicky about indentation.
  3. Thirdly, and this might surprise many, you would expect the API server to have returned the JSON equivalent of the above YAML.
{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "v1",
            "data": {
                "hello": "world",
                "yoda": "do or do not. There is no try"
            },
            "kind": "ConfigMap",
            "metadata": {
                "creationTimestamp": "2024-06-30T17:46:35Z",
                "labels": {
                    "app": "blog"
                },
                "name": "hello-world",
                "namespace": "list-resources-ns",
                "resourceVersion": "11757",
                "uid": "01c03ca8-da39-44ba-991b-7dc4818440cd"
            }
        },
        {
            "apiVersion": "v1",
            "data": {
                "palpatine": "There is a great disturbance in the force",
                "vader": "I find your lack of faith disturbing"
            },
            "kind": "ConfigMap",
            "metadata": {
                "creationTimestamp": "2024-06-30T17:49:39Z",
                "labels": {
                    "app": "blog"
                },
                "name": "sithisms",
                "namespace": "list-resources-ns",
                "resourceVersion": "11806",
                "uid": "81f99d38-3de8-460f-911f-0d954f165ed1"
            }
        }
    ],
    "kind": "List",
    "metadata": {
        "resourceVersion": ""
    }
}

except that what the API server actually returns is

{
  "kind": "ConfigMapList",
  "apiVersion": "v1",
  "metadata": {
    "resourceVersion": "15095"
  },
  "items": [
    {
      "metadata": {
        "name": "hello-world",
        "namespace": "list-resources-ns",
        "uid": "01c03ca8-da39-44ba-991b-7dc4818440cd",
        "resourceVersion": "11757",
        "creationTimestamp": "2024-06-30T17:46:35Z",
        "labels": {
          "app": "blog"
        },
        "managedFields": [
          {
            "manager": "kubectl-create",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2024-06-30T17:46:35Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:data": {
                ".": {},
                "f:hello": {},
                "f:yoda": {}
              }
            }
          }
        ]
      },
      "data": {
        "hello": "world",
        "yoda": "do or do not. There is no try"
      }
    },
    {
      "metadata": {
        "name": "sithisms",
        "namespace": "list-resources-ns",
        "uid": "81f99d38-3de8-460f-911f-0d954f165ed1",
        "resourceVersion": "11806",
        "creationTimestamp": "2024-06-30T17:49:39Z",
        "labels": {
          "app": "blog"
        },
        "managedFields": [
          {
            "manager": "kubectl-create",
            "operation": "Update",
            "apiVersion": "v1",
            "time": "2024-06-30T17:49:39Z",
            "fieldsType": "FieldsV1",
            "fieldsV1": {
              "f:data": {
                ".": {},
                "f:vader": {}
              }
            }
          }
        ]
      },
      "data": {
        "palpatine": "There is a great disturbance in the force",
        "vader": "I find your lack of faith disturbing"
      }
    }
  ]
}

Don't believe me? Try issuing the following

kubectl get cm --selector app=blog -ojson --v=9

The differences are many and profound. Here they are in tabular form.

Differences                          As printed by kubectl   As sent by API server
.kind                                List                    ConfigMapList
.items[*].kind                       ConfigMap               Field absent
.items[*].metadata.managedFields     Field absent            Field present

Now, why does this matter? It may not matter most of the time. But if you are a dev creating a client-side tool - using one of the officially supported Kubernetes client libraries, or maybe just curl or any number of HTTP libraries - and you are looking to parse the JSON returned by the API server, you could easily be flummoxed, as I was.

I was using the Kubernetes Python client library to fetch a set of Secrets in a cluster, check whether the data in each Secret was valid, and update only the Secrets that needed updating back to the cluster. Except the client kept insisting that apiVersion was absent and kind had not been set on the Secret objects. It was not until I pulled up kubectl and did a kubectl get --selector ... --v=9 that I realized what had gone wrong: the Secret objects - in the Python client this is the V1Secret class - were indeed incomplete, because I had constructed them directly from the .items[*] JSON objects, which do not have .kind and .apiVersion set.