
Kubernetes multi-tenancy with Capsule

Giuseppe Chiesa · Nov 13, 2023 · 13 mins read

How TomTom achieves Kubernetes multi-tenancy with Capsule

With the availability of well-architected and well-orchestrated cloud-managed control planes for Kubernetes (e.g. AKS on Azure, EKS on AWS, …), deploying a Kubernetes cluster to run a workload has become very straightforward. At TomTom, each engineering team has historically had the autonomy to operate, explore and modernise the services they own, and Kubernetes has recently become the de facto standard for running workloads in the company.

This level of autonomy helped teams innovate fast, but it also created a scenario of cluster sprawl, where most clusters run only a single workload and are treated as ephemeral.

Whilst launching a Kubernetes cluster is an easy task, managing, operating and updating Kubernetes requires focus and team capacity. To address these challenges, Developer Experience introduced a new managed Kubernetes platform with a few main goals:

  1. remove the additional effort and distraction from engineering teams that would otherwise have to run and operate their own Kubernetes infrastructure
  2. accelerate the time to production, by offering a well-architected and ready-to-use platform based on Kubernetes
  3. consolidate and optimize compute usage, by bin-packing workloads together when possible, even across different engineering teams

Among all the challenges such a platform generates, one of the most interesting ones was: how do we achieve multi-tenancy in Kubernetes?

The Kubernetes documentation itself recognises that multi-tenancy cannot be mapped onto a Kubernetes Namespace alone. Engineers need more flexibility, hence several implementations have been proposed to address this challenge.

In this article we take a deeper dive into the capabilities of the technology we adopted: Capsule from Clastix (now a CNCF Incubator project).

Capsule in TomTom

Capsule is a policy-based framework, a sort of policy engine on steroids for Kubernetes. It also “does one thing well”, in line with the well-known and appreciated Unix principle. It introduces the concept of a Tenant in Kubernetes without reinventing Kubernetes semantics: the new construct is built on upstream Kubernetes capabilities, so that the user is granted a “slice of compute” in a specific Kubernetes cluster.
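
For illustration, a minimal Tenant can be as small as the sketch below (the tenant and group names are hypothetical; a full production manifest appears later in this article):

# Minimal Capsule Tenant: grants a group of engineers a slice of the cluster.
# Tenant and group names are placeholders.
apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  name: team-alpha
spec:
  owners:
    - kind: Group
      name: team-alpha-engineers
      clusterRoles:
        - admin
        - capsule-namespace-deleter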

On top of the native Kubernetes capabilities, Capsule adds some important features to properly define tenant boundaries and to protect multiple tenants when they use shared cluster-level resources.

Specifically, I would like to highlight some of the features and design decisions that convinced us to adopt Capsule.

1. Developer experience

Developer experience is paramount at TomTom, and we chose Capsule with this in mind. Capsule enables engineers to share clusters without impacting their experience, and without requiring a deep knowledge of Kubernetes.

Additionally, Capsule provides a nice interface, namely Capsule Proxy, to connect to the Kubernetes API. Capsule Proxy is an (almost) pass-through proxy for the Kubernetes API server that intercepts and filters calls, so that a tenant can run the usual kubectl commands without receiving errors due to lack of privileges when accessing cluster-level resources.

The most common use case is the Namespace resource: with Capsule the tenant is able to list namespaces (even though it is a cluster-level resource) but only sees the namespaces owned by its own tenancy. The same goes for other resources, like Nodes, StorageClasses and IngressClasses.
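
As a sketch, the relevant fragment of the Tenant spec looks like the following (the group name is a placeholder; the full production manifest appears later in this article):

# Fragment of a Capsule Tenant spec: the owner group may List these cluster-level
# resources through Capsule Proxy, filtered to its own tenancy.
owners:
  - kind: Group
    name: team-alpha-engineers
    proxySettings:
      - kind: Nodes
        operations:
          - List
      - kind: StorageClasses
        operations:
          - List
      - kind: IngressClasses
        operations:
          - List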

Capsule Proxy can be exposed via Ingress resources and can leverage all the other Kubernetes capabilities, such as allow lists and cert-manager.
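
As a rough sketch, assuming a Traefik ingress class, a cert-manager ClusterIssuer named letsencrypt and a placeholder hostname, exposing the proxy could look like this:

# Hypothetical Ingress exposing Capsule Proxy; hostname, ingress class, issuer and
# service port are placeholders and depend on your installation.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: capsule-proxy
  namespace: capsule-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: traefik
  rules:
    - host: capsule-proxy.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: capsule-proxy
                port:
                  number: 9001
  tls:
    - hosts:
        - capsule-proxy.example.com
      secretName: capsule-proxy-tls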

With this, we are able to offer a seamless experience when logging in to a Kubernetes control plane (via Capsule Proxy) with a kubeconfig that doesn't contain any key material. Endpoints are exposed with valid public certificates (no need for a CA chain in the kubeconfig) and authentication happens via the Azure kubelogin exec plugin.

This is the simplest KubeConfig you might see today

apiVersion: v1
kind: Config
clusters:
  - cluster:
      server: https://capsule-proxy.***.example.com
    name: sp-cra-gchiesa
contexts:
  - context:
      cluster: sp-cra-gchiesa
      user: tomtom-user
      namespace: sp-cra-gchiesa-default
    name: sp-cra-gchiesa
current-context: sp-cra-gchiesa
users:
  - name: tomtom-user
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        args:
          - get-token
          - --login
          - azurecli
          - --server-id
          - 6dae42f8-4368-4678-94ff-3960e28e3630 # <-- well-known server-id for Azure AKS service
        command: kubelogin

2. Governance and resource ownership attribution

While we were evaluating soft multi-tenancy technologies, we noticed they typically introduce different semantics, or additional complexity by creating a separate control plane for each tenant. This results in a set of resources that needs to be kept in sync with the central control plane, and adds complexity on the observability side for platform teams.

Additionally, with a dedicated control plane per tenant, you open up the possibility for tenants to deploy anything they want. At TomTom we have a set of standards and best practices, and we want to enforce strong governance over the technologies used in the platform. So, while we foster experimentation and innovation, we want to ensure that the platform remains properly governed.

With Capsule, all tenants share the same control plane, and each tenant only has access to the capabilities (Custom Resource Definitions) already available in the cluster. This allows better control over what is deployed in each Kubernetes cluster, and lets us reuse shared services to support multiple tenants where possible.

We also have a defined process to request additional capabilities: when a tenant needs a specific operator or additional CRDs, the platform team can roll out the changes in minutes, or the tenant itself can propose them by opening pull requests on the platform repositories.

When many namespaces, services and workloads are deployed by multiple tenants, it's extremely important to be able to identify who owns them. We leverage Capsule's additionalMetadata feature to enrich every important resource created by tenants with information we want to ensure is always attached to it. For example, we enforce labeling pods and services with the tenant that owns them, the related cost center (so we can offer a clean breakdown with Kubernetes cost management tools), the Slack channel for contacting the team with priority in case coordination is needed, and more.
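
A fragment of a Tenant spec implementing this could look roughly like the sketch below (label keys and values are placeholders; our full production manifest appears later in this article):

# Fragment of a Capsule Tenant spec: metadata propagated to every namespace and
# service created by the tenant. Keys and values are placeholders.
namespaceOptions:
  additionalMetadata:
    labels:
      tenant.platform.example.com/tenant-name: team-alpha
    annotations:
      tenant.platform.example.com/slack-channel: team-alpha-support
serviceOptions:
  additionalMetadata:
    labels:
      tenant-cost-center: "12345"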

3. Enhanced capabilities

We leverage other capabilities available in Capsule that are not Kubernetes native but still very useful.

Ingress protection: each tenant can only create ingresses with hostnames matching a pre-defined pattern. Since we run ExternalDNS as a shared service in each cluster, it is important that different tenants don't conflict with each other over the same hostnames.
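
A minimal sketch of this constraint in the Tenant spec (the pattern is a placeholder) is shown below:

# Fragment of a Capsule Tenant spec: ingress hostnames must match the tenant's pattern.
ingressOptions:
  allowedHostnames:
    allowedRegex: ^team-alpha\..*\.example\.com$   # placeholder pattern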

Distributing resources to all tenants: each tenancy can retrieve some internal cluster metadata via ConfigMaps. These are made available by the platform team by leveraging Capsule's GlobalTenantResource capability.
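
A minimal sketch of such a distribution, assuming a hypothetical ConfigMap named cluster-metadata and a selector that matches every tenant, could look like this:

# Hypothetical GlobalTenantResource replicating a ConfigMap into the namespaces of
# all matching tenants. Names, labels and data are placeholders.
apiVersion: capsule.clastix.io/v1beta2
kind: GlobalTenantResource
metadata:
  name: cluster-metadata
spec:
  tenantSelector: {}          # empty selector: match all tenants
  resyncPeriod: 60s
  resources:
    - namespaceSelector: {}   # every namespace of the matched tenants
      rawItems:
        - apiVersion: v1
          kind: ConfigMap
          metadata:
            name: cluster-metadata
          data:
            clusterName: cluster-1
            tier: playground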

4. GitOps ready

Capsule can be driven via a small set of well-defined Kubernetes CRDs, which represent the tenancy boundaries in a declarative way. The Capsule manager, deployed in each cluster of the platform, takes care of reconfiguring the tenancies as soon as a change is applied to the Tenant custom resource objects.

How do we use Capsule?

In our platform, all configuration is delivered in a GitOps fashion. The platform is multi-tier, so we maintain configuration at tier level and specific per-cluster configuration at cluster level. Capsule fits this scenario perfectly. We structured our Git repository so that each cluster knows about its tenants and keeps reconciling their configuration.

See the example repository layout below for distributing tier- and cluster-level configuration.

.
├── CODEOWNERS
├── README.md
├── charts
│  ├── [...] <-- charts that are maintained by Platform team to aggregate and parametrise tier, cluster or tenancy customisations
└── tiers
    ├── playground
    │   ├── config
    │   │   ├── capsule.yml
    │   │   └── other-manifests-tier-level-config.yml
    │   └── [azure subscription/aws account]
    │       ├── [cluster-1]
    │       │   ├── argo-projects
    │       │   │   └── argo-project-gchiesa.yml
    │       │   ├── config # cluster level config, if needed
    │       │   └── tenants
    │       │       └── tenant-gchiesa.yml <-- Capsule Tenancy deployment
    │       └── [cluster-n]
    │           ├── argo-projects
    │           │   ├── argo-project-team-a.yml
    │           │   └── argo-project-team-b.yml
    │           └── tenants
    │               ├── tenant-team-a.yml
    │               └── tenant-team-b.yml
    ├── prod-tier-1
    │   ├── config
    │   │   ├── capsule.yml
    │   │   └── other-manifests-tier-level-config.yml
    │   └── [azure subscription/aws account]
    │       └── [cluster-1]
    │           ├── argo-projects
    │           │   └── argo-project-team-c.yml
    │           ├── config
    │           │   └── manifests-for-cluster-level-config.yaml
    │           └── tenants
    │               └── tenant-team-c.yml
    │	[...]
    └── prod-tier-n

NOTE: in the snippets above and below you will also see “Argo Project” mentioned. This is because, as part of the tenancy, we also offer an ArgoCD project to the tenant, so that they can deploy their applications in a GitOps fashion.

Additionally, we want to ensure we can roll out updates to all Capsule tenants, and we also want to be able to “bootstrap” a tenant with some additional resources. An example is the Namespace: we allocate a default namespace for each tenant.

To achieve this we deploy Capsule manifests via Helm.

Each tenant creation results in two Helm installations: a tenant Helm release that allocates the tenancy, and a tenant-post-install Helm release where we allocate the additional resources we want to make available to the newly created tenant.

See the example repository layout below, which implements the charts for the Capsule tenancy and the tenant post-install step.

.
├── argocd-project
│   ├── Chart.yaml
│   ├── templates
│   │   ├── _helpers.tpl
│   │   ├── appproject.yaml
│   │   ├── repository.yaml
│   │   └── root-app.yaml
│   └── values.yaml
├── tenant
│   ├── Chart.yaml
│   ├── templates
│   │   ├── _helpers.tpl
│   │   └── tenant.yaml
│   └── values.yaml
└── tenant-post-install
    ├── Chart.yaml
    ├── templates
    │   ├── _helpers.tpl
    │   └── default-namespace.yaml
    └── values.yaml

This is particularly convenient: for example, whenever we want to enrich the tenants' information (e.g. additionalMetadata), we can just update the tenant Helm chart (bumping its version) and the reconciliation process will update all the tenants with the new information.
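
For instance, a tenant allocation driven by Flux could look roughly like the sketch below (the chart source, version and values are hypothetical):

# Hypothetical Flux HelmRelease allocating a tenancy via the "tenant" chart.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: team-alpha-tenant
  namespace: flux-system
spec:
  interval: 10m
  chart:
    spec:
      chart: tenant
      version: "1.2.3"                 # bumping this rolls out the change to the tenant
      sourceRef:
        kind: HelmRepository
        name: platform-charts          # hypothetical chart repository
        namespace: flux-system
  values:
    tenantName: team-alpha
    slackChannel: team-alpha-support   # hypothetical values consumed by the chart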

Example of live production tenant manifest

The following YAML is an example of a Tenant resource in one of our clusters.

The interesting aspect is how we introduce traceability of the operations via additionalMetadata, which is associated with the entities created by the tenant (namespaces and services in this case).

The platform team can understand who created it (we use a self-service, API-first approach for provisioning tenants, so you can see the API version and invocation ID of the operation), and network policies ensure each tenant is isolated yet reachable from the shared monitoring platform available in each cluster.

apiVersion: capsule.clastix.io/v1beta2
kind: Tenant
metadata:
  annotations:
    meta.helm.sh/release-name: sp-cra-gchiesa-tenant
    meta.helm.sh/release-namespace: flux-system
    tenant.platform.tomtom-global.com/created-at: 2023-08-16T08:12:27.209003
    tenant.platform.tomtom-global.com/platform-api-invocation-id: 72b5611b-da8f-4417-bc9a-3b2c3da17869
    tenant.platform.tomtom-global.com/platform-api-version: 0.0.241
    tenant.platform.tomtom-global.com/slack-channel: storm-troopers
    tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
    tenant.platform.tomtom-global.com/tenant-owner: [redacted]
  labels:
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: sp-cra-gchiesa-tenant
    helm.toolkit.fluxcd.io/namespace: flux-system
    tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
  name: platform-sp-cra-gchiesa
spec:
  ingressOptions:
    allowedClasses:
      allowed:
        - traefik
      matchLabels:
        app.kubernetes.io/name: traefik
    allowedHostnames:
      allowedRegex: ^[redacted].example.com$
    hostnameCollisionScope: Enabled
  limitRanges:
    items:
      - limits:
          - default:
              cpu: 500m
              memory: 128Mi
            defaultRequest:
              cpu: 500m
              memory: 128Mi
            type: Container
  namespaceOptions:
    additionalMetadata:
      annotations:
        tenant.platform.tomtom-global.com/created-at: 2023-08-16T08:12:27.209003
        tenant.platform.tomtom-global.com/platform-api-invocation-id: 72b5611b-da8f-4417-bc9a-3b2c3da17869
        tenant.platform.tomtom-global.com/platform-api-version: 0.0.241
        tenant.platform.tomtom-global.com/slack-channel: storm-troopers
        tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
        tenant.platform.tomtom-global.com/tenant-owner: [redacted]
      labels:
        tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
    quota: 5
  networkPolicies:
    items:
      - ingress:
          - from:
              - namespaceSelector:
                  matchLabels:
                    tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
              - namespaceSelector:
                  matchLabels:
                    kubernetes.io/metadata.name: ingress
              - namespaceSelector:
                  matchLabels:
                    kubernetes.io/metadata.name: monitoring
        podSelector: { }
        policyTypes:
          - Ingress
  owners:
    - clusterRoles:
        - admin
        - capsule-namespace-deleter
      kind: Group
      name: [ redacted ]
      proxySettings:
        - kind: StorageClasses
          operations:
            - List
        - kind: Nodes
          operations:
            - List
        - kind: IngressClasses
          operations:
            - List
  resourceQuotas:
    items:
      - hard:
          limits.cpu: 2000m
          limits.memory: 4000Mi
          requests.cpu: 1000m
          requests.memory: 2000Mi
    scope: Tenant
  serviceOptions:
    additionalMetadata:
      annotations:
        tenant.platform.tomtom-global.com/tenant-name: sp-cra-gchiesa
        tenant.platform.tomtom-global.com/tenant-owner: [redacted]
      labels:
        tenant-cost-center: [redacted]
    allowedServices:
      externalName: true
      loadBalancer: false
      nodePort: true
  storageClasses:
    matchLabels:
      kubernetes.io/cluster-service: "true"
status:
  namespaces:
    - sp-cra-gchiesa-default
  size: 1
  state: Active

Conclusion

Platform Engineering is the new way of managing infrastructure, optimising its usage and offloading engineering teams from the burden of running and operating its complexity. Capsule is a great tool that helps us achieve our goals, and we are happy to contribute back to the community with our experience.


Written by Giuseppe Chiesa, Staff Engineer