Kubernetes multi-tenancy is a concept that defines how multiple tenants, such as teams or individuals within an organization, share the same Kubernetes infrastructure. Traditionally, to meet the demand for environment access, administrators would create (or allow teams to create) separate full Kubernetes clusters, ballooning infrastructure complexity, costs, operational overhead, and potential security concerns as the number of tenants and the demand for environments per tenant increased. This approach does not allow administrators to scale quickly as tenant count and cluster demand grow. This “mushroom farm” problem has led teams to pursue a multi-tenancy strategy and to develop tooling to support it.
With proper configuration, Kubernetes allows cluster administrators to create a shared infrastructure where multiple tenants can run separated workloads within a single Kubernetes environment. Administrators divide the infrastructure based on their tenants' requirements to meet each tenant's use cases while balancing operational overhead.
Administrators benefit from tenants co-existing on shared resources by reducing the amount of infrastructure requiring development and maintenance, simplifying the operational overhead, and reducing costs.
The drawbacks of relying only on Kubernetes-native features for multi-tenancy are complexity in configuration management, difficulty in scaling across multiple clusters, and a lack of user experience (UX) features. Implementing a self-service, scalable platform for tenants to easily leverage is beyond the scope of what Kubernetes is natively designed to provide, so other tools should be explored to simplify management and enhance the user experience.
This article will explore Kubernetes multi-tenancy in detail, including different use cases, challenges for cluster administrators, and multi-tenancy solutions.
The table below summarizes key Kubernetes multi-tenancy concepts this article will explore in more detail.
Many common scenarios are driving the adoption of multi-tenant architectures using Kubernetes. The sections below will review the three main types of Kubernetes multi-tenancy.
An organization may have many separate teams developing workloads for Kubernetes without choosing to run the cluster infrastructure themselves. A modern pattern for these organizations is to have a dedicated "platform team" responsible for centrally managing unified Kubernetes infrastructure, enabling other teams to deploy their workloads.
In this scenario, administrators may share a cluster amongst multiple teams by logically dividing resources with Kubernetes APIs like RBAC and Namespaces. Administrators can use Kubernetes API objects to limit resources, security privileges, network access, etc., at a per-team level. Each team will be allowed access to deploy and modify only the specific resources they own. The organization benefits from a multi-tenant approach with this use case by mitigating each team's need for a dedicated cluster. Instead, they share infrastructure between teams to reduce costs and operational overhead.
Software developers typically use multiple environment types to build and test their applications. These may include development, testing, and production environments. Each environment serves a different purpose for the team and comes with a unique set of requirements. Creating and managing unique configurations for each environment type on separate infrastructure may be operationally complex and expensive.
Operational overhead can be simplified by setting up multi-tenant clusters for both production and development environments. This will help reduce the burden on teams and individuals creating and maintaining their own development, testing, and production cluster infrastructure.
For example, development environments need to be elastic and quick to provision so developers can test changes rapidly, while production environments must be stable, flexible, and able to scale with demand. Both environment types can be provisioned via multi-tenant clusters, allowing developers to quickly access an elastic development environment for testing and then deploy to a flexible production environment afterwards. While development and production environments will typically be created on separate clusters, sharing each cluster between multiple tenants reduces the organization’s operational overhead in creating and maintaining its infrastructure. Having fast and easy access to scalable and flexible cluster resources is critical for developer productivity.
{{banner-1="/utility-pages/banners"}}
Software-as-a-service (SaaS) vendors will often run many customer workloads on shared Kubernetes infrastructure. A multi-tenant design for this use case allows the vendor to scale to a larger volume of customers than attempting to create and manage a cluster-per-customer. There are very few built-in Kubernetes objects available for enforcing strict security boundaries and isolation between tenants, which is a critical requirement for customer-facing environments. Strong security boundaries will require additional tools beyond what Kubernetes is able to provide via built-in objects like namespaces.
Multi-tenancy for customer environments is often a SaaS requirement, particularly when the volume of customers is high. It isn't practical for most organizations to create and operate thousands of Kubernetes clusters.
Users should verify their platform providers have implemented strict security controls to strongly isolate their workloads. Deploying applications with sensitive data to platform providers who implement only namespace-level tenancy may pose a security risk since many other tenants are sharing the same infrastructure with only minor isolation boundaries.
Kubernetes multi-tenancy can be implemented in several ways. Each approach has tradeoffs, and administrators must carefully evaluate their use case and tenant requirements to select an appropriate method.
The simplest approach to implementing Kubernetes multi-tenancy is namespace-level tenant isolation. A namespace is a Kubernetes resource acting as a logical separator within a Kubernetes cluster. It defines a boundary for objects deployed "inside" the namespace, such as pods, deployments, services, and access control resources.
Namespace-level multi-tenancy typically involves assigning each tenant their own namespace and granting them access to exclusively deploy objects within that namespace. The tenant can only modify workloads within that namespace without access to other namespaces' resources. This allows each tenant to operate with a degree of isolation from other tenants in the same cluster.
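As a minimal sketch of this pattern (the namespace, group, and object names below are illustrative assumptions), an administrator might create a dedicated namespace for a tenant and bind the tenant’s group to Kubernetes’ built-in “edit” ClusterRole, scoped to that namespace only:

```yaml
# Namespace dedicated to the "team-1" tenant (name is illustrative)
apiVersion: v1
kind: Namespace
metadata:
  name: team-1
---
# Grant the tenant's group edit rights inside the team-1 namespace only,
# by scoping the built-in "edit" ClusterRole with a namespaced RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-1-editors
  namespace: team-1
subjects:
  - kind: Group
    name: team-1-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit
  apiGroup: rbac.authorization.k8s.io
```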
Many aspects of a cluster can be configured to enable namespace-level multi-tenancy, including:
ResourceQuotas and LimitRanges: These objects are built into the Kubernetes API and allow administrators to define the default and maximum level of resources tenants can request for a given namespace. ResourceQuota objects can be used to limit the consumption of CPU, memory, disk space, custom resources (like GPUs), and the total number of Kubernetes objects (such as the maximum number of deployments per namespace). LimitRange objects allow administrators to apply sensible default settings, such as a minimum amount of CPU/memory that should be granted to each pod. Combining these objects allows administrators to control the resource utilization granted to a tenant's namespace.
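As a minimal sketch (the “team-1” namespace and the resource values are illustrative), a ResourceQuota and LimitRange pair for a tenant namespace might look like this:

```yaml
# Cap total CPU and memory consumption for the team-1 namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-1-quota
  namespace: team-1
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
# Apply a default CPU limit to containers that do not specify one
apiVersion: v1
kind: LimitRange
metadata:
  name: team-1-defaults
  namespace: team-1
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
```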
The ResourceQuota above sets a strict CPU and memory limit on the “team-1” namespace. New pods will fail to deploy if the namespace’s resource utilization limit is reached. The LimitRange sets a default CPU value for every container created in the “team-1” namespace.
NetworkPolicies: Administrators can control network traffic within a cluster using these objects. NetworkPolicies enable functionality like allowing egress traffic to particular endpoints, allowlisting ingress for specific ports, and granting access for cross-namespace communication. Implementing NetworkPolicies helps isolate the network access of a tenant's workloads, preventing unnecessary communication and reducing the blast radius of a compromised workload. NetworkPolicies are available to administrators using supported Container Network Interface (CNI) plugins, like Cilium.
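A sketch of such a policy might look like the following (the namespace, pod label, and CIDR range are illustrative):

```yaml
# Allow egress from "role: backend" pods in team-1 only to an internal IP range
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-allowlist
  namespace: team-1
spec:
  podSelector:
    matchLabels:
      role: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8
```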
The above NetworkPolicy will allowlist egress traffic from pods labeled “role: backend” to a specific IP range. The IP range may represent a corporate network (for example), ensuring the “team-1” namespace pods can only communicate within the corporate network without access to the internet.
Because Kubernetes allows for cross-namespace networking, administrators must be careful in how network policies are implemented and who has access to them. Enforcing networking isolation is one of the most challenging aspects of multi-tenancy.
Role-based access control (RBAC): Kubernetes RBAC objects are built into the API to allow administrators to customize which Kubernetes API resources tenants can access and what actions can be performed on those objects. For example, tenants may be granted access to create pod objects within a particular namespace but be blocked from accessing resources in other namespaces. RBAC is essential to implementing secure isolation in multi-tenant clusters, as mitigating unwanted access to the Kubernetes API is critical for cluster security.
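A minimal example of a namespaced Role granting read-only access to pods might look like this (names are illustrative):

```yaml
# Read-only access to pods, scoped to the team-1 namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-1
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
```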
The above Role object will allow read-only access to pods in the “team-1” namespace. Granting this role to the team-1 tenants will ensure their access is isolated to their namespace only. No other pods from other namespaces are accessible with this Role.
StorageClasses and PersistentVolumes: Administrators may configure many types of storage in their clusters, such as network filesystems, block storage, throughput-optimized storage, etc. Administrators may choose to control which tenants can access certain types of storage, along with their backup policies, deletion policies, and maximum storage usage. Tenants will also typically need isolated storage devices to ensure data is not exposed to other tenants. Using RBAC along with StorageClass and PersistentVolume resources allows administrators to grant flexible storage access while maintaining isolation at a namespace level.
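As one sketch of this approach (the class name is illustrative, and the provisioner assumes an EBS-backed cluster), an administrator could define a StorageClass with an explicit reclaim policy and cap how much of it a tenant namespace may claim via a ResourceQuota:

```yaml
# A storage class tenants may request; the provisioner shown is an assumption
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: tenant-standard
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
---
# Cap how much of this class the team-1 namespace can consume
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-1-storage-quota
  namespace: team-1
spec:
  hard:
    tenant-standard.storageclass.storage.k8s.io/requests.storage: 100Gi
```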
Container sandboxing: Another useful approach to implementing secure multi-tenancy involves enabling container sandboxing. Containers provisioned on a Kubernetes worker node typically share the worker node's operating system kernel. Sharing a kernel between all containers on a worker node can be a security risk: any compromised container may gain unwanted access to the kernel and other running processes (like other containers running on the same worker node). Sandboxing tools can implement a userspace kernel that intercepts system calls executed by the container, restricting access to the node's real kernel. This can prevent unwanted access to critical kernel settings like the system time or filesystem attributes. Since worker nodes can host workloads from multiple namespaces, implementing container sandboxing is essential to designing a secure multi-tenant cluster.
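For example, with a sandboxed runtime such as gVisor installed on the worker nodes (an assumption here), administrators can expose it through a RuntimeClass, and tenant pods can opt into it:

```yaml
# Expose the gVisor runtime (assumes the runsc handler is installed on nodes)
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# A tenant pod that runs inside the sandboxed runtime
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app
  namespace: team-1
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: nginx
```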
Open Policy Agent (OPA) Gatekeeper: This open-source project is a widely used tool for enforcing policies on objects deployed to a cluster. Administrators can deploy Gatekeeper to enforce rules for tenants, such as requiring CPU limits or specific label values and preventing the creation of pods requesting root filesystem access. The policy language is flexible enough for administrators to enforce rules on any field of any Kubernetes object, allowing granular control over how objects are deployed to a cluster. Gatekeeper is a standard tool for administrators implementing multi-tenancy.
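As an illustrative sketch (based on Gatekeeper’s common required-labels example; the “team” label requirement is an assumption), a ConstraintTemplate and Constraint pair could require every namespace to carry a team label:

```yaml
# ConstraintTemplate defining a reusable "required labels" policy
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
---
# Constraint applying the policy: every Namespace must carry a "team" label
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: namespaces-must-have-team-label
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]
```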
Namespace-level multi-tenancy is an appropriate option for cluster administrators looking to implement a loosely isolated environment quickly.
However, security is limited with namespace-level multi-tenancy due to how it approaches isolation. While namespaces provide a degree of logical division within a shared Kubernetes cluster, they are not designed to provide complete and secure isolation by default. Kubernetes is highly permissive by default, meaning workloads have broad access to the cluster's resources and functionality unless explicitly restricted. This can be an issue if any configuration options are overlooked.
Administrators are responsible for implementing security measures to restrict access to all cluster elements. This requires careful evaluation of configurations for RBAC, NetworkPolicies, pod privileges, kernel configuration, and any other aspect of the cluster requiring strict isolation. Any oversights may introduce vulnerabilities into the cluster.
Relying on administrators to perfectly configure isolation boundaries, particularly at scale, introduces risk since human error is natural and expected in any environment. A misconfiguration when relying exclusively on namespace-based security controls may have a large blast radius when the boundaries are breached. For example, granting tenants additional network access via network policies may lead to breaches in the cluster's networking security model since Kubernetes allows global network access by default. Misconfigurations in RBAC objects could grant tenants access to objects that should only be managed by the administrator and lead to unwanted configuration changes impacting the entire cluster. Mitigating the risk of a compromised cluster calls for stronger isolation tools where security is enforced by default, with minimal room for human error.
{{banner-2="/utility-pages/banners"}}
Virtual clusters can create many virtual control planes within a physical Kubernetes cluster. The concept involves hiding the real cluster API Server endpoint from the tenant and exposing a virtualized API Server instead. All requests for the Kubernetes API from the tenant are intercepted by the virtual API Server, which can then provide an abstract view of the cluster's resources.
Virtual clusters strike a balance between namespace-level tenancy — which has security risks due to the lack of strong isolation or secure default settings — and dedicated clusters, which are operationally complex and expensive to maintain.
Virtual clusters allow administrators to leverage shared cluster infrastructure and achieve much stronger isolation than namespaces can provide, while benefiting from minimal operational overhead. Developers and testers gain access to a robust virtual cluster where they can test cluster-wide resources, which is not possible in a namespace-based multi-tenancy model.
The virtual cluster approach is similar to how a physical machine may host many virtual machines (VMs), where the VMs have no insight or visibility into other VMs. Applications running in a VM typically don't have access to the underlying physical machine hardware or configuration, enabling a degree of isolation for security purposes. It also allows users to freely deploy and modify the contents of the VM without interfering with other VMs on the same physical host.
Virtual clusters are created by installing a tool like the Uffizzi Operator in the physical Kubernetes cluster, which can provision virtual clusters. The operator can generate “Kubeconfig” files for tenants to use with their local Kubectl client. This new Kubeconfig will contain a virtual API Server endpoint specific to the tenant, separate from the actual global API Server endpoint.
Here is an example of a non-virtual Kubeconfig file used by the administrator to connect normally to their real API Server:
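The exact contents will vary by cluster; the endpoint and credential values below are illustrative placeholders:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: production
    cluster:
      # Real (physical) cluster API Server endpoint - illustrative value
      server: https://api.kubernetes.mycompany.com
      certificate-authority-data: <base64-encoded CA certificate>
contexts:
  - name: admin@production
    context:
      cluster: production
      user: admin
current-context: admin@production
users:
  - name: admin
    user:
      client-certificate-data: <base64-encoded client certificate>
      client-key-data: <base64-encoded client key>
```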
And here is an example of a Kubeconfig which administrators may provide to their tenant (in this case, called “team-1”), which points to a different API Server endpoint:
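Again, the values shown are illustrative placeholders:

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: team-1-virtual
    cluster:
      # Virtual API Server endpoint exposed to the tenant
      server: https://team-1-virtual.api.kubernetes.mycompany.com
      certificate-authority-data: <base64-encoded CA certificate>
contexts:
  - name: team-1@team-1-virtual
    context:
      cluster: team-1-virtual
      user: team-1
current-context: team-1@team-1-virtual
users:
  - name: team-1
    user:
      client-certificate-data: <base64-encoded client certificate>
      client-key-data: <base64-encoded client key>
```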
In this example, the “team-1-virtual.api.kubernetes.mycompany.com” endpoint will point to the virtual cluster Operator application running in the cluster, receiving all API calls made by the tenant’s Kubectl client. Standard authentication and authorization security requirements will still apply, like implementing a client certificate to gain access to the API Server endpoint.
The view provided by each tenant’s virtual API Server will hide details about the rest of the physical cluster. The tenant’s virtual API Server will only provide access and information about objects the tenant has created. Since the virtual API Server is a separate component intercepting traffic to the actual API Server, the attack surface to the broader cluster is reduced.
Here is an illustration of the result. The first command-line output shows an administrator using Kubectl to query the list of pods from the cluster. The administrator uses a Kubeconfig containing the real API Server endpoint, and therefore, Kubectl can fetch data on the real state of the cluster. This access is for administrators only and will not be granted to tenants.
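A sketch of what this might look like (pod names and namespaces are illustrative):

```
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5d78c9869d-7x2lp          1/1     Running   0          12d
kube-system   kube-proxy-m4k2b                  1/1     Running   0          12d
team-1        backend-7b9f6d8c4-xv5qj           1/1     Running   0          3h
team-1        frontend-6c8d5f9b7-p2wzr          1/1     Running   0          3h
team-2        api-5f7c8d9b6-k8slm               1/1     Running   0          1d
```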
We can see pods from all namespaces, including a namespace called “team-1,” where a tenant runs their isolated workload.
Next, we have a tenant running Kubectl to query pods as well. The tenant was granted access to a separate Kubeconfig file connecting to a virtual API Server endpoint. When Kubectl queries this endpoint to discover all running pods, the virtual API Server only responds with information about pods running in the “team-1” namespace. The tenant cannot view or modify all the other pods running in the cluster.
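Again, as an illustrative sketch:

```
$ kubectl get pods --all-namespaces
NAMESPACE   NAME                       READY   STATUS    RESTARTS   AGE
team-1      backend-7b9f6d8c4-xv5qj    1/1     Running   0          3h
team-1      frontend-6c8d5f9b7-p2wzr   1/1     Running   0          3h
```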
The virtual cluster concept allows administrators to generate sandboxed environments per namespace where tenants can deploy and modify Kubernetes objects without impacting virtual clusters of other tenants or the underlying physical cluster.
{{banner-2="/utility-pages/banners"}}
Administrators should gather detailed requirements from their tenants to ensure they select the appropriate multi-tenancy strategy. Information about the tenant's use cases and future plans will help administrators to make an informed decision that suits all tenants in the cluster.
Key factors to consider when comparing Kubernetes multi-tenancy strategies include the strength of isolation and security each tenant requires, the operational overhead of creating and maintaining the environment, infrastructure costs, how well the approach scales as tenant count grows, and the user experience offered to tenants (such as self-service access to environments).
A high-quality observability setup is critical for administrators to manage any multi-tenant cluster effectively. Implementing observability involves setting up tooling to enable visibility into metrics, logs, and traces to help administrators gain insight into the inner workings of the cluster and its workloads. This information will be critical for monitoring performance bottlenecks, investigating breakages, incident analysis, and proactively responding to security incidents.
Since administrators are responsible for managing the multi-tenancy platform, they will need data to perform their operational responsibilities effectively, including per-tenant metrics on resource utilization and performance, workload and audit logs, and traces of requests flowing through the cluster.
Observability is a key aspect of any multi-tenancy strategy. Administrators need this data to effectively manage and troubleshoot their clusters, while tenants benefit from insight into their running workloads.
Multi-tenancy in a Kubernetes environment is an important concept for any Kubernetes administrator. The key points to consider include which Kubernetes multi-tenancy strategies are available and how to select an appropriate one to fit tenant requirements and mitigate unnecessary risks and administrative overhead.
Another point to consider is implementing "Intermediated Access" to the Kubernetes API Server. Tenants in a multi-tenant cluster can forward their Kubernetes API requests through a proxy like the Uffizzi Controller, which by design limits capability, assumes all of the management responsibility for multi-tenancy, and forwards or “intermediates” requests to the control plane.
This introduces an additional layer of management and security for multi-tenant environments by ensuring tenants have least-privilege access to only what they need and cannot interfere with the workloads of other tenants. With intermediated access, developers get a better UX because they can securely provision virtual clusters through a simplified control plane, and administrators retain a higher degree of control over the underlying infrastructure through an intermediary designed to securely manage multi-tenancy.
Administrators who are able to implement an effective multi-tenancy strategy will reap high returns on investment for their organizations through reduced operational overhead, lower cloud costs, and dramatically improved productivity for their development teams.