Kubernetes Resource Quota

March 14, 2024
13 min read

Kubernetes resource quotas are crucial to managing resource consumption and ensuring fair allocation among tenants in multi-tenant clusters. Administrators can control resource usage and prevent overconsumption by setting up resource quotas on a per-namespace level. This article discusses the benefits of resource quotas in multi-tenant cluster management and the steps to deploy and test them.

It also covers the best practices Kubernetes administrators can follow to utilize resource quotas effectively, optimize multi-tenant cluster environments, and provide a quality experience for all tenants.

Summary of key Kubernetes resource quota concepts

The table below summarizes the Kubernetes resource quota concepts we will explore in this article.

Concept | Summary
What are Kubernetes resource quotas? | Resource quotas are a built-in Kubernetes object enabling per-namespace limits on resource consumption.
What are the types of Kubernetes resource quotas? | Resource quotas can be configured to restrict the usage of storage, compute, and object count.
How can resource quotas be validated? | Resource quotas can be tested by deploying resource quota and pod manifests and verifying that the quota blocks the creation of overconsuming pods.
What are resource quota scopes? | Scopes are a resource quota feature for selecting which pods a quota should target. Multiple resource quotas can exist in the same namespace.
What are limit range objects? | Limit range objects inject default resource values into pods and can also define minimum/maximum values. This helps ensure resource quotas are applied effectively.
What are the limitations of Kubernetes resource quotas? | There are some significant drawbacks to using resource quotas in multi-tenant clusters, such as increased operational overhead when managing multiple tenants.
What are the best practices for Kubernetes resource quotas? | Best practices include gathering tenant requirements, implementing observability to determine resource usage, alerting when limits are hit, and ensuring limit ranges are enabled.

What are Kubernetes resource quotas?

Kubernetes resource quotas are native Kubernetes objects that impose limits on resource consumption. Administrators can set up resource quotas to control resource consumption on a per-namespace level, ensuring each namespace only uses a fair share of resources.

Kubernetes resource quotas are critical for allowing administrators to manage multi-tenant clusters by ensuring each tenant's namespace does not overconsume the cluster's resources. A key challenge for multi-tenant cluster administrators is maintaining an equitable distribution of resources between tenants and a reasonable quality of service for all workloads.

Examples of resources that can be limited using resource quotas include:

  • Pod memory and CPU limits.
  • Pod memory and CPU requests.
  • Storage requests for Persistent Volume Claims.
  • Object count (such as the count of Deployments).

The example Resource Quota object below restricts the total CPU requests in the "team-1" namespace to 12 and the total memory requests to 8Gi.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-memory-quota
  namespace: team-1
spec:
  hard:
    cpu: "12"
    memory: "8Gi"

The above example will allow any combination of CPU and memory requests for pods deployed to the "team-1" namespace as long as the total values do not exceed the specified Resource Quota restriction. For example, the above Resource Quota will allow the following pod configurations to be deployed to the "team-1" namespace:

  • 12 pods with spec.containers[].resources.requests.cpu = 1
  • 4 pods with spec.containers[].resources.requests.cpu = 3
  • 1 pod with spec.containers[].resources.requests.cpu = 12
  • 8 pods with spec.containers[].resources.requests.memory = 1Gi
  • etc.

Any combination of CPU/memory requests is allowed if the totals do not breach the limits defined in the Resource Quota. Once a limit is reached, new pods will fail to deploy to the namespace: the API Server uses the ResourceQuota admission controller to determine whether a new pod would breach any restrictions and rejects it if so.

Remember that resource quotas and namespace-level isolation are not considered high-security solutions for restricting resource access. Tools like virtual clusters will be more appropriate for administrators with sensitive workloads requiring stricter boundaries for security purposes.

What are the types of Kubernetes resource quotas?

There are three types of resource quotas cluster administrators can implement:

Storage quotas

Administrators can limit the total persistent volume storage capacity requested by a namespace. Limiting storage access is useful when the cluster's storage capacity is limited and must be allocated fairly among tenants.

Resource quotas support limiting storage utilization based on either:

  • Storage size: The amount of storage requested across all pods in the namespace. The following example will restrict the "team-1" namespace's total storage utilization to 500Gi:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-size-quota
  namespace: team-1
spec:
  hard:
    requests.storage: "500Gi"
  • Number of Persistent Volume Claims (PVCs): Administrators may want to limit the count of PVCs for cost control purposes and infrastructure limitations. For example, creating a PVC in a cloud provider environment typically provisions an additional block storage volume. Creating excessive PVCs in this scenario could lead to hitting cloud provider account limits and uncontrolled costs. Administrators would benefit from limiting the number of PVCs allowed per namespace in a multi-tenant cluster to manage the amount of storage infrastructure consumed by the cluster. The following example will restrict the number of PVC objects to 10:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pvc-count-quota
  namespace: team-1
spec:
  hard:
    persistentvolumeclaims: "10"
  • Amount of storage class resources: A cluster administrator may configure multiple Storage Classes associated with different volume types. For example, a "basic" Storage Class may utilize slower magnetic drives for cost efficiency at the expense of speed. A "fast" Storage Class may provide access to solid-state drives for faster storage requirements. Administrators can configure resource quotas to limit access to particular Storage Classes to ensure each volume type is fairly consumed. The following example will restrict usage of the "basic" Storage Class to 900Gi and usage of the "fast" Storage Class to 200Gi:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-class-quota
  namespace: team-1
spec:
  hard:
    basic.storageclass.storage.k8s.io/requests.storage: "900Gi" 
    fast.storageclass.storage.k8s.io/requests.storage: "200Gi"
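
To illustrate how these storage quotas are consumed, below is a minimal sketch of a PVC (names and values are illustrative) that would deduct 50Gi from the "fast" Storage Class quota above. The claim also counts toward any requests.storage and persistentvolumeclaims quotas present in the namespace:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-data
  namespace: team-1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast # <--- Counts against the "fast" class quota.
  resources:
    requests:
      storage: 50Gi # <--- 50Gi is deducted from the quota's remaining capacity.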

Compute quotas

A common use case for resource quotas is to limit the amount of CPU and memory resources requested by a namespace. Pods can specify requests and limits for each compute resource, and resource quotas can restrict both.

A pod's compute requests specify the minimum resources required to run. The kube-scheduler will use this value to determine which worker node has available capacity to run the pod. A pod's compute limits specify the maximum amount of resources the pod can consume: a container exceeding its CPU limit is throttled, while one exceeding its memory limit is terminated to avoid disrupting neighboring pods. A pod can consume more resources than specified in its requests, up to the specified limit value.

The below example shows how to define pod specifications for requests and limits:

apiVersion: v1
kind: Pod
metadata:
  name: webserver
  namespace: team-1
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "1Gi"
        cpu: "2"
      limits:
        memory: "4Gi"
        cpu: "6"

Kubernetes resource quotas for compute resources are typically configured to restrict both requests and limits to maintain a healthy cluster. Restricting request values ensures worker nodes have the resources available to schedule pods successfully, and restricting limit values ensures pods continue operating within reasonable boundaries. The restrictions may be based on the maximum compute resources available on the cluster's worker nodes. Allowing requests and limits beyond the worker nodes' capabilities will lead to issues like unschedulable pods.

The below example shows a resource quota that restricts both the requests and limits for CPU and memory:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-memory-quota
  namespace: team-1
spec:
  hard:
    requests.cpu: "4"
    limits.cpu: "8"
    requests.memory: "2Gi"
    limits.memory: "4Gi"

Object count quotas

Kubernetes resource quotas can restrict the number of objects deployed to ensure cluster infrastructure isn't being overconsumed. For example, administrators can restrict the number of NodePort Services a tenant can create to ensure a particular tenant doesn't overconsume the cluster's worker node ports. NodePort Services reserve a port number across all nodes in the cluster, so these ports are a finite resource to which administrators should limit access. Another object that resource quotas can restrict is the LoadBalancer Service. Creating these Services may trigger the creation of additional cluster infrastructure (depending on the cluster's configuration), so limiting access to them can be important for cost control.

The example below shows a Kubernetes resource quota that restricts both NodePort and LoadBalancer Services:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: service-quota
  namespace: team-1
spec:
  hard:
    services.loadbalancers: "2"
    services.nodeports: "10"

Objects such as PVCs, Secrets, and ConfigMaps can also be restricted.
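
As a sketch (names and values are illustrative), the generic count/<resource>.<group> syntax caps the number of most namespaced object types:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-count-quota
  namespace: team-1
spec:
  hard:
    count/configmaps: "20" # <--- Core API objects omit the group suffix.
    count/secrets: "30"
    count/deployments.apps: "10" # <--- Deployments live in the "apps" API group.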


How can a Kubernetes resource quota be validated?

Let's try deploying a Kubernetes resource quota object and a deployment to consume some compute resources to see what happens. You’ll need a running Kubernetes cluster to deploy these objects. You can use tools like KIND and Minikube to set up a local cluster for testing purposes.
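
For example, assuming the kind CLI is installed, a disposable local cluster can be created with a single command (the cluster name is illustrative):

$ kind create cluster --name quota-demo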

Create a text file called "cpu-quota.yaml" and paste the below contents:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-quota
  namespace: team-1
spec:
  hard:
    requests.cpu: "2"

This object will apply a CPU request limit in the "team-1" namespace. Let's create the namespace and apply the resource quota:

$ kubectl create namespace team-1

namespace/team-1 created

$ kubectl apply -f cpu-quota.yaml

resourcequota/cpu-quota created

We can verify the resource quota was created by describing the object via Kubectl:

$ kubectl describe resourcequota cpu-quota --namespace team-1

Name:         cpu-quota
Namespace:    team-1
Resource      Used  Hard
--------      ----  ----
requests.cpu  0     2

The output shows the resource quota created and how many CPU requests have been consumed in the "team-1" namespace. This is useful for checking whether a namespace is close to reaching its limit so the administrator may take action like warning the tenant or raising the quota. Enabling alerts with tools like Prometheus is useful in combination with resource quotas.

Now let's create a Deployment in an "nginx-deployment.yaml" file that requests 2 CPUs, which our quota allows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: team-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "2" # <------ CPU requests are set.

Let's create the deployment:

$ kubectl apply -f nginx-deployment.yaml
deployment.apps/nginx-deployment created

$ kubectl get pods --namespace team-1
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-ffdd5c899-5hcf2   1/1     Running   0          33s

The deployment’s pod has been created successfully. Now let's describe the resource quota object again:

$ kubectl describe resourcequota cpu-quota --namespace team-1

Name:         cpu-quota
Namespace:    team-1
Resource      Used  Hard
--------      ----  ----
requests.cpu  2     2 # <----- The "Used" value has changed.

We can see the "Used" value in the resource quota has changed to reflect the CPU requests consumed by the Deployment we created earlier. The resource quota will not allow any more CPU requests in the "team-1" namespace because the "Used" CPU requests value has reached the restriction we specified when creating the Resource Quota.

We can verify the resource quota will now block any new pods from deploying because the CPU requests will breach the quota. Create a "pod.yaml" file with the contents below:

apiVersion: v1
kind: Pod
metadata:
  name: webserver
  namespace: team-1
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "1" # <-- This CPU request exceeds the Resource Quota

Now let's try deploying this pod and observe what happens:

$ kubectl apply -f pod.yaml

Error from server (Forbidden): error when creating "pod.yaml": 
pods "webserver" is forbidden: exceeded quota: 
cpu-quota, requested: requests.cpu=1, used: 
requests.cpu=2, limited: requests.cpu=2

The output shows that the new pod would exceed the CPU requests quota, so the pod cannot be created. This demonstrates how a Kubernetes resource quota can restrict resource consumption within a namespace.

Administrators can validate their ResourceQuota objects with tools like Kubeconform and "kubectl apply --dry-run=server" to ensure the object schema matches the Kubernetes API schema. This helps detect configuration issues before objects are deployed to the cluster.
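
For example, a server-side dry run submits the manifest through the API server's admission checks without persisting anything, while Kubeconform validates the schema offline (assuming the kubeconform binary is installed; flags may vary by version):

$ kubectl apply -f cpu-quota.yaml --dry-run=server

resourcequota/cpu-quota created (server dry run)

$ kubeconform -summary cpu-quota.yaml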


What are resource quota scopes?

Kubernetes resource quota scopes are a helpful tool for applying multiple quotas in the same namespace to different workloads. Administrators may want a quota to restrict certain types of pods while allowing others more freedom, and the resource quota "scope" feature enables this selective enforcement.

A resource quota scope defines which pods the resource quota will restrict and can match pod attributes such as the Priority Class or Quality of Service class.

Let's implement an example where two resource quotas are configured to target different pod Priority Classes. Create a file called "quotas-and-classes.yaml" with the following contents:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: high-priority-cpu-quota
  namespace: team-1
spec:
  hard:
    requests.cpu: "4"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass # <--- Scope is specified here.
        operator: In
        values:
          - high
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: low-priority-cpu-quota
  namespace: team-1
spec:
  hard:
    requests.cpu: "2"
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass # <--- Scope is specified here.
        operator: In
        values:
          - low
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high # <--- Priority Class name is referenced by pods.
value: 10
globalDefault: false
description: "High priority class for critical workloads only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low # <--- Priority Class name is referenced by pods.
value: 1
globalDefault: false
description: "Low priority class for regular workloads.

The above YAML manifest will create:

  • A resource quota restricting a "high" Priority Class to 4 CPU requests.
  • A resource quota restricting a "low" Priority Class to 2 CPU requests.
  • Two Priority Class objects called "high" and "low".

Next, deploy the objects:

$ kubectl apply -f quotas-and-classes.yaml

resourcequota/high-priority-cpu-quota created
resourcequota/low-priority-cpu-quota created
priorityclass.scheduling.k8s.io/high created
priorityclass.scheduling.k8s.io/low created

Now, when we create a pod, we expect the restriction on CPU requests to vary based on the pod's Priority Class name. Let's test this out by creating a "pod.yaml" defining a pod with a "low" Priority Class:

apiVersion: v1
kind: Pod
metadata:
  name: webserver
  namespace: team-1
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "4" # <---- This pod requests too many CPUs.
  priorityClassName: low # <-- Priority Class is checked by the Resource Quota.

Let's apply the above pod and see what happens:

$ kubectl apply -f pod.yaml

Error from server (Forbidden): error when creating "pod.yaml": 
pods "webserver" is forbidden: exceeded quota: low-priority-cpu-quota, requested: 
requests.cpu=4, used: requests.cpu=0, limited: requests.cpu=2

The error message shows the pod with the "low" Priority Class is rejected because the Resource Quota called "low-priority-cpu-quota" only allows 2 CPU requests instead of the 4 requested for this pod type. The scope of this Resource Quota targets pods with the "low" Priority Class.

To deploy a pod with 4 CPU requests, we need to change the Priority Class of the above pod to "high." This will trigger a different resource quota called "high-priority-cpu-quota," which allows 4 CPU requests. The scope of this resource quota targets pods with the "high" Priority Class.

priorityClassName: high # <--- Change from "low" to "high".

Try redeploying the pod:

$ kubectl apply -f pod.yaml

pod/webserver created

Now the pod deploys successfully. The resource quota called "high-priority-cpu-quota" selects pods with a "high" Priority Class, and this resource quota allows the 4 CPU requests.
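
Priority Class is only one of the available scopes. As another sketch, the simpler scopes field below targets BestEffort pods (pods that set no requests or limits at all); note that the BestEffort scope can only track the pod count:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: best-effort-quota
  namespace: team-1
spec:
  hard:
    pods: "5" # <--- The BestEffort scope only supports the "pods" resource.
  scopes:
    - BestEffort # <--- Applies only to pods with no requests or limits.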

What are limit range objects?

Limit range objects are another feature built into all Kubernetes clusters. They are used to set default requests and limits for any pods that do not have these values set explicitly.

This object is relevant for users implementing resource quotas because of how quotas treat pods without explicit resource values: when a resource quota tracks a compute resource such as CPU or memory, the API server rejects any new pod that does not set a request or limit for that resource.

Limit Range objects help mitigate this problem by enforcing a default value for pod resources in a given namespace. Administrators can implement this object to guarantee that every deployed pod will have resource values set. For example, if we want to use a Resource Quota to restrict memory requests, we can use Limit Range objects to ensure every pod has a memory request value specified.

Limit Range objects can also specify the minimum and maximum resources requested by a pod. While a Resource Quota will enforce restrictions on the sum total of all resource requests in the whole namespace, Limit Range can be more granular by restricting how many resources individual pods can request. This can be helpful to ensure a single pod doesn't monopolize all resources within the namespace's quota and a reasonable minimum default value is applied.
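
As a minimal sketch of that granularity (the bounds here are illustrative), a Limit Range can constrain each individual container like this:

apiVersion: v1
kind: LimitRange
metadata:
  name: per-container-bounds
  namespace: team-1
spec:
  limits:
  - type: Container
    min:
      cpu: "100m" # <--- No container may request less than 0.1 CPU.
      memory: "128Mi"
    max:
      cpu: "2" # <--- No container may set requests or limits above 2 CPUs.
      memory: "4Gi"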

Let's create a Limit Range object and a pod to see how they interact. Create a file called "limit-range-pod.yaml" with the following contents:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: team-1
spec:
  limits:
  - defaultRequest:
      cpu: "1" # <--- This injects default CPU values into pods.
      memory: "2Gi" # <--- This injects default memory values into pods.
    type: Container
---
apiVersion: v1
kind: Pod
metadata:
  name: webserver
  namespace: team-1
spec:
  containers:
  - name: app
    image: nginx # <--- No CPU and memory request values are specified.

Now let's apply both objects:

$ kubectl apply -f limit-range-pod.yaml

limitrange/default-resources created
pod/webserver created

Notice that we did not specify any resource requests in the pod's spec. Let's check the pod's attributes by running:

$ kubectl describe pod webserver --namespace team-1

Name:         webserver
Namespace:    team-1
Status:       Running
Containers:
  app:
    Image: nginx
    Requests:
      cpu: 1
      memory: 2Gi

The limit range object injected the CPU and memory resource requests. The values match the defaults we specified in the Limit Range, and this will now ensure all pods have a default value applied when deployed to the "team-1" namespace. Implementing this object will allow resource quotas to enforce CPU and memory request limitations because the limit range fills in the missing values.

Implementing limit range objects is commonly done alongside resource quota objects because their functionalities complement each other.

What are the limitations of Kubernetes resource quotas?

There are drawbacks to Kubernetes resource quota objects that administrators should consider. Let’s take a closer look at three resource quota limitations.  

Operational overhead

Creating and maintaining separate resource quotas and limit range objects manually across every namespace increases operational overhead and risks human error. A misconfigured or missing resource quota can significantly impact other workloads in the cluster because some workloads will operate without limits on resource consumption. Misconfigured limit range objects will prevent sensible default resource allocations and stop resource quotas from functioning correctly. This is a critical risk for administrators relying on resource quotas as their only approach to resource isolation in a multi-tenant cluster.

Implementing virtual clusters will provide a better experience for administrators because resource quotas and limit range objects can be enabled by default, reducing the operational overhead and the risk of human error. This approach will guarantee every virtual cluster has a default resource quota to limit resource consumption and a default limit range object to set sensible defaults, maintaining a healthy multi-tenant environment.

Access control challenges

Security of resource quotas and limit range objects is also a significant administrative challenge. The only approach natively available in Kubernetes to restrict access to these objects is role-based access control (RBAC). RBAC will allow administrators to prevent tenants from accessing sensitive objects like Kubernetes resource quotas. However, configuring RBAC to avoid unauthorized access in a multi-tenant cluster may be complicated and involve further operational overhead. It is essential to prevent tenants from having access to these object types due to the impact of an unwanted configuration change. Relying exclusively on RBAC to avoid this access is a thin layer of defense.
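
As a sketch of the RBAC approach (names are illustrative), the namespaced Role below grants tenants read-only visibility into quota objects while withholding edit permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: quota-viewer
  namespace: team-1
rules:
- apiGroups: [""] # <--- ResourceQuota and LimitRange are core API objects.
  resources: ["resourcequotas", "limitranges"]
  verbs: ["get", "list", "watch"] # <--- Read-only: no create, update, or delete.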

Limited isolation

Resource quotas can also only restrict a limited set of Kubernetes object types. Custom resource definitions (CRDs) cannot be restricted, and this is an important object type to control in a multi-tenant cluster. CRDs are cluster-scoped; therefore, unwanted modifications or excessive numbers of CRDs can potentially impact many tenants. Virtual clusters help restrict CRDs to a particular namespace, which Kubernetes cannot do natively. This enables administrators to manage the use of CRDs in a multi-tenant cluster more effectively.

Virtual clusters provide an additional layer of isolation and help prevent tenants from accessing objects outside of their own virtual cluster. Since each virtual cluster has its own control plane, a security boundary prevents cross-namespace API access to foreign Kubernetes objects. This improves the security posture of multi-tenant environments.

What are best practices for Kubernetes resource quotas?

There are some key best practices administrators will benefit from following to ensure resources in their multi-tenant clusters are managed effectively:

  • Set realistic restrictions based on workload requirements. Administrators should carefully define quotas based on discussions with tenants, workload requirements, and available cluster capacity. Proper cluster capacity planning will be important for ensuring resource quotas are dividing capacity appropriately.
  • Communicate restrictions to tenants. Administrators should document and communicate restrictions to avoid unwanted surprises where pods fail to deploy. Any changes to restrictions should be communicated in advance to ensure workloads can adapt to changing requirements.
  • Use observability data to monitor usage. Cluster observability data can provide insight into resource utilization metrics. Administrators can compare this data with Resource Quota configurations for analysis purposes, such as checking which tenants are close to reaching their limits, what resources are close to exhaustion, what pods are consuming the most quota resources, etc. Observability data is critical for administrators to ensure resource allocation is done appropriately and that resource quotas are helping rather than hindering workloads from operating correctly.
  • Implement alerting for exhausted quotas. Administrators should configure alerts via tools like Prometheus Alertmanager to ensure visibility into exhausted resources and into tenants experiencing adverse impacts from improperly configured Resource Quotas (see the example alert rule after this list).
  • Use resource quota scopes for more granular control. Enabling multiple Resource Quota objects per namespace to target various pod types based on attributes like Priority Classes will provide more granular control over resource restrictions. By implementing the scope feature, administrators can provide more specific resource quotas for particular pods.
  • Ensure limit range objects are in use to enforce default resource values. Resource quotas are of limited use when resource request values are omitted from pods, so implementing limit range objects is required to mitigate this issue. Requirements should be gathered from tenants to configure limit range values accurately, such as what minimum/maximum resource values are appropriate.
  • Ensure resource quotas are applied in every namespace where tenant workloads are deployed. Leveraging tools to automate this will reduce human error and operational overhead. Implementing virtual cluster technology will help achieve this objective because each virtual cluster will contain a built-in resource quota, ensuring resource restrictions are available by default. Here is an example virtual cluster Operator from Uffizzi.
  • Eliminate unauthorized modifications to resource quota objects. Non-administrators should not be allowed to edit resource quota objects. While Kubernetes provides RBAC to manage object access, administrators can also implement additional Controllers/Operators to maintain resource quota objects on the administrator's behalf like the Uffizzi Controller. This type of tool provides “intermediated access” to the Kubernetes API Server, ensuring client requests pass through a Controller first before reaching the control plane. This has the benefit of an added layer of security because requests to sensitive objects like resource quotas must proxy through the intermediary.
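
To illustrate the observability and alerting recommendations above, here is a sketch of a Prometheus alert rule. It assumes kube-state-metrics is installed, which exposes quota usage via the kube_resourcequota metric (label names may vary by version). The rule fires when any quota in any namespace is over 90% consumed:

groups:
- name: resource-quota-alerts
  rules:
  - alert: ResourceQuotaNearlyExhausted
    expr: |
      kube_resourcequota{type="used"}
        / on(namespace, resource, resourcequota)
      kube_resourcequota{type="hard"} > 0.9
    for: 15m # <--- Require the condition to persist before alerting.
    labels:
      severity: warning
    annotations:
      summary: 'Namespace {{ $labels.namespace }} has consumed over 90% of its {{ $labels.resource }} quota.'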


Conclusion

Resource quotas are a key tool for administrators managing multi-tenant Kubernetes clusters. They enable administrators to allocate resources on a per-tenant basis logically and block pods that breach the configured limits. The tool ensures healthy cluster operations by mitigating resource overconsumption and allowing each tenant fair access to resources. A misconfigured quota will have a negative impact on tenants and cause increased operational overhead for administrators, so it's important to follow best practices when implementing resource quotas to ensure this feature is helping rather than hindering operations.

For administrators looking to manage resource allocation with a higher focus on security, namespace-based isolation with resource quotas may not be enough for sensitive workloads. Tools like virtual clusters will provide stricter resource isolation capabilities for a superior security setup.
