Explanation of Kubernetes Pod Priority, PriorityClass, and Preemption

What is Pod Priority in Kubernetes?

Pod priority is a Kubernetes scheduling feature that lets the scheduler order and place pods by comparing their priority values. It rests on two main concepts:

Pod Preemption Policy
Pod Priority

Pod Preemption Policy

The pod preemption policy allows Kubernetes to preempt (evict) lower-priority pods from nodes when there are higher-priority pods in the scheduling queue and no node resources are available.

Pod Priority

To assign a priority to a pod, you need a PriorityClass: a cluster-scoped (non-namespaced) object that maps a name to an integer value.

The value determines the priority. A user-defined PriorityClass can have any value up to 1,000,000,000 (one billion); larger numbers are reserved for the built-in system classes. The larger the number, the higher the priority.

The pod specification references the class by name in its priorityClassName field.

If you do not want pods of a priority class to preempt other pods, set preemptionPolicy: Never on the class. By default, a PriorityClass uses the PreemptLowerPriority policy.
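As a minimal sketch of the non-preempting case, the class below places its pods ahead of lower-priority pods in the scheduling queue but never evicts running pods to make room for them. The class name and the value are illustrative assumptions, not taken from the original example.

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # hypothetical name
value: 900000                         # illustrative value, below the one-billion limit
preemptionPolicy: Never               # queue ahead of lower priorities, but never evict
globalDefault: false
description: "High priority without preemption."
```

A class like this can suit workloads that should be scheduled early but should not disturb pods that are already running.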

Example of Pod PriorityClass

The following example has a PriorityClass object and a pod that uses the PriorityClass.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-apps
value: 1000000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Mission Critical apps."
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: dev
spec:
  containers:
  - name: web
    image: nginx:latest
    imagePullPolicy: IfNotPresent
  priorityClassName: high-priority-apps

High Priority Classes in Kubernetes System

How to protect critical system Pods from preemption?

Kubernetes ships with two built-in high-priority classes. Both values are above the one-billion limit for user-defined classes, so pods that use them cannot be preempted by pods that use a user-defined class:

system-node-critical: The value of this class is 2000001000. Static Pods, such as etcd, kube-apiserver, kube-scheduler, and controller manager, use this priority class.
system-cluster-critical: The value of this class is 2000000000. Add-on Pods such as coredns, calico controller, and metrics server use this priority class.
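To illustrate how an add-on opts into one of the built-in classes, here is a hedged sketch of a pod that references system-cluster-critical. The pod name and image are hypothetical, and the pod is placed in kube-system on the assumption that your cluster restricts the system classes to that namespace.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cluster-addon                 # hypothetical name
  namespace: kube-system              # assumed: system classes are often limited to this namespace
spec:
  priorityClassName: system-cluster-critical   # built-in class, no PriorityClass object to create
  containers:
  - name: addon
    image: registry.example.com/cluster-addon:1.0   # hypothetical image
```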

How do Kubernetes Pod Priority & Preemption work?

When a pod is created with a priorityClassName, the priority admission controller looks up that PriorityClass and fills in the pod's integer priority value.
If there are many pods in the scheduling queue, the scheduler orders them by priority. That is, it tries to schedule high-priority pods before low-priority pods.
If no node has enough free resources for a higher-priority pod, the preemption logic kicks in.
The scheduler preempts (evicts) low-priority pods from a node to make room for the high-priority pod. Evicted pods get a graceful termination period of 30 seconds by default. If a pod sets terminationGracePeriodSeconds in its spec (for example, to give a preStop container lifecycle hook time to finish), that value replaces the default 30 seconds.
However, if the high-priority pod still cannot be scheduled for some reason, the scheduler moves on and continues to schedule lower-priority pods.
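The grace-period step above can be sketched as follows. This is an illustrative pod, not part of the original example: the name, image, and commands are assumptions, and the preStop command is only a stand-in for real cleanup work.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                  # hypothetical name
spec:
  terminationGracePeriodSeconds: 60   # replaces the 30-second default when the pod is terminated
  containers:
  - name: worker
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    lifecycle:
      preStop:
        exec:
          # Stand-in for cleanup; it must finish within the grace period above.
          command: ["sh", "-c", "sleep 10"]
```

Because this pod sets no priorityClassName, it runs at the default priority and is a likely preemption victim when a higher-priority pod needs its node.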

Now we know how Kubernetes pod scheduling priority works with PriorityClass and preemption.

Common Questions about Pod Priority

Here are some common questions about Pod priority.

What is Kubernetes DaemonSet Priority?

DaemonSet pods have priorities just like any other pod. Therefore, if you want your DaemonSet to stay stable and not be evicted during a node resource crunch, set a higher-priority PriorityClass in the DaemonSet's pod template.
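As a sketch, a DaemonSet picks up a priority through priorityClassName in its pod template. The example reuses the high-priority-apps class defined earlier; the DaemonSet name, labels, and image are hypothetical.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                     # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      priorityClassName: high-priority-apps   # PriorityClass from the earlier example
      containers:
      - name: agent
        image: registry.example.com/log-agent:1.0   # hypothetical image
```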

What is the relationship between Pod QoS and Pod priority and preemption?

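These are two separate mechanisms. When a node runs short of resources, the kubelet evicts pods on that node. It ranks them first by whether their usage of the starved resource exceeds their requests (which is closely related to the QoS class), then by pod priority, and then by usage relative to requests.

Scheduler preemption, on the other hand, only kicks in when a high-priority pod is waiting in the scheduling queue and no node can fit it. The scheduler ignores the QoS class of pods when it picks preemption victims. Node-pressure evictions happen because of a resource crunch on the node, independent of the scheduling queue.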
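Since the two mechanisms look at different fields, a pod can be tuned for both. The sketch below is an assumption-laden illustration: requests equal to limits give the pod the Guaranteed QoS class, while priorityClassName (reusing high-priority-apps from the earlier example) sets its scheduling priority. The pod name and resource figures are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: payment-service               # hypothetical name
spec:
  priorityClassName: high-priority-apps   # considered by the scheduler during preemption
  containers:
  - name: app
    image: nginx:latest
    resources:
      # requests == limits for every resource gives the Guaranteed QoS class
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```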

What is the significance of Pod Priority?

When you deploy applications in production on Kubernetes, there are certain applications you do not want to be killed, for example metrics collector DaemonSets, log agents, and payment services.

To ensure the availability of these critical pods, you can create a hierarchy of priority classes. When the cluster runs short of resources, the scheduler preempts low-priority pods to make room for higher-priority pods, and the kubelet takes pod priority into account when it evicts pods from a node under resource pressure.

How does PodDisruptionBudget affect preemption?

When the scheduler selects pods to preempt, it respects PodDisruptionBudget (PDB) constraints on a best-effort basis. It first looks for victims whose eviction would not violate a PDB, so that the minimum number of pods specified in the PDB keeps running.

However, if no such victims exist, Kubernetes still evicts lower-priority pods to schedule the higher-priority pod, even if that violates the PDB.
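For reference, a PDB that the scheduler would try to honor during preemption could look like the sketch below. The name, label selector, and minAvailable figure are illustrative assumptions.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                       # hypothetical name
spec:
  minAvailable: 2                     # try to keep at least two matching pods running
  selector:
    matchLabels:
      app: web                        # hypothetical label on the protected pods
```

Keep in mind that this protection is best effort for preemption, so truly critical workloads still need a suitably high priority class.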
