Differences and Relationships Between Pod Scheduling, Preemption, and Eviction in Kubernetes

This article shares the concepts related to Pod scheduling, preemption, and eviction, which are mostly theoretical but very important and often asked in interviews. I hope you can read it patiently!

Kubernetes (K8s) is a container orchestration platform. Its Scheduling, Preemption, and Eviction mechanisms are key to allocating cluster resources sensibly, using them efficiently, and keeping applications healthy.

Scheduling refers to ensuring that Pods are matched to suitable nodes so that kubelet can run them.

Preemption refers to the process of terminating lower-priority Pods so that higher-priority Pods can be scheduled onto the Node.

Eviction is the process of proactively terminating one or more Pods on nodes that are running short of resources.

Next, we will detail these three concepts:

1. Scheduling

1.1. Overview of Scheduling

The scheduler discovers newly created Pods in the cluster that have not yet been scheduled to nodes through Kubernetes’ watch mechanism. The main task of scheduling is to assign Pods to suitable nodes in the cluster. The scheduler decides where to deploy Pods based on their requirements (such as CPU, memory, storage, etc.) and the resource status of nodes (such as available CPU, memory, node labels, etc.).

1.2. Scheduling Process

The Kubernetes scheduling process mainly includes the following steps:

1. Resource Requirements: Each Pod can declare its resource requirements, such as CPU and memory, in the resources.requests and resources.limits fields of its containers. The scheduler bases its placement decision on requests; limits are enforced at runtime on the node.

2. Node Filtering: The scheduler filters suitable nodes based on the Pod’s resource requirements and the resource status of the nodes. Filtering is done through a series of scheduling policies (such as available resources on nodes, Pod affinity and anti-affinity, etc.).

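3. Node Ranking: Once nodes are filtered, the scheduler scores each remaining node. Scoring is based on a set of priority rules (for example, preferred Pod affinity, balancing load across nodes, or whether the container image is already present on the node). Note that Pod priority affects the order in which Pods are taken from the scheduling queue, not how nodes are scored.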

4. Pod Scheduling: The scheduler selects the highest-scoring node and binds the Pod to it. Before being scheduled, the Pod's status is Pending. After binding, the Pod remains Pending while the kubelet pulls images and starts containers, and it changes to Running once at least one container has started.
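
The filter-then-score flow above can be sketched in a few lines of Python. This is a toy model, not kube-scheduler's code: the Node and PodRequest types, their field names, and the scoring rule (loosely modeled on a "prefer the least-allocated node" strategy) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alloc_cpu_m: int   # allocatable CPU in millicores
    alloc_mem_mi: int  # allocatable memory in MiB
    req_cpu_m: int     # CPU already requested by Pods on the node
    req_mem_mi: int    # memory already requested by Pods on the node


@dataclass
class PodRequest:
    cpu_m: int
    mem_mi: int


def fits(pod, node):
    """Filtering: the node's unreserved capacity must cover the Pod's requests."""
    return (node.alloc_cpu_m - node.req_cpu_m >= pod.cpu_m
            and node.alloc_mem_mi - node.req_mem_mi >= pod.mem_mi)


def score(pod, node):
    """Scoring (toy rule): average fraction of capacity left free after placement."""
    cpu_free = (node.alloc_cpu_m - node.req_cpu_m - pod.cpu_m) / node.alloc_cpu_m
    mem_free = (node.alloc_mem_mi - node.req_mem_mi - pod.mem_mi) / node.alloc_mem_mi
    return (cpu_free + mem_free) / 2


def schedule(pod, nodes):
    """Return the chosen node name, or None if the Pod has to stay Pending."""
    feasible = [n for n in nodes if fits(pod, n)]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", 2000, 4096, 1500, 3072),
    Node("node-b", 4000, 8192, 2000, 4096),
    Node("node-c", 2000, 4096, 1900, 1024),  # too little CPU left, filtered out
]
pod = PodRequest(cpu_m=250, mem_mi=512)
print(schedule(pod, nodes))  # node-b
```

In this example node-c is removed during filtering, and node-b wins the scoring step because it keeps the largest share of its capacity free. The real scheduler combines many more filter and score rules.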

1.3. Scheduling Policies

The default scheduler in Kubernetes is kube-scheduler, and the following are the most commonly used scheduling policies:

  • Pod Affinity and Anti-affinity: Defines whether a Pod should run in the same topology domain (for example, the same node or the same zone) as specific other Pods, or be kept apart from them.
  • Priority and Preemption: Pods can set priorities, and higher-priority Pods can preempt resources from lower-priority Pods.
  • Taints and Tolerations: Nodes can have Taints, and Pods can set Tolerations. If a Pod does not tolerate a node’s Taint, it cannot be scheduled to that node.
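  • Resource Requests and Limits: The scheduler places Pods according to their resource requests, so that the total requests on a node do not exceed its allocatable capacity. Limits cap what a container may consume at runtime, which helps avoid overloading the node.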
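
To make the taint and toleration rule concrete, here is a small Python sketch of the matching logic. The Taint and Toleration classes and the function names are hypothetical; the matching rules follow the documented semantics (the "Equal" and "Exists" operators, an empty effect matching every effect), but this is an illustration under those assumptions rather than the scheduler's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches every key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol, taint):
    """True if this toleration matches the given taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations, taints):
    """A node is filtered out if any NoSchedule or NoExecute taint is untolerated.

    PreferNoSchedule is only a soft preference, so it never blocks placement here.
    """
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


gpu_node = [Taint("dedicated", "gpu", "NoSchedule")]
print(can_schedule([], gpu_node))                                              # False
print(can_schedule([Toleration("dedicated", "Equal", "gpu", "NoSchedule")], gpu_node))  # True
print(can_schedule([Toleration(operator="Exists")], gpu_node))                 # True
```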

2. Preemption

2.1. Overview of Preemption

The preemption mechanism in Kubernetes is used to ensure that high-priority Pods can “preempt” the resources of lower-priority Pods when resources are insufficient, thereby ensuring the operation of critical tasks. The goal of preemption is to ensure that high-priority Pods can be scheduled to nodes in the cluster first.

2.2. How Preemption Works

Preemption occurs under the following conditions:

  • A high-priority Pod is waiting to be scheduled, and no node currently has enough free resources to satisfy its requirements.

  • The scheduler finds a node where removing one or more lower-priority Pods would free enough room for the high-priority Pod, and it removes those Pods.

2.2.1. Triggering Preemption

  1. Priority Setting: The priority of a Pod is controlled by PriorityClass, a cluster-scoped API resource that maps a name to an integer priority value. A Pod references it through its priorityClassName field. The higher the Pod's priority, the earlier it is taken from the scheduling queue, and the more Pods it is allowed to preempt.
  2. PreemptionPolicy: The preemption policy is defined by the preemptionPolicy field of the PriorityClass (the value is copied into the Pod spec). It has two possible values:
  • PreemptLowerPriority: The default setting, allowing high-priority Pods to preempt low-priority Pods.
  • Never: Preemption is prohibited. The Pod is still placed ahead of lower-priority Pods in the scheduling queue, but it is scheduled only when resources are available and will not preempt other Pods.
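
As an illustration, the snippet below builds a non-preempting PriorityClass and a Pod that uses it, then prints them as JSON (kubectl accepts JSON manifests as well as YAML). The API value that disables preemption is Never. The object names, the priority value, and the nginx image are placeholders chosen for this example.

```python
import json

# A PriorityClass that gives Pods a high queue priority without preempting others.
priority_class = {
    "apiVersion": "scheduling.k8s.io/v1",
    "kind": "PriorityClass",
    "metadata": {"name": "high-priority-nonpreempting"},  # hypothetical name
    "value": 1000000,                                      # example priority value
    "globalDefault": False,
    "preemptionPolicy": "Never",  # the default would be "PreemptLowerPriority"
    "description": "High queue priority, but never preempts running Pods.",
}

# A Pod opts in by referencing the PriorityClass by name.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "critical-app"},  # hypothetical name
    "spec": {
        "priorityClassName": priority_class["metadata"]["name"],
        "containers": [{"name": "app", "image": "nginx"}],
    },
}

for manifest in (priority_class, pod):
    print(json.dumps(manifest, indent=2))
```

Each printed document can be saved to its own file and applied with kubectl apply -f.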

2.2.2. Preemption Process

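  1. Resource Constraints: When no node has enough free resources (such as CPU or memory) for a pending Pod, the scheduler looks for a node where removing one or more lower-priority Pods would make the pending Pod fit.
  2. Removing Lower-Priority Pods: The scheduler records the chosen node in the pending Pod's status.nominatedNodeName field and deletes the selected lower-priority Pods, which are given their normal graceful termination period. These Pods are deleted, not moved: they do not re-enter the scheduling queue.
  3. Pod Rescheduling: Once the victims have terminated, the pending Pod can be scheduled, usually onto the nominated node. If the deleted Pods were managed by a controller such as a Deployment, the controller creates replacement Pods, which the scheduler places on other nodes provided there are sufficient resources.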
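
The victim selection in this process can be approximated with a short Python sketch. It is deliberately simplified: it looks at a single node and a single resource (CPU requests) and removes the lowest-priority Pods first, whereas the real scheduler also weighs PodDisruptionBudgets, compares candidate nodes, and tries to spare as many Pods as possible. RunningPod and pick_victims are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunningPod:
    name: str
    priority: int
    cpu_m: int  # CPU request in millicores


def pick_victims(pending_priority, pending_cpu_m, node_free_cpu_m, running):
    """Return names of Pods to remove so the pending Pod fits, or None if impossible.

    Only Pods with strictly lower priority are candidates; lowest priority goes first.
    An empty list means the Pod already fits and no preemption is needed.
    """
    candidates = sorted(
        (p for p in running if p.priority < pending_priority),
        key=lambda p: p.priority,
    )
    victims, free = [], node_free_cpu_m
    for pod in candidates:
        if free >= pending_cpu_m:
            break
        victims.append(pod.name)
        free += pod.cpu_m
    return victims if free >= pending_cpu_m else None


running = [
    RunningPod("batch-1", priority=100, cpu_m=500),
    RunningPod("web-1", priority=1000, cpu_m=500),
    RunningPod("batch-2", priority=50, cpu_m=300),
]
# A Pod with priority 1000 needs 700m CPU, but only 100m is free on the node.
print(pick_victims(1000, 700, 100, running))   # ['batch-2', 'batch-1']
# Even removing every lower-priority Pod would not free 2000m.
print(pick_victims(1000, 2000, 100, running))  # None
```

Note that web-1 is never a candidate, because it has the same priority as the pending Pod.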

2.3. Impact of Preemption on Scheduling

  • Guaranteeing High-Priority Tasks: Through the preemption mechanism, Kubernetes can ensure that critical tasks (such as high-priority applications) can obtain resources in a timely manner.
  • Difference Between Preemption and Eviction: Preemption is initiated by the scheduler, which removes lower-priority Pods from a node to make room for a pending higher-priority Pod. Eviction is triggered by the kubelet on a node that is under resource pressure (for example, low memory or disk space). Neither is a reaction to failed health checks.

3. Eviction

3.1. Overview of Eviction

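Eviction refers to Kubernetes forcibly terminating running Pods on a node, usually because the node is running short of resources (such as memory, disk space, or process IDs) or because a Pod exceeds its limit for local ephemeral storage. CPU is a compressible resource that is throttled rather than reclaimed, so CPU contention alone does not cause eviction. Exceeding a namespace ResourceQuota does not cause eviction either: quotas are enforced when objects are created.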

3.2. Conditions for Triggering Eviction

  • Resource Pressure: When an incompressible resource on a node (available memory, disk space, inodes, or process IDs) falls below a configured eviction threshold, the kubelet attempts to free up resources by evicting Pods. Under memory pressure, Pods whose memory usage exceeds their requests are evicted first.
  • Eviction Thresholds: Thresholds can be hard (the kubelet acts immediately, without a grace period) or soft (the kubelet acts only after the condition has lasted for a configured grace period). Among the candidate Pods, those exceeding their requests and those with lower priority are typically evicted first.
  • Failure of Pod Liveness or Readiness Probes: A failed health check does not cause eviction. When a liveness probe fails, the kubelet restarts the container in place on the same node. When a readiness probe fails, the Pod is removed from Service endpoints until it becomes ready again.
  • OOM (Out of Memory) Killing of Pods: When a node runs out of memory before the kubelet can react, or when a container exceeds its memory limit, the operating system's OOM Killer may kill container processes. This is not an eviction: the kubelet restarts the killed container according to the Pod's restartPolicy, and the Pod stays on its node.
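
Node-pressure eviction is driven by simple threshold comparisons, which the following Python sketch imitates. The signal names and threshold values mirror the kubelet's documented hard-eviction defaults on Linux, but they are configurable, so treat them as assumptions. The breached_signals function and the sample numbers are made up for illustration.

```python
# Hard eviction thresholds: signal -> (kind, limit).
# "abs_mi" is an absolute floor in MiB, "percent" is a percentage of capacity.
HARD_THRESHOLDS = {
    "memory.available": ("abs_mi", 100),
    "nodefs.available": ("percent", 10),
    "imagefs.available": ("percent", 15),
    "nodefs.inodesFree": ("percent", 5),
}


def breached_signals(available, capacity):
    """Return the signals whose available amount has dropped below its threshold.

    available and capacity map each signal name to a number in the same unit
    (MiB for memory and disk, a plain count for inodes).
    """
    breached = []
    for signal, (kind, limit) in HARD_THRESHOLDS.items():
        floor = limit if kind == "abs_mi" else capacity[signal] * limit / 100
        if available[signal] < floor:
            breached.append(signal)
    return breached


capacity = {
    "memory.available": 8192,
    "nodefs.available": 100_000,
    "imagefs.available": 100_000,
    "nodefs.inodesFree": 5_000_000,
}
available = {
    "memory.available": 80,          # below the 100 MiB floor
    "nodefs.available": 50_000,      # 50% free, fine
    "imagefs.available": 10_000,     # 10% free, below 15%
    "nodefs.inodesFree": 1_000_000,  # 20% free, fine
}
print(breached_signals(available, capacity))  # ['memory.available', 'imagefs.available']
```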

3.3. Eviction Policies

When selecting Pods for node-pressure eviction, the kubelet applies two main criteria:

  • Resource Usage: Pods whose usage of the scarce resource (such as memory) exceeds their requests are evicted before Pods that stay within their requests. Among otherwise equal Pods, those exceeding their requests by the largest amount go first.
  • Priority-Based Eviction: Among Pods that exceed their requests, those with lower priority are evicted first, so that higher-priority Pods keep their resources for as long as possible.

3.4. Eviction Process

1. Monitoring: The kubelet on each node periodically compares resource signals (such as available memory and disk space) with its eviction thresholds. When a threshold is crossed, it evaluates the resource usage of all Pods on the node.

2. Pod Selection: The kubelet ranks the Pods by whether their usage exceeds their requests, by Pod priority, and by how far their usage exceeds their requests.

3. Evicting Pods: The kubelet terminates the selected Pod itself, without involving the scheduler, and sets the Pod's phase to Failed with the reason Evicted. It repeats this until the node is no longer under pressure.
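
The selection step can be pictured as a sort. The Python sketch below orders Pods for eviction under memory pressure using the three documented ranking factors: whether usage exceeds requests, Pod priority, and how far usage exceeds requests. PodStats, eviction_order, and the sample numbers are hypothetical, and the kubelet's real implementation handles more cases than this.

```python
from dataclasses import dataclass


@dataclass
class PodStats:
    name: str
    priority: int
    mem_request_mi: int
    mem_usage_mi: int


def eviction_order(pods):
    """Return Pod names in the order they would be evicted (first name goes first)."""
    def rank(p):
        over = p.mem_usage_mi - p.mem_request_mi
        return (
            0 if over > 0 else 1,  # Pods exceeding their requests come first
            p.priority,            # then lower priority before higher priority
            -over,                 # then the largest excess over requests first
        )
    return [p.name for p in sorted(pods, key=rank)]


pods = [
    PodStats("cache", priority=0, mem_request_mi=256, mem_usage_mi=512),
    PodStats("batch", priority=0, mem_request_mi=128, mem_usage_mi=600),
    PodStats("api", priority=1000, mem_request_mi=512, mem_usage_mi=700),
    PodStats("db", priority=0, mem_request_mi=1024, mem_usage_mi=900),
]
print(eviction_order(pods))  # ['batch', 'cache', 'api', 'db']
```

Here db is evicted last even though it uses the most memory, because it stays within its request. This is one reason why setting realistic requests matters.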

3.5. Impact of Eviction

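  • Pod Rescheduling: An evicted Pod is not moved to another node. If it is managed by a controller (such as a Deployment or StatefulSet), the controller creates a replacement Pod, which the scheduler places on another node provided there are sufficient resources in the cluster. A standalone Pod without a controller is simply gone.
  • Pod Lifecycle: After being evicted, a Pod ends in the Failed phase with the reason Evicted. The replacement Pod starts in the Pending state, and the scheduler attempts to schedule it based on the resource status of the cluster.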

4. Interaction Between Scheduling, Preemption, and Eviction

There is a close relationship between scheduling, preemption, and eviction, which together determine the operation and resource allocation of Pods in the cluster. The scheduler is responsible for placing pending Pods, including preemption, based on Pod priority and resource requirements. The kubelet on each node is responsible for eviction under resource pressure, and controllers create replacement Pods that go through scheduling again.

4.1 Scheduling and Preemption

The scheduler considers preemption only when a pending Pod cannot be placed on any node (typically when resources are tight). In that case, a higher-priority Pod may preempt lower-priority Pods on a node so that it can fit there.

4.2 Eviction and Preemption

The preemption mechanism and eviction mechanism are somewhat similar, since both terminate running Pods. Eviction is triggered by the kubelet when a node is under resource pressure, while preemption is initiated by the scheduler to make room for a pending Pod.

Both mechanisms take Pod priority into account. Assigning suitable PriorityClasses to workloads therefore protects high-priority Pods in both situations: lower-priority Pods are preempted first, and they are generally evicted first as well.

4.3 Scheduling and Eviction

The scheduler does not move Pods that are already running. If a node runs short of resources, the kubelet evicts Pods from it, and the node reports a pressure condition so that the scheduler avoids placing new Pods there. Replacement Pods created by controllers are then scheduled onto nodes with sufficient resources.

Understanding how these three mechanisms work together helps to better manage Kubernetes clusters, optimize resource allocation, and ensure that high-priority applications can run normally in the cluster.
