Kubernetes - Scheduling
Introduction
- Manual Scheduling
- Labels & Selectors
- Resource Limits
- Daemon Sets
- Multiple Schedulers
- Scheduler Events
- Configure Kubernetes Scheduler
Manual Scheduling
- Manual scheduling in Kubernetes involves directly assigning a Pod to a Node, without the use of a scheduler, by specifying the `nodeName` field in the Pod's spec (see the sketch after this list).
- Manual scheduling is not recommended because:
- It does not take into account the resource requirements of the Pod or the capacity of the Node.
- It does not provide any failover capabilities.
- It does not scale well.
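A minimal sketch of a manually scheduled Pod, assuming a node named `node01` exists in the cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  # nodeName bypasses the scheduler entirely; the kubelet on node01
  # runs this Pod regardless of available capacity on that node.
  nodeName: node01
  containers:
    - name: nginx
      image: nginx
```

Note that `nodeName` can only be set at creation time; to move a running Pod to another node, the Pod must be recreated.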
Labels & Selectors
- Labels are key-value pairs used to tag objects; selectors filter objects based on those labels and other criteria.
- Annotations: used to record additional, non-identifying details about objects for informational purposes.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-container
          image: nginx
```
- `spec.selector.matchLabels`: specifies the labels the Deployment uses to find which Pods to manage. In this case, it manages Pods that have the label `app: my-app`.
- `spec.template.metadata.labels`: specifies the labels applied to the Pods created by this Deployment. In this case, each Pod will carry the label `app: my-app`. The same labels can be used for filtering from the command line, as shown below.
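Selectors can also be applied directly from the command line; the label below matches the Deployment above:

```sh
kubectl get pods --selector app=my-app
kubectl get pods -l 'app in (my-app)'   # equivalent set-based selector syntax
```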
Taints And Tolerations
- Taints and Tolerations work together to ensure that Pods are not scheduled onto inappropriate Nodes.
- Taints are applied to Nodes and represent conditions that a Pod should avoid.
- Tolerations are applied to Pods and allow (but do not require) the Pods to schedule onto Nodes with matching Taints.
- Taints and Tolerations do not guarantee that tolerating Pods will always be placed on tainted nodes; they only permit it.
- By default, a taint is set on the control-plane (master) node to prevent user Pods from being scheduled on it.
To taint a node:
```sh
kubectl taint nodes node-name key=value:taint-effect
kubectl taint nodes node1 app=blue:NoSchedule
```
To remove taint from a node:
```sh
kubectl taint nodes controlplane node-role.kubernetes.io/control-plane:NoSchedule-
```
- There are three taint effects:
  - `NoSchedule`: Pods will not be scheduled on the node.
  - `PreferNoSchedule`: the system will try to avoid placing a Pod on the node, but this is not guaranteed.
  - `NoExecute`: new Pods will not be scheduled on the node, and existing Pods will be evicted if they do not tolerate the taint.
To add tolerations to a pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: nginx-container
      image: nginx
  tolerations:
    - key: "app"
      operator: "Equal"
      value: "blue"
      effect: "NoSchedule"
```
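To verify which taints are present on a node (node name as in the earlier example):

```sh
kubectl describe node node1 | grep -i taints
```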
Node Selectors
Nodes can be labelled with a specific key-value pair, and the Pod definition can then be updated to target nodes carrying that label, as shown below:
- To label the node:
```sh
kubectl label nodes <node-name> <key>=<value>
kubectl label nodes node1 size=Large
```
- Sample YAML:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor
  nodeSelector:
    size: Large
```
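To confirm the label was applied and check where the Pod landed:

```sh
kubectl get nodes --show-labels
kubectl get pods -o wide
```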
Node Affinity
- Node Affinity in Kubernetes is a set of rules used by the scheduler to determine where a Pod can be placed.
- It allows you to set conditions on which nodes are eligible to be scheduled based on the labels on the nodes.
- There are two types of Node Affinity:
  - Required (`requiredDuringSchedulingIgnoredDuringExecution`):
    - The scheduler will not place the Pod unless the condition is met.
    - This is similar to `nodeSelector`, but allows for more complex expressions.
  - Preferred (`preferredDuringSchedulingIgnoredDuringExecution`):
    - The scheduler will try to place the Pod according to the condition, but it is not guaranteed.
- Sample YAML:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: data-processor
      image: data-processor
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: size
                operator: In
                values:
                  - Large
```
- Taints and Tolerations combined with Node Affinity can be used to achieve complex placement requirements: taints keep other Pods off dedicated nodes, while affinity keeps the dedicated Pods on them (see the sketch below).
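A sketch of that combination, assuming a node labelled and tainted for a hypothetical `blue` team (all names are illustrative):

```yaml
# Assumed node setup:
#   kubectl label nodes node1 team=blue
#   kubectl taint nodes node1 team=blue:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: blue-pod
spec:
  containers:
    - name: app
      image: nginx
  # The toleration lets this Pod onto the tainted node...
  tolerations:
    - key: "team"
      operator: "Equal"
      value: "blue"
      effect: "NoSchedule"
  # ...and the affinity rule ensures it lands only on the labelled node.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: team
                operator: In
                values:
                  - blue
```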
Resource Limits
- Resource limits in Kubernetes are used to control the amount of compute resources that a Pod or Container can use.
- There are two types of resource settings:
  - Requests: the amount of a resource that the system guarantees to the container or Pod. Kubernetes uses this value to decide which node to place the Pod on.
  - Limits: the maximum amount of a resource that a container can use. If a container exceeds this limit, it might be throttled or terminated, depending on the specific resource and the container's `restartPolicy`.
- Resource limits can be set for both CPU and memory:
  - CPU is specified in CPU units, with 1 CPU equivalent to 1 AWS vCPU, 1 GCP Core, 1 Azure Core, or 1 Hyperthread on a bare-metal Intel processor.
  - Memory is specified in bytes, and can be expressed as a fixed-point integer or with one of these suffixes: E, P, T, G, M, k, Ei, Pi, Ti, Gi, Mi, Ki.
- Default limits for a container, when defaults are configured via a `LimitRange` in the namespace, are commonly 1 vCPU and 512Mi; without a `LimitRange`, Kubernetes applies no defaults.
- A container cannot use more CPU than its specified limit; excess CPU demand is throttled.
- A container can temporarily use more memory than its specified limit; however, it is terminated (OOMKilled) if it consistently exceeds the limit.
- Sample YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp-container
      image: myapp
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
```
- The default and maximum limits and requests can be set at the namespace level using a `LimitRange`:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: "1"
      defaultRequest:
        memory: 256Mi
        cpu: "0.5"
      type: Container
```
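To apply the LimitRange and inspect the effective defaults (the namespace `dev` and the file name are assumptions):

```sh
kubectl apply -f limit-range.yaml -n dev
kubectl describe limitrange resource-limits -n dev
```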
Daemon Sets
- DaemonSets in Kubernetes are used to ensure that some or all nodes run a copy of a Pod.
- This is particularly useful for deploying system-wide tasks such as log collection, monitoring, or network proxies.
- When a new node is added to the cluster, the Pod is added to it.
- When a node is removed from the cluster, the Pod is garbage collected.
- Sample YAML:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      name: my-daemonset
  template:
    metadata:
      labels:
        name: my-daemonset
    spec:
      containers:
        - name: my-daemonset-container
          image: my-daemonset-image
```
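To verify that the DaemonSet runs one Pod per node:

```sh
kubectl get daemonsets -A
kubectl get pods -o wide --selector name=my-daemonset
```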
Static Pods
- Static Pods are Pods created directly by the kubelet from definition files placed in a designated folder on the node, without the intervention of the control plane (if one exists).
- A single node with the kubelet installed can run Pods from definition files placed in `/etc/kubernetes/manifests`.
- To get the path to the kubelet configuration file:

```sh
ps -ef | grep kubelet | grep "\--config"
```
- To get the location of the static Pod folder:

```sh
grep -i static /var/lib/kubelet/config.yaml
```
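Once the path is known, any manifest dropped into that folder is picked up by the kubelet. For example, assuming the default path (requires write access to the manifests folder):

```sh
kubectl run nginx-static --image=nginx --dry-run=client -o yaml > /etc/kubernetes/manifests/nginx-static.yaml
```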
| Static Pods | DaemonSets |
|---|---|
| Created by the kubelet | Created by the kube-apiserver (DaemonSet controller) |
| Used to deploy control-plane components as static Pods | Used to deploy monitoring agents, logging agents, etc. on nodes |
| Ignored by the kube-scheduler | Ignored by the kube-scheduler |
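Static Pods appear in `kubectl get pods` with the node name appended, and their `ownerReferences` point at the Node object rather than a controller. A quick check, assuming a kubeadm cluster with a node named `controlplane`:

```sh
kubectl get pods -n kube-system
# kube-apiserver-controlplane is a static Pod mirrored from node "controlplane"
kubectl get pod kube-apiserver-controlplane -n kube-system -o yaml | grep -A3 ownerReferences
```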
Multiple Schedulers
We can run multiple schedulers in a Kubernetes cluster and specify which one to use for a Pod via the `schedulerName` field.
Assigning a particular scheduler to a Pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: annotation-default-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: default-scheduler
  containers:
    - name: pod-with-default-annotation-container
      image: registry.k8s.io/pause:2.0
```
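To confirm which scheduler actually placed a Pod, check its `Scheduled` event; the event source names the scheduler that bound the Pod:

```sh
kubectl get events -o wide
kubectl describe pod annotation-default-scheduler | grep -A5 Events
```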
Scheduler Profiles
- Scheduler profiles in Kubernetes allow you to configure the behavior of the scheduler.
- Each profile runs under its own scheduler name within a single scheduler process, allowing multiple configurations of the scheduler to run simultaneously.
- A profile contains the following fields:
  - `schedulerName`: the name of the scheduler associated with this profile. This is the name referenced when assigning a scheduler to a Pod, as shown above.
  - `plugins`: enables, disables, or reorders the scheduling plugins used in the scheduling process. Plugins attach to extension points such as PreFilter, Filter, PostFilter, Score, Reserve, Permit, PreBind, and PostBind.
  - `pluginConfig`: configures the behavior of individual plugins.
- Sample YAML:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: NodeResourcesLeastAllocated
        enabled:
          - name: NodeResourcesMostAllocated
```
Scheduler Events
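To troubleshoot scheduling decisions, inspect cluster events and the scheduler's logs. The Pod name below follows the kubeadm convention `kube-scheduler-<node-name>` and is an assumption:

```sh
kubectl get events -o wide
kubectl logs kube-scheduler-controlplane -n kube-system
```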
Configure Kubernetes Scheduler
Ref: https://jvns.ca/blog/2017/07/27/how-does-the-kubernetes-scheduler-work/