Scheduling

Manual Scheduling

To manually schedule a pod at creation time, set nodeName in the spec:

apiVersion: v1
kind: Pod
metadata:
 name: nginx
 labels:
  name: nginx
spec:
 containers:
 - name: nginx
   image: nginx
   ports:
   - containerPort: 8080
 nodeName: node02

Alternatively, for a pod that already exists, create a Binding object:

apiVersion: v1
kind: Binding
metadata:
  name: nginx
target:
  apiVersion: v1
  kind: Node
  name: node02
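The Binding object has to be sent as a POST request (with a JSON body) to the pod's binding API. A sketch, assuming `kubectl proxy` is running locally on port 8001:

```shell
# Hypothetical setup: `kubectl proxy` exposes the API server on localhost:8001
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"apiVersion":"v1","kind":"Binding","metadata":{"name":"nginx"},"target":{"apiVersion":"v1","kind":"Node","name":"node02"}}' \
  http://localhost:8001/api/v1/namespaces/default/pods/nginx/binding
```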

Labels and Selectors

Filter via selectors

Labels in metadata

Can use:

kubectl get pods --selector app=nginx
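Selectors can combine multiple labels (matched with AND), and labels can be displayed alongside resources. The label keys below are examples:

```shell
# Pods matching both labels (example keys/values)
kubectl get pods --selector env=prod,tier=frontend

# Show all labels on listed pods
kubectl get pods --show-labels
```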

Taints and Tolerations

  • Taint: tells pods "don't schedule here"

    • We taint nodes

  • Toleration: "you can schedule here even with the taint"

    • Pods tolerate a taint key=value

kubectl taint nodes <node-name> key=value:taint-effect
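For example, to taint a node and later remove that taint (a trailing `-` removes it):

```shell
kubectl taint nodes node01 app=blue:NoSchedule
kubectl taint nodes node01 app=blue:NoSchedule-   # remove the taint
```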

The taint effect defines what happens to pods that do not tolerate the taint.

  • NoSchedule

  • PreferNoSchedule: Best effort

  • NoExecute: Also applies to pods already running on the node

    • Once the taint takes effect, the node evicts existing pods unless they tolerate the taint

apiVersion: v1
kind: Pod
metadata:
 name: myapp-pod
spec:
 containers:
 - name: nginx-container
   image: nginx
 tolerations:
 - key: "app"
   operator: "Equal"
   value: "blue"
   effect: "NoSchedule"

Control-plane (master) nodes have a NoSchedule taint by default.
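To inspect a node's taints (node name is an example):

```shell
kubectl describe node controlplane | grep -i taint
```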

Node Selectors

We can add nodeSelectors to a pod, which will help with scheduling:

apiVersion: v1
kind: Pod
metadata:
 name: myapp-pod
spec:
 containers:
 - name: data-processor
   image: data-processor
 nodeSelector:
  size: Large

To label nodes:

kubectl label nodes <node-name> <label-key>=<label-value>
kubectl label nodes node-1 size=Large

Node Affinity

apiVersion: v1
kind: Pod
metadata:
 name: myapp-pod
spec:
 containers:
 - name: data-processor
   image: data-processor
 affinity:
   nodeAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: In
            values:
            - Large
            - Medium

Other options:

   nodeAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: NotIn
            values:
            - Small
   nodeAffinity:
     requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size
            operator: Exists

Available node affinity types:

  • requiredDuringSchedulingIgnoredDuringExecution

  • preferredDuringSchedulingIgnoredDuringExecution
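The preferred variant takes weighted preference terms instead of nodeSelectorTerms; a sketch reusing the size label from above:

```yaml
 affinity:
   nodeAffinity:
     preferredDuringSchedulingIgnoredDuringExecution:
     - weight: 1          # 1-100; higher weight wins when scoring nodes
       preference:
         matchExpressions:
         - key: size
           operator: In
           values:
           - Large
```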

Resource Requirements

  • Can specify requirements with resources.requests

  • Can specify limits with resources.limits

apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp-color
  labels:
    name: simple-webapp-color
spec:
 containers:
 - name: simple-webapp-color
   image: simple-webapp-color
   ports:
    - containerPort: 8080
   resources:
     requests:
      memory: "1Gi"
      cpu: "1"
     limits:
       memory: "2Gi"
       cpu: "2"

The default is no requests and no limits.

  • If no request is set but a limit is, request = limit

  • Should at least set requests, to avoid pods starving each other of resources

If a pod exceeds its memory limit at runtime, it is OOM-killed.

We can set defaults for a namespace with LimitRange:
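A sketch of a LimitRange applying default requests and limits to containers in its namespace (names and values are examples):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-defaults
spec:
  limits:
  - type: Container
    default:            # default limits
      cpu: 500m
      memory: 512Mi
    defaultRequest:     # default requests
      cpu: 250m
      memory: 256Mi
```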

We can also set ResourceQuota request and limit for a namespace.
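A sketch of a ResourceQuota capping the total requests and limits across a namespace (values are examples):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "10"
    limits.memory: 10Gi
```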

You can't adjust limits on a running pod without deleting it; on a deployment you can, and the deployment will re-create its pods.

DaemonSets

Run one copy of pod on every node in cluster.

Manifest is very similar to a ReplicaSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-daemon
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: monitoring-agent
  template:
    metadata:
     labels:
       app: monitoring-agent
    spec:
      containers:
      - name: monitoring-agent
        image: monitoring-agent

Under the hood, DaemonSets use node affinity to pin one pod to each node.

Static Pods

The kubelet can read pod manifests from /etc/kubernetes/manifests instead of getting them from the kube-apiserver.

We can only create pods this way, not higher-level objects like Deployments.

Check the kubelet's --pod-manifest-path flag, or its --config flag pointing to a kubelet config file that sets staticPodPath.
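In the kubelet config file the setting looks like this (the file path is the common kubeadm default, but may differ):

```yaml
# e.g. /var/lib/kubelet/config.yaml
staticPodPath: /etc/kubernetes/manifests
```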

We can view these by listing containers:

  • crictl ps

  • nerdctl ps

  • docker ps

The cluster is aware of static pods (the API server shows read-only mirror pods), but we can only edit them through the manifest files.

kubeadm sets up the control-plane components this way.

Multiple Schedulers

We can add custom schedulers.

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler

If running the scheduler as a standalone process (e.g. a systemd service), point it at the YAML config with --config; the schedulerName in that config identifies it.

If the scheduler runs in the cluster, simply deploy it as a normal pod or deployment.
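A minimal sketch of running a second scheduler as a pod, assuming its config file is available in the container at the path shown (image tag and paths are examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.29.0   # example version
    command:
    - kube-scheduler
    - --config=/etc/kubernetes/my-scheduler-config.yaml
```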

Configure Multiple Schedulers

On pod creation, direct pod to use custom scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
  schedulerName: my-custom-scheduler
To verify which scheduler picked up the pod, check events and the scheduler's logs:

kubectl get events -o wide
kubectl logs my-custom-scheduler -n kube-system

Scheduler Profiles

Scheduling has various stages, each can have associated plugins:

  • Scheduling queue

  • Filtering

  • Scoring

  • Binding

To customize the plugins used at each stage, the scheduler exposes extension points.

We can set multiple profiles for one scheduler binary:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-scheduler
    plugins:
      score:
        disabled: []
        enabled: []
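For example, a profile can disable a default scoring plugin and enable a custom one; the custom plugin name below is illustrative:

```yaml
profiles:
  - schedulerName: my-scheduler-2
    plugins:
      score:
        disabled:
        - name: TaintToleration
        enabled:
        - name: MyCustomPluginA   # hypothetical custom plugin
```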
