Labels and Taints
To make scheduling more efficient and compatible with Kubernetes, Ocean supports the following Kubernetes constraint mechanisms for scheduling pods:
- Node Selector: Constrains pods to nodes with particular labels.
- Node Affinity: Constrains the nodes a pod is eligible to be scheduled on, based on node labels. Spot supports hard and soft affinity (requiredDuringSchedulingIgnoredDuringExecution, preferredDuringSchedulingIgnoredDuringExecution).
- Pod Affinity and Pod Anti-Affinity: Schedules a pod based on whether other pods are running on a node.
- Pod Port Restrictions: Validates that each pod will have the required ports available on the machine.
- Well-Known Labels.
Spot Labels
Spot labels allow you to adjust Ocean's default scaling behavior. Add them to your pods to control the node termination process or lifecycle.
spotinst.io/azure-premium-storage
The AKS scheduler does not guarantee that pods requiring premium storage will be scheduled on nodes that support premium storage disks. Ocean injects the spotinst.io/azure-premium-storage label into every node in a node pool that supports premium storage. We recommend using spotinst.io/azure-premium-storage on pods that require premium storage disks, so those pods are provisioned on the most appropriate nodes for their workloads. For more information, see Azure premium storage.
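For example, a pod that requires premium storage disks could express node affinity on this label. The sketch below is illustrative: the pod name and image are placeholders, and it matches on the label's presence (Exists) rather than assuming a specific label value.

apiVersion: v1
kind: Pod
metadata:
  name: premium-storage-workload   # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Schedule only on nodes carrying the premium-storage label.
          - key: spotinst.io/azure-premium-storage
            operator: Exists
  containers:
  - name: app
    image: registry.k8s.io/pause:2.0   # placeholder image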
spotinst.io/restrict-scale-down
Some workloads are not as resilient to spot instance replacements as others, so you may want to minimize how often the nodes they run on are replaced, while still benefiting from spot instance pricing. For these workloads, set the spotinst.io/restrict-scale-down label to true to block the proactive scale-down of the instance for the purposes of more efficient bin packing. This leaves the instance running as long as possible: the instance is replaced only if it goes into an unhealthy state or if forced by a cloud provider interruption. A full deployment example appears under Examples below.
spotinst.io/node-lifecycle
Ocean uses the spotinst.io/node-lifecycle label key to indicate a node's lifecycle. It is applied to all Ocean-managed nodes and has a value of od (on-demand).
This label is useful for workloads that are not resilient to spot instance interruptions and must run on on-demand instances at all times. By applying node affinity to the spotinst.io/node-lifecycle label with the value od, you can ensure that these workloads are scheduled only on on-demand instances.
spotinst.io/node-lifecycle:spot affinity is not supported. Unless spotinst.io/node-lifecycle:od affinity is applied, Ocean continues to try to provide excess compute capacity (spot instances) for all workloads in the cluster. The Examples section below shows both a nodeSelector and a nodeAffinity version.
spotinst.io/gpu-type
This label helps create direct affinity to specific types of GPU hardware, freeing the user from the need to explicitly set and manage a list of VMs that contain the required hardware. Ocean automatically matches the relevant VMs (currently with AWS and GCP) for workloads having affinity rules using this label. Valid label values are:
- nvidia-tesla-v100
- nvidia-tesla-p100
- nvidia-tesla-k80
- nvidia-tesla-p4
- nvidia-tesla-t4
- nvidia-tesla-a100 (only for AWS)
- nvidia-tesla-m60
- amd-radeon-v520
- nvidia-tesla-t4g
- nvidia-tesla-a10
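For example, a workload that needs a T4 or V100 GPU could express affinity with this label instead of listing VM sizes. The manifest below is a sketch: the pod name, image, and GPU resource request are illustrative.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload   # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Ocean matches VMs carrying one of these GPU types.
          - key: spotinst.io/gpu-type
            operator: In
            values:
            - nvidia-tesla-t4
            - nvidia-tesla-v100
  containers:
  - name: cuda-app
    image: registry.k8s.io/pause:2.0   # placeholder; a real workload would use a GPU-enabled image
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU (requires the NVIDIA device plugin)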
Don't add Spot labels under the virtual node group (launch specification) node labels section. Add these labels to the pod configuration only.
Instance Types Labels
Format: aws.spot.io/instance-<object>, for example, aws.spot.io/instance-category.
Apply these labels in a workload's constraints (nodeSelector, node affinity, and so on) to reflect instance type properties. For example, constrain a workload to run on any instance from the M6, M7, or R7 families without manually listing every instance type in each family.
The instance labels are as follows:
- aws.spot.io/instance-category: Reflects the category of the instance (for example, c).
- aws.spot.io/instance-family: Reflects the family of the instance (for example, c5a).
- aws.spot.io/instance-generation: Reflects the generation of the instance (for example, 5).
- aws.spot.io/instance-hypervisor: Reflects the hypervisor the instance uses (for example, nitro).
- aws.spot.io/instance-cpu: Reflects the number of vCPUs of the instance (for example, 2).
- aws.spot.io/instance-memory: Reflects the instance's memory (for example, 4096 for a 4-GiB instance).
When you use these labels in pod constraints, Ocean launches only nodes whose instance types match the required labels.
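As a sketch of the M6/M7/R7 example above: the two nodeSelectorTerms below are ORed, so the pod can land on M6, M7, or R7 instances but not on, say, R6. The pod name and image are illustrative.

apiVersion: v1
kind: Pod
metadata:
  name: instance-constrained   # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # Term 1: any M6 or M7 instance (m category, generation 6 or 7).
        - matchExpressions:
          - key: aws.spot.io/instance-category
            operator: In
            values: ["m"]
          - key: aws.spot.io/instance-generation
            operator: In
            values: ["6", "7"]
        # Term 2: any R7 instance (r category, generation 7).
        - matchExpressions:
          - key: aws.spot.io/instance-category
            operator: In
            values: ["r"]
          - key: aws.spot.io/instance-generation
            operator: In
            values: ["7"]
  containers:
  - name: app
    image: registry.k8s.io/pause:2.0   # placeholder image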
Examples
Using the restrict-scale-down label:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        spotinst.io/restrict-scale-down: "true"
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "2Gi"
            cpu: "2"
          limits:
            memory: "4Gi"
            cpu: "4"
Using od nodeSelector:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-selector
spec:
  containers:
  - name: with-node-selector
    image: registry.k8s.io/pause:2.0
    imagePullPolicy: IfNotPresent
  nodeSelector:
    spotinst.io/node-lifecycle: od
Using od nodeAffinity:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: spotinst.io/node-lifecycle
            operator: In
            values:
            - od
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:2.0
Startup Taints
Cloud service provider relevance: AWS Kubernetes
Startup taints are temporary taints applied to a node during its initialization phase. During this phase, the autoscaler does not scale up nodes for additional pending pods that match this node, because it already knows the startup taint will soon be removed. Once the taint is removed, pods that do not tolerate it can be scheduled on the node without additional nodes being launched.
When to Use Startup Taints
You may want to deploy a specific pod to a node before deploying other pods to the same node. When that pod is ready or has completed a defined procedure, such as setting up networking, other pods are allowed to schedule on the node.
Example: Cilium
Cilium recommends applying a taint such as node.cilium.io/agent-not-ready=true:NoExecute to prevent other pods from starting before Cilium has finished configuring the necessary networking on the node. Only the pod used for initialization has a toleration for this taint. Once the node is ready, the application running in that pod removes the taint from the node.
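As a minimal sketch, the initialization pod's toleration would look like the following. The pod name and image are placeholders; the real Cilium agent is deployed as a DaemonSet with chart-managed tolerations.

apiVersion: v1
kind: Pod
metadata:
  name: network-init   # illustrative name
spec:
  tolerations:
  # Exclusive toleration for the startup taint; pods without it
  # stay pending until the taint is removed from the node.
  - key: node.cilium.io/agent-not-ready
    operator: Equal
    value: "true"
    effect: NoExecute
  containers:
  - name: init-agent
    image: registry.k8s.io/pause:2.0   # placeholder image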
If the startupTaint attribute has not been removed from a specific node by the end of the cluster's grace period, a new node is launched for any pending pods. The grace period starts when a node is created; its default is 5 minutes, and you can configure it in the cluster under cluster.strategy.gracePeriod.
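A minimal sketch of that setting, assuming the value is expressed in seconds (confirm the unit in the Ocean API reference):

cluster:
  strategy:
    gracePeriod: 300   # assumption: seconds; 300 corresponds to the 5-minute default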
Configure Startup Taints in the Spot API
AWS Kubernetes only
Prerequisite: Ocean controller version at least v2.0.68
Configure Ocean to consider your startup taints using the startupTaints attribute at the Ocean cluster and virtual node group levels:
- Cluster: under cluster.compute.launchSpecification
- Virtual node group: under launchSpec
You must also set the startupTaint as a regular taint in the userData for the cluster or virtual node group, because Ocean does not add or remove configured startup taints.
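The sketch below shows what the cluster-level configuration might look like, reusing the Cilium taint from the example above. The key, value, and effect fields are an assumption modeled on the Kubernetes taint schema; confirm the exact field names in the Ocean API reference.

cluster:
  compute:
    launchSpecification:
      startupTaints:
      # Assumed taint structure (key/value/effect), mirroring Kubernetes taints.
      - key: node.cilium.io/agent-not-ready
        value: "true"
        effect: NoExecute

The same taint must also be registered on the node itself through userData, for example via the kubelet's --register-with-taints flag; how you register it depends on your bootstrap script.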