EC2 spot instances allow us to launch instances at discounted prices, up to 90% off on-demand in some cases. AWS offers spot instances to utilize spare EC2 capacity in an AZ. Spot instances are great for running short-lived workloads and apps that tolerate interruption (e.g. build jobs, queue processors, ETL jobs, dev/qa workloads).
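Tolerating interruption is workable because AWS publishes a notice through the instance metadata service roughly two minutes before reclaiming a spot instance. A minimal sketch of polling for that notice from the node itself (using IMDSv2 token-based access, which matches the httpTokens: required setting we'll see in the IG below) could look like this:

```shell
# Poll the EC2 instance metadata service (IMDSv2) for a spot interruption
# notice; AWS publishes it roughly 2 minutes before reclaiming the instance.
IMDS="http://169.254.169.254/latest"

check_spot_notice() {
  # Prints the notice JSON if one exists, otherwise returns non-zero.
  token=$(curl -sf -X PUT "$IMDS/api/token" \
      -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
      "$IMDS/meta-data/spot/instance-action"
}

notice_action() {
  # Extracts the "action" field (stop/terminate) from a notice payload.
  printf '%s' "$1" | sed -n 's/.*"action" *: *"\([^"]*\)".*/\1/p'
}
```

In a Kubernetes cluster you'd normally run an agent such as the AWS Node Termination Handler instead, which watches this endpoint and cordons/drains the node when a notice appears.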
In a kOps cluster, node instances are managed with InstanceGroup manifests. We can create one by simply copying an existing IG and filling in the spot-related fields according to the documentation:
$ kops get ig nodes-us-east-1a -o yaml > nodes-us-east-1a.yaml
Edit the resulting file as needed, for example:
--- nodes-us-east-1a.yaml 2022-02-24 21:36:07.935841773 -0500
+++ spot-nodes-us-east-1a.yaml 2022-02-24 21:38:43.075933574 -0500
@@ -1,11 +1,12 @@
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
- creationTimestamp: "2022-02-13T00:44:00Z"
+ creationTimestamp: null
labels:
kops.k8s.io/cluster: docks-www-tutorials.sherzod.com
- name: nodes-us-east-1a
+ name: spot-nodes-us-east-1a
spec:
+ autoscale: false
cloudLabels:
application: kops
image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20220131
@@ -13,12 +14,24 @@
httpPutResponseHopLimit: 1
httpTokens: required
machineType: t3a.medium
+ maxPrice: "0.03"
maxSize: 1
minSize: 1
+ mixedInstancesPolicy:
+ instances:
+ - t3.medium
+ - t3.large
+ - t3a.medium
+ - t3a.large
+ onDemandAboveBase: 0
+ onDemandBase: 0
+ spotInstancePools: 3
nodeLabels:
- kops.k8s.io/instancegroup: nodes-us-east-1a
+ kops.k8s.io/instancegroup: spot-nodes-us-east-1a
role: Node
rootVolumeSize: 64
rootVolumeType: gp3
subnets:
- us-east-1a-priv
+ taints:
+ - spot=true:PreferNoSchedule
The final file is here.
Let's go over the additions. autoscale: false tells the cluster autoscaler (which we enabled during installation) to skip this IG. maxPrice sets the price threshold for requesting capacity from AWS (you can see spot price history in the EC2 console). mixedInstancesPolicy allows us to define several instance types to choose from. taints will come in handy when scheduling workloads onto spot nodes (we want to keep regular workloads off them right off the bat).
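Before settling on maxPrice, it's worth checking recent spot prices for the instance types in the policy. A rough sketch: the describe-spot-price-history call needs AWS credentials, and the suggest_max_price helper is just illustrative arithmetic (30% headroom over the highest observed price), not anything kOps provides.

```shell
# List recent spot prices for the instance types in our mixedInstancesPolicy
# (requires AWS credentials).
recent_prices() {
  aws ec2 describe-spot-price-history \
      --instance-types t3.medium t3.large t3a.medium t3a.large \
      --product-descriptions "Linux/UNIX" \
      --query 'SpotPriceHistory[].SpotPrice' --output text
}

# Suggest a maxPrice with ~30% headroom over the highest observed price.
suggest_max_price() {
  printf '%s\n' "$@" | sort -g | tail -n 1 | awk '{printf "%.4f\n", $1 * 1.3}'
}
```

Something like suggest_max_price $(recent_prices) then gives a starting point; setting no maxPrice at all is also valid, in which case AWS caps you at the on-demand price.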
Once we are satisfied with our configuration, we can apply the manifest:
$ kops create -f spot-nodes-us-east-1a.yaml
Created instancegroup/spot-nodes-us-east-1a
To deploy these resources, run: kops update cluster --name docks-www-tutorials.sherzod.com --yes
$ kops update cluster --name docks-www-tutorials.sherzod.com --yes
...
...
Cluster changes have been applied to the cloud.
Assuming AWS fulfills our spot instance request, within a few minutes the spot node should appear in our cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-2-0-54.ec2.internal Ready control-plane,master 12d v1.22.6
ip-10-2-2-101.ec2.internal Ready node,spot-worker 22m v1.22.6 <---
ip-10-2-2-42.ec2.internal Ready node 12d v1.22.6
Running workloads on spot instances
Let's put a workload onto a spot node and combine it with one of kOps' out-of-the-box features: IAM Roles for Service Accounts (IRSA). As I mentioned, I am going to dedicate a thorough post to it, but for now let's assume we are familiar with it. During installation we declared the aws-reader service account within spec.iam.serviceAccountExternalPermissions, and now we are going to use it. In short, kOps creates a matching IAM role, and we will need that role's ARN.
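The generated role name follows kOps' <serviceaccount>.<namespace>.sa.<cluster> pattern (you can see it inside the ARN in the manifest below), so one way to look the ARN up is to compose the name and query IAM. A sketch, assuming working AWS credentials for the get-role call:

```shell
# Compose the IRSA role name kOps generates for a service account.
irsa_role_name() {
  # $1: service account, $2: namespace, $3: cluster name
  printf '%s.%s.sa.%s\n' "$1" "$2" "$3"
}

# Fetch the role ARN (requires AWS credentials).
if command -v aws >/dev/null; then
  aws iam get-role \
      --role-name "$(irsa_role_name aws-reader default docks-www-tutorials.sherzod.com)" \
      --query Role.Arn --output text \
    || echo "get-role failed (missing credentials?)" >&2
fi
```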
# spot-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: awscli
labels:
cloud: aws
spec:
backoffLimit: 1
template:
spec:
containers:
- name: awscli
image: amazon/aws-cli
command: ["aws", "ec2", "describe-instances"]
env:
- name: AWS_DEFAULT_REGION
value: us-east-1
- name: AWS_REGION
value: us-east-1
- name: AWS_ROLE_ARN
value: "arn:aws:iam::205656158400:role/aws-reader.default.sa.docks-www-tutorials.sherzod.com"
- name: AWS_WEB_IDENTITY_TOKEN_FILE
value: "/var/run/secrets/amazonaws.com/serviceaccount/token"
- name: AWS_STS_REGIONAL_ENDPOINTS
value: "regional"
volumeMounts:
- mountPath: "/var/run/secrets/amazonaws.com/serviceaccount/"
name: aws-token
restartPolicy: Never
serviceAccountName: aws-reader
volumes:
- name: aws-token
projected:
sources:
- serviceAccountToken:
audience: "amazonaws.com"
expirationSeconds: 86400
path: token
nodeSelector:
node-role.kubernetes.io/spot-worker: "true"
tolerations:
- key: "spot"
operator: "Equal"
value: "true"
effect: "PreferNoSchedule"
We had to put an extra bit of information into the manifest, but it's mostly to make IRSA work (see this post by one of the kOps maintainers). There are solutions out there that mutate the pod and inject those variables, but they are out of scope for now.
Notice the nodeSelector and tolerations attributes that steer the workload towards the spot instance. Running the manifest will produce the output of the aws ec2 describe-instances command. Once you apply the manifest and check the logs, you should see a familiar result (assuming you have been running aws ec2 describe-instances all your life :))
$ kubectl apply -f spot-job.yaml
job.batch/awscli created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
awscli--1-sx294 0/1 Completed 0 36s
$ kubectl logs pod/awscli--1-sx294 | head
{
"Reservations": [
{
"Groups": [],
"Instances": [
{
"AmiLaunchIndex": 0,
"ImageId": "ami-01b996646377b6619",
"InstanceId": "i-0f60693563749ac84",
"InstanceType": "t3.medium",
...
Definitely take advantage of spot instances in your k8s clusters, especially if you are working at scale. They are ideal for running development workloads and shared services or applications, paired with fallback mechanisms that handle interruptions. Combined with the cluster autoscaler and Horizontal Pod Autoscaling (HPA), one can build an elastic fleet in no time without committing to Reserved Instances (RIs) or Savings Plans.