Exercise
In this exercise, we will use a HorizontalPodAutoscaler resource to automatically increase or decrease the number of replicas of a Deployment based on CPU usage.
Creating a Deployment
Copy the following content into the deploy.yaml file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: w3
spec:
  selector:
    matchLabels:
      app: w3
  replicas: 1
  template:
    metadata:
      labels:
        app: w3
    spec:
      containers:
      - image: nginx:1.20-alpine
        name: w3
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
Then create this Deployment with the following command:
$ kubectl apply -f deploy.yaml
Creating a Service
Copy the following content into the svc.yaml file.
apiVersion: v1
kind: Service
metadata:
  name: w3
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: w3
Then create this Service with the following command:
$ kubectl apply -f svc.yaml
Installing the Metrics Server
The HorizontalPodAutoscaler resource uses an external component called metrics-server to collect Pod consumption metrics (CPU / memory). These metrics will then be used to automatically increase or decrease the number of Pods in the Deployment based on load.
First, check if the metrics-server is installed in your cluster:
kubectl get po -n kube-system -l k8s-app=metrics-server
If this command returns a Pod, you can proceed to the next section. If the command returns nothing, the metrics-server is not installed and you will need to set it up. There are two scenarios:
- If you’re using Minikube, enable the metrics-server add-on with the following command:
minikube addons enable metrics-server
- If you’re not using Minikube, deploy the metrics-server with the following command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Accessing Metrics
After a few dozen seconds, the metrics-server will start collecting metrics. You can verify this with the following command, which retrieves the CPU and memory consumption of the nodes:
$ kubectl top nodes
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
workers-3ha6f   50m          2%     628Mi           20%
workers-3ha6x   92m          4%     644Mi           20%
workers-3ha6y   52m          2%     739Mi           23%
Note: this example comes from a cluster with three nodes.
Creating the HorizontalPodAutoscaler Resource
We will now define a HorizontalPodAutoscaler that modifies the number of replicas of the Deployment whenever it uses more than 10% of its allocated CPU (10% is deliberately low for this example; in a real-world context this value would be higher).
Create a hpa.yaml with the following content:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: w3
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10
Then create this resource:
kubectl apply -f hpa.yaml
Verify that the HorizontalPodAutoscaler was created correctly:
kubectl get hpa
You will get a result similar to the following:
NAME   REFERENCE       TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa    Deployment/w3   0%/10%    1         10        0          9s
Note: it’s possible that for a few seconds the value in the TARGETS column will be “unknown”, until the metrics-server has reported the first CPU measurements.
Testing
To send a large number of requests to the w3 service, we will use Apache Bench.
With the following command, launch the ab Pod, whose role is to send 200,000 requests (-n 200000) with a concurrency of 100 (-c 100) to the w3 service from inside the cluster:
kubectl run ab -ti --rm --restart='Never' --image=lucj/ab -- -n 200000 -c 100 http://w3/
From another terminal, observe the evolution of the number of replicas (this may take a few minutes):
$ kubectl get -w hpa
NAME   REFERENCE       TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
hpa    Deployment/w3   182%/10%   1         10        4          6m57s
hpa    Deployment/w3   97%/10%    1         10        8          7m2s
hpa    Deployment/w3   12%/10%    1         10        10         7m17s
hpa    Deployment/w3   0%/10%     1         10        10         7m32s
...
Note: the -w (watch) option regularly updates the command output.
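The replica counts in the watch output follow the HPA scaling formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), capped at maxReplicas. A minimal sketch of that calculation in shell arithmetic, using the first line of the watch output above (4 replicas at 182% utilization against a 10% target):

```shell
# Values taken from the first line of the watch output above
current_replicas=4
current_utilization=182   # percent
target_utilization=10     # percent
max_replicas=10

# ceil(current_replicas * current_utilization / target_utilization)
# computed with integer arithmetic: (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))

# The result is capped at maxReplicas
if [ "$desired" -gt "$max_replicas" ]; then
  desired=$max_replicas
fi

echo "$desired"   # prints 10
```

The raw formula gives 73 replicas here, which is why the HPA jumps straight toward its maxReplicas ceiling of 10 during the load test.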
Once the load generation finishes, you will observe the number of replicas decrease. This phase takes longer than the scale-up: by default, the HPA waits 5 minutes before reducing the number of replicas.
Note: it’s possible to define how the number of replicas increases or decreases via the use of .spec.behavior.scaleUp and .spec.behavior.scaleDown properties in the HPA resource specification.
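As a hypothetical illustration (not needed for this exercise), a behavior section that limits the scale-up rate and shortens the scale-down stabilization window could look like this:

```yaml
spec:
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2            # add at most 2 Pods...
        periodSeconds: 60   # ...per minute
    scaleDown:
      stabilizationWindowSeconds: 60  # default is 300 (5 minutes)
```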
Cleanup
Delete the various resources created in this exercise:
kubectl delete -f deploy.yaml -f svc.yaml -f hpa.yaml