Exercise

In this exercise, we will use a HorizontalPodAutoscaler resource to automatically increase or decrease the number of replicas of a Deployment based on CPU usage.

Creating a Deployment

Copy the following content into the deploy.yaml file.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: w3
spec:
  selector:
    matchLabels:
      app: w3
  replicas: 1
  template:
    metadata:
      labels:
        app: w3
    spec:
      containers:
        - image: nginx:1.20-alpine
          name: w3
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m

Then create this Deployment with the following command:

$ kubectl apply -f deploy.yaml
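Before moving on, you can check that the Deployment has rolled out correctly (assuming kubectl is configured against your cluster):

```shell
# Wait until the single replica is up and running
kubectl rollout status deployment/w3
# Confirm the Pod carries the app=w3 label the Service will select on
kubectl get po -l app=w3
```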

Creating a Service

Copy the following content into the svc.yaml file.

apiVersion: v1
kind: Service
metadata:
  name: w3
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: w3

Then create this Service with the following command:

$ kubectl apply -f svc.yaml
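To confirm the Service is actually backed by the Deployment's Pod, you can list its endpoints (an empty ENDPOINTS column would indicate a selector mismatch):

```shell
# The ENDPOINTS column should show the IP of the w3 Pod on port 80
kubectl get endpoints w3
```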

Installing the Metrics Server

The HorizontalPodAutoscaler resource uses an external component called metrics-server to collect Pod consumption metrics (CPU / memory). These metrics will then be used to automatically increase or decrease the number of Pods in the Deployment based on load.

First, check if the metrics-server is installed in your cluster:

kubectl get po -n kube-system -l k8s-app=metrics-server

If this command returns a Pod, you can proceed to the next section. If it returns nothing, the metrics-server is not installed and you will need to set it up. There are two scenarios:

  • If you’re using Minikube, the metrics-server can be enabled with a simple addon:

minikube addons enable metrics-server

  • If you’re not using Minikube, deploy the metrics-server with the following command:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
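Whichever method you used, you can then check that the metrics API has been registered in the cluster, as a sanity check:

```shell
# The metrics-server serves the metrics.k8s.io API group;
# the AVAILABLE column should report True once it is ready
kubectl get apiservice v1beta1.metrics.k8s.io
```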

Accessing Metrics

After a few dozen seconds, the metrics-server will begin collecting metrics. You can verify this with the following command, which retrieves the CPU and memory consumption of the nodes:

$ kubectl top nodes
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
workers-3ha6f   50m          2%     628Mi           20%
workers-3ha6x   92m          4%     644Mi           20%
workers-3ha6y   52m          2%     739Mi           23%

Note: this example is from a cluster with 3 nodes
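The same command also works at the Pod level; for example, to see the (currently idle) consumption of the w3 Pod:

```shell
# CPU / memory usage of the Pods selected by the app=w3 label
kubectl top po -l app=w3
```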

Creating the HorizontalPodAutoscaler Resource

We will now define a HorizontalPodAutoscaler responsible for adjusting the number of replicas of the Deployment when it uses more than 10% of its requested CPU (10% is an intentionally low value chosen for this exercise; in a real-world context, this threshold would be higher).
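Note that CPU utilization is measured relative to the CPU requests of the Pod's containers (200m in deploy.yaml), not to node capacity. A quick sketch of the threshold this gives:

```shell
# 10% of the 200m CPU request: scaling kicks in once the
# average usage per Pod exceeds this value
request_millicores=200
target_percent=10
threshold=$(( request_millicores * target_percent / 100 ))
echo "${threshold}m"   # prints 20m
```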

Create a hpa.yaml with the following content:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: w3
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 10

Then create this resource:

kubectl apply -f hpa.yaml
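For reference, a roughly equivalent HPA can also be created imperatively; note that the resource will then be named w3, after the Deployment, rather than hpa-v2:

```shell
# Imperative equivalent of hpa.yaml (resulting resource is named "w3")
kubectl autoscale deployment w3 --cpu-percent=10 --min=1 --max=10
```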

Verify that the HorizontalPodAutoscaler was created correctly:

kubectl get hpa

You will get a result similar to the following:

NAME     REFERENCE       TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-v2   Deployment/w3   0%/10%    1         10        0          9s

Note: it’s possible that for a few seconds the value in the TARGETS column will be “<unknown>/10%”, while the hpa gathers the resource consumption metrics.

Testing

To send a large number of requests to the w3 service, we will use Apache Bench.

With the following command, launch the ab Pod, whose role is to send 200,000 requests (-n), with a concurrency of 100 (-c), to the w3 service from inside the cluster:

kubectl run ab -ti --rm --restart='Never' --image=lucj/ab -- -n 200000 -c 100 http://w3/

From another terminal, observe the evolution of the number of replicas (this may take a few minutes):

$ kubectl get -w hpa
NAME     REFERENCE       TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
hpa-v2   Deployment/w3   182%/10%   1         10        4          6m57s
hpa-v2   Deployment/w3   97%/10%    1         10        8          7m2s
hpa-v2   Deployment/w3   12%/10%    1         10        10         7m17s
hpa-v2   Deployment/w3   0%/10%     1         10        10         7m32s
...

Note: the -w (watch) option regularly updates the command output.
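The jumps in the REPLICAS column follow the scaling formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), capped at maxReplicas. Plugging in the values from the first line of output above:

```shell
# ceil(4 * 182 / 10) = 73, then capped at maxReplicas = 10
current_replicas=4
current_utilization=182
target_utilization=10
max_replicas=10
# Integer ceiling division
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
[ "$desired" -gt "$max_replicas" ] && desired=$max_replicas
echo "$desired"   # prints 10
```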

Once the ab Pod has finished, you will observe the number of replicas decrease. This phase is however slower than the scale-up observed earlier: by default, the hpa waits 5 minutes before decreasing the number of replicas.

Note: it’s possible to define how the number of replicas increases or decreases via the use of .spec.behavior.scaleUp and .spec.behavior.scaleDown properties in the HPA resource specification.
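For example, the default 5-minute scale-down delay mentioned above corresponds to a stabilization window that could be shortened like this (a hypothetical tuning, to be merged under .spec of hpa.yaml):

```yaml
# Excerpt to add under .spec of the HorizontalPodAutoscaler
behavior:
  scaleDown:
    # Wait only 60s (instead of the default 300s) before scaling down
    stabilizationWindowSeconds: 60
    policies:
    # Remove at most 1 Pod per minute
    - type: Pods
      value: 1
      periodSeconds: 60
```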

Cleanup

Delete the various resources created in this exercise:

kubectl delete -f deploy.yaml -f svc.yaml -f hpa.yaml