{"id":5347,"date":"2026-07-05T12:01:15","date_gmt":"2026-07-05T12:01:15","guid":{"rendered":"https:\/\/geekmungus.co.uk\/?p=5347"},"modified":"2026-07-05T12:29:59","modified_gmt":"2026-07-05T12:29:59","slug":"kubernetes-series-part-4-metrics-and-hpa","status":"publish","type":"post","link":"https:\/\/geekmungus.co.uk\/?p=5347","title":{"rendered":"Kubernetes &#8211; Part 4 &#8211; Metrics and HPA"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">In part 4 of the series we are exploring Metrics and the Horizontal Pod Autoscaler (HPA).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Kubernetes can act on metrics for resources such as CPU and memory. These metrics have to be collected by something: you can use the standard Kubernetes Metrics Server (installed as an add-on), or you can use external metrics which are collected by something else, e.g. Prometheus, and then exposed to Kubernetes for it to act upon.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">What should you use?<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>CPU or memory-based scaling: Metrics Server is sufficient.<\/li>\n\n\n\n<li>Web applications: Prometheus + Prometheus Adapter is a common choice, allowing you to scale on application-level metrics such as request rate or latency.<\/li>\n\n\n\n<li>Event-driven workloads (queues, streams, messaging): KEDA is often the easiest and most feature-rich solution because it integrates directly with many external systems and supports scaling from zero.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Install Metrics Server<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we can access metrics about the Pods (workloads) running, we need to enable the Metrics Server within our Kubernetes cluster.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Within the Kubespray Ansible role, this can be done by updating the group vars in the file:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>inventory\/&#91;mycluster]\/group_vars\/k8s_cluster\/addons.yml<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Change the line 
from false to true:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>metrics_server_enabled: true<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">There are other options, but we&#8217;re not touching them for now.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then we re-run the Kubespray playbook to enable this feature (here the inventory is called k8scluster1):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>cd ~\/kubespray\nsource .venv\/bin\/activate\nansible-playbook -i inventory\/k8scluster1\/inventory.ini cluster.yml -K -b -v --private-key=~\/.ssh\/id_ed25519<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Enter the BECOME (i.e. sudo) password when prompted.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Assuming the Ansible run completes without errors, you&#8217;re all set.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to check it is enabled<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">By far the easiest way to verify it is working is to give it 2-3 minutes after enabling, so it can collect some metrics, and then run the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl top nodes\nkubectl top pods -A<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"473\" height=\"259\" src=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-12.png\" alt=\"\" class=\"wp-image-5349\" style=\"width:473px;height:auto\" srcset=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-12.png 473w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-12-300x164.png 300w\" sizes=\"auto, (max-width: 473px) 100vw, 473px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">As you can see, metrics are being shown, so the Metrics Server appears to be collecting them as expected.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To check the logs of the Metrics Server, you can use something like:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl logs -n kube-system 
deployment\/metrics-server<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Example Using Metrics<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">So now let&#8217;s try an example which makes use of the metrics we are getting from the Metrics Server, which collects CPU and memory usage for each Pod from the kubelet on every node.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We&#8217;ll create and apply the following Kubernetes YAML files: <strong>nginx-custom-hpa.yaml<\/strong> (top) and <strong>nginx-custom.yaml<\/strong> (bottom). The first file is the <strong>Horizontal Pod Autoscaler (HPA)<\/strong>, which defines how Kubernetes will scale the Deployment&#8217;s Pods up and down based on the thresholds we specify. The second file contains the Deployment and its Service.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>apiVersion: autoscaling\/v2\nkind: HorizontalPodAutoscaler\nmetadata:\n  name: nginx-custom\nspec:\n  scaleTargetRef:\n    apiVersion: apps\/v1\n    kind: Deployment\n    name: nginx-custom\n  minReplicas: 1\n  maxReplicas: 10\n  metrics:\n    - type: Resource\n      resource:\n        name: cpu\n        target:\n          type: Utilization\n          averageUtilization: 25\n  behavior:\n    scaleUp:\n      policies:\n        - type: Percent\n          value: 50\n          periodSeconds: 60\n    scaleDown:\n      stabilizationWindowSeconds: 60\n      policies:\n        - type: Percent\n          value: 50\n          periodSeconds: 60<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code># Deployment Definition\napiVersion: apps\/v1\nkind: Deployment\nmetadata:\n  name: nginx-custom\n  labels:\n    app: nginx-custom\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app: nginx-custom\n  template:\n    metadata:\n      labels:\n        app: nginx-custom\n    spec:\n      containers:\n        - name: nginx-custom\n          image: geekmungus\/nginx-custom:v1.0.0\n          ports:\n            - containerPort: 80\n          resources:\n            requests:\n              cpu: 256m\n              
memory: 128Mi\n            limits:\n              cpu: 512m\n              memory: 256Mi\n---\n# Service Definition\napiVersion: v1\nkind: Service\nmetadata:\n  name: nginx-custom-service\n  labels:\n    app: nginx-custom\nspec: \n  type: LoadBalancer\n  selector:\n    app: nginx-custom\n  ports:\n  - port: 80\n    targetPort: 80<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Now we apply the above:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl apply -f nginx-custom-hpa.yaml -f nginx-custom.yaml<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">After a few minutes, make sure the metrics are being captured correctly with the commands below. If the HPA target shows &lt;unknown&gt; where you would expect a percentage such as 0%, then either you need to wait a few more minutes for the metrics to be collected, there are no metrics for the Pods, or there is a problem with the Pod or the Metrics Server, so the values are not appearing as expected.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>kubectl describe hpa nginx-custom\nkubectl top pods\nkubectl get hpa<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"673\" height=\"58\" src=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-13.png\" alt=\"\" class=\"wp-image-5351\" srcset=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-13.png 673w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-13-300x26.png 300w\" sizes=\"auto, (max-width: 673px) 100vw, 673px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We can see here that it is showing 0%, which is good: it means the HPA is reading the CPU percentage for the Deployment&#8217;s Pods. 
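<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Had the target stayed at &lt;unknown&gt;, a few extra checks can help narrow down the cause. This is only a sketch: it assumes the Metrics Server is registered under its usual APIService name (v1beta1.metrics.k8s.io) and that the Deployment lives in the default namespace, so adjust both to suit your cluster.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Is the metrics API registered and reporting Available? (APIService name assumed)\nkubectl get apiservice v1beta1.metrics.k8s.io\n\n# Can the metrics API return raw Pod metrics? (namespace \"default\" is an assumption)\nkubectl get --raw \"\/apis\/metrics.k8s.io\/v1beta1\/namespaces\/default\/pods\"\n\n# Watch the HPA targets and replica count as they change\nkubectl get hpa nginx-custom --watch<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">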
Before we attempt to stress the Nginx application we&#8217;ve deployed, let&#8217;s dig into what these thresholds actually mean and how they work.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Explanation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Within the HPA configuration there are two main parts to consider, <strong>metrics<\/strong> and <strong>behavior<\/strong> (the field name uses the American spelling):<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>metrics<\/strong> answers: When should I scale?<\/li>\n\n\n\n<li><strong>behavior<\/strong> answers: How quickly am I allowed to scale?<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Metrics &#8211; When should I scale?<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The HPA configuration says:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>metrics:\n  - type: Resource\n    resource:\n      name: cpu\n      target:\n        type: Utilization\n        averageUtilization: 25<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This tells the HPA controller to: &#8220;Try to keep the average CPU utilisation across all Pods at 25% of their requested CPU.&#8221; Notice this is a percentage of the Pods&#8217; CPU request, not of the Kubernetes node&#8217;s CPU!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So how does it work out each Pod&#8217;s CPU usage as a percentage, so that it can be averaged? 
You&#8217;ll notice that we specify the &#8220;requests&#8221; on the pod definition; utilisation is measured as a percentage of that value. Our Deployment requests 256m (so its 25% target works out at 64m per Pod), but to keep the arithmetic simple let&#8217;s assume a request of 100m:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>...\nresources:\n  requests:\n    cpu: 100m\n...<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Based on the above it means:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>25m CPU used = 25%<\/li>\n\n\n\n<li>50m CPU used = 50%<\/li>\n\n\n\n<li>100m CPU used = 100%<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">So a worked example, with the 25% target, is:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Replicas<\/th><th>CPU usage<\/th><th>Average Utilisation<\/th><th>Outcome<\/th><\/tr><\/thead><tbody><tr><td>1<\/td><td>20m<\/td><td>20%<\/td><td>Do nothing<\/td><\/tr><tr><td>1<\/td><td>40m<\/td><td>40%<\/td><td>Scale up<\/td><\/tr><tr><td>2<\/td><td>15m + 20m<\/td><td>17.5%<\/td><td>Do nothing (2 \u00d7 17.5 \/ 25 = 1.4, which rounds up to 2)<\/td><\/tr><tr><td>2<\/td><td>10m + 12m<\/td><td>11%<\/td><td>Scale down (2 \u00d7 11 \/ 25 = 0.88, which rounds up to 1)<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">How HPA Calculates the Replicas<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The HPA uses the following calculation to work out how many replicas it wants, rounding the result up to the next whole number:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>desiredReplicas = ceil(currentReplicas \u00d7 (currentUtilisation \/ targetUtilisation))<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So if we put some actual figures into that:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Current replicas = 2\nCurrent CPU = 60%\nTarget = 25%<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Then within the calculation we end up with:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>2 \u00d7 (60 \/ 25) = 4.8<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So rounding up that&#8217;s 5 replicas required. 
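<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The same arithmetic can be tied back to the 256m CPU request in our Deployment. The per-Pod usage figures below are made up purely for illustration:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Pod A: 128m used \/ 256m requested = 50%\nPod B: 192m used \/ 256m requested = 75%\nAverage utilisation = (50 + 75) \/ 2 = 62.5%\n\ndesiredReplicas = ceil(2 \u00d7 (62.5 \/ 25)) = ceil(5.0) = 5<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">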
It is just the desired number of replicas; it doesn&#8217;t decide how quickly the HPA gets to that figure, which is covered in the next section.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Behaviour &#8211; How quickly am I allowed to scale?<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Within the configuration we had the &#8220;behavior&#8221; section, which decides how quickly it scales up and down. In effect it says: if the calculation above says you need 5 replicas but you are at 2, you can only increase by 50% every 60 seconds. Scaling down is limited to 50% every 60 seconds too, and also has a 60-second stabilisation window, so a drop in load has to persist before replicas are removed.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  behavior:\n    scaleUp:\n      policies:\n        - type: Percent\n          value: 50\n          periodSeconds: 60\n    scaleDown:\n      stabilizationWindowSeconds: 60\n      policies:\n        - type: Percent\n          value: 50\n          periodSeconds: 60<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So in practice this means that if you had 2 replicas, but needed 5, it would first do the calculation:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>50% of 2 = 1<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">So Kubernetes would add another replica to get to 3. Once the 60-second period has passed it can scale again: if the high CPU condition still exists, i.e. utilisation is still above the average utilisation target of 25%, it scales up based upon:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>50% of 3 = 1.5<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">The HPA rounds this allowance up, so it may add up to 2 more replicas, taking it to the 5 it wanted. Had it needed more still, it would wait another 60 seconds and try again, and so on.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So why? It&#8217;s to avoid &#8220;thrashing&#8221; for a workload that is spiking up and down. 
You don&#8217;t want it to scale immediately, because if the high CPU condition is short-lived it will add pods only to take them away again.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The same of course is true of scaling down: you don&#8217;t want a brief drop in CPU workload to suddenly trigger a removal of replicas, especially if the replicas have some form of startup time and might need to be brought back quickly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Generate CPU Load<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s try to generate some CPU load. Below is a script (load-gen.sh) which can be run to generate some load and force a scale up.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env bash\n\n# Example: .\/load-gen.sh http:\/\/192.168.101.160 50 300\n\n#set -euo pipefail\n\nif &#91; \"$#\" -lt 1 ]; then\n    echo \"Usage: $0 &lt;url&gt; &#91;concurrency] &#91;duration-in-seconds]\"\n    exit 1\nfi\n\nURL=\"$1\" # Positional argument 1 (required): the URL to request.\nCONCURRENCY=\"${2:-50}\" # Positional argument 2, if not present use default.\nDURATION_SECONDS=\"${3:-300}\" # Positional argument 3, if not present use default.\n\necho \"Generating load against: $URL\"\necho \"Concurrency: $CONCURRENCY\"\necho \"Duration: ${DURATION_SECONDS}s\"\necho\n\n# SECONDS is a bash built-in counting seconds since the script started.\nend_time=$((SECONDS + DURATION_SECONDS))\n\n# Each worker requests the URL in a loop until the end time is reached.\nworker() {\n  while &#91; \"$SECONDS\" -lt \"$end_time\" ]; do\n    curl -s -o \/dev\/null \"$URL\" || true\n  done\n}\n\n# Start the workers as background processes.\nfor i in $(seq 1 \"$CONCURRENCY\"); do\n  worker &amp;\ndone\n\nwhile &#91; \"$SECONDS\" -lt \"$end_time\" ]; do\n  echo \"Load Generator Running... 
$(date)\"\n  sleep 10\ndone\n\nwait\necho \"Load test complete.\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Run it as below, swapping in the IP address of your application&#8217;s Service (Load Balancer).<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>.\/load-gen.sh http:\/\/192.168.101.160 1000 120<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This will start 1000 background worker processes, each of which requests the URL in a loop for 120 seconds to load the application.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We can see the load is going up: it is at 46% now, where the target is 25%.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"958\" height=\"255\" src=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-14.png\" alt=\"\" class=\"wp-image-5360\" srcset=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-14.png 958w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-14-300x80.png 300w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-14-768x204.png 768w\" sizes=\"auto, (max-width: 958px) 100vw, 958px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">After a few minutes, we see it has added another replica. Lovely, that is just what we wanted to see. 
When the script finishes, the load is reduced and it will scale back down again.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"961\" height=\"238\" src=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-15.png\" alt=\"\" class=\"wp-image-5361\" srcset=\"https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-15.png 961w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-15-300x74.png 300w, https:\/\/geekmungus.co.uk\/wp-content\/uploads\/2026\/07\/image-15-768x190.png 768w\" sizes=\"auto, (max-width: 961px) 100vw, 961px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this article we have explored the Horizontal Pod Autoscaler (HPA). Although it is a simple example, it does show how Kubernetes can manage the performance and resilience of an application automatically by watching metrics and acting accordingly. We&#8217;ll explore this topic a bit more in a later article covering applications that use queues, where CPU-based scaling may not react as required when the workload increases.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Essentially:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The metrics section defines the goal (e.g. 
&#8220;keep average CPU at 25%&#8221;).<\/li>\n\n\n\n<li>The HPA controller continuously calculates the desired number of replicas to achieve that goal.<\/li>\n\n\n\n<li>The behavior section defines the rate limits for scaling up or down, preventing sudden jumps that could destabilize your application or infrastructure.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This separation between what state the cluster should reach and how quickly it&#8217;s allowed to get there is what makes HPA both responsive and stable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Additional Information<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.vcluster.com\/blog\/how-to-set-up-metrics-server-an-easy-tutorial-for-k8s-users\">https:\/\/www.vcluster.com\/blog\/how-to-set-up-metrics-server-an-easy-tutorial-for-k8s-users<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/blog.devgenius.io\/kubernetes-deployment-using-kubespray-63e5086237f7\">https:\/\/blog.devgenius.io\/kubernetes-deployment-using-kubespray-63e5086237f7<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/daegonk.medium.com\/kubernetes-metrics-server-c3fb49925aa5\">https:\/\/daegonk.medium.com\/kubernetes-metrics-server-c3fb49925aa5<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In part 4 of the series we are exploring Metrics. Kubernetes can collect metrics of attributes like CPU or Memory etc. These metrics are collected by a Metrics Server, you can use the built in Kubernetes one (Metrics Server), or you can use external metrics which are collected by something else, e.g. 
Prometheus and then &#8230; <a title=\"Kubernetes &#8211; Part 4 &#8211; Metrics and HPA\" class=\"read-more\" href=\"https:\/\/geekmungus.co.uk\/?p=5347\" aria-label=\"Read more about Kubernetes &#8211; Part 4 &#8211; Metrics and HPA\">Read more<\/a><\/p>\n","protected":false},"author":4,"featured_media":4850,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[10,11],"tags":[],"class_list":["post-5347","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kubernetes","category-linux"],"_links":{"self":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/5347","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5347"}],"version-history":[{"count":10,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/5347\/revisions"}],"predecessor-version":[{"id":5371,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/5347\/revisions\/5371"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=\/wp\/v2\/media\/4850"}],"wp:attachment":[{"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5347"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/geekmungus.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}