

Andy Suderman
Andy Suderman is a leading expert in Kubernetes and cloud native technologies, serving as the Chief Technology Officer at Fairwinds, which offers Managed Kubernetes-as-a-Service.
With a rich background in system architecture and cloud computing, Suderman has been at the forefront of Kubernetes innovation. In this in-depth interview, he shares his experiences, insights and the latest trends in Kubernetes autoscaling.
Autoscaling in Kubernetes is both a science and an art. It involves precise configurations and an understanding of application behavior to ensure optimal performance and resource utilization. In this interview, Suderman discusses the evolution, current state and future of Kubernetes autoscaling, providing valuable perspectives for platform engineers.
Can you please introduce yourself and provide a brief background of your initial encounter with Kubernetes and autoscaling in particular?
I got started with Kubernetes around the time of version 1.4 while I was working at ReadyTalk. Our platform at the time was running on Xen VMs on servers in a physical data center that I maintained, and we were just discovering containerization and K8s.
At the time, the Horizontal Pod Autoscaler (HPA) had been around for a little while. It felt like magic: All of a sudden, our services could scale instantly based on CPU usage, which in the days of VMs and hardware had been a lot more difficult to implement.
Fast-forward a few years, and I’m working at ReactiveOps (later renamed Fairwinds), running Kubernetes for a bunch of different clients, and HPA combined with cluster autoscaling was commonplace for us. We helped all of our customers implement them for their applications running on the clusters we maintained.
Can you provide a brief history of the evolution of Kubernetes autoscaling?
It all started with HPA, which was introduced early on. This allowed automatic scaling of pod replicas within a deployment. Well, not a deployment at the time, but what would become a deployment. The HPA accepts basically any metrics provider that can be queried via the metrics apiService, and lets you set a target for that metric per pod.
HPA still exists in a very similar form today, but with enhancements, such as targeting multiple metrics and fine-tuning the scaling behavior. Another interesting bit to note here is that HPA is still the only in-tree (included in the Kubernetes core codebase) autoscaler.
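For illustration, here is a minimal sketch of a modern autoscaling/v2 HPA that targets CPU utilization and fine-tunes scale-down behavior; the workload name and the numeric targets are hypothetical placeholders:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # target average CPU across pods
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
```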
Once pods were autoscaling their replicaCount, there was an obvious need to dynamically scale the number of nodes in a cluster to accommodate the growing number of pods. The Cluster Autoscaler became the de facto way to do this.
This project went GA in 2017 — so, fairly early on. It started out pretty straightforward: As soon as it saw a pod that was in “pending” status because there wasn’t room for the scheduler to schedule it, the Cluster Autoscaler would add a new node to the cluster. Similar to HPA, this is still very commonly used today, albeit with a lot more fine-tuning, options, cloud provider support, etc.
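As a rough sketch of how this is typically wired up, the Cluster Autoscaler runs as a deployment in the cluster and is pointed at one or more node groups via flags; the node group name, bounds and image tag below are hypothetical, and exact flags vary by version and cloud provider:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0  # hypothetical tag
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --nodes=2:10:my-node-group     # min:max:groupName (hypothetical)
            - --expander=least-waste         # pick the best-fitting node group
            - --balance-similar-node-groups
```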
Over the last couple years, as the limitations of the core mechanisms of HPA and Cluster Autoscaler started to become much more apparent to users, we’ve started to see the release of additional projects that build upon those concepts. KEDA for horizontal pod autoscaling and Karpenter for cluster autoscaling come to mind.
In some cases, these tools completely change the underlying mechanisms of the scaling, but the end goal remains the same: We need to scale pods horizontally, and clusters need to autoscale alongside that.
Additionally, starting around 2018, some folks realized that there was going to be a need to automatically scale pods vertically — that is, to dynamically increase/decrease their requested CPU/memory. The Vertical Pod Autoscaler (VPA) has been around nearly as long as the Cluster Autoscaler, and lives in the same repository.
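A minimal VPA object, assuming the VPA components are installed in the cluster, looks roughly like this; the target Deployment name is hypothetical:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web             # hypothetical workload
  updatePolicy:
    updateMode: "Off"     # recommend only; "Auto" lets VPA apply the changes
```

With updateMode: "Off", the VPA only publishes recommendations, which is a common way to start before letting it resize pods automatically.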
Digging deeper, was the HPA or VPA inspired by other distributed systems technologies, or was it revolutionary?
The idea of horizontally or vertically scaling infrastructure really isn’t novel — there were ways that we did this in data centers with VMs in the past. They looked very similar to what we see now, except that now we are able to scale containers rather than entire VMs, and Kubernetes APIs provide us with a way to do that more easily.
Cluster Autoscaler, Horizontal Pod Autoscaler, Vertical Pod Autoscaler, Karpenter, KEDA and the list goes on. Can you put these into perspective for us?
The first part to understand is the different types of autoscaling, and then we can categorize the different projects based on that. We have:
- Horizontal pod autoscaling: Scales pod replicaCounts
- Cluster autoscaling: Changes the number of nodes in the cluster
- Vertical pod autoscaling: Changes the resource (CPU/memory) requests of pods
So within those categories, we have the various projects, which I have mentioned earlier but are worth summarizing quickly:
- HPA (horizontal pod autoscaler): The in-tree horizontal pod autoscaler, which can scale on any metric that is provided by the external.metrics.k8s.io or metrics.k8s.io apiServices.
- VPA (vertical pod autoscaler): An out-of-tree vertical pod autoscaler that can vertically scale pods based on historical CPU and memory usage.
- Cluster-autoscaler: An out-of-tree cluster autoscaler that can add and remove nodes from the cluster based on demand from the number of pods to be scheduled.
- KEDA: A horizontal pod autoscaler that can scale pods based on any number of metrics providers. KEDA builds upon the HPA by providing a higher-level construct called a ScaledObject and then managing the underlying HPA objects (see the sketch after this list).
- Karpenter: A cluster autoscaler built originally for Amazon Web Services that can not only add and remove nodes from the cluster, but also take into account what node sizes are most optimal for the pods that need to be scheduled. Often referred to as a “smarter” cluster autoscaler.
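To make the KEDA point concrete, here is a hedged sketch of a ScaledObject that scales a Deployment on a Prometheus query; the Deployment name, Prometheus address, query and threshold are all hypothetical:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web
spec:
  scaleTargetRef:
    name: web                     # the Deployment to scale (hypothetical)
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # hypothetical
        query: sum(rate(http_requests_total{app="web"}[2m]))   # hypothetical
        threshold: "100"          # scale out when the query exceeds this per replica
```

Behind the scenes, KEDA creates and manages an ordinary HPA object for this ScaledObject.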
Can you mention some of the lesser-known Kubernetes autoscaling projects or technologies that platform engineers should pay attention to?
I’m not aware of any other open source software projects that have more novel solutions for these problems, but I can say that there are lots of commercial projects building on top of (or instead of) the existing solutions. Usually these are geared towards larger companies and can be expensive. A couple that come to mind are Fairwinds Insights, StormForge and Platform9 EMP.
What are the primary and secondary metrics that platform engineers should pay attention to for autoscaling? Is there more art than science to Kubernetes autoscaling?
I think of this in a similar way to how I think about monitoring. What metric is going to actually affect the experience of the end user? This takes me to thinking about the four golden signals: latency, traffic, error rate and saturation.
Latency may not be the best trigger for scaling, and error rate might only be appropriate for an application that is built in such a way that it only starts producing errors when it is overloaded. I think traffic is often a really great metric, but it works best if you understand the traffic rate that a single pod can handle, which requires data or load testing.
To summarize, I think that there’s a lot of “It depends,” so think about the behavior of your application and the experience of your end users or consumers. I will also say that CPU and memory utilization are very infrequently the right metric to scale on. We’ve defaulted to that in a lot of places because it was the easiest way to get started, but it’s probably time to start moving on from that.
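As a concrete example of scaling on traffic rather than CPU, an autoscaling/v2 HPA can target a per-pod request rate, assuming a metrics adapter (such as prometheus-adapter) exposes that metric through the custom metrics apiService; the metric name and target below are hypothetical and would come from load testing:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 30
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical adapter-provided metric
        target:
          type: AverageValue
          averageValue: "100"              # per-pod rate a single pod can handle
```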
As to whether there is more art than science, that’s an interesting thing to ponder. My immediate reaction is that it’s more art than science when you don’t have the data to do science. If you have load testing or real-world data, then you can experiment with different metrics and targets, and measure the outcomes of those experiments. That’s when it becomes science.
What was the motivation to create the Goldilocks project? Can you go into technical details of the project? Do HPA, VPA, etc., complement it?
Around late 2018, we had a lot of customers that were running into issues with their applications not scaling or performing well, and we were strongly recommending that they set their resource requests and limits on their workloads. They would often come back to us and say, “That’s great, but what should I be setting them to?”
This was a very reasonable question that led to us digging through a lot of Datadog metrics and putting together recommendations. At the time, I was looking to learn how to write Go, specifically within the Kubernetes ecosystem, and I also wanted a more automated way to provide these recommendations to clients.
While I was researching how best to go about providing these recommendations, I stumbled on the VPA project. I realized that the VPA already had a recommendation engine built into it, so instead of reinventing the wheel, I decided to write a nice wrapper around that project.
Goldilocks became a controller that would automatically create VPA objects for all of your deployments in your cluster, and then aggregate the recommendations into a single dashboard. This allowed you to get a baseline for resource requests for your pods relatively quickly, without having to sift through historical metrics.
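In practice, Goldilocks is opted in per namespace via a label, roughly like the following; the namespace name is hypothetical:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                                 # hypothetical namespace
  labels:
    goldilocks.fairwinds.com/enabled: "true"   # tells Goldilocks to create VPAs here
```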
There’s a lot of complex details around how HPA works alongside VPA that won’t fit into this discussion, but the answer is generally — if you’re scaling horizontally on metrics other than CPU/memory, then the two work quite well with each other.
What is the Kubernetes autoscaling road map? Where is the Kubernetes community heading in terms of autoscaling? Is there anything else you would like to add?
One other thing that’s worth keeping an eye on is the multidimensional pod autoscaler (MPA) that Google built. It’s been available in Google Kubernetes Engine for a while, but they recently contributed it back to the Kubernetes autoscaling repository. This autoscaler aims to both vertically and horizontally scale pods based on CPU or memory.
As I mentioned previously, I still believe that, in most cases, CPU and memory are not the best horizontal scaling metrics, but for folks who don’t have another metric, or for whom CPU or memory genuinely is the right one, the MPA might be a worthwhile solution to look at.
Another interesting adjacent road map item that is coming in Kubernetes is the ability to dynamically change pod resource requests and limits. This won’t affect how those pods get scheduled onto nodes, but it does open up some interesting opportunities for VPA and other projects to dynamically resize pods.
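That capability is arriving as the in-place pod resize feature (alpha behind the InPlacePodVerticalScaling feature gate around Kubernetes 1.27). Here is a hedged sketch of the per-container knob it introduces, with hypothetical names and values, and with the caveat that alpha fields can change:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: nginx:1.25                  # hypothetical image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired     # CPU can be resized without a restart
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
```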