Multicluster Kubernetes gets complicated and expensive fast — especially in dynamic environments. Private cloud multicluster solutions need to wrangle a lot of moving parts:
- Private or public cloud APIs and compute/network/storage resources (or bare metal management)
- Linux and Kubernetes dependencies
- Kubernetes deployment
- etcd configuration
- Load balancer integration
And potentially other details, too. So they're fragile: Kubernetes control planes on private clouds tend to become "pets" (and not in a cute way). Multicluster on public clouds, meanwhile, hides some of the complexity (at the cost of flexibility) but presents challenges like cluster proliferation, hard-to-predict costs, and lock-in.
What Are Hosted Control Planes (HCPs)?
Hosted Control Planes (HCPs) route around some (not all) of these challenges while introducing a few new ones of their own. An HCP is a set of Kubernetes manager node components, running in pods on a host Kubernetes cluster. HCPs are less like "pets" and more like "cattle."
- Like other Kubernetes workloads, they're defined, operated, and updated in code (YAML manifests), so they're repeatable, version-controllable, and easy to standardize. But worker nodes, as always, need to live somewhere and be networked to control planes, and there are several challenges here.
- They gain basic resilience from Kubernetes itself: if an HCP dies, Kubernetes brings it back, and the load balancer directs API and worker traffic to the new instance. It’s also possible to configure HCP control planes to scale horizontally for robust high availability, but this can involve some new challenges.
- HCPs share workload ingress, load balancing, and other services integrated with the mothership cluster, eliminating the need to integrate these services with individual multicluster control planes. That said, the mothership cluster becomes a potential single point of failure for many control planes, so it needs to be built carefully if high availability is required.
Overall, therefore, using HCPs simplifies multicluster significantly — reducing resource consumption, improving utilization, and consolidating the overall Kubernetes footprint. HCPs can, in principle, provide operational efficiencies for use cases ranging from classic multicluster to hybrid, Edge, and IoT. But realizing benefits means overcoming challenges — it depends on how a specific HCP solution is designed, how it’s automated, and on its host cluster.
Technical Challenges for HCPs
Controller/Worker networking. In a typical Kubernetes cluster, the container network interface (CNI) network spans controllers and workers. This makes things simple for consolidated cluster architectures, but more complex if you want to build distributed ones (NATs, firewalls, etc.).
In an HCP cluster, the control plane is a workload running on the mothership cluster, and the workers are typically set up on a separate CNI of their own. So, you need to enable secure connections between the HCP’s API server and the worker nodes’ CNI. And it needs to be robust and simple to set up and manage, because many intriguing use cases (more on this follows) depend on being able to put workers basically anywhere (Edge locations, mobile locations and connections, etc.), potentially on the far side of low-quality and/or insecure network links.
Historically, SSH tunneling has been used to make this kind of connection. More recently, the Konnectivity Kubernetes sub-project provides an efficient solution (though still with some drawbacks; see more below). And Kubernetes provides an abstraction for configuring this kind of external connection (egress-selector). But how the connection is made in practice depends on how the HCP is implemented: current projects/products do this in a range of different ways.
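To make this concrete, below is the general shape of the egress selector configuration that tells the API server to send node-bound ("cluster") traffic through a Konnectivity server over a local socket, adapted from the Kubernetes Konnectivity setup documentation. The socket path is illustrative, and the file is handed to kube-apiserver via its --egress-selector-config-file flag.

```yaml
# EgressSelectorConfiguration: route API server -> node ("cluster") traffic
# through the Konnectivity server instead of dialing nodes directly.
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster
  connection:
    proxyProtocol: GRPC
    transport:
      uds:
        # Unix socket shared with the konnectivity-server container;
        # the path shown here is illustrative.
        udsName: /etc/kubernetes/konnectivity-server/konnectivity-server.socket
```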
Building and maintaining worker nodes. Setting up worker nodes for an HCP can obviously be done in many different ways, depending on the case. These include (partial list):
- Building workers on VMs, on the same public or private cloud fabric as hosts the mothership cluster — either within the same VPC or in other VPCs (for example, to enable deterministic placement of workers and associated persistent data within a desired geographic area or regulatory regime)
- Building workers on VMs, on a different cloud fabric (for example, a private cloud fabric like OpenStack) from that hosting the mothership cluster
- Building workers on bare metal machines, managed by the IaaS that supports the mothership or on a remote (and/or different) IaaS (for example, on an OpenStack cloud equipped with Ironic bare metal management)
- Building workers on remote bare metal machines, either under a centralized bare metal management system or stand-alone, or even …
- Building workers on the mothership cluster itself, using KubeVirt to host VMs+host OS+worker on Kubernetes
Whatever strategy the use case dictates, it’s most attractive, at this point, to think about using Kubernetes’ own Cluster API facility to do this setup, via a CAPI operator installed in the mothership cluster (or elsewhere, for example, on a bastion server) — ideally in all relevant target environments.
Using virtual servers for nodes is now readily possible with private and public clouds, for which CAPI providers exist. This is all straightforward, provided you have a systematic approach to solving the networking challenge noted above. So the “centralized, cloud-resident multicluster” use case is well in hand.
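As a rough sketch of what this looks like in practice, the manifest below outlines a Cluster API MachineDeployment asking an infrastructure provider for three worker machines. Fields are abbreviated, and the bootstrap and infrastructure template kinds shown are assumptions that depend on the providers you actually install (for example, a vSphere machine template and a k0s-aware bootstrap template); consult your providers' documentation for the exact resource kinds.

```yaml
# Hypothetical Cluster API MachineDeployment (fields abbreviated):
# the mothership-resident CAPI controllers reconcile this into three
# provider-managed worker machines joined to the child cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: child-1-workers
  namespace: hcp-team-a
spec:
  clusterName: child-1
  replicas: 3
  template:
    spec:
      clusterName: child-1
      version: v1.29.2              # Kubernetes version for the workers
      bootstrap:
        configRef:                  # assumed: your bootstrap provider's template kind
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: K0sWorkerConfigTemplate
          name: child-1-workers
      infrastructureRef:            # assumed: your infrastructure provider's template kind
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: child-1-workers
```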
For bare metal, though — essential to Edge and IoT use cases — things are not as standardized, and are actually quite situation-dependent. On OpenStack, where Ironic (bare metal management) is configured, it’s possible for CAPI (via the OpenStack provider) to leverage Ironic to provision bare metal servers.
For those who don’t have an OpenStack cluster, the metal3-io project is a CAPI provider that can leverage standalone Ironic. Providers that employ Canonical MaaS to perform bare metal management are also available, for those who want to use Ubuntu on bare metal nodes. VMware BMA is designed to install ESXi bare metal hypervisors on physical nodes, and for the moment, no CAPI provider is available to make this work.
Moreover, even for the working paradigms, none of this is simple to set up — bare metal management systems typically require a PXE-boot data source and have other requirements local to nodes. So they’re all more for data center applications (especially, where you might need bare metal for performance reasons) and somewhat of a heavy lift — too heavy, likely — for real Edge applications, and certainly for IoT. Again, this is an area requiring community focus to develop broadly useful generic solutions.
Once you've provisioned infrastructure, creating HCP workers can again be done in numerous ways, with kubeadm or other tools. There is a lot to be said, however, for keeping this process simple, with as few critical steps as possible (which kubeadm does not do), and for making worker node creation equally simple without needing internet access.
Horizontally scaling HCPs can be a little complicated. Kubernetes can make sure an HCP is restarted if it fails. And this works well so long as whatever caused the HCP to fail doesn't compromise its persistent storage, and a load balancer is in place to route worker node communications and API commands to the new HCP instance.
But if you need to scale an HCP horizontally (for performance or true high availability), there are typically a few caveats. The first is etcd, which is hard to configure and finicky. So, some HCP solutions require that you replace etcd with a SQL data store (for example, Postgres) using Kine, an etcd API shim that translates etcd calls to a SQL back end.
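For illustration, this is roughly how a Kine-backed datastore is expressed in k0s's cluster configuration (a distribution discussed later in this article). The connection string is a placeholder, and the exact schema may differ between releases.

```yaml
# k0s ClusterConfig fragment: back the Kubernetes API with a SQL database
# via Kine instead of etcd. The Postgres DSN below is a placeholder.
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  storage:
    type: kine
    kine:
      dataSource: postgres://k0s:example-password@db.internal:5432/k0s
```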
The second challenge is that you also need to scale the system whereby worker nodes are networked to control planes. Move beyond one controller instance, and you now need to establish a one-to-many mesh of connections between worker nodes and available controllers. And some connectivity methods (like Konnectivity) permit this, but in a very convoluted way.
The “mothership” cluster provides shared services but can become a single point of failure. The basic requirements for a “mothership” cluster for hosting HCPs aren’t hard to meet. You need to provide a load balancer to expose HCP APIs in a way that can survive HCP restarts. And you need to enable persistent volumes so that HCPs can recall state across node restarts.
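As a minimal sketch (the namespace, labels, and service name are hypothetical, and 6443 and 8132 are typical defaults for the Kubernetes API and Konnectivity), the mothership can expose one HCP through an ordinary LoadBalancer Service:

```yaml
# Expose one HCP's API server and Konnectivity endpoint through the
# mothership's load balancer integration. Selector labels are hypothetical;
# they must match whatever labels your HCP pods actually carry.
apiVersion: v1
kind: Service
metadata:
  name: team-a-hcp
  namespace: hcp-team-a
spec:
  type: LoadBalancer
  selector:
    app: team-a-hcp
  ports:
  - name: kube-api
    port: 6443
    targetPort: 6443
  - name: konnectivity
    port: 8132
    targetPort: 8132
```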
To meet higher-level availability requirements, however, you need to do more engineering, since the mothership is a single point of failure that may end up hosting large numbers of (possibly critical) control planes. These are the same considerations you'd need to address when designing/configuring any highly available workload on Kubernetes.
To review: you will probably want to distribute mothership controllers and workers across an odd number (three or more) of availability zones (AZs) to meet Kubernetes quorum requirements, ensure that storage for persistent volumes is resilient and remains accessible after an AZ failure, and mirror persistent volumes across all AZs (so that, if an AZ fails, restarted control planes will find their storage).
If your mothership cluster will use etcd for state (as opposed to Kine+SQL or another more latency-tolerant state database), you also need to ensure that network links between AZs have no more than 5 ms of latency (to avoid compromising Raft consensus). Many other best practices for building highly available and secure Kubernetes clusters will also need to be evaluated and applied as needed.
Interestingly, given a highly available mothership cluster, you may not actually need to scale HCPs horizontally, since the mothership virtually assures survivability of functional controllers and workers on which failed HCP instances can be restarted. But if you also need the HCPs themselves to be highly available, you'll need to go the extra mile and make them horizontally scalable in odd numbers (3+), and configure things so that each of an HCP's controllers lands in a different AZ. (Note: if Kine+SQL is used to maintain state, the "always three or more, always an odd number for Raft consensus" requirement may be waived, possibly giving you more freedom in determining how many controllers are needed.)
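As a sketch of the "one controller per AZ" idea (labels are hypothetical, and in practice the HCP operator or your own manifests would carry this setting), a pod-spec fragment with a topology spread constraint keeps replicas in separate availability zones:

```yaml
# Pod-spec fragment: spread HCP controller replicas across AZs so the
# loss of one zone cannot take out the whole hosted control plane.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: team-a-hcp        # hypothetical label on the HCP controller pods
```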
There are, of course, compromise solutions that can make a mothership cluster very resilient without going to the lengths required for high availability. For example, Mirantis often uses Velero, a cluster backup tool that understands workloads and can take the entire contents of a cluster, including storage, and save a "snapshot" somewhere safe (for example, in an Amazon Web Services (AWS) S3 bucket). If the cluster fails, you rebuild it, install Velero, and reinstall the snapshot. Still, there's no free lunch: making Velero work in some scenarios may demand that developers pay special attention to volume storage and databases.
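For instance, a nightly Velero Schedule for the mothership might look roughly like the following; the names, namespace, and retention are illustrative, and persistent volume snapshots assume a compatible storage or CSI snapshot setup.

```yaml
# Velero Schedule: nightly backup of the whole mothership, including
# persistent volume snapshots where the storage layer supports them.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: mothership-nightly
  namespace: velero
spec:
  schedule: "0 2 * * *"        # every day at 02:00
  template:
    includedNamespaces:
    - "*"
    snapshotVolumes: true
    storageLocation: default   # e.g., an S3 bucket configured at Velero install time
    ttl: 720h0m0s              # keep backups for 30 days
```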
Automating HCP setup and keeping it simple is hard. Most current HCP solutions leverage a Kubernetes operator to support HCPs as a custom resource. Beyond this, everyone does things their own way, and the operator’s required complexity will reflect the preferred deployment strategy, the Kubernetes cluster model used, communications architecture, HA requirements, and other details. Overall, the challenge is to keep HCP configuration files as simple and short as possible, so that operators don’t need to mind (and of course, modify) 1000 lines of YAML to get an HCP up and running or perform what should be simple operations. And there will always be things the operator can’t know, and that need to be specified in the config file(s). For example, the operator can’t detect that the mothership is integrated with something that implements a load balancer.
Now that we've reviewed the challenges, let's look at how open source can help.
How k0s and k0smotron Do HCPs
Mirantis’ open source projects k0s Kubernetes and the k0smotron HCP operator provide the basics of a Hosted Control Plane solution that adapts well to many different use cases. Here are some important highlights of k0s/k0smotron.
K0smotron will run on any Cloud Native Computing Foundation-validated Kubernetes, so it places few restrictions on how users get and run a mothership cluster. Any public cloud Kubernetes will work, as will DIY "upstream" Kubernetes setups and CNCF-validated Kubernetes platforms from enterprise Kubernetes solution providers.
However, k0smotron only installs HCPs based on k0s Kubernetes, and these will integrate with worker nodes that use the same release of k0s. There are several important reasons for this.
- K0s is a zero-dependencies Kubernetes distribution that installs on any node (controller or worker) via a single binary download and a few short commands — installing k0s and its tools, starting the node in its assigned role (for example, worker), retrieving credentials and connecting to the control plane. Zero dependencies vastly simplifies the job of preparing an operating system for Kubernetes installation (for most popular enterprise-class Linux variants, you can just run updates). And it means the job of building, configuring, starting, and attaching worker nodes is very simple and compatible with virtually any kind of automation tooling (including Cluster API — see below).
- K0s is a minimal (though flexible) multiplatform distribution that can install and run from a single binary on x86-64, ARM64, and ARMv7 hardware. A k0s worker can work quite well on one CPU/vCPU, with as little as 0.5GB of RAM and as little as 1.3GB storage (SSD preferred). So, k0s gives you maximum freedom to build applications out to the Edge and beyond.
- K0s uses Konnectivity to enable control plane/worker separation, even when using k0s to build standard clusters in the normal way. So, “out-of-the-box,” k0s hugely simplifies networking challenges likely encountered in HCP use-cases. In many remote worker-node networking scenarios (NATs, etc.) it just works.
- K0s knows how to horizontally scale Konnectivity while scaling controllers into an HA configuration, so achieving HA just requires scaling out containerized controllers and telling the configured load balancer to forward inbound packets to APIserver and Konnectivity ports appropriately.
- K0s does require Kine+SQL for HA control plane scaling, but as noted above, not using etcd permits waiving etcd's requirements for Raft consensus, meaning that 2+ controllers can work, even numbers of controllers are okay, and inter-controller latency requirements are less stringent.
- K0s has a Cluster API operator that works well to marshal infrastructure for worker nodes on public and private clouds, and has introduced a basic generic bare metal deployment provider based on the open source k0sctl cluster-deployment tool. This opens the door to building and operating completely Kubernetes-centric multicluster Kubernetes environments.

Figure 1. k0s controller/worker separation and simple, robust controller/worker networking are accomplished with the help of the Kubernetes Konnectivity sub-project.
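To give a feel for how little configuration an HCP can need, here is a deliberately minimal sketch of a k0smotron Cluster manifest. The field names are recalled from the k0smotron CRD and may not match the current schema exactly; treat this as an illustration and consult the project documentation for the authoritative spec.

```yaml
# Hypothetical minimal k0smotron hosted control plane: the operator
# reconciles this into containerized k0s controller(s) on the mothership.
apiVersion: k0smotron.io/v1beta1
kind: Cluster
metadata:
  name: team-a
  namespace: hcp-team-a
spec:
  replicas: 1              # scale to 3 (with Kine+SQL) for an HA control plane
  service:
    type: LoadBalancer     # expose the API server and Konnectivity to workers
```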
Questions and Considerations about K0smotron
Why Not Standardize All This?
The idea of standardizing HCPs is very attractive. It's also very difficult, in part because different organizations have different priorities. The biggest issue, though, is that the Kubernetes you use in an HCP solution matters a great deal. k0s and k0smotron work well (and simply) for HCPs because k0s was specifically designed to have zero dependencies, run on all kinds of hardware, and implement controller/worker separation with Konnectivity. This made building k0smotron simpler. It made building and integrating with Cluster API simpler. And it gives users maximum support for building and running HCP applications without forcing them to master and build around a lot of complexity.
What Are HCP Technical and Business Drivers?
Despite the above-mentioned challenges, Hosted Control Plane solutions have huge potential benefits.
HCPs support many use cases efficiently, including multi-cluster, multi-cloud, hybrid cloud, edge, and even IoT — all in self-managed or third-party-managed contexts. Centralized control planes (and thus, centralized operations) with distributed workers anywhere is a powerful enabling proposition, whether your goal is providing self-service Kubernetes for teams or designing edge applications to run on workers in thousands of locations or customer sites.
Among other architectural benefits, it gives you a great deal of freedom in where computing happens and where data resides, while allowing economies of scale and centralized control for operations and development, since platform engineers, operators, developers and others all control clusters through the mothership. All this contributes to greater cost-efficiency, time-efficiency, security, and standardization vs. conventional strategies, with independent clusters in many environments.
Consolidation and Simplification
Consolidating hosted control planes on Kubernetes this way (and using Cluster API for infrastructure management) — treating Kubernetes as your cloud — has huge advantages, too. Obviously, it’s faster and simpler. You have one API to control. One “platform engineering and ops environment” to tune up. One automation model — Kubernetes HCP cluster and related manifests — to maintain and version control. This leads to clarity, standardization, optimization, and reuse. It reduces or eliminates the “added skills burden” of having to manage infrastructure(s) directly. And all this conspires to reduce operations overheads, enable greater efficiency and speed, and minimize security and compliance risks.
Standardized Platform Services
Consolidating with HCPs means no longer needing to build and maintain a ton of independent clusters: a huge source of toil, complexity, risk, and cost. Reducing your overall Kubernetes footprint simplifies platform engineering and encourages standardization on one set of platform services that all HCPs and applications can share. And this, in turn, encourages adopting standard tools and workflows that DevOps and developers can share. Result: everyone gets to focus on what is really important — platforms are continually optimized, developers are more productive, applications are more secure, compliant, resilient.
Improved Access, Security, Compliance
In the consolidated model, security is simpler to architect and administer — increasing clarity and visibility of security states, and reducing the possibility of errors and omissions leading to vulnerabilities. Host cluster access controls protect HCP APIs. You have one IAM framework to manage (Kubernetes RBAC) and one secrets store (Kubernetes Secrets). Kubernetes and container runtime policies can be standardized across the host environment and HCP configurations.
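As a small, hypothetical example (the namespace, group, and names are assumptions, with group membership coming from your identity provider), a single RoleBinding on the mothership confines a team to the namespace hosting its HCP:

```yaml
# Confine the (hypothetical) team-a group to the namespace that hosts
# its HCP, using the built-in namespace-scoped "admin" ClusterRole.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-hcp-admins
  namespace: hcp-team-a
subjects:
- kind: Group
  name: team-a                 # group name supplied by your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io
```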
Control and Flexibility
Because HCPs are effectively just Kubernetes workloads, you can configure them very flexibly — much more flexibly than most public clouds would permit configuring on-demand clusters. For example, you can set up Kubernetes API feature-flags as you prefer — public clouds usually don’t permit this.
Simplified Updates and Upgrades
Compared with the chaos of maintaining independent clusters all over the place, the consolidated HCP model makes updates (and upgrades) much more manageable and less risky. To update HCP control planes, you just update their manifests, and reapply to load new control plane containers.
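As a sketch, reusing the hypothetical manifest from the k0s/k0smotron section and assuming the CRD exposes the k0s version as a spec field (verify the exact field name against the k0smotron documentation), an upgrade is just an edited manifest reapplied to the mothership:

```yaml
# Hypothetical upgrade: change the control plane version in the HCP
# manifest and reapply; the operator rolls out new controller containers.
apiVersion: k0smotron.io/v1beta1
kind: Cluster
metadata:
  name: team-a
  namespace: hcp-team-a
spec:
  replicas: 1
  version: v1.29.2-k0s.0   # assumed field name for the k0s release to run
  service:
    type: LoadBalancer
```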
A Real-World HCP Use Case Example
A public sector organization recently approached Mirantis for help with a business challenge: how could they leverage k0s and k0smotron to provide Kubernetes clusters on demand to multiple customers?
They wanted their solution to comprise only open source components, and were impressed by k0s+k0smotron’s simplicity. In the wake of Broadcom’s acquisition of VMware and fears of licensing cost increases, the organization was looking for a measured way to reduce dependence on proprietary technology. Nevertheless, they wanted to utilize existing vSphere — a known quantity — to host their solution on VMs.
This was, of course, no barrier: k0s runs well on VMware/ESXi virtual machines, and a Cluster API provider for VMware is available. Additionally, because the organization had implemented vSphere across multiple failure domains, Mirantis could leverage this to make the multicluster solution highly available: another customer requirement.

Figure 2. Abstract architecture for k0s+k0smotron-based multi-tenant/multi-cluster solution.
The solution implements a single k0s “mothership” cluster, serving end-customers with k0smotron-managed k0s child clusters. The mothership would be configured behind a shared load balancer, with standard services for ingress, security, logging/monitoring/alerting, and telemetry, enabling the service provider to monitor the solution and optionally call upon Mirantis as needed for managed operations and proactive maintenance.
Also specified was a CI/CD framework facilitating GitOps-style operations of the mothership. LMA/telemetry and GitOps could also be installed in child clusters (using a variant manifest) and offered to end-customers as value-added services. In most functional respects (scale aside) this is equivalent to a public cloud Kubernetes service, but running on virtualized private infrastructure.
A more concrete description of the HA architecture is shown in Figure 3. The mothership cluster control plane is deployed in HA mode, with one node in each AZ. As noted previously, this necessitates low latency links between AZs for etcd Raft consensus. Child clusters can also be deployed in HA mode, with containerized controller nodes and worker nodes distributed to all AZs — a useful value-added service for customers requiring cluster and application high availability. The mothership cluster is provisioned with backup and recovery and external shared resilient storage.

Figure 3. Solution architecture distributed across three availability zones (AZs) for mothership high availability. End-customer child cluster control planes can optionally be deployed as HA (as a value-added service). Worker nodes for a given child cluster can also optionally be distributed across AZs for application high availability.
Operating the solution is straightforward:
- Operators (via an automated process) create k0s child clusters using k0smotron and the Cluster API operator, using a VMware provider to provision worker nodes, and updating the load balancer with the child cluster’s Kubernetes API server address(es). A single identity provider is used to manage access to the mothership and child clusters, leveraging k0s’ powerful RBAC mechanism — an admin role is provisioned for the end customer.
- Automation retrieves the child cluster access key and delivers it to the end customer. For the sake of privacy and data protection, the key can then be deleted from the mothership cluster.
- The end customer takes control of their child cluster and enables further access/privileges via the child cluster’s API or kubectl.
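For example (the group name and binding are hypothetical), the customer might broaden access inside their child cluster with a standard ClusterRoleBinding:

```yaml
# Applied inside the child cluster by its new owner: grant the customer's
# platform team full cluster-admin rights via the shared identity provider.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: customer-platform-admins
subjects:
- kind: Group
  name: customer-a-platform     # hypothetical group from the identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```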
Conclusion
Hosted Control Planes mark a paradigm shift in Kubernetes management. The benefits of resource efficiency, centralized orchestration, and newfound use cases demonstrate the transformative power of this approach. As organizations embrace evolving deployment scenarios, Hosted Control Planes emerge as a cornerstone in the journey toward efficient, scalable, and secure Kubernetes orchestration.
Join the Conversation at KubeCon + CloudNativeCon
Curious to dive deeper into Hosted Control Planes and how k0smotron is driving this evolution? Attend the KubeCon + CloudNativeCon EU panel on Hosted Control Planes, with representatives from Apple, Clastix, Red Hat, and Mirantis. Mirantis' booth number is J27: come by and engage with industry pioneers, explore real-world applications, and be part of the dialogue shaping the future of Kubernetes.
Hosting Kubernetes control planes as pods can enable (and simplify) operations for multicluster and Edge use cases. But they bring along some new requirements and concerns, and standards may be slow in coming.