Going all-in on Kubernetes at home

This post is part of a series: The "Home" Lab.

It’s December again, and with the holidays upon us, it’s the time of year when work winds down and I get the chance to ~~refactor~~ destroy the home lab and, from its ashes, create something new.

If you’re early in your career, a small lab environment is a great way to play with the tools of the trade and get exposure to the kinds of problems you’ll have to solve as an infrastructure engineer. You can start small on your own laptop with Colima.

In past iterations, I relied on VMware to provide a stable foundation with shared storage, on top of which I deployed virtual machines for things like Kubernetes clusters. This worked well and was very reliable, but it locked me into an architecture with a base layer of virtual machines: unless I planned on using nested virtualization, I couldn’t deploy anything that pushed virtualization higher up in the stack.

This year, I’ve made the decision to purge VMware from my servers and embrace Kubernetes on bare metal. With VMware’s recent acquisition by Broadcom complete, I don’t have long-term conviction in the product.

More importantly, though, while vCenter and ESXi remain a dependable platform, I’ve come to believe that Kubernetes makes more sense as the foundational layer in a datacenter environment. Projects such as Kata Containers and KubeVirt provide a means to either use virtualization to further isolate containers, or to run traditional virtual machine workloads complete with live migration. Being able to manage my virtual machine and containerized workloads with the same platform is a strong selling point, and it depends on Kubernetes running on bare metal.
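
To make that concrete, here’s a minimal sketch of what a KubeVirt VirtualMachine manifest looks like, assuming KubeVirt is installed on the cluster; the VM name and the Fedora container disk are illustrative placeholders, not something from my environment:

```yaml
# Illustrative only: name and image are placeholders.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: root
              disk:
                bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
        - name: root
          containerDisk:
            # Ephemeral root disk shipped as an OCI image
            image: quay.io/containerdisks/fedora:latest
```

With a VM like this running, `virtctl migrate demo-vm` asks KubeVirt to live-migrate it to another node, which is exactly the sort of capability I’d otherwise lean on vMotion for.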

In my day job, I work exclusively with cloud APIs, specifically AWS. I want my home lab compute environment to offer a similar level of abstraction that makes deploying services easy. I do not want to manage any more than N machines, where N is the number of physical boxes in my environment.

After evaluating several options, I ultimately decided to go with OpenShift Container Platform as my Kubernetes distribution of choice. Here are a few of the reasons why:

  1. A great user interface to serve as the foundation of my “on-prem” cloud platform. Just because I’ll be using Operators and Helm charts doesn’t mean there’s no value in a polished graphical interface for introspection and observability.
  2. Secure-by-default behavior that requires containers to run as a non-root user (sketched after this list). I recall a time when running services as root was expressly frowned upon, and I still feel uncomfortable doing so even in a container. (At the time of writing, userns was not widely supported.)
  3. CoreOS as an immutable, declaratively configured operating system will help me avoid drift on my 3 pet servers.
  4. OpenShift Data Foundation can provide a shared ReadWriteMany filesystem with replication and good resilience (also sketched after this list). I don’t pay much attention to this environment, so I don’t want the storage layer to fall apart from bit rot. I stand to benefit from not going full DIY here.
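
On the second point, here’s a hedged sketch of the kind of pod spec OpenShift’s restricted security profile pushes you toward; the pod name and image are placeholders:

```yaml
# Illustrative only: pod and image names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo
spec:
  securityContext:
    runAsNonRoot: true          # OpenShift's restricted SCC enforces a non-root, arbitrary UID
  containers:
    - name: web
      image: registry.access.redhat.com/ubi9/httpd-24
      ports:
        - containerPort: 8080   # non-privileged port, since non-root can't bind 80
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```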
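
And on the storage point, this is roughly what a ReadWriteMany claim against ODF looks like; `ocs-storagecluster-cephfs` is the CephFS-backed storage class ODF typically creates, but treat the names and size here as assumptions rather than my actual configuration:

```yaml
# Assumes ODF is installed and created its usual CephFS storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany             # CephFS supports RWX, unlike the RBD block pool
  resources:
    requests:
      storage: 50Gi             # illustrative size
  storageClassName: ocs-storagecluster-cephfs
```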

Choosing OpenShift is not without risk: it’s a paid enterprise product with an unofficial policy of being free for non-production use beyond the 30-day free trial. We’ll see if that lasts until next year, given some of the changes at Red Hat. If it doesn’t work out, I’ll have to write another post about the alternatives, including Amazon’s EKS Anywhere and Metal3. By then, I hope to have enough experience with the underlying Ceph implementation to run it from scratch. In truth, I’m more interested in building on the higher-level constructs than the base, so by choosing an enterprise product, I’m making the foundational layer someone else’s problem.

So, what does my physical environment look like? Well, it’s not actually a “home” lab anymore. We live in New York City, and our 1500-square-foot apartment simply isn’t spacious enough to host the 3 machines and switch that occupy 4 rack units of space. I co-located my machines in an inexpensive local datacenter, and my wife is much happier not having a very loud, menacing closet.

*Fictional visualization of how our closet looked*

Since my compute footprint is in a real datacenter, I’m able to take advantage of their BGP services: they announce a /24 IPv4 block and a /40 IPv6 block for me and route the traffic to my edge router-switch. These IPs will be useful later, because I intend to have Kubernetes allocate them to pods and virtual machines.
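
One plausible way to wire that up is MetalLB, which can peer with the edge router over BGP and hand out addresses from the block to LoadBalancer Services fronting pods and VMs. This is a sketch under that assumption, not my final configuration, and the range below is the 203.0.113.0/24 documentation block rather than my real allocation:

```yaml
# Assumption: MetalLB is the load-balancer implementation;
# 203.0.113.0/24 is a documentation placeholder, not my real /24.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-v4
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.0/24
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: public-v4
  namespace: metallb-system
spec:
  ipAddressPools:
    - public-v4
```

A BGPPeer resource pointing at the edge router-switch would complete the picture, but that depends on details I’ll cover once the cluster is actually up.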

In the next posts in this series, I’ll talk about the installation process, setting up ArgoCD, and deploying some foundational services that make exposing workloads to the internet easy. Finally, I plan on building a custom operator that controls the deployment of this website.