High availability bare metal Kubernetes cluster from scratch

This guide walks you through setting up a highly available bare metal Kubernetes cluster from scratch. It collects the notes and lessons I learned along the way, in the hope that it helps others in the same position.

As of this writing, the latest version of Kubernetes is 1.19.3.


To start, you will need Linux machines. This guide was written with Ubuntu Server 20.04 in mind, but you can use whichever server Linux distribution you prefer. We will set aside three of these machines to act as our master nodes; the rest will be worker nodes. For high availability you want at least three master nodes. Why not two? The Kubernetes documentation says it best:

"Do not use a cluster with two master replicas. Consensus on a two-replica cluster requires both replicas running when changing persistent state. As a result, both replicas are needed and a failure of any replica turns cluster into majority failure state. A two-replica cluster is thus inferior, in terms of HA, to a single replica cluster."

Master nodes do not need as many resources as worker nodes, so if you are working with virtual machines you can divide the resources as follows:

Resources   Master node    Small worker node   Big worker node
CPU         2 cores        4 cores             6 cores
Memory      4 - 6 GB       8 - 12 GB           12 - 16 GB
Storage     50 - 120 GB    60 - 120 GB         >120 GB

CPU: This is more of a Proxmox concern, but some services running on Kubernetes need the host's native CPU type rather than an emulated (kvm) CPU. Depending on the expected workloads you might also need more cores.

Memory: Try to keep memory allocation consistent across the cluster. If a big node with lots of memory fails, the remaining nodes need enough free memory between them to reschedule its workloads. If a smaller node fails there is less to redistribute, but smaller nodes might not meet the requirements of every service you plan to run. Keep this trade-off in mind when allocating memory.

Storage: Keep in mind where your persistent storage lives. If your Kubernetes nodes share storage with your persistent services, make sure each node has enough storage available; if your persistent storage is located somewhere separate from your nodes, the nodes can get away with less.

I prefer to set up my machines with hostnames that indicate their role in the cluster, so master nodes get hostnames such as master-node1 and worker nodes get hostnames such as k8s-worker1. It is also preferable to either give your nodes static IPs or some kind of DNS resolution; in this guide we will use nodes with static IPs.

If possible, ensure that your master nodes run on different physical machines, or as Kubernetes calls them, zones. That way, if one machine or "zone" fails, you still have masters up and running, which is the main reason you are interested in a highly available cluster in the first place.

So you have your nodes installed with Ubuntu (or CentOS, or openSUSE) and are ready to start.

Installing kubeadm

First, we set up our environment on each node. For Kubernetes to work we need to disable swap on all machines [1]:

sudo swapoff -a

To ensure swap stays off across reboots, comment out or remove the swap entry in /etc/fstab:

sudo vim /etc/fstab
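If you prefer a one-liner, something like the following comments out any swap entry (a rough sketch; double-check /etc/fstab afterwards):

sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab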

Now we can install the packages we need, including docker.

sudo apt update && sudo apt install -y docker.io apt-transport-https curl
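Kubernetes also recommends running docker with the systemd cgroup driver. As an optional step you can set that in /etc/docker/daemon.json and make sure docker starts on boot; the snippet below is a sketch based on the upstream container runtime docs and assumes you don't already have a daemon.json you need to preserve:

cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF
sudo systemctl enable docker
sudo systemctl restart docker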

For Ubuntu Server and Desktop we'll need to add the correct repository and the Kubernetes signing key:

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

Then add the Xenial Kubernetes repository:

sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

Apply your changes with an update:

sudo apt-get update

And finally, install kubeadm, kubelet, and kubectl. You'll need all three of these packages to initialise your cluster and join nodes to it.

sudo apt install -y kubeadm kubelet kubectl
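It's also a good idea to hold these packages at their current version so an unattended upgrade doesn't bump your cluster components unexpectedly:

sudo apt-mark hold kubeadm kubelet kubectl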

If you are not planning on setting up a highly available cluster and just want to get started, you can initialise your cluster now:

sudo kubeadm init

and once it has successfully initialised you can add a pod network of your choice [2] (see the following chapters for that).

Setting up High-Availability

We want to set up our HA cluster so that if one master node fails, another takes over without disruption. This is the essence of HA. For this we need a virtual IP that can move between the master nodes; we will use keepalived to manage it. Read more about keepalived here: https://www.redhat.com/sysadmin/ha-cluster-linux

For argument's sake, let us imagine that you have set up your master nodes as follows: node1-master on 192.168.0.101, node2-master on 192.168.0.102 and node3-master on 192.168.0.103 (the same addresses used in the haproxy configuration further down).

We'll choose 192.168.0.10 as our virtual IP.

First, let's get keepalived installed (haproxy follows a little later). Thanks go to Andrei Kvapil for their guide on this [3].

sudo apt install keepalived

Then we'll write a new configuration file at /etc/keepalived/keepalived.conf. Ensure that the interface value matches your network interface (use `ip a` to see it), otherwise keepalived won't start. You should also choose a password that the keepalived instances use to authenticate with one another.

Primary Master

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 1
    priority 255
    advert_int 1
    nopreempt
    authentication {
        auth_type AH
        auth_pass <Choose a password eg: K33p@live>
    }
    virtual_ipaddress {
        192.168.0.10
    }
}

Backup Masters

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 1
    priority 254
    advert_int 1
    nopreempt
    authentication {
        auth_type AH
        auth_pass <Choose a password eg: K33p@live>
    }
    virtual_ipaddress {
        192.168.0.10
    }
}

Note: for multiple clusters on the same subnet (e.g. dev and staging), each Kubernetes cluster's keepalived instances need their own unique virtual_ipaddress, virtual_router_id and authentication values.

Now you can start and enable keepalived:

sudo systemctl start keepalived
sudo systemctl enable keepalived

Run `ip a` again to ensure that keepalived is running; on the node currently holding the MASTER role you should see the new virtual IP alongside the node's own IP.
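For a more targeted check (assuming the eth0 interface and the 192.168.0.10 virtual IP used in this guide):

sudo systemctl status keepalived
ip addr show eth0 | grep 192.168.0.10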

For extra safety you can also add a load balancer, haproxy, to the stack. This adds a layer between keepalived and the kube-apiservers [4]. Install haproxy:

`sudo apt install haproxy`

Then edit the haproxy config at `/etc/haproxy/haproxy.cfg` (it is recommended to back up the existing file first, e.g. `cp haproxy.cfg haproxy_backup.cfg`).

Add these lines to the bottom of the config file:

frontend k8s-api
    bind 0.0.0.0:8443
    bind 127.0.0.1:8443
    mode tcp
    option tcplog
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server node1-master 192.168.0.101:6443 check
    server node2-master 192.168.0.102:6443 check
    server node3-master 192.168.0.103:6443 check

So now the frontend-facing part of the Kubernetes API is accessible on port 8443. Keepalived keeps the cluster's virtual IP reachable, and haproxy monitors the availability of the individual kube-apiservers.

If, for example, node1-master has issues with its kube-apiserver, haproxy will reroute traffic to the other masters' kube-apiservers. And if node1-master fails entirely, keepalived will move the virtual IP to another node.

You should also check the client and server timeouts in the haproxy config (in the defaults section, or per frontend/backend), otherwise long-running kubectl exec sessions will be cut off once the timeout is reached:

timeout client    4h
timeout server    4h

Remember: with haproxy in place, we configure kubectl and the kubelets to connect to port 8443 instead of port 6443.
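Once you are happy with the configuration, restart haproxy, enable it on boot, and confirm it is listening on port 8443 (ss is part of iproute2 and ships with Ubuntu):

sudo systemctl enable haproxy
sudo systemctl restart haproxy
sudo ss -tlnp | grep 8443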

Initialising your cluster

If you implemented keepalived you will need to create a kubeadm-config.yaml file.

apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
apiServer:
  certSANs:
  - "192.168.0.10"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
controlPlaneEndpoint: "192.168.0.10:6443"

Basically, it tells kubeadm where to bind its endpoint and where other kubeadm instances can find the cluster. Note that controlPlaneEndpoint above points at the virtual IP on port 6443; if you put haproxy in front of the API servers as described earlier, use "192.168.0.10:8443" instead so all traffic goes through the load balancer. You can read a more in-depth guide on how the kubeadm config file works here:

https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2.

If everything has been successful up to this point, all that is left is to initialise your cluster:

`sudo kubeadm init --config kubeadm-config.yaml`
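As an aside, if you would rather not copy the certificates to the other masters by hand, kubeadm can upload them to a cluster secret for you; adding the --upload-certs flag prints a --certificate-key that the other control-plane nodes can pass when joining:

sudo kubeadm init --config kubeadm-config.yaml --upload-certs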

You'll know the initialisation has succeeded when you are greeted with the following output:

  To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

  You can now join any number of control-plane nodes by copying certificate authorities
  and service account keys on each node and then running the following as root:

  kubeadm join 192.168.0.10:6443 --token  \
    --discovery-token-ca-cert-hash sha256: \
    --control-plane

  Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.0.10:6443 --token  \
      --discovery-token-ca-cert-hash sha256:

Your Kubernetes control plane has now initialised successfully! Copy the output somewhere safe; you will need it to add the remaining master (control plane) nodes and the worker nodes to the cluster.

Run the `kubectl get nodes` command to confirm your master is up. You'll notice that the master is "NotReady". This is because we haven't added a pod network yet.
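The output will look roughly like this (illustrative only; your node name, age and exact version will differ):

kubectl get nodes
NAME           STATUS     ROLES    AGE   VERSION
master-node1   NotReady   master   2m    v1.19.3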

We'll be using flannel as our pod network, but you can use any of the other supported pod networks if you like [2][5]:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
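Once the flannel DaemonSet and the CoreDNS pods are running, the node should flip to Ready; you can keep an eye on them with:

kubectl get pods -n kube-system -w
kubectl get nodes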

If you followed this guide you have successfully created the first master node of your new Kubernetes cluster! Repeat the setup on the next two masters and join them with the control-plane join command from the kubeadm output, then you can start adding your worker nodes. Congratulations!
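If you lose the join commands from the kubeadm init output, you can generate a fresh worker join command on an existing master at any time:

sudo kubeadm token create --print-join-command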

Sources

This guide was made using the following resources:

Bonus material: