Category: Engineering

Stuff I’m working on…

  • Roll Your Own Kubernetes Cluster

    I always seem to have a drawer full of Raspberry Pis in the garage that aren’t doing anything, so I thought I’d hook them up to an old router and try to make them into a Kubernetes cluster.


    A side goal was to have the nodes auto-deploy, sort of like a mini cloud provider. To head towards auto-install I took a cue from rpi-imager. When you plug in an rpi with a blank SD card, it runs a little code from the EEPROM and, after looking around for something to boot, eventually hooks eth0 to the network using DHCP. Then it downloads a boot.img from downloads.raspberrypi.org.


    That boot.img contains the rpi-imager binary, so after a reboot the imager appears with a preconfigured list of rpi hardware, OS types and install targets. Cool, but not very automatic, and I don’t really want to pull the node images from the internet every time. Also, when I go to redeploy, I’d rather not have to pull out each SD card and wipe it first.

    I settled on running ubuntu/noble and wanted to use cloud-init to configure a couple of things like a unique hostname and timezone, and to install k3s-agent on the nodes.

    HTTP_BOOT

    The little bit of magic that hooks up the DHCP network connection and pulls the rpi-imager boot.img is known as HTTP_BOOT. It’s a bit of code and settings that live in the EEPROM of the pi. You can update the EEPROM with a custom URL for HTTP_BOOT using a boot.conf file something like this:

    [all]
    BOOT_UART=0
    WAKE_ON_GPIO=1
    POWER_OFF_ON_HALT=0
    BOOT_ORDER=0xf461

    [gpio8=0]
    SIGNED_BOOT=1
    BOOT_ORDER=0xf7
    NET_INSTALL_ENABLED=1
    HTTP_HOST=172.28.0.3
    HTTP_PATH=rpi-imager

    The [gpio8=0] block tells the pi to pull a boot.img from http://172.28.0.3/rpi-imager, but only when general-purpose I/O pin #8 is grounded (more on that later). BOOT_ORDER is read one hex digit at a time from right to left, so the default 0xf461 means try the SD card (1), then NVMe (6), then USB (4), then restart the cycle (f), while the gpio8 override 0xf7 means HTTP boot (7), then restart. HTTP_BOOT actually pulls two files from that URL: the boot.img and a signature file called boot.sig.
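
    As a quick sanity check that the boot server is serving both files, you can hit the URLs HTTP_BOOT will request (the rpi-imager/ path matches HTTP_PATH above; this assumes the nginx boot server described below):

    # each should answer 200 from the boot server
    curl -sI http://172.28.0.3/rpi-imager/boot.img | head -n1
    curl -sI http://172.28.0.3/rpi-imager/boot.sig | head -n1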

    Out of Band Control

    My “cluster” is a board with an old router zip-tied down and three rpi4s mounted on finish nails I drove through the plywood. I ran jumpers from GPIO8 and the GND pins on the pis to a breadboard with momentary buttons. To reboot a pi4 you can ground the “RUN” pad in the J2 jumper block (I’ve got a ground wire with a Dupont prong on it to touch the run pad).
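
    To sanity-check the wiring from a running pi you can read the pin state before committing to a reinstall. Just a sketch: raspi-gpio ships with Raspberry Pi OS, and on Ubuntu libgpiod’s gpioget (v1 syntax shown) does the same job:

    # GPIO8 should read 1 normally and 0 while the button is held
    raspi-gpio get 8        # Raspberry Pi OS
    gpioget gpiochip0 8     # Ubuntu / libgpiod v1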

    To trigger an automatic reinstall, I hold down one of the buttons and ground the run pad of the corresponding pi. I could automate this out-of-band controller with relays and maybe something like an esp32 to drive them.

    Signed Boot Objects

    When the pi boots up and pulls boot.img, it doesn’t just trust that random code from the internet: it checks the boot.img against boot.sig. The EEPROM contains a public key, and boot.sig was created using the Raspberry Pi Foundation’s private key.

    To make a custom boot.img, or even a custom boot.conf, you need your own key pair and have to run a few extra commands to sign things. So let’s do that and sign the boot.conf:

    # generate an RSA 2048 key pair (the size the bootloader expects)
    mkdir ~/keys
    cd ~/keys
    openssl genrsa -out private.pem 2048
    openssl rsa -in private.pem -outform PEM -pubout -out public.pem
    cd -
    # sign boot.conf with the private key
    rpi-eeprom-digest -k ~/keys/private.pem -i boot.conf -o boot.conf.sig
    # publish the public key, config, and signature on the boot server
    cp ~/keys/public.pem boot.conf boot.conf.sig /var/www/html

    Here I’ve used the private.pem key to create a signature file for boot.conf (boot.conf.sig). Then I copied the public key, boot.conf, and boot.conf.sig onto the nginx server I’m using as my boot server (172.28.0.3).
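
    For reference, the layout on the boot server ends up looking roughly like this (my setup, using nginx’s default /var/www/html root; the rpi-imager/, ubuntu/ and cloud-init/ paths are the ones used later in this post):

    /var/www/html/
    ├── boot.conf
    ├── boot.conf.sig
    ├── public.pem
    ├── rpi-imager/     # boot.img + boot.sig for HTTP_BOOT
    ├── ubuntu/         # OS images plus a one-line "latest" pointer
    └── cloud-init/     # php that renders user-data / meta-data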

    Updating the EEPROM on the pi

    I log into one of the pis and update the EEPROM like this:

    wget 172.28.0.3/boot.conf
    wget 172.28.0.3/boot.conf.sig
    wget 172.28.0.3/public.pem

    # the signing options of rpi-eeprom-config need pycryptodome
    sudo apt install python3-pycryptodome

    # start from the current stock EEPROM image
    # (note the pieeprom-*.bin filename; it varies by release)
    cp /lib/firmware/raspberrypi/bootloader-2711/default/pieeprom-2024-04-15.bin pieeprom.bin

    # bake the signed config and our public key into a new image
    rpi-eeprom-config -p public.pem -c boot.conf -d boot.conf.sig -o pieeprom.upd pieeprom.bin

    # schedule the flash; it is applied on the next reboot
    sudo rpi-eeprom-update -d -f pieeprom.upd
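
    After the pi reboots and applies the update, you can confirm the new settings took with vcgencmd (part of the standard Raspberry Pi userland):

    vcgencmd bootloader_version
    vcgencmd bootloader_config | grep -E 'BOOT_ORDER|NET_INSTALL_ENABLED|HTTP_HOST'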

    Making a Custom boot.img

    I cloned the rpi-imager code from GitHub and built it following the instructions in the README.md. The build takes a long time (maybe an hour). Ultimately, the repo builds an AppImage version of the general portable imager, which wasn’t what I wanted. Confusingly, the README.md mentions the “embedded” version of the imager, but the default git branch ‘qml’ doesn’t support the embedded build.

    To build the embedded version I had to git checkout v1.8.y first. All in all, it goes something like this:

    git clone https://github.com/raspberrypi/rpi-imager.git
    cd rpi-imager
    sudo ./build-qt.sh # and wait
    git checkout v1.8.y
    cd embedded
    ./build.sh

    Honestly I’m not sure if the initial ./build-qt.sh is required; likely it is, since that builds the arm64 version of the Qt libs needed for the imager interface. Ultimately embedded/build.sh should create an output directory containing a minimal Linux OS. I found it was just easier to do most of the embedded build as the root user. Don’t be a dolt like me and delete the output directory: there’s a cmdline.txt file in there, and if you delete it the kernel can’t boot. Pro-tip: you can remove the quiet cmdline arg to watch the kernel boot up.
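
    For example, to get a verbose boot out of the embedded image (assuming cmdline.txt sits at the top of the output directory, as it did in my build):

    # show kernel messages during boot of the embedded image
    sed -i 's/\bquiet\b//' output/cmdline.txt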

    The output directory needs to be packed into a FAT32 file system. I wrote a script called pack.sh to do that.

    #!/bin/bash
    # pack.sh - pack the embedded build's output/ dir into a signed boot.img
    # (run as root: losetup and mount need it)

    LOOP=$(losetup -f)
    echo "LOOP is $LOOP"
    if [[ ! -f boot.img ]]
    then
        # first run: create an empty 36MB image and format it FAT32
        dd if=/dev/zero of=boot.img bs=1M count=36
        losetup -P $LOOP boot.img
        mkfs.vfat $LOOP
    else
        losetup -P $LOOP boot.img
    fi
    echo "mount $LOOP boot"
    mkdir -p boot
    mount $LOOP boot
    cd output
    cp -rp * ../boot
    cd -
    umount boot
    losetup -d $LOOP

    # sign boot.img with the private key matching the EEPROM's public.pem
    ./rpi-eeprom-digest -k ../private.pem -i boot.img -o boot.sig

    scp -i /home/sandy/.ssh/id_rsa boot.img boot.sig sandy@bunsen:/var/www/html/rpi-imager/

    pack.sh makes a 36MB boot.img file and binds it to a loop device. The loop device is mounted on a directory called boot and the contents of output are copied in. Then the script signs boot.img with the private key from before, and finally copies the image and signature to the boot server.

    Great, but that’s just the same rpi-imager signed with a different key. To customize it I needed to learn a bit about buildroot. The boot.img does a minimal boot and then starts rpi-imager as a service; the service startup script lives in a buildroot overlay at rpi-imager/embedded/imager/board/overlay/etc/init.d/S99-rpi-imager. So I hacked on that:

    #!/bin/sh

    #
    # Script executed at start
    #

    case "$1" in
      start)
        #udevd --daemon
        #udevadm trigger
        INTERVAL=10   # Polling interval in seconds

        echo "Waiting for eth0 to connect..."

        while true; do
            # Check if eth0 is up using /sys/class/net
            if [ -d /sys/class/net/eth0 ] && [ "$(cat /sys/class/net/eth0/operstate)" = "up" ]; then
                echo "eth0 is now connected!"
                break
            else
                echo "eth0 is down, checking again in $INTERVAL seconds..."
                sleep $INTERVAL
            fi
        done
        ifconfig

        echo "Starting rpi-imager"
        # ask the boot server which image is current, then etch it to the SD card
        UBUNTU_IMAGE=$(wget -q -O - http://172.28.0.3/ubuntu/latest)
        PATH=/bin:/sbin:/usr/bin:/usr/sbin rpi-imager --cli http://172.28.0.3/ubuntu/${UBUNTU_IMAGE} /dev/mmcblk0

        sync
        # drop cloud-init seed files onto the freshly written boot partition
        mkdir -p /bt
        mount /dev/mmcblk0p1 /bt
        SERIAL_NUMBER=$(cat /proc/cpuinfo | awk '/Serial/{print $3; exit}')
        wget -O /bt/meta-data http://172.28.0.3/cloud-init/meta-data?serial_number=${SERIAL_NUMBER}
        wget -O /bt/user-data http://172.28.0.3/cloud-init/user-data?serial_number=${SERIAL_NUMBER}
        umount /bt
        rm -rf /bt
        sync
        reboot -f
        ;;

      stop)
        ;;

      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac

    exit $?

    This version of the “service” runs rpi-imager in --cli mode with two arguments: a URL for an image to etch onto the SD card, and the device file of the card (/dev/mmcblk0). The URL points back to my boot server. I had to add a block to wait for the network to connect; if you just run the imager without the network it fails (at least in cli mode). There are two partitions in the ubuntu image: a boot partition (mmcblk0p1) and the main system partition (mmcblk0p2). The wget for latest keeps the Ubuntu version out of the script; I just update that file on the webserver when I add a new image.
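
    On the boot-server side, latest is just a one-line text file sitting next to the images (the filename below is an example, not the exact image I used):

    $ ls /var/www/html/ubuntu/
    latest  ubuntu-24.04-preinstalled-server-arm64+raspi.img.xz
    $ cat /var/www/html/ubuntu/latest
    ubuntu-24.04-preinstalled-server-arm64+raspi.img.xz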

    The script goes on to mount the newly created boot partition and drops in files called user-data and meta-data, which are used by cloud-init. The .../cloud-init/... URLs are actually php scripts on the boot server that take a device serial number as an argument. That lets me customize the hostname and instance_id of the node, using the pi serial number as a unique identifier.
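
    You can exercise those endpoints from any machine the same way the imager service does; the serial number here belongs to one of the nodes in the kubectl output further down:

    $ curl 'http://172.28.0.3/cloud-init/meta-data?serial_number=100000001b5d3ab7'
    ...
    instance_id: rpi-100000001b5d3ab7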

    Finally, the service script tells the pi to reboot. When it reboots, it uses the [all] section of boot.conf (because I’ve released the button that grounded GPIO 8). Since there’s a freshly populated version of Ubuntu on the SD card, it boots that.

    cloud-init custom user-data / meta-data

    If you put a file called user-data in the boot partition of that ubuntu/noble server image, then cloud-init will run on first boot and apply any changes specified in the file. Here’s what my finished user-data and meta-data look like:

    #cloud-config

    hostname: node-123456789
    preserve_hostname: false

    package_update: false
    package_upgrade: false

    manage_etc_hosts: true
    packages:
      - avahi-daemon
      - apt-transport-https
      - ca-certificates
      - curl
      - gnupg
      - lsb-release
      - unattended-upgrades
      - net-tools
      - nfs-common
      - cifs-utils
      - open-iscsi

    apt:
      conf: |
        Acquire {
          Check-Date "false";
        };

    users:
      - default
      - name: ubuntu
        groups: users,adm,dialout,audio,netdev,video,plugdev,cdrom,games,input,gpio,spi,i2c,render,sudo
        shell: /bin/bash
        lock_passwd: true
        passwd: $5$q...WG6
        ssh_authorized_keys:
          - ssh-rsa AAAAB3N...r8VBj9ERGEu/9M= sandy@bunsen
        sudo: ALL=(ALL) NOPASSWD:ALL

    timezone: America/Los_Angeles

    write_files:
      - path: /etc/cloud/templates/hosts.debian.tmpl
        append: true
        content: |
          192.168.1.1 docker-registry www
      - path: /etc/rancher/k3s/registries.yaml
        content: |
          mirrors:
            "docker-registry:5000":
              endpoint:
                - "http://docker-registry:5000"
          configs:
            "docker-registry:5000":
              tls:
                insecure_skip_verify: true
      - path: /etc/sysctl.d/99-k3s.conf
        permissions: "0644"
        content: |
          fs.inotify.max_user_instances=512
          fs.inotify.max_user_watches=524288

    runcmd:
      - localectl set-x11-keymap "us" pc105
      - setupcon -k --force || true
      - sysctl --system
      - sed -i -e '1s/^\(.*\)$/\1 cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
      - curl -sfL https://get.k3s.io | K3S_URL=https://172.28.0.2:6443 K3S_TOKEN=K1...4::server:2aeeb60...59f0 sh -
      - reboot

    # This is the meta-data configuration file for cloud-init. Please refer to the
    # cloud-init documentation for more information:
    #
    # https://cloudinit.readthedocs.io/

    # Set the datasource mode to "local". This ensures that user-data is acted upon
    # prior to bringing up the network (because everything about the datasource is
    # assumed to be local). If you wish to use an HTTP datasource instead, you can
    # change this to "net" or override it on the kernel cmdline (see README).
    dsmode: local

    # Specifies the "unique" identifier of the instance. Typically in cloud-init
    # this is generated by the owning cloud and is actually unique (to some
    # degree). Here our data-source is local, so this is just a fixed string.
    # Warning: changing this will cause cloud-init to assume it is running on a
    # "new" instance, and to go through first time setup again (the value is
    # compared to a cached copy).

    instance_id: rpi-123456789

    The user-data instructs cloud-init to set a hostname for the new node of the form “node-serial#”, install some packages, and set up an ubuntu user. At the end it installs the k3s-agent and registers it with the control-plane node at 172.28.0.2. There’s also some node configuration to support k3s: setting up the docker-registry mirror, enabling cgroups on the kernel cmdline.txt, and making some sysctl changes to help containerd.
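
    Once a node comes back from its final reboot you can confirm cloud-init finished cleanly before expecting it in the cluster (these subcommands ship with cloud-init on the Ubuntu image; this assumes your router resolves the node hostnames, as mine does via DHCP):

    ssh ubuntu@node-100000001b5d3ab7 cloud-init status --wait
    # status: done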

    The php that fills all that in, cloud-init/user-data, looks like this:

    #cloud-config
    <?php
      include_once('config.php');
      echo "hostname: $node_name\n";
    ?>
    preserve_hostname: false
    ...
    timezone: America/Los_Angeles
    <?php
      $cmd = "runcmd:\n";
      $cmd .= " - localectl set-x11-keymap \"us\" pc105\n";
      $cmd .= " - setupcon -k --force || true\n";
      $cmd .= " - sysctl --system\n";
      $cmd .= " - sed -i -e '1s/^\(.*\)$/\\1 cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt\n";
      if ($node_type == 'control-plane') {
          $cmd .= " - curl -sfL https://get.k3s.io | sh -\n";
      } else {
          $cmd .= " - curl -sfL https://get.k3s.io | K3S_URL=$cp_url K3S_TOKEN=$node_token sh -\n";
      }
      $cmd .= " - reboot\n";
      echo $cmd;
    ?>

    and similarly in cloud-init/meta-data:

    ...
    # Warning: changing this will cause cloud-init to assume it is running on a
    # "new" instance, and to go through first time setup again (the value is
    # compared to a cached copy).
    <?php
       include_once('config.php');
       echo "instance_id: $instance_id\n";
    ?>

    and cloud-init/config.php

    <?php
    $cp_serial_number='6c...65f';

    // control plane ip is set by DHCP on the router
    $cp_url="https://172.28.0.2:6443";

    // pull the k3s join token straight off the control-plane node
    $node_token=trim(`/usr/bin/ssh -o StrictHostKeyChecking=no -q -i /var/www/cloud-init/id_rsa ubuntu@node-master sudo cat /var/lib/rancher/k3s/server/node-token`);

    $serial_number='12345';
    if (isset($_GET['serial_number'])) {
        $serial_number=$_GET['serial_number'];
    }
    $node_type='worker';
    if ($serial_number == $cp_serial_number) {
        $node_type='control-plane';
    }
    $node_name="node-$serial_number";
    $instance_id="rpi-$serial_number";
    ?>

    There’s some hard-coding in there for things like the control-plane node serial number (I’ve got 3 pi4s and the control-plane is a pi5). I get the node-token from the control-plane automatically with an ssh command, and the kubeconfig to access the cluster comes from /etc/rancher/k3s/k3s.yaml. The idea is you bring up the control-plane, and once it starts answering ssh you can automatically roll the other nodes.
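
    Fetching the kubeconfig can use the same ssh trick as the node-token. A sketch, assuming the default k3s.yaml, which points at 127.0.0.1:

    ssh ubuntu@node-master sudo cat /etc/rancher/k3s/k3s.yaml > ~/.kube/config
    sed -i 's/127.0.0.1/172.28.0.2/' ~/.kube/config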

    Check The Nodes

    Finally, with the nodes initialized and the k3s.yaml brought in and edited to point at the control plane, I can check them with kubectl:

    sandy@bunsen:~$ kubectl get nodes
    NAME                    STATUS   ROLES                  AGE     VERSION
    node-100000001b5d3ab7   Ready    <none>                 4h23m   v1.33.3+k3s1
    node-100000008d83d984   Ready    <none>                 5h4m    v1.33.3+k3s1
    node-10000000e5fe589d   Ready    <none>                 5h1m    v1.33.3+k3s1
    node-master             Ready    control-plane,master   47h     v1.33.3+k3s1

    Success!! Now what will I build in my pi cluster?…

    References

    • https://github.com/simsandyca/rpi-imager/tree/cli_boot_server
      • this is my fork of the v1.8.y branch with the embedded build. I’m just running embedded/build.sh now, and it definitely references Qt libs and headers, so I think that to build this you need to checkout the qml branch and run sudo ./build-qt.sh, then checkout the cli_boot_server branch and run cd embedded; ./build.sh (probably all as root).
    • https://downloads.raspberrypi.org/net_install/
    • https://ubuntu.com/download/raspberry-pi
    • https://forums.raspberrypi.com/viewtopic.php?t=378943
      • I went down a whole rabbit hole with this post about the boot.img format and another project that makes a custom boot.img. It’s actually possible to download the Pi Foundation boot.img and unpack it, then unpack the rootfs.cpio.zst within and get at the S99rpi-imager service script, edit it, repack everything, and re-sign. The pack/unpack commands look sort of like this:
        • losetup -f # note next available loop dev
        • losetup -P /dev/loop15 ./boot.img
        • mount /dev/loop15 /mnt/boot
        • mkdir rootfs
        • cd rootfs
        • zstdcat /mnt/boot/rootfs.cpio.zst | cpio -dmvi
        • vi etc/init.d/S99rpi-imager
        • … make your edits (you only get 32MB so don’t go too crazy, and the zstd compression level must be the max, 19)…
        • find . -print0 | cpio -o --format=newc -0 | zstd -19 -z > ../rootfs.cpio.zst
        • cd ..
        • cp rootfs.cpio.zst /mnt/boot
        • umount /mnt/boot
        • losetup -d /dev/loop15
        • ./rpi-eeprom-digest -k ~/keys/private.pem -i boot.img -o boot.sig
    • https://github.com/raspberrypi/rpi-system-update
      • this is the project referenced in the above thread. I did build the boot images in that repo, and that’s how I figured out the format of the output directory and how to unpack and repack boot.img and rootfs.cpio.zst