Category: Engineering

Stuff I’m working on…

Gitops for the Arkade

In the last post I had built up some Makefiles and docker tooling to create nginx based images to serve the mame emulators. So, now it’s time to deploy the images to the kubernetes cluster using ArgoCD.

Github Actions

I setup a two self-hosted action runners to use with Github Actions. The action runner on my main server builds the emulators. Then the action runner on the in-cluster thinkpad can run deployments using argocd sync.

Then I setup a maker workflow and tested out it out.

name: Make
on:
  workflow_dispatch:

jobs:
  builder:
    environment: Games
    runs-on: www
    steps:
      - uses: actions/checkout@v4
      - name: Make the mamebuilder
        run: |
          make mamebuilder
          docker tag mamebuilder docker-registry:5000/mamebuilder
          docker push docker-registry:5000/mamebuilder
  games:
    environment: Games
    needs: [builder]
    runs-on: www
    steps:
      - uses: actions/checkout@v4
      - name: Make the game images
        run: |
          make
          echo done

After triggering the make workflow on github, the build runs and emulator images are pushed to my private docker registry – checking that from inside the cluster’s network shows:

sandy@bunsen:~/arkade/.github/workflows$ curl -X GET http://docker-registry:5000/v2/_catalog 
{"repositories":["1943mii","20pacgal","centiped","circus","defender","dkong","gng","invaders","joust","mamebuilder","milliped","pacman","qix","robby","supertnk","tempest","topgunnr","truxton","victory"]}

Helm Chart

Next I worked on a helm chart for the emulators. I created the standard helm chart:

mkdir helm
cd helm 
helm create game

On top of the default chart, I added a Traefik IngressRoute next to each deployment and a volume mount for the /var/www/html/roms directory. The final layout is like this:

sandy@bunsen:~/arkade$ tree helm roms-pvc.yaml 
helm
└── game
    ├── charts
    ├── Chart.yaml
    ├── templates
    │   ├── deployment.yaml
    │   ├── _helpers.tpl
    │   ├── hpa.yaml
    │   ├── ingressroute.yaml
    │   ├── ingress.yaml
    │   ├── NOTES.txt
    │   ├── serviceaccount.yaml
    │   ├── service.yaml
    │   └── tests
    │       └── test-connection.yaml
    └── values.yaml
roms-pvc.yaml  [error opening dir]

Most customization is in the values.yaml a diff of that against the default looks like this:

sandy@bunsen:~/arkade/helm/game$ diff values.yaml  ~/test/game/
25c25
<   create: false
---
>   create: true
76,78d75
< ingressroute:
<   enabled: true
< 
110,113c107,111
< volumes: 
<   - name: roms
<     persistentVolumeClaim:
<       claimName: roms
---
> volumes: []
> # - name: foo
> #   secret:
> #     secretName: mysecret
> #     optional: false
116,118c114,117
< volumeMounts: 
<   - name: roms
<     mountPath: /var/www/html/roms
---
> volumeMounts: []
> # - name: foo
> #   mountPath: "/etc/foo"
> #   readOnly: true

The Traefik IngressRoute object replaces the usual kubernetes ingress. I added a couple targets in the Makefile that can package the chart and deploy all the emulators:

package:
        $(HELM) package --version $(CHART_VER) helm/game

install: 
        @for game in $(GAMES) ; do \
            $(HELM) install $$game game-$(CHART_VER).tgz \
                --set image.repository="docker-registry:5000/$$game" \
                --set image.tag='latest' \
                --set fullnameOverride="$$game" \
                --create-namespace \
                --namespace games ;\
        done 

upgrade:  
        @for game in $(GAMES) ; do \
            $(HELM) upgrade $$game game-$(CHART_VER).tgz \
                --set image.repository="docker-registry:5000/$$game" \
                --set image.tag='latest' \
                --set fullnameOverride="$$game" \
                --namespace games ;\
        done

The initial install for all the emulators is make package install. Everything is installed into a namespace called games so it’s relatively to clean up the whole mess with kubect delete ns games. There was one adjustment I had to make in the helm chart…kubernetes enforces RFC 1035 for service names. I was using the game name as the service name, but games like 1943mii don’t conform. So I updated the service definitions in the chart like this: name: svc-{{ include "game.fullname" . }} to get around it.

ArgoCD Setup

ArgoCD is setup in the cluster based on the getting started guide. Ultimately, I made a little script argocd.sh to do the setup on repeat:

#!/bin/bash

ARGOCD_PASSWORD='REDACTED'

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

sleep 30 
kubectl -n argocd patch configmap/argocd-cmd-params-cm \
  --type merge \
  -p '{"data":{"server.insecure":"true"}}'

kubectl -n argocd apply -f argocd_IngressRoute.yaml

sleep 30
INITIAL_PASSWORD=$(argocd admin initial-password -n argocd 2>/dev/null | awk '{print $1; exit}')
argocd login argocd --username admin --insecure --skip-test-tls --password "${INITIAL_PASSWORD}"
argocd account update-password --account admin --current-password "${INITIAL_PASSWORD}" --new-password "${ARGOCD_PASSWORD}"

That installs ArgoCD into it’s own namespace and sets up an IngressRoute using Traefik. The ConfigMap patch is necessary to avoid redirect loops (Traefik proxies https at the LoadBalancer anyway). Then it goes on and logs in the admin user and updates the password.

Then added a couple more targets to the Makefile

argocd_create:
        $(KUBECTL) create ns games || true
        $(KUBECTL) apply -f roms-pvc.yaml
        @for game in $(GAMES) ; do \
            $(ARGOCD) app create $$game \
                --repo https://github.com/simsandyca/arkade.git \
                --path helm/game \
                --dest-server https://kubernetes.default.svc  \
                --dest-namespace games \
                --helm-set image.repository="docker-registry:5000/$$game" \
                --helm-set image.tag='latest' \
                --helm-set fullnameOverride="$$game" ;\
        done

argocd_sync:
        @for game in $(GAMES) ; do \
            $(ARGOCD) app sync $$game ; \
        done

And added a workflow in GitHub actions:

Ultimately I get this dashboard in ArgoCD with all the emulators running.

Sync is manual here, but the beauty of ArgoCD is that it tracks your git repo and can deploy the changes automatically.

Try It Out

To use the IngressRoute, I need to add DNS entries for each game. The pi-cluster is behind an old ASUS router running FreshTomato so I can add those using dnsmasq host-records:

Let’s try loading https://victory

Oh that’s right no game roms yet. To test it out I’d need a royalty free ROM – there’s a few available for personal use on the MAME Project page. To load victory.zip I can do something like this:

sandy@bunsen:~/arkade$ kubectl -n games get pod -l app.kubernetes.io/instance=victory 
NAME                       READY   STATUS    RESTARTS   AGE
victory-5d695d668c-tj7ch   1/1     Running   0          12m
sandy@bunsen:~/arkade$ kubectl -n games cp ~/roms/victory.zip victory-5d695d668c-tj7ch:/var/www/html/roms/@sandy

Then from the bunsen laptop that lives behind the router (+- the self-signed certificate) I can load the https://victory and click the launch button…

On to CI/CD?

So far, this feels like it’s about as far as I can take the Arkade project. Is it full on continuous delivery? No – but this is about all I’d attempt with a personal github account running self-hosted runners. It’s close though – the builds are automated, ArgoCD is configured and watching for changes in the repo. There are github actions in place to run the builds. I’d still need some versioning on the image tags to have rollback targets and release tracking…maybe that’s next

-Sandy

August 14, 2025

Automating the Arkade Build

Last entry I got my proof of concept arcade emulator going. I’m calling the project “arkade”. First thing was I setup a new github repo for the project. Last time I’d found that archive.org seems to use MAME version 0.239 and emsdk 3.0.0 – after some more trial and error I found that they actually use emsdk v3.1.0 (based on matching the llvm hash in the version string that gets dumped.

The MAME Builder Image

So I locked that into a Dockerfile:

FROM ubuntu:24.04

ENV EMSDK_VER=3.1.0
ENV MAME_VER=mame0239

RUN apt update \
    && DEBIAN_FRONTEND=noninteractive \
       apt -y install git build-essential python3 libsdl2-dev libsdl2-ttf-dev \
       libfontconfig-dev libpulse-dev qtbase5-dev qtbase5-dev-tools qtchooser qt5-qmake \
    && apt clean \
    && rm -rf /var/lib/apt/lists/*

# Build up latest copy of mame for -xmllist function
RUN git clone https://github.com/mamedev/mame --depth 1 \
    && make -C /mame -j $(nproc) OPTIMIZE=3 NOWERROR=1 TOOLS=0 REGENIE=1 \
    && install /mame/mame /usr/local/bin \
    && rm -rf /mame

#Setup to build WEBASSEMBLY versions 
RUN git clone  https://github.com/mamedev/mame --depth 1 --branch $MAME_VER \
    && git clone https://github.com/emscripten-core/emsdk.git \
    && cd emsdk \
    && ./emsdk install $EMSDK_VER \
    && ./emsdk activate $EMSDK_VER

ADD Makefile.docker /Makefile

WORKDIR /

RUN mkdir -p /output

The docker image also has a full build of the latest version of MAME (I’ll use that later). The last command sets up the MAME and Enscripten versions that seemed to work.

The Game Images

That put the tools in place I wanted to move onto building up little nginx docker images one for each arcade machine. To get that all going the image needs a few bits and pieces:

nginx webserver and basic configuration
the emularity launcher and support javascript
the emulator javascript (emulator.js and emulator.wasm)
the arcade rom to playback in the emulator

That collection of stuff looks like this as a Dockerfile:

FROM arm64v8/nginx

ADD nginx/default /etc/nginx/conf.d/default.conf

RUN mkdir -p /var/www/html /var/www/html/roms

ADD build/{name}/* /var/www/html

Couple things to see here. I’m using the arm64v8 version of nginx because I’m gonna want to run the images in my pi cluster. The system will buildup a set of files in build/{name} where name is the name of the game.

So I setup a Makefile that creates the build/<game> directory populated with all the bits and pieces. There’s a collection of meta data needed to render a game:

The name of the game
The emulator for the game
the width x height of the game

MAME can actually output all kinds of metadata about the games it emulates. To get access to that, I build the full version of the emulator binary so that I can run that mame -listxml <gamelist>. There’s a target in the Makefile that runs mame on the small list of games and outputs a file called list.xml. From that, there’s a couple python scripts that parse down the xml to use the metadata.

Ultimately the build directory for a game looks something like this:

sandy@www:~/arkade/build$ tree joust
joust
├── browserfs.min.js
├── browserfs.min.js.map
├── es6-promise.js
├── images
│   ├── 11971247621441025513Sabathius_3.5_Floppy_Disk_Blue_Labelled.svg
│   └── spinner.png
├── index.html
├── joust.json
├── loader.js
├── logo
│   ├── emularity_color.png
│   ├── emularity_color_small.png
│   ├── emularity_dark.png
│   └── emularity_light.png
├── mamewilliams.js
└── mamewilliams.wasm

3 directories, 14 files

And the index.html file looks like this:

sandy@www:~/arkade/build/joust$ cat index.html 

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Joust (Green label)</title>
  </head>
  <body>
    <canvas id="canvas" style="width: 50%; height: 50%; display: block; margin: 0 auto;"></canvas>
    <script type="text/javascript" src="es6-promise.js"></script>
    <script type="text/javascript" src="browserfs.min.js"></script>
    <script type="text/javascript" src="loader.js"></script>
    <script type="text/javascript">
      var emulator = 
        new Emulator(document.querySelector("#canvas"),
                     null,
                     new MAMELoader(MAMELoader.driver("joust"),
                                    MAMELoader.nativeResolution(292, 240),
                                    MAMELoader.scale(3),
                                    MAMELoader.emulatorWASM("mamewilliams.wasm"),
                                    MAMELoader.emulatorJS("mamewilliams.js"),
                                             MAMELoader.extraArgs(['-verbose']),
                                             MAMELoader.mountFile("joust.zip",
                                                                  MAMELoader.fetchFile("Game File",
                                                                                       "/roms/joust.zip"))))
      emulator.start({ waitAfterDownloading: true });
    </script>
  </body>
</html>

The metadata shows up here in a few ways. The <title> field in the index.html is based on the game description. The nativeResolution is a function of the game display “rotation” and width/height. Pulling that information from the metadata helps get the game viewport size and aspect ratio correct. The name of the game is used to set the driver and rom name. There’s a separate driver field in the metadata which is actually the emulator name. For instance in this example, joust is a game you emulate with the williams emulator. Critically the emulator name is used to set the SOURCE= line in the mame build for the emulator.

There’s a m->n relationship between emulator and game (e.g. defender is also a williams game). That mapping is handled using jq and the build/<game>/<game>.json files.

Once the build directory is populated, it’s time to build the docker images. There’s a gamedocker.py script that writes a Dockerfile.<game> for each game. After that, it runs docker buildx build --platform linux/arm64... to build up the images.

Docker Multiplatform and Add a Registry

I do the building on my 8-core amd/64 server so I needed to do a couple things to get the images over to my pi cluster. First I had to setup docker multiplatform builds:

 docker run --privileged --rm tonistiigi/binfmt --install all

I also setup a small private docker registry using this docker-compose file

sandy@www:/sim/registry$ ls
config  data  docker-compose.yaml
sandy@www:/sim/registry$ cat docker-compose.yaml 
version: '3.3'

services:
  registry:
    container_name: registry
    restart: always
    image: registry:latest
    ports:
      - 5000:5000
    volumes:
      - ./config/config.yml:/etc/docker/registry/config.yml:ro
      - ./data:/var/lib/registry:rw
  #environment:
    #- "STANDALONE=true"
    #- "MIRROR_SOURCE=https://registry-1.docker.io"
    #- "MIRROR_SOURCE_INDEX=https://index.docker.io"

I also had to reconfigure the nodes in the pi cluster to use an insecure registry. To do that I added this bit to my cloud-init configuration:

write_files:
  - path: /etc/cloud/templates/hosts.debian.tmpl
    append: true
    content: |
      192.168.1.1 docker-registry www
  - path: /etc/rancher/k3s/registries.yaml
    content: |
      mirrors:
        "docker-registry:5000":
          endpoint:
            - "http://docker-registry:5000"
      configs:
        "docker-registry:5000":
          tls:
            insecure_skip_verify: true

Then I reinstalled all the nodes to pull that change.

Ultimately, I was able to test it out by creating a little deployment:

kubectl create deployment 20pacgal --image=docker-registry:5000/20pacgal --replicas=1 --port=80

sandy@bunsen:~$ kubectl get pods 
NAME                        READY   STATUS    RESTARTS   AGE
20pacgal-77b777866c-d4dhf   1/1     Running   0          86m
sandy@bunsen:~$ kubectl port-forward --address 0.0.0.0 20pacgal-77b777866c-d4dhf  8888:80 
Forwarding from 0.0.0.0:8888 -> 80
Handling connection for 8888
Handling connection for 8888

Almost there. The rom file is still missing – I’ll need to setup a physical volume to hold the roms…next time.

-Sandy

August 11, 2025

Building up MAME JS (Was – Deploy ?Something?)
You have a kubernetes cluster setup – now what?

I’ve had a lot of fun retro gaming with a retropie system I setup last year. Maybe I could setup a virtual arcade in my little kubernetes cluster…

My RetroPie setup in the garage.

I’m thinking I’ll try to get a build of the Multi Arcade Machine Emulator (MAME) working. I went and got the code for MAME and built up galaxian and pacman emulators and bippidy, boppady, blah, blah, blah!

Not!

The JavaScript Emulator Build

Building a copy of MAME to run pacman went fine, but I wanted the javascript build and that was much harder to get going – which was frustrating because it’s already out there on the Internet Archive working! I guess I could just go grab a copy of their javascripts, but I want some sort of repo to build off of so that I’ll be able to demo some gitops – like maybe ArgoCD.

Not sure, the javascript build seems like the ugly step child of MAME but the instructions didn’t work for me. Anyway, – not bitter – here’s what I did to get it working. It’s a pretty power hungry build, so I do it on my main server.

Start out by following the Compiling MAME instructions.
```
sudo apt-get install git build-essential python3 libsdl2-dev libsdl2-ttf-dev libfontconfig-dev libpulse-dev qtbase5-dev qtbase5-dev-tools qtchooser qt5-qmake
git clone https://github.com/mamedev/mame#
cd mame
make SUBTARGET=pacman SOURCES=src/mame/pacman/pacman.cpp TOOLS=0 REGENIE=1 -j16
```
That built up the emulator off the master branch – and it worked.

To get the js/wasm build going you are supposed to do something like:
- Install emsdk from https://emscripten.org/docs/getting_started/downloads.html
- Go get the Emularity project from https://github.com/db48x/emularity
- Setup emsdk and source the environment script source ~/emsdk/emsdk_env.sh
- Run the MAME build via emmake
- emmake SUBTARGET=pacjs SOURCES=src/mame/pacman/pacman.cpp REGENIE=1 -j16 WEBASSEMBLY=1
- That should output some files like pacjs.js pacjs.wasm and pacjs.html
It’s been a couple days since I started working on this…I’m pretty sure that initial build finished fine. But there’s a big difference between a finished build and a working build.

Setup nginx To Serve the MAME Emulator

You then have to setup a webserver and copy in most of the files from Emularity and create a html page to point to it all. Emularity is some javascript that makes it easier to launch the MAME emulator.
```
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>example arcade game</title>
  </head>
  <body>
    <canvas id="canvas" style="width: 50%; height: 50%; display: block; margin: 0 auto;"></canvas>
    <script type="text/javascript" src="es6-promise.js"></script>
    <script type="text/javascript" src="browserfs.min.js"></script>
    <script type="text/javascript" src="loader.js"></script>
    <script type="text/javascript">
      var emulator = 
        new Emulator(document.querySelector("#canvas"),
                     null,
                     new MAMELoader(MAMELoader.driver("pacman"),
                            MAMELoader.nativeResolution(224, 256),
                            MAMELoader.scale(3),
                            MAMELoader.emulatorWASM("emulator/pacman.wasm"),
                            MAMELoader.emulatorJS("emulator/pacman.js"),
                            MAMELoader.mountFile("emulator/pacman.zip",
                                        MAMELoader.fetchFile("Game File",
                                                             "/roms/pacman.zip"))))
      emulator.start({ waitAfterDownloading: true });
    </script>
  </body>
</html>
```
Back to The Build

Great but it didn’t load… It just crashed out with an Uncaught (in promise) error (with some random addr).

There isn’t a ton of info on this build system. I did find this post that implied that there’s some version matching that has to go on. The javascript version is basically the binary build of a C++ application massaged into web assembly by emscripten.

I hacked around for a couple days trying to add symbols and debug in the code and trying to get a break point in the javascript console. Ultimately, I kinda cheated and just tried to have a look at what the Internet Archive had done.

If you look in the console on a successful run you can see that the MAME emulator is run with -verbose on so you get a dump of a couple things:

Version Trouble

Critically, they’re running the build on version 0.239 of MAME. Figuring out the emsdk version was a little harder – but I could see they are running Clang 14.0.0 from llvm tools. You can run ./emsdk list to list the emscripten versions. Ultimately by playing with it a bit (sort of loop over testing different versions of emcc – ./emsdk install 2.0.0; ./emsdk activate 2.0.0; source ./emsdk_env.sh; emcc -v) I settled on version 3.0.0 which had Clang 14.0.0. There’s tags in the MAME repo for each version so to get my build working I did this:
```
cd ~/emsdk
./emsdk install 3.0.0
./emsdk activate 3.0.0
source ~/emsdk_env.sh 
emcc -v
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.0.0 (3fd52e107187b8a169bb04a02b9f982c8a075205)
clang version 14.0.0 (https://github.com/llvm/llvm-project 4348cd42c385e71b63e5da7e492172cff6a79d7b)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /home/sandy/emsdk/upstream/bin

cd ~/mame
git checkout mame0239
./buildpacman.sh REGENIE=1
```
Where my buildpacman.sh script looked like this:
```
#!/bin/bash 

subtarget=pacman

emmake make -j 16 SUBTARGET=$subtarget SOURCES=src/mame/drivers/pacman.cpp  ARCHOPTS=" -s EXCEPTION_CATCHING_ALLOWED=[instantiateArrayBuffer,instantiateAsync,createWasm] " WEBASSEMBLY=1 OPTIMIZE=s  $@
```
There’s a couple things to mention here:
- First is that in this older version of MAME, the SOURCES are in a different place src/mame/drivers/pacman.cpp instead of src/mame/pacman/pacman.cpp
- Next, the EXCEPTION_CATCHING_ALLOWED clause was required. To get a list of functions where catching is allowed, I had to enable DEBUG=1 and SYMBOLS=1 and OPTIMIZE=0 to get a better trace of the stack on that crash. That bit is probably the biggest part of the fix. It seems like there’s some async loading of the webassembly going on. That exception needs to be caught (and likely ignored) so that a retry can be attempted
- The default compiler optimization level in the MAME build is OPTIMIZE=3 – I found pacman a little choppy at that level, so I increased it to OPTIMIZE=s – it runs smooth and makes the download smaller too.
So now it’s working at least as a proof of concept. Hooray.

In the image I’m just running nginx on my pink macbook air the web site files look like this:

Then the final pacman.html that runs the loader looks like this:

The extraArgs call was a good find too. You can pass args like -showconfig in there which help you trouble shoot where to put the rom file etc.

I know I started by saying I was gonna deploy something into a kubernetes cluster – but I’m a little fried from the hacking I’ve done already. All in all, it was a lot of fun hacking on a C++ / WebAssembly build. Next time I’ll go onto automating the build for a couple games and dockerizing the results.

-Sandy
August 7, 2025
Roll Your Own Raspberry Pi Kubernetes Cluster
I always seem to have a drawer-full of Raspberry Pi’s in the garage that aren’t doing anything, so I thought I’d hook them to an old router and try to make them into a kubernetes cluster. I’ve got an old think pad in the mix I’m hoping to network that in and set it up as a boot server using the rpi-imager and cloud-init.

A side goal was to have the nodes auto deploy sort of like a mini cloud provider. To head towards auto install I took a cue from the rpi-imager. When you plugin a rpi with a blank SD card it will run a little code from eeprom and after looking around for something to boot will eventually hook eth0 to the network using DHCP. Then it downloads a boot.img from downloads.raspberrypi.org.

That boot.img contains the rpi-imager binary so after a reboot the imager appears with a preconfigured list of rpi hardware, OS types and install target. – Cool – but not very automatic and I don’t really want to pull the node images from the internet everytime. Also when I go to redeploy, I’d rather not have to pull out each SD card and wipe it first.

I sort of settled on running ubuntu/noble and wanted to use cloud-init to configure a couple things like unique hostname and timezone and to install k3s-agent on the nodes.

HTTP_BOOT

The little bit of magic that hooks up the DHCP network connection and pulls the rpi-imager boot.img is known as HTTP_BOOT. It’s a bit of code and settings that live in the eeprom of the pi. You can update the eeprom with a custom URL for HTTP_BOOT something like this boot.conf file:
```
[all]
BOOT_UART=0
WAKE_ON_GPIO=1
POWER_OFF_ON_HALT=0
BOOT_ORDER=0xf461

[gpio8=0]
SIGNED_BOOT=1
BOOT_ORDER=0xf7
NET_INSTALL_ENABLED=1
HTTP_HOST=172.28.0.3
HTTP_PATH=rpi-imager
```
The [gpio8=0] block tells the pi to pull a boot.img from http://172.28.0.3/rpi-imager (but only when the general purpose I/O pin #8 is grounded – more on that later). HTTP_BOOT actually pulls two files from that URL – the boot.img and a signature file called boot.sig.

Out of Band Control

My “cluster” is a board with an old router zip-tied down and three rpi4 mounted on finish nails I drove through the plywood. I ran jumpers from GPIO8 and the GRND pins on the pis to a bread-board with momentary buttons. To reboot a pi4 you can ground the “RUN” pad in the J2 jumper block (I’ve got a ground wire with a Dupont prong on it to touch the run pad).

To cause an automatic reinstall, I hold down one of the buttons and ground the run pad of the corresponding pi. I could automate my out of band controller with relays and maybe something like an esp32 to drive it.

Signed Boot Objects

When the pi boots up and pulls boot.img it doesn’t just trust that random code from the internet it checks the boot.img against boot.sig. The eeprom contains a public key and boot.sig was created using the Raspberry Pi Foundation’s private key.

To make a custom boot.img or even a boot.conf you need your own key pair and run extra commands to sign things. So let’s do that and sign the boot.conf:
```
mkdir ~/keys
cd ~/keys
openssl genrsa -out private.pem 2048
openssl rsa -in private.pem -outform PEM -pubout -out public.pem
cd -
rpi-eeprom-digest -k ~/keys/private.pem -i boot.conf -o boot.conf.sig
cp ~/keys/public.pem boot.conf boot.conf.sig /var/www/html
```
Here I’ve used the private.pem key to create a signature file for boot.conf (boot.conf.sig). Then I copied the public key, and boot.conf files onto a nginx server I’m using as my boot server (172.28.0.3).

Updating the eeprom on the pi

I log into one of the pi computers and update the eeprom like this:
```
wget 172.28.0.3/boot.conf
wget 172.28.0.3/boot.conf.sig
wget 172.28.0.3/public.pem

sudo rpi-eeprom-config -a boot.conf
# note the pieeprom-*.bin filename
cp /lib/firmware/raspberrypi/bootloader-2711/default/pieeprom-2024-04-15.bin pieeprom.bin

sudo apt install python3-pycryptodome
rpi-eeprom-config -p public.pem -c boot.conf -d boot.conf.sig -o pieeprom.upd pieeprom.bin
sudo rpi-eeprom-config -a boot.conf pieeprom.upd
```
Making a Custom boot.img

I cloned in the rpi-imager code from github and built it up following the instructions in the README.md. The build takes a long time (maybe an hour). Ultimately, the repo builds an AppImage version of the general portable imager – which wasn’t what I wanted. Confusingly, the README.md mentions the “embedded” version of the imager, but the default git branch ‘qml’ doesn’t support the embedded build.

To build the embedded version I had to git checkout v1.8.y first. All in all, it goes something like this:
```
git clone https://github.com/raspberrypi/rpi-imager.git
cd rpi-imager
sudo ./build-qt.sh # and wait
git checkout v1.8.y
cd embedded
./build.sh 
```
Honestly I’m not sure if the initial ./build-qt.sh is required – likely since that builds up the arm64 version of the Qt libs needed for the imager interface. Ultimately the embedded/build.sh should create an output directory with a minimal linux OS. I found it was just easier to do most of the embeded build stuff as root user. Don’t be a dolt like me and delete the output directory. There’s a cmdline.txt file in there if you delete it, the kernel can’t boot. Pro-tip, you can remove the quiet cmdline arg to see how the kernel boots up.

Packup the boot.img

The output directory needs to be packed into a FAT32 file system. I wrote a script called pack.sh to do that.
```
#!/bin/bash 

LOOP=$(losetup -f)
echo "LOOP is $LOOP"
if [[ ! -f boot.img ]]
then
   dd if=/dev/zero of=boot.img bs=1M count=36
   losetup -P $LOOP boot.img
   mkfs.vfat $LOOP
else
   losetup -P $LOOP boot.img
fi
echo "mount $LOOP boot"
mount $LOOP boot
cd output
cp -rp * ../boot
cd -
umount boot
losetup -d $LOOP

./rpi-eeprom-digest -k ../private.pem -i boot.img -o boot.sig

scp  -i /home/sandy/.ssh/id_rsa boot.img boot.sig sandy@bunsen:/var/www/html/rpi-imager/
```
pack.sh makes a 36MB boot.img file and then binds it to a loop device. The loop device is mounted as a directory called boot and the contents of output are copied in. Then the script signs the boot.img file with the private key from before. Then I copy those to the boot server.

Hacking the rpi-image Service Script

Great, but that’s just the same rpi-imager signed with a different key. To customize it I needed to learn a bit about buildroot. The boot.img does a minimal boot and then starts rpi-imager as a service. The service startup script is in a buildroot overlay rpi-imager/embedded/imager/board/overlay/etc/init.d/S99-rpi-imager.
So I hacked on that:
```
#!/bin/sh

#
# Script executed at start
#

case "$1" in
  start)
    #udevd --daemon
    #udevadm trigger
    INTERVAL=10
    # Polling interval in seconds

    echo "Waiting for eth0 to connect..."

    while true; do
        # Check if eth0 is up using /sys/class/net
        if [ -d /sys/class/net/eth0 ] && [ "$(cat /sys/class/net/eth0/operstate)" = "up" ]; then
            echo "eth0 is now connected!"
            break
        else
            echo "eth0 is down, checking again in $INTERVAL seconds..."
            sleep $INTERVAL
        fi
    done
    ifconfig

    echo Starting rpi-imager
    UBUNTU_IMAGE=$(wget -q  -O -  http://172.28.0.3/ubuntu/latest)
    PATH=/bin:/sbin:/usr/bin:/usr/sbin rpi-imager --cli http://172.28.0.3/ubuntu/${UBUNTU_IMAGE} /dev/mmcblk0

    sync
    mkdir -p /bt
    mount /dev/mmcblk0p1 /bt
    SERIAL_NUMBER=$(cat /proc/cpuinfo | awk '/Serial/{print $3; exit}')
    wget -O /bt/meta-data http://172.28.0.3/cloud-init/meta-data?serial_number=${SERIAL_NUMBER}
    wget -O /bt/user-data http://172.28.0.3/cloud-init/user-data?serial_number=${SERIAL_NUMBER}
    umount /bt
    rm -rf /bt
    sync
    reboot -f
    ;;

  stop)
    ;;

  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac

exit $?
```
This version of the “service” runs rpi-imager in --cli mode with two arguments: a URL for an image to etch on the SD card, and the device file of the card (/dev/mmcblk0). The URL points back to my boot server. I had to add a block to wait for the network to connect – if you just run the imager without the network it fails (at least in cli mode). There’s two partitions in the ubuntu image: a boot partion (mmcblk0p1) and the main system partition (mmcblk0p2). The wget for latest makes the Ubuntu version variable (I just update that on the webserver when I add a new image file).

The script goes on to mount the newly created boot partition and drops files called user-data and meta-data which are used by cloud-init. The .../cloud-init/... URLs are actually php scripts on the boot server that take a device serial number as an argument. That lets me customize the hostname and instance_id of the node using the pi serial number as a unique identifier.

Finally, the service script tells the pi to reboot. When it reboots, it will use the [all]boot.conf (because I’ve released the button that grounded GPIO 8). Since there’s a freshly populated version of Ubuntu on the SD card it boots that.

cloud-init custom user-data / meta-data

If you put a file called user-data in the boot partition of that ubuntu/noble server image, then cloud-init will run on first boot and apply any changes specified in the file. Here’s what my finished user-data and meta-data look like:
```
#cloud-config

hostname: node-123456789
preserve_hostname: false

package_update: false
package_upgrade: false

manage_etc_hosts: true
packages:
  - avahi-daemon
  - apt-transport-https
  - ca-certificates
  - curl
  - gnupg
  - lsb-release
  - unattended-upgrades
  - net-tools
  - nfs-common
  - cifs-utils
  - open-iscsi

apt:
  conf: |
    Acquire {
      Check-Date "false";
    };

users:
- default 
- name: ubuntu
  groups: users,adm,dialout,audio,netdev,video,plugdev,cdrom,games,input,gpio,spi,i2c,render,sudo
  shell: /bin/bash
  lock_passwd: true
  passwd: $5$q...WG6
  ssh_authorized_keys:
    - ssh-rsa AAAAB3N...r8VBj9ERGEu/9M= sandy@bunsen
  sudo: ALL=(ALL) NOPASSWD:ALL

timezone: America/Los_Angeles

write_files:
  - path: /etc/cloud/templates/hosts.debian.tmpl 
    append: true
    content: |
      192.168.1.1 docker-registry www 
  - path: /etc/rancher/k3s/registries.yaml
    content: |
      mirrors:
        "docker-registry:5000":
          endpoint:
            - "http://docker-registry:5000"
      configs:
        "docker-registry:5000":
          tls:
            insecure_skip_verify: true
  - path: /etc/sysctl.d/99-k3s.conf
    permissions: "0644"
    content: |
      fs.inotify.max_user_instances=512
      fs.inotify.max_user_watches=524288
  - path: /etc/rancher/k3s/config.yaml
    content: |
      node-label:
        - "model=Pi5"
        - "serial_number=6c1a0ae425e8665f" 

runcmd:
  - localectl set-x11-keymap "us" pc105
  - setupcon -k --force || true
  - sysctl --system
  - sed -i -e '1s/^$.*$$/\1 cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
  - curl -sfL https://get.k3s.io | K3S_URL=https://172.28.0.2:6443 K3S_TOKEN=K1...4::server:2aeeb60...59f0 sh -
  - reboot
```
```
# This is the meta-data configuration file for cloud-init. Please refer to the
# cloud-init documentation for more information:
#
# https://cloudinit.readthedocs.io/

# Set the datasource mode to "local". This ensures that user-data is acted upon
# prior to bringing up the network (because everything about the datasource is
# assumed to be local). If you wish to use an HTTP datasource instead, you can
# change this to "net" or override it on the kernel cmdline (see README).
dsmode: local

# Specifies the "unique" identifier of the instance. Typically in cloud-init
# this is generated by the owning cloud and is actually unique (to some
# degree). Here our data-source is local, so this is just a fixed string.
# Warning: changing this will cause cloud-init to assume it is running on a
# "new" instance, and to go through first time setup again (the value is
# compared to a cached copy).

instance_id: rpi-123456789
```
The user-data has instructions there to set a hostname for the new node of the form “node-serial#”. Install some packages, setup a ubuntu user. At the end it installs the k3s-agent and registers it with the control-plane node at 172.28.0.2. There’s also some node configuration to support k3s: setting the docker-registry, enabling cgroups on the kernel cmdline.txt and doing some sysctl changes to help support containerd.

The php that fills that in looks like this cloud-init/user-data:
```
#cloud-config
<?php
   include_once('config.php');
   echo "hostname: $node_name\n";
?>
preserve_hostname: false
...
timezone: America/Los_Angeles

write_files:
...
  - path: /etc/rancher/k3s/config.yaml
    content: |
      node-label:
<?php
echo "        - \"model=$model\"\n";
echo "        - \"serial_number=$serial_number\"\n";
?>

<?php 
   $cmd= "runcmd:\n";
   $cmd.="  - localectl set-x11-keymap \"us\" pc105\n";
   $cmd.="  - setupcon -k --force || true\n";
   $cmd.="  - sysctl --system\n";
   $cmd.="  - sed -i -e '1s/^$.*$$/\\1 cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt\n";
   if ($node_type == 'control-plane') {
      $cmd.="  - curl -sfL https://get.k3s.io | sh -\n";
   }
   else {
      $cmd.="  - curl -sfL https://get.k3s.io | K3S_URL=$cp_url K3S_TOKEN=$node_token sh -\n";
   } 
   $cmd.="  - reboot\n";
   echo $cmd;
?>
```
and similar in cloud-init/meta-data
```
...
# Warning: changing this will cause cloud-init to assume it is running on a
# "new" instance, and to go through first time setup again (the value is
# compared to a cached copy).
<?php
   include_once('config.php');
   echo "instance_id: $instance_id\n";
?>
```
and cloud-init/config.php
```
<?php 
   $cp_serial_number='6c...65f';

   // control plane ip is set by DHCP on the router
   $cp_url="https://172.28.0.2:6443";
    
   $node_token=trim(`/usr/bin/ssh -o StrictHostKeyChecking=no -q -i /var/www/cloud-init/id_rsa ubuntu@node-master sudo cat /var/lib/rancher/k3s/server/node-token`);

   $serial_number='12345';
   if (isset($_GET['serial_number'])) {
      $serial_number=$_GET['serial_number'];
   }
   $model='Pi4';
   if (isset($_GET['model'])) {
      $model=$_GET['model'];
   }
   $node_type='worker';
   if ($serial_number == $cp_serial_number) {
      $node_type='control-plane';
   }
   $node_name="node-$serial_number";
   $instance_id="rpi-$serial_number";
?>
```
There’s some hard-coding in there for things like the control-plane node serial number (I’ve got 3 pi4s and the control-plane is a pi5). I get the node-token from the control-plane automatically with a ssh command. And the kubeconfig to access the cluster comes from /etc/ranger/k3s/k3s.yaml). The idea is you bring up the control-plane and once it starts answering ssh you can automatically roll the other nodes.

Check The Nodes

Finally, with the nodes initialize and the k3s.yaml brought in and edited to point at the control plane, I can check them with kubectl:
```
sandy@bunsen:~$ kubectl get nodes
NAME                    STATUS   ROLES                  AGE     VERSION
node-100000001b5d3ab7   Ready    <none>                 4h23m   v1.33.3+k3s1
node-100000008d83d984   Ready    <none>                 5h4m    v1.33.3+k3s1
node-10000000e5fe589d   Ready    <none>                 5h1m    v1.33.3+k3s1
node-master             Ready    control-plane,master   47h     v1.33.3+k3s1
```
Success!! Now what will I build in my pi cluster?…

References
- https://github.com/simsandyca/rpi-imager/tree/cli_boot_server
  - this is my fork of the v1.8.y branch with the embedded build – I’m just running the embedded/build.sh now and it definitely references Qt libs and headers so I’m thinking to build this you need to checkout the qml branch and run sudo ./build-qt.sh – then checkout the cli_boot_server branch and cd embedded; ./build.sh (probably all as root).
- https://downloads.raspberrypi.org/net_install/
- https://ubuntu.com/download/raspberry-pi
- https://forums.raspberrypi.com/viewtopic.php?t=378943
  - I went down a whole rabbit hole with this post about the boot.img format and another project that makes a custom boot.img – It’s actually possible to download The Pi Foundation boot.img unpack it – then unpack the rootfs.cpio.zst within and get at the S99rpi-imager service script – edit that and repack it all – then re-sign the root.cpio.zst pack/unpack commands look sort of like this
    
    losetup -f # note next available loop dev
    
    losetup -P /dev/loop15 ./boot.img
    
    mount /dev/loop15 /mnt/boot
    
    mkdir rootfs
    
    cd rootfs
    
    zstdcat /mnt/boot/rootfs.cpio.zst | cpio -dmvi
    
    vi etc/init.d/S99rpi-imager
    
    … make your edits (you only get 32MB so don’t go too crazy – and the level must be the max:19 in the zstd compressor)…
    
    find . -print0 | cpio -o --format=newc -0 | zstd -19 -z > ../rootfs.cpio.zst
    
    cd ..
    
    cp rootfs.cpio.zst /mnt/boot
    
    umount boot
    
    losetup -d /dev/loop15
    
    ./rpi-eeprom-digest -k ~/keys/private.pem -i boot.img -o boot.sig
- https://github.com/raspberrypi/rpi-system-update
  - this is the project ref’d in the above thread – I did build the boot images in that repo – and that’s how I figured out the format of the output directory and how to unpack and repack boot.img and rootfs.cpio.zst
- Node labels for serial_number and model were a later addition when I added grafana monitoring. You add node labels for the k3s nodes in /etc/rancher/k3s/config.yml. A couple of the prometheus/grafana pods use aprox. 1GB ram – which pretty much swamps any of the Pi4 nodes (these are only 2GB). Setting node labels let me set node affinity so that those larger deployments end up on a Pi5 (one has 4GB and the other is 8GB).
August 1, 2025