From 1262e03e218e6346c6897a92c3276bba7de6618a Mon Sep 17 00:00:00 2001
From: Petru Paler
Date: Tue, 21 Oct 2025 00:05:44 +0100
Subject: [PATCH] Cluster changes writeup.

---
 docs/CLUSTER_REVAMP.md | 1717 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1717 insertions(+)
 create mode 100644 docs/CLUSTER_REVAMP.md

diff --git a/docs/CLUSTER_REVAMP.md b/docs/CLUSTER_REVAMP.md
new file mode 100644
index 0000000..1a96d4c
--- /dev/null
+++ b/docs/CLUSTER_REVAMP.md
@@ -0,0 +1,1717 @@
# Cluster Architecture Revamp

**Status**: Planning complete, ready for review and refinement

## Key Decisions

✅ **Replication**: 5-minute intervals (incremental btrfs send)
✅ **WordPress**: Currently syncthing → will use `/data/services` via NFS
✅ **Media**: Only media.hcl needs `/data/media`, constrained to fractal
✅ **Unifi**: Floating (no constraint needed)
✅ **Sunny**: Standalone, ethereum data stays local (not replicated)
✅ **Quorum**: 5 servers (c1, c2, c3, fractal, zippy)
✅ **NFS Failover**: Via Consul DNS (`services.service.consul`)

## Table of Contents
1. [End State Architecture](#end-state-architecture)
2. [Migration Steps](#migration-steps)
3. [Service Catalog](#service-catalog)
4. [Failover Procedures](#failover-procedures)

---

## End State Architecture

### Cluster Topology

**5-Server Quorum (Consul + Nomad server+client):**
- **c1, c2, c3**: Cattle nodes - x86_64, run most stateless workloads
- **fractal**: Storage node - x86_64, 6x spinning drives, runs media workloads
- **zippy**: Stateful anchor - x86_64, runs database workloads (via affinity), primary NFS server

**Standalone Nodes (not in quorum):**
- **sunny**: x86_64, ethereum node + staking, base NixOS configs only
- **chilly**: x86_64, Home Assistant VM, base NixOS configs only

**Quorum Math:**
- 5 servers → quorum requires 3 healthy nodes
- Can tolerate 2 simultaneous failures
- Bootstrap expect: 3

### Storage Architecture

**Primary Storage (zippy):**
- `/persist/services` - btrfs subvolume
  - Contains: mysql, postgres, redis, clickhouse, mongodb, app data
  - Exported via NFS to: `services.service.consul:/persist/services`
  - Replicated via **btrfs send** to c1 and c2 every **5 minutes** (incremental)

**Standby Storage (c1, c2):**
- `/persist/services-standby` - btrfs subvolume
  - Receives replicated snapshots from zippy via incremental btrfs send
  - Can be promoted to `/persist/services` and exported as NFS during failover
  - Maximum data loss: **5 minutes** (last replication interval)

**Standalone Storage (sunny):**
- `/persist/ethereum` - local btrfs subvolume (or similar)
  - Contains: ethereum blockchain data, staking keys
  - **NOT replicated** - too large/expensive to replicate a full ethereum node
  - Backed up via kopia to fractal (if feasible/needed)

**Media Storage (fractal):**
- `/data/media` - existing spinning drive storage
  - Exported via Samba (existing)
  - Mounted on c1, c2, c3 via CIFS (existing)
  - Local access on fractal for media workloads

**Shared Storage (fractal):**
- `/data/shared` - existing spinning drive storage
  - Exported via Samba (existing)
  - Mounted on c1, c2, c3 via CIFS (existing)

### Network Services

**NFS Primary (zippy):**
```nix
services.nfs.server = {
  enable = true;
  exports = ''
    /persist/services 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
  '';
};

services.consul.extraConfig.services = [{
  name = "services";
  port = 2049;
  checks = [{ tcp = "localhost:2049"; interval = "30s"; }];
}];
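# Note: clients mount via the `services.service.consul` DNS name rather than a
# hostname. Whichever node registers this Consul service and passes the TCP
# health check is the one returned, which is what makes the failover to a
# promoted standby transparent to NFS clients.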
```

**NFS Client (all nodes):**
```nix
fileSystems."/data/services" = {
  device = "services.service.consul:/persist/services";
  fsType = "nfs";
  options = [ "x-systemd.automount" "noauto" "x-systemd.idle-timeout=60" ];
};
```

**Samba Exports (fractal - existing):**
- `//fractal/media` → `/data/media`
- `//fractal/shared` → `/data/shared`

### Nomad Job Placement Strategy

**Affinity-based (prefer zippy, allow c1/c2):**
- mysql, postgres, redis - stateful databases
- Run on zippy normally, can fail over to c1/c2 if zippy is down

**Constrained (must run on fractal):**
- **media.hcl** - radarr, sonarr, bazarr, plex, qbittorrent
  - Reason: Heavy /data/media access, benefits from local storage
- **prometheus.hcl** - metrics database with 30d retention
  - Reason: Large time-series data, spinning disks OK, saves SSD space
- **loki.hcl** - log aggregation with 31d retention
  - Reason: Large log data, spinning disks OK
- **clickhouse.hcl** - analytics database for plausible
  - Reason: Large time-series data, spinning disks OK

**Floating (can run anywhere on c1/c2/c3/fractal/zippy):**
- All other services, including:
  - traefik, authentik, web apps
  - **grafana** (small data, just dashboards/config, queries prometheus for metrics)
  - vector (system job, runs everywhere)
- Nomad schedules based on resources and constraints

### Data Migration

**Path changes needed in Nomad jobs:**
- `/data/compute/appdata/*` → `/data/services/*`
- `/data/compute/config/*` → `/data/services/*`
- `/data/sync/wordpress` → `/data/services/wordpress`

**No changes needed:**
- `/data/media/*` - stays the same (local on fractal, CIFS mount elsewhere; used only by media services)
- `/data/shared/*` - stays the same (CIFS mount from fractal)

**Deprecated after migration:**
- `/data/sync/wordpress` - currently managed by syncthing to avoid slow GlusterFS
  - Will be replaced by NFS mount at `/data/services/wordpress`
  - Syncthing configuration for this can be removed
  - Final sync: copy from syncthing to `/persist/services/wordpress` on zippy before cutover

---

## Migration Steps

**Important path simplification note:**
- All service paths use `/data/services/*` directly (not `/data/services/appdata/*`)
- Example: `/data/compute/appdata/mysql` → `/data/services/mysql`
- Simpler, cleaner, easier to manage

### Phase 0: Preparation
**Duration: 1-2 hours**

1. **Backup everything**
   ```bash
   # On all nodes, ensure kopia backups are current
   kopia snapshot list

   # Backup glusterfs data manually
   rsync -av /data/compute/ /backup/compute-pre-migration/
   ```

2. **Document current state**
   ```bash
   # Save current nomad job list
   nomad job status -json > /backup/nomad-jobs-pre-migration.json

   # Save consul service catalog
   consul catalog services > /backup/consul-services-pre-migration.txt
   ```

3. 
**Review this document**
   - Verify all services are cataloged
   - Confirm priority assignments
   - Adjust as needed

### Phase 1: Convert fractal to NixOS
**Duration: 6-8 hours**

**Current state:**
- Proxmox on ZFS
- System pool: `rpool` (~500GB, will be wiped)
- Data pools (preserved):
  - `double1` - 3.6T (homes, shared)
  - `double2` - 7.2T (backup - kopia repo, PBS)
  - `double3` - 17T (media, torrent)
- Services: Samba (homes, shared, media), Kopia server, PBS
- Bind mounts: `/data/{homes,shared,media,torrent}` → ZFS datasets

**Goal:** Fresh NixOS on rpool, preserve data pools, join cluster

#### Step-by-step procedure:

**1. Pre-migration documentation**
  ```bash
  # Save fractal's ZFS layout (script is created locally, run remotely via stdin)
  cat > /tmp/detect-zfs.sh << 'EOF'
#!/bin/bash
echo "=== ZFS Pools ==="
zpool status

echo -e "\n=== ZFS Datasets ==="
zfs list -o name,mountpoint,used,avail,mounted -r double1 double2 double3

echo -e "\n=== Bind mounts ==="
grep double /etc/fstab

echo -e "\n=== Data directories ==="
ls -la /data/

echo -e "\n=== Samba users/groups ==="
getent group shared compute
getent passwd compute
EOF
  ssh fractal bash < /tmp/detect-zfs.sh > /backup/fractal-zfs-layout.txt

  # Save samba config
  scp fractal:/etc/samba/smb.conf /backup/fractal-smb.conf

  # Save kopia certs and config
  scp -r fractal:~/kopia-certs /backup/fractal-kopia-certs/
  scp fractal:~/.config/kopia/repository.config /backup/fractal-kopia-repository.config

  # Verify kopia backups are current
  ssh fractal "kopia snapshot list --all"
  ```

**2. Stop services on fractal**
  ```bash
  ssh fractal "systemctl stop smbd nmbd kopia"
  # Don't stop PBS yet (in case we need to restore)
  ```

**3. Install NixOS**
  - Boot NixOS installer USB
  - **IMPORTANT**: Do NOT touch double1, double2, double3 during install!
  - Install only on `rpool` (or create new pool if needed)

  ```bash
  # In NixOS installer
  # Option A: Reuse rpool (wipe and recreate)
  zpool destroy rpool

  # Option B: Use different disk if available
  # Then follow standard NixOS btrfs install on that disk
  ```

  - Use standard encrypted btrfs layout (matching other hosts)
  - Minimal install first, will add cluster configs later

**4. First boot - import ZFS pools**
  ```bash
  # SSH into fresh NixOS install

  # Import pools (read-only first, to be safe)
  zpool import -f -o readonly=on double1
  zpool import -f -o readonly=on double2
  zpool import -f -o readonly=on double3

  # Verify datasets
  zfs list -r double1 double2 double3

  # Example output should show:
  # double1/homes
  # double1/shared
  # double2/backup
  # double3/media
  # double3/torrent

  # If everything looks good, export and reimport read-write
  zpool export double1 double2 double3
  zpool import double1
  zpool import double2
  zpool import double3

  # Set ZFS mountpoints (if needed)
  # These may already be set from Proxmox
  zfs set mountpoint=/double1 double1
  zfs set mountpoint=/double2 double2
  zfs set mountpoint=/double3 double3
  ```

**5. Create fractal NixOS configuration**
  ```nix
  # hosts/fractal/default.nix
  { config, pkgs, ... 
}:
  {
    imports = [
      ../../common/encrypted-btrfs-layout.nix
      ../../common/global
      ../../common/cluster-node.nix # Consul + Nomad (will add in step 7)
      ../../common/nomad.nix # Both server and client
      ./hardware.nix
    ];

    networking.hostName = "fractal";

    # ZFS support; extraPools generates zfs-import-<pool>.service units that
    # run before zfs-mount.service, so no extra ordering glue is needed
    boot.supportedFilesystems = [ "zfs" ];
    boot.zfs.extraPools = [ "double1" "double2" "double3" ];

    # Bind mounts for /data (matching Proxmox setup)
    fileSystems."/data/homes" = {
      device = "/double1/homes";
      fsType = "none";
      options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
    };

    fileSystems."/data/shared" = {
      device = "/double1/shared";
      fsType = "none";
      options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
    };

    fileSystems."/data/media" = {
      device = "/double3/media";
      fsType = "none";
      options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
    };

    fileSystems."/data/torrent" = {
      device = "/double3/torrent";
      fsType = "none";
      options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
    };

    fileSystems."/backup" = {
      device = "/double2/backup";
      fsType = "none";
      options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
    };

    # Create data directory structure
    systemd.tmpfiles.rules = [
      "d /data 0755 root root -"
    ];

    # Users and groups for samba
    users.groups.shared = { gid = 1001; };
    users.groups.compute = { gid = 1002; };
    users.users.compute = {
      isSystemUser = true;
      uid = 1002;
      group = "compute";
    };

    # Ensure ppetru is in shared group
    users.users.ppetru.extraGroups = [ "shared" ];

    # Samba server
    services.samba = {
      enable = true;
      openFirewall = true;

      extraConfig = ''
        workgroup = WORKGROUP
        server string = fractal
        netbios name = fractal
        security = user
        map to guest = bad user
      '';

      shares = {
        homes = {
          comment = "Home Directories";
          browseable = "no";
          path = "/data/homes/%S";
          "read only" = "no";
        };

        shared = {
          path = "/data/shared";
          "read only" = "no";
          browseable = "yes";
          "guest ok" = "no";
          "create mask" = "0775";
          "directory mask" = "0775";
          "force group" = "+shared";
        };

        media = {
          path = "/data/media";
          "read only" = "no";
          browseable = "yes";
          "guest ok" = "no";
          "create mask" = "0755";
          "directory mask" = "0755";
        };
      };
    };

    # Kopia backup server (cert paths match the persisted directory below,
    # which is where the pre-migration backup step restores them)
    systemd.services.kopia-server = {
      description = "Kopia Backup Server";
      wantedBy = [ "multi-user.target" ];
      after = [ "network.target" "zfs-mount.service" ];

      serviceConfig = {
        User = "ppetru";
        Group = "users";
        ExecStart = ''
          ${pkgs.kopia}/bin/kopia server start \
            --address 0.0.0.0:51515 \
            --tls-cert-file /home/ppetru/kopia-certs/kopia.cert \
            --tls-key-file /home/ppetru/kopia-certs/kopia.key
        '';
        Restart = "on-failure";
      };
    };

    # Kopia nightly snapshot (from cron)
    systemd.services.kopia-snapshot = {
      description = "Kopia snapshot of homes and shared";
      serviceConfig = {
        Type = "oneshot";
        User = "ppetru";
        Group = "users";
        ExecStart = ''
          ${pkgs.kopia}/bin/kopia --config-file=/home/ppetru/.config/kopia/repository.config \
            snapshot create /data/homes /data/shared \
            --log-level=warning --no-progress
        '';
      };
    };

    systemd.timers.kopia-snapshot = {
      wantedBy = [ "timers.target" ];
      timerConfig = {
        OnCalendar = "22:47";
        Persistent = true;
      };
    };

    # Keep kopia config and certs persistent
    environment.persistence."/persist" = {
      directories = [
        "/home/ppetru/.config/kopia"
        "/home/ppetru/kopia-certs"
      ];
    };

    networking.firewall.allowedTCPPorts = [
      139 445 # Samba
      51515 # Kopia
    ];
    networking.firewall.allowedUDPPorts = [
      137 138 # Samba
    ];
  }
  ```

**6. Deploy initial config (without cluster)**
  ```bash
  # First, deploy without cluster-node.nix to verify storage works
  # Comment out cluster-node import temporarily

  deploy -s '.#fractal'

  # Verify mounts
  ssh fractal "df -h | grep data"
  ssh fractal "ls -la /data/"

  # Test samba
  smbclient -L fractal -U ppetru

  # Test kopia
  ssh fractal "systemctl status kopia-server"
  ```

**7. Join cluster (add to quorum)**
  ```bash
  # Uncomment cluster-node.nix import in fractal config
  # Update all cluster configs for 5-server quorum (see step 8 below)

  deploy # Deploy to all nodes

  # Verify quorum
  consul members
  nomad server members
  ```

**8. Update cluster configs for 5-server quorum**
  ```nix
  # common/consul.nix
  servers = ["c1" "c2" "c3" "fractal" "zippy"];
  bootstrap_expect = 3;

  # common/nomad.nix
  servers = ["c1" "c2" "c3" "fractal" "zippy"];
  bootstrap_expect = 3;
  ```

**9. Verify fractal is fully operational**
  ```bash
  # Check all services
  ssh fractal "systemctl status samba-smbd kopia-server kopia-snapshot.timer"

  # Verify ZFS pools
  ssh fractal "zpool status"
  ssh fractal "zfs list"

  # Test accessing shares from another node
  ssh c1 "ls /data/media /data/shared"

  # Verify kopia clients can still connect
  kopia repository status --server=https://fractal:51515

  # Check nomad can see fractal
  nomad node status | grep fractal

  # Verify quorum
  consul members # Should see c1, c2, c3, fractal
  nomad server members # Should see 4 servers (zippy joins in Phase 2)
  ```

### Phase 2: Setup zippy storage layer
**Duration: 2-3 hours**

**Goal:** Prepare zippy for NFS server role, setup replication

1. **Create btrfs subvolume on zippy**
   ```bash
   ssh zippy
   sudo btrfs subvolume create /persist/services
   sudo chown ppetru:users /persist/services
   ```

2. **Update zippy configuration**
   ```nix
   # hosts/zippy/default.nix
   imports = [
     ../../common/encrypted-btrfs-layout.nix
     ../../common/global
     ../../common/cluster-node.nix # Adds to quorum
     ../../common/nomad.nix
     ./hardware.nix
   ];

   # NFS server
   services.nfs.server = {
     enable = true;
     exports = ''
       /persist/services 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
     '';
   };

   # Consul service registration for NFS
   services.consul.extraConfig.services = [{
     name = "services";
     port = 2049;
     checks = [{ tcp = "localhost:2049"; interval = "30s"; }];
   }];

   # Btrfs replication to standbys (incremental after the first full send).
   # Runs as root and needs root SSH access to c1 for `btrfs receive`.
   systemd.services.replicate-to-c1 = {
     description = "Replicate /persist/services to c1";
     path = [ pkgs.btrfs-progs pkgs.openssh pkgs.coreutils pkgs.findutils ];
     script = ''
       set -euo pipefail

       # Snapshot names embed a sortable timestamp, so sort by name rather
       # than `ls -t` (a snapshot's mtime reflects the source subvolume,
       # not the moment the snapshot was taken)
       btrfs subvolume snapshot -r /persist/services "/persist/services@$(date +%Y%m%d-%H%M%S)"
       LATEST=$(ls -d /persist/services@* | sort | tail -1)

       # Get previous snapshot for incremental send
       PREV=$(ls -d /persist/services@* | sort | tail -2 | head -1)

       # First run: full send.
       # Subsequent runs: incremental with -p (parent); the parent snapshot
       # must already exist on the receiver.
       if [ "$LATEST" != "$PREV" ]; then
         btrfs send -p "$PREV" "$LATEST" | ssh c1 "btrfs receive /persist/services-standby/"
       else
         # Only one snapshot exists: first run, full send
         btrfs send "$LATEST" | ssh c1 "btrfs receive /persist/services-standby/"
       fi

       # Cleanup old snapshots (keep last 24 hours on the sender; the newest
       # snapshot always survives as the parent for the next incremental)
       find /persist/ -maxdepth 1 -name 'services@*' -mtime +1 \
         -exec btrfs subvolume delete {} \;
     '';
   };

   systemd.timers.replicate-to-c1 = {
     wantedBy = [ "timers.target" ];
     timerConfig = {
       OnCalendar = "*:0/5"; # Every 5 minutes (incremental after first full send)
       Persistent = true;
     };
   };

   # Same for c2
   systemd.services.replicate-to-c2 = { ... };
   systemd.timers.replicate-to-c2 = { ... };
   ```

3. **Setup standby storage on c1 and c2**
   ```bash
   # On c1 and c2
   ssh c1 sudo btrfs subvolume create /persist/services-standby
   ssh c2 sudo btrfs subvolume create /persist/services-standby

   # Replicated snapshots will arrive as /persist/services-standby/services@<timestamp>
   ```

4. **Deploy and verify**
   ```bash
   deploy -s '.#zippy'

   # Verify NFS export
   showmount -e zippy

   # Verify Consul registration
   dig @localhost -p 8600 services.service.consul
   ```

5. **Verify quorum is now 5 servers**
   ```bash
   consul members # Should show c1, c2, c3, fractal, zippy
   nomad server members
   ```

### Phase 3: Migrate from GlusterFS to NFS
**Duration: 3-4 hours**

**Goal:** Move all data, update mounts, remove GlusterFS

1. **Initial copy from GlusterFS to zippy** (services still running)
   ```bash
   # On any node with /data/compute mounted. The appdata/config levels are
   # flattened away, per the path simplification note above.
   # (Anything under /data/compute besides appdata/ and config/ should be
   # reviewed and copied manually.)
   rsync -av --progress /data/compute/appdata/ zippy:/persist/services/
   rsync -av --progress /data/compute/config/ zippy:/persist/services/

   # Verify
   ssh zippy du -sh /persist/services
   ```

2. **Stop all Nomad jobs temporarily and run a final sync**
   ```bash
   # Get list of running jobs
   nomad job status | grep running | awk '{print $1}' > /tmp/running-jobs.txt

   # Stop all (they'll be restarted with updated paths in Phase 4)
   cat /tmp/running-jobs.txt | xargs -I {} nomad job stop {}

   # With all writers stopped, re-run the sync so zippy's copy is consistent;
   # this only transfers what changed since the initial copy
   rsync -av --progress /data/compute/appdata/ zippy:/persist/services/
   rsync -av --progress /data/compute/config/ zippy:/persist/services/
   ```

3. **Update all nodes to mount NFS**
   ```nix
   # Update common/glusterfs-client.nix → common/nfs-client.nix
   # OR update common/cluster-node.nix to import nfs-client instead

   fileSystems."/data/services" = {
     device = "services.service.consul:/persist/services";
     fsType = "nfs";
     options = [ "x-systemd.automount" "noauto" "x-systemd.idle-timeout=60" ];
   };

   # Remove old GlusterFS mount
   # fileSystems."/data/compute" = ... # DELETE
   ```

4. **Deploy updated configs**
   ```bash
   deploy -s '.#c1' '.#c2' '.#c3' '.#fractal' '.#zippy'
   ```

5. **Verify NFS mounts**
   ```bash
   for host in c1 c2 c3 fractal zippy; do
     ssh $host "df -h | grep services"
   done
   ```

6. **Remove GlusterFS from cluster**
   ```bash
   # On c1 (or any gluster server)
   gluster volume stop compute
   gluster volume delete compute

   # On all nodes
   for host in c1 c2 c3; do
     ssh $host "sudo systemctl stop glusterd; sudo systemctl disable glusterd"
   done
   ```

7. **Remove GlusterFS from NixOS configs**
   ```nix
   # common/compute-node.nix - remove ./glusterfs.nix import
   # Deploy again
   deploy
   ```

### Phase 4: Update and redeploy Nomad jobs
**Duration: 2-4 hours**

**Goal:** Update all Nomad job paths, add constraints/affinities, redeploy

1. **Update job specs** (see Service Catalog below for details; a bulk path rewrite is sketched there)
   - Change `/data/compute` → `/data/services`, dropping the appdata/config levels
   - Add constraints for media jobs → fractal
   - Add affinities for database jobs → zippy

2. 
**Deploy critical services first** + ```bash + # Core infrastructure + nomad run services/mysql.hcl + nomad run services/postgres.hcl + nomad run services/redis.hcl + nomad run services/traefik.hcl + nomad run services/authentik.hcl + + # Verify + nomad job status mysql + consul catalog services + ``` + +3. **Deploy high-priority services** + ```bash + nomad run services/prometheus.hcl + nomad run services/grafana.hcl + nomad run services/loki.hcl + nomad run services/vector.hcl + + nomad run services/unifi.hcl + nomad run services/gitea.hcl + ``` + +4. **Deploy medium-priority services** + ```bash + # See service catalog for full list + nomad run services/wordpress.hcl + nomad run services/ghost.hcl + nomad run services/wiki.hcl + # ... etc + ``` + +5. **Deploy low-priority services** + ```bash + nomad run services/media.hcl # Will run on fractal due to constraint + # ... etc + ``` + +6. **Verify all services healthy** + ```bash + nomad job status + consul catalog services + # Check traefik dashboard for health + ``` + +### Phase 5: Convert sunny to NixOS (Optional, can defer) +**Duration: 6-10 hours (split across 2 stages)** + +**Current state:** +- Proxmox with ~1.5TB ethereum node data +- 2x LXC containers: besu (execution client), lighthouse (consensus beacon) +- 1x VM: Rocketpool smartnode (docker containers for validator, node, MEV-boost, etc.) +- Running in "hybrid mode" - managing own execution/consensus, rocketpool manages the rest + +**Goal:** Get sunny on NixOS quickly, preserve ethereum data, defer "perfect" native setup + +--- + +#### Stage 1: Quick NixOS Migration (containers) +**Duration: 6-8 hours** +**Goal:** NixOS + containerized ethereum stack, minimal disruption + +**1. Pre-migration backup and documentation** + ```bash + # Document current setup + ssh sunny "pct list" > /backup/sunny-containers.txt + ssh sunny "qm list" > /backup/sunny-vms.txt + + # Find ethereum data locations in LXC containers + ssh sunny "pct config BESU_CT_ID" > /backup/sunny-besu-config.txt + ssh sunny "pct config LIGHTHOUSE_CT_ID" > /backup/sunny-lighthouse-config.txt + + # Document rocketpool VM volumes + ssh sunny "qm config ROCKETPOOL_VM_ID" > /backup/sunny-rocketpool-config.txt + + # Estimate ethereum data size + ssh sunny "du -sh /path/to/besu/data" + ssh sunny "du -sh /path/to/lighthouse/data" + + # Backup rocketpool config (docker-compose, wallet keys, etc.) + # This is in the VM - need to access and backup critical files + ``` + +**2. Extract ethereum data from containers/VM** + ```bash + # Stop ethereum services to get consistent state + # (This will pause validation! Plan for attestation penalties) + + # Copy besu data out of LXC + ssh sunny "pct stop BESU_CT_ID" + rsync -av --progress sunny:/var/lib/lxc/BESU_CT_ID/rootfs/path/to/besu/ /backup/sunny-besu-data/ + + # Copy lighthouse data out of LXC + ssh sunny "pct stop LIGHTHOUSE_CT_ID" + rsync -av --progress sunny:/var/lib/lxc/LIGHTHOUSE_CT_ID/rootfs/path/to/lighthouse/ /backup/sunny-lighthouse-data/ + + # Copy rocketpool data out of VM + # This includes validator keys, wallet, node config + # Access VM and copy out: ~/.rocketpool/data + ``` + +**3. Install NixOS on sunny** + - Fresh install with btrfs + impermanence + - Create large `/persist/ethereum` for 1.5TB+ data + - **DO NOT** try to resync from network (takes weeks!) + +**4. 
Restore ethereum data to NixOS**
  ```bash
  # After NixOS install, copy data back
  ssh sunny "mkdir -p /persist/ethereum/{besu,lighthouse,rocketpool}"

  rsync -av --progress /backup/sunny-besu-data/ sunny:/persist/ethereum/besu/
  rsync -av --progress /backup/sunny-lighthouse-data/ sunny:/persist/ethereum/lighthouse/
  # Rocketpool data copied later
  ```

**5. Create sunny NixOS config (container-based)**
  ```nix
  # hosts/sunny/default.nix
  { config, pkgs, ... }:
  {
    imports = [
      ../../common/encrypted-btrfs-layout.nix
      ../../common/global
      ./hardware.nix
    ];

    networking.hostName = "sunny";

    # NO cluster-node import - standalone for now
    # Can add to quorum later if desired

    # Container runtime
    virtualisation.podman = {
      enable = true;
      dockerCompat = true; # Provides 'docker' command
      # docker-compose (used for rocketpool below) talks to the docker
      # socket, so expose podman's docker-compatible socket as well
      dockerSocket.enable = true;
      defaultNetwork.settings.dns_enabled = true;
    };

    # Besu execution client (container)
    virtualisation.oci-containers.containers.besu = {
      image = "hyperledger/besu:latest";
      volumes = [
        "/persist/ethereum/besu:/var/lib/besu"
      ];
      ports = [
        "8545:8545" # HTTP RPC
        "8546:8546" # WebSocket RPC
        "30303:30303" # P2P
      ];
      # The engine API (8551) is deliberately not published; lighthouse
      # reaches it over the container network
      cmd = [
        "--data-path=/var/lib/besu"
        "--rpc-http-enabled=true"
        "--rpc-http-host=0.0.0.0"
        "--rpc-ws-enabled=true"
        "--rpc-ws-host=0.0.0.0"
        "--engine-rpc-enabled=true"
        "--engine-host-allowlist=*"
        "--engine-jwt-secret=/var/lib/besu/jwt.hex"
        # Add other besu flags as needed
      ];
      autoStart = true;
    };

    # Lighthouse beacon client (container)
    virtualisation.oci-containers.containers.lighthouse-beacon = {
      image = "sigp/lighthouse:latest";
      volumes = [
        "/persist/ethereum/lighthouse:/data"
        "/persist/ethereum/besu/jwt.hex:/jwt.hex:ro"
      ];
      ports = [
        "5052:5052" # HTTP API
        "9000:9000" # P2P
      ];
      cmd = [
        "lighthouse"
        "beacon"
        "--datadir=/data"
        "--http"
        "--http-address=0.0.0.0"
        "--execution-endpoint=http://besu:8551"
        "--execution-jwt=/jwt.hex"
        # Add other lighthouse flags
      ];
      dependsOn = [ "besu" ];
      autoStart = true;
    };

    # Rocketpool stack (podman-compose for multi-container setup)
    # TODO: This requires converting docker-compose to NixOS config
    # For now, can run docker-compose via systemd service
    systemd.services.rocketpool = {
      description = "Rocketpool Smartnode Stack";
      # oci-containers units are named podman-<name>.service
      after = [ "podman-besu.service" "podman-lighthouse-beacon.service" ];
      wantedBy = [ "multi-user.target" ];

      serviceConfig = {
        Type = "oneshot";
        RemainAfterExit = "yes";
        WorkingDirectory = "/persist/ethereum/rocketpool";
        ExecStart = "${pkgs.docker-compose}/bin/docker-compose up -d";
        ExecStop = "${pkgs.docker-compose}/bin/docker-compose down";
      };
    };

    # Ethereum data lives directly under /persist/ethereum (see step 4),
    # so no impermanence mapping is needed for it

    # Firewall for ethereum
    networking.firewall = {
      allowedTCPPorts = [
        30303 # Besu P2P
        9000 # Lighthouse P2P
        # Add rocketpool ports
      ];
      allowedUDPPorts = [
        30303 # Besu P2P
        9000 # Lighthouse P2P
      ];
    };
  }
  ```

**6. Setup rocketpool docker-compose on NixOS**
  ```bash
  # After NixOS is running, restore rocketpool config
  ssh sunny "mkdir -p /persist/ethereum/rocketpool"

  # Copy rocketpool data (wallet, keys, config)
  rsync -av /backup/sunny-rocketpool-data/ sunny:/persist/ethereum/rocketpool/

  # Create docker-compose.yml for rocketpool stack
  # Based on rocketpool hybrid mode docs
  # This runs: validator, node software, MEV-boost, prometheus, etc.
  # Connects to your besu + lighthouse containers
  ```

**7. 
Deploy and test**
  ```bash
  deploy -s '.#sunny'

  # Verify containers are running
  ssh sunny "podman ps"

  # Check besu sync status
  ssh sunny "curl -X POST -H 'Content-Type: application/json' --data '{\"jsonrpc\":\"2.0\",\"method\":\"eth_syncing\",\"params\":[],\"id\":1}' http://localhost:8545"

  # Check lighthouse sync status
  ssh sunny "curl http://localhost:5052/eth/v1/node/syncing"

  # Monitor rocketpool
  ssh sunny "cd /persist/ethereum/rocketpool && docker-compose logs -f"
  ```

**8. Monitor and stabilize**
  - Ethereum should resume from where it left off (not resync!)
  - Validation will resume once the beacon is synced
  - May have missed a few attestations during migration (minor penalty)

---

#### Stage 2: Native NixOS Services (Future)
**Duration: TBD (do this later when time permits)**
**Goal:** Convert to native NixOS services using ethereum-nix

**Why defer this:**
- Complex (rocketpool not fully packaged for Nix)
- Current container setup works fine
- Can migrate incrementally (besu → native, then lighthouse, etc.)
- No downtime once Stage 1 is stable

**When ready:**
1. Research ethereum-nix support for besu + lighthouse + rocketpool
2. Test on separate machine first
3. Migrate one service at a time with minimal downtime
4. Document in separate migration plan

**For now:** Stage 1 gets sunny on NixOS with base configs, managed declaratively, just using containers instead of native services.

### Phase 6: Verification and cleanup
**Duration: 1 hour**

1. **Test failover procedure** (see Failover Procedures below)

2. **Verify backups are working**
   ```bash
   kopia snapshot list
   # Check that /persist/services is being backed up
   ```

3. **Update documentation**
   - Update README.md
   - Document new architecture
   - Update stateful-commands.txt

4. **Clean up old GlusterFS data**
   ```bash
   # Only after verifying everything works!
   for host in c1 c2 c3; do
     ssh $host "sudo rm -rf /persist/glusterfs"
   done
   ```

---

## Service Catalog

**Legend:**
- **Priority**: CRITICAL (must be up) / HIGH (important) / MEDIUM (nice to have) / LOW (can wait) / DEPRECATED (no longer used; decommission instead of migrating)
- **Target**: Where it should run (constraint or affinity)
- **Data**: What data it needs access to
- **Changes**: What needs updating in the .hcl file

### Core Infrastructure

#### mysql
- **File**: `services/mysql.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/mysql`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/mysql` (NFS from zippy)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/mysql` → `/data/services/mysql`
  - ✏️ Add affinity:
    ```hcl
    affinity {
      attribute = "${node.unique.name}"
      value = "zippy"
      weight = 100
    }
    ```
  - ✏️ Add constraint to allow fallback:
    ```hcl
    constraint {
      attribute = "${node.unique.name}"
      operator = "regexp"
      value = "zippy|c1|c2"
    }
    ```
- **Notes**: Core database, needs to stay up. Consul DNS `mysql.service.consul` unchanged.
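Putting the pieces together, a minimal sketch of how the new volume path, affinity, and fallback constraint might sit in the job file (the datacenter name, image, and container mount target are illustrative, not taken from the actual mysql.hcl):

```hcl
job "mysql" {
  datacenters = ["dc1"] # illustrative

  group "mysql" {
    # Soft preference for zippy; the weight biases scheduling without
    # preventing placement elsewhere
    affinity {
      attribute = "${node.unique.name}"
      value     = "zippy"
      weight    = 100
    }

    # Hard limit to the nodes that can serve /data/services (primary or
    # promoted standby)
    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "zippy|c1|c2"
    }

    task "mysql" {
      driver = "docker"

      config {
        image = "mysql:8" # illustrative
        volumes = [
          "/data/services/mysql:/var/lib/mysql", # NFS-backed host path
        ]
      }
    }
  }
}
```

The same affinity/constraint pair applies to postgres and redis below.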

#### postgres
- **File**: `services/postgres.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/postgres`, `/data/compute/appdata/pgadmin`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/postgres`, `/data/services/pgadmin` (NFS)
- **Changes**:
  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
  - ✏️ Add affinity and constraint (same as mysql)
- **Notes**: Core database for authentik, gitea, plausible, netbox, etc.

#### redis
- **File**: `services/redis.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/redis`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/redis` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/redis` → `/data/services/redis`
  - ✏️ Add affinity and constraint (same as mysql)
- **Notes**: Used by authentik, wordpress. Should co-locate with databases.

#### traefik
- **File**: `services/traefik.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/config/traefik`
- **Target**: Float on c1/c2/c3 (keepalived handles HA)
- **Data**: `/data/services/traefik` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/config/traefik` → `/data/services/traefik`
- **Notes**: Reverse proxy, has keepalived for VIP failover. Critical for all web access.

#### authentik
- **File**: `services/authentik.hcl`
- **Priority**: CRITICAL
- **Current**: No persistent volumes (stateless, uses postgres/redis)
- **Target**: Float on c1/c2/c3
- **Data**: None (uses postgres.service.consul, redis.service.consul)
- **Changes**: None needed
- **Notes**: SSO for most services. Must stay up.

### Monitoring Stack

#### prometheus
- **File**: `services/prometheus.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/prometheus`
- **Target**: Constrain to fractal (large time-series data on spinning disks; see placement strategy above)
- **Data**: `/data/services/prometheus` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/prometheus` → `/data/services/prometheus`
  - ✏️ Add constraint pinning it to fractal (same form as media.hcl)
- **Notes**: Metrics database. Important for monitoring but not critical for services.

#### grafana
- **File**: `services/grafana.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/grafana`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/grafana` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/grafana` → `/data/services/grafana`
- **Notes**: Monitoring UI. Small data (dashboards/config); depends on prometheus.

#### loki
- **File**: `services/loki.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/loki`
- **Target**: Constrain to fractal (large log data on spinning disks; see placement strategy above)
- **Data**: `/data/services/loki` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/loki` → `/data/services/loki`
  - ✏️ Add constraint pinning it to fractal (same form as media.hcl)
- **Notes**: Log aggregation. Important for debugging.

#### vector
- **File**: `services/vector.hcl`
- **Priority**: MEDIUM
- **Current**: No persistent volumes, type=system (runs on all nodes)
- **Target**: System job (runs everywhere)
- **Data**: None (ephemeral logs, ships to loki)
- **Changes**:
  - ❓ Check if the glusterfs log path is still needed: `/var/log/glusterfs:/var/log/glusterfs:ro`
  - ✏️ Remove glusterfs log collection after GlusterFS is removed
- **Notes**: Log shipper. Can tolerate downtime.
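Most of the catalog entries are the same mechanical path substitution, so a bulk rewrite is worth sketching (run from the repo root and review `git diff` before deploying; a few jobs such as wordpress and media need hand edits beyond this):

```bash
# Rewrite the common path prefixes across all job specs. The appdata/config
# levels are dropped entirely, per the path simplification note above.
sed -i \
  -e 's|/data/compute/appdata/|/data/services/|g' \
  -e 's|/data/compute/config/|/data/services/|g' \
  -e 's|/data/sync/wordpress|/data/services/wordpress|g' \
  services/*.hcl
```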

### Databases (Specialized)

#### clickhouse
- **File**: `services/clickhouse.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/clickhouse`
- **Target**: Constrain to fractal (large time-series data on spinning disks; see placement strategy above)
- **Data**: `/data/services/clickhouse` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/clickhouse` → `/data/services/clickhouse`
  - ✏️ Add constraint pinning it to fractal (same form as media.hcl)
- **Notes**: Used by plausible. Large time-series data. Important but can be recreated.

#### mongodb
- **File**: `services/unifi.hcl` (embedded in unifi job)
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/unifi/mongodb`
- **Target**: Floats with unifi
- **Data**: `/data/services/unifi/mongodb` (NFS)
- **Changes**: See unifi below
- **Notes**: Only used by unifi. Should stay with the unifi controller.

### Web Applications

#### wordpress
- **File**: `services/wordpress.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/sync/wordpress` (syncthing-managed to avoid slow GlusterFS)
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/wordpress` (NFS from zippy)
- **Changes**:
  - ✏️ Volume path: `/data/sync/wordpress` → `/data/services/wordpress`
  - 📋 **Before cutover**: Copy data from syncthing to zippy: `rsync -av /data/sync/wordpress/ zippy:/persist/services/wordpress/` (see the cutover sketch below)
  - 📋 **After migration**: Remove syncthing configuration for wordpress sync
- **Notes**: Production website. Important but can tolerate brief downtime during migration.

#### ghost
- **File**: `services/ghost.hcl`
- **Priority**: DEPRECATED - no longer used, should be wiped
- **Current**: Uses `/data/compute/appdata/ghost`
- **Target**: n/a (job will be removed)
- **Data**: n/a
- **Changes**: None - remove the job and wipe the data instead of migrating it
- **Notes**: Blog platform (alo.land). Decommission rather than migrate.

#### gitea
- **File**: `services/gitea.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/gitea/data`, `/data/compute/appdata/gitea/config`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/gitea/*` (NFS)
- **Changes**:
  - ✏️ Volume paths: `/data/compute/appdata/gitea/*` → `/data/services/gitea/*`
- **Notes**: Git server. Contains code repositories. Important.

#### wiki (tiddlywiki)
- **File**: `services/wiki.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/wiki` via host volume mount
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/wiki` (NFS)
- **Changes**:
  - ✏️ Volume mount path in `volume_mount` blocks
  - ⚠️ Uses `exec` driver with host volumes - verify NFS mount works with this
- **Notes**: Multiple tiddlywiki instances. Personal wikis. Can tolerate downtime.

#### code-server
- **File**: `services/code-server.hcl`
- **Priority**: LOW
- **Current**: Uses `/data/compute/appdata/code`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/code` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/code` → `/data/services/code`
- **Notes**: Web IDE. Low priority, for development only.
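The wordpress entry above is the one migration with an extra data hop; a sketch of the full cutover sequence, assuming the job file has already been updated to the new path:

```bash
# Stop the job so nothing writes to the syncthing copy during the final sync
nomad job stop wordpress

# Final copy from the syncthing-managed directory to zippy's primary storage
rsync -av --progress /data/sync/wordpress/ zippy:/persist/services/wordpress/

# Redeploy with the volume now pointing at /data/services/wordpress
nomad job run services/wordpress.hcl

# Once verified, remove the wordpress folder from syncthing's config on the
# nodes that currently sync it
```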

#### beancount (fava)
- **File**: `services/beancount.hcl`
- **Priority**: MEDIUM
- **Current**: Uses `/data/compute/appdata/beancount`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/beancount` (NFS)
- **Changes**:
  - ✏️ Volume path: `/data/compute/appdata/beancount` → `/data/services/beancount`
- **Notes**: Finance tracking.

#### adminer
- **File**: `services/adminer.hcl`
- **Priority**: LOW
- **Current**: Stateless
- **Target**: Float on c1/c2/c3
- **Data**: None
- **Changes**: None needed
- **Notes**: Database admin UI. Only needed for maintenance.

#### plausible
- **File**: `services/plausible.hcl`
- **Priority**: HIGH
- **Current**: Stateless (uses postgres and clickhouse)
- **Target**: Float on c1/c2/c3
- **Data**: None (uses postgres.service.consul, clickhouse.service.consul)
- **Changes**: None needed
- **Notes**: Website analytics.

#### evcc
- **File**: `services/evcc.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/evcc/evcc.yaml`, `/data/compute/appdata/evcc/evcc`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/evcc/*` (NFS)
- **Changes**:
  - ✏️ Volume paths: `/data/compute/appdata/evcc/*` → `/data/services/evcc/*`
- **Notes**: EV charging controller. Important for daily use.

#### vikunja
- **File**: `services/vikunja.hcl` (assumed to exist based on README)
- **Priority**: DEPRECATED - no longer used, should be deleted
- **Current**: Likely uses `/data/compute/appdata/vikunja`
- **Target**: n/a (job will be removed)
- **Data**: n/a
- **Changes**: None - delete the job and its data instead of migrating
- **Notes**: Task management. Decommission rather than migrate.

#### leantime
- **File**: `services/leantime.hcl`
- **Priority**: DEPRECATED - no longer used, should be deleted
- **Current**: Likely uses `/data/compute/appdata/leantime`
- **Target**: n/a (job will be removed)
- **Data**: n/a
- **Changes**: None - delete the job and its data instead of migrating
- **Notes**: Project management. Decommission rather than migrate.

### Network Infrastructure

#### unifi
- **File**: `services/unifi.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/unifi/data`, `/data/compute/appdata/unifi/mongodb`
- **Target**: Float on c1/c2/c3/fractal/zippy
- **Data**: `/data/services/unifi/*` (NFS)
- **Changes**:
  - ✏️ Volume paths: `/data/compute/appdata/unifi/*` → `/data/services/unifi/*`
- **Notes**: UniFi network controller. Critical for network management. Has keepalived VIP for stable inform address. Floating is fine.

### Media Stack

#### media (radarr, sonarr, bazarr, plex, qbittorrent)
- **File**: `services/media.hcl`
- **Priority**: MEDIUM
- **Current**: Uses `/data/compute/appdata/radarr`, `/data/compute/appdata/sonarr`, etc. and `/data/media`
- **Target**: **MUST run on fractal** (local /data/media access)
- **Data**:
  - `/data/services/radarr` etc. (NFS) - config data
  - `/data/media` (local disk on fractal; other nodes only see it via CIFS)
- **Changes**:
  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
  - ✏️ **Add constraint**:
    ```hcl
    constraint {
      attribute = "${node.unique.name}"
      value = "fractal"
    }
    ```
- **Notes**: Heavy I/O to /data/media. Must run on fractal for performance. Has keepalived VIP.
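After redeploying, a quick way to confirm the placement rules above actually took effect (job names as in this catalog; the Node IDs in the output map to hostnames via `nomad node status`):

```bash
# media/prometheus/loki/clickhouse should land on fractal; the databases
# should normally land on zippy
for job in media prometheus loki clickhouse mysql postgres redis; do
  echo "== $job"
  nomad job status "$job" | awk '/^Allocations/,0'
done

# Map the Node IDs printed above to hostnames
nomad node status
```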

### Utility Services

#### weewx
- **File**: `services/weewx.hcl`
- **Priority**: HIGH
- **Current**: Likely uses `/data/compute/appdata/weewx`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/weewx` (NFS)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/weewx`
- **Notes**: Weather station.

#### maps
- **File**: `services/maps.hcl`
- **Priority**: MEDIUM
- **Current**: Likely uses `/data/compute/appdata/maps`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/maps` (NFS) plus `/data/shared` (CIFS from fractal)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/maps`; ensure `/data/shared` is mounted (per the Q&A below, maps needs /data/shared, not /data/media)
- **Notes**: Map tiles.

#### netbox
- **File**: `services/netbox.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/netbox`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/netbox` (NFS)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/netbox`
- **Notes**: IPAM/DCIM. Low priority, for documentation.

#### farmos
- **File**: `services/farmos.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/farmos`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/farmos` (NFS)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/farmos`
- **Notes**: Farm management. Low priority.

#### urbit
- **File**: `services/urbit.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/urbit`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/urbit` (NFS)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/urbit`
- **Notes**: Urbit node. Experimental, low priority.

#### webodm
- **File**: `services/webodm.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/webodm`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/webodm` (NFS)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/webodm`
- **Notes**: Drone imagery processing. Low priority. Does not need /data/media (confirmed in the Q&A below).

#### velutrack
- **File**: `services/velutrack.hcl`
- **Priority**: LOW
- **Current**: Likely minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Vehicle tracking. Low priority.

#### resol-gateway
- **File**: `services/resol-gateway.hcl`
- **Priority**: HIGH
- **Current**: Likely minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Solar thermal controller.

#### igsync
- **File**: `services/igsync.hcl`
- **Priority**: MEDIUM
- **Current**: Likely uses `/data/compute/appdata/igsync`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/igsync` (NFS); does not need /data/media (confirmed in the Q&A below)
- **Changes**:
  - ✏️ Volume paths: Update to `/data/services/igsync`
- **Notes**: Instagram sync. Low priority.

#### jupyter
- **File**: `services/jupyter.hcl`
- **Priority**: LOW
- **Current**: Stateless or minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Notebook server. Low priority, for experimentation.
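Several entries above and below are hedged with "likely uses ..."; rather than guessing, the actual paths can be pulled straight from the job specs:

```bash
# List every old-path reference in the job files to confirm which jobs still
# point at GlusterFS/syncthing paths (run from the repo root)
grep -n -E '/data/(compute|sync|media|shared)' services/*.hcl | sort
```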

#### whoami
- **File**: `services/whoami.hcl`
- **Priority**: LOW
- **Current**: Stateless
- **Target**: Float on c1/c2/c3
- **Data**: None
- **Changes**: None needed
- **Notes**: Test service. Can be stopped during migration.

#### tiddlywiki (if separate from wiki.hcl)
- **File**: `services/tiddlywiki.hcl`
- **Priority**: MEDIUM
- **Current**: Likely same as wiki.hcl
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/tiddlywiki` (NFS)
- **Changes**: Same as wiki.hcl
- **Notes**: May be a duplicate of wiki.hcl.

### Backup Jobs

#### mysql-backup
- **File**: `services/mysql-backup.hcl`
- **Priority**: HIGH
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
  - ✏️ Verify backup destination, should be `/data/shared/backups/mysql`
- **Notes**: Important for disaster recovery. Should run regularly.

#### postgres-backup
- **File**: `services/postgres-backup.hcl`
- **Priority**: HIGH
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
  - ✏️ Verify backup destination, should be `/data/shared/backups/postgres`
- **Notes**: Important for disaster recovery. Should run regularly.

#### wordpress-backup
- **File**: `services/wordpress-backup.hcl`
- **Priority**: MEDIUM
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
  - ✏️ Verify backup destination
- **Notes**: Periodic backup job.

---

## Failover Procedures

### NFS Server Failover (zippy → c1 or c2)

**When to use:** zippy is down and not coming back soon

**Prerequisites:**
- c1 and c2 have been receiving btrfs snapshots from zippy
- Last successful replication is recent - with the 5-minute schedule it should be under 10 minutes old (verify timestamps)

**Procedure:**

1. **Choose standby node** (c1 or c2)
   ```bash
   # Check replication freshness (snapshot names embed their timestamp,
   # so the name sort is the freshness order)
   ssh c1 "ls /persist/services-standby/ | sort | tail -3"
   ssh c2 "ls /persist/services-standby/ | sort | tail -3"

   # Choose the one with the most recent snapshot
   # For this example, we'll use c1
   ```

2. **On standby node (c1), promote standby to primary**
   ```bash
   ssh c1

   # Stop NFS client mount (if running)
   sudo systemctl stop data-services.mount

   # Find latest received snapshot
   LATEST=$(ls -d /persist/services-standby/services@* | sort | tail -1)

   # Create writable subvolume from the read-only snapshot
   sudo btrfs subvolume snapshot $LATEST /persist/services

   # Verify
   ls -la /persist/services
   ```

3. **Deploy c1-nfs-server configuration**
   ```bash
   # From your workstation
   deploy -s '.#c1-nfs-server'

   # This activates:
   # - NFS server on c1
   # - Consul service registration for "services"
   # - Firewall rules
   ```

4. **On c1, verify NFS is running**
   ```bash
   ssh c1
   sudo systemctl status nfs-server
   showmount -e localhost
   dig @localhost -p 8600 services.service.consul # Should show c1's IP
   ```

5. **On other nodes, remount NFS**
   ```bash
   # Nodes should auto-remount via Consul DNS, but you can force it
   # (zippy is down and rejoins the mount when it comes back):
   for host in c2 c3 fractal; do
     ssh $host "sudo systemctl restart data-services.mount"
   done
   ```

6. **Verify Nomad jobs are healthy**
   ```bash
   nomad job status mysql
   nomad job status postgres
   # Check all critical services
   ```

7. 
**Update monitoring/alerts**
   - Note in documentation that c1 is now primary NFS server
   - Set up alert to remember to fail back to zippy when it's repaired

**Recovery Time Objective (RTO):** ~10-15 minutes

**Recovery Point Objective (RPO):** Last snapshot interval (**5 minutes** max)

### Failing Back to zippy

**When to use:** zippy is repaired and ready to resume primary role

**Procedure:**

1. **Sync data from c1 back to zippy**
   ```bash
   # On c1 (current primary)
   sudo btrfs subvolume snapshot -r /persist/services /persist/services@failback-$(date +%Y%m%d-%H%M%S)
   FAILBACK=$(ls -d /persist/services@failback-* | sort | tail -1)

   # Move zippy's stale pre-failure subvolume out of the way first
   ssh zippy "sudo mv /persist/services /persist/services-stale"

   sudo btrfs send $FAILBACK | ssh zippy "sudo btrfs receive /persist/"

   # On zippy, make it writable
   ssh zippy "sudo btrfs subvolume snapshot /persist/$(basename $FAILBACK) /persist/services"

   # The incremental chain no longer matches after a failover; delete the old
   # snapshots on both ends so replication restarts with a clean full send
   ssh zippy 'sudo btrfs subvolume delete /persist/services@* || true'
   ssh c1 'sudo btrfs subvolume delete /persist/services-standby/services@* || true'
   ssh c2 'sudo btrfs subvolume delete /persist/services-standby/services@* || true'

   # Once everything is verified, the stale copy can be deleted:
   # ssh zippy "sudo btrfs subvolume delete /persist/services-stale"
   ```

2. **Deploy zippy back to NFS server role**
   ```bash
   deploy -s '.#zippy'
   # Consul will register services.service.consul → zippy again
   # The first replication run after failback will be a full send
   ```

3. **Demote c1 back to standby**
   ```bash
   deploy -s '.#c1'
   # This removes NFS server, restores NFS client mount
   ```

4. **Verify all nodes are mounting from zippy**
   ```bash
   dig @c1 -p 8600 services.service.consul # Should show zippy's IP

   for host in c1 c2 c3 fractal; do
     ssh $host "df -h | grep services"
   done
   ```

### Database Job Failover (automatic via Nomad)

**When to use:** zippy is down, database jobs need to run elsewhere

**What happens automatically:**
1. Nomad detects zippy is unhealthy
2. Jobs with constraint `zippy|c1|c2` are rescheduled to c1 or c2
3. Jobs start on new node, accessing `/data/services` (now via NFS from the promoted standby)

**Manual intervention needed:**
- None if NFS failover completed successfully
- If jobs are stuck: `nomad job stop mysql && nomad job run services/mysql.hcl`

**What to check:**
```bash
nomad job status mysql
nomad job status postgres
nomad job status redis

# The Allocations table in each status output shows where each instance
# landed - verify c1 or c2, not zippy (map Node IDs via `nomad node status`)
```

### Complete Cluster Failure (lose quorum)

**Scenario:** 3 or more servers go down, quorum lost

**Prevention:** This is why we have 5 servers (need 3 for quorum)

**Recovery:**
1. **Bring up at least 3 servers** (any 3 of c1, c2, c3, fractal, zippy)
2. **If that's not possible, recover the raft state manually:**
   ```bash
   # On one surviving server, remove the dead peers
   consul force-leave <dead-node-name>
   nomad operator raft list-peers
   nomad operator raft remove-peer -peer-address=<dead-peer-address>

   # Worst case, follow the documented peers.json outage-recovery
   # procedure for Consul (and its Nomad equivalent) to force a new quorum
   ```
3. **Restore from backups** (worst case)

---

## Post-Migration Verification Checklist

- [ ] All 5 servers in quorum: `consul members` shows c1, c2, c3, fractal, zippy
- [ ] NFS mounts working: `df -h | grep services` on all nodes
- [ ] Btrfs replication running: Check systemd timers on zippy
- [ ] Critical services up: mysql, postgres, redis, traefik, authentik
- [ ] Monitoring working: Prometheus, Grafana, Loki accessible
- [ ] Media stack on fractal: `nomad job status media` shows its allocations on fractal
- [ ] Database jobs on zippy: `nomad job status mysql` (etc.) shows allocations on zippy
- [ ] Consul DNS working: `dig @localhost -p 8600 services.service.consul`
- [ ] Backups running: Kopia snapshots include `/persist/services`
- [ ] GlusterFS removed: No glusterfs processes, volumes deleted
- [ ] Documentation updated: README.md, architecture diagrams

---

## Rollback Plan

**If migration fails catastrophically:**

1. **Stop all new Nomad jobs**
   ```bash
   # Stop (and purge) each migrated job
   nomad job status | awk 'NR>1 {print $1}' | xargs -I {} nomad job stop -purge {}
   ```

2. 
**Restore GlusterFS mounts**
   ```bash
   # On all nodes, re-enable GlusterFS client
   deploy # With old configs
   ```

3. **Restart old Nomad jobs**
   ```bash
   # With old paths pointing to /data/compute (old versions from git)
   for f in services/*.hcl; do nomad run "$f"; done
   ```

4. **Restore data if needed**
   ```bash
   rsync -av /backup/compute-pre-migration/ /data/compute/
   ```

**Important:** Keep GlusterFS running until Phase 4 is complete and verified!

---

## Questions Answered

1. ✅ **Where is `/data/sync/wordpress` mounted from?**
   - **Answer**: Syncthing-managed to avoid slow GlusterFS
   - **Action**: Migrate to `/data/services/wordpress`, remove syncthing config

2. ✅ **Which services use `/data/media` directly?**
   - **Answer**: Only media.hcl (radarr, sonarr, plex, qbittorrent)
   - **Action**: Constrain media.hcl to fractal; everything else uses the CIFS mount

3. ✅ **Do we want unifi on fractal or floating?**
   - **Answer**: Floating is fine
   - **Action**: No constraint needed

4. ✅ **What's the plan for sunny's existing data?**
   - **Answer**: Ethereum data stays local, not replicated (too expensive)
   - **Action**: Back up and restore it during the NixOS conversion - do not resync from the network, which would take weeks (see Phase 5)

## Questions Still to Answer

1. **Backup retention for btrfs snapshots?**
   - Current plan: Keep 24 hours of snapshots on zippy
   - Is this enough? Or do we want more for safety?
   - This should be fine -- snapshots are just for hot recovery. More/older backups are kept via kopia on fractal.

2. **c1-nfs-server vs c1 config - same host, different configs?**
   - Recommendation: Use same hostname, different flake output
   - `c1` = normal config with NFS client
   - `c1-nfs-server` = variant with NFS server enabled
   - Both in flake.nix, deploy appropriate one based on role
   - Answer: recommendation makes sense (see the sketch at the end of this document).

3. **Should we verify webodm, igsync, maps don't need /data/media access?**
   - Neither webodm nor igsync needs /data/media
   - maps needs /data/shared

---

## Timeline Estimate

**Total duration: 15-22 hours** (can be split across multiple sessions), plus 6-10 hours for the optional sunny conversion

- Phase 0 (Prep): 1-2 hours
- Phase 1 (fractal): 6-8 hours
- Phase 2 (zippy storage): 2-3 hours
- Phase 3 (GlusterFS → NFS): 3-4 hours
- Phase 4 (Nomad jobs): 2-4 hours
- Phase 5 (sunny): 6-10 hours (optional, can be done later)
- Phase 6 (Cleanup): 1 hour

**Suggested schedule:**
- **Day 1**: Phases 0-1 (fractal conversion, establish quorum)
- **Day 2**: Phases 2-3 (zippy storage, data migration)
- **Day 3**: Phase 4 (Nomad job updates and deployment)
- **Day 4**: Phases 5-6 (sunny + cleanup) or take a break and do later

**Maintenance windows needed:**
- Phase 3: ~1 hour downtime (all services stopped during data migration)
- Phase 4: Rolling (services come back up as redeployed)
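
---

For question 2 in "Questions Still to Answer", a minimal sketch of the two-outputs-per-host idea; the `nfs-server-promoted.nix` module name and the bare `nixosSystem` calls are illustrative, not the repo's actual flake helpers:

```nix
# flake.nix (excerpt)
nixosConfigurations = {
  # Normal role: NFS client mounting services.service.consul
  c1 = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./hosts/c1 ];
  };

  # Failover role: same host config plus the promoted NFS server module
  # (exports /persist/services and registers the Consul "services" service).
  # Deploy with: deploy -s '.#c1-nfs-server'
  "c1-nfs-server" = nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [ ./hosts/c1 ./common/nfs-server-promoted.nix ];
  };
};
```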