# Cluster Architecture Revamp
**Status**: Planning complete, ready for review and refinement
## Key Decisions
**Replication**: 5-minute intervals (incremental btrfs send)
**WordPress**: Currently syncthing → will use `/data/services` via NFS
**Media**: Only media.hcl needs `/data/media`, constrained to fractal
**Unifi**: Floating (no constraint needed)
**Sunny**: Standalone, ethereum data stays local (not replicated)
**Quorum**: 5 servers (c1, c2, c3, fractal, zippy)
**NFS Failover**: Via Consul DNS (`services.service.consul`)
## Table of Contents
1. [End State Architecture](#end-state-architecture)
2. [Migration Steps](#migration-steps)
3. [Service Catalog](#service-catalog)
4. [Failover Procedures](#failover-procedures)
---
## End State Architecture
### Cluster Topology
**5-Server Quorum (Consul + Nomad server+client):**
- **c1, c2, c3**: Cattle nodes - x86_64, run most stateless workloads
- **fractal**: Storage node - x86_64, 6x spinning drives, runs media workloads
- **zippy**: Stateful anchor - x86_64, runs database workloads (via affinity), primary NFS server
**Standalone Nodes (not in quorum):**
- **sunny**: x86_64, ethereum node + staking, base NixOS configs only
- **chilly**: x86_64, Home Assistant VM, base NixOS configs only
**Quorum Math:**
- 5 servers → quorum requires 3 healthy nodes
- Can tolerate 2 simultaneous failures
- Bootstrap expect: 3
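Once all five servers are enrolled, the quorum math can be sanity-checked from any node. A quick sketch (assumes the `consul` and `nomad` CLIs can reach the local agents):
```bash
consul operator raft list-peers    # expect 5 voters, exactly one leader
nomad server members               # expect 5 alive servers
nomad operator raft list-peers
```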
### Storage Architecture
**Primary Storage (zippy):**
- `/persist/services` - btrfs subvolume
- Contains: mysql, postgres, redis, clickhouse, mongodb, app data
- Exported via NFS to: `services.service.consul:/persist/services`
- Replicated via **btrfs send** to c1 and c2 every **5 minutes** (incremental)
**Standby Storage (c1, c2):**
- `/persist/services-standby` - btrfs subvolume
- Receives replicated snapshots from zippy via incremental btrfs send
- Can be promoted to `/persist/services` and exported as NFS during failover
- Maximum data loss: **5 minutes** (last replication interval)
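A quick replication-freshness check on the standbys, assuming snapshots are received under `/persist/services-standby/` as configured in Phase 2 (snapshot names embed their creation timestamp):
```bash
ssh c1 'ls -dt /persist/services-standby/services@* | head -1'
ssh c2 'ls -dt /persist/services-standby/services@* | head -1'
```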
**Standalone Storage (sunny):**
- `/persist/ethereum` - local btrfs subvolume (or similar)
- Contains: ethereum blockchain data, staking keys
- **NOT replicated** - too large/expensive to replicate full ethereum node
- Backed up via kopia to fractal (if feasible/needed)
**Media Storage (fractal):**
- `/data/media` - existing spinning drive storage
- Exported via Samba (existing)
- Mounted on c1, c2, c3 via CIFS (existing)
- Local access on fractal for media workloads
**Shared Storage (fractal):**
- `/data/shared` - existing spinning drive storage
- Exported via Samba (existing)
- Mounted on c1, c2, c3 via CIFS (existing)
### Network Services
**NFS Primary (zippy):**
```nix
services.nfs.server = {
enable = true;
exports = ''
/persist/services 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
'';
};
services.consul.extraConfig.services = [{
name = "services";
port = 2049;
checks = [{ tcp = "localhost:2049"; interval = "30s"; }];
}];
```
**NFS Client (all nodes):**
```nix
fileSystems."/data/services" = {
device = "services.service.consul:/persist/services";
fsType = "nfs";
options = [ "x-systemd.automount" "noauto" "x-systemd.idle-timeout=60" ];
};
```
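For the `services.service.consul` device to resolve at mount time, each node's system resolver must forward the `.consul` domain to the local Consul DNS interface (port 8600); how that is wired (systemd-resolved, dnsmasq, etc.) is outside this snippet. A quick resolution check, assuming Consul DNS on localhost:8600:
```bash
dig @localhost -p 8600 +short services.service.consul   # direct query against Consul DNS
getent hosts services.service.consul                     # via the system resolver (what mount actually uses)
```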
**Samba Exports (fractal - existing):**
- `//fractal/media` → `/data/media`
- `//fractal/shared` → `/data/shared`
### Nomad Job Placement Strategy
**Affinity-based (prefer zippy, allow c1/c2):**
- mysql, postgres, redis - stateful databases
- Run on zippy normally, can failover to c1/c2 if zippy down
**Constrained (must run on fractal):**
- **media.hcl** - radarr, sonarr, bazarr, plex, qbittorrent
- Reason: Heavy /data/media access, benefits from local storage
- **prometheus.hcl** - metrics database with 30d retention
- Reason: Large time-series data, spinning disks OK, saves SSD space
- **loki.hcl** - log aggregation with 31d retention
- Reason: Large log data, spinning disks OK
- **clickhouse.hcl** - analytics database for plausible
- Reason: Large time-series data, spinning disks OK
**Floating (can run anywhere on c1/c2/c3/fractal/zippy):**
- All other services including:
- traefik, authentik, web apps
- **grafana** (small data, just dashboards/config, queries prometheus for metrics)
- databases (mysql, postgres, redis)
- vector (system job, runs everywhere)
- Nomad schedules based on resources and constraints
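A rough placement audit after redeployment, to confirm constraints and affinities landed where intended (a sketch; assumes the `nomad` CLI is pointed at the cluster):
```bash
nomad job status                                      # all jobs at a glance
nomad job status media | grep -A 10 '^Allocations'    # media allocations should be on fractal
nomad node status -verbose                            # per-node view with attributes
```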
### Data Migration
**Path changes needed in Nomad jobs:**
- `/data/compute/appdata/*` → `/data/services/*`
- `/data/compute/config/*` → `/data/services/*`
- `/data/sync/wordpress` → `/data/services/wordpress`
**No changes needed:**
- `/data/media/*` - stays the same (CIFS mount from fractal, used only by media services)
- `/data/shared/*` - stays the same (CIFS mount from fractal)
**Deprecated after migration:**
- `/data/sync/wordpress` - currently managed by syncthing to avoid slow GlusterFS
- Will be replaced by NFS mount at `/data/services/wordpress`
- Syncthing configuration for this can be removed
- Final sync: copy from syncthing to `/persist/services/wordpress` on zippy before cutover
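A sketch of that final cutover sync (paths from this plan; assumes the Nomad job is named `wordpress` and is stopped first so the files are quiescent):
```bash
nomad job stop wordpress
rsync -av --delete /data/sync/wordpress/ zippy:/persist/services/wordpress/
# redeploy with the /data/services/wordpress volume path in Phase 4
```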
---
## Migration Steps
**Important path simplification note:**
- All service paths use `/data/services/*` directly (not `/data/services/appdata/*`)
- Example: `/data/compute/appdata/mysql` → `/data/services/mysql`
- Simpler, cleaner, easier to manage
### Phase 0: Preparation
**Duration: 1-2 hours**
1. **Backup everything**
```bash
# On all nodes, ensure kopia backups are current
kopia snapshot list
# Backup glusterfs data manually
rsync -av /data/compute/ /backup/compute-pre-migration/
```
2. **Document current state**
```bash
# Save current nomad job list
nomad job status -json > /backup/nomad-jobs-pre-migration.json
# Save consul service catalog
consul catalog services > /backup/consul-services-pre-migration.txt
```
3. **Review this document**
- Verify all services are cataloged
- Confirm priority assignments
- Adjust as needed
### Phase 1: Convert fractal to NixOS
**Duration: 6-8 hours**
**Current state:**
- Proxmox on ZFS
- System pool: `rpool` (~500GB, will be wiped)
- Data pools (preserved):
- `double1` - 3.6T (homes, shared)
- `double2` - 7.2T (backup - kopia repo, PBS)
- `double3` - 17T (media, torrent)
- Services: Samba (homes, shared, media), Kopia server, PBS
- Bind mounts: `/data/{homes,shared,media,torrent}` → ZFS datasets
**Goal:** Fresh NixOS on rpool, preserve data pools, join cluster
#### Step-by-step procedure:
**1. Pre-migration documentation**
```bash
# Capture fractal's ZFS layout (script is built locally, then copied to fractal and run there)
cat > /tmp/detect-zfs.sh << 'EOF'
#!/bin/bash
echo "=== ZFS Pools ==="
zpool status
echo -e "\n=== ZFS Datasets ==="
zfs list -o name,mountpoint,used,avail,mounted -r double1 double2 double3
echo -e "\n=== Bind mounts ==="
cat /etc/fstab | grep double
echo -e "\n=== Data directories ==="
ls -la /data/
echo -e "\n=== Samba users/groups ==="
getent group shared compute
getent passwd compute
EOF
chmod +x /tmp/detect-zfs.sh
scp /tmp/detect-zfs.sh fractal:/tmp/detect-zfs.sh
ssh fractal /tmp/detect-zfs.sh > /backup/fractal-zfs-layout.txt
# Save samba config
scp fractal:/etc/samba/smb.conf /backup/fractal-smb.conf
# Save kopia certs and config
scp -r fractal:~/kopia-certs /backup/fractal-kopia-certs/
scp fractal:~/.config/kopia/repository.config /backup/fractal-kopia-repository.config
# Verify kopia backups are current
ssh fractal "kopia snapshot list --all"
```
**2. Stop services on fractal**
```bash
ssh fractal "systemctl stop smbd nmbd kopia"
# Don't stop PBS yet (in case we need to restore)
```
**3. Install NixOS**
- Boot NixOS installer USB
- **IMPORTANT**: Do NOT touch double1, double2, double3 during install!
- Install only on `rpool` (or create new pool if needed)
```bash
# In NixOS installer
# Option A: Reuse rpool (wipe and recreate)
zpool destroy rpool
# Option B: Use different disk if available
# Then follow standard NixOS btrfs install on that disk
```
- Use standard encrypted btrfs layout (matching other hosts)
- Minimal install first, will add cluster configs later
**4. First boot - import ZFS pools**
```bash
# SSH into fresh NixOS install
# Import pools (read-only first, to be safe)
zpool import -f -o readonly=on double1
zpool import -f -o readonly=on double2
zpool import -f -o readonly=on double3
# Verify datasets
zfs list -r double1 double2 double3
# Example output should show:
# double1/homes
# double1/shared
# double2/backup
# double3/media
# double3/torrent
# If everything looks good, export and reimport read-write
zpool export double1 double2 double3
zpool import double1
zpool import double2
zpool import double3
# Set ZFS mountpoints (if needed)
# These may already be set from Proxmox
zfs set mountpoint=/double1 double1
zfs set mountpoint=/double2 double2
zfs set mountpoint=/double3 double3
```
**5. Create fractal NixOS configuration**
```nix
# hosts/fractal/default.nix
{ config, pkgs, ... }:
{
imports = [
../../common/encrypted-btrfs-layout.nix
../../common/global
../../common/cluster-node.nix # Consul + Nomad (will add in step 7)
../../common/nomad.nix # Both server and client
./hardware.nix
];
networking.hostName = "fractal";
# ZFS support
boot.supportedFilesystems = [ "zfs" ];
boot.zfs.extraPools = [ "double1" "double2" "double3" ];
# boot.zfs.extraPools generates zfs-import-<pool> units, so no extra import service is needed;
# the bind mounts below order themselves after zfs-mount.service
# Bind mounts for /data (matching Proxmox setup)
fileSystems."/data/homes" = {
device = "/double1/homes";
fsType = "none";
options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
};
fileSystems."/data/shared" = {
device = "/double1/shared";
fsType = "none";
options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
};
fileSystems."/data/media" = {
device = "/double3/media";
fsType = "none";
options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
};
fileSystems."/data/torrent" = {
device = "/double3/torrent";
fsType = "none";
options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
};
fileSystems."/backup" = {
device = "/double2/backup";
fsType = "none";
options = [ "bind" "x-systemd.requires=zfs-mount.service" ];
};
# Create data directory structure
systemd.tmpfiles.rules = [
"d /data 0755 root root -"
];
# Users and groups for samba
users.groups.shared = { gid = 1001; };
users.groups.compute = { gid = 1002; };
users.users.compute = {
isSystemUser = true;
uid = 1002;
group = "compute";
};
# Ensure ppetru is in shared group
users.users.ppetru.extraGroups = [ "shared" ];
# Samba server
services.samba = {
enable = true;
openFirewall = true;
extraConfig = ''
workgroup = WORKGROUP
server string = fractal
netbios name = fractal
security = user
map to guest = bad user
'';
shares = {
homes = {
comment = "Home Directories";
browseable = "no";
path = "/data/homes/%S";
"read only" = "no";
};
shared = {
path = "/data/shared";
"read only" = "no";
browseable = "yes";
"guest ok" = "no";
"create mask" = "0775";
"directory mask" = "0775";
"force group" = "+shared";
};
media = {
path = "/data/media";
"read only" = "no";
browseable = "yes";
"guest ok" = "no";
"create mask" = "0755";
"directory mask" = "0755";
};
};
};
# Kopia backup server
systemd.services.kopia-server = {
description = "Kopia Backup Server";
wantedBy = [ "multi-user.target" ];
after = [ "network.target" "zfs-mount.service" ];
serviceConfig = {
User = "ppetru";
Group = "users";
ExecStart = ''
${pkgs.kopia}/bin/kopia server start \
--address 0.0.0.0:51515 \
--tls-cert-file /home/ppetru/kopia-certs/kopia.cert \
--tls-key-file /home/ppetru/kopia-certs/kopia.key
'';
Restart = "on-failure";
};
};
# Kopia nightly snapshot (from cron)
systemd.services.kopia-snapshot = {
description = "Kopia snapshot of homes and shared";
serviceConfig = {
Type = "oneshot";
User = "ppetru";
Group = "users";
ExecStart = ''
${pkgs.kopia}/bin/kopia --config-file=/home/ppetru/.config/kopia/repository.config \
snapshot create /data/homes /data/shared \
--log-level=warning --no-progress
'';
};
};
systemd.timers.kopia-snapshot = {
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "22:47";
Persistent = true;
};
};
# Keep kopia config and certs persistent
environment.persistence."/persist" = {
directories = [
"/home/ppetru/.config/kopia"
"/home/ppetru/kopia-certs"
];
};
networking.firewall.allowedTCPPorts = [
139 445 # Samba
51515 # Kopia
];
networking.firewall.allowedUDPPorts = [
137 138 # Samba
];
}
```
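One thing the declarative config above cannot cover: Samba passwords live in Samba's own passdb, so they must be re-added imperatively after the first deploy. A sketch (user names taken from the config above; add any other SMB accounts as needed):
```bash
ssh fractal 'sudo smbpasswd -a ppetru'    # prompts for the share password
# repeat for other accounts that log in over SMB
```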
**6. Deploy initial config (without cluster)**
```bash
# First, deploy without cluster-node.nix to verify storage works
# Comment out cluster-node import temporarily
deploy -s '.#fractal'
# Verify mounts
ssh fractal "df -h | grep data"
ssh fractal "ls -la /data/"
# Test samba
smbclient -L fractal -U ppetru
# Test kopia
ssh fractal "systemctl status kopia-server"
```
**7. Join cluster (add to quorum)**
```bash
# Uncomment cluster-node.nix import in fractal config
# Update all cluster configs for 5-server quorum
# (See step 8 below)
deploy # Deploy to all nodes
# Verify quorum
consul members
nomad server members
```
**8. Update cluster configs for 5-server quorum**
```nix
# common/consul.nix
servers = ["c1" "c2" "c3" "fractal" "zippy"];
bootstrap_expect = 3;
# common/nomad.nix
servers = ["c1" "c2" "c3" "fractal" "zippy"];
bootstrap_expect = 3;
```
**9. Verify fractal is fully operational**
```bash
# Check all services
ssh fractal "systemctl status samba kopia-server kopia-snapshot.timer"
# Verify ZFS pools
ssh fractal "zpool status"
ssh fractal "zfs list"
# Test accessing shares from another node
ssh c1 "ls /data/media /data/shared"
# Verify kopia clients can still connect
kopia repository status --server=https://fractal:51515
# Check nomad can see fractal
nomad node status | grep fractal
# Verify quorum
consul members # Should see c1, c2, c3, fractal (zippy joins in Phase 2)
nomad server members # Should see 4 servers (5 once zippy joins in Phase 2)
```
### Phase 2: Setup zippy storage layer
**Duration: 2-3 hours**
**Goal:** Prepare zippy for NFS server role, setup replication
1. **Create btrfs subvolume on zippy**
```bash
ssh zippy
sudo btrfs subvolume create /persist/services
sudo chown ppetru:users /persist/services
```
2. **Update zippy configuration**
```nix
# hosts/zippy/default.nix
imports = [
../../common/encrypted-btrfs-layout.nix
../../common/global
../../common/cluster-node.nix # Adds to quorum
../../common/nomad.nix
./hardware.nix
];
# NFS server
services.nfs.server = {
enable = true;
exports = ''
/persist/services 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
'';
};
# Consul service registration for NFS
services.consul.extraConfig.services = [{
name = "services";
port = 2049;
checks = [{ tcp = "localhost:2049"; interval = "30s"; }];
}];
# Btrfs replication to standbys (incremental after first full send)
systemd.services.replicate-to-c1 = {
description = "Replicate /persist/services to c1";
script = ''
SNAP=/persist/services@$(date +%Y%m%d-%H%M%S)
${pkgs.btrfs-progs}/bin/btrfs subvolume snapshot -r /persist/services $SNAP
# Snapshot names embed their timestamp, so a name sort gives chronological order
LATEST=$(ls -d /persist/services@* | sort | tail -n 1)
PREV=$(ls -d /persist/services@* | sort | tail -n 2 | head -n 1)
if [ "$LATEST" != "$PREV" ]; then
# Incremental send against the previous snapshot
${pkgs.btrfs-progs}/bin/btrfs send -p $PREV $LATEST | ${pkgs.openssh}/bin/ssh c1 "${pkgs.btrfs-progs}/bin/btrfs receive /persist/services-standby/"
else
# First snapshot: full send
${pkgs.btrfs-progs}/bin/btrfs send $LATEST | ${pkgs.openssh}/bin/ssh c1 "${pkgs.btrfs-progs}/bin/btrfs receive /persist/services-standby/"
fi
# Cleanup old snapshots on the sender (keep last 24 hours)
find /persist -maxdepth 1 -name 'services@*' -mtime +1 -exec ${pkgs.btrfs-progs}/bin/btrfs subvolume delete {} \;
'';
};
systemd.timers.replicate-to-c1 = {
wantedBy = [ "timers.target" ];
timerConfig = {
OnCalendar = "*:0/5"; # Every 5 minutes (incremental after first full send)
Persistent = true;
};
};
# Same for c2
systemd.services.replicate-to-c2 = { ... };
systemd.timers.replicate-to-c2 = { ... };
```
3. **Setup standby storage on c1 and c2**
```bash
# On c1 and c2
ssh c1 sudo btrfs subvolume create /persist/services-standby
ssh c2 sudo btrfs subvolume create /persist/services-standby
```
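Before the 5-minute timers can do incremental sends, each standby needs one full send as a parent snapshot. A manual seeding sketch, run from a workstation with ssh access (assumes root or passwordless sudo on all three hosts):
```bash
ssh zippy 'sudo btrfs subvolume snapshot -r /persist/services /persist/services@seed'
ssh zippy 'sudo btrfs send /persist/services@seed' | ssh c1 'sudo btrfs receive /persist/services-standby/'
ssh zippy 'sudo btrfs send /persist/services@seed' | ssh c2 'sudo btrfs receive /persist/services-standby/'
```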
4. **Deploy and verify**
```bash
deploy -s '.#zippy'
# Verify NFS export
showmount -e zippy
# Verify Consul registration
dig @localhost -p 8600 services.service.consul
```
5. **Verify quorum is now 5 servers**
```bash
consul members # Should show c1, c2, c3, fractal, zippy
nomad server members
```
### Phase 3: Migrate from GlusterFS to NFS
**Duration: 3-4 hours**
**Goal:** Move all data, update mounts, remove GlusterFS
1. **Copy data from GlusterFS to zippy**
```bash
# On any node with /data/compute mounted
rsync -av --progress /data/compute/ zippy:/persist/services/
# Verify
ssh zippy du -sh /persist/services
```
2. **Update all nodes to mount NFS**
```nix
# Update common/glusterfs-client.nix → common/nfs-client.nix
# OR update common/cluster-node.nix to import nfs-client instead
fileSystems."/data/services" = {
device = "services.service.consul:/persist/services";
fsType = "nfs";
options = [ "x-systemd.automount" "noauto" "x-systemd.idle-timeout=60" ];
};
# Remove old GlusterFS mount
# fileSystems."/data/compute" = ... # DELETE
```
3. **Deploy updated configs**
```bash
deploy -s '.#c1' '.#c2' '.#c3' '.#fractal' '.#zippy'
```
4. **Verify NFS mounts**
```bash
for host in c1 c2 c3 fractal zippy; do
ssh $host "df -h | grep services"
done
```
5. **Stop all Nomad jobs temporarily**
```bash
# Get list of running jobs
nomad job status | grep running | awk '{print $1}' > /tmp/running-jobs.txt
# Stop all (they'll be restarted with updated paths in Phase 4)
cat /tmp/running-jobs.txt | xargs -I {} nomad job stop {}
```
6. **Remove GlusterFS from cluster**
```bash
# On c1 (or any gluster server)
gluster volume stop compute
gluster volume delete compute
# On all nodes
for host in c1 c2 c3; do
ssh $host "sudo systemctl stop glusterd; sudo systemctl disable glusterd"
done
```
7. **Remove GlusterFS from NixOS configs**
```nix
# common/compute-node.nix - remove ./glusterfs.nix import
# Deploy again
deploy
```
### Phase 4: Update and redeploy Nomad jobs
**Duration: 2-4 hours**
**Goal:** Update all Nomad job paths, add constraints/affinities, redeploy
1. **Update job specs** (see Service Catalog below for details)
- Change `/data/compute` → `/data/services`
- Add constraints for media jobs → fractal
- Add affinities for database jobs → zippy
2. **Deploy critical services first**
```bash
# Core infrastructure
nomad run services/mysql.hcl
nomad run services/postgres.hcl
nomad run services/redis.hcl
nomad run services/traefik.hcl
nomad run services/authentik.hcl
# Verify
nomad job status mysql
consul catalog services
```
3. **Deploy high-priority services**
```bash
nomad run services/prometheus.hcl
nomad run services/grafana.hcl
nomad run services/loki.hcl
nomad run services/vector.hcl
nomad run services/unifi.hcl
nomad run services/gitea.hcl
```
4. **Deploy medium-priority services**
```bash
# See service catalog for full list
nomad run services/wordpress.hcl
nomad run services/ghost.hcl
nomad run services/wiki.hcl
# ... etc
```
5. **Deploy low-priority services**
```bash
nomad run services/media.hcl # Will run on fractal due to constraint
# ... etc
```
6. **Verify all services healthy**
```bash
nomad job status
consul catalog services
# Check traefik dashboard for health
```
### Phase 5: Convert sunny to NixOS (Optional, can defer)
**Duration: 6-8 hours for Stage 1; Stage 2 deferred (TBD)**
**Current state:**
- Proxmox with ~1.5TB ethereum node data
- 2x LXC containers: besu (execution client), lighthouse (consensus beacon)
- 1x VM: Rocketpool smartnode (docker containers for validator, node, MEV-boost, etc.)
- Running in "hybrid mode" - managing own execution/consensus, rocketpool manages the rest
**Goal:** Get sunny on NixOS quickly, preserve ethereum data, defer "perfect" native setup
---
#### Stage 1: Quick NixOS Migration (containers)
**Duration: 6-8 hours**
**Goal:** NixOS + containerized ethereum stack, minimal disruption
**1. Pre-migration backup and documentation**
```bash
# Document current setup
ssh sunny "pct list" > /backup/sunny-containers.txt
ssh sunny "qm list" > /backup/sunny-vms.txt
# Find ethereum data locations in LXC containers
ssh sunny "pct config BESU_CT_ID" > /backup/sunny-besu-config.txt
ssh sunny "pct config LIGHTHOUSE_CT_ID" > /backup/sunny-lighthouse-config.txt
# Document rocketpool VM volumes
ssh sunny "qm config ROCKETPOOL_VM_ID" > /backup/sunny-rocketpool-config.txt
# Estimate ethereum data size
ssh sunny "du -sh /path/to/besu/data"
ssh sunny "du -sh /path/to/lighthouse/data"
# Backup rocketpool config (docker-compose, wallet keys, etc.)
# This is in the VM - need to access and backup critical files
```
**2. Extract ethereum data from containers/VM**
```bash
# Stop ethereum services to get consistent state
# (This will pause validation! Plan for attestation penalties)
# Copy besu data out of LXC
ssh sunny "pct stop BESU_CT_ID"
rsync -av --progress sunny:/var/lib/lxc/BESU_CT_ID/rootfs/path/to/besu/ /backup/sunny-besu-data/
# Copy lighthouse data out of LXC
ssh sunny "pct stop LIGHTHOUSE_CT_ID"
rsync -av --progress sunny:/var/lib/lxc/LIGHTHOUSE_CT_ID/rootfs/path/to/lighthouse/ /backup/sunny-lighthouse-data/
# Copy rocketpool data out of VM
# This includes validator keys, wallet, node config
# Access VM and copy out: ~/.rocketpool/data
```
**3. Install NixOS on sunny**
- Fresh install with btrfs + impermanence
- Create large `/persist/ethereum` for 1.5TB+ data
- **DO NOT** try to resync from network (takes weeks!)
**4. Restore ethereum data to NixOS**
```bash
# After NixOS install, copy data back
ssh sunny "mkdir -p /persist/ethereum/{besu,lighthouse,rocketpool}"
rsync -av --progress /backup/sunny-besu-data/ sunny:/persist/ethereum/besu/
rsync -av --progress /backup/sunny-lighthouse-data/ sunny:/persist/ethereum/lighthouse/
# Rocketpool data copied later
```
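Before moving on, it is worth confirming the restored data matches the sizes recorded in step 1 (a simple sanity check):
```bash
ssh sunny 'du -sh /persist/ethereum/besu /persist/ethereum/lighthouse'
# compare against the pre-migration estimates from step 1
```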
**5. Create sunny NixOS config (container-based)**
```nix
# hosts/sunny/default.nix
{ config, pkgs, ... }:
{
imports = [
../../common/encrypted-btrfs-layout.nix
../../common/global
./hardware.nix
];
networking.hostName = "sunny";
# NO cluster-node import - standalone for now
# Can add to quorum later if desired
# Container runtime
virtualisation.podman = {
enable = true;
dockerCompat = true; # Provides 'docker' command
defaultNetwork.settings.dns_enabled = true;
};
# Besu execution client (container)
virtualisation.oci-containers.containers.besu = {
image = "hyperledger/besu:latest";
volumes = [
"/persist/ethereum/besu:/var/lib/besu"
];
ports = [
"8545:8545" # HTTP RPC
"8546:8546" # WebSocket RPC
"30303:30303" # P2P
];
cmd = [
"--data-path=/var/lib/besu"
"--rpc-http-enabled=true"
"--rpc-http-host=0.0.0.0"
"--rpc-ws-enabled=true"
"--rpc-ws-host=0.0.0.0"
"--engine-rpc-enabled=true"
"--engine-host-allowlist=*"
"--engine-jwt-secret=/var/lib/besu/jwt.hex"
# Add other besu flags as needed
];
autoStart = true;
};
# Lighthouse beacon client (container)
virtualisation.oci-containers.containers.lighthouse-beacon = {
image = "sigp/lighthouse:latest";
volumes = [
"/persist/ethereum/lighthouse:/data"
"/persist/ethereum/besu/jwt.hex:/jwt.hex:ro"
];
ports = [
"5052:5052" # HTTP API
"9000:9000" # P2P
];
cmd = [
"lighthouse"
"beacon"
"--datadir=/data"
"--http"
"--http-address=0.0.0.0"
"--execution-endpoint=http://besu:8551"
"--execution-jwt=/jwt.hex"
# Add other lighthouse flags
];
dependsOn = [ "besu" ];
autoStart = true;
};
# Rocketpool stack (podman-compose for multi-container setup)
# TODO: This requires converting docker-compose to NixOS config
# For now, can run docker-compose via systemd service
systemd.services.rocketpool = {
description = "Rocketpool Smartnode Stack";
after = [ "podman.service" "lighthouse-beacon.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = "yes";
WorkingDirectory = "/persist/ethereum/rocketpool";
ExecStart = "${pkgs.docker-compose}/bin/docker-compose up -d";
ExecStop = "${pkgs.docker-compose}/bin/docker-compose down";
};
};
# Ensure ethereum data persists
environment.persistence."/persist" = {
directories = [
"/persist/ethereum"
];
};
# Firewall for ethereum
networking.firewall = {
allowedTCPPorts = [
30303 # Besu P2P
9000 # Lighthouse P2P
# Add rocketpool ports
];
allowedUDPPorts = [
30303 # Besu P2P
9000 # Lighthouse P2P
];
};
}
```
**6. Setup rocketpool docker-compose on NixOS**
```bash
# After NixOS is running, restore rocketpool config
ssh sunny "mkdir -p /persist/ethereum/rocketpool"
# Copy rocketpool data (wallet, keys, config)
rsync -av /backup/sunny-rocketpool-data/ sunny:/persist/ethereum/rocketpool/
# Create docker-compose.yml for rocketpool stack
# Based on rocketpool hybrid mode docs
# This runs: validator, node software, MEV-boost, prometheus, etc.
# Connects to your besu + lighthouse containers
```
**7. Deploy and test**
```bash
deploy -s '.#sunny'
# Verify containers are running
ssh sunny "podman ps"
# Check besu sync status
ssh sunny "curl -X POST -H 'Content-Type: application/json' --data '{\"jsonrpc\":\"2.0\",\"method\":\"eth_syncing\",\"params\":[],\"id\":1}' http://localhost:8545"
# Check lighthouse sync status
ssh sunny "curl http://localhost:5052/eth/v1/node/syncing"
# Monitor rocketpool
ssh sunny "cd /persist/ethereum/rocketpool && docker-compose logs -f"
```
**8. Monitor and stabilize**
- Ethereum should resume from where it left off (not resync!)
- Validation will resume once the beacon node is synced
- May have missed a few attestations during migration (minor penalty)
---
#### Stage 2: Native NixOS Services (Future)
**Duration: TBD (do this later when time permits)**
**Goal:** Convert to native NixOS services using ethereum-nix
**Why defer this:**
- Complex (rocketpool not fully packaged for Nix)
- Current container setup works fine
- Can migrate incrementally (besu → native, then lighthouse, etc.)
- No downtime once Stage 1 is stable
**When ready:**
1. Research ethereum-nix support for besu + lighthouse + rocketpool
2. Test on separate machine first
3. Migrate one service at a time with minimal downtime
4. Document in separate migration plan
**For now:** Stage 1 gets sunny on NixOS with base configs, managed declaratively, just using containers instead of native services.
### Phase 6: Verification and cleanup
**Duration: 1 hour**
1. **Test failover procedure** (see Failover Procedures below)
2. **Verify backups are working**
```bash
kopia snapshot list
# Check that /persist/services is being backed up
```
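A hedged example of pulling `/persist/services` into the existing kopia schedule (assumes a kopia client on zippy is already configured against the fractal repository):
```bash
ssh zippy 'kopia snapshot create /persist/services'
ssh zippy 'kopia snapshot list /persist/services'
```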
3. **Update documentation**
- Update README.md
- Document new architecture
- Update stateful-commands.txt
4. **Clean up old GlusterFS data**
```bash
# Only after verifying everything works!
for host in c1 c2 c3; do
ssh $host "sudo rm -rf /persist/glusterfs"
done
```
---
## Service Catalog
**Legend:**
- **Priority**: CRITICAL (must be up) / HIGH (important) / MEDIUM (nice to have) / LOW (can wait)
- **Target**: Where it should run (constraint or affinity)
- **Data**: What data it needs access to
- **Changes**: What needs updating in the .hcl file
### Core Infrastructure
#### mysql
- **File**: `services/mysql.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/mysql`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/mysql` (NFS from zippy)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/mysql` → `/data/services/mysql`
- ✏️ Add affinity:
```hcl
affinity {
attribute = "${node.unique.name}"
value = "zippy"
weight = 100
}
```
- ✏️ Add constraint to allow fallback:
```hcl
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "zippy|c1|c2"
}
```
- **Notes**: Core database, needs to stay up. Consul DNS `mysql.service.consul` unchanged.
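A quick check that the affinity actually put mysql on zippy after redeploy (a sketch; `<alloc-id>` comes from the job status output):
```bash
nomad job status mysql | grep -A 5 '^Allocations'
nomad alloc status -short <alloc-id> | grep -i node   # node name should be zippy
```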
#### postgres
- **File**: `services/postgres.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/postgres`, `/data/compute/appdata/pgadmin`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/postgres`, `/data/services/pgadmin` (NFS)
- **Changes**:
- ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
- ✏️ Add affinity and constraint (same as mysql)
- **Notes**: Core database for authentik, gitea, plausible, netbox, etc.
#### redis
- **File**: `services/redis.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/appdata/redis`
- **Target**: Affinity for zippy, allow c1/c2
- **Data**: `/data/services/redis` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/redis` → `/data/services/redis`
- ✏️ Add affinity and constraint (same as mysql)
- **Notes**: Used by authentik, wordpress. Should co-locate with databases.
#### traefik
- **File**: `services/traefik.hcl`
- **Priority**: CRITICAL
- **Current**: Uses `/data/compute/config/traefik`
- **Target**: Float on c1/c2/c3 (keepalived handles HA)
- **Data**: `/data/services/config/traefik` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/config/traefik` → `/data/services/config/traefik`
- **Notes**: Reverse proxy, has keepalived for VIP failover. Critical for all web access.
#### authentik
- **File**: `services/authentik.hcl`
- **Priority**: CRITICAL
- **Current**: No persistent volumes (stateless, uses postgres/redis)
- **Target**: Float on c1/c2/c3
- **Data**: None (uses postgres.service.consul, redis.service.consul)
- **Changes**: None needed
- **Notes**: SSO for most services. Must stay up.
### Monitoring Stack
#### prometheus
- **File**: `services/prometheus.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/prometheus`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/prometheus` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/prometheus` → `/data/services/prometheus`
- **Notes**: Metrics database. Important for monitoring but not critical for services.
#### grafana
- **File**: `services/grafana.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/grafana`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/grafana` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/grafana` → `/data/services/grafana`
- **Notes**: Monitoring UI. Depends on prometheus.
#### loki
- **File**: `services/loki.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/loki`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/loki` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/loki` → `/data/services/loki`
- **Notes**: Log aggregation. Important for debugging.
#### vector
- **File**: `services/vector.hcl`
- **Priority**: MEDIUM
- **Current**: No persistent volumes, type=system (runs on all nodes)
- **Target**: System job (runs everywhere)
- **Data**: None (ephemeral logs, ships to loki)
- **Changes**:
- ❓ Check if glusterfs log path is still needed: `/var/log/glusterfs:/var/log/glusterfs:ro`
- ✏️ Remove glusterfs log collection after GlusterFS is removed
- **Notes**: Log shipper. Can tolerate downtime.
### Databases (Specialized)
#### clickhouse
- **File**: `services/clickhouse.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/clickhouse`
- **Target**: Affinity for zippy (large dataset), allow c1/c2/c3
- **Data**: `/data/services/clickhouse` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/clickhouse` → `/data/services/clickhouse`
- ✏️ Add affinity for zippy (optional, but helps with performance)
- **Notes**: Used by plausible. Large time-series data. Important but can be recreated.
#### mongodb
- **File**: `services/unifi.hcl` (embedded in unifi job)
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/unifi/mongodb`
- **Target**: Float on c1/c2/c3 (with unifi)
- **Data**: `/data/services/unifi/mongodb` (NFS)
- **Changes**: See unifi below
- **Notes**: Only used by unifi. Should stay with unifi controller.
### Web Applications
#### wordpress
- **File**: `services/wordpress.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/sync/wordpress` (syncthing-managed to avoid slow GlusterFS)
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/wordpress` (NFS from zippy)
- **Changes**:
- ✏️ Volume path: `/data/sync/wordpress` → `/data/services/wordpress`
- 📋 **Before cutover**: Copy data from syncthing to zippy: `rsync -av /data/sync/wordpress/ zippy:/persist/services/wordpress/`
- 📋 **After migration**: Remove syncthing configuration for wordpress sync
- **Notes**: Production website. Important but can tolerate brief downtime during migration.
#### ghost
- **File**: `services/ghost.hcl`
- **Priority**: no longer used, should wipe
- **Current**: Uses `/data/compute/appdata/ghost`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/ghost` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/ghost` → `/data/services/ghost`
- **Notes**: Blog platform (alo.land). Can tolerate downtime.
#### gitea
- **File**: `services/gitea.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/gitea/data`, `/data/compute/appdata/gitea/config`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/gitea/*` (NFS)
- **Changes**:
- ✏️ Volume paths: `/data/compute/appdata/gitea/*` → `/data/services/gitea/*`
- **Notes**: Git server. Contains code repositories. Important.
#### wiki (tiddlywiki)
- **File**: `services/wiki.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/wiki` via host volume mount
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/wiki` (NFS)
- **Changes**:
- ✏️ Volume mount path in `volume_mount` blocks
- ⚠️ Uses `exec` driver with host volumes - verify NFS mount works with this
- **Notes**: Multiple tiddlywiki instances. Personal wikis. Can tolerate downtime.
#### code-server
- **File**: `services/code-server.hcl`
- **Priority**: LOW
- **Current**: Uses `/data/compute/appdata/code`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/code` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/code` → `/data/services/code`
- **Notes**: Web IDE. Low priority, for development only.
#### beancount (fava)
- **File**: `services/beancount.hcl`
- **Priority**: MEDIUM
- **Current**: Uses `/data/compute/appdata/beancount`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/beancount` (NFS)
- **Changes**:
- ✏️ Volume path: `/data/compute/appdata/beancount` → `/data/services/beancount`
- **Notes**: Finance tracking. Low priority.
#### adminer
- **File**: `services/adminer.hcl`
- **Priority**: LOW
- **Current**: Stateless
- **Target**: Float on c1/c2/c3
- **Data**: None
- **Changes**: None needed
- **Notes**: Database admin UI. Only needed for maintenance.
#### plausible
- **File**: `services/plausible.hcl`
- **Priority**: HIGH
- **Current**: Stateless (uses postgres and clickhouse)
- **Target**: Float on c1/c2/c3
- **Data**: None (uses postgres.service.consul, clickhouse.service.consul)
- **Changes**: None needed
- **Notes**: Website analytics. Nice to have but not critical.
#### evcc
- **File**: `services/evcc.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/evcc/evcc.yaml`, `/data/compute/appdata/evcc/evcc`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/evcc/*` (NFS)
- **Changes**:
- ✏️ Volume paths: `/data/compute/appdata/evcc/*` → `/data/services/evcc/*`
- **Notes**: EV charging controller. Important for daily use.
#### vikunja
- **File**: `services/vikunja.hcl` (assumed to exist based on README)
- **Priority**: no longer used, should delete
- **Current**: Likely uses `/data/compute/appdata/vikunja`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/vikunja` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/vikunja`
- **Notes**: Task management. Low priority.
#### leantime
- **File**: `services/leantime.hcl`
- **Priority**: no longer used, should delete
- **Current**: Likely uses `/data/compute/appdata/leantime`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/leantime` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/leantime`
- **Notes**: Project management. Low priority.
### Network Infrastructure
#### unifi
- **File**: `services/unifi.hcl`
- **Priority**: HIGH
- **Current**: Uses `/data/compute/appdata/unifi/data`, `/data/compute/appdata/unifi/mongodb`
- **Target**: Float on c1/c2/c3/fractal/zippy
- **Data**: `/data/services/unifi/*` (NFS)
- **Changes**:
- ✏️ Volume paths: `/data/compute/appdata/unifi/*` → `/data/services/unifi/*`
- **Notes**: UniFi network controller. Critical for network management. Has keepalived VIP for stable inform address. Floating is fine.
### Media Stack
#### media (radarr, sonarr, bazarr, plex, qbittorrent)
- **File**: `services/media.hcl`
- **Priority**: MEDIUM
- **Current**: Uses `/data/compute/appdata/radarr`, `/data/compute/appdata/sonarr`, etc. and `/data/media`
- **Target**: **MUST run on fractal** (local /data/media access)
- **Data**:
- `/data/services/radarr` (NFS) - config data
- `/data/media` (local disk on fractal; other nodes use the CIFS mount)
- **Changes**:
- ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
- ✏️ **Add constraint**:
```hcl
constraint {
attribute = "${node.unique.name}"
value = "fractal"
}
```
- **Notes**: Heavy I/O to /data/media. Must run on fractal for performance. Has keepalived VIP.
### Utility Services
#### weewx
- **File**: `services/weewx.hcl`
- **Priority**: HIGH
- **Current**: Likely uses `/data/compute/appdata/weewx`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/weewx` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/weewx`
- **Notes**: Weather station data collection.
#### maps
- **File**: `services/maps.hcl`
- **Priority**: MEDIUM
- **Current**: Likely uses `/data/compute/appdata/maps`
- **Target**: Float on c1/c2/c3 (or fractal if large tile data)
- **Data**: `/data/services/maps` (NFS) or `/data/media/maps` if large
- **Changes**:
- ✏️ Volume paths: Check data size, may want to move to /data/media
- **Notes**: Map tiles. Low priority.
#### netbox
- **File**: `services/netbox.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/netbox`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/netbox` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/netbox`
- **Notes**: IPAM/DCIM. Low priority, for documentation.
#### farmos
- **File**: `services/farmos.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/farmos`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/farmos` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/farmos`
- **Notes**: Farm management. Low priority.
#### urbit
- **File**: `services/urbit.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/urbit`
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/urbit` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/urbit`
- **Notes**: Urbit node. Experimental, low priority.
#### webodm
- **File**: `services/webodm.hcl`
- **Priority**: LOW
- **Current**: Likely uses `/data/compute/appdata/webodm`
- **Target**: Float on c1/c2/c3 (or fractal if processing large imagery from /data/media)
- **Data**: `/data/services/webodm` (NFS)
- **Changes**:
- ✏️ Volume paths: Update to `/data/services/webodm`
- 🤔 May benefit from running on fractal if it processes files from /data/media
- **Notes**: Drone imagery processing. Low priority.
#### velutrack
- **File**: `services/velutrack.hcl`
- **Priority**: LOW
- **Current**: Likely minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Vehicle tracking. Low priority.
#### resol-gateway
- **File**: `services/resol-gateway.hcl`
- **Priority**: HIGH
- **Current**: Likely minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Solar thermal controller.
#### igsync
- **File**: `services/igsync.hcl`
- **Priority**: MEDIUM
- **Current**: Likely uses `/data/compute/appdata/igsync` or `/data/media`
- **Target**: Float on c1/c2/c3 (or fractal if storing to /data/media)
- **Data**: Check if it writes to `/data/media` or `/data/services`
- **Changes**:
- ✏️ Volume paths: Verify and update
- **Notes**: Instagram sync. Low priority.
#### jupyter
- **File**: `services/jupyter.hcl`
- **Priority**: LOW
- **Current**: Stateless or minimal state
- **Target**: Float on c1/c2/c3
- **Data**: Minimal
- **Changes**: Verify if any volume paths need updating
- **Notes**: Notebook server. Low priority, for experimentation.
#### whoami
- **File**: `services/whoami.hcl`
- **Priority**: LOW
- **Current**: Stateless
- **Target**: Float on c1/c2/c3
- **Data**: None
- **Changes**: None needed
- **Notes**: Test service. Can be stopped during migration.
#### tiddlywiki (if separate from wiki.hcl)
- **File**: `services/tiddlywiki.hcl`
- **Priority**: MEDIUM
- **Current**: Likely same as wiki.hcl
- **Target**: Float on c1/c2/c3
- **Data**: `/data/services/tiddlywiki` (NFS)
- **Changes**: Same as wiki.hcl
- **Notes**: May be duplicate of wiki.hcl.
### Backup Jobs
#### mysql-backup
- **File**: `services/mysql-backup.hcl`
- **Priority**: HIGH
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
- ✏️ Verify backup destination, should be `/data/shared/backups/mysql`
- **Notes**: Important for disaster recovery. Should run regularly.
#### postgres-backup
- **File**: `services/postgres-backup.hcl`
- **Priority**: HIGH
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
- ✏️ Verify backup destination, should be `/data/shared/backups/postgres`
- **Notes**: Important for disaster recovery. Should run regularly.
#### wordpress-backup
- **File**: `services/wordpress-backup.hcl`
- **Priority**: MEDIUM
- **Current**: Likely writes to `/data/compute` or `/data/shared`
- **Target**: Float on c1/c2/c3
- **Data**: Should write to `/data/shared` (backed up to fractal)
- **Changes**:
- ✏️ Verify backup destination
- **Notes**: Periodic backup job.
---
## Failover Procedures
### NFS Server Failover (zippy → c1 or c2)
**When to use:** zippy is down and not coming back soon
**Prerequisites:**
- c1 and c2 have been receiving btrfs snapshots from zippy
- Last successful replication is recent (verify timestamps; should be within the 5-minute interval)
**Procedure:**
1. **Choose standby node** (c1 or c2)
```bash
# Check replication freshness
ssh c1 "ls -lt /persist/services-standby@* | head -5"
ssh c2 "ls -lt /persist/services-standby@* | head -5"
# Choose the one with most recent snapshot
# For this example, we'll use c1
```
2. **On standby node (c1), promote standby to primary**
```bash
ssh c1
# Stop NFS client mount (if running)
sudo systemctl stop data-services.mount
# Find latest snapshot
LATEST=$(ls -dt /persist/services-standby/services@* | head -1)
# Create writable subvolume from snapshot
sudo btrfs subvolume snapshot $LATEST /persist/services
# Verify
ls -la /persist/services
```
3. **Deploy c1-nfs-server configuration**
```bash
# From your workstation
deploy -s '.#c1-nfs-server'
# This activates:
# - NFS server on c1
# - Consul service registration for "services"
# - Firewall rules
```
4. **On c1, verify NFS is running**
```bash
ssh c1
sudo systemctl status nfs-server
showmount -e localhost
dig @localhost -p 8600 services.service.consul # Should show c1's IP
```
5. **On other nodes, remount NFS**
```bash
# Nodes should auto-remount via Consul DNS, but you can force it:
for host in c2 c3 fractal zippy; do
ssh $host "sudo systemctl restart data-services.mount"
done
```
6. **Verify Nomad jobs are healthy**
```bash
nomad job status mysql
nomad job status postgres
# Check all critical services
```
7. **Update monitoring/alerts**
- Note in documentation that c1 is now primary NFS server
- Set up alert to remember to fail back to zippy when it's repaired
**Recovery Time Objective (RTO):** ~10-15 minutes
**Recovery Point Objective (RPO):** Last snapshot interval (**5 minutes** max)
### Failing Back to zippy
**When to use:** zippy is repaired and ready to resume primary role
**Procedure:**
1. **Sync data from c1 back to zippy**
```bash
# On c1 (current primary)
sudo btrfs subvolume snapshot -r /persist/services /persist/services@failback-$(date +%Y%m%d-%H%M%S)
FAILBACK=$(ls -t /persist/services@failback-* | head -1)
sudo btrfs send $FAILBACK | ssh zippy "sudo btrfs receive /persist/"
# On zippy, make it writable (if a stale /persist/services still exists, move it aside first)
ssh zippy "sudo btrfs subvolume snapshot /persist/$(basename $FAILBACK) /persist/services"
```
2. **Deploy zippy back to NFS server role**
```bash
deploy -s '.#zippy'
# Consul will register services.service.consul → zippy again
```
3. **Demote c1 back to standby**
```bash
deploy -s '.#c1'
# This removes NFS server, restores NFS client mount
```
4. **Verify all nodes are mounting from zippy**
```bash
dig @c1 -p 8600 services.service.consul # Should show zippy's IP
for host in c1 c2 c3 fractal; do
ssh $host "df -h | grep services"
done
```
### Database Job Failover (automatic via Nomad)
**When to use:** zippy is down, database jobs need to run elsewhere
**What happens automatically:**
1. Nomad detects zippy is unhealthy
2. Jobs with constraint `zippy|c1|c2` are rescheduled to c1 or c2
3. Jobs start on new node, accessing `/data/services` (now via NFS from promoted standby)
**Manual intervention needed:**
- None if NFS failover completed successfully
- If jobs are stuck: `nomad job stop mysql && nomad job run services/mysql.hcl`
**What to check:**
```bash
nomad job status mysql
nomad job status postgres
nomad job status redis
# Verify they're running on c1 or c2, not zippy
nomad alloc status <alloc-id>
```
### Complete Cluster Failure (lose quorum)
**Scenario:** 3 or more servers go down, quorum lost
**Prevention:** This is why we have 5 servers (need 3 for quorum)
**Recovery:**
1. **Bring up at least 3 servers** (any 3 from c1, c2, c3, fractal, zippy)
2. **If that's not possible, bootstrap new cluster:**
```bash
# On one surviving server, force bootstrap
consul force-leave <failed-node>
nomad operator raft list-peers
nomad operator raft remove-peer <failed-peer>
```
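If too many peers are permanently gone for that to work, the documented last resort is manual recovery with a `peers.json` file. A sketch, assuming the default data dirs (`/var/lib/consul` for Consul, `<data-dir>/server/raft/` for Nomad); every value is a placeholder to be read from the surviving nodes:
```bash
# Stop consul on every surviving server, then on each one write raft/peers.json
sudo systemctl stop consul
sudo tee /var/lib/consul/raft/peers.json <<'EOF'
[
  { "id": "<node-id from /var/lib/consul/node-id>", "address": "<server-ip>:8300", "non_voter": false }
]
EOF
sudo systemctl start consul
# Nomad uses the same mechanism: <data-dir>/server/raft/peers.json, port 4647
```
List one entry per surviving server, start them all, then re-check `consul operator raft list-peers`.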
3. **Restore from backups** (worst case)
---
## Post-Migration Verification Checklist
- [ ] All 5 servers in quorum: `consul members` shows c1, c2, c3, fractal, zippy
- [ ] NFS mounts working: `df -h | grep services` on all nodes
- [ ] Btrfs replication running: Check systemd timers on zippy
- [ ] Critical services up: mysql, postgres, redis, traefik, authentik
- [ ] Monitoring working: Prometheus, Grafana, Loki accessible
- [ ] Media stack on fractal: `nomad alloc status` shows media job on fractal
- [ ] Database jobs on zippy: `nomad alloc status` shows mysql/postgres on zippy
- [ ] Consul DNS working: `dig @localhost -p 8600 services.service.consul`
- [ ] Backups running: Kopia snapshots include `/persist/services`
- [ ] GlusterFS removed: No glusterfs processes, volumes deleted
- [ ] Documentation updated: README.md, architecture diagrams
---
## Rollback Plan
**If migration fails catastrophically:**
1. **Stop all new Nomad jobs**
```bash
nomad job stop -purge <new-jobs>
```
2. **Restore GlusterFS mounts**
```bash
# On all nodes, re-enable GlusterFS client
deploy # With old configs
```
3. **Restart old Nomad jobs**
```bash
# With old paths pointing to /data/compute
nomad run services/*.hcl # Old versions from git
```
4. **Restore data if needed**
```bash
rsync -av /backup/compute-pre-migration/ /data/compute/
```
**Important:** Keep GlusterFS running until Phase 4 is complete and verified!
---
## Questions Answered
1. ✅ **Where is `/data/sync/wordpress` mounted from?**
- **Answer**: Syncthing-managed to avoid slow GlusterFS
- **Action**: Migrate to `/data/services/wordpress`, remove syncthing config
2. ✅ **Which services use `/data/media` directly?**
- **Answer**: Only media.hcl (radarr, sonarr, plex, qbittorrent)
- **Action**: Constrain media.hcl to fractal, everything else uses CIFS mount
3. ✅ **Do we want unifi on fractal or floating?**
- **Answer**: Floating is fine
- **Action**: No constraint needed
4. ✅ **What's the plan for sunny's existing data?**
- **Answer**: Ethereum data stays local, not replicated (too expensive)
- **Action**: Back up and restore the data during the NixOS conversion (resyncing from the network would take weeks)
## Questions Still to Answer
1. **Backup retention for btrfs snapshots?**
- Current plan: Keep 24 hours of snapshots on zippy
- Is this enough? Or do we want more for safety?
- This should be fine -- snapshots are just for hot recovery. More/older backups are kept via kopia on fractal.
2. **c1-nfs-server vs c1 config - same host, different configs?**
- Recommendation: Use same hostname, different flake output
- `c1` = normal config with NFS client
- `c1-nfs-server` = variant with NFS server enabled
- Both in flake.nix, deploy appropriate one based on role
- Answer: recommendation makes sense.
3. **Should we verify webodm, igsync, maps don't need /data/media access?**
- neither of them needs /data/media
- maps needs /data/shared
---
## Timeline Estimate
**Total duration: 15-22 hours** excluding the optional sunny migration (can be split across multiple sessions)
- Phase 0 (Prep): 1-2 hours
- Phase 1 (fractal): 6-8 hours
- Phase 2 (zippy storage): 2-3 hours
- Phase 3 (GlusterFS → NFS): 3-4 hours
- Phase 4 (Nomad jobs): 2-4 hours
- Phase 5 (sunny, Stage 1): 6-8 hours (optional, can be done later)
- Phase 6 (Cleanup): 1 hour
**Suggested schedule:**
- **Day 1**: Phases 0-1 (fractal conversion, establish quorum)
- **Day 2**: Phases 2-3 (zippy storage, data migration)
- **Day 3**: Phase 4 (Nomad job updates and deployment)
- **Day 4**: Phases 5-6 (sunny + cleanup) or take a break and do later
**Maintenance windows needed:**
- Phase 3: ~1 hour downtime (all services stopped during data migration)
- Phase 4: Rolling (services come back up as redeployed)