# Cluster Revamp Migration TODO

Track migration progress from GlusterFS to NFS-based architecture. See [CLUSTER_REVAMP.md](./CLUSTER_REVAMP.md) for detailed procedures.

## Phase 0: Preparation

- [x] Review cluster revamp plan
- [ ] Back up everything (kopia snapshots current)
- [ ] Document current state (nomad jobs, consul services)

## Phase 1: Convert fractal to NixOS (DEFERRED - do after GlusterFS migration)

- [ ] Document fractal's current ZFS layout
- [ ] Install NixOS on fractal
- [ ] Import ZFS pools (double1, double2, double3)
- [ ] Create fractal NixOS configuration
- [ ] Configure Samba server for media/shared/homes
- [ ] Configure Kopia backup server
- [ ] Deploy and verify fractal base config
- [ ] Join fractal to cluster (5-server quorum)
- [ ] Update all cluster configs for 5-server quorum
- [ ] Verify fractal fully operational

## Phase 2: Set up zippy storage layer

- [x] Create btrfs subvolume `/persist/services` on zippy
- [x] Configure NFS server on zippy (`nfs-services-server.nix`)
- [x] Configure Consul service registration for NFS
- [x] Set up btrfs replication to c1 (incremental, 5-minute intervals)
- [x] Fix replication script to handle SSH command restrictions
- [x] Set up standby storage on c1 (`/persist/services-standby`)
- [x] Configure c1 as standby (`nfs-services-standby.nix`)
- [x] Configure Kopia to exclude replication snapshots
- [x] Deploy and verify NFS server on zippy
- [x] Verify replication working to c1
- [ ] Set up standby storage on c2 (if desired)
- [ ] Configure replication to c2 (if desired)

## Phase 3: Migrate from GlusterFS to NFS

- [x] Update all nodes to mount NFS at `/data/services`
- [x] Deploy updated configs (NFS client on all nodes)
- [x] Stop all Nomad jobs temporarily
- [x] Copy data from GlusterFS to zippy NFS
  - [x] Copy `/data/compute/appdata/*` → `/persist/services/appdata/`
  - [x] Copy `/data/compute/config/*` → `/persist/services/config/`
  - [x] Copy `/data/sync/wordpress` → `/persist/services/appdata/wordpress`
  - [x] Verify data integrity
- [x] Verify NFS mounts working on all nodes
- [x] Stop GlusterFS volume
- [x] Delete GlusterFS volume
- [x] Remove GlusterFS from NixOS configs
- [x] Remove syncthing wordpress sync configuration (no longer used)

## Phase 4: Update and redeploy Nomad jobs

### Core Infrastructure (CRITICAL)

- [x] mysql.hcl - moved to zippy, using `/data/services`
- [x] postgres.hcl - migrated to `/data/services`
- [x] redis.hcl - migrated to `/data/services`
- [x] traefik.hcl - migrated to `/data/services`
- [x] authentik.hcl - stateless, no changes needed

### Monitoring Stack (HIGH)

- [x] prometheus.hcl - migrated to `/data/services`
- [x] grafana.hcl - migrated to `/data/services` (2025-10-23)
- [x] loki.hcl - migrated to `/data/services`
- [x] vector.hcl - removed GlusterFS log collection (2025-10-23)

### Databases (HIGH)

- [x] clickhouse.hcl - migrated to `/data/services`
- [x] unifi.hcl - migrated to `/data/services` (includes mongodb)

### Web Applications (HIGH-MEDIUM)

- [x] wordpress.hcl - migrated to `/data/services`
- [x] gitea.hcl - migrated to `/data/services` (2025-10-23)
- [x] wiki.hcl - migrated to `/data/services` (2025-10-23)
- [x] plausible.hcl - stateless, no changes needed

### Web Applications (LOW, may be deprecated)

- [x] vikunja.hcl - migrated to `/data/services` (2025-10-23, not running)

### Media Stack (MEDIUM)

- [x] media.hcl - migrated to `/data/services`

### Utility Services (MEDIUM-LOW)

- [x] evcc.hcl - migrated to `/data/services`
- [x] weewx.hcl - migrated to `/data/services` (2025-10-23)
- [x] code-server.hcl - migrated to `/data/services`
- [x] beancount.hcl - migrated to `/data/services`
- [x] adminer.hcl - stateless, no changes needed
- [x] maps.hcl - migrated to `/data/services`
- [x] netbox.hcl - migrated to `/data/services`
- [x] farmos.hcl - migrated to `/data/services` (2025-10-23)
- [x] urbit.hcl - migrated to `/data/services`
- [x] webodm.hcl - migrated to `/data/services` (2025-10-23, not running)
- [x] velutrack.hcl - migrated to `/data/services`
- [x] resol-gateway.hcl - migrated to `/data/services` (2025-10-23)
- [x] igsync.hcl - migrated to `/data/services` (2025-10-23)
- [x] jupyter.hcl - migrated to `/data/services` (2025-10-23, not running)
- [x] whoami.hcl - stateless test service, no changes needed

### Backup Jobs (HIGH)

- [x] mysql-backup - moved to zippy, verified
- [x] postgres-backup.hcl - migrated to `/data/services`

### Host Volume Definitions (CRITICAL)

- [x] common/nomad.nix - consolidated `appdata` and `code` volumes into single `services` volume (2025-10-23)

### Verification

- [ ] All services healthy in Nomad
- [ ] All services registered in Consul
- [ ] Traefik routes working
- [ ] Database jobs running on zippy (verify via `nomad alloc status`)
- [ ] Media jobs running on fractal (verify via `nomad alloc status`)

## Phase 5: Convert sunny to NixOS (OPTIONAL - can defer)

- [ ] Document current sunny setup (ethereum containers/VMs)
- [ ] Back up ethereum data
- [ ] Install NixOS on sunny
- [ ] Restore ethereum data to `/persist/ethereum`
- [ ] Create sunny container-based config (besu, lighthouse, rocketpool)
- [ ] Deploy and verify ethereum stack
- [ ] Monitor sync status and validation

## Phase 6: Verification and cleanup

- [ ] Test NFS failover procedure (zippy → c1)
- [ ] Verify backups include `/persist/services` data
- [ ] Verify backups exclude replication snapshots
- [ ] Update documentation (README.md, architecture diagrams)
- [x] Clean up old GlusterFS data (only after everything verified!)
  - [x] Remove old GlusterFS directories from all nodes

## Post-Migration Checklist

- [ ] All 5 servers in quorum (`consul members`)
- [ ] NFS mounts working on all nodes
- [ ] Btrfs replication running (check systemd timers on zippy)
- [ ] Critical services up (mysql, postgres, redis, traefik, authentik)
- [ ] Monitoring working (prometheus, grafana, loki)
- [ ] Media stack on fractal
- [ ] Database jobs on zippy
- [ ] Consul DNS working (`dig @localhost -p 8600 data-services.service.consul`)
- [ ] Backups running (kopia snapshots include `/persist/services`)
- [ ] GlusterFS removed (no processes, volumes deleted)
- [ ] Documentation updated

---

**Last updated**: 2025-10-25

**Current phase**: Phases 3 and 4 complete (Phase 4 verification still pending). GlusterFS removed; all services on NFS.

**Note**: Phase 1 (fractal NixOS conversion) deferred until after GlusterFS migration is complete.

## Migration Summary

**All services migrated to `/data/services` (30 total):**
mysql, mysql-backup, postgres, postgres-backup, redis, clickhouse, prometheus, grafana, loki, vector, unifi, wordpress, gitea, wiki, traefik, evcc, weewx, netbox, farmos, webodm, jupyter, vikunja, urbit, code-server, beancount, velutrack, maps, media, resol-gateway, igsync

**Stateless/no changes needed (4 services):**
authentik, adminer, plausible, whoami

**Configuration changes:**

- common/nomad.nix: consolidated `appdata` and `code` volumes into single `services` volume
- vector.hcl: removed GlusterFS log collection