# Cluster Revamp Migration TODO
Tracks progress of the migration from GlusterFS to an NFS-based architecture.
See [CLUSTER_REVAMP.md](./CLUSTER_REVAMP.md) for detailed procedures.
## Phase 0: Preparation
- [x] Review cluster revamp plan
- [ ] Back up everything (confirm kopia snapshots are current)
- [ ] Document current state (nomad jobs, consul services)
## Phase 1: Convert fractal to NixOS (DEFERRED - do after GlusterFS migration)
- [ ] Document fractal's current ZFS layout
- [ ] Install NixOS on fractal
- [ ] Import ZFS pools (double1, double2, double3)
- [ ] Create fractal NixOS configuration
- [ ] Configure Samba server for media/shared/homes
- [ ] Configure Kopia backup server
- [ ] Deploy and verify fractal base config
- [ ] Join fractal to cluster (5-server quorum)
- [ ] Update all cluster configs for 5-server quorum
- [ ] Verify fractal fully operational
## Phase 2: Set up zippy storage layer
- [x] Create btrfs subvolume `/persist/services` on zippy
- [x] Configure NFS server on zippy (nfs-services-server.nix)
- [x] Configure Consul service registration for NFS
- [x] Set up btrfs replication to c1 (incremental, 5min intervals)
- [x] Fix replication script to handle SSH command restrictions
- [x] Set up standby storage on c1 (`/persist/services-standby`)
- [x] Configure c1 as standby (nfs-services-standby.nix)
- [x] Configure Kopia to exclude replication snapshots
- [x] Deploy and verify NFS server on zippy
- [x] Verify replication working to c1
- [ ] Set up standby storage on c2 (if desired)
- [ ] Configure replication to c2 (if desired)
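
The incremental replication item above can be illustrated with a small command builder. This is only a sketch under stated assumptions, not the replication script this checklist refers to: the snapshot paths passed in are hypothetical, the functions only build commands and never run them, and the real script also has to cope with the SSH command restrictions mentioned above. The `btrfs subvolume snapshot -r`, `btrfs send -p`, and `btrfs receive` invocations are standard btrfs-progs commands.

```python
import shlex
from typing import List, Optional

SOURCE_SUBVOLUME = "/persist/services"     # from this checklist
STANDBY_DIR = "/persist/services-standby"  # from this checklist
STANDBY_HOST = "c1"                        # from this checklist


def snapshot_command(snapshot_path: str) -> List[str]:
    """Command that takes a read-only snapshot (`btrfs send` requires read-only)."""
    return ["btrfs", "subvolume", "snapshot", "-r", SOURCE_SUBVOLUME, snapshot_path]


def send_pipeline(snapshot_path: str, parent_path: Optional[str] = None) -> str:
    """Shell pipeline that streams a snapshot to the standby host.

    With a parent snapshot only the delta is sent (incremental);
    without one the whole snapshot is sent (initial full transfer).
    Assumption: received snapshots land directly under STANDBY_DIR.
    """
    send = ["btrfs", "send"]
    if parent_path is not None:
        send += ["-p", parent_path]
    send.append(snapshot_path)
    receive = ["ssh", STANDBY_HOST, "btrfs", "receive", STANDBY_DIR]
    return f"{shlex.join(send)} | {shlex.join(receive)}"
```

Calling `send_pipeline(new, parent)` with the previous snapshot as `parent` is what keeps a 5-minute interval cheap: only blocks changed since the parent cross the network.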
## Phase 3: Migrate from GlusterFS to NFS
- [x] Update all nodes to mount NFS at `/data/services`
- [x] Deploy updated configs (NFS client on all nodes)
- [ ] Stop all Nomad jobs temporarily
- [ ] Copy data from GlusterFS to zippy NFS
  - [ ] Copy `/data/compute/appdata/*` → `/persist/services/appdata/`
  - [ ] Copy `/data/compute/config/*` → `/persist/services/config/`
  - [ ] Copy `/data/sync/wordpress` → `/persist/services/appdata/wordpress`
- [ ] Verify data integrity
- [ ] Verify NFS mounts working on all nodes
- [ ] Stop GlusterFS volume
- [ ] Delete GlusterFS volume
- [ ] Remove GlusterFS from NixOS configs
- [ ] Remove syncthing wordpress sync configuration
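
One way to approach the "Verify data integrity" step is to compare SHA-256 digests of every file in the source and destination trees. The sketch below is an assumption about how that check could be done, not a procedure taken from CLUSTER_REVAMP.md; the function names are hypothetical. It compares file contents only and ignores ownership, permissions, and symlinks, which would need a separate check.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(source: Path, dest: Path) -> List[str]:
    """Return human-readable differences; an empty list means the trees match."""
    src, dst = tree_digests(source), tree_digests(dest)
    problems = []
    for rel in sorted(src.keys() | dst.keys()):
        if rel not in dst:
            problems.append(f"missing in destination: {rel}")
        elif rel not in src:
            problems.append(f"extra in destination: {rel}")
        elif src[rel] != dst[rel]:
            problems.append(f"content differs: {rel}")
    return problems
```

For example, `compare_trees(Path("/data/compute/appdata"), Path("/persist/services/appdata"))` run on zippy would cover the first copy step, provided the Nomad jobs are still stopped so nothing changes underneath it.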
## Phase 4: Update and redeploy Nomad jobs
### Core Infrastructure (CRITICAL)
- [x] mysql.hcl - moved to zippy, using `/data/services`
- [ ] postgres.hcl - update paths, add affinity for zippy
- [ ] redis.hcl - update paths, add affinity for zippy
- [ ] traefik.hcl - update paths (already floating)
- [ ] authentik.hcl - verify (stateless, no changes needed)
### Monitoring Stack (HIGH)
- [ ] prometheus.hcl - update paths
- [ ] grafana.hcl - update paths
- [ ] loki.hcl - update paths
- [ ] vector.hcl - remove glusterfs log collection
### Databases (HIGH)
- [ ] clickhouse.hcl - update paths, add affinity for zippy
- [ ] unifi.hcl - update paths (includes mongodb)
### Web Applications (HIGH-MEDIUM)
- [ ] wordpress.hcl - update from `/data/sync/wordpress` to `/data/services/wordpress`
- [ ] gitea.hcl - update paths
- [ ] wiki.hcl - update paths, verify with exec driver
- [ ] plausible.hcl - verify (stateless)
### Web Applications (LOW, may be deprecated)
- [ ] ghost.hcl - update paths or remove (no longer used?)
- [ ] vikunja.hcl - update paths or remove (no longer used?)
- [ ] leantime.hcl - update paths or remove (no longer used?)
### Network Infrastructure (HIGH)
- [ ] unifi.hcl - update paths (already listed above)
### Media Stack (MEDIUM)
- [ ] media.hcl - update paths, add constraint for fractal
  - [ ] radarr, sonarr, bazarr, plex, qbittorrent
### Utility Services (MEDIUM-LOW)
- [ ] evcc.hcl - update paths
- [ ] weewx.hcl - update paths
- [ ] code-server.hcl - update paths
- [ ] beancount.hcl - update paths
- [ ] adminer.hcl - verify (stateless)
- [ ] maps.hcl - update paths
- [ ] netbox.hcl - update paths
- [ ] farmos.hcl - update paths
- [ ] urbit.hcl - update paths
- [ ] webodm.hcl - update paths
- [ ] velutrack.hcl - verify paths
- [ ] resol-gateway.hcl - verify paths
- [ ] igsync.hcl - update paths
- [ ] jupyter.hcl - verify paths
- [ ] whoami.hcl - verify (stateless test service)
- [ ] tiddlywiki.hcl - update paths (if separate from wiki.hcl)
### Backup Jobs (HIGH)
- [x] mysql-backup - moved to zippy, verified
- [ ] postgres-backup.hcl - verify destination
- [ ] wordpress-backup.hcl - verify destination
### Verification
- [ ] All services healthy in Nomad
- [ ] All services registered in Consul
- [ ] Traefik routes working
- [ ] Database jobs running on zippy (verify via nomad alloc status)
- [ ] Media jobs running on fractal (verify via nomad alloc status)
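
Since most items in this phase amount to "update paths", a quick scan for leftover GlusterFS-era prefixes may help confirm nothing was missed. This is a minimal sketch: the prefixes `/data/compute` and `/data/sync` come from Phase 3 of this checklist, while the function name and the idea that all job files sit in one directory are assumptions.

```python
from pathlib import Path
from typing import Dict, List

# Legacy GlusterFS-era prefixes named in this checklist.
LEGACY_PREFIXES = ("/data/compute", "/data/sync")


def find_legacy_paths(jobs_dir: Path) -> Dict[str, List[int]]:
    """Map each .hcl file name to the line numbers that mention a legacy prefix."""
    hits = {}
    for job_file in sorted(jobs_dir.glob("*.hcl")):
        lines = job_file.read_text(encoding="utf-8").splitlines()
        matched = [
            number
            for number, line in enumerate(lines, start=1)
            if any(prefix in line for prefix in LEGACY_PREFIXES)
        ]
        if matched:
            hits[job_file.name] = matched
    return hits
```

An empty result from `find_legacy_paths(Path("jobs"))` (with `jobs` standing in for wherever the job files actually live) suggests every job has been moved off the old mounts, though it says nothing about whether the new paths are correct.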
## Phase 5: Convert sunny to NixOS (OPTIONAL - can defer)
- [ ] Document current sunny setup (ethereum containers/VMs)
- [ ] Back up ethereum data
- [ ] Install NixOS on sunny
- [ ] Restore ethereum data to `/persist/ethereum`
- [ ] Create sunny container-based config (besu, lighthouse, rocketpool)
- [ ] Deploy and verify ethereum stack
- [ ] Monitor sync status and validation
## Phase 6: Verification and cleanup
- [ ] Test NFS failover procedure (zippy → c1)
- [ ] Verify backups include `/persist/services` data
- [ ] Verify backups exclude replication snapshots
- [ ] Update documentation (README.md, architecture diagrams)
- [ ] Clean up old GlusterFS data (only after everything verified!)
- [ ] Remove old glusterfs directories from all nodes
## Post-Migration Checklist
- [ ] All 5 servers in quorum (`consul members`)
- [ ] NFS mounts working on all nodes
- [ ] Btrfs replication running (check systemd timers on zippy)
- [ ] Critical services up (mysql, postgres, redis, traefik, authentik)
- [ ] Monitoring working (prometheus, grafana, loki)
- [ ] Media stack on fractal
- [ ] Database jobs on zippy
- [ ] Consul DNS working (`dig @localhost -p 8600 data-services.service.consul`)
- [ ] Backups running (kopia snapshots include `/persist/services`)
- [ ] GlusterFS removed (no processes, volumes deleted)
- [ ] Documentation updated
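
Two of the checks above name concrete commands, so they could be scripted. The following is a rough sketch rather than an established part of this repo: `run_check` and `CHECKS` are hypothetical names, and `+short` is added to the `dig` call because plain `dig` typically exits 0 even when the name does not resolve, so the sketch requires non-empty output instead.

```python
import subprocess
from typing import List, Sequence, Tuple

# (label, command, require_output) for checks whose commands appear in the checklist.
CHECKS: List[Tuple[str, List[str], bool]] = [
    ("Consul members", ["consul", "members"], True),
    ("Consul DNS", ["dig", "+short", "@localhost", "-p", "8600",
                    "data-services.service.consul"], True),
]


def run_check(command: Sequence[str], require_output: bool = False,
              timeout: float = 10.0) -> bool:
    """Return True if the command exits 0 (and prints something, if required)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary, permission problem, or a hung command all count as failure.
        return False
    if result.returncode != 0:
        return False
    return bool(result.stdout.strip()) or not require_output


def run_all() -> List[Tuple[str, bool]]:
    """Run every check and pair each label with its pass/fail result."""
    return [(label, run_check(cmd, needs_output)) for label, cmd, needs_output in CHECKS]
```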
---

**Last updated**: 2025-10-22

**Current phase**: Phase 2 complete (zippy storage setup done), ready for Phase 3 (GlusterFS → NFS migration)

**Note**: Phase 1 (fractal NixOS conversion) deferred until after GlusterFS migration is complete