diff --git a/CLAUDE.md b/CLAUDE.md
index 614cc87..71a1061 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -8,22 +8,15 @@ NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Cons
 ├── common/
 │ ├── global/ # Applied to all hosts (backup, sops, users, etc.)
 │ ├── minimal-node.nix # Base (ssh, user, boot, impermanence)
-│ ├── cluster-member.nix # Consul + storage clients (NFS/CIFS/GlusterFS)
+│ ├── cluster-member.nix # Consul agent + storage mounts (NFS/CIFS)
 │ ├── nomad-worker.nix # Nomad client (runs jobs) + Docker + NFS deps
 │ ├── nomad-server.nix # Enables Consul + Nomad server mode
 │ ├── cluster-tools.nix # Just CLI tools (nomad, wander, damon)
 │ ├── workstation-node.nix # Dev tools (wget, deploy-rs, docker, nix-ld)
 │ ├── desktop-node.nix # Hyprland + GUI environment
-│ ├── nfs-services-server.nix # NFS server + btrfs replication (zippy)
-│ └── nfs-services-standby.nix # NFS standby + receive replication (c1)
-├── hosts/
-│ ├── c1/, c2/, c3/ # Cattle nodes (quorum + workers)
-│ ├── sparky/ # Primary storage + NFS server + worker (not quorum)
-│ ├── chilly/ # Home Assistant VM + cluster member (Consul only)
-│ ├── zippy/ # worker
-│ ├── sparky/ # Desktop + cluster member (Consul only)
-│ ├── fractal/ # (Proxmox, will become NixOS storage node)
-│ └── sunny/ # (Standalone ethereum node, not in cluster)
+│ ├── nfs-services-server.nix # NFS server + btrfs replication
+│ └── nfs-services-standby.nix # NFS standby + receive replication
+├── hosts/ # Host configs - check imports for roles
 ├── docs/
 │ ├── CLUSTER_REVAMP.md # Master plan for architecture changes
 │ ├── MIGRATION_TODO.md # Tracking checklist for migration
@@ -34,18 +27,15 @@ NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Cons
 ## Current Architecture
 
 ### Storage Mounts
-- `/data/services` - NFS from `data-services.service.consul` (zippy primary, c1 standby)
-- `/data/media` - CIFS from fractal (existing, unchanged)
-- `/data/shared` - CIFS from fractal (existing, unchanged)
+- `/data/services` - NFS from `data-services.service.consul` (check nfs-services-server.nix for primary)
+- `/data/media` - CIFS from fractal
+- `/data/shared` - CIFS from fractal
 
-### Hosts
-- **c1, c2, c3**: Cattle nodes, run most workloads, Nomad/Consul quorum members
-- **sparky**: Primary NFS server, runs workloads (affinity), NOT quorum, replicates to c1 every 5min
-- **chilly**: Home Assistant VM, cluster member (Consul agent + CLI tools), no workloads
-- **sparky**: Nomad worker
-- **beefy**: Desktop/laptop, cluster member (Consul agent + CLI tools), no workloads
-- **fractal**: Storage node (Proxmox/ZFS), will join quorum after GlusterFS removed
-- **sunny**: Standalone ethereum staking node (not in cluster)
+### Cluster Roles (check hosts/*/default.nix for each host's imports)
+- **Quorum**: hosts importing `nomad-server.nix` (3 expected for consensus)
+- **Workers**: hosts importing `nomad-worker.nix` (run Nomad jobs)
+- **NFS server**: host importing `nfs-services-server.nix` (affinity for direct disk access like DBs)
+- **Standby**: hosts importing `nfs-services-standby.nix` (receive replication)
 
 ## Config Architecture
 
@@ -60,19 +50,22 @@ NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Cons
 - `workstation-node.nix` - Dev tools (deploy-rs, docker, nix-ld, emulation)
 - `desktop-node.nix` - Extends workstation + Hyprland/GUI
 
-**Host composition examples**:
-- c1/c2/c3: `cluster-member + nomad-worker + nomad-server` (quorum + runs jobs)
-- zippy: `cluster-member + nomad-worker` (runs jobs, not quorum)
-- chilly/sparky: `cluster-member + cluster-tools` (Consul + CLI only)
+**Composition patterns**:
+- Quorum member: `cluster-member + nomad-worker + nomad-server`
+- Worker only: `cluster-member + nomad-worker`
+- CLI only: `cluster-member + cluster-tools` (Consul agent, no Nomad service)
+- NFS primary: `cluster-member + nomad-worker + nfs-services-server`
+- Standalone: `minimal-node` only (no cluster membership)
 
-**Key insight**: Profiles (workstation/desktop) no longer imply cluster membership. Hosts explicitly declare roles via imports.
+**Key insight**: Profiles (workstation/desktop) don't imply cluster roles. Check imports for actual roles.
 
 ## Key Patterns
 
 **NFS Server/Standby**:
-- Primary (zippy): imports `nfs-services-server.nix`, sets `standbys = ["c1"]`
-- Standby (c1): imports `nfs-services-standby.nix`, sets `replicationKeys = [...]`
+- Primary: imports `nfs-services-server.nix`, sets `standbys = [...]`
+- Standby: imports `nfs-services-standby.nix`, sets `replicationKeys = [...]`
 - Replication: btrfs send/receive every 5min, incremental with fallback to full
+- Check host configs for current primary/standby assignments
 
 **Backups**:
 - Kopia client on all nodes → Kopia server on fractal
@@ -94,7 +87,7 @@ See `docs/MIGRATION_TODO.md` for detailed checklist.
 
 **Deploy a host**: `deploy -s '.#hostname'`
 **Deploy all**: `deploy`
-**Check replication**: `ssh zippy journalctl -u replicate-services-to-c1.service -f`
+**Check replication**: Check NFS primary host, then `ssh journalctl -u replicate-services-to-*.service -f`
 **NFS failover**: See `docs/NFS_FAILOVER.md`
 **Nomad jobs**: `services/*.hcl` - service data stored at `/data/services/`
 
@@ -108,8 +101,8 @@ See `docs/MIGRATION_TODO.md` for detailed checklist.
 ## Important Files
 
 - `common/global/backup.nix` - Kopia backup configuration
-- `hosts/zippy/default.nix` - NFS server config, replication targets
-- `hosts/c1/default.nix` - NFS standby config, authorized replication keys
+- `common/nfs-services-server.nix` - NFS server role (check hosts for which imports this)
+- `common/nfs-services-standby.nix` - NFS standby role (check hosts for which imports this)
 - `flake.nix` - Host definitions, nixpkgs inputs
 
 ---
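
The revised **Check replication** entry asks the reader to find the NFS primary first, and the `ssh` command it gives no longer names a host. A minimal sketch of that lookup follows. It assumes host configs live at `hosts/<name>/default.nix`, as the Cluster Roles heading suggests, and that a role is imported by naming its file directly in that file. `hosts_with_role` is a hypothetical helper, not something the repository provides.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every host whose default.nix
# mentions the given role module. This is a plain text match, so it also
# matches commented-out imports and misses roles pulled in indirectly.
hosts_with_role() {
    role="$1"
    root="${2:-.}"   # repository root; defaults to the current directory
    for f in "$root"/hosts/*/default.nix; do
        [ -f "$f" ] || continue
        if grep -qF -- "$role" "$f"; then
            basename "$(dirname "$f")"
        fi
    done
}

# Example use (not executed here):
#   primary=$(hosts_with_role nfs-services-server.nix)
#   ssh "$primary" journalctl -u 'replicate-services-to-*.service' -f
```

The same helper works for the other roles, for example `hosts_with_role nomad-server.nix` to list the expected quorum members. `journalctl -u` accepts a glob pattern for the unit name, so the wildcard in the entry's command can stay as written once a hostname is supplied.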