Claude Code Quick Reference

NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Consul orchestration.

Project Structure

├── common/
│   ├── global/                    # Applied to all hosts (backup, sops, users, etc.)
│   ├── minimal-node.nix           # Base (ssh, user, boot, impermanence)
│   ├── cluster-member.nix         # Consul + storage clients (NFS/CIFS/GlusterFS)
│   ├── nomad-worker.nix           # Nomad client (runs jobs) + Docker + NFS deps
│   ├── nomad-server.nix           # Enables Consul + Nomad server mode
│   ├── cluster-tools.nix          # Just CLI tools (nomad, wander, damon)
│   ├── workstation-node.nix       # Dev tools (wget, deploy-rs, docker, nix-ld)
│   ├── desktop-node.nix           # Hyprland + GUI environment
│   ├── nfs-services-server.nix    # NFS server + btrfs replication (zippy)
│   └── nfs-services-standby.nix   # NFS standby + receive replication (c1)
├── hosts/
│   ├── c1/, c2/, c3/    # Cattle nodes (quorum + workers)
│   ├── sparky/          # Primary storage + NFS server + worker (not quorum)
│   ├── chilly/          # Home Assistant VM + cluster member (Consul only)
│   ├── zippy/           # worker
│   ├── beefy/           # Desktop + cluster member (Consul only)
│   ├── fractal/         # (Proxmox, will become NixOS storage node)
│   └── sunny/           # (Standalone ethereum node, not in cluster)
├── docs/
│   ├── CLUSTER_REVAMP.md    # Master plan for architecture changes
│   ├── MIGRATION_TODO.md    # Tracking checklist for migration
│   └── NFS_FAILOVER.md      # NFS failover procedures
└── services/            # Nomad job specs (.hcl files)

Current Architecture

Storage Mounts

  • /data/services - NFS from data-services.service.consul (zippy primary, c1 standby)
  • /data/media - CIFS from fractal (existing, unchanged)
  • /data/shared - CIFS from fractal (existing, unchanged)
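As a rough illustration of the first mount above, an NFS client declaration in NixOS could look like the sketch below. Only the mount point and the Consul DNS name come from the list; the remote export path `/persist/services` and the mount options are assumptions, and the real definition presumably lives in cluster-member.nix.

```nix
{ ... }:
{
  # Sketch only: the export path after the colon is an assumption.
  fileSystems."/data/services" = {
    device = "data-services.service.consul:/persist/services";
    fsType = "nfs";
    options = [
      "nofail"              # do not fail boot if the server is unreachable
      "x-systemd.automount" # mount lazily on first access
      "_netdev"             # wait for networking before mounting
    ];
  };
}
```

Pointing the mount at a Consul service name rather than a fixed host is what lets clients follow the NFS service when it moves between primary and standby.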

Hosts

  • c1, c2, c3: Cattle nodes, run most workloads, Nomad/Consul quorum members
  • sparky: Primary NFS server, runs workloads (affinity), NOT quorum, replicates to c1 every 5min
  • chilly: Home Assistant VM, cluster member (Consul agent + CLI tools), no workloads
  • zippy: Nomad worker
  • beefy: Desktop/laptop, cluster member (Consul agent + CLI tools), no workloads
  • fractal: Storage node (Proxmox/ZFS), will join quorum after GlusterFS removed
  • sunny: Standalone ethereum staking node (not in cluster)

Config Architecture

Modular role-based configs (compose as needed):

  • minimal-node.nix - Base for all systems (SSH, user, boot, impermanence)
  • cluster-member.nix - Consul agent + shared storage mounts (no Nomad)
  • nomad-worker.nix - Nomad client to run jobs (requires cluster-member)
  • nomad-server.nix - Enables Consul + Nomad server mode (for quorum members)
  • cluster-tools.nix - Just CLI tools (no services)

Machine type configs (via flake profile):

  • workstation-node.nix - Dev tools (deploy-rs, docker, nix-ld, emulation)
  • desktop-node.nix - Extends workstation + Hyprland/GUI

Host composition examples:

  • c1/c2/c3: cluster-member + nomad-worker + nomad-server (quorum + runs jobs)
  • zippy: cluster-member + nomad-worker (runs jobs, not quorum)
  • chilly/beefy: cluster-member + cluster-tools (Consul + CLI only)

Key insight: Profiles (workstation/desktop) no longer imply cluster membership. Hosts explicitly declare roles via imports.
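A minimal sketch of what "declare roles via imports" could look like for a quorum node. The relative paths assume the layout shown under Project Structure (hosts/c1/default.nix next to common/); the actual host files may carry more settings.

```nix
# hosts/c1/default.nix (sketch)
{ ... }:
{
  imports = [
    ../../common/cluster-member.nix # Consul agent + shared storage mounts
    ../../common/nomad-worker.nix   # run Nomad jobs
    ../../common/nomad-server.nix   # take part in the Consul/Nomad quorum
  ];

  networking.hostName = "c1";
}
```

A Consul-only host would keep cluster-member.nix, drop the two Nomad modules, and add cluster-tools.nix instead.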

Key Patterns

NFS Server/Standby:

  • Primary (zippy): imports nfs-services-server.nix, sets standbys = ["c1"]
  • Standby (c1): imports nfs-services-standby.nix, sets replicationKeys = [...]
  • Replication: btrfs send/receive every 5min, incremental with fallback to full
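The primary/standby pairing above might be expressed roughly as follows. The two modules are shown as one attribute set keyed by role purely so both sides fit in a single block; in the repository each would be its own host file. The option namespaces `nfsServicesServer` and `nfsServicesStandby` are hypothetical, since the source names only the `standbys` and `replicationKeys` settings, and the key string is a placeholder.

```nix
{
  # Host file of the primary (sketch)
  primary = { ... }: {
    imports = [ ../../common/nfs-services-server.nix ];
    # Hypothetical option path; only "standbys" is named in the source.
    nfsServicesServer.standbys = [ "c1" ];
  };

  # Host file of the standby, c1 (sketch)
  standby = { ... }: {
    imports = [ ../../common/nfs-services-standby.nix ];
    # Hypothetical option path; only "replicationKeys" is named in the source.
    nfsServicesStandby.replicationKeys = [
      "ssh-ed25519 AAAA-PLACEHOLDER replication@primary"
    ];
  };
}
```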

Backups:

  • Kopia client on all nodes → Kopia server on fractal
  • Backs up /persist hourly via btrfs snapshot
  • Excludes: services@* and services-standby/services@* (replication snapshots)
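One way an hourly snapshot-then-backup job can be wired up with systemd is sketched below. This is not the contents of common/global/backup.nix: the unit name and snapshot path are hypothetical, the Kopia repository connection is assumed to be configured already, and the exclude rules listed above are not shown.

```nix
{ pkgs, ... }:
{
  # Hypothetical unit name, for illustration only.
  systemd.services.kopia-backup-sketch = {
    description = "Back up /persist from a temporary btrfs snapshot (sketch)";
    path = [ pkgs.btrfs-progs pkgs.kopia ];
    serviceConfig.Type = "oneshot";
    script = ''
      set -euo pipefail
      snap=/persist/.kopia-snapshot # hypothetical snapshot location

      # A read-only snapshot gives Kopia a consistent view of /persist.
      btrfs subvolume snapshot -r /persist "$snap"

      # Remove the temporary snapshot even if the backup step fails.
      trap 'btrfs subvolume delete "$snap"' EXIT

      kopia snapshot create "$snap"
    '';
  };

  systemd.timers.kopia-backup-sketch = {
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "hourly";
      Persistent = true; # run a missed backup after downtime
    };
  };
}
```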

Secrets:

  • SOPS for secrets, files in secrets/
  • Keys managed per-host
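If the SOPS integration is provided by the sops-nix module (an assumption; the source only says "SOPS"), a per-host declaration could look like this sketch. The secrets file name and the secret name are hypothetical, and using the host SSH key as the age identity is one common choice rather than a confirmed detail of this repository.

```nix
{ ... }:
{
  # Assumes the sops-nix NixOS module is already imported, e.g. from flake.nix.
  sops = {
    defaultSopsFile = ../../secrets/c1.yaml; # hypothetical file name
    # Derive the age identity from this host's SSH key, so keys stay per-host.
    age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];
    secrets."example/token" = { }; # hypothetical secret name
  };
}
```

With sops-nix defaults, a secret declared this way is decrypted at activation time to a file under /run/secrets.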

Migration Status

Phase 3 & 4: COMPLETE! GlusterFS removed, all services on NFS.
Next: Convert fractal to NixOS (deferred).

See docs/MIGRATION_TODO.md for detailed checklist.

Common Tasks

  • Deploy a host: deploy -s '.#hostname'
  • Deploy all: deploy
  • Check replication: ssh zippy journalctl -u replicate-services-to-c1.service -f
  • NFS failover: See docs/NFS_FAILOVER.md
  • Nomad jobs: services/*.hcl - service data stored at /data/services/<service-name>
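For context on what `deploy -s '.#hostname'` resolves against, a deploy-rs node is normally declared in flake.nix along these lines. This is a stripped-down sketch for a single host, not the repository's flake: the nixpkgs branch, the `x86_64-linux` system, and deploying as root are all assumptions.

```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    deploy-rs.url = "github:serokell/deploy-rs";
  };

  outputs = { self, nixpkgs, deploy-rs, ... }: {
    # Host definition; ./hosts/c1 resolves to hosts/c1/default.nix.
    nixosConfigurations.c1 = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [ ./hosts/c1 ];
    };

    # Node that `deploy -s '.#c1'` would target.
    deploy.nodes.c1 = {
      hostname = "c1";
      profiles.system = {
        user = "root";
        path = deploy-rs.lib.x86_64-linux.activate.nixos
          self.nixosConfigurations.c1;
      };
    };
  };
}
```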

Troubleshooting Hints

  • Replication errors with "empty stream": SSH key restricted to btrfs receive, can't run other commands
  • NFS split-brain protection: nfs-server checks Consul before starting
  • Btrfs snapshots: nested snapshots appear as empty dirs in parent snapshots
  • Kopia: uses temporary snapshot for consistency, doesn't back up nested subvolumes
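The first hint above is easier to recognise with the key restriction in view. A forced-command entry along the following lines makes sshd run `btrfs receive` no matter what the client asks for, so any other command sent over that key ends up feeding btrfs an empty stream. The target user, receive path, and key are placeholders, not values from this repository.

```nix
{ ... }:
{
  # Sketch: user, path and key are placeholders.
  users.users.root.openssh.authorizedKeys.keys = [
    ''command="btrfs receive /persist/services-standby",restrict ssh-ed25519 AAAA-PLACEHOLDER replication@primary''
  ];
}
```

To debug the standby interactively, log in with a different, unrestricted key rather than the replication key.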

Important Files

  • common/global/backup.nix - Kopia backup configuration
  • hosts/zippy/default.nix - NFS server config, replication targets
  • hosts/c1/default.nix - NFS standby config, authorized replication keys
  • flake.nix - Host definitions, nixpkgs inputs

Auto-generated reference for Claude Code. Keep concise. Update when architecture changes.