alo-cluster/CLAUDE.md
2025-11-21 14:12:19 +00:00

Claude Code Quick Reference

NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Consul orchestration.

Project Structure

├── common/
│   ├── global/                    # Applied to all hosts (backup, sops, users, etc.)
│   ├── minimal-node.nix           # Base (ssh, user, boot, impermanence)
│   ├── cluster-member.nix         # Consul agent + storage mounts (NFS/CIFS)
│   ├── nomad-worker.nix           # Nomad client (runs jobs) + Docker + NFS deps
│   ├── nomad-server.nix           # Enables Consul + Nomad server mode
│   ├── cluster-tools.nix          # Just CLI tools (nomad, wander, damon)
│   ├── workstation-node.nix       # Dev tools (wget, deploy-rs, docker, nix-ld)
│   ├── desktop-node.nix           # Hyprland + GUI environment
│   ├── nfs-services-server.nix    # NFS server + btrfs replication
│   └── nfs-services-standby.nix   # NFS standby + receive replication
├── hosts/                           # Host configs - check imports for roles
├── docs/
│   ├── CLUSTER_REVAMP.md    # Master plan for architecture changes
│   ├── MIGRATION_TODO.md    # Tracking checklist for migration
│   ├── NFS_FAILOVER.md      # NFS failover procedures
│   └── AUTH_SETUP.md        # Authentication (Pocket ID + Traefik OIDC)
└── services/            # Nomad job specs (.hcl files)

Current Architecture

Storage Mounts

  • /data/services - NFS from data-services.service.consul (check nfs-services-server.nix for primary)
  • /data/media - CIFS from fractal
  • /data/shared - CIFS from fractal

Cluster Roles (check hosts/*/default.nix for each host's imports)

  • Quorum: hosts importing nomad-server.nix (3 expected for consensus)
  • Workers: hosts importing nomad-worker.nix (run Nomad jobs)
  • NFS server: host importing nfs-services-server.nix (affinity target for jobs that need direct disk access, such as databases)
  • Standby: hosts importing nfs-services-standby.nix (receive replication)
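
Since roles are defined by imports, a quick text search over the host configs answers "who has which role". The helper below is a minimal sketch, not part of the repository: the function name hosts_with_role is hypothetical, and it assumes the hosts/<name>/default.nix layout described above. It does a plain text match, so a commented-out import or a role pulled in through another module will be misreported.

```shell
# List hosts whose default.nix mentions a given role module.
# Hypothetical helper; assumes the layout hosts/<name>/default.nix.
# Plain text match: a commented-out import still counts as a hit.
hosts_with_role() {
  role_file=$1            # e.g. nomad-server.nix
  hosts_dir=${2:-hosts}   # optional override, defaults to ./hosts
  for cfg in "$hosts_dir"/*/default.nix; do
    [ -f "$cfg" ] || continue          # skip when the glob matches nothing
    if grep -qF "$role_file" "$cfg"; then
      basename "$(dirname "$cfg")"     # print the host directory name
    fi
  done
}
```

Run from the repository root, hosts_with_role nomad-server.nix | wc -l should report 3 if the quorum is configured as expected, and hosts_with_role nfs-services-server.nix should name the current NFS primary.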

Config Architecture

Modular role-based configs (compose as needed):

  • minimal-node.nix - Base for all systems (SSH, user, boot, impermanence)
  • cluster-member.nix - Consul agent + shared storage mounts (no Nomad)
  • nomad-worker.nix - Nomad client to run jobs (requires cluster-member)
  • nomad-server.nix - Enables Consul + Nomad server mode (for quorum members)
  • cluster-tools.nix - Just CLI tools (no services)

Machine type configs (via flake profile):

  • workstation-node.nix - Dev tools (deploy-rs, docker, nix-ld, emulation)
  • desktop-node.nix - Extends workstation + Hyprland/GUI

Composition patterns:

  • Quorum member: cluster-member + nomad-worker + nomad-server
  • Worker only: cluster-member + nomad-worker
  • CLI only: cluster-member + cluster-tools (Consul agent, no Nomad service)
  • NFS primary: cluster-member + nomad-worker + nfs-services-server
  • Standalone: minimal-node only (no cluster membership)

Key insight: Profiles (workstation/desktop) don't imply cluster roles. Check imports for actual roles.
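
The composition patterns above can be turned into a rough classifier for a single host file. This is an illustrative sketch under stated assumptions: the function names mentions and composition_pattern are hypothetical, detection is a plain text match on module file names, and only the host file itself is inspected (roles pulled in through a profile or a nested import are not seen). When a host matches more than one pattern, the first matching branch wins.

```shell
# Return success if the file at $1 mentions the module name $2.
mentions() { grep -qF "$2" "$1"; }

# Map a host config to one of the composition patterns listed above.
# Hypothetical helper; plain text match on module file names only.
composition_pattern() {
  cfg=$1   # path to a host's default.nix
  if mentions "$cfg" nomad-worker.nix && mentions "$cfg" nomad-server.nix; then
    echo "quorum member"
  elif mentions "$cfg" nomad-worker.nix && mentions "$cfg" nfs-services-server.nix; then
    echo "NFS primary"
  elif mentions "$cfg" nomad-worker.nix; then
    echo "worker only"
  elif mentions "$cfg" cluster-tools.nix; then
    echo "CLI only"
  elif mentions "$cfg" cluster-member.nix; then
    echo "cluster member (unlisted combination)"
  else
    echo "standalone"
  fi
}
```

For example, composition_pattern hosts/<hostname>/default.nix prints the pattern name for that host, which is a quick way to confirm that a workstation or desktop profile has not silently picked up a cluster role.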

Key Patterns

NFS Server/Standby:

  • Primary: imports nfs-services-server.nix, sets standbys = [...]
  • Standby: imports nfs-services-standby.nix, sets replicationKeys = [...]
  • Replication: btrfs send/receive every 5 minutes; incremental, falling back to a full send
  • Check host configs for current primary/standby assignments
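
The "incremental with fallback to full" behaviour can be sketched as a small decision helper. This is not the repository's replication script: the function name plan_btrfs_send is hypothetical, and it assumes the fallback triggers when no usable parent snapshot exists (the real script may also retry with a full send after a failed incremental one). It only prints the send command it would choose, so it is safe to run anywhere.

```shell
# Decide between an incremental and a full btrfs send.
# Hypothetical helper: prints the command instead of running it.
plan_btrfs_send() {
  new_snap=$1      # snapshot to replicate
  parent_snap=$2   # previous snapshot already on the standby; may be empty
  if [ -n "$parent_snap" ] && [ -d "$parent_snap" ]; then
    # Incremental: send only the changes since the parent snapshot.
    printf 'btrfs send -p %s %s\n' "$parent_snap" "$new_snap"
  else
    # Fallback: no usable parent, so send the whole snapshot.
    printf 'btrfs send %s\n' "$new_snap"
  fi
}
```

In a real run the chosen send command would be piped over SSH into btrfs receive on the standby, which is why the standby's key only needs permission for that one command.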

Backups:

  • Kopia client on all nodes → Kopia server on fractal
  • Backs up /persist hourly via btrfs snapshot
  • Excludes: services@* and services-standby/services@* (replication snapshots)

Secrets:

  • SOPS for secrets, files in secrets/
  • Keys managed per-host

Authentication:

  • Pocket ID (OIDC provider) at pocket-id.v.paler.net
  • Traefik uses traefik-oidc-auth plugin for SSO
  • To protect a service, add the middlewares=oidc-auth@file tag to it
  • See docs/AUTH_SETUP.md for details

Migration Status

Phase 3 & 4: COMPLETE! GlusterFS removed, all services on NFS.
Next: Convert fractal to NixOS (deferred).

See docs/MIGRATION_TODO.md for detailed checklist.

Common Tasks

  • Deploy a host: deploy -s '.#hostname'
  • Deploy all: deploy
  • Check replication: check which host is the NFS primary, then ssh <primary> journalctl -u replicate-services-to-*.service -f
  • NFS failover: see docs/NFS_FAILOVER.md
  • Nomad jobs: services/*.hcl (service data is stored at /data/services/<service-name>)

Troubleshooting Hints

  • Replication errors with "empty stream": the replication SSH key is restricted to btrfs receive, so it can't run other commands
  • NFS split-brain protection: nfs-server checks Consul before starting
  • Btrfs snapshots: nested snapshots appear as empty dirs in parent snapshots
  • Kopia: uses temporary snapshot for consistency, doesn't back up nested subvolumes

Important Files

  • common/global/backup.nix - Kopia backup configuration
  • common/nfs-services-server.nix - NFS server role (check hosts to see which one imports this)
  • common/nfs-services-standby.nix - NFS standby role (check hosts to see which ones import this)
  • flake.nix - Host definitions, nixpkgs inputs

Auto-generated reference for Claude Code. Keep concise. Update when architecture changes.