# Claude Code Quick Reference

NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Consul orchestration.

## Project Structure

```
├── common/
│   ├── global/                   # Applied to all hosts (backup, sops, users, etc.)
│   ├── minimal-node.nix          # Base (ssh, user, boot, impermanence)
│   ├── cluster-member.nix        # Consul agent + storage mounts (NFS/CIFS)
│   ├── nomad-worker.nix          # Nomad client (runs jobs) + Docker + NFS deps
│   ├── nomad-server.nix          # Enables Consul + Nomad server mode
│   ├── cluster-tools.nix         # Just CLI tools (nomad, wander, damon)
│   ├── workstation-node.nix      # Dev tools (wget, deploy-rs, docker, nix-ld)
│   ├── desktop-node.nix          # Hyprland + GUI environment
│   ├── nfs-services-server.nix   # NFS server + btrfs replication
│   └── nfs-services-standby.nix  # NFS standby + receive replication
├── hosts/                        # Host configs - check imports for roles
├── docs/
│   ├── CLUSTER_REVAMP.md         # Master plan for architecture changes
│   ├── MIGRATION_TODO.md         # Tracking checklist for migration
│   ├── NFS_FAILOVER.md           # NFS failover procedures
│   └── AUTH_SETUP.md             # Authentication (Pocket ID + Traefik OIDC)
└── services/                     # Nomad job specs (.hcl files)
```

## Current Architecture

### Storage Mounts
- `/data/services` - NFS from `data-services.service.consul` (check which host imports `nfs-services-server.nix` to find the primary)
- `/data/media` - CIFS from fractal
- `/data/shared` - CIFS from fractal

### Cluster Roles (check `hosts/*/default.nix` for each host's imports)
- **Quorum**: hosts importing `nomad-server.nix` (3 expected for consensus)
- **Workers**: hosts importing `nomad-worker.nix` (run Nomad jobs)
- **NFS server**: host importing `nfs-services-server.nix` (affinity target for jobs needing direct disk access, e.g. DBs)
- **Standby**: hosts importing `nfs-services-standby.nix` (receive replication)

## Config Architecture

**Modular role-based configs** (compose as needed):
- `minimal-node.nix` - Base for all systems (SSH, user, boot, impermanence)
- `cluster-member.nix` - Consul agent + shared storage mounts (no Nomad)
- `nomad-worker.nix` - Nomad client to run jobs (requires cluster-member)
- `nomad-server.nix` - Enables Consul + Nomad server mode (for quorum members)
- `cluster-tools.nix` - Just CLI tools (no services)

**Machine type configs** (via flake profile):
- `workstation-node.nix` - Dev tools (deploy-rs, docker, nix-ld, emulation)
- `desktop-node.nix` - Extends workstation + Hyprland/GUI

**Composition patterns**:
- Quorum member: `cluster-member + nomad-worker + nomad-server`
- Worker only: `cluster-member + nomad-worker`
- CLI only: `cluster-member + cluster-tools` (Consul agent, no Nomad service)
- NFS primary: `cluster-member + nomad-worker + nfs-services-server`
- Standalone: `minimal-node` only (no cluster membership)

**Key insight**: Profiles (workstation/desktop) don't imply cluster roles. Check imports for actual roles.

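As a rough illustration of the quorum-member composition pattern, a host config could look something like the sketch below. The host directory name `example-host` is hypothetical, and the relative import paths assume the `hosts/<name>/default.nix` layout shown in the project structure. Real host files will likely contain more than this.

```nix
# hosts/example-host/default.nix
# "example-host" is a hypothetical name used only for illustration.
{ ... }:
{
  imports = [
    ../../common/cluster-member.nix  # Consul agent + shared storage mounts
    ../../common/nomad-worker.nix    # Nomad client (requires cluster-member)
    ../../common/nomad-server.nix    # Consul + Nomad server mode (quorum member)
  ];
}
```

Dropping the `nomad-server.nix` line would give the "Worker only" pattern. The role is determined by this import list, not by the flake profile.
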
## Key Patterns

**NFS Server/Standby**:
- Primary: imports `nfs-services-server.nix`, sets `standbys = [...]`
- Standby: imports `nfs-services-standby.nix`, sets `replicationKeys = [...]`
- Replication: btrfs send/receive every 5min, incremental with fallback to full
- Check host configs for current primary/standby assignments

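The primary/standby pairing above might be sketched as follows. Only the option names `standbys` and `replicationKeys` come from this reference. The prefixes `nfsServicesServer` and `nfsServicesStandby` are assumptions made for illustration, so check the two module files for the real option paths. The list values are deliberately left as placeholders.

```nix
# On the primary host (hosts/<primary>/default.nix); other role imports omitted.
{ ... }:
{
  imports = [ ../../common/nfs-services-server.nix ];

  # ASSUMED option path; only "standbys" is named in this reference.
  nfsServicesServer.standbys = [ /* hostnames of standby hosts */ ];
}
```

```nix
# On a standby host (hosts/<standby>/default.nix); other role imports omitted.
{ ... }:
{
  imports = [ ../../common/nfs-services-standby.nix ];

  # ASSUMED option path; only "replicationKeys" is named in this reference.
  nfsServicesStandby.replicationKeys = [ /* SSH public keys allowed to replicate */ ];
}
```
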
**Backups**:
- Kopia client on all nodes → Kopia server on fractal
- Backs up `/persist` hourly via btrfs snapshot
- Excludes: `services@*` and `services-standby/services@*` (replication snapshots)

**Secrets**:
- SOPS for secrets, files in `secrets/`
- Keys managed per-host

**Authentication**:
- Pocket ID (OIDC provider) at `pocket-id.v.paler.net`
- Traefik uses `traefik-oidc-auth` plugin for SSO
- To protect a service, add the `middlewares=oidc-auth@file` tag to it
- See `docs/AUTH_SETUP.md` for details

## Migration Status

- **Phase 3 & 4**: COMPLETE! GlusterFS removed, all services on NFS
- **Next**: Convert fractal to NixOS (deferred)

See `docs/MIGRATION_TODO.md` for detailed checklist.

## Common Tasks

- **Deploy a host**: `deploy -s '.#hostname'`
- **Deploy all**: `deploy`
- **Check replication**: Identify the NFS primary host, then run `ssh <primary> "journalctl -u 'replicate-services-to-*.service' -f"` (quoted so the local shell doesn't expand the glob)
- **NFS failover**: See `docs/NFS_FAILOVER.md`
- **Nomad jobs**: Specs live in `services/*.hcl`; service data is stored at `/data/services/<service-name>`

## Troubleshooting Hints

- Replication errors with "empty stream": the replication SSH key is restricted to `btrfs receive` and can't run other commands
- NFS split-brain protection: nfs-server checks Consul before starting
- Btrfs snapshots: nested snapshots appear as empty dirs in parent snapshots
- Kopia: uses a temporary snapshot for consistency, so it doesn't back up nested subvolumes

## Important Files

- `common/global/backup.nix` - Kopia backup configuration
- `common/nfs-services-server.nix` - NFS server role (check hosts for which imports this)
- `common/nfs-services-standby.nix` - NFS standby role (check hosts for which imports this)
- `flake.nix` - Host definitions, nixpkgs inputs

---
*Auto-generated reference for Claude Code. Keep concise. Update when architecture changes.*