Compare commits: 1262e03e21...2437d46aa9 (17 commits)

| SHA1 |
|---|
| 2437d46aa9 |
| d16ffd9c65 |
| 49f159e2a6 |
| 17c0f2db2a |
| c80a2c9a58 |
| 706f46ae77 |
| fa603e8aea |
| 8032ad4d20 |
| 8ce5194ca9 |
| a948f26ffb |
| f414ac0146 |
| 17711da0b6 |
| ed06f07116 |
| bffc09cbd6 |
| f488b710bf |
| 65835e1ed0 |
| 967ff34a51 |

CLAUDE.md (new file, 92 lines)

@@ -0,0 +1,92 @@

# Claude Code Quick Reference

NixOS cluster configuration using flakes. Homelab infrastructure with Nomad/Consul orchestration.

## Project Structure

```
├── common/
│   ├── global/                   # Applied to all hosts (backup, sops, users, etc.)
│   ├── compute-node.nix          # Nomad client + Consul agent + NFS client
│   ├── cluster-node.nix          # Nomad server + Consul server (for quorum members)
│   ├── nfs-services-server.nix   # NFS server + btrfs replication (zippy)
│   └── nfs-services-standby.nix  # NFS standby + receive replication (c1, c2)
├── hosts/
│   ├── c1/, c2/, c3/             # Cattle nodes (compute, quorum members)
│   ├── zippy/                    # Primary storage + NFS server + stateful workloads
│   ├── fractal/                  # (Proxmox, will become NixOS storage node)
│   ├── sunny/                    # (Standalone ethereum node, not in cluster)
│   └── chilly/                   # (Home Assistant VM, not in cluster)
├── docs/
│   ├── CLUSTER_REVAMP.md         # Master plan for architecture changes
│   ├── MIGRATION_TODO.md         # Tracking checklist for migration
│   └── NFS_FAILOVER.md           # NFS failover procedures
└── services/                     # Nomad job specs (.hcl files)
```

## Current Architecture (transitioning)

**OLD**: GlusterFS on c1/c2/c3 at `/data/compute` (being phased out)
**NEW**: NFS from zippy at `/data/services` (current target)

### Storage Mounts
- `/data/services` - NFS from `data-services.service.consul` (zippy primary, c1 standby)
- `/data/media` - CIFS from fractal (existing, unchanged)
- `/data/shared` - CIFS from fractal (existing, unchanged)

### Hosts
- **c1, c2, c3**: Cattle nodes, run most workloads, Nomad/Consul quorum
- **zippy**: Primary NFS server, runs databases (affinity), replicates to c1 every 5min
- **fractal**: Storage node (Proxmox/ZFS), will join quorum after GlusterFS removed
- **sunny**: Standalone ethereum staking node
- **chilly**: Home Assistant VM

## Key Patterns

**NFS Server/Standby**:
- Primary (zippy): imports `nfs-services-server.nix`, sets `standbys = ["c1"]`
- Standby (c1): imports `nfs-services-standby.nix`, sets `replicationKeys = [...]`
- Replication: btrfs send/receive every 5min, incremental with fallback to full
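
A minimal sketch of how the two roles are wired into the host configs (illustrative excerpts mirroring the snippets in `docs/NFS_FAILOVER.md`; the real definitions live in `hosts/zippy/default.nix` and `hosts/c1/default.nix`, and the key string is a placeholder):

```nix
# hosts/zippy/default.nix (primary, illustrative excerpt)
{ ... }:
{
  imports = [ ../../common/nfs-services-server.nix ];
  nfsServicesServer.standbys = [ "c1" ];
}
```

```nix
# hosts/c1/default.nix (standby, illustrative excerpt)
{ ... }:
{
  imports = [ ../../common/nfs-services-standby.nix ];
  nfsServicesStandby.replicationKeys = [
    # placeholder; the real key comes from zippy:/persist/root/.ssh/btrfs-replication.pub
    "ssh-ed25519 AAAA... root@zippy-replication"
  ];
}
```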

**Backups**:
- Kopia client on all nodes → Kopia server on fractal
- Backs up `/persist` hourly via btrfs snapshot
- Excludes: `services@*` and `services-standby/services@*` (replication snapshots)

**Secrets**:
- SOPS for secrets, files in `secrets/`
- Keys managed per-host

## Migration Status

**Phase**: 2 complete, ready for Phase 3
**Current**: Migrating GlusterFS → NFS
**Next**: Copy data, update Nomad jobs, remove GlusterFS
**Later**: Convert fractal to NixOS (deferred)

See `docs/MIGRATION_TODO.md` for detailed checklist.

## Common Tasks

**Deploy a host**: `deploy -s '.#hostname'`
**Deploy all**: `deploy`
**Check replication**: `ssh zippy journalctl -u replicate-services-to-c1.service -f`
**NFS failover**: See `docs/NFS_FAILOVER.md`
**Nomad jobs**: `services/*.hcl` - update paths: `/data/compute/appdata/foo` → `/data/services/foo` (NOT `/data/services/appdata/foo`!)
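
For illustration, a volume mapping in a hypothetical `services/foo.hcl` would change roughly like this (job, image, and mount names are placeholders, not an actual job spec from this repo):

```hcl
# services/foo.hcl (hypothetical excerpt)
task "foo" {
  driver = "docker"
  config {
    image = "example/foo:latest"
    volumes = [
      # old GlusterFS path:
      # "/data/compute/appdata/foo:/config",
      # new NFS path (note: no appdata/ level):
      "/data/services/foo:/config",
    ]
  }
}
```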

## Troubleshooting Hints

- Replication errors with "empty stream": SSH key restricted to `btrfs receive`, can't run other commands
- NFS split-brain protection: nfs-server checks Consul before starting
- Btrfs snapshots: nested snapshots appear as empty dirs in parent snapshots
- Kopia: uses temporary snapshot for consistency, doesn't back up nested subvolumes

## Important Files

- `common/global/backup.nix` - Kopia backup configuration
- `hosts/zippy/default.nix` - NFS server config, replication targets
- `hosts/c1/default.nix` - NFS standby config, authorized replication keys
- `flake.nix` - Host definitions, nixpkgs inputs

---

*Auto-generated reference for Claude Code. Keep concise. Update when architecture changes.*

@@ -1,13 +1,14 @@
 { pkgs, ... }:
 {
   # Cluster node configuration
-  # Extends minimal-node with cluster-specific services (Consul, GlusterFS, CIFS)
+  # Extends minimal-node with cluster-specific services (Consul, GlusterFS, CIFS, NFS)
   # Used by: compute nodes (c1, c2, c3)
   imports = [
     ./minimal-node.nix
     ./unattended-encryption.nix
     ./cifs-client.nix
     ./consul.nix
-    ./glusterfs-client.nix
+    ./glusterfs-client.nix      # Keep during migration, will be removed in Phase 3
+    ./nfs-services-client.nix   # New: NFS client for /data/services
   ];
 }

@@ -21,7 +21,11 @@ let
   ${btrfs} subvolume snapshot -r "$target_path" "$snapshot_path"

   # --no-send-snapshot-path due to https://github.com/kopia/kopia/issues/4402
-  ${kopia} snapshot create --no-send-snapshot-report --override-source "$target_path" -- "$snapshot_path"
+  # Exclude btrfs replication snapshots (they appear as empty dirs in the snapshot anyway)
+  ${kopia} snapshot create --no-send-snapshot-report --override-source "$target_path" \
+    --ignore "services@*" \
+    --ignore "services-standby/services@*" \
+    -- "$snapshot_path"

   ${btrfs} subvolume delete "$snapshot_path"
   ${kopia} repository disconnect

common/nfs-services-client.nix (new file, 21 lines)

@@ -0,0 +1,21 @@
{ pkgs, ... }:
{
  # NFS client for /data/services
  # Mounts from data-services.service.consul (Consul DNS for automatic failover)
  # The NFS server registers itself in Consul, so this will automatically
  # point to whichever host is currently running the NFS server

  fileSystems."/data/services" = {
    device = "data-services.service.consul:/persist/services";
    fsType = "nfs";
    options = [
      "x-systemd.automount"       # Auto-mount on access
      "noauto"                    # Don't mount at boot (automount handles it)
      "x-systemd.idle-timeout=60" # Unmount after 60s of inactivity
      "_netdev"                   # Network filesystem (wait for network)
    ];
  };

  # Ensure NFS client packages are available
  environment.systemPackages = [ pkgs.nfs-utils ];
}
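
A quick client-side sanity check (a sketch; `data-services.automount` is the assumed systemd-escaped unit name for the `/data/services` mount point, not something defined explicitly in this module):

```bash
# Verify the automount unit exists and trigger it with a first access
systemctl status data-services.automount   # assumed unit name for /data/services
ls /data/services                          # first access should mount via Consul DNS
df -h /data/services                       # should show data-services.service.consul:/persist/services
```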

common/nfs-services-server.nix (new file, 176 lines)

@@ -0,0 +1,176 @@
{ config, lib, pkgs, ... }:

let
  cfg = config.nfsServicesServer;
in
{
  options.nfsServicesServer = {
    enable = lib.mkEnableOption "NFS services server" // { default = true; };

    standbys = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [];
      description = ''
        List of standby hostnames to replicate to (e.g. ["c1"]).

        Requires one-time setup on the NFS server:
          sudo mkdir -p /persist/root/.ssh
          sudo ssh-keygen -t ed25519 -f /persist/root/.ssh/btrfs-replication -N "" -C "root@$(hostname)-replication"

        Then add the public key to each standby's nfsServicesStandby.replicationKeys option.
      '';
    };
  };

  config = lib.mkIf cfg.enable {
    # Persist root SSH directory for replication key
    environment.persistence."/persist" = {
      directories = [
        "/root/.ssh"
      ];
    };

    # Bind mount /persist/services to /data/services for local access
    # This makes the path consistent with NFS clients
    # Use mkForce to override the NFS client mount from cluster-node.nix
    fileSystems."/data/services" = lib.mkForce {
      device = "/persist/services";
      fsType = "none";
      options = [ "bind" ];
    };

    # Nomad node metadata: mark this as the primary storage node
    # Jobs can constrain to ${meta.storage_role} = "primary"
    services.nomad.settings.client.meta = {
      storage_role = "primary";
    };

    # NFS server configuration
    services.nfs.server = {
      enable = true;
      exports = ''
        /persist/services 192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
      '';
    };

    # Consul service registration for NFS
    services.consul.extraConfig.services = [{
      name = "data-services";
      port = 2049;
      checks = [{
        tcp = "localhost:2049";
        interval = "30s";
      }];
    }];

    # Firewall for NFS
    networking.firewall.allowedTCPPorts = [ 2049 111 20048 ];
    networking.firewall.allowedUDPPorts = [ 2049 111 20048 ];

    # systemd services: NFS server split-brain check + replication services
    systemd.services = lib.mkMerge ([
      # Safety check: prevent split-brain by ensuring no other NFS server is active
      {
        nfs-server = {
          preStart = ''
            # Wait for Consul to be available
            for i in {1..30}; do
              if ${pkgs.netcat}/bin/nc -z localhost 8600; then
                break
              fi
              echo "Waiting for Consul DNS... ($i/30)"
              sleep 1
            done

            # Check if another NFS server is already registered in Consul
            CURRENT_SERVER=$(${pkgs.dnsutils}/bin/dig +short @localhost -p 8600 data-services.service.consul | head -1 || true)
            MY_IP=$(${pkgs.iproute2}/bin/ip -4 addr show | ${pkgs.gnugrep}/bin/grep -oP '(?<=inet\s)\d+(\.\d+){3}' | ${pkgs.gnugrep}/bin/grep -v '^127\.' | head -1)

            if [ -n "$CURRENT_SERVER" ] && [ "$CURRENT_SERVER" != "$MY_IP" ]; then
              echo "ERROR: Another NFS server is already active at $CURRENT_SERVER"
              echo "This host ($MY_IP) is configured as NFS server but should be standby."
              echo "To fix:"
              echo "  1. If this is intentional (failback), first demote the other server"
              echo "  2. Update this host's config to use nfs-services-standby.nix instead"
              echo "  3. Sync data from active server before promoting this host"
              exit 1
            fi

            echo "NFS server startup check passed (no other active server found)"
          '';
        };
      }
    ] ++ (lib.forEach cfg.standbys (standby: {
      "replicate-services-to-${standby}" = {
        description = "Replicate /persist/services to ${standby}";
        path = [ pkgs.btrfs-progs pkgs.openssh pkgs.coreutils pkgs.findutils pkgs.gnugrep ];

        script = ''
          set -euo pipefail

          SSH_KEY="/persist/root/.ssh/btrfs-replication"
          if [ ! -f "$SSH_KEY" ]; then
            echo "ERROR: SSH key not found at $SSH_KEY"
            echo "Run: sudo ssh-keygen -t ed25519 -f $SSH_KEY -N \"\" -C \"root@$(hostname)-replication\""
            exit 1
          fi

          SNAPSHOT_NAME="services@$(date +%Y%m%d-%H%M%S)"
          SNAPSHOT_PATH="/persist/$SNAPSHOT_NAME"

          # Create readonly snapshot
          btrfs subvolume snapshot -r /persist/services "$SNAPSHOT_PATH"

          # Find previous snapshot on sender (sort by name since readonly snapshots have same mtime)
          # Use -d to list directories only, not their contents
          PREV_LOCAL=$(ls -1d /persist/services@* 2>/dev/null | grep -v "^$SNAPSHOT_PATH$" | sort -r | head -1 || true)

          # Try incremental send if we have a parent, fall back to full send if it fails
          if [ -n "$PREV_LOCAL" ]; then
            echo "Attempting incremental send from $(basename $PREV_LOCAL) to ${standby}"

            # Try incremental send, if it fails (e.g., parent missing on receiver), fall back to full
            if btrfs send -p "$PREV_LOCAL" "$SNAPSHOT_PATH" | \
               ssh -i "$SSH_KEY" -o StrictHostKeyChecking=accept-new root@${standby} \
                 "btrfs receive /persist/services-standby"; then
              echo "Incremental send completed successfully"
            else
              echo "Incremental send failed (likely missing parent on receiver), falling back to full send"
              btrfs send "$SNAPSHOT_PATH" | \
                ssh -i "$SSH_KEY" -o StrictHostKeyChecking=accept-new root@${standby} \
                  "btrfs receive /persist/services-standby"
            fi
          else
            # First snapshot, do full send
            echo "Full send to ${standby} (first snapshot)"
            btrfs send "$SNAPSHOT_PATH" | \
              ssh -i "$SSH_KEY" -o StrictHostKeyChecking=accept-new root@${standby} \
                "btrfs receive /persist/services-standby"
          fi

          # Cleanup old snapshots on sender (keep last 24 hours = 288 snapshots at 5min intervals)
          find /persist -maxdepth 1 -name 'services@*' -mmin +1440 -exec btrfs subvolume delete {} \;
        '';

        serviceConfig = {
          Type = "oneshot";
          User = "root";
        };
      };
    }))
    );

    systemd.timers = lib.mkMerge (
      lib.forEach cfg.standbys (standby: {
        "replicate-services-to-${standby}" = {
          description = "Timer for replicating /persist/services to ${standby}";
          wantedBy = [ "timers.target" ];
          timerConfig = {
            OnCalendar = "*:0/5"; # Every 5 minutes
            Persistent = true;
          };
        };
      })
    );
  };
}

common/nfs-services-standby.nix (new file, 68 lines)

@@ -0,0 +1,68 @@
{ config, lib, pkgs, ... }:

let
  cfg = config.nfsServicesStandby;
in
{
  options.nfsServicesStandby = {
    enable = lib.mkEnableOption "NFS services standby" // { default = true; };

    replicationKeys = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [];
      description = ''
        SSH public keys authorized to replicate btrfs snapshots to this standby.
        These keys are restricted to only run 'btrfs receive /persist/services-standby'.

        Get the public key from the NFS server:
          ssh <nfs-server> sudo cat /persist/root/.ssh/btrfs-replication.pub
      '';
    };
  };

  config = lib.mkIf cfg.enable {
    # Allow root SSH login for replication (restricted by command= in authorized_keys)
    # This is configured in common/sshd.nix

    # Restricted SSH keys for btrfs replication
    users.users.root.openssh.authorizedKeys.keys =
      map (key: ''command="btrfs receive /persist/services-standby",restrict ${key}'') cfg.replicationKeys;

    # Mount point for services-standby subvolume
    # This is just declarative documentation - the subvolume must be created manually once:
    #   sudo btrfs subvolume create /persist/services-standby
    # After that, it will persist across reboots (it's under /persist)
    fileSystems."/persist/services-standby" = {
      device = "/persist/services-standby";
      fsType = "none";
      options = [ "bind" ];
      noCheck = true;
    };

    # Cleanup old snapshots on standby (keep last 48 hours for safety)
    systemd.services.cleanup-services-standby-snapshots = {
      description = "Cleanup old btrfs snapshots in services-standby";
      path = [ pkgs.btrfs-progs pkgs.findutils ];

      script = ''
        set -euo pipefail
        # Keep last 48 hours of snapshots (576 snapshots at 5min intervals)
        find /persist/services-standby -maxdepth 1 -name 'services@*' -mmin +2880 -exec btrfs subvolume delete {} \; || true
      '';

      serviceConfig = {
        Type = "oneshot";
        User = "root";
      };
    };

    systemd.timers.cleanup-services-standby-snapshots = {
      description = "Timer for cleaning up old snapshots on standby";
      wantedBy = [ "timers.target" ];
      timerConfig = {
        OnCalendar = "daily";
        Persistent = true;
      };
    };
  };
}

@@ -5,6 +5,7 @@
   settings = {
     PasswordAuthentication = false;
     KbdInteractiveAuthentication = false;
+    PermitRootLogin = "prohibit-password"; # Allow root login with SSH keys only
   };
 };


@@ -146,7 +146,7 @@ fileSystems."/data/services" = {
 ## Migration Steps

 **Important path simplification note:**
 - All service paths use `/data/services/*` directly (not `/data/services/appdata/*`)
 - Example: `/data/compute/appdata/mysql` → `/data/services/mysql`
 - Simpler, cleaner, easier to manage


@@ -1024,9 +1024,9 @@ EOF
 - **Priority**: CRITICAL
 - **Current**: Uses `/data/compute/appdata/mysql`
 - **Target**: Affinity for zippy, allow c1/c2
-- **Data**: `/data/services/appdata/mysql` (NFS from zippy)
+- **Data**: `/data/services/mysql` (NFS from zippy)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/mysql` → `/data/services/appdata/mysql`
+  - ✏️ Volume path: `/data/compute/appdata/mysql` → `/data/services/mysql`
   - ✏️ Add affinity:
     ```hcl
     affinity {

@@ -1050,9 +1050,9 @@ EOF
 - **Priority**: CRITICAL
 - **Current**: Uses `/data/compute/appdata/postgres`, `/data/compute/appdata/pgadmin`
 - **Target**: Affinity for zippy, allow c1/c2
-- **Data**: `/data/services/appdata/postgres`, `/data/services/appdata/pgadmin` (NFS)
+- **Data**: `/data/services/postgres`, `/data/services/pgadmin` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/appdata/*`
+  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
   - ✏️ Add affinity and constraint (same as mysql)
 - **Notes**: Core database for authentik, gitea, plausible, netbox, etc.

@@ -1061,9 +1061,9 @@ EOF
 - **Priority**: CRITICAL
 - **Current**: Uses `/data/compute/appdata/redis`
 - **Target**: Affinity for zippy, allow c1/c2
-- **Data**: `/data/services/appdata/redis` (NFS)
+- **Data**: `/data/services/redis` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/redis` → `/data/services/appdata/redis`
+  - ✏️ Volume path: `/data/compute/appdata/redis` → `/data/services/redis`
   - ✏️ Add affinity and constraint (same as mysql)
 - **Notes**: Used by authentik, wordpress. Should co-locate with databases.

@@ -1093,9 +1093,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/prometheus`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/prometheus` (NFS)
+- **Data**: `/data/services/prometheus` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/prometheus` → `/data/services/appdata/prometheus`
+  - ✏️ Volume path: `/data/compute/appdata/prometheus` → `/data/services/prometheus`
 - **Notes**: Metrics database. Important for monitoring but not critical for services.

 #### grafana

@@ -1103,9 +1103,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/grafana`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/grafana` (NFS)
+- **Data**: `/data/services/grafana` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/grafana` → `/data/services/appdata/grafana`
+  - ✏️ Volume path: `/data/compute/appdata/grafana` → `/data/services/grafana`
 - **Notes**: Monitoring UI. Depends on prometheus.

 #### loki

@@ -1113,9 +1113,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/loki`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/loki` (NFS)
+- **Data**: `/data/services/loki` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/loki` → `/data/services/appdata/loki`
+  - ✏️ Volume path: `/data/compute/appdata/loki` → `/data/services/loki`
 - **Notes**: Log aggregation. Important for debugging.

 #### vector

@@ -1136,9 +1136,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/clickhouse`
 - **Target**: Affinity for zippy (large dataset), allow c1/c2/c3
-- **Data**: `/data/services/appdata/clickhouse` (NFS)
+- **Data**: `/data/services/clickhouse` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/clickhouse` → `/data/services/appdata/clickhouse`
+  - ✏️ Volume path: `/data/compute/appdata/clickhouse` → `/data/services/clickhouse`
   - ✏️ Add affinity for zippy (optional, but helps with performance)
 - **Notes**: Used by plausible. Large time-series data. Important but can be recreated.

@@ -1147,7 +1147,7 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/unifi/mongodb`
 - **Target**: Float on c1/c2/c3 (with unifi)
-- **Data**: `/data/services/appdata/unifi/mongodb` (NFS)
+- **Data**: `/data/services/unifi/mongodb` (NFS)
 - **Changes**: See unifi below
 - **Notes**: Only used by unifi. Should stay with unifi controller.

@@ -1158,9 +1158,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/sync/wordpress` (syncthing-managed to avoid slow GlusterFS)
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/wordpress` (NFS from zippy)
+- **Data**: `/data/services/wordpress` (NFS from zippy)
 - **Changes**:
-  - ✏️ Volume path: `/data/sync/wordpress` → `/data/services/appdata/wordpress`
+  - ✏️ Volume path: `/data/sync/wordpress` → `/data/services/wordpress`
   - 📋 **Before cutover**: Copy data from syncthing to zippy: `rsync -av /data/sync/wordpress/ zippy:/persist/services/appdata/wordpress/`
   - 📋 **After migration**: Remove syncthing configuration for wordpress sync
 - **Notes**: Production website. Important but can tolerate brief downtime during migration.

@@ -1170,9 +1170,9 @@ EOF
 - **Priority**: no longer used, should wipe
 - **Current**: Uses `/data/compute/appdata/ghost`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/ghost` (NFS)
+- **Data**: `/data/services/ghost` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/ghost` → `/data/services/appdata/ghost`
+  - ✏️ Volume path: `/data/compute/appdata/ghost` → `/data/services/ghost`
 - **Notes**: Blog platform (alo.land). Can tolerate downtime.

 #### gitea

@@ -1180,9 +1180,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/gitea/data`, `/data/compute/appdata/gitea/config`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/gitea/*` (NFS)
+- **Data**: `/data/services/gitea/*` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: `/data/compute/appdata/gitea/*` → `/data/services/appdata/gitea/*`
+  - ✏️ Volume paths: `/data/compute/appdata/gitea/*` → `/data/services/gitea/*`
 - **Notes**: Git server. Contains code repositories. Important.

 #### wiki (tiddlywiki)

@@ -1190,7 +1190,7 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/wiki` via host volume mount
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/wiki` (NFS)
+- **Data**: `/data/services/wiki` (NFS)
 - **Changes**:
   - ✏️ Volume mount path in `volume_mount` blocks
   - ⚠️ Uses `exec` driver with host volumes - verify NFS mount works with this

@@ -1201,9 +1201,9 @@ EOF
 - **Priority**: LOW
 - **Current**: Uses `/data/compute/appdata/code`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/code` (NFS)
+- **Data**: `/data/services/code` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/code` → `/data/services/appdata/code`
+  - ✏️ Volume path: `/data/compute/appdata/code` → `/data/services/code`
 - **Notes**: Web IDE. Low priority, for development only.

 #### beancount (fava)

@@ -1211,9 +1211,9 @@ EOF
 - **Priority**: MEDIUM
 - **Current**: Uses `/data/compute/appdata/beancount`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/beancount` (NFS)
+- **Data**: `/data/services/beancount` (NFS)
 - **Changes**:
-  - ✏️ Volume path: `/data/compute/appdata/beancount` → `/data/services/appdata/beancount`
+  - ✏️ Volume path: `/data/compute/appdata/beancount` → `/data/services/beancount`
 - **Notes**: Finance tracking. Low priority.

 #### adminer

@@ -1239,9 +1239,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/evcc/evcc.yaml`, `/data/compute/appdata/evcc/evcc`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/evcc/*` (NFS)
+- **Data**: `/data/services/evcc/*` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: `/data/compute/appdata/evcc/*` → `/data/services/appdata/evcc/*`
+  - ✏️ Volume paths: `/data/compute/appdata/evcc/*` → `/data/services/evcc/*`
 - **Notes**: EV charging controller. Important for daily use.

 #### vikunja

@@ -1249,9 +1249,9 @@ EOF
 - **Priority**: no longer used, should delete
 - **Current**: Likely uses `/data/compute/appdata/vikunja`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/vikunja` (NFS)
+- **Data**: `/data/services/vikunja` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/vikunja`
+  - ✏️ Volume paths: Update to `/data/services/vikunja`
 - **Notes**: Task management. Low priority.

 #### leantime

@@ -1259,9 +1259,9 @@ EOF
 - **Priority**: no longer used, should delete
 - **Current**: Likely uses `/data/compute/appdata/leantime`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/leantime` (NFS)
+- **Data**: `/data/services/leantime` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/leantime`
+  - ✏️ Volume paths: Update to `/data/services/leantime`
 - **Notes**: Project management. Low priority.

 ### Network Infrastructure

@@ -1271,9 +1271,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Uses `/data/compute/appdata/unifi/data`, `/data/compute/appdata/unifi/mongodb`
 - **Target**: Float on c1/c2/c3/fractal/zippy
-- **Data**: `/data/services/appdata/unifi/*` (NFS)
+- **Data**: `/data/services/unifi/*` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: `/data/compute/appdata/unifi/*` → `/data/services/appdata/unifi/*`
+  - ✏️ Volume paths: `/data/compute/appdata/unifi/*` → `/data/services/unifi/*`
 - **Notes**: UniFi network controller. Critical for network management. Has keepalived VIP for stable inform address. Floating is fine.

 ### Media Stack

@@ -1284,10 +1284,10 @@ EOF
 - **Current**: Uses `/data/compute/appdata/radarr`, `/data/compute/appdata/sonarr`, etc. and `/data/media`
 - **Target**: **MUST run on fractal** (local /data/media access)
 - **Data**:
-  - `/data/services/appdata/radarr` (NFS) - config data
+  - `/data/services/radarr` (NFS) - config data
   - `/data/media` (local CIFS mount on fractal, local disk on fractal)
 - **Changes**:
-  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/appdata/*`
+  - ✏️ Volume paths: `/data/compute/appdata/*` → `/data/services/*`
   - ✏️ **Add constraint**:
     ```hcl
     constraint {

@@ -1304,9 +1304,9 @@ EOF
 - **Priority**: HIGH
 - **Current**: Likely uses `/data/compute/appdata/weewx`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/weewx` (NFS)
+- **Data**: `/data/services/weewx` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/weewx`
+  - ✏️ Volume paths: Update to `/data/services/weewx`
 - **Notes**: Weather station. Low priority.

 #### maps

@@ -1314,7 +1314,7 @@ EOF
 - **Priority**: MEDIUM
 - **Current**: Likely uses `/data/compute/appdata/maps`
 - **Target**: Float on c1/c2/c3 (or fractal if large tile data)
-- **Data**: `/data/services/appdata/maps` (NFS) or `/data/media/maps` if large
+- **Data**: `/data/services/maps` (NFS) or `/data/media/maps` if large
 - **Changes**:
   - ✏️ Volume paths: Check data size, may want to move to /data/media
 - **Notes**: Map tiles. Low priority.

@@ -1324,9 +1324,9 @@ EOF
 - **Priority**: LOW
 - **Current**: Likely uses `/data/compute/appdata/netbox`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/netbox` (NFS)
+- **Data**: `/data/services/netbox` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/netbox`
+  - ✏️ Volume paths: Update to `/data/services/netbox`
 - **Notes**: IPAM/DCIM. Low priority, for documentation.

 #### farmos

@@ -1334,9 +1334,9 @@ EOF
 - **Priority**: LOW
 - **Current**: Likely uses `/data/compute/appdata/farmos`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/farmos` (NFS)
+- **Data**: `/data/services/farmos` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/farmos`
+  - ✏️ Volume paths: Update to `/data/services/farmos`
 - **Notes**: Farm management. Low priority.

 #### urbit

@@ -1344,9 +1344,9 @@ EOF
 - **Priority**: LOW
 - **Current**: Likely uses `/data/compute/appdata/urbit`
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/urbit` (NFS)
+- **Data**: `/data/services/urbit` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/urbit`
+  - ✏️ Volume paths: Update to `/data/services/urbit`
 - **Notes**: Urbit node. Experimental, low priority.

 #### webodm

@@ -1354,9 +1354,9 @@ EOF
 - **Priority**: LOW
 - **Current**: Likely uses `/data/compute/appdata/webodm`
 - **Target**: Float on c1/c2/c3 (or fractal if processing large imagery from /data/media)
-- **Data**: `/data/services/appdata/webodm` (NFS)
+- **Data**: `/data/services/webodm` (NFS)
 - **Changes**:
-  - ✏️ Volume paths: Update to `/data/services/appdata/webodm`
+  - ✏️ Volume paths: Update to `/data/services/webodm`
   - 🤔 May benefit from running on fractal if it processes files from /data/media
 - **Notes**: Drone imagery processing. Low priority.

@@ -1411,7 +1411,7 @@ EOF
 - **Priority**: MEDIUM
 - **Current**: Likely same as wiki.hcl
 - **Target**: Float on c1/c2/c3
-- **Data**: `/data/services/appdata/tiddlywiki` (NFS)
+- **Data**: `/data/services/tiddlywiki` (NFS)
 - **Changes**: Same as wiki.hcl
 - **Notes**: May be duplicate of wiki.hcl.

@@ -1660,7 +1660,7 @@ nomad alloc status <alloc-id>

 1. ✅ **Where is `/data/sync/wordpress` mounted from?**
    - **Answer**: Syncthing-managed to avoid slow GlusterFS
-   - **Action**: Migrate to `/data/services/appdata/wordpress`, remove syncthing config
+   - **Action**: Migrate to `/data/services/wordpress`, remove syncthing config

 2. ✅ **Which services use `/data/media` directly?**
    - **Answer**: Only media.hcl (radarr, sonarr, plex, qbittorrent)

docs/MIGRATION_TODO.md (new file, 153 lines)

@@ -0,0 +1,153 @@

# Cluster Revamp Migration TODO

Track migration progress from GlusterFS to NFS-based architecture.
See [CLUSTER_REVAMP.md](./CLUSTER_REVAMP.md) for detailed procedures.

## Phase 0: Preparation
- [x] Review cluster revamp plan
- [ ] Backup everything (kopia snapshots current)
- [ ] Document current state (nomad jobs, consul services)

## Phase 1: Convert fractal to NixOS (DEFERRED - do after GlusterFS migration)
- [ ] Document fractal's current ZFS layout
- [ ] Install NixOS on fractal
- [ ] Import ZFS pools (double1, double2, double3)
- [ ] Create fractal NixOS configuration
- [ ] Configure Samba server for media/shared/homes
- [ ] Configure Kopia backup server
- [ ] Deploy and verify fractal base config
- [ ] Join fractal to cluster (5-server quorum)
- [ ] Update all cluster configs for 5-server quorum
- [ ] Verify fractal fully operational

## Phase 2: Setup zippy storage layer
- [x] Create btrfs subvolume `/persist/services` on zippy
- [x] Configure NFS server on zippy (nfs-services-server.nix)
- [x] Configure Consul service registration for NFS
- [x] Setup btrfs replication to c1 (incremental, 5min intervals)
- [x] Fix replication script to handle SSH command restrictions
- [x] Setup standby storage on c1 (`/persist/services-standby`)
- [x] Configure c1 as standby (nfs-services-standby.nix)
- [x] Configure Kopia to exclude replication snapshots
- [x] Deploy and verify NFS server on zippy
- [x] Verify replication working to c1
- [ ] Setup standby storage on c2 (if desired)
- [ ] Configure replication to c2 (if desired)

## Phase 3: Migrate from GlusterFS to NFS
- [x] Update all nodes to mount NFS at `/data/services`
- [x] Deploy updated configs (NFS client on all nodes)
- [ ] Stop all Nomad jobs temporarily
- [ ] Copy data from GlusterFS to zippy NFS (see the sketch after this phase's checklist)
  - [ ] Copy `/data/compute/appdata/*` → `/persist/services/` (paths are flattened, no `appdata/` level)
  - [ ] Copy `/data/compute/config/*` → `/persist/services/config/`
  - [ ] Copy `/data/sync/wordpress` → `/persist/services/wordpress`
- [ ] Verify data integrity
- [ ] Verify NFS mounts working on all nodes
- [ ] Stop GlusterFS volume
- [ ] Delete GlusterFS volume
- [ ] Remove GlusterFS from NixOS configs
- [ ] Remove syncthing wordpress sync configuration
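
A sketch of that copy step, assuming the GlusterFS mount is still available and Nomad jobs are stopped (flags and destinations should be double-checked against `docs/CLUSTER_REVAMP.md` before running):

```bash
# Run on a node that still has the GlusterFS mount at /data/compute.
# Note the flattened layout: appdata/* lands directly under /persist/services/.
rsync -avP /data/compute/appdata/  zippy:/persist/services/
rsync -avP /data/compute/config/   zippy:/persist/services/config/
rsync -avP /data/sync/wordpress/   zippy:/persist/services/wordpress/

# Spot-check the result on zippy
ssh zippy ls /persist/services
```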

## Phase 4: Update and redeploy Nomad jobs

### Core Infrastructure (CRITICAL)
- [x] mysql.hcl - moved to zippy, using `/data/services`
- [ ] postgres.hcl - update paths, add affinity for zippy
- [ ] redis.hcl - update paths, add affinity for zippy
- [ ] traefik.hcl - update paths (already floating)
- [ ] authentik.hcl - verify (stateless, no changes needed)

### Monitoring Stack (HIGH)
- [ ] prometheus.hcl - update paths
- [ ] grafana.hcl - update paths
- [ ] loki.hcl - update paths
- [ ] vector.hcl - remove glusterfs log collection

### Databases (HIGH)
- [ ] clickhouse.hcl - update paths, add affinity for zippy
- [ ] unifi.hcl - update paths (includes mongodb)

### Web Applications (HIGH-MEDIUM)
- [ ] wordpress.hcl - update from `/data/sync/wordpress` to `/data/services/wordpress`
- [ ] gitea.hcl - update paths
- [ ] wiki.hcl - update paths, verify with exec driver
- [ ] plausible.hcl - verify (stateless)

### Web Applications (LOW, may be deprecated)
- [ ] ghost.hcl - update paths or remove (no longer used?)
- [ ] vikunja.hcl - update paths or remove (no longer used?)
- [ ] leantime.hcl - update paths or remove (no longer used?)

### Network Infrastructure (HIGH)
- [ ] unifi.hcl - update paths (already listed above)

### Media Stack (MEDIUM)
- [ ] media.hcl - update paths, add constraint for fractal
  - [ ] radarr, sonarr, bazarr, plex, qbittorrent

### Utility Services (MEDIUM-LOW)
- [ ] evcc.hcl - update paths
- [ ] weewx.hcl - update paths
- [ ] code-server.hcl - update paths
- [ ] beancount.hcl - update paths
- [ ] adminer.hcl - verify (stateless)
- [ ] maps.hcl - update paths
- [ ] netbox.hcl - update paths
- [ ] farmos.hcl - update paths
- [ ] urbit.hcl - update paths
- [ ] webodm.hcl - update paths
- [ ] velutrack.hcl - verify paths
- [ ] resol-gateway.hcl - verify paths
- [ ] igsync.hcl - update paths
- [ ] jupyter.hcl - verify paths
- [ ] whoami.hcl - verify (stateless test service)
- [ ] tiddlywiki.hcl - update paths (if separate from wiki.hcl)

### Backup Jobs (HIGH)
- [x] mysql-backup - moved to zippy, verified
- [ ] postgres-backup.hcl - verify destination
- [ ] wordpress-backup.hcl - verify destination

### Verification
- [ ] All services healthy in Nomad
- [ ] All services registered in Consul
- [ ] Traefik routes working
- [ ] Database jobs running on zippy (verify via nomad alloc status)
- [ ] Media jobs running on fractal (verify via nomad alloc status)

## Phase 5: Convert sunny to NixOS (OPTIONAL - can defer)
- [ ] Document current sunny setup (ethereum containers/VMs)
- [ ] Backup ethereum data
- [ ] Install NixOS on sunny
- [ ] Restore ethereum data to `/persist/ethereum`
- [ ] Create sunny container-based config (besu, lighthouse, rocketpool)
- [ ] Deploy and verify ethereum stack
- [ ] Monitor sync status and validation

## Phase 6: Verification and cleanup
- [ ] Test NFS failover procedure (zippy → c1)
- [ ] Verify backups include `/persist/services` data
- [ ] Verify backups exclude replication snapshots
- [ ] Update documentation (README.md, architecture diagrams)
- [ ] Clean up old GlusterFS data (only after everything verified!)
- [ ] Remove old glusterfs directories from all nodes

## Post-Migration Checklist
- [ ] All 5 servers in quorum (consul members)
- [ ] NFS mounts working on all nodes
- [ ] Btrfs replication running (check systemd timers on zippy)
- [ ] Critical services up (mysql, postgres, redis, traefik, authentik)
- [ ] Monitoring working (prometheus, grafana, loki)
- [ ] Media stack on fractal
- [ ] Database jobs on zippy
- [ ] Consul DNS working (dig @localhost -p 8600 data-services.service.consul)
- [ ] Backups running (kopia snapshots include /persist/services)
- [ ] GlusterFS removed (no processes, volumes deleted)
- [ ] Documentation updated

---

**Last updated**: 2025-10-22
**Current phase**: Phase 2 complete (zippy storage setup done), ready for Phase 3 (GlusterFS → NFS migration)
**Note**: Phase 1 (fractal NixOS conversion) deferred until after GlusterFS migration is complete
438
docs/NFS_FAILOVER.md
Normal file
438
docs/NFS_FAILOVER.md
Normal file
@@ -0,0 +1,438 @@
|
|||||||
|
# NFS Services Failover Procedures
|
||||||
|
|
||||||
|
This document describes how to fail over the `/data/services` NFS server between hosts and how to fail back.
|
||||||
|
|
||||||
|
## Architecture Overview
|
||||||
|
|
||||||
|
- **Primary NFS Server**: Typically `zippy`
|
||||||
|
- Exports `/persist/services` via NFS
|
||||||
|
- Has local bind mount: `/data/services` → `/persist/services` (same path as clients)
|
||||||
|
- Registers `data-services.service.consul` in Consul
|
||||||
|
- Sets Nomad node meta: `storage_role = "primary"`
|
||||||
|
- Replicates snapshots to standbys every 5 minutes via btrfs send
|
||||||
|
- **Safety check**: Refuses to start if another NFS server is already active in Consul
|
||||||
|
|
||||||
|
- **Standby**: Typically `c1`
|
||||||
|
- Receives snapshots at `/persist/services-standby/services@<timestamp>`
|
||||||
|
- Can be promoted to NFS server during failover
|
||||||
|
- No special Nomad node meta (not primary)
|
||||||
|
|
||||||
|
- **Clients**: All cluster nodes (c1, c2, c3, zippy)
|
||||||
|
- Mount `/data/services` from `data-services.service.consul:/persist/services`
|
||||||
|
- Automatically connect to whoever is registered in Consul
|
||||||
|
|
||||||
|
### Nomad Job Constraints
|
||||||
|
|
||||||
|
Jobs that need to run on the primary storage node should use:
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This is useful for:
|
||||||
|
- Database jobs (mysql, postgres, redis) that benefit from local storage
|
||||||
|
- Jobs that need guaranteed fast disk I/O
|
||||||
|
|
||||||
|
During failover, the `storage_role = "primary"` meta attribute moves to the new NFS server, and Nomad automatically reschedules constrained jobs to the new primary.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- Standby has been receiving snapshots (check: `ls /persist/services-standby/services@*`)
|
||||||
|
- Last successful replication was recent (< 5-10 minutes)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Failover: Promoting Standby to Primary
|
||||||
|
|
||||||
|
**Scenario**: `zippy` is down and you need to promote `c1` to be the NFS server.
|
||||||
|
|
||||||
|
### Step 1: Choose Latest Snapshot
|
||||||
|
|
||||||
|
On the standby (c1):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ssh c1
|
||||||
|
sudo ls -lt /persist/services-standby/services@* | head -5
|
||||||
|
```
|
||||||
|
|
||||||
|
Find the most recent snapshot. Note the timestamp to estimate data loss (typically < 5 minutes).
|
||||||
|
|
||||||
|
### Step 2: Promote Snapshot to Read-Write Subvolume
|
||||||
|
|
||||||
|
On c1:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Find the latest snapshot
|
||||||
|
LATEST=$(sudo ls -t /persist/services-standby/services@* | head -1)
|
||||||
|
|
||||||
|
# Create writable subvolume from snapshot
|
||||||
|
sudo btrfs subvolume snapshot "$LATEST" /persist/services
|
||||||
|
|
||||||
|
# Verify
|
||||||
|
ls -la /persist/services
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 3: Update NixOS Configuration
|
||||||
|
|
||||||
|
Edit your configuration to swap the NFS server role:
|
||||||
|
|
||||||
|
**In `hosts/c1/default.nix`**:
|
||||||
|
```nix
|
||||||
|
imports = [
|
||||||
|
# ... existing imports ...
|
||||||
|
# ../../common/nfs-services-standby.nix # REMOVE THIS
|
||||||
|
../../common/nfs-services-server.nix # ADD THIS
|
||||||
|
];
|
||||||
|
|
||||||
|
# Add standbys if desired (optional - can leave empty during emergency)
|
||||||
|
nfsServicesServer.standbys = []; # Or ["c2"] to add a new standby
|
||||||
|
```
|
||||||
|
|
||||||
|
**Optional: Prepare zippy config for when it comes back**:
|
||||||
|
|
||||||
|
In `hosts/zippy/default.nix` (can do this later too):
|
||||||
|
```nix
|
||||||
|
imports = [
|
||||||
|
# ... existing imports ...
|
||||||
|
# ../../common/nfs-services-server.nix # REMOVE THIS
|
||||||
|
../../common/nfs-services-standby.nix # ADD THIS
|
||||||
|
];
|
||||||
|
|
||||||
|
# Add the replication key from c1 (get it from c1:/persist/root/.ssh/btrfs-replication.pub)
|
||||||
|
nfsServicesStandby.replicationKeys = [
|
||||||
|
"ssh-ed25519 AAAA... root@c1-replication"
|
||||||
|
];
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Deploy Configuration
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# From your workstation
|
||||||
|
deploy -s '.#c1'
|
||||||
|
|
||||||
|
# If zippy is still down, updating its config will fail, but that's okay
|
||||||
|
# You can update it later when it comes back
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 5: Verify NFS Server is Running
|
||||||
|
|
||||||
|
On c1:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo systemctl status nfs-server
|
||||||
|
sudo showmount -e localhost
|
||||||
|
dig @localhost -p 8600 data-services.service.consul # Should show c1's IP
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 6: Verify Clients Can Access
|
||||||
|
|
||||||
|
From any node:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
df -h | grep services
|
||||||
|
ls /data/services
|
||||||
|
```
|
||||||
|
|
||||||
|
The mount should automatically reconnect via Consul DNS.
|
||||||
|
|
||||||
|
### Step 7: Check Nomad Jobs
|
||||||
|
|
||||||
|
```bash
|
||||||
|
nomad job status mysql
|
||||||
|
nomad job status postgres
|
||||||
|
# Verify critical services are healthy
|
||||||
|
|
||||||
|
# Jobs constrained to ${meta.storage_role} = "primary" will automatically
|
||||||
|
# reschedule to c1 once it's deployed with the NFS server module
|
||||||
|
```
|
||||||
|
|
||||||
|
**Recovery Time Objective (RTO)**: ~10-15 minutes
|
||||||
|
**Recovery Point Objective (RPO)**: Last replication interval (5 minutes max)
|
||||||
|
|
||||||
|
**Note**: Jobs with the `storage_role = "primary"` constraint will automatically move to c1 because it now has that node meta attribute. No job spec changes needed!
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What Happens When zippy Comes Back?
|
||||||
|
|
||||||
|
**IMPORTANT**: If zippy reboots while still configured as NFS server, it will **refuse to start** the NFS service because it detects c1 is already active in Consul.
|
||||||
|
|
||||||
|
You'll see this error in `journalctl -u nfs-server`:
|
||||||
|
|
||||||
|
```
|
||||||
|
ERROR: Another NFS server is already active at 192.168.1.X
|
||||||
|
This host (192.168.1.2) is configured as NFS server but should be standby.
|
||||||
|
To fix:
|
||||||
|
1. If this is intentional (failback), first demote the other server
|
||||||
|
2. Update this host's config to use nfs-services-standby.nix instead
|
||||||
|
3. Sync data from active server before promoting this host
|
||||||
|
```
|
||||||
|
|
||||||
|
This is a **safety feature** to prevent split-brain and data corruption.
|
||||||
|
|
||||||
|
### Options when zippy comes back

**Option A: Keep c1 as primary** (zippy becomes standby)
1. Update zippy's config to use `nfs-services-standby.nix`
2. Deploy to zippy
3. c1 will start replicating to zippy

**Option B: Fail back to zippy as primary**
Follow the "Failing Back to Original Primary" procedure below.

---
## Failing Back to Original Primary

**Scenario**: `zippy` is repaired and you want to move the NFS server role back from `c1` to `zippy`.

### Step 1: Sync Latest Data from c1 to zippy

On c1 (current primary):

```bash
# Create readonly snapshot of current state
sudo btrfs subvolume snapshot -r /persist/services /persist/services@failback-$(date +%Y%m%d-%H%M%S)

# Find the snapshot (-d lists the snapshot paths themselves, not their contents)
FAILBACK=$(sudo ls -dt /persist/services@failback-* | head -1)

# Send to zippy (use root SSH key if available, or generate temporary key)
sudo btrfs send "$FAILBACK" | ssh root@zippy "btrfs receive /persist/"
```

On zippy:

```bash
# Verify snapshot arrived
ls -la /persist/services@failback-*

# Create writable subvolume from the snapshot
FAILBACK=$(ls -dt /persist/services@failback-* | head -1)
sudo btrfs subvolume snapshot "$FAILBACK" /persist/services

# Verify
ls -la /persist/services
```
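Before changing any configuration it can be worth confirming that zippy's new `/persist/services` really is the received copy; `btrfs subvolume show` prints the UUID, received UUID and creation time (an optional sanity check, not part of the original procedure):

```bash
# On zippy: inspect the promoted subvolume
sudo btrfs subvolume show /persist/services
```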
### Step 2: Update NixOS Configuration

Swap the roles back:

**In `hosts/zippy/default.nix`**:

```nix
imports = [
  # ... existing imports ...
  # ../../common/nfs-services-standby.nix  # REMOVE THIS
  ../../common/nfs-services-server.nix     # ADD THIS
];

nfsServicesServer.standbys = ["c1"];
```

**In `hosts/c1/default.nix`**:

```nix
imports = [
  # ... existing imports ...
  # ../../common/nfs-services-server.nix  # REMOVE THIS
  ../../common/nfs-services-standby.nix   # ADD THIS
];

nfsServicesStandby.replicationKeys = [
  "ssh-ed25519 AAAA... root@zippy-replication" # Get from zippy:/persist/root/.ssh/btrfs-replication.pub
];
```
### Step 3: Deploy Configurations

```bash
# IMPORTANT: Deploy c1 FIRST to demote it
deploy -s '.#c1'

# Wait for c1 to stop NFS server
ssh c1 sudo systemctl status nfs-server # Should be inactive

# Then deploy zippy to promote it
deploy -s '.#zippy'
```

The order matters! If you deploy zippy first, it will see c1 is still active and refuse to start.
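Between the two deploys you can also confirm from Consul's side that c1 has dropped the service registration (an extra check; an empty answer is what you want to see before promoting zippy):

```bash
# Should return nothing once c1 has deregistered data-services
dig +short @c1 -p 8600 data-services.service.consul
```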
### Step 4: Verify Failback

Check Consul DNS points to zippy:

```bash
dig @c1 -p 8600 data-services.service.consul # Should show zippy's IP
```

Check clients are mounting from zippy:

```bash
for host in c1 c2 c3; do
  ssh $host "df -h | grep services"
done
```
### Step 5: Clean Up Temporary Snapshots

On c1:

```bash
# Remove the failback snapshot and the promoted subvolume
sudo btrfs subvolume delete /persist/services@failback-*
sudo btrfs subvolume delete /persist/services
```

---
## Adding a New Standby

**Scenario**: You want to add `c2` as an additional standby.

### Step 1: Create Standby Subvolume on c2

```bash
ssh c2
sudo btrfs subvolume create /persist/services-standby
```

### Step 2: Update c2 Configuration

**In `hosts/c2/default.nix`**:

```nix
imports = [
  # ... existing imports ...
  ../../common/nfs-services-standby.nix
];

nfsServicesStandby.replicationKeys = [
  "ssh-ed25519 AAAA... root@zippy-replication" # Get from current NFS server
];
```

### Step 3: Update NFS Server Configuration

On the current NFS server (e.g., zippy), update the standbys list:

**In `hosts/zippy/default.nix`**:

```nix
nfsServicesServer.standbys = ["c1" "c2"]; # Added c2
```

### Step 4: Deploy

```bash
deploy -s '.#c2'
deploy -s '.#zippy'
```

The next replication cycle (within 5 minutes) will do a full send to c2, then switch to incremental.
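To watch the first full send land, follow the replication unit on the server and list received snapshots on c2 (the `replicate-services-to-c2` unit name is an assumption, extrapolated from the `replicate-services-to-c1` unit referenced elsewhere in this document):

```bash
# On the NFS server: follow the replication unit for the new standby
sudo journalctl -u replicate-services-to-c2 -f

# On c2: received snapshots should start appearing here
ls -ldt /persist/services-standby/services@*
```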
---

## Troubleshooting

### Replication Failed

Check the replication service logs:

```bash
# On NFS server
sudo journalctl -u replicate-services-to-c1 -f
```

Common issues:
- SSH key not found → Run key generation step (see stateful-commands.txt)
- Permission denied → Check authorized_keys on the standby (a quick manual test is sketched below)
- Snapshot already exists → Old snapshot with same timestamp, wait for next cycle
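A quick way to separate SSH problems from btrfs problems is to test the replication key by hand (the key path comes from the setup notes in stateful-commands.txt; replace `c1` with your standby):

```bash
# Run on the NFS server: succeeds only if the standby accepts the replication key
sudo ssh -i /persist/root/.ssh/btrfs-replication root@c1 true && echo "SSH OK"
```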
### Clients Can't Mount

Check Consul:

```bash
dig @localhost -p 8600 data-services.service.consul
consul catalog services | grep data-services
```

If Consul isn't resolving:
- NFS server might not have registered → Check `sudo systemctl status nfs-server`
- Consul agent might be down → Check `sudo systemctl status consul`
### Mount is Stale

Force remount:

```bash
sudo systemctl restart data-services.mount
```

Or unmount and let automount handle it:

```bash
sudo umount /data/services
ls /data/services # Triggers automount
```
### Split-Brain Prevention: NFS Server Won't Start

If you see:

```
ERROR: Another NFS server is already active at 192.168.1.X
```

This is **intentional** - the safety check is working! You have two options:

1. **Keep the other server as primary**: Update this host's config to be a standby instead
2. **Fail back to this host**: First demote the other server, sync data, then deploy both hosts in correct order

---
## Monitoring

### Check Replication Status

On the NFS server:

```bash
# List recent snapshots (-d shows the snapshot paths, not their contents)
ls -ldt /persist/services@* | head

# Check last replication run
sudo systemctl status replicate-services-to-c1

# Check replication logs
sudo journalctl -u replicate-services-to-c1 --since "1 hour ago"
```

On the standby:

```bash
# List received snapshots
ls -ldt /persist/services-standby/services@* | head

# Check how old the latest snapshot is
stat $(ls -dt /persist/services-standby/services@* | head -1) | grep Modify
```
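If you want something a cron job or alert can act on, the same check can be reduced to an age in minutes. A small sketch, assuming GNU `stat`/`date` and treating anything older than two replication intervals as suspicious (the 10-minute threshold is an assumption):

```bash
# Age of the newest received snapshot, in minutes
latest=$(ls -dt /persist/services-standby/services@* | head -1)
age_min=$(( ($(date +%s) - $(stat -c %Y "$latest")) / 60 ))
echo "newest snapshot: $latest (${age_min} min old)"
[ "$age_min" -le 10 ] || echo "WARNING: replication may be stalled" >&2
```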
### Verify NFS Exports

```bash
sudo showmount -e localhost
```

Should show:

```
/persist/services 192.168.1.0/24
```

### Check Consul Registration

```bash
consul catalog services | grep data-services
dig @localhost -p 8600 data-services.service.consul
```
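To see which node is currently backing the service (rather than just that it is registered), the catalog can be queried directly (a hedged extra; `consul catalog nodes -service=...` is a standard Consul CLI subcommand):

```bash
# Show the node(s) registered for data-services
consul catalog nodes -service=data-services
```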
@@ -4,6 +4,11 @@
 ../../common/encrypted-btrfs-layout.nix
 ../../common/global
 ../../common/compute-node.nix
+../../common/nfs-services-standby.nix # NFS standby for /data/services
+# To promote to NFS server (during failover):
+# 1. Follow procedure in docs/NFS_FAILOVER.md
+# 2. Replace above line with: ../../common/nfs-services-server.nix
+# 3. Add nfsServicesServer.standbys = [ "c2" ]; (or leave empty)
 ./hardware.nix
 ];

@@ -15,4 +20,9 @@

 networking.hostName = "c1";
 services.tailscaleAutoconnect.authkey = "tskey-auth-k2nQ771YHM11CNTRL-YVpoumL2mgR6nLPG51vNhRpEKMDN7gLAi";
+
+# NFS standby configuration: accept replication from zippy
+nfsServicesStandby.replicationKeys = [
+"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHyTKsMCbwCIlMcC/aopgz5Yfx/Q9QdlWC9jzMLgYFAV root@zippy-replication"
+];
 }

@@ -5,6 +5,11 @@
 ../../common/global
 ../../common/compute-node.nix
 # ../../common/ethereum.nix
+../../common/nfs-services-server.nix # NFS server for /data/services
+# To move NFS server role to another host:
+# 1. Follow procedure in docs/NFS_FAILOVER.md
+# 2. Replace above line with: ../../common/nfs-services-standby.nix
+# 3. Add nfsServicesStandby.replicationKeys with the new server's public key
 ./hardware.nix
 ];

@@ -16,4 +21,7 @@

 networking.hostName = "zippy";
 services.tailscaleAutoconnect.authkey = "tskey-auth-ktKyQ59f2p11CNTRL-ut8E71dLWPXsVtb92hevNX9RTjmk4owBf";
+
+# NFS server configuration: replicate to c1 as standby
+nfsServicesServer.standbys = [ "c1" ];
 }

@@ -114,5 +114,5 @@ variable "secret_key" {

 variable "authentik_version" {
 type = string
-default = "2025.4"
+default = "2025.6"
 }

@@ -6,6 +6,13 @@ job "clickhouse" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
# TODO: move to fractal once it's converted to NixOS (spinning disks OK for time-series data)
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "clickhouse" {
|
port "clickhouse" {
|
||||||
static = 8123
|
static = 8123
|
||||||
@@ -18,7 +25,7 @@ job "clickhouse" {
|
|||||||
config {
|
config {
|
||||||
image = "clickhouse/clickhouse-server:25.9"
|
image = "clickhouse/clickhouse-server:25.9"
|
||||||
volumes = [
|
volumes = [
|
||||||
"/data/compute/appdata/clickhouse:/var/lib/clickhouse",
|
"/data/services/clickhouse:/var/lib/clickhouse",
|
||||||
"local/clickhouse-config.xml:/etc/clickhouse-server/config.d/logging.xml:ro",
|
"local/clickhouse-config.xml:/etc/clickhouse-server/config.d/logging.xml:ro",
|
||||||
"local/clickhouse-user-config.xml:/etc/clickhouse-server/users.d/logging.xml:ro",
|
"local/clickhouse-user-config.xml:/etc/clickhouse-server/users.d/logging.xml:ro",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -1,51 +0,0 @@
-job "leantime" {
-datacenters = ["alo"]
-
-group "web" {
-network {
-port "http" {
-to = 80
-}
-}
-
-task "server" {
-driver = "docker"
-
-config {
-image = "leantime/leantime:latest"
-ports = ["http"]
-volumes = [
-"/data/compute/appdata/leantime:/var/www/html/userfiles",
-]
-}
-
-env {
-LEAN_DEFAULT_TIMEZONE = "Europe/Lisbon"
-LEAN_DB_HOST = "mysql.service.consul"
-LEAN_DB_USER = "leantime"
-LEAN_DB_PASSWORD = "Xuphaedoo9kuaseeQuei"
-LEAN_DB_DATABASE = "leantime"
-LEAN_EMAIL_RETURN = "leantime@paler.net"
-LEAN_APP_URL = "https://leantime.v.paler.net"
-LEAN_EMAIL_SMTP_HOSTS = "192.168.1.1"
-LEAN_EMAIL_SMTP_AUTH = "false"
-LEAN_OIDC_ENABLE = "true"
-LEAN_OIDC_CREATE_USER = "true"
-LEAN_OIDC_PROVIDER_URL = "https://authentik.v.paler.net/application/o/leantime/"
-LEAN_OIDC_CLIENT_ID = "nWqJu9g4avhdpmUzqqvjsExCA1Jrick7GSMd0D6u"
-LEAN_OIDC_CLIENT_SECRET = "VvPQi5q3kkVTCwN8QWwwPTCqjWc9VbRanCFxa0zB2mhr1ZPxUYXP7Ygg6naMInE4P5vyqJd5w8XiWkuecW14G4KxgXpFtWChKnCOOpe47gjZGNbkYIEDZUmkUB99Saxx"
-}
-
-service {
-name = "leantime"
-port = "http"
-
-tags = [
-"traefik.enable=true",
-"traefik.http.routers.leantime.entryPoints=websecure",
-"traefik.http.routers.leantime.middlewares=authentik@file",
-]
-}
-}
-}
-}

@@ -10,6 +10,14 @@ job "loki" {
|
|||||||
}
|
}
|
||||||
group "loki" {
|
group "loki" {
|
||||||
count = 1
|
count = 1
|
||||||
|
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
# TODO: move to fractal once it's converted to NixOS (spinning disks OK for log data)
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
restart {
|
restart {
|
||||||
attempts = 3
|
attempts = 3
|
||||||
interval = "5m"
|
interval = "5m"
|
||||||
@@ -31,7 +39,7 @@ job "loki" {
|
|||||||
"local/loki/local-config.yaml",
|
"local/loki/local-config.yaml",
|
||||||
]
|
]
|
||||||
ports = ["loki"]
|
ports = ["loki"]
|
||||||
volumes = ["/data/compute/appdata/loki:/loki"]
|
volumes = ["/data/services/loki:/loki"]
|
||||||
}
|
}
|
||||||
template {
|
template {
|
||||||
data = <<EOH
|
data = <<EOH
|
||||||
|
|||||||
@@ -3,11 +3,17 @@ job "mysql-backup" {
|
|||||||
type = "batch"
|
type = "batch"
|
||||||
|
|
||||||
periodic {
|
periodic {
|
||||||
cron = "23 23 * * * *"
|
crons = ["23 23 * * * *"]
|
||||||
prohibit_overlap = true
|
prohibit_overlap = true
|
||||||
}
|
}
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node for fast local disk access
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
task "backup" {
|
task "backup" {
|
||||||
driver = "raw_exec"
|
driver = "raw_exec"
|
||||||
|
|
||||||
@@ -21,7 +27,7 @@ job "mysql-backup" {
|
|||||||
data = <<EOH
|
data = <<EOH
|
||||||
set -e
|
set -e
|
||||||
/run/current-system/sw/bin/nomad alloc exec -job -task=mysqld mysql \
|
/run/current-system/sw/bin/nomad alloc exec -job -task=mysqld mysql \
|
||||||
mysqldump -u root --password="$MYSQL_ROOT_PASS" --all-databases > /data/compute/appdata/db-backups/mysql/backup.sql && \
|
mysqldump -u root --password="$MYSQL_ROOT_PASS" --all-databases > /data/services/db-backups/mysql/backup.sql && \
|
||||||
echo "last_success $(date +%s)" | \
|
echo "last_success $(date +%s)" | \
|
||||||
/run/current-system/sw/bin/curl --data-binary @- http://pushgateway.service.consul:9091/metrics/job/mysql_backup
|
/run/current-system/sw/bin/curl --data-binary @- http://pushgateway.service.consul:9091/metrics/job/mysql_backup
|
||||||
EOH
|
EOH
|
||||||
|
|||||||
@@ -6,6 +6,12 @@ job "mysql" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "db" {
|
port "db" {
|
||||||
static = 3306
|
static = 3306
|
||||||
@@ -19,13 +25,9 @@ job "mysql" {
|
|||||||
|
|
||||||
config {
|
config {
|
||||||
image = "mysql:9.4"
|
image = "mysql:9.4"
|
||||||
args = [
|
|
||||||
# 300M, up from default of 100M
|
|
||||||
"--innodb-redo-log-capacity=314572800",
|
|
||||||
]
|
|
||||||
ports = ["db"]
|
ports = ["db"]
|
||||||
volumes = [
|
volumes = [
|
||||||
"/data/compute/appdata/mysql:/var/lib/mysql",
|
"/data/services/mysql:/var/lib/mysql",
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -3,11 +3,17 @@ job "postgres-backup" {
|
|||||||
type = "batch"
|
type = "batch"
|
||||||
|
|
||||||
periodic {
|
periodic {
|
||||||
cron = "22 22 * * * *"
|
crons = ["22 22 * * * *"]
|
||||||
prohibit_overlap = true
|
prohibit_overlap = true
|
||||||
}
|
}
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node (zippy) where postgres runs
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
task "backup" {
|
task "backup" {
|
||||||
driver = "raw_exec"
|
driver = "raw_exec"
|
||||||
|
|
||||||
@@ -21,7 +27,7 @@ job "postgres-backup" {
|
|||||||
data = <<EOH
|
data = <<EOH
|
||||||
set -e
|
set -e
|
||||||
/run/current-system/sw/bin/nomad alloc exec -job -task=postgres postgres \
|
/run/current-system/sw/bin/nomad alloc exec -job -task=postgres postgres \
|
||||||
pg_dumpall -U postgres > /data/compute/appdata/db-backups/postgresql/backup.sql && \
|
pg_dumpall -U postgres > /data/services/db-backups/postgresql/backup.sql && \
|
||||||
echo "last_success $(date +%s)" | \
|
echo "last_success $(date +%s)" | \
|
||||||
/run/current-system/sw/bin/curl --data-binary @- http://pushgateway.service.consul:9091/metrics/job/postgres_backup
|
/run/current-system/sw/bin/curl --data-binary @- http://pushgateway.service.consul:9091/metrics/job/postgres_backup
|
||||||
EOH
|
EOH
|
||||||
|
|||||||
@@ -7,6 +7,12 @@ job "postgres" {
|
|||||||
|
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "db" {
|
port "db" {
|
||||||
static = 5432
|
static = 5432
|
||||||
@@ -23,7 +29,7 @@ job "postgres" {
|
|||||||
config {
|
config {
|
||||||
image = "postgis/postgis:15-3.4-alpine"
|
image = "postgis/postgis:15-3.4-alpine"
|
||||||
ports = ["db"]
|
ports = ["db"]
|
||||||
volumes = [ "/data/compute/appdata/postgres:/var/lib/postgresql/data" ]
|
volumes = [ "/data/services/postgres:/var/lib/postgresql/data" ]
|
||||||
}
|
}
|
||||||
|
|
||||||
env {
|
env {
|
||||||
@@ -72,7 +78,7 @@ job "postgres" {
|
|||||||
config {
|
config {
|
||||||
image = "dpage/pgadmin4:latest"
|
image = "dpage/pgadmin4:latest"
|
||||||
ports = ["admin"]
|
ports = ["admin"]
|
||||||
volumes = [ "/data/compute/appdata/pgadmin:/var/lib/pgadmin" ]
|
volumes = [ "/data/services/pgadmin:/var/lib/pgadmin" ]
|
||||||
}
|
}
|
||||||
|
|
||||||
env {
|
env {
|
||||||
|
|||||||
@@ -10,6 +10,13 @@ job "prometheus" {
|
|||||||
group "monitoring" {
|
group "monitoring" {
|
||||||
count = 1
|
count = 1
|
||||||
|
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
# TODO: move to fractal once it's converted to NixOS (spinning disks OK for time-series data)
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "http" {
|
port "http" {
|
||||||
#host_network = "tailscale"
|
#host_network = "tailscale"
|
||||||
@@ -37,7 +44,7 @@ job "prometheus" {
|
|||||||
volumes = [
|
volumes = [
|
||||||
"local/alerts.yml:/prometheus/alerts.yml",
|
"local/alerts.yml:/prometheus/alerts.yml",
|
||||||
"local/prometheus.yml:/prometheus/prometheus.yml",
|
"local/prometheus.yml:/prometheus/prometheus.yml",
|
||||||
"/data/compute/appdata/prometheus:/opt/prometheus",
|
"/data/services/prometheus:/opt/prometheus",
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -6,6 +6,12 @@ job "redis" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
group "db" {
|
group "db" {
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "redis" {
|
port "redis" {
|
||||||
static = 6379
|
static = 6379
|
||||||
@@ -21,7 +27,7 @@ job "redis" {
|
|||||||
config {
|
config {
|
||||||
image = "redis:alpine"
|
image = "redis:alpine"
|
||||||
ports = ["redis"]
|
ports = ["redis"]
|
||||||
volumes = [ "/data/compute/appdata/redis:/data" ]
|
volumes = [ "/data/services/redis:/data" ]
|
||||||
}
|
}
|
||||||
|
|
||||||
service {
|
service {
|
||||||
|
|||||||
@@ -6,6 +6,14 @@ job "unifi" {
|
|||||||
}
|
}
|
||||||
|
|
||||||
group "net" {
|
group "net" {
|
||||||
|
# Run on primary storage node (zippy) for local disk performance
|
||||||
|
# MongoDB needs local disk, not NFS
|
||||||
|
# TODO: can move to fractal once it's converted to NixOS
|
||||||
|
constraint {
|
||||||
|
attribute = "${meta.storage_role}"
|
||||||
|
value = "primary"
|
||||||
|
}
|
||||||
|
|
||||||
network {
|
network {
|
||||||
port "p8443" { static = 8443 }
|
port "p8443" { static = 8443 }
|
||||||
port "p3478" { static = 3478 }
|
port "p3478" { static = 3478 }
|
||||||
@@ -38,7 +46,7 @@ job "unifi" {
|
|||||||
"p5514",
|
"p5514",
|
||||||
]
|
]
|
||||||
volumes = [
|
volumes = [
|
||||||
"/data/compute/appdata/unifi/data:/config",
|
"/data/services/unifi/data:/config",
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -105,8 +113,8 @@ job "unifi" {
|
|||||||
image = "mongo:8.0"
|
image = "mongo:8.0"
|
||||||
ports = ["mongodb"]
|
ports = ["mongodb"]
|
||||||
volumes = [
|
volumes = [
|
||||||
"/data/compute/appdata/unifi/mongodb:/data/db",
|
"/data/services/unifi/mongodb:/data/db",
|
||||||
"/data/compute/appdata/unifi/init-mongo.sh:/docker-entrypoint-initdb.d/init-mongo.sh:ro"
|
"/data/services/unifi/init-mongo.sh:/docker-entrypoint-initdb.d/init-mongo.sh:ro"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ job "wordpress-backup" {
|
|||||||
type = "batch"
|
type = "batch"
|
||||||
|
|
||||||
periodic {
|
periodic {
|
||||||
cron = "*/5 * * * * *"
|
crons = ["*/5 * * * * *"]
|
||||||
prohibit_overlap = true
|
prohibit_overlap = true
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|||||||
@@ -39,3 +39,22 @@ kopia repository server setup (on a non-NixOS host at the time):
 * kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key --tls-generate-cert (first time)
 * kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key (subsequent)
 [TLS is mandatory for this]
+
+NFS services server setup (one-time on the NFS server host, e.g. zippy):
+* sudo btrfs subvolume create /persist/services
+* sudo mkdir -p /persist/root/.ssh
+* sudo ssh-keygen -t ed25519 -f /persist/root/.ssh/btrfs-replication -N "" -C "root@$(hostname)-replication"
+* Get the public key: sudo cat /persist/root/.ssh/btrfs-replication.pub
+Then add this public key to each standby's nfsServicesStandby.replicationKeys option
+
+NFS services standby setup (one-time on each standby host, e.g. c1):
+* sudo btrfs subvolume create /persist/services-standby
+
+Moving NFS server role between hosts (e.g. from zippy to c1):
+See docs/NFS_FAILOVER.md for detailed procedure
+Summary:
+1. On current primary: create final snapshot and send to new primary
+2. On new primary: promote snapshot to /persist/services
+3. Update configs: remove nfs-services-server.nix from old primary, add to new primary
+4. Update configs: add nfs-services-standby.nix to old primary (with replication keys)
+5. Deploy old primary first (to demote), then new primary (to promote)