glusterfs setup on c1:

* for h in c1 c2 c3; do ssh $h sudo mkdir /persist/glusterfs/compute; done
* gluster peer probe c2
* gluster peer probe c3
* gluster volume create compute replica 3 c{1,2,3}:/persist/glusterfs/compute/brick1
* gluster volume start compute
* gluster volume bitrot compute enable
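
A few optional sanity checks and an example client mount (the /mnt/compute path is arbitrary, and any of c1/c2/c3 works as the mount server):

* gluster peer status (each peer should show State: Peer in Cluster (Connected))
* gluster volume info compute (should list three bricks)
* gluster volume status compute
* sudo mkdir -p /mnt/compute && sudo mount -t glusterfs c1:/compute /mnt/compute (example client mount)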

mysql credentials

* Put secrets/mysql_root_password into a Nomad var named secrets/mysql.root_password

postgres credentials

* Put secrets/postgres_password into a Nomad var named secrets/postgresql.postgres_password
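
With the Nomad CLI both of the above can be done with nomad var put; this sketch assumes the dotted names mean variable path secrets/mysql (item root_password) and secrets/postgresql (item postgres_password), with '...' standing in for the secret values:

* nomad var put secrets/mysql root_password='...'
* nomad var put secrets/postgresql postgres_password='...'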

adding a new gluster node to the compute volume, with c3 having failed:

(instructions from https://icicimov.github.io/blog/high-availability/Replacing-GlusterFS-failed-node/)

* zippy: sudo mkdir -p /persist/glusterfs/compute
* c1: gluster peer probe 192.168.1.2 (by IP because zippy resolved to a tailscale address)
* c1: gluster volume replace-brick compute c3:/persist/glusterfs/compute/brick1 192.168.1.2:/persist/glusterfs/compute/brick1 commit force
* c1: gluster volume heal compute full
* c1: gluster peer detach c3

The same procedure was later used to replace 192.168.1.2 with 192.168.1.73.
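
Optional checks after the replacement (on c1):

* gluster peer status (192.168.1.2 should be connected, and c3 gone after the detach)
* gluster volume info compute (the brick list should now show 192.168.1.2 instead of c3)
* gluster volume heal compute info (watch the heal backlog drain to zero)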

replacing the brick of a failed / reinstalled gluster node (c1 in this case); all commands on c2:

* gluster volume remove-brick compute replica 2 c1:/persist/glusterfs/compute/brick1 force
* gluster peer detach c1
* gluster peer probe 192.168.1.71 (by IP rather than c1, to avoid DNS/tailscale issues)
* gluster volume add-brick compute replica 3 192.168.1.71:/persist/glusterfs/compute/brick1
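
The add-brick step only registers the new brick; to actually repopulate it, a full heal along the lines of the earlier replacement is likely wanted (still on c2):

* gluster volume heal compute full
* gluster volume heal compute info (to watch progress)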

kopia repository server setup (on a non-NixOS host at the time):

* kopia repository create filesystem --path /backup/persist
* kopia repository connect filesystem --path=/backup/persist
* kopia server user add root@zippy

Then add the password chosen for root@zippy to secrets/zippy.yaml -- the key needs to be "kopia".

* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key --tls-generate-cert (first time)
* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key (subsequent)

[TLS is mandatory for this]
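
For reference, a client then connects to this server with something like the line below (a sketch, placeholders included): the fingerprint is printed when the server starts, the password is the one stored under the "kopia" key, and the override flags make the client identify as the root@zippy user added above:

* kopia repository connect server --url=https://<server>:51515 --server-cert-fingerprint=<fingerprint from server startup output> --override-username=root --override-hostname=zippy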

NFS services server setup (one-time on the NFS server host, e.g. zippy):

* sudo btrfs subvolume create /persist/services
* sudo mkdir -p /persist/root/.ssh
* sudo ssh-keygen -t ed25519 -f /persist/root/.ssh/btrfs-replication -N "" -C "root@$(hostname)-replication"
* Get the public key: sudo cat /persist/root/.ssh/btrfs-replication.pub

Then add this public key to each standby's nfsServicesStandby.replicationKeys option.

NFS services standby setup (one-time on each standby host, e.g. c1):

* sudo btrfs subvolume create /persist/services-standby
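
The ongoing replication (presumably automated by the nfs-services modules) is what the btrfs-replication key above is for; done by hand, one cycle looks roughly like this, run as root on the server (snapshot name is illustrative; incremental sends would add -p <parent>):

* snap=/persist/services@manual-$(date +%Y%m%d-%H%M); btrfs subvolume snapshot -r /persist/services "$snap"
* btrfs send "$snap" | ssh -i /persist/root/.ssh/btrfs-replication root@c1 btrfs receive /persist/services-standby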

Moving NFS server role between hosts (e.g. from zippy to c1):

See docs/NFS_FAILOVER.md for the detailed procedure.

Summary:

1. On the current primary: create a final snapshot and send it to the new primary
2. On the new primary: promote the snapshot to /persist/services
3. Update configs: remove nfs-services-server.nix from the old primary, add it to the new primary
4. Update configs: add nfs-services-standby.nix to the old primary (with replication keys)
5. Deploy the old primary first (to demote), then the new primary (to promote)
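
A rough sketch of what steps 1-2 boil down to, assuming received snapshots land under /persist/services-standby (names illustrative):

* old primary: one final replication pass, as in the manual send/receive cycle above
* new primary: sudo btrfs subvolume snapshot /persist/services-standby/<latest received snapshot> /persist/services (snapshotting without -r makes it writable, which is the promotion)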