alo-cluster/stateful-commands.txt

glusterfs setup on c1:
* for h in c1 c2 c3; do ssh $h sudo mkdir /persist/glusterfs/compute; done
* gluster peer probe c2
* gluster peer probe c3
* gluster volume create compute replica 3 c{1,2,3}:/persist/glusterfs/compute/brick1
* gluster volume start compute
* gluster volume bitrot compute enable
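
to sanity-check the result afterwards (standard gluster status commands, not part of the original setup notes):
* gluster peer status
* gluster volume info compute
* gluster volume status compute
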
mysql credentials
* Put secrets/mysql_root_password into a Nomad var named secrets/mysql.root_password
postgres credentials
* Put secrets/postgres_password into a Nomad var named secrets/postgresql.postgres_password
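
a sketch of writing these from the CLI, assuming the dotted names map to Nomad variable path + item (secrets/mysql with item root_password, secrets/postgresql with item postgres_password) and with <value> filled in from the corresponding secret; adjust if the jobs' templates expect a different layout:
* nomad var put secrets/mysql root_password=<value>
* nomad var put secrets/postgresql postgres_password=<value>
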
adding a new gluster node to the compute volume, with c3 having failed:
(instructions from https://icicimov.github.io/blog/high-availability/Replacing-GlusterFS-failed-node/)
* zippy: sudo mkdir /persist/glusterfs/compute -p
* c1: gluster peer probe 192.168.1.2 (by IP because zippy resolved to a tailscale address)
* c1: gluster volume replace-brick compute c3:/persist/glusterfs/compute/brick1 192.168.1.2:/persist/glusterfs/compute/brick1 commit force
* c1: gluster volume heal compute full
* c1: gluster peer detach c3
the same procedure was later used to replace 192.168.1.2 with 192.168.1.73
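
to watch heal progress after a replace-brick (standard commands, not from the original notes; "info summary" needs a reasonably recent GlusterFS):
* gluster volume heal compute info
* gluster volume heal compute info summary
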
replacing the brick of a failed / reinstalled gluster node (c1 in this case); all commands run on c2:
* gluster volume remove-brick compute replica 2 c1:/persist/glusterfs/compute/brick1 force
* gluster peer detach c1
* gluster peer probe 192.168.1.71 (not c1 because switching to IPs to avoid DNS/tailscale issues)
* gluster volume add-brick compute replica 3 192.168.1.71:/persist/glusterfs/compute/brick1
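
afterwards, confirming the brick count and kicking off a full heal mirrors the earlier recovery (not in the original notes):
* gluster volume info compute
* gluster volume heal compute full
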
kopia repository server setup (on a non-NixOS host at the time):
* kopia repository create filesystem --path /backup/persist
* kopia repository connect filesystem --path=/backup/persist
* kopia server user add root@zippy
then add the password set for that user to secrets/zippy.yaml -- the key needs to be "kopia"
* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key --tls-generate-cert (first time)
* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key (subsequent)
[TLS is mandatory for this]
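
a sketch of connecting a client (root@zippy, matching the user added above) to this server; <server-host> and <sha256-fingerprint> are placeholders -- the fingerprint is printed when the server starts:
* kopia repository connect server --url https://<server-host>:51515 --server-cert-fingerprint <sha256-fingerprint> --override-username root --override-hostname zippy
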
NFS services server setup (one-time on the NFS server host, e.g. zippy):
* sudo btrfs subvolume create /persist/services
* sudo mkdir -p /persist/root/.ssh
* sudo ssh-keygen -t ed25519 -f /persist/root/.ssh/btrfs-replication -N "" -C "root@$(hostname)-replication"
* Get the public key: sudo cat /persist/root/.ssh/btrfs-replication.pub
Then add this public key to each standby's nfsServicesStandby.replicationKeys option
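
replication itself is handled by the NixOS modules; a rough manual test of the send path (snapshot name and standby host c1 are illustrative, and this assumes replicationKeys ends up in root's authorized_keys on the standby):
* sudo btrfs subvolume snapshot -r /persist/services /persist/services@manual-test
* sudo btrfs send /persist/services@manual-test | sudo ssh -i /persist/root/.ssh/btrfs-replication root@c1 btrfs receive /persist/services-standby
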
NFS services standby setup (one-time on each standby host, e.g. c1):
* sudo btrfs subvolume create /persist/services-standby
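
to confirm the subvolume exists (received snapshots land inside it):
* sudo btrfs subvolume show /persist/services-standby
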
Moving NFS server role between hosts (e.g. from zippy to c1):
See docs/NFS_FAILOVER.md for the detailed procedure
Summary:
1. On current primary: create final snapshot and send to new primary
2. On new primary: promote snapshot to /persist/services
3. Update configs: remove nfs-services-server.nix from old primary, add to new primary
4. Update configs: add nfs-services-standby.nix to old primary (with replication keys)
5. Deploy old primary first (to demote), then new primary (to promote)
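
a rough sketch of steps 1-2 (docs/NFS_FAILOVER.md is authoritative; snapshot name and hostnames are illustrative):
* old primary (zippy): sudo btrfs subvolume snapshot -r /persist/services /persist/services@final
* old primary (zippy): sudo btrfs send /persist/services@final | sudo ssh -i /persist/root/.ssh/btrfs-replication root@c1 btrfs receive /persist/services-standby
* new primary (c1): sudo btrfs subvolume snapshot /persist/services-standby/services@final /persist/services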