glusterfs setup on c1:

* for h in c1 c2 c3; do ssh $h sudo mkdir /persist/glusterfs/compute; done
* gluster peer probe c2
* gluster peer probe c3
* gluster volume create compute replica 3 c{1,2,3}:/persist/glusterfs/compute/brick1
* gluster volume start compute
* gluster volume bitrot compute enable
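
A few optional sanity checks and an example client mount (the /mnt/compute path is arbitrary, and any of c1/c2/c3 works as the mount server):

* gluster peer status (each peer should show State: Peer in Cluster (Connected))
* gluster volume info compute (should list three bricks)
* gluster volume status compute
* sudo mkdir -p /mnt/compute && sudo mount -t glusterfs c1:/compute /mnt/compute (example client mount)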

mysql credentials

* Put secrets/mysql_root_password into a Nomad var named secrets/mysql.root_password

postgres credentials

* Put secrets/postgres_password into a Nomad var named secrets/postgresql.postgres_password
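
With the Nomad CLI both of the above can be done with nomad var put; this sketch assumes the dotted names mean variable path secrets/mysql (item root_password) and secrets/postgresql (item postgres_password), with '...' standing in for the secret values:

* nomad var put secrets/mysql root_password='...'
* nomad var put secrets/postgresql postgres_password='...'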

adding a new gluster node to the compute volume, with c3 having failed:

(instructions from https://icicimov.github.io/blog/high-availability/Replacing-GlusterFS-failed-node/)

* zippy: sudo mkdir -p /persist/glusterfs/compute
* c1: gluster peer probe 192.168.1.2 (by IP because zippy resolved to a tailscale address)
* c1: gluster volume replace-brick compute c3:/persist/glusterfs/compute/brick1 192.168.1.2:/persist/glusterfs/compute/brick1 commit force
* c1: gluster volume heal compute full
* c1: gluster peer detach c3

The same procedure was later used to replace 192.168.1.2 with 192.168.1.73.
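
Optional checks after the replacement (on c1):

* gluster peer status (192.168.1.2 should be connected, and c3 gone after the detach)
* gluster volume info compute (the brick list should now show 192.168.1.2 instead of c3)
* gluster volume heal compute info (watch the heal backlog drain to zero)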

replacing the brick of a failed / reinstalled gluster node (c1 in this case); all commands on c2:

* gluster volume remove-brick compute replica 2 c1:/persist/glusterfs/compute/brick1 force
* gluster peer detach c1
* gluster peer probe 192.168.1.71 (by IP rather than c1, to avoid DNS/tailscale issues)
* gluster volume add-brick compute replica 3 192.168.1.71:/persist/glusterfs/compute/brick1
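
The add-brick step only registers the new brick; to actually repopulate it, a full heal along the lines of the earlier replacement is likely wanted (still on c2):

* gluster volume heal compute full
* gluster volume heal compute info (to watch progress)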

kopia repository server setup (on a non-NixOS host at the time):

* kopia repository create filesystem --path /backup/persist
* kopia repository connect filesystem --path=/backup/persist
* kopia server user add root@zippy

Then add the password chosen for root@zippy to secrets/zippy.yaml -- the key needs to be "kopia".

* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key --tls-generate-cert (first time)
* kopia server start --address 0.0.0.0:51515 --tls-cert-file ~/kopia-certs/kopia.cert --tls-key-file ~/kopia-certs/kopia.key (subsequent)

[TLS is mandatory for this]
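
For reference, a client then connects to this server with something like the line below (a sketch, placeholders included): the fingerprint is printed when the server starts, the password is the one stored under the "kopia" key, and the override flags make the client identify as the root@zippy user added above:

* kopia repository connect server --url=https://<server>:51515 --server-cert-fingerprint=<fingerprint from server startup output> --override-username=root --override-hostname=zippy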

NFS services server setup (one-time on the NFS server host, e.g. zippy):

* sudo btrfs subvolume create /persist/services
* sudo mkdir -p /persist/root/.ssh
* sudo ssh-keygen -t ed25519 -f /persist/root/.ssh/btrfs-replication -N "" -C "root@$(hostname)-replication"
* Get the public key: sudo cat /persist/root/.ssh/btrfs-replication.pub

Then add this public key to each standby's nfsServicesStandby.replicationKeys option.

NFS services standby setup (one-time on each standby host, e.g. c1):

* sudo btrfs subvolume create /persist/services-standby
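
The ongoing replication (presumably automated by the nfs-services modules) is what the btrfs-replication key above is for; done by hand, one cycle looks roughly like this, run as root on the server (snapshot name is illustrative; incremental sends would add -p <parent>):

* snap=/persist/services@manual-$(date +%Y%m%d-%H%M); btrfs subvolume snapshot -r /persist/services "$snap"
* btrfs send "$snap" | ssh -i /persist/root/.ssh/btrfs-replication root@c1 btrfs receive /persist/services-standby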

Moving NFS server role between hosts (e.g. from zippy to c1):

See docs/NFS_FAILOVER.md for the detailed procedure.

Summary:

1. On the current primary: create a final snapshot and send it to the new primary
2. On the new primary: promote the snapshot to /persist/services
3. Update configs: remove nfs-services-server.nix from the old primary, add it to the new primary
4. Update configs: add nfs-services-standby.nix to the old primary (with replication keys)
5. Deploy the old primary first (to demote), then the new primary (to promote)
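
A rough sketch of what steps 1-2 boil down to, assuming received snapshots land under /persist/services-standby (names illustrative):

* old primary: one final replication pass, as in the manual send/receive cycle above
* new primary: sudo btrfs subvolume snapshot /persist/services-standby/<latest received snapshot> /persist/services (snapshotting without -r makes it writable, which is the promotion)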