Add reusable CI/CD workflow and documentation

- .gitea/workflows/deploy-nomad.yaml: Shared workflow for build/push/deploy
- docs/CICD_SETUP.md: Guide for adding CI/CD to new services
- nix-runner/README.md: Document the custom Nix runner image

Services can now use a 10-line workflow that calls the shared one:
  uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Date: 2026-01-05 07:47:01 +00:00
commit ed2c899915 (parent c548ead4f7)
3 changed files with 402 additions and 0 deletions

docs/CICD_SETUP.md (new file, +206 lines)
# CI/CD Setup for Nomad Services
Guide for adding automated builds and deployments to a service.
## Prerequisites
### 1. Service Repository
Your service needs a `flake.nix` that exports a Docker image:
```nix
{
  outputs = { self, nixpkgs, ... }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in
    {
      # The workflow looks for this output by default
      dockerImage = pkgs.dockerTools.buildImage {
        name = "gitea.v.paler.net/ppetru/<service>";
        tag = "latest";
        # ... image config
      };
    };
}
```
**Important**: Use `extraCommands` instead of `runAsRoot` in your Docker build. `runAsRoot` runs its script inside a VM that requires KVM, and the CI runner doesn't have KVM.
### 2. Nomad Job
Your job file `services/<service>.hcl` in alo-cluster needs:
```hcl
job "<service>" {
  # Required: UUID changes trigger deployments
  meta {
    uuid = uuidv4()
  }

  # Required: enables deployment tracking and auto-rollback
  update {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "30s"
    healthy_deadline = "5m"
    auto_revert      = true
  }

  group "<service>" {
    task "app" {
      driver = "docker"

      # Required: pulls new image on each deployment
      config {
        image      = "gitea.v.paler.net/ppetru/<service>:latest"
        force_pull = true
      }

      # Recommended: health check for deployment validation
      service {
        # ... port and other service settings
        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "5s"
        }
      }
    }
  }
}
```
## Quick Start
### 1. Create Workflow
Add `.gitea/workflows/deploy.yaml` to your service repo:
```yaml
name: Deploy

on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  deploy:
    uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master
    with:
      service_name: <your-service>  # Must match Nomad job ID
    secrets: inherit
```
### 2. Add Secrets
In Gitea → Your Repo → Settings → Actions → Secrets, add:
| Secret | Value |
|--------|-------|
| `REGISTRY_USERNAME` | `ppetru` |
| `REGISTRY_PASSWORD` | Gitea access token with the `write:package` scope |
| `NOMAD_ADDR` | `http://nomad.service.consul:4646` |
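A missing secret usually surfaces later as a less obvious registry or Nomad error. A minimal fail-fast sketch, assuming the secrets reach the job as environment variables with the names in the table (the shared workflow's internals aren't shown here, and `check_secrets` is a hypothetical helper):

```bash
# Hypothetical helper: report every listed variable that is unset or empty.
# Assumes the secrets are exported as environment variables of the same name.
# Returns 0 when all are present, 1 otherwise.
check_secrets() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup of the variable called $name (portable to POSIX sh)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_secrets REGISTRY_USERNAME REGISTRY_PASSWORD NOMAD_ADDR || exit 1` as the first step of a job.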
### 3. Push
Push to the `master` branch. The workflow will:
1. Build your Docker image with Nix
2. Push it to the Gitea registry
3. Update the Nomad job to trigger a deployment
4. Monitor until the deployment succeeds or fails
## Workflow Options
The shared workflow accepts these inputs:
| Input | Default | Description |
|-------|---------|-------------|
| `service_name` | (required) | Nomad job ID |
| `flake_output` | `dockerImage` | Flake output to build |
| `registry` | `gitea.v.paler.net` | Container registry |
Example with a custom flake output:
```yaml
jobs:
  deploy:
    uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master
    with:
      service_name: myservice
      flake_output: packages.x86_64-linux.docker
    secrets: inherit
```
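Assuming the inputs are used the way the How It Works section suggests (the flake output becomes the `nix build` installable and `registry` prefixes an image name under `ppetru/`), they resolve as in this sketch. `flake_installable` and `image_ref` are hypothetical helpers, not part of the shared workflow:

```bash
# Hypothetical helpers showing how the inputs might combine (assumption:
# images are always pushed under the ppetru/ namespace with the :latest tag).

# flake_installable [FLAKE_OUTPUT] -> argument for `nix build`
flake_installable() {
  printf '.#%s\n' "${1:-dockerImage}"
}

# image_ref SERVICE [REGISTRY] -> image reference the build is pushed to
image_ref() {
  printf '%s/ppetru/%s:latest\n' "${2:-gitea.v.paler.net}" "$1"
}
```

With the example inputs above, `nix build "$(flake_installable packages.x86_64-linux.docker)"` builds `.#packages.x86_64-linux.docker`, and `image_ref myservice` yields `gitea.v.paler.net/ppetru/myservice:latest`.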
## How It Works
```
Push to master
      ↓
Build:   nix build .#dockerImage
      ↓
Push:    skopeo → gitea.v.paler.net/ppetru/<service>:latest
      ↓
Deploy:  Update job meta.uuid → Nomad creates deployment
      ↓
Monitor: Poll deployment status for up to 5 minutes
      ↓
Success: Deployment healthy
   OR
Failure: Nomad auto-reverts to previous version
```
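The Monitor step can be approximated with Nomad's HTTP API. A minimal sketch, assuming `curl` and `jq` are installed and `NOMAD_ADDR` is set; the shared workflow may poll differently, and all three function names are hypothetical. It watches one deployment by ID rather than "the latest deployment", because after a failure `auto_revert` creates a new rollback deployment that could otherwise be mistaken for success (right after the job update you may need to retry until the ID differs from the previous deployment's):

```bash
# Print the ID of the job's most recent deployment
latest_deployment_id() {
  curl -fsS "$NOMAD_ADDR/v1/job/$1/deployment" | jq -r '.ID'
}

# Print the Status of one specific deployment
deployment_status() {
  curl -fsS "$NOMAD_ADDR/v1/deployment/$1" | jq -r '.Status'
}

# wait_for_deployment DEPLOYMENT_ID [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 on "successful", 1 on "failed"/"cancelled" or on timeout.
wait_for_deployment() {
  local id="$1" timeout="${2:-300}" interval="${3:-10}" waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    status=$(deployment_status "$id")
    case "$status" in
      successful) echo "deployment $id succeeded"; return 0 ;;
      failed|cancelled) echo "deployment $id $status" >&2; return 1 ;;
    esac
    # Any other status (running, pending, paused, or an empty reply) keeps polling
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for deployment $id" >&2
  return 1
}
```

For example, `id=$(latest_deployment_id myservice)` right after the job update, then `wait_for_deployment "$id"`, which polls every 10 seconds for up to 300 seconds.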
## Troubleshooting
### Build fails with KVM error
```
Required system: 'x86_64-linux' with features {kvm}
```
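Use `extraCommands` instead of `runAsRoot` in your image definition (`flake.nix` or a separate `docker.nix`):
```nix
# Bad - requires KVM
runAsRoot = ''
  mkdir -p /tmp
'';

# Good - no KVM needed; runs with the image root as working directory,
# so paths are relative
extraCommands = ''
  mkdir -p tmp
  chmod 1777 tmp
'';
```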
### No deployment created
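Ensure your Nomad job has the `update` block with `auto_revert = true` and the `meta.uuid = uuidv4()` entry. Without the UUID, a job spec that only references the same `:latest` tag is unchanged, so `nomad run` creates no new job version and no deployment.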
### Image not updating
Check that `force_pull = true` is set in the Nomad job's Docker config.
### Deployment fails health checks
- Check that your `/healthz` endpoint returns a 2xx status
- Increase `healthy_deadline` if startup is slow
- Check `nomad alloc logs <alloc-id>` for errors
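To test the endpoint roughly the way the job's check does, probe it by hand. A minimal sketch, assuming the check passes on any 2xx response and using the same 5-second timeout as the `check` block; `check_healthz` is a hypothetical helper, and the URL depends on how your service is exposed:

```bash
# Hypothetical helper: succeed only if URL answers with a 2xx status within 5s,
# mirroring the job's HTTP check (timeout = "5s").
check_healthz() {
  local code
  # -o /dev/null discards the body; -w prints only the status code.
  # curl prints 000 when no HTTP response arrives (refused, timed out).
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")
  case "$code" in
    2??) echo "healthy (HTTP $code)"; return 0 ;;
    *) echo "unhealthy (HTTP ${code:-none})" >&2; return 1 ;;
  esac
}
```

For example, `check_healthz http://<alloc-ip>:<port>/healthz` from a host that can reach the allocation.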
### Workflow can't access alo-cluster
If Gitea can't fetch the reusable workflow, make the alo-cluster repository public or use a token with read access to it. As a fallback, copy the shared workflow's content directly into your service's `.gitea/workflows/deploy.yaml`.
## Manual Deployment
If CI fails, you can deploy manually:
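```bash
cd <service-repo>

# Build the image tarball (Nix creates a ./result symlink to it)
nix build .#dockerImage

# Push the tarball to the Gitea registry
skopeo copy --dest-authfile ~/.docker/config.json \
  docker-archive:result \
  docker://gitea.v.paler.net/ppetru/<service>:latest

# Re-submit the job; the fresh meta.uuid makes Nomad create a new deployment
nomad run /path/to/alo-cluster/services/<service>.hcl
```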
## Rollback
Nomad auto-reverts on health check failure. For manual rollback:
```bash
nomad job history <service> # List versions
nomad job revert <service> <version> # Revert to specific version
```