alo-cluster/docs/CICD_SETUP.md
Petru Paler ed2c899915 Add reusable CI/CD workflow and documentation
- .gitea/workflows/deploy-nomad.yaml: Shared workflow for build/push/deploy
- docs/CICD_SETUP.md: Guide for adding CI/CD to new services
- nix-runner/README.md: Document the custom Nix runner image

Services can now use a 10-line workflow that calls the shared one:
  uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-05 07:47:01 +00:00


CI/CD Setup for Nomad Services

Guide for adding automated builds and deployments to a service.

Prerequisites

1. Service Repository

Your service needs a flake.nix that exports a Docker image:

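{
  outputs = { self, nixpkgs, ... }:
    let
      # Assumes an x86_64-linux build; adjust the system if yours differs
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      # The workflow looks for this output by default
      dockerImage = pkgs.dockerTools.buildImage {
        name = "gitea.v.paler.net/ppetru/<service>";
        tag = "latest";
        # ... image config
      };
    };
}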

Important: Use extraCommands instead of runAsRoot in your Docker build. runAsRoot runs its commands inside a VM, which requires KVM, and the CI runner doesn't have KVM.

2. Nomad Job

Your job in services/<name>.hcl needs:

job "<service>" {
  # Required: UUID changes trigger deployments
  meta {
    uuid = uuidv4()
  }

  # Required: enables deployment tracking and auto-rollback
  update {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "30s"
    healthy_deadline = "5m"
    auto_revert      = true
  }

  # Tasks must live inside a group; the group name here is a placeholder
  group "<service>" {
    # Required: pulls new image on each deployment
    task "app" {
      config {
        force_pull = true
      }

      # Recommended: health check for deployment validation
      service {
        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "5s"
        }
      }
    }
  }
}
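
Before pushing, the required settings above can be checked with a quick text search. The following is a minimal sketch, not part of the shared workflow: check_job_file is a hypothetical helper name, and it only greps the file for the three required settings rather than parsing HCL, so it cannot tell whether they sit in the right block.

```shell
# check_job_file FILE
# Hypothetical helper: naive text check for the required settings.
# It does not parse HCL, so a match inside a comment would also pass.
check_job_file() {
  file="$1"
  rc=0
  for pattern in \
    'uuid[[:space:]]*=[[:space:]]*uuidv4\(\)' \
    'auto_revert[[:space:]]*=[[:space:]]*true' \
    'force_pull[[:space:]]*=[[:space:]]*true'
  do
    if ! grep -Eq "$pattern" "$file"; then
      echo "missing in $file: $pattern" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: check_job_file services/<name>.hcl returns non-zero and prints each missing setting to stderr.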

Quick Start

1. Create Workflow

Add .gitea/workflows/deploy.yaml to your service repo:

name: Deploy

on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  deploy:
    uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master
    with:
      service_name: <your-service>  # Must match Nomad job ID
    secrets: inherit

2. Add Secrets

In Gitea → Your Repo → Settings → Actions → Secrets, add:

Secret             Value
REGISTRY_USERNAME  ppetru
REGISTRY_PASSWORD  Gitea access token with packages:write
NOMAD_ADDR         http://nomad.service.consul:4646
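
A missing secret tends to surface late, as a failed push or deploy. The sketch below shows one way a step could fail fast instead. It assumes the three secrets are exposed to the step as environment variables with the same names, which this guide does not confirm; require_env is a hypothetical helper name.

```shell
# require_env NAME...
# Hypothetical helper: fail if any named environment variable is unset or empty.
# Only pass trusted variable names, since the name is expanded with eval.
require_env() {
  rc=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: require_env REGISTRY_USERNAME REGISTRY_PASSWORD NOMAD_ADDR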

3. Push

Push to the master branch. The workflow will:

  1. Build your Docker image with Nix
  2. Push to Gitea registry
  3. Update the Nomad job to trigger deployment
  4. Monitor until deployment succeeds or fails

Workflow Options

The shared workflow accepts these inputs:

Input         Default            Description
service_name  (required)         Nomad job ID
flake_output  dockerImage        Flake output to build
registry      gitea.v.paler.net  Container registry

Example with custom flake output:

jobs:
  deploy:
    uses: ppetru/alo-cluster/.gitea/workflows/deploy-nomad.yaml@master
    with:
      service_name: myservice
      flake_output: packages.x86_64-linux.docker
    secrets: inherit

How It Works

Push to master
     ↓
Build: nix build .#dockerImage
     ↓
Push: skopeo → gitea.v.paler.net/ppetru/<service>:latest
     ↓
Deploy: Update job meta.uuid → Nomad creates deployment
     ↓
Monitor: Poll deployment status for up to 5 minutes
     ↓
Success: Deployment healthy
   OR
Failure: Nomad auto-reverts to previous version
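
The monitor step in the diagram above could be approximated from a shell as follows. This is a sketch, not the shared workflow's actual code: the function name wait_for_deployment, the 5-second poll interval, and the use of nomad job deployments -latest -t '{{.Status}}' to read the status are assumptions. Only the 5-minute limit comes from the description above.

```shell
# wait_for_deployment JOB [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: poll the latest deployment of JOB until it
# succeeds, fails, or the timeout (default 300s) runs out.
wait_for_deployment() {
  job="$1"
  timeout="${2:-300}"
  interval="${3:-5}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Assumes the template prints a bare status string such as "running"
    state="$(nomad job deployments -latest -t '{{.Status}}' "$job")" || return 2
    case "$state" in
      successful)
        echo "deployment successful"
        return 0
        ;;
      failed|cancelled)
        echo "deployment $state" >&2
        return 1
        ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s waiting for $job" >&2
  return 1
}
```

Usage: wait_for_deployment <service>. One caveat of this sketch: if it runs before Nomad has created the new deployment, the latest deployment is still the previous one, so a fuller version would compare deployment IDs before and after the update.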

Troubleshooting

Build fails with KVM error

Required system: 'x86_64-linux' with features {kvm}

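Use extraCommands instead of runAsRoot in the Nix file that defines your image. extraCommands runs in the root of the image being built, so paths are relative (tmp, not /tmp):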

# Bad - requires KVM
runAsRoot = ''
  mkdir -p /tmp
'';

# Good - no KVM needed
extraCommands = ''
  mkdir -p tmp
  chmod 1777 tmp
'';
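
To catch this before CI does, a repository can be scanned for runAsRoot. This is a minimal sketch under stated assumptions: find_run_as_root is a hypothetical helper name, and it is a plain text match, so a commented-out runAsRoot is reported too.

```shell
# find_run_as_root [DIR]
# Hypothetical helper: list runAsRoot occurrences in .nix files under DIR
# (default: current directory). Returns 1 if any are found, 0 otherwise.
find_run_as_root() {
  dir="${1:-.}"
  # /dev/null makes grep print file names even when find passes a single file;
  # "|| true" because find exits non-zero when grep finds no match
  matches="$(find "$dir" -type f -name '*.nix' \
    -exec grep -n 'runAsRoot' /dev/null {} + || true)"
  if [ -n "$matches" ]; then
    echo "$matches"
    return 1
  fi
  return 0
}
```

Usage: run find_run_as_root from the service repository root; each hit is printed as file:line:text.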

No deployment created

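Ensure your Nomad job has the update stanza (with auto_revert = true for automatic rollback) and the meta.uuid field. Nomad only creates a deployment when the job changes, which is why the UUID matters.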

Image not updating

Check that force_pull = true is set in the Nomad job's Docker config.

Deployment fails health checks

  • Check your /healthz endpoint works
  • Increase healthy_deadline if startup is slow
  • Check nomad alloc logs <alloc-id> for errors
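
The first check in the list above can be scripted with curl. The sketch below is an illustration only: probe_healthz is a hypothetical helper name, and the defaults (6 attempts, 10 seconds apart, 5-second request timeout) simply mirror the interval and timeout from the example job, not anything Nomad itself runs.

```shell
# probe_healthz BASE_URL [ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: request BASE_URL/healthz until it returns a
# success status, retrying up to ATTEMPTS times.
probe_healthz() {
  base_url="$1"
  attempts="${2:-6}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP error statuses into a non-zero exit code
    if curl -fsS --max-time 5 "$base_url/healthz" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "no healthy response from $base_url/healthz after $attempts attempt(s)" >&2
  return 1
}
```

Usage: probe_healthz http://<host>:<port>, where host and port are wherever the allocation is listening.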

Workflow can't access alo-cluster

If Gitea can't pull the reusable workflow, you may need to make alo-cluster public or use a token. As a fallback, copy the workflow content directly.

Manual Deployment

If CI fails, you can deploy manually:

cd <service-repo>
nix build .#dockerImage
skopeo copy --dest-authfile ~/.docker/config.json \
  docker-archive:result \
  docker://gitea.v.paler.net/ppetru/<service>:latest
nomad run /path/to/alo-cluster/services/<service>.hcl

Rollback

Nomad auto-reverts on health check failure. For manual rollback:

nomad job history <service>          # List versions
nomad job revert <service> <version> # Revert to specific version