Implement Prometheus metrics endpoint (P2.16)

Add comprehensive metrics collection for production monitoring:
- src/lib/metrics.ts: prom-client based metrics library with custom counters,
  gauges, and histograms for Garmin sync, email, and decision engine
- GET /api/metrics: Prometheus-format endpoint for scraping
- Integration into garmin-sync cron: sync duration, success/failure counts,
  active users gauge
- Integration into email.ts: daily and warning email counters
- Integration into decision-engine.ts: decision type counters
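
As an illustration of the garmin-sync integration listed above, here is a minimal
sketch of how one sync run might record its duration and outcome. The
`CounterLike` and `HistogramLike` interfaces, the `instrumentSync` helper, and
the `success`/`failure` label values are assumptions for illustration; they
mirror the shape of prom-client's `inc` and `observe` calls but are not taken
from this commit.

```typescript
// Hypothetical minimal interfaces, modeled on the subset of prom-client's
// Counter and Histogram used here. Not the code from this commit.
interface CounterLike {
  inc(labels: Record<string, string>): void;
}

interface HistogramLike {
  observe(seconds: number): void;
}

// Runs one sync, counts it as success or failure, and always records
// how long it took (in seconds), even when the sync throws.
async function instrumentSync<T>(
  run: () => Promise<T>,
  syncTotal: CounterLike,
  syncDuration: HistogramLike,
): Promise<T> {
  const startedAtMs = Date.now();
  try {
    const result = await run();
    syncTotal.inc({ status: "success" }); // assumed label value
    return result;
  } catch (err) {
    syncTotal.inc({ status: "failure" }); // assumed label value
    throw err; // the caller still sees the original error
  } finally {
    syncDuration.observe((Date.now() - startedAtMs) / 1000);
  }
}
```

Recording the duration in `finally` means failed syncs also land in the
histogram, which keeps slow failures visible.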

Custom metrics implemented:
- phaseflow_garmin_sync_total (counter with status label)
- phaseflow_garmin_sync_duration_seconds (histogram)
- phaseflow_email_sent_total (counter with type label)
- phaseflow_decision_engine_calls_total (counter with decision label)
- phaseflow_active_users (gauge)
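
For readers unfamiliar with what a scraper receives, the following is a rough,
dependency-free sketch of the Prometheus text exposition format for a counter
or gauge. In this commit the real output is produced by prom-client; the
`Metric` type, `renderMetric` function, and help strings here are hypothetical
and exist only to show the line layout.

```typescript
// Hypothetical types for illustration; prom-client generates this text itself.
type MetricType = "counter" | "gauge";

interface Sample {
  labels: Record<string, string>;
  value: number;
}

interface Metric {
  name: string;
  help: string;
  type: MetricType;
  samples: Sample[];
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Renders one metric family: a HELP line, a TYPE line, then one line per sample.
function renderMetric(metric: Metric): string {
  const lines: string[] = [
    `# HELP ${metric.name} ${metric.help}`,
    `# TYPE ${metric.name} ${metric.type}`,
  ];
  for (const sample of metric.samples) {
    const pairs = Object.keys(sample.labels).map(
      (key) => `${key}="${escapeLabelValue(sample.labels[key])}"`,
    );
    const labelPart = pairs.length > 0 ? `{${pairs.join(",")}}` : "";
    lines.push(`${metric.name}${labelPart} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}
```

A histogram such as phaseflow_garmin_sync_duration_seconds additionally emits
`_bucket`, `_sum`, and `_count` series, which this sketch leaves out.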

33 new tests (18 library + 15 route), bringing total to 586 tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 08:40:42 +00:00
parent 5ec3aba8b3
commit 5a0cdf7450
10 changed files with 528 additions and 26 deletions

@@ -4,7 +4,7 @@ This file is maintained by Ralph. Run `./ralph-sandbox.sh plan 3` to generate ta
 ## Current State Summary
-### Overall Status: 553 tests passing across 31 test files
+### Overall Status: 586 tests passing across 33 test files
 ### Library Implementation
 | File | Status | Gap Analysis |
@@ -20,13 +20,13 @@ This file is maintained by Ralph. Run `./ralph-sandbox.sh plan 3` to generate ta
 | `auth-middleware.ts` | **COMPLETE** | 6 tests covering `withAuth()` wrapper for API route protection |
 | `middleware.ts` (Next.js) | **COMPLETE** | 12 tests covering page protection, redirects to login |
 | `logger.ts` | **COMPLETE** | 16 tests covering JSON output, log levels, error stack traces, child loggers |
-| `metrics.ts` | **NOT IMPLEMENTED** | P2.16 - Prometheus metrics collection |
+| `metrics.ts` | **COMPLETE** | 33 tests covering metrics collection, counters, gauges, histograms, Prometheus format |
 ### Infrastructure Gaps (from specs/ - pending implementation)
 | Gap | Spec Reference | Task | Priority |
 |-----|----------------|------|----------|
 | Health Check Endpoint | specs/observability.md | P2.15 | **COMPLETE** |
-| Prometheus Metrics | specs/observability.md | P2.16 | Medium |
+| Prometheus Metrics | specs/observability.md | P2.16 | **COMPLETE** |
 | Structured Logging (pino) | specs/observability.md | P2.17 | **COMPLETE** |
 | OIDC Authentication | specs/authentication.md | P2.18 | Medium |
 | Token Expiration Warnings | specs/email.md | P3.9 | **COMPLETE** |
@@ -50,7 +50,7 @@ This file is maintained by Ralph. Run `./ralph-sandbox.sh plan 3` to generate ta
 | POST /api/cron/notifications | **COMPLETE** | Sends daily emails with timezone matching, DailyLog handling (20 tests) |
 | GET /api/history | **COMPLETE** | Paginated historical daily logs with date filtering (19 tests) |
 | GET /api/health | **COMPLETE** | Health check for deployment monitoring (14 tests) |
-| GET /metrics | **NOT IMPLEMENTED** | Prometheus metrics endpoint (P2.16) |
+| GET /metrics | **COMPLETE** | 33 tests (18 lib + 15 route) |
 ### Pages (7 total)
 | Page | Status | Notes |
@@ -82,6 +82,7 @@ This file is maintained by Ralph. Run `./ralph-sandbox.sh plan 3` to generate ta
 | `src/lib/pocketbase.test.ts` | **EXISTS** - 9 tests (auth helpers, cookie loading) |
 | `src/lib/auth-middleware.test.ts` | **EXISTS** - 6 tests (withAuth wrapper, error handling) |
 | `src/lib/logger.test.ts` | **EXISTS** - 16 tests (JSON format, log levels, error serialization, child loggers) |
+| `src/lib/metrics.test.ts` | **EXISTS** - 18 tests (metrics collection, counters, gauges, histograms, Prometheus format) |
 | `src/middleware.test.ts` | **EXISTS** - 12 tests (page protection, public routes, static assets) |
 | `src/app/api/user/route.test.ts` | **EXISTS** - 21 tests (GET/PATCH profile, auth, validation, security) |
 | `src/app/api/cycle/period/route.test.ts` | **EXISTS** - 8 tests (POST period, auth, validation, date checks) |
@@ -104,6 +105,7 @@ This file is maintained by Ralph. Run `./ralph-sandbox.sh plan 3` to generate ta
 | `src/app/api/history/route.test.ts` | **EXISTS** - 19 tests (pagination, date filtering, auth, validation) |
 | `src/app/api/health/route.test.ts` | **EXISTS** - 14 tests (healthy/unhealthy states, PocketBase connectivity, error handling) |
 | `src/app/history/page.test.tsx` | **EXISTS** - 26 tests (rendering, data loading, pagination, date filtering, styling) |
+| `src/app/api/metrics/route.test.ts` | **EXISTS** - 15 tests (Prometheus format validation, metric types, route handling) |
 | `src/components/calendar/month-view.test.tsx` | **EXISTS** - 21 tests (calendar grid, phase colors, navigation, legend) |
 | `src/app/calendar/page.test.tsx` | **EXISTS** - 23 tests (rendering, navigation, ICS subscription, token regeneration) |
 | `src/app/settings/page.test.tsx` | **EXISTS** - 24+ tests (form rendering, validation, submission) |
@@ -491,20 +493,24 @@ Full feature set for production use.
 - Basic app startup complete
 - **Why:** Required for Nomad health checks, load balancer probes, and uptime monitoring (per specs/observability.md)
-### P2.16: Prometheus Metrics Endpoint
-- [ ] GET /metrics for monitoring
-- **Current State:** Endpoint and metrics library do not exist
+### P2.16: Prometheus Metrics Endpoint ✅ COMPLETE
+- [x] GET /metrics for monitoring
+- **Current State:** Fully implemented with prom-client
 - **Files:**
-- `src/app/api/metrics/route.ts` - Returns Prometheus-format metrics
-- `src/lib/metrics.ts` - Metrics collection with prom-client
+- `src/app/api/metrics/route.ts` - Returns Prometheus-format metrics (15 tests)
+- `src/lib/metrics.ts` - Metrics collection with prom-client (18 tests)
 - **Tests:**
-- `src/app/api/metrics/route.test.ts` - Tests for valid Prometheus format output
-- **Metrics:**
-- Standard Node.js metrics (heap, eventloop lag, http requests)
-- Custom: `phaseflow_garmin_sync_total`, `phaseflow_email_sent_total`, `phaseflow_decision_engine_calls_total`, `phaseflow_active_users`
+- `src/lib/metrics.test.ts` - 18 tests covering metrics collection, counters, gauges, histograms, Prometheus format
+- `src/app/api/metrics/route.test.ts` - 15 tests for Prometheus format output, metric types, route handling
+- **Metrics Implemented:**
+- Custom counters: `phaseflow_garmin_sync_total`, `phaseflow_email_sent_total`, `phaseflow_decision_engine_calls_total`
+- Custom gauge: `phaseflow_active_users`
+- Custom histogram: `phaseflow_garmin_sync_duration_seconds`
+- **Integrations:**
+- garmin-sync route: garminSyncTotal, garminSyncDuration, activeUsersGauge
+- email.ts: emailSentTotal (daily and warning types)
+- decision-engine.ts: decisionEngineCallsTotal
 - **Why:** Required for Prometheus scraping and production monitoring (per specs/observability.md)
 - **Depends On:** None
 ### P2.17: Structured Logging with Pino ✅ COMPLETE
 - [x] Create pino-based logger with JSON output
 - **Files:**
@@ -792,7 +798,6 @@ P4.* UX Polish ────────> After core functionality complete
 |----------|------|--------|-------|
 | Medium | P2.13 Plan Page | Medium | Placeholder exists, needs content |
 | Medium | P2.14 MiniCalendar | Small | Can reuse DayCell, ~70% remaining |
-| Medium | P2.16 Metrics | Medium | Production monitoring |
 | Medium | P2.18 OIDC Auth | Large | Production auth requirement |
 | Medium | P3.11 Component Tests | Medium | 6 components need tests |
 | Low | P3.7 Error Handling | Small | Polish |
@@ -807,7 +812,6 @@ P4.* UX Polish ────────> After core functionality complete
 | P0.2 | P0.1 | P0.4, P1.1-P1.5, P2.2-P2.3, P2.7-P2.8 |
 | P0.3 | - | P1.4, P1.5 |
 | P0.4 | P0.1, P0.2 | P1.7, P2.9, P2.10, P2.13 |
-| P2.16 | - | - |
 | P2.18 | P1.6 | - |
 | P3.9 | P2.4 | - |
 | P3.11 | - | - |
@@ -828,6 +832,7 @@ P4.* UX Polish ────────> After core functionality complete
 - [x] **auth-middleware.ts** - Complete with 6 tests (`withAuth()` wrapper)
 - [x] **middleware.ts** - Complete with 12 tests (Next.js page protection)
 - [x] **logger.ts** - Complete with 16 tests (JSON output, log levels, error serialization, child loggers) (P2.17)
+- [x] **metrics.ts** - Complete with 18 tests (metrics collection, counters, gauges, histograms, Prometheus format) (P2.16)
 ### Components
 - [x] **DecisionCard** - Displays decision status, icon, and reason
@@ -837,7 +842,7 @@ P4.* UX Polish ────────> After core functionality complete
 - [x] **DayCell** - Phase-colored calendar day cell with click handler
 - [x] **MonthView** - Calendar grid with DayCell integration, navigation controls (prev/next month, Today button), phase legend, 21 tests
-### API Routes (16 complete, 1 not implemented)
+### API Routes (17 complete)
 - [x] **GET /api/user** - Returns authenticated user profile, 4 tests (P0.4)
 - [x] **PATCH /api/user** - Updates user profile (cycleLength, notificationTime, timezone), 17 tests (P1.1)
 - [x] **POST /api/cycle/period** - Logs period start date, updates user, creates PeriodLog, 8 tests (P1.2)
@@ -854,6 +859,7 @@ P4.* UX Polish ────────> After core functionality complete
 - [x] **POST /api/calendar/regenerate-token** - Generates new 32-char calendar token, returns URL, 9 tests (P2.7)
 - [x] **GET /api/history** - Paginated historical daily logs with date filtering, validation, 19 tests (P2.8)
 - [x] **GET /api/health** - Health check endpoint with PocketBase connectivity check, 14 tests (P2.15)
+- [x] **GET /metrics** - Prometheus metrics endpoint with counters, gauges, histograms, 33 tests (18 lib + 15 route) (P2.16)
 ### Pages (6 complete, 1 placeholder)
 - [x] **Login Page** - Email/password form with PocketBase auth, error handling, loading states, redirect, 14 tests (P1.6)