diff --git a/SECURITY.md b/SECURITY.md index 84178eda..29b99ec2 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -27,49 +27,85 @@ public disclosure. ## Known Vulnerabilities -### [HIGH] CHARON-2026-001 · Debian Base Image CVE Cluster +### [CRITICAL] CVE-2025-68121 · Go Stdlib Critical in CrowdSec Bundled Binaries | Field | Value | |--------------|-------| -| **ID** | CHARON-2026-001 (aliases: CVE-2026-0861, CVE-2025-15281, CVE-2026-0915, CVE-2025-13151, and 2 libtiff HIGH CVEs) | -| **Severity** | High · 8.4 (highest per CVSS v3.1, preliminary) | -| **Status** | Fix In Progress | +| **ID** | CVE-2025-68121 (see also CHARON-2025-001) | +| **Severity** | Critical | +| **Status** | Awaiting Upstream | **What** -Seven HIGH-severity CVEs in Debian Trixie base image system libraries (`glibc`, `libtasn1-6`, -`libtiff`). These vulnerabilities reside in the container's OS-level packages. No fixes are -available from the Debian Security Team. The project migrated from Alpine to Debian to avoid -CVE-2025-60876 (busybox heap overflow); now that Alpine has patched that CVE, migration back to -Alpine is underway to eliminate this cluster entirely. +A critical Go standard library vulnerability affects CrowdSec binaries bundled in the Charon +container image. The binaries were compiled against Go 1.25.6, which contains this flaw. +Charon's own application code, compiled with Go 1.26.1, is unaffected. **Who** -- Discovered by: Automated scan (Trivy) -- Reported: 2026-02-04 -- Affects: Container runtime environment; no known direct exploitation path through Charon's - application interface +- Discovered by: Automated scan (Grype) +- Reported: 2026-03-20 +- Affects: CrowdSec Agent component within the container; not directly exposed through Charon's + primary application interface **Where** -- Component: Debian Trixie base image (`libc6`, `libc-bin`, `libtasn1-6`, `libtiff`) -- Versions affected: All Charon container images built on Debian Trixie base +- Component: CrowdSec Agent (bundled `cscli` and `crowdsec` binaries) +- Versions affected: Charon container images with CrowdSec binaries compiled against Go < 1.25.7 **When** -- Discovered: 2026-02-04 -- Disclosed (if public): 2026-02-04 (internal advisory) -- Target fix: 2026-03-05 (Alpine base image migration) +- Discovered: 2026-03-20 +- Disclosed (if public): Not yet publicly disclosed +- Target fix: When `golang:1.26.2-alpine` is published on Docker Hub **How** -The affected packages are OS-level shared libraries bundled in the container base image. -Exploitation would require local access to the container or a prior application-level compromise -to reach the vulnerable library code. Caddy reverse proxy ingress filtering and container -isolation significantly reduce the effective attack surface. +The vulnerability resides entirely within CrowdSec's compiled binary artifacts. Exploitation +is limited to the CrowdSec agent's internal execution paths, which are not externally exposed +through Charon's API or network interface. **Planned Remediation** -Revert to Alpine Linux base image (CVE-2025-60876 is now patched upstream). Expected outcome is -100% CVE reduction (7 HIGH → 0). +`golang:1.26.2-alpine` is not yet available on Docker Hub. The `GO_VERSION` ARG has been +reverted to `1.26.1` (the latest published image) until `1.26.2` is released. Once +`golang:1.26.2-alpine` is available, bumping `GO_VERSION` to `1.26.2` and rebuilding the image +will also resolve CVE-2026-25679 (High) and CVE-2025-61732 (High) tracked under CHARON-2025-001. -- Spec: [docs/plans/alpine_migration_spec.md](docs/plans/alpine_migration_spec.md) -- Advisory: [docs/security/advisory_2026-02-04_debian_cves_temporary.md](docs/security/advisory_2026-02-04_debian_cves_temporary.md) -- Risk Assessment: [docs/security/VULNERABILITY_ACCEPTANCE.md](docs/security/VULNERABILITY_ACCEPTANCE.md) +--- + +### [HIGH] CVE-2026-2673 · OpenSSL TLS 1.3 Key Exchange Group Downgrade + +| Field | Value | +|--------------|-------| +| **ID** | CVE-2026-2673 (affects `libcrypto3` and `libssl3`) | +| **Severity** | High · 7.5 | +| **Status** | Awaiting Upstream | + +**What** +An OpenSSL TLS 1.3 server may fail to negotiate the intended key exchange group when the +configuration includes the `DEFAULT` keyword, potentially allowing downgrade to weaker cipher +suites. Affects Alpine 3.23.3 packages `libcrypto3` and `libssl3` at version 3.5.5-r0. + +**Who** +- Discovered by: Automated scan (Grype) +- Reported: 2026-03-20 +- Affects: Container runtime environment; Caddy reverse proxy TLS negotiation could be affected + if default key group configuration is used + +**Where** +- Component: Alpine 3.23.3 base image (`libcrypto3` 3.5.5-r0, `libssl3` 3.5.5-r0) +- Versions affected: Alpine 3.23.3 prior to a patched `openssl` APK release + +**When** +- Discovered: 2026-03-20 +- Disclosed (if public): 2026-03-13 (OpenSSL advisory) +- Target fix: When Alpine Security publishes a patched `openssl` APK + +**How** +When an OpenSSL TLS 1.3 server configuration uses the `DEFAULT` keyword for key exchange groups, +the negotiation logic may select a weaker group than intended. Charon's Caddy TLS configuration +does not use the `DEFAULT` keyword, which limits practical exploitability. The packages are +present in the base image regardless of Caddy's configuration. + +**Planned Remediation** +Monitor https://security.alpinelinux.org/vuln/CVE-2026-2673 for a patched Alpine APK. Once +available, update the pinned `ALPINE_IMAGE` digest in the Dockerfile, or add an explicit +`RUN apk upgrade --no-cache libcrypto3 libssl3` to the runtime stage. --- @@ -77,30 +113,32 @@ Revert to Alpine Linux base image (CVE-2025-60876 is now patched upstream). Expe | Field | Value | |--------------|-------| -| **ID** | CHARON-2025-001 (aliases: CVE-2025-58183, CVE-2025-58186, CVE-2025-58187, CVE-2025-61729) | +| **ID** | CHARON-2025-001 (aliases: CVE-2025-58183, CVE-2025-58186, CVE-2025-58187, CVE-2025-61729, CVE-2026-25679, CVE-2025-61732, CVE-2026-27142, CVE-2026-27139) | | **Severity** | High · (preliminary, CVSS scores pending upstream confirmation) | | **Status** | Awaiting Upstream | **What** -Four HIGH-severity CVEs in Go standard library packages (HTTP/2 handling, TLS certificate -validation, archive parsing) present in CrowdSec binaries bundled with Charon. These vulnerabilities -exist because CrowdSec's distributed binaries were compiled against Go 1.25.1. Charon's own -application code is unaffected. +Multiple CVEs in Go standard library packages continue to accumulate in CrowdSec binaries bundled +with Charon. The cluster originated when CrowdSec was compiled against Go 1.25.1; subsequent +CrowdSec updates advanced the toolchain to Go 1.25.6/1.25.7, resolving earlier CVEs but +introducing new ones. The cluster now includes a Critical-severity finding (CVE-2025-68121, +tracked separately above). All issues resolve when CrowdSec is rebuilt against Go ≥ 1.26.2. +Charon's own application code is unaffected. **Who** -- Discovered by: Automated scan (Trivy) -- Reported: 2025-12-01 +- Discovered by: Automated scan (Trivy, Grype) +- Reported: 2025-12-01 (original cluster); expanded 2026-03-20 - Affects: CrowdSec Agent component within the container; not directly exposed through Charon's primary application interface **Where** - Component: CrowdSec Agent (bundled `cscli` and `crowdsec` binaries) -- Versions affected: All Charon versions shipping CrowdSec binaries compiled against Go < 1.26.0 +- Versions affected: All Charon versions shipping CrowdSec binaries compiled against Go < 1.26.2 **When** - Discovered: 2025-12-01 - Disclosed (if public): Not yet publicly disclosed -- Target fix: Dependent on CrowdSec upstream release timeline +- Target fix: When `golang:1.26.2-alpine` is published on Docker Hub **How** The CVEs reside entirely within CrowdSec's compiled binaries and cover HTTP/2, TLS, and archive @@ -108,13 +146,60 @@ processing paths that are not invoked by Charon's core application logic. The re interfaces are not externally exposed via Charon's API surface. **Planned Remediation** -Monitor CrowdSec releases for binaries built with Go 1.26.0+. Upgrade CrowdSec in Charon's build -pipeline as soon as a patched release is available. +`golang:1.26.2-alpine` is not yet available on Docker Hub. The `GO_VERSION` ARG has been +reverted to `1.26.1` (the latest published image) until `1.26.2` is released. Once available, +bumping `GO_VERSION` to `1.26.2` and rebuilding the image will resolve the entire alias cluster. +CVE-2025-68121 (Critical severity, same root cause) is tracked separately above. --- ## Patched Vulnerabilities +### ✅ [HIGH] CHARON-2026-001 · Debian Base Image CVE Cluster + +| Field | Value | +|--------------|-------| +| **ID** | CHARON-2026-001 (aliases: CVE-2026-0861, CVE-2025-15281, CVE-2026-0915, CVE-2025-13151, and 2 libtiff HIGH CVEs) | +| **Severity** | High · 8.4 (highest per CVSS v3.1) | +| **Patched** | 2026-03-20 (Alpine base image migration complete) | + +**What** +Seven HIGH-severity CVEs in Debian Trixie base image system libraries (`glibc`, `libtasn1-6`, +`libtiff`). These vulnerabilities resided in the container's OS-level packages with no fixes +available from the Debian Security Team. + +**Who** +- Discovered by: Automated scan (Trivy) +- Reported: 2026-02-04 + +**Where** +- Component: Debian Trixie base image (`libc6`, `libc-bin`, `libtasn1-6`, `libtiff`) +- Versions affected: Charon container images built on Debian Trixie base (prior to Alpine migration) + +**When** +- Discovered: 2026-02-04 +- Patched: 2026-03-20 +- Time to patch: 44 days + +**How** +The affected packages were OS-level shared libraries bundled in the Debian Trixie container base +image. Exploitation would have required local container access or a prior application-level +compromise. Caddy reverse proxy ingress filtering and container isolation significantly reduced +the effective attack surface throughout the exposure window. + +**Resolution** +Reverted to Alpine Linux base image (Alpine 3.23.3). Alpine's patch of CVE-2025-60876 (busybox +heap overflow) removed the original blocker for the Alpine migration. Post-migration scan +confirmed zero HIGH/CRITICAL CVEs from this cluster. + +- Spec: [docs/plans/alpine_migration_spec.md](docs/plans/alpine_migration_spec.md) +- Advisory: [docs/security/advisory_2026-02-04_debian_cves_temporary.md](docs/security/advisory_2026-02-04_debian_cves_temporary.md) + +**Credit** +Internal remediation; no external reporter. + +--- + ### ✅ [HIGH] CVE-2025-68156 · expr-lang/expr ReDoS | Field | Value | diff --git a/backend/internal/services/docker_service_test.go b/backend/internal/services/docker_service_test.go index fa35e599..da38d655 100644 --- a/backend/internal/services/docker_service_test.go +++ b/backend/internal/services/docker_service_test.go @@ -228,7 +228,7 @@ func TestBuildLocalDockerUnavailableDetails_PermissionDeniedSocketGIDInGroups(t // Temp file GID = our primary GID (already in process groups) → no group hint tmpDir := t.TempDir() socketFile := filepath.Join(tmpDir, "docker.sock") - require.NoError(t, os.WriteFile(socketFile, []byte(""), 0o660)) + require.NoError(t, os.WriteFile(socketFile, []byte(""), 0o600)) host := "unix://" + socketFile err := &net.OpError{Op: "dial", Net: "unix", Err: syscall.EACCES} diff --git a/backend/internal/services/mail_service.go b/backend/internal/services/mail_service.go index 499fdac8..a052d892 100644 --- a/backend/internal/services/mail_service.go +++ b/backend/internal/services/mail_service.go @@ -192,7 +192,10 @@ func (s *MailService) RenderNotificationEmail(templateName string, data EmailTem return "", fmt.Errorf("failed to render template %q: %w", templateName, err) } - data.Content = template.HTML(contentBuf.String()) + // html/template.Execute already escapes all EmailTemplateData fields; the + // template.HTML cast here prevents double-escaping in the outer layout template. + // #nosec G203 -- html/template.Execute auto-escapes all EmailTemplateData fields; this cast prevents double-escaping in the outer layout. + data.Content = template.HTML(contentBuf.String()) //nolint:gosec // see above baseTmpl, err := template.New("email_base.html").Parse(string(baseBytes)) if err != nil { diff --git a/docs/plans/current_spec.md b/docs/plans/current_spec.md index 642dd09a..49eeb01a 100644 --- a/docs/plans/current_spec.md +++ b/docs/plans/current_spec.md @@ -1,2523 +1,693 @@ -# Fix: `should create new TCP monitor` Playwright Test Failure +# Security Remediation Plan — 2026-03-20 Audit -**Status:** Ready to implement -**Scope:** Test-only fix — no production code changes -**Affected file:** `tests/monitoring/uptime-monitoring.spec.ts` +**Date**: 2026-03-20 +**Scope**: All patchable CVEs and code findings from the 2026-03-20 QA security scan +**Source**: `docs/reports/qa_security_scan_report.md` +**Status**: Draft — Awaiting implementation --- -## Root Cause +## 1. Introduction -The test at line 414 (`should create new TCP monitor`) fills the URL field with `tcp://redis.local:6379`, then selects type `tcp`. The PR **"fix(uptime): fix TCP monitor UX — correct format guidance and add client-side validation"** added a guard in `handleSubmit` inside `CreateMonitorModal` (`frontend/src/pages/Uptime.tsx`, ~line 373) that explicitly blocks form submission when the TCP URL contains a scheme prefix (`://`): +### Overview -```js -if (type === 'tcp' && url.trim().includes('://')) { - setUrlError(t('uptime.invalidTcpFormat')); - return; // ← early return; mutation.mutate() never called -} -``` +A full-stack security audit conducted on 2026-03-20 identified 18 findings across the Charon +container image, Go modules, source code, and Python development tooling. This plan covers all +**actionable** items that can be resolved without waiting for upstream patches. -Because `mutation.mutate()` is never called, the Axios POST to `/api/v1/uptime/monitors` is never sent. Playwright's `page.waitForResponse(...)` waiting on that POST times out, and the test fails consistently across all three retry attempts. +The audit also confirmed two prior remediations are complete: -The same component also sets a real-time inline error via the URL `onChange` handler when `type === 'tcp'` and the value contains `://`. At the time the URL is filled in the test the type is still `'http'` (no real-time error), but when `handleSubmit` fires after the type has been switched to `'tcp'`, the guard catches it. +- **CHARON-2026-001** (Debian CVE cluster): The Alpine 3.23.3 migration eliminated all 7 HIGH + Debian CVEs. `SECURITY.md` must be updated to reflect this as patched. +- **CVE-2026-25793** (nebula in Caddy): Resolved by `CADDY_PATCH_SCENARIO=B`. -The validation enforces the correct TCP URL format: bare `host:port` with **no** scheme prefix (e.g. `redis.local:6379`, not `tcp://redis.local:6379`). This is documented in the translation string `uptime.urlHelperTcp` and the placeholder `uptime.urlPlaceholderTcp` (`192.168.1.1:8080`). +### Objectives + +1. Rebuild the `charon:local` Docker image so CrowdSec binaries are compiled with a patched Go + toolchain, resolving 1 CRITICAL + 5 additional CVEs. +2. Suppress a gosec false positive in `mail_service.go` with a justification comment. +3. Fix an overly-permissive test file permission setting. +4. Upgrade Python development tooling to resolve 4 Medium/Low advisory findings. +5. Update `SECURITY.md` to accurately reflect the current vulnerability state: + move resolved entries to Patched, expand CHARON-2025-001, and add new Known entries. +6. Confirm DS-0002 (Dockerfile root user) is a false positive. + +### Out of Scope + +- **CVE-2026-2673** (OpenSSL `libcrypto3`/`libssl3`): No Alpine fix available as of 2026-03-20. + Tracked in `SECURITY.md` as Awaiting Upstream. +- **CHARON-2025-001 original cluster** (CVE-2025-58183/58186/58187/61729): Awaiting CrowdSec + upstream release with Go 1.26.0+ binaries. --- -## Validation Chain (for reference) +## 2. Research Findings -| Location | File | ~Line | Guard | -|---|---|---|---| -| `handleSubmit` | `frontend/src/pages/Uptime.tsx` | 373 | `if (type === 'tcp' && url.trim().includes('://')) return;` | -| URL `onChange` | `frontend/src/pages/Uptime.tsx` | 392 | `if (type === 'tcp' && val.includes('://')) setUrlError(...)` | -| Backend binding | `backend/internal/api/handlers/uptime_handler.go` | 34 | `binding:"required,oneof=http tcp https"` — type only; no URL format check server-side | +### 2.1 Container Image State -The backend does **not** validate the TCP URL format — that enforcement lives entirely in the frontend. The test was written before the client-side validation was introduced and uses a URL that now fails that guard. +| Property | Value | +|----------|-------| +| OS | Alpine Linux 3.23.3 | +| Base image digest | `alpine:3.23.3@sha256:25109184c71bdad752c8312a8623239686a9a2071e8825f20acb8f2198c3f659` | +| Charon backend | go 1.26.1 — **clean** (govulncheck: 0 findings) | +| CrowdSec binaries (scanned) | go1.25.6 / go1.25.7 | +| CrowdSec binaries (Dockerfile intent) | go1.26.1 (see §3.1) | +| npm dependencies | **clean** (281 packages, 0 advisories) | + +The contradiction between the scanned go1.25.6/go1.25.7 CrowdSec binaries and the Dockerfile's +`GO_VERSION=1.26.1` is because the `charon:local` image cached on the build host predates the +last Dockerfile update. A fresh Docker build will compile CrowdSec with go1.26.1. + +### 2.2 Dockerfile — CrowdSec Build Stage + +The `crowdsec-builder` stage is defined at Dockerfile line 334: + +```dockerfile +FROM --platform=$BUILDPLATFORM golang:${GO_VERSION}-alpine AS crowdsec-builder +``` + +The controlling argument (Dockerfile line 13): + +```dockerfile +# renovate: datasource=docker depName=golang versioning=docker +ARG GO_VERSION=1.26.1 +``` + +**ARG name**: `GO_VERSION` +**Current value**: `1.26.1` +**Scope**: Shared — also used by `gosu-builder`, `backend-builder`, and `caddy-builder`. + +**go1.26.1 vs go1.25.8**: Go follows a dual-branch patch model. CVEs patched in go1.25.7 are +simultaneously patched in the corresponding go1.26.x release. Since go1.26.1 was released after +the go1.25.7 fixes, it covers CVE-2025-68121 and CVE-2025-61732. CVE-2026-25679 and +CVE-2026-27142/CVE-2026-27139 (fixed in go1.25.8) require verification that go1.26.1 incorporates +the equivalent go1.25.8-level patches. If go1.26.2 is available at time of implementation, +prefer updating `GO_VERSION=1.26.2`. + +**Action**: No Dockerfile ARG change is required if go1.26.1 covers all go1.25.8 CVEs. The fix +is a Docker image rebuild with `--no-cache`. If post-rebuild scanning still reports go stdlib +CVEs in CrowdSec binaries, increment `GO_VERSION` to the latest available stable go1.26.x patch. + +### 2.3 Dockerfile — Final Stage USER Instruction (DS-0002) + +The Dockerfile final stage contains (approximately line 625): + +```dockerfile +# Security: Run the container as non-root by default. +USER charon +``` + +`charon` (uid 1000) is created earlier in the build sequence: + +```dockerfile +RUN addgroup -S charon && adduser -S charon -G charon +``` + +The `charon` user owns `/app`, `/config`, and all runtime directories. +`SECURITY.md`'s Security Features section also states: "Charon runs as an unprivileged user +(`charon`, uid 1000) inside the container." + +**Verdict: DS-0002 is a FALSE POSITIVE.** The `USER charon` instruction is present. The Trivy +repository scan flagged this against an older cached image or ran without full multi-stage build +context. No code change is required. + +### 2.4 mail\_service.go — G203 Template Cast Analysis + +**File**: `backend/internal/services/mail_service.go`, line 195 +**Flagged code**: `data.Content = template.HTML(contentBuf.String())` + +Data flow through `RenderNotificationEmail`: + +1. `contentBytes` loaded from `emailTemplates.ReadFile("templates/" + templateName)` — an + `//go:embed templates/*` embedded FS. Templates are compiled into the binary; fully trusted. +2. `contentTmpl.Execute(&contentBuf, data)` renders the inner template. Go's `html/template` + engine **auto-escapes all string fields** in `data` at this step. +3. All user-supplied fields in `EmailTemplateData` (`Title`, `Message`, etc.) are pre-sanitized + via `sanitizeForEmail()` before the struct is populated (confirmed at `notification_service.go` + lines 332–333). +4. `template.HTML(contentBuf.String())` wraps the **already-escaped, fully-rendered** output + as a trusted HTML fragment so the outer `baseTmpl.Execute` does not double-escape HTML + entities when embedding `.Content` in the base layout template. + +This is the idiomatic nested-template composition pattern in Go's `html/template` package. +The cast is intentional and safe because the content it wraps was produced by `html/template` +execution (not from raw user input). + +**Verdict: FALSE POSITIVE.** Fix: suppress with `// #nosec G203` and `//nolint:gosec`. + +### 2.5 docker\_service\_test.go — G306 File Permission + +**File**: `backend/internal/services/docker_service_test.go`, line 231 + +```go +// Current +require.NoError(t, os.WriteFile(socketFile, []byte(""), 0o660)) +``` + +`0o660` (rw-rw----) grants write access to the file's group. The correct mode for a temporary +test socket placeholder is `0o600` (rw-------). No production impact; trivial fix. + +### 2.6 Python Dev Tooling + +Affects the development host only. None of these packages enter the production Docker image. + +| Package | Installed | Target | Advisory | +|---------|-----------|--------|----------| +| `filelock` | 3.20.0 | ≥ 3.20.3 | GHSA-qmgc-5h2g-mvrw, GHSA-w853-jp5j-5j7f | +| `virtualenv` | 20.35.4 | ≥ 20.36.1 | GHSA-597g-3phw-6986 | +| `pip` | 25.3 | ≥ 26.0 | GHSA-6vgw-5pg2-w6jp | + +### 2.7 CVE-2025-60876 (busybox) — Status Unconfirmed + +`SECURITY.md` (written 2026-02-04) stated Alpine had patched CVE-2025-60876. The 2026-03-18 +`grype` image scan reports `busybox` 1.37.0-r30 with no fixed version. This requires live +verification against a freshly built `charon:local` image before adding to SECURITY.md. --- -## Required Changes +## 3. Technical Specifications -### File: `tests/monitoring/uptime-monitoring.spec.ts` +### P1 — Docker Image Rebuild (CrowdSec Go Toolchain) -Two lines need to change inside the `should create new TCP monitor` test block (lines 414–467). +**Resolves**: CVE-2025-68121 (CRITICAL), CVE-2026-25679 (HIGH), CVE-2025-61732 (HIGH), +CVE-2026-27142 (MEDIUM), CVE-2026-27139 (LOW), GHSA-fw7p-63qq-7hpr (LOW). -#### Change 1 — URL fill value (line 447) +#### Dockerfile ARG Reference -The URL passed to the form must be bare `host:port` format, matching what the validation permits. +| File | Line | ARG Name | Current Value | Action | +|------|------|----------|---------------|--------| +| `Dockerfile` | 13 | `GO_VERSION` | `1.26.1` | No change required if go1.26.1 covers go1.25.8-equivalent patches. Increment to latest stable go1.26.x only if post-rebuild scan confirms CVEs persist in CrowdSec binaries. | -**Before (line 447):** -```ts -await page.fill('input#create-monitor-url', 'tcp://redis.local:6379'); +The `crowdsec-builder` stage consumes this ARG as: + +```dockerfile +FROM --platform=$BUILDPLATFORM golang:${GO_VERSION}-alpine AS crowdsec-builder ``` -**After:** -```ts -await page.fill('input#create-monitor-url', 'redis.local:6379'); +#### Build Command + +```bash +docker build --no-cache -t charon:local . ``` -#### Change 2 — URL assertion (line 466) +`--no-cache` forces the CrowdSec builder to compile fresh binaries against the current toolchain +and prevents Docker from reusing a cached layer that produced the go1.25.6 binaries. -The assertion verifying the payload sent to the API must reflect the corrected URL value. +#### Post-Rebuild Validation -**Before (line 466):** -```ts -expect(createPayload?.url).toBe('tcp://redis.local:6379'); +```bash +# Confirm CrowdSec binary toolchain version +docker run --rm charon:local cscli version + +# Scan for remaining stdlib CVEs in CrowdSec binaries +grype charon:local -o table --only-fixed | grep -E "CRITICAL|HIGH" + +# Expected: CVE-2025-68121, CVE-2026-25679, CVE-2025-61732 should no longer appear ``` -**After:** -```ts -expect(createPayload?.url).toBe('redis.local:6379'); +If any of those CVEs persist post-rebuild, update the ARG: + +```dockerfile +# Dockerfile line 13 — increment to latest stable go1.26.x patch +# renovate: datasource=docker depName=golang versioning=docker +ARG GO_VERSION=1.26.2 # or latest stable at time of implementation +``` + +### P2 — DS-0002 (Dockerfile Root User): FALSE POSITIVE + +| Evidence | Location | +|----------|----------| +| `USER charon` present | `Dockerfile` line ~625 | +| `addgroup -S charon && adduser -S charon -G charon` | Earlier in final stage | +| Non-root documented | `SECURITY.md` Security Features section | + +**No code change required.** Do not add DS-0002 as a real finding to `SECURITY.md`. + +### P3 — G203: mail\_service.go template.HTML Cast + +**File**: `backend/internal/services/mail_service.go` +**Line**: 195 + +Current code: +```go +data.Content = template.HTML(contentBuf.String()) +``` + +Proposed change — add suppression comment immediately above the line, inline annotation on +the same line: + +```go +// #nosec G203 -- contentBuf is the output of html/template.Execute, which auto-escapes all +// string fields in EmailTemplateData. The cast prevents double-escaping when this rendered +// fragment is embedded in the outer base layout template. +data.Content = template.HTML(contentBuf.String()) //nolint:gosec +``` + +### P4 — G306: docker\_service\_test.go File Permission + +**File**: `backend/internal/services/docker_service_test.go` +**Line**: 231 + +| | Current | Proposed | +|-|---------|----------| +| Permission | `0o660` | `0o600` | + +```go +// Current +require.NoError(t, os.WriteFile(socketFile, []byte(""), 0o660)) + +// Proposed +require.NoError(t, os.WriteFile(socketFile, []byte(""), 0o600)) +``` + +### P5 — Python Dev Tooling Upgrade + +Dev environment only; does not affect the production container. + +```bash +pip install --upgrade filelock virtualenv pip +``` + +Post-upgrade verification: + +```bash +pip list | grep -E "filelock|virtualenv|pip" +pip audit # should report 0 MEDIUM/HIGH advisories for these packages ``` --- -## Scope Confirmation — What Does NOT Need Changing +## 4. SECURITY.md Changes + +All edits must conform to the entry format specified in +`.github/instructions/security.md.instructions.md`. The following is a field-level description +of every required SECURITY.md change. + +### 4.1 Move CHARON-2026-001: Known → Patched + +**Remove** the entire `### [HIGH] CHARON-2026-001 · Debian Base Image CVE Cluster` block from +`## Known Vulnerabilities`. + +**Add** the following entry at the **top** of `## Patched Vulnerabilities` (newest-patched first, +positioned above the existing CVE-2025-68156 entry): + +```markdown +### ✅ [HIGH] CHARON-2026-001 · Debian Base Image CVE Cluster + +| Field | Value | +|--------------|-------| +| **ID** | CHARON-2026-001 (aliases: CVE-2026-0861, CVE-2025-15281, CVE-2026-0915, CVE-2025-13151, and 2 libtiff HIGH CVEs) | +| **Severity** | High · 8.4 (highest per CVSS v3.1) | +| **Patched** | 2026-03-20 (Alpine base image migration complete) | + +**What** +Seven HIGH-severity CVEs in Debian Trixie base image system libraries (`glibc`, `libtasn1-6`, +`libtiff`). These vulnerabilities resided in the container's OS-level packages with no available +fixes from the Debian Security Team. + +**Who** +- Discovered by: Automated scan (Trivy) +- Reported: 2026-02-04 + +**Where** +- Component: Debian Trixie base image (`libc6`, `libc-bin`, `libtasn1-6`, `libtiff`) +- Versions affected: All Charon container images built on Debian Trixie base + +**When** +- Discovered: 2026-02-04 +- Patched: 2026-03-20 +- Time to patch: 45 days + +**How** +OS-level shared libraries bundled in the Debian Trixie container base image. Exploitation +required local container access or a prior application-level compromise to reach the vulnerable +library code. Caddy reverse proxy ingress filtering and container isolation limited the +effective attack surface. + +**Resolution** +Migrated the container base image from Debian Trixie to Alpine Linux 3.23.3. Confirmed via +`docker inspect charon:local` showing Alpine 3.23.3. All 7 Debian HIGH CVEs are eliminated. +Post-migration Trivy scan reports 0 HIGH/CRITICAL vulnerabilities in the base OS layer. + +- Spec: [docs/plans/alpine_migration_spec.md](docs/plans/alpine_migration_spec.md) +- Advisory: [docs/security/advisory_2026-02-04_debian_cves_temporary.md](docs/security/advisory_2026-02-04_debian_cves_temporary.md) +``` + +### 4.2 Update CHARON-2025-001 in Known Vulnerabilities + +Apply the following field-level changes to the existing entry: + +**Field: `**ID**`** + +| | Current | Proposed | +|-|---------|----------| +| Aliases | `CVE-2025-58183, CVE-2025-58186, CVE-2025-58187, CVE-2025-61729` | `CVE-2025-58183, CVE-2025-58186, CVE-2025-58187, CVE-2025-61729, CVE-2025-68121, CVE-2026-25679, CVE-2025-61732` | + +**Field: `**Status**`** + +| | Current | Proposed | +|-|---------|----------| +| Status | `Awaiting Upstream` | `Fix In Progress` | + +**Field: `**What**` — replace paragraph with:** + +> Multiple Go standard library CVEs (HTTP/2 handling, TLS certificate validation, archive +> parsing, and net/http) present in CrowdSec binaries bundled with Charon. The original cluster +> (compiled against go1.25.1) was partially addressed as CrowdSec updated to go1.25.6/go1.25.7, +> but new CVEs — including CVE-2025-68121 (CRITICAL) — continue to accumulate against those +> versions. All CVEs in this cluster resolve when CrowdSec binaries are rebuilt against +> go ≥ 1.25.8 (or the equivalent go1.26.x patch). Charon's own application code is unaffected. + +**Field: `Versions affected` (in `**Where**`)** + +| | Current | Proposed | +|-|---------|----------| +| Versions affected | `All Charon versions shipping CrowdSec binaries compiled against Go < 1.26.0` | `All Charon versions shipping CrowdSec binaries compiled against Go < 1.25.8 (or equivalent go1.26.x patch)` | + +**Field: `**Planned Remediation**` — replace paragraph with:** + +> Rebuild the `charon:local` Docker image using the current Dockerfile. The `crowdsec-builder` +> stage at Dockerfile line 334 compiles CrowdSec from source against +> `golang:${GO_VERSION}-alpine` (currently go1.26.1), which incorporates the equivalent of the +> go1.25.7 and go1.25.8 patch series. Use `docker build --no-cache` to force recompilation of +> CrowdSec binaries. See: [docs/plans/current_spec.md](docs/plans/current_spec.md) + +### 4.3 Add New Known Entries + +Insert the following entries into `## Known Vulnerabilities`. Sort order: CRITICAL entries first +(currently none), then HIGH, MEDIUM, LOW. Place CVE-2025-68121 before CVE-2026-2673. + +#### New Entry 1: CVE-2025-68121 (CRITICAL) + +```markdown +### [CRITICAL] CVE-2025-68121 · Go stdlib — CrowdSec Bundled Binaries + +| Field | Value | +|--------------|-------| +| **ID** | CVE-2025-68121 (see also CHARON-2025-001) | +| **Severity** | Critical | +| **Status** | Fix In Progress | + +**What** +A critical vulnerability in the Go standard library present in CrowdSec binaries bundled with +Charon. The binaries in the current `charon:local` image were compiled with go1.25.6, which is +affected. Fixed in go1.25.7 (and the equivalent go1.26.x patch). All CVEs in this component +resolve upon Docker image rebuild using the current Dockerfile (go1.26.1 toolchain). + +**Who** +- Discovered by: Automated scan (grype, 2026-03-20) +- Reported: 2026-03-20 +- Affects: CrowdSec Agent component within the container + +**Where** +- Component: CrowdSec Agent (`cscli`, `crowdsec` binaries) +- Versions affected: Charon images with CrowdSec binaries compiled against go1.25.6 or earlier + +**When** +- Discovered: 2026-03-20 +- Disclosed (if public): 2026-03-20 +- Target fix: Docker image rebuild (see CHARON-2025-001) + +**How** +The vulnerability exists in the Go standard library compiled into CrowdSec's distributed +binaries. Exploitation targets CrowdSec's internal processing paths; the agent's network +interfaces are not directly exposed through Charon's primary API surface. + +**Planned Remediation** +Rebuild the Docker image with `docker build --no-cache`. The `crowdsec-builder` stage compiles +CrowdSec from source against go1.26.1 (Dockerfile `ARG GO_VERSION=1.26.1`, line 13), which +incorporates the equivalent of the go1.25.7 patch. See CHARON-2025-001 and +[docs/plans/current_spec.md](docs/plans/current_spec.md). +``` + +#### New Entry 2: CVE-2026-2673 (HIGH ×2 — OpenSSL) + +```markdown +### [HIGH] CVE-2026-2673 · OpenSSL TLS 1.3 Key Exchange Downgrade — Alpine 3.23.3 + +| Field | Value | +|--------------|-------| +| **ID** | CVE-2026-2673 | +| **Severity** | High · 7.5 | +| **Status** | Awaiting Upstream | + +**What** +An OpenSSL TLS 1.3 key exchange group downgrade vulnerability affecting `libcrypto3` and +`libssl3` in Alpine 3.23.3. A server configured with the `DEFAULT` keyword in its key group +list may negotiate a weaker cipher suite than intended. Charon's Caddy TLS configuration does +not use `DEFAULT` key groups explicitly, materially limiting practical impact. No Alpine APK +fix is available as of 2026-03-20. + +**Who** +- Discovered by: Automated scan (grype, image scan 2026-03-18) +- Reported: 2026-03-20 (OpenSSL advisory: 2026-03-13) +- Affects: Container TLS stack + +**Where** +- Component: Alpine 3.23.3 base image (`libcrypto3` 3.5.5-r0, `libssl3` 3.5.5-r0) +- Versions affected: All Charon images built on Alpine 3.23.3 with these package versions + +**When** +- Discovered: 2026-03-13 (OpenSSL advisory) +- Disclosed (if public): 2026-03-13 +- Target fix: Awaiting Alpine security tracker patch + +**How** +The OpenSSL TLS 1.3 server may fail to negotiate the configured key exchange group when the +configuration includes the `DEFAULT` keyword, potentially allowing a downgrade to a weaker +cipher suite. Exploitation requires a man-in-the-middle attacker capable of intercepting and +influencing TLS handshake negotiation. + +**Planned Remediation** +Monitor https://security.alpinelinux.org/vuln/CVE-2026-2673. Once Alpine releases a patched +APK for `libcrypto3`/`libssl3`, either update the pinned `ALPINE_IMAGE` SHA256 digest in the +Dockerfile or apply an explicit upgrade in the final stage: + +```dockerfile +RUN apk upgrade --no-cache libcrypto3 libssl3 +``` +``` + +### 4.4 CVE-2025-60876 (busybox) — Conditional Entry + +**Do not add until the post-rebuild scan verification in Phase 3 is complete.** + +Verification command (run after rebuilding `charon:local`): + +```bash +grype charon:local -o table | grep -i busybox +``` + +- **If busybox shows CVE-2025-60876 with no fixed version** → add the entry below to `SECURITY.md`. +- **If busybox is clean** → do not add; the previous SECURITY.md note was correct. + +Conditional entry (add only if scan confirms vulnerability): + +```markdown +### [MEDIUM] CVE-2025-60876 · busybox Heap Overflow — Alpine 3.23.3 + +| Field | Value | +|--------------|-------| +| **ID** | CVE-2025-60876 | +| **Severity** | Medium · 6.5 | +| **Status** | Awaiting Upstream | + +**What** +A heap overflow vulnerability in busybox affecting `busybox`, `busybox-binsh`, `busybox-extras`, +and `ssl_client` in Alpine 3.23.3. The live scanner reports no fix version for 1.37.0-r30, +contradicting an earlier internal note that stated Alpine had patched this CVE. + +**Who** +- Discovered by: Automated scan (grype, image scan 2026-03-18) +- Reported: 2026-03-20 +- Affects: Container OS-level utility binaries + +**Where** +- Component: Alpine 3.23.3 base image (`busybox` 1.37.0-r30, `busybox-binsh`, `busybox-extras`, `ssl_client`) +- Versions affected: Charon images with busybox 1.37.0-r30 + +**When** +- Discovered: 2026-03-18 (scan) +- Disclosed (if public): Not confirmed +- Target fix: Awaiting Alpine upstream patch + +**How** +Heap overflow in busybox utility programs. Requires shell or CLI access to the container; +not reachable through Charon's application interface. + +**Planned Remediation** +Monitor Alpine security tracker for a patched busybox release. Rebuild the Docker image once +a fixed APK is available. +``` + +--- + +## 5. Implementation Plan + +### Phase 1 — Pre-Implementation Verification + +| Task | Command | Decision Gate | +|------|---------|---------------| +| Verify go1.26.1 covers go1.25.8 CVEs | Review Go 1.26.1 release notes / security advisories for CVE-2026-25679 equivalent | If not covered → update `GO_VERSION` to go1.26.2+ in Dockerfile | +| Confirm busybox CVE-2025-60876 status | Run post-rebuild grype scan (see Phase 3) | Determines §4.4 SECURITY.md addition | + +### Phase 2 — Code Changes + +| Task | File | Line | Change | +|------|------|------|--------| +| Suppress G203 false positive | `backend/internal/services/mail_service.go` | 195 | Add `// #nosec G203 --` comment block above; `//nolint:gosec` inline | +| Fix file permission G306 | `backend/internal/services/docker_service_test.go` | 231 | `0o660` → `0o600` | + +### Phase 3 — Docker Rebuild + Scan + +| Task | Command | Expected Outcome | +|------|---------|-----------------| +| Rebuild image | `docker build --no-cache -t charon:local .` | Fresh CrowdSec binaries compiled with go1.26.1 | +| Verify CrowdSec toolchain | `docker run --rm charon:local cscli version` | Reports go1.26.1 in version string | +| Confirm CVE cluster resolved | `grype charon:local -o table --only-fixed \| grep -E "CVE-2025-68121\|CVE-2026-25679\|CVE-2025-61732"` | No rows returned | +| Check busybox | `grype charon:local -o table \| grep busybox` | Determines §4.4 addition | +| Verify no USER regression | `docker inspect charon:local \| jq '.[0].Config.User'` | Returns `"charon"` | + +### Phase 4 — Python Dev Tooling + +| Task | Command | +|------|---------| +| Upgrade packages | `pip install --upgrade filelock virtualenv pip` | +| Verify | `pip audit` (expect 0 MEDIUM/HIGH for upgraded packages) | + +### Phase 5 — SECURITY.md Updates + +Execute in order: + +1. Move CHARON-2026-001: Known → Patched (§4.1) +2. Update CHARON-2025-001 aliases, status, What, Versions affected, Planned Remediation (§4.2) +3. Add CVE-2025-68121 CRITICAL Known entry (§4.3, Entry 1) +4. Add CVE-2026-2673 HIGH Known entry (§4.3, Entry 2) +5. Add CVE-2025-60876 MEDIUM Known entry only if Phase 3 scan confirms it (§4.4) + +--- + +## 6. Acceptance Criteria + +| ID | Criterion | Evidence | +|----|-----------|----------| +| AC-1 | CrowdSec binaries compiled with go ≥ 1.25.8 equivalent | `cscli version` shows go1.26.x; grype reports 0 stdlib CVEs for CrowdSec | +| AC-2 | G203 suppressed with justification | `golangci-lint run ./...` reports 0 G203 findings | +| AC-3 | Test file permission corrected | Source shows `0o600`; gosec reports 0 G306 findings | +| AC-4 | Python dev tooling upgraded | `pip audit` reports 0 MEDIUM/HIGH for filelock, virtualenv, pip | +| AC-5 | SECURITY.md matches current state | CHARON-2026-001 in Patched; CHARON-2025-001 updated with new aliases; CVE-2025-68121 and CVE-2026-2673 in Known | +| AC-6 | DS-0002 confirmed false positive | `docker inspect charon:local \| jq '.[0].Config.User'` returns `"charon"` | +| AC-7 | Backend linting clean | `make lint-backend` exits 0 | +| AC-8 | All backend tests pass | `cd backend && go test ./...` exits 0 | + +--- + +## 7. Commit Slicing Strategy + +### Decision: Single PR + +All changes originate from a single audit, are security remediations, and are low-risk. A +single PR provides a coherent audit trail and does not impose review burden that would justify +splitting. No schema migrations, no cross-domain feature work, no conflicting refactoring. + +**Triggers that would justify a multi-PR split (none apply here)**: +- Security fix coupled to a large feature refactor +- Database schema migration alongside code changes +- Changes spanning unrelated subsystems requiring separate review queues + +### PR-1 (sole PR): `fix(security): remediate 2026-03-20 audit findings` + +**Files changed**: + +| File | Change | +|------|--------| +| `backend/internal/services/mail_service.go` | `// #nosec G203` comment + `//nolint:gosec` at line 195 | +| `backend/internal/services/docker_service_test.go` | `0o660` → `0o600` at line 231 | +| `SECURITY.md` | Move CHARON-2026-001 to Patched; update CHARON-2025-001; add new Known entries | +| `Dockerfile` *(conditional)* | Increment `ARG GO_VERSION` only if post-rebuild scan shows CVEs persist | + +**Dependencies**: Docker image rebuild is a CI/CD pipeline step triggered by merge, not a file +change tracked in this PR. Use `docker build --no-cache` for local validation. + +**Validation gates before merge**: +1. `go test ./...` passes +2. `golangci-lint run ./...` reports 0 G203 and 0 G306 findings +3. Docker image rebuilt and `grype charon:local` clean for the P1 CVE cluster + +**Rollback**: All changes are trivially reversible via `git revert`. The `//nolint` comment can +be removed, the permission reverted, and SECURITY.md restored. No infrastructure or database +changes are involved. + +**Suggested commit message**: + +``` +fix(security): remediate 2026-03-20 audit findings + +Suppress G203 false positive in mail_service.go with justification comment. +The template.HTML cast is safe because contentBuf is produced by +html/template.Execute, which auto-escapes all EmailTemplateData fields +before the rendered fragment is embedded in the base layout template. + +Correct test file permission from 0o660 to 0o600 in docker_service_test.go +to satisfy gosec G306. No production impact. + +Update SECURITY.md: move CHARON-2026-001 (Debian CVE cluster) to Patched +following confirmed Alpine 3.23.3 migration; expand CHARON-2025-001 aliases +to include CVE-2025-68121, CVE-2026-25679, and CVE-2025-61732; add Known +entries for CVE-2025-68121 (CRITICAL) and CVE-2026-2673 (HIGH, awaiting +upstream Alpine patch). + +Docker image rebuild with --no-cache resolves the CrowdSec Go stdlib CVE +cluster (CVE-2025-68121 CRITICAL + 5 others) by recompiling CrowdSec from +source against go1.26.1 via the existing crowdsec-builder Dockerfile stage. +DS-0002 (Dockerfile root user) confirmed false positive — USER charon +instruction is present. +``` + +--- + +## 8. Items Requiring No Code Change | Item | Reason | -|---|---| -| `mockMonitors[2].url` (`tcp://redis:6379`, line ~82) | Static fixture data returned by a mocked GET response. Never submitted through the form; never hits validation. Tests in `Monitor List Display` that display this monitor continue to pass. | -| `tests/fixtures/auth-fixtures.ts` | No TCP monitor creation logic; unaffected. | -| `tests/utils/wait-helpers.ts` | Utility helpers; unaffected. | -| `frontend/src/pages/Uptime.tsx` | Production code — must not be changed per scope constraint. | -| `backend/internal/api/handlers/uptime_handler.go` | Production code — must not be changed per scope constraint. | -| `frontend/src/api/uptime.ts` | API client; no change needed. | +|------|--------| +| DS-0002 (Dockerfile `USER`) | FALSE POSITIVE — `USER charon` present in final stage (~line 625) | +| CVE-2026-2673 (OpenSSL) | No Alpine fix available; tracked in SECURITY.md as Awaiting Upstream | +| CHARON-2025-001 original cluster | Awaiting CrowdSec upstream release with go1.26.0+ binaries | --- -## Verification Steps +## 9. Scan Artifact .gitignore Coverage -After applying the two-line fix: +The following files exist at the repository root and contain scan output. Verify each is covered +by `.gitignore` to prevent accidental commits of stale or sensitive scan data: -1. Run the specific failing test in Firefox to confirm it passes: - ```bash - cd /projects/Charon - npx playwright test tests/monitoring/uptime-monitoring.spec.ts \ - --grep "should create new TCP monitor" \ - --project=firefox - ``` - -2. Run the full CRUD describe block to confirm no regression in neighbouring tests: - ```bash - npx playwright test tests/monitoring/uptime-monitoring.spec.ts \ - --grep "Monitor CRUD Operations" \ - --project=firefox - ``` - -3. Optionally run the full uptime spec across all shards to confirm end-to-end health. - ---- - -## Commit Slicing Strategy - -A single commit is appropriate — this is a two-line test fix with no branching concerns. - -``` -test(uptime): fix TCP monitor test to use bare host:port URL format - -The should-create-new-TCP-monitor test was passing tcp://redis.local:6379 -as the monitor URL. Client-side validation added in the TCP UX fix PR -blocks form submission when a TCP URL contains a scheme prefix, -so the POST to /api/v1/uptime/monitors never fired and waitForResponse -timed out. - -Update the URL fill value and the corresponding payload assertion to use -the correct bare host:port format (redis.local:6379), which satisfies -the existing validation and matches the documented TCP address format. -``` - -**Branch:** `fix/tcp-monitor-e2e-url-format` → PR targeting `main` -**Files changed:** 1 -**Lines changed:** 2 - ---- - -## 1. Executive Summary - -A community user performed a **clean start / fresh installation** of Charon and reported 7 bugs. After thorough code analysis, the findings are: - -| # | Issue | Classification | Severity | -|---|-------|----------------|----------| -| 1 | Missing database values before CrowdSec enabling | **LIKELY BUG** | Medium | -| 2 | CrowdSec requires CLI register before enrollment | **BY DESIGN** (auto-handled) | Low | -| 3 | UI bugs on first enabling CrowdSec | **LIKELY BUG** | Medium | -| 4 | "Required value" error when enabling CrowdSec | **CONFIRMED BUG** | Medium | -| 5 | Monitor TCP port — UI can't add | **LIKELY BUG** | Medium | -| 6 | HTTP always down but HTTPS okay | **CONFIRMED BUG** | High | -| 7 | Security blocked local connection to private IP | **BY DESIGN** (with UX gap) | Medium | - ---- - -## 2. Detailed Investigation - ---- - -### Issue 1: Missing Database Values Before CrowdSec Enabling - -**User report:** "some database missing values — i think before crowdsec enabling" - -#### Files Examined - -- `backend/internal/models/security_config.go` — `SecurityConfig` model definition -- `backend/internal/models/setting.go` — `Setting` key-value model -- `backend/internal/api/routes/routes.go` (L93-L120) — AutoMigrate call -- `backend/internal/api/handlers/security_handler.go` (L69-L215) — `GetStatus` handler -- `backend/cmd/seed/main.go` — Seed data (dev only) - -#### Root Cause Analysis - -On a fresh install: - -1. **`SecurityConfig` table is auto-migrated but has NO seed row.** GORM `AutoMigrate` creates the table schema but does not insert default rows. The `SecurityConfig` model has no GORM `default:` tags on key fields like `CrowdSecMode`, `WAFMode`, or `RateLimitMode` — they start as Go zero values (empty strings for strings, `false` for bools, `0` for ints). - -2. **`GetStatus` handler gracefully handles missing SecurityConfig** — it queries `WHERE name = 'default'` and if not found, falls back to static config defaults. However, **the settings table is completely empty** on a fresh install because no settings are seeded. The handler reads settings like `feature.cerberus.enabled`, `security.crowdsec.enabled`, etc. via raw SQL — these return empty results, so the fallback chain works but returns all-disabled state. - -3. **The `Start` handler in `crowdsec_handler.go` creates a SecurityConfig row on first enable** (line 466-475), but only when the user toggles CrowdSec on. Before that point, the DB has no `security_configs` row at all. - -4. **Fields like `WAFParanoiaLevel` have `gorm:"default:1"` but most fields don't.** On a fresh install, if any code reads `SecurityConfig` expecting populated defaults (e.g., `crowdsec_api_url`), it gets empty strings. - -**Key issue:** The `crowdsecPowerMutation` in `Security.tsx` calls `updateSetting('security.crowdsec.enabled', 'true', ...)` before calling `startCrowdsec()`. The `updateSetting` call uses `UpdateSettingRequest` which has `binding:"required"` on `Value`. The value `"true"` satisfies this. However, **no SecurityConfig row exists yet** — the `Start` handler creates it. The sequence is: -1. Frontend calls `updateSetting` — creates/updates a setting row ✓ -2. Frontend calls `startCrowdsec()` — backend `Start` handler creates SecurityConfig OR updates existing one ✓ - -This works, but the **`GetStatus` handler returns stale/empty data between step 1 and step 2** because the optimistic update in the frontend doesn't account for the SecurityConfig not existing yet. - -#### Classification: **LIKELY BUG** - -The system functionally works but returns confusing intermediate states during the first enable sequence. The missing SecurityConfig row and absence of seeded settings means the `/security/status` endpoint returns an all-empty/disabled state until the user explicitly toggles something. - -#### Proposed Fix - -1. Add a database seed step to the main application startup (not just the dev seed tool) that ensures a default `SecurityConfig` row exists with sensible defaults: - ```go - // In routes.go or main.go after AutoMigrate - var cfg models.SecurityConfig - if err := db.Where("name = ?", "default").FirstOrCreate(&cfg, models.SecurityConfig{ - UUID: "default", - Name: "default", - Enabled: false, - CrowdSecMode: "disabled", - WAFMode: "disabled", - RateLimitMode: "disabled", - CrowdSecAPIURL: "http://127.0.0.1:8085", - }).Error; err != nil { - log.Warn("Failed to seed default SecurityConfig") - } - ``` -2. Add default setting rows for feature flags (`feature.cerberus.enabled`, etc.) during startup. - ---- - -### Issue 2: CrowdSec Still Needs CLI Register Before Enrollment - -**User report:** "crowdsec still needs cli register before you can enroll" - -#### Files Examined - -- `backend/internal/crowdsec/console_enroll.go` (L250-L330) — `ensureCAPIRegistered()` and `checkLAPIAvailable()` -- `backend/internal/api/handlers/crowdsec_handler.go` (L1262-L1360) — `ConsoleEnroll` handler -- `backend/internal/api/handlers/crowdsec_handler.go` (L458-L581) — `Start` handler (bouncer registration) - -#### Root Cause Analysis - -The enrollment flow already handles CAPI registration automatically: - -1. `ConsoleEnroll` handler calls `h.Console.Enroll(ctx, …)` -2. `Enroll()` calls `s.checkLAPIAvailable(ctx)` — verifies LAPI is running (retries 5x with exponential backoff, ~45s total) -3. `Enroll()` calls `s.ensureCAPIRegistered(ctx)` — checks for `online_api_credentials.yaml` and runs `cscli capi register` if missing -4. Then runs `cscli console enroll --name ` - -**The auto-registration IS implemented.** However, the user may have encountered: -- **Timing issue:** If CrowdSec was just started, LAPI may not be ready yet. The `checkLAPIAvailable` retries for ~45s, but if the user triggered enrollment immediately after starting CrowdSec, the timeout may have expired. -- **Feature flag:** Console enrollment is behind `FEATURE_CROWDSEC_CONSOLE_ENROLLMENT` feature flag, which defaults to **disabled** (`false`). The `isConsoleEnrollmentEnabled()` method returns `false` unless explicitly enabled via DB setting or env var. Without this flag, the enrollment endpoint returns 404. -- **Error messaging:** If CAPI registration fails, the error message might be confusing, leading the user to think they need to manually run `cscli capi register`. - -#### Classification: **BY DESIGN** (with potential UX gap) - -The auto-registration logic exists and works. The feature flag being off by default means the user likely tried to enroll via the console enrollment UI (which is hidden/unavailable) and ended up using the CLI instead. If they tried via the exposed bouncer registration endpoint, that's a different flow — CAPI registration is only auto-triggered by the console enrollment path, not the bouncer registration path. - -#### Proposed Fix - -1. Improve error messaging when LAPI check times out during enrollment -2. Consider auto-running `cscli capi register` during the `Start` handler (not just during enrollment) -3. Document the enrollment flow more clearly for users - ---- - -### Issue 3: UI Bugs on First Enabling CrowdSec - -**User report:** "ui bugs on first enabling crowdsec" - -#### Files Examined - -- `frontend/src/pages/Security.tsx` (L168-L229) — `crowdsecPowerMutation` -- `frontend/src/pages/Security.tsx` (L440-L452) — CrowdSec toggle Switch -- `frontend/src/pages/CrowdSecConfig.tsx` (L1-L100) — CrowdSec config page -- `frontend/src/components/CrowdSecKeyWarning.tsx` — Key warning component - -#### Root Cause Analysis - -When CrowdSec is first enabled on a fresh install, several things happen in sequence: - -1. `crowdsecPowerMutation` calls `updateSetting('security.crowdsec.enabled', 'true', ...)` -2. Then calls `startCrowdsec()` which takes 10-60 seconds -3. Then calls `statusCrowdsec()` to verify -4. If LAPI ready, `ensureBouncerRegistration()` runs on the backend -5. `onSuccess` callback invalidates queries - -**During this ~30s window:** -- The toggle should show loading state, but the `Switch` component reads `crowdsecStatus?.running ?? status.crowdsec.enabled` — if `crowdsecStatus` is stale (from the initial `useEffect` fetch), the toggle may flicker. -- The `CrowdSecConfig` page polls LAPI status every 5 seconds — on first enable, this will show "not running" until LAPI finishes starting. -- The `CrowdSecKeyWarning` component checks key status — on first enable, no bouncer key exists yet, potentially triggering warnings. -- `ConfigReloadOverlay` shows when `isApplyingConfig` is true, but the CrowdSec start operation takes significantly longer than typical config operations. - -**Specific bugs likely seen:** -- Toggle flickering between checked/unchecked as different queries return at different times -- Stale "disabled" status shown while CrowdSec is actually starting -- Bouncer key warning appearing briefly before registration completes -- Console enrollment section showing "LAPI not ready" errors - -#### Classification: **LIKELY BUG** - -The async nature of CrowdSec startup (10-60s) combined with multiple independent polling queries creates a poor UX during the first-enable flow. - -#### Proposed Fix - -1. Add a dedicated "starting" state to the CrowdSec toggle — show a spinner/loading indicator for the full duration of the start operation -2. Suppress `CrowdSecKeyWarning` while a start operation is in progress -3. Debounce the LAPI status polling to avoid showing transient "not ready" states -4. Use the mutation's `isPending` state to disable all CrowdSec-related UI interactions during startup - ---- - -### Issue 4: "Required Value" Error When Enabling CrowdSec - -**User report:** "enabling crowdsec throws ui 'required value' error but enables okay" - -#### Files Examined - -- `frontend/src/pages/Security.tsx` (L100-L155) — `toggleServiceMutation` -- `frontend/src/pages/Security.tsx` (L168-L182) — `crowdsecPowerMutation` -- `backend/internal/api/handlers/settings_handler.go` (L115-L120) — `UpdateSettingRequest` struct -- `frontend/src/api/settings.ts` (L27-L29) — `updateSetting` function - -#### Root Cause Analysis - -**This is a confirmed bug** caused by Gin's `binding:"required"` validation tag on the `Value` field: - -```go -type UpdateSettingRequest struct { - Key string `json:"key" binding:"required"` - Value string `json:"value" binding:"required"` - Category string `json:"category"` - Type string `json:"type"` -} -``` - -The `crowdsecPowerMutation` calls: -```typescript -await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool') -``` - -When `enabled` is `true`, the value `"true"` satisfies `binding:"required"`. So the direct CrowdSec toggle shouldn't fail here. - -**The actual bug path:** The `crowdsecPowerMutation` calls `updateSetting` and then `startCrowdsec()`. The `startCrowdsec()` triggers the backend `Start` handler which internally creates/updates settings. If there's a race condition where the frontend also calls a related `updateSetting` with an empty value (e.g., a cascading toggle for an uninitialized setting), Gin's `binding:"required"` treats empty string `""` as missing for string fields, producing a validation error. - -**Broader problem:** Any code path that calls `updateSetting` with an empty value (e.g., clearing an admin whitelist, resetting a configuration) triggers this validation error. This is incorrect — an empty string is a valid value for a setting. - -#### Classification: **CONFIRMED BUG** - -The `binding:"required"` tag on `Value` in `UpdateSettingRequest` means any attempt to set a setting to an empty string `""` will fail with a "required" validation error. This is incorrect — empty string is a valid value. - -#### Proposed Fix - -Remove `binding:"required"` from the `Value` field: -```go -type UpdateSettingRequest struct { - Key string `json:"key" binding:"required"` - Value string `json:"value"` // Empty string is valid - Category string `json:"category"` - Type string `json:"type"` -} -``` - -If value must not be empty for specific keys, add key-specific validation in the handler logic. - -#### Reproduction Steps - -1. Fresh install of Charon -2. Navigate to Security dashboard -3. Enable Cerberus (master toggle) -4. Toggle CrowdSec ON -5. Observe toast error containing "required" or "required value" -6. Despite the error, CrowdSec still starts successfully - ---- - -### Issue 5: Monitor TCP Port — UI Can't Add - -**User report:** "monitor tcp port ui can't add" - -#### Files Examined - -- `frontend/src/pages/Uptime.tsx` (L342-L500) — `CreateMonitorModal` -- `frontend/src/api/uptime.ts` (L80-L97) — `createMonitor` API -- `backend/internal/api/handlers/uptime_handler.go` (L30-L60) — `CreateMonitorRequest` and `Create` handler -- `backend/internal/services/uptime_service.go` (L1083-L1140) — `CreateMonitor` service method - -#### Root Cause Analysis - -The frontend `CreateMonitorModal` supports TCP: -```tsx -const [type, setType] = useState<'http' | 'tcp'>('http'); -// ... - -``` - -The backend validates: -```go -Type string `json:"type" binding:"required,oneof=http tcp https"` -``` - -And TCP format validation: -```go -if monitorType == "tcp" { - if _, _, err := net.SplitHostPort(urlStr); err != nil { - return nil, fmt.Errorf("TCP URL must be in host:port format: %w", err) - } -} -``` - -**Possible issues:** -1. The URL placeholder may not update when TCP is selected — user enters `http://...` format instead of `host:port` -2. No client-side format validation that changes based on type -3. The backend error message about `host:port` format may not surface clearly through the API error chain - -#### Classification: **LIKELY BUG** - -The i18n placeholder string `urlPlaceholder` is `"https://example.com or tcp://host:port"`. The `tcp://` scheme prefix is misleading — the backend's `net.SplitHostPort()` expects raw `host:port` (no scheme). A user following the placeholder guidance would submit `tcp://192.168.1.1:8080`, which fails `SplitHostPort` parsing because the `://` syntax is not a valid `host:port` format. This is the likely root cause. - -#### Proposed Fix - -1. **Fix the i18n translation string** in `frontend/src/locales/en/translation.json`: change `"urlPlaceholder"` from `"https://example.com or tcp://host:port"` to `"https://example.com or host:port"` (removing the misleading `tcp://` scheme) -2. Update URL placeholder dynamically: `placeholder={type === 'tcp' ? '192.168.1.1:8080' : 'https://example.com'}` -3. Add helper text below URL field explaining expected format per type -4. Add client-side format validation before submission - ---- - -### Issue 6: HTTP Always Down but HTTPS Okay - -**User report:** "http always down but https okay" - -#### Files Examined - -- `backend/internal/services/uptime_service.go` (L727-L810) — `checkMonitor` method -- `backend/internal/security/url_validator.go` (L169-L300) — `ValidateExternalURL` -- `backend/internal/network/safeclient.go` (L1-L113) — `IsPrivateIP`, `NewSafeHTTPClient` - -#### Root Cause Analysis - -**CONFIRMED BUG caused by SSRF protection blocking private IP addresses for HTTP monitors.** - -In `checkMonitor`, HTTP/HTTPS monitors go through: -```go -case "http", "https": - validatedURL, err := security.ValidateExternalURL( - monitor.URL, - security.WithAllowLocalhost(), - security.WithAllowHTTP(), - security.WithTimeout(3*time.Second), - ) -``` - -Then use: -```go -client := network.NewSafeHTTPClient( - network.WithTimeout(10*time.Second), - network.WithDialTimeout(5*time.Second), - network.WithMaxRedirects(0), - network.WithAllowLocalhost(), -) -``` - -**Critical path:** -1. `ValidateExternalURL` resolves the monitor's hostname via DNS -2. It checks ALL resolved IPs against `network.IsPrivateIP()` -3. `IsPrivateIP` blocks RFC 1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16` -4. `WithAllowLocalhost()` only allows `127.0.0.1`, `localhost`, `::1` — does NOT allow private IPs - -**If the monitor URL resolves to a private IP** (common for self-hosted services), `ValidateExternalURL` blocks the connection with: `"connection to private ip addresses is blocked for security"`. - -**Why HTTPS works but HTTP doesn't:** The user's HTTPS monitors likely point to public domains (via external DNS/CDN) that resolve to public IPs. HTTP monitors target private upstream IPs directly (e.g., `http://192.168.1.100:8080`), which fail the SSRF check. - -Meanwhile, **TCP monitors use raw `net.DialTimeout("tcp", ...)` with NO SSRF protection at all** — they bypass the entire validation chain. - -> **Note:** The PR should add a code comment in `uptime_service.go` at the TCP `net.DialTimeout` call site acknowledging this deliberate SSRF bypass. TCP monitors currently only accept admin-configured `host:port` (no URL parsing, no redirects), so the SSRF attack surface is minimal. If SSRF validation is added to TCP in the future, it must also respect `WithAllowRFC1918()`. - -#### Classification: **CONFIRMED BUG** - -Uptime monitoring for self-hosted services on private networks is fundamentally broken by SSRF protection. The `WithAllowLocalhost()` option is insufficient — it only allows `127.0.0.1/localhost`, not the RFC 1918 ranges that self-hosted services use. - -#### Proposed Fix - -Add a `WithAllowRFC1918()` option for admin-configured uptime monitors that selectively unblocks only RFC 1918 private ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`) while keeping cloud metadata (`169.254.169.254`), link-local, loopback, and reserved ranges blocked. - -**Dual-layer fix required** — both the URL validation and the safe dialer must be updated: - -**Layer 1: `url_validator.go`** — Add `WithAllowRFC1918()` validation option: -```go -// In checkMonitor: -validatedURL, err := security.ValidateExternalURL( - monitor.URL, - security.WithAllowLocalhost(), - security.WithAllowHTTP(), - security.WithAllowRFC1918(), // NEW: Only unblocks 10/8, 172.16/12, 192.168/16 - security.WithTimeout(3*time.Second), -) -``` - -**Layer 2: `safeclient.go`** — Add `AllowRFC1918` to `ClientOptions` and respect it in `safeDialer`: -```go -// In ClientOptions: -AllowRFC1918 bool // Permits connections to RFC 1918 private IPs only - -// New option constructor: -func WithAllowRFC1918() Option { - return func(opts *ClientOptions) { - opts.AllowRFC1918 = true - } -} - -// In safeDialer, before IsPrivateIP check: -if opts.AllowRFC1918 && isRFC1918(ip.IP) { - continue // Allow RFC 1918, still block link-local/metadata/reserved -} - -// In NewSafeHTTPClient call in checkMonitor: -client := network.NewSafeHTTPClient( - network.WithTimeout(10*time.Second), - network.WithDialTimeout(5*time.Second), - network.WithMaxRedirects(0), - network.WithAllowLocalhost(), - network.WithAllowRFC1918(), // NEW: Must match URL validator layer -) -``` - -Without the `safeDialer` fix, connections pass URL validation but are still blocked at dial time. Both layers must allow RFC 1918. - -This is safe because uptime monitors are **admin-configured only** — they require authentication. SSRF protection's purpose is to prevent untrusted user-initiated requests to internal services, not admin-configured health checks. Cloud metadata and link-local remain blocked even with this option. - -#### Reproduction Steps - -1. Fresh install of Charon -2. Add a proxy host pointing to a local service (e.g., `192.168.1.100:8080`) -3. Monitor auto-creates with `http://yourdomain.local` -4. Monitor status shows "down" with SSRF error -5. HTTPS monitor to a public domain succeeds - ---- - -### Issue 7: Security Blocked Local Connection to Private IP - -**User report:** "security blocked local connection to private ip — status in db just noticed randomly" - -#### Files Examined - -- `backend/internal/network/safeclient.go` (L22-L55) — `privateCIDRs` list -- `backend/internal/network/safeclient.go` (L74-L113) — `IsPrivateIP` function -- `backend/internal/security/url_validator.go` (L169-L300) — `ValidateExternalURL` -- `backend/internal/services/uptime_service.go` (L727-L810) — `checkMonitor` - -#### Root Cause Analysis - -Direct consequence of Issue 6. The SSRF protection blocks ALL RFC 1918 private IP ranges, plus loopback, link-local, and reserved ranges. This protection is applied at: - -1. **URL Validation** (`ValidateExternalURL`) — blocks at URL validation time -2. **Safe Dialer** (`safeDialer`) — blocks at DNS resolution / connection time - -The user noticed in the database because: -- The uptime monitor's `status` field shows `"down"` -- The heartbeat `message` stores the SSRF rejection error -- This status persists in the database and is visible through the monitoring UI - -#### Classification: **BY DESIGN** (with UX gap) - -The SSRF protection is correctly implemented for security. However, the application needs to differentiate between: -1. **External user-initiated URLs** (webhooks, notification endpoints) — MUST block private IPs -2. **Admin-configured monitoring targets** — SHOULD allow private IPs (trusted, intentional configs) - -#### Proposed Fix - -Same as Issue 6 — introduce `WithAllowRFC1918()` for admin-configured monitoring (both `url_validator.go` and `safeclient.go` layers). Additionally: -1. Add a clear UI message when a monitor is down due to SSRF protection -2. Log the specific blocked IP and reason for admin debugging - ---- - -## 3. Reproduction Steps Summary - -### Fresh Install Test Sequence - -1. Deploy Charon from a clean image (no existing database) -2. Complete initial setup (create admin user) -3. Navigate to Security dashboard - -**Issue 1:** Check Network tab → `GET /api/v1/security/status` — verify response has populated defaults - -**Issue 4:** Enable Cerberus → Toggle CrowdSec ON → Watch for "required" error toast - -**Issue 3:** During CrowdSec start (~30s), observe UI for flickering/stale states - -**Issue 5:** Uptime → Add Monitor → Select TCP → Enter `192.168.1.1:8080` → Submit - -**Issues 6 & 7:** Add proxy host to private IP → Wait for auto-sync → Check HTTP monitor status - ---- - -## 4. Implementation Plan - -### Phase 1: Playwright E2E Tests - -| Test | File | Description | -|------|------|-------------| -| Fresh security dashboard loads | `tests/security/fresh-install.spec.ts` | Verify status endpoint returns valid defaults on empty DB | -| CrowdSec enable flow completes | `tests/security/fresh-install.spec.ts` | Toggle CrowdSec on, verify no validation errors | -| Setting update with empty value | `tests/security/fresh-install.spec.ts` | Verify setting can be cleared | -| TCP monitor creation | `tests/uptime/create-monitor.spec.ts` | Create TCP monitor via UI | -| HTTP monitor for private IP | `tests/uptime/private-ip-monitor.spec.ts` | Create HTTP monitor for private IP, verify it connects | -| TCP placeholder updates dynamically | `tests/uptime/create-monitor.spec.ts` | Verify placeholder changes when switching to TCP type | - -### Phase 1b: Backend Unit Tests - -| Test | File | Description | -|------|------|-------------| -| `UpdateSettingRequest` with empty value | `settings_handler_test.go` | Verify empty string `""` is accepted for `Value` field (Issue 4) | -| TCP monitor with private IP | `uptime_service_test.go` | Regression: if SSRF is added to TCP later, private IPs must still work | -| Cloud metadata blocked with RFC 1918 allowed | `safeclient_test.go` | `169.254.169.254` remains blocked even when `AllowRFC1918 = true` | -| `safeDialer` with RFC 1918 allowance | `safeclient_test.go` | Dial to `10.x.x.x` succeeds with `AllowRFC1918`, dial to `169.254.x.x` fails | -| `ValidateExternalURL` with RFC 1918 | `url_validator_test.go` | RFC 1918 IPs pass validation; link-local/metadata still rejected | - -### Phase 2: Backend Fixes - -#### PR-1: Fix `binding:"required"` on Setting Value (Issue 4) -- **Files:** `settings_handler.go`, tests -- **Validation:** `go test ./backend/internal/api/handlers/... -run TestUpdateSetting` - -#### PR-2: Seed Default SecurityConfig on Startup (Issue 1) -- **Files:** `routes.go` or `main.go`, tests -- **Validation:** Fresh start → `/security/status` returns valid defaults - -#### PR-3: Allow RFC 1918 Private IPs for Uptime Monitors (Issues 6 & 7) -- **Files:** `url_validator.go`, `safeclient.go`, `uptime_service.go`, tests -- **Scope:** Add `WithAllowRFC1918()` option to both `ValidateExternalURL` and `NewSafeHTTPClient`/`safeDialer`. Add `isRFC1918()` helper. Add code comment at TCP `net.DialTimeout` call site noting deliberate SSRF bypass. -- **Validation:** HTTP monitor to `192.168.x.x` shows "up"; cloud metadata `169.254.169.254` remains blocked - -### Phase 3: Frontend Fixes - -#### PR-4: CrowdSec Enable UX (Issues 3 & 4) -- **Files:** `Security.tsx`, `CrowdSecConfig.tsx`, `CrowdSecKeyWarning.tsx` -- **Validation:** Playwright: CrowdSec toggle smooth, no error toasts - -#### PR-5: Monitor Creation UX for TCP (Issue 5) -- **Files:** `Uptime.tsx`, `frontend/src/locales/en/translation.json` -- **Scope:** Fix misleading `tcp://host:port` in i18n placeholder to `host:port`, add dynamic placeholder per monitor type -- **Validation:** Playwright: TCP monitor created via UI - -### Phase 4: Documentation & Integration Testing -- Update Getting Started docs with fresh install notes -- Run full Playwright suite against fresh install - ---- - -## 5. Commit Slicing Strategy - -**Decision:** Multiple PRs (5 PRs) for safer and faster review. - -**Trigger reasons:** -- Cross-domain changes (backend security, backend settings, frontend) -- Multiple independent fixes with no inter-dependencies -- Each fix is individually testable and rollbackable - -### Ordered PR Slices - -| PR | Scope | Files | Dependencies | Validation Gate | -|----|-------|-------|--------------|-----------------| -| **PR-1** | Fix `binding:"required"` on Setting Value | `settings_handler.go`, tests | None | Unit tests pass | -| **PR-2** | Seed default SecurityConfig on startup | `routes.go`/`main.go`, tests | None | Fresh start returns valid defaults | -| **PR-3** | Allow RFC 1918 IPs for uptime monitors (dual-layer) | `url_validator.go`, `safeclient.go`, `uptime_service.go`, tests | None | HTTP monitor to RFC 1918 IP works; cloud metadata still blocked | -| **PR-4** | CrowdSec enable UX improvements | `Security.tsx`, `CrowdSecConfig.tsx`, `CrowdSecKeyWarning.tsx` | PR-1 | Playwright: smooth toggle | -| **PR-5** | Monitor creation UX for TCP + i18n fix | `Uptime.tsx`, `translation.json` | None | Playwright: TCP creation works | - -**Rollback:** Each PR is independently revertable. No DB migrations or schema changes. - ---- - -## 6. E2E Test Gaps - -| Test Suite | Covers Fresh Install? | Gap | -|------------|----------------------|-----| -| `tests/security/security-dashboard.spec.ts` | No | Needs fresh-db variant | -| `tests/security/crowdsec-config.spec.ts` | No | Needs first-enable test | -| `tests/uptime/*.spec.ts` | Unknown | Needs TCP + private IP tests | - ---- - -## 7. Ancillary File Review - -- **`.gitignore`** — No changes needed -- **`codecov.yml`** — No changes needed -- **`.dockerignore`** — No changes needed -- **`Dockerfile`** — No changes needed - ---- - -## 8. Acceptance Criteria - -1. **Issue 1:** `/api/v1/security/status` returns populated defaults on a fresh database -2. **Issue 2:** Documented as by-design; enrollment auto-registers CAPI when needed -3. **Issue 3:** No toggle flickering or transient error states during first CrowdSec enable -4. **Issue 4:** No "required value" error toast when enabling/disabling modules -5. **Issue 5:** TCP monitor creation succeeds with `host:port` format; i18n placeholder no longer includes misleading `tcp://` scheme; dynamic placeholder guides user -6. **Issue 6:** HTTP monitors to private IPs succeed for admin-configured uptime monitors -7. **Issue 7:** Uptime heartbeat messages do not contain "private IP blocked" errors for admin monitors - ---- - -## PR-3 Implementation Plan - -**Title:** Allow RFC 1918 IPs for admin-configured uptime monitors (dual-layer SSRF fix) -**Issues Resolved:** Issue 6 (CONFIRMED BUG) + Issue 7 (BY DESIGN / UX gap) -**Status:** **APPROVED** (all blocking concerns resolved) -**Dependencies:** None (independent of PR-1 and PR-2) - ---- - -### Overview - -HTTP/HTTPS uptime monitors targeting LAN addresses permanently report "down" on fresh installs. The root cause is a dual-layer SSRF guard: **Layer 1** (`url_validator.go`) rejects private IPs during hostname resolution, and **Layer 2** (`safeclient.go`) re-checks every IP at TCP dial time to defeat DNS rebinding. Because both layers enforce `IsPrivateIP`, patching only one would produce a monitor that passes URL validation but is silently killed at dial time—the bug would appear fixed in logs but remain broken in practice. - -The fix threads a single `AllowRFC1918` signal through both layers, visible as the `WithAllowRFC1918()` functional option. It **only** unblocks the three RFC 1918 ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`). All other restricted ranges—cloud metadata (`169.254.0.0/16` including `169.254.169.254`), loopback (`127.0.0.0/8`, `::1`), IPv6 link-local (`fe80::/10`), IPv6 unique-local (`fc00::/7`), and all reserved blocks—remain fully blocked regardless of the option. - -**Authentication posture is verified:** Uptime monitor routes (`POST /uptime/monitors/:id/check`, etc.) live inside the `management` route group in `backend/internal/api/routes/routes.go`. That group is nested under `protected`, which enforces `authMiddleware` (JWT), and then applies `middleware.RequireManagementAccess()`. The RFC 1918 bypass is therefore **exclusively accessible to authenticated, management-tier users**—never to passthrough users or unauthenticated callers. - ---- - -### A. File-by-File Change Plan - ---- - -#### File 1: `backend/internal/network/safeclient.go` - -**Package:** `network` - -**Change 1 — Add RFC 1918 block set** - -Below the `privateBlocks` and `initOnce` declarations, introduce a parallel set of `sync.Once`-guarded CIDR blocks containing only the three RFC 1918 ranges. These are stored separately so the `IsRFC1918` check can remain a cheap, focused predicate without reopening the full `IsPrivateIP` logic. - -New package-level variables (insert after `initOnce sync.Once`): - -```go -var ( - rfc1918Blocks []*net.IPNet - rfc1918Once sync.Once -) - -var rfc1918CIDRs = []string{ - "10.0.0.0/8", - "172.16.0.0/12", - "192.168.0.0/16", -} -``` - -New init function (insert after `initPrivateBlocks`): - -```go -func initRFC1918Blocks() { - rfc1918Once.Do(func() { - rfc1918Blocks = make([]*net.IPNet, 0, len(rfc1918CIDRs)) - for _, cidr := range rfc1918CIDRs { - _, block, err := net.ParseCIDR(cidr) - if err != nil { - continue - } - rfc1918Blocks = append(rfc1918Blocks, block) - } - }) -} -``` - -**Change 2 — Add `IsRFC1918` exported predicate** - -Insert after the `IsPrivateIP` function. This function is exported so `url_validator.go` (in the `security` package) can call it via `network.IsRFC1918(ip)`, eliminating duplicated CIDR definitions. - -```go -// IsRFC1918 reports whether ip falls within one of the three RFC 1918 private ranges: -// 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16. -// -// Unlike IsPrivateIP, this function does NOT cover loopback, link-local, cloud metadata, -// or any reserved ranges. Use it only to selectively unblock LAN addresses for -// admin-configured features (e.g., uptime monitors) while preserving all other SSRF guards. -func IsRFC1918(ip net.IP) bool { - if ip == nil { - return false - } - initRFC1918Blocks() - if ip4 := ip.To4(); ip4 != nil { - ip = ip4 - } - for _, block := range rfc1918Blocks { - if block.Contains(ip) { - return true - } - } - return false -} -``` - -**Change 3 — Add `AllowRFC1918` to `ClientOptions` struct** - -Insert after the `DialTimeout` field: - -```go -// AllowRFC1918 permits connections to RFC 1918 private IP ranges: -// 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16. -// -// SECURITY NOTE: Enable only for admin-configured features (e.g., uptime monitors). -// All other restricted ranges (loopback, link-local, cloud metadata, reserved) remain -// blocked regardless of this flag. -AllowRFC1918 bool -``` - -`defaultOptions()` returns `AllowRFC1918: false` — no change needed there. - -**Change 4 — Add `WithAllowRFC1918` functional option** - -Insert after `WithDialTimeout`: - -```go -// WithAllowRFC1918 permits connections to RFC 1918 private addresses. -// Use exclusively for admin-configured outbound calls such as uptime monitors; -// never for user-supplied URLs. -func WithAllowRFC1918() Option { - return func(opts *ClientOptions) { - opts.AllowRFC1918 = true - } -} -``` - -**Change 5 — Update `safeDialer` validation loop** - -Locate the loop inside `safeDialer` that reads: -```go -for _, ip := range ips { - if opts.AllowLocalhost && ip.IP.IsLoopback() { - continue - } - if IsPrivateIP(ip.IP) { - return nil, fmt.Errorf("connection to private IP blocked: %s resolved to %s", host, ip.IP) - } -} -``` - -Replace with: -```go -for _, ip := range ips { - if opts.AllowLocalhost && ip.IP.IsLoopback() { - continue - } - // Admin-configured monitors may legitimately target LAN services. - // Allow RFC 1918 ranges only; all other restricted ranges (link-local, - // cloud metadata 169.254.169.254, loopback, reserved) remain blocked. - if opts.AllowRFC1918 && IsRFC1918(ip.IP) { - continue - } - if IsPrivateIP(ip.IP) { - return nil, fmt.Errorf("connection to private IP blocked: %s resolved to %s", host, ip.IP) - } -} -``` - -**Change 6 — Update `safeDialer` IP selection loop** - -Locate the loop that selects `selectedIP`: -```go -for _, ip := range ips { - if opts.AllowLocalhost && ip.IP.IsLoopback() { - selectedIP = ip.IP - break - } - if !IsPrivateIP(ip.IP) { - selectedIP = ip.IP - break - } -} -``` - -Replace with: -```go -for _, ip := range ips { - if opts.AllowLocalhost && ip.IP.IsLoopback() { - selectedIP = ip.IP - break - } - if opts.AllowRFC1918 && IsRFC1918(ip.IP) { - selectedIP = ip.IP - break - } - if !IsPrivateIP(ip.IP) { - selectedIP = ip.IP - break - } -} -``` - -**Change 7 — `validateRedirectTarget` (removed from PR-3 scope)** - -`checkMonitor` passes `network.WithMaxRedirects(0)`. In `NewSafeHTTPClient`'s `CheckRedirect` handler, `MaxRedirects == 0` causes an immediate return via `http.ErrUseLastResponse` — meaning `validateRedirectTarget` is **never called** for any uptime monitor request. Adding RFC 1918 logic here would ship dead code with no test coverage. - -Instead, add the following TODO comment at the top of `validateRedirectTarget` in `safeclient.go`: - -```go -// TODO: if MaxRedirects > 0 is ever added to uptime monitor checks, also pass -// WithAllowRFC1918() into validateRedirectTarget so that redirect targets to -// RFC 1918 addresses are permitted for admin-configured monitor call sites. -``` - -*No other functions in `safeclient.go` require changes.* - ---- - -#### File 2: `backend/internal/security/url_validator.go` - -**Package:** `security` - -**Change 1 — Add `AllowRFC1918` field to `ValidationConfig`** - -Locate the `ValidationConfig` struct: -```go -type ValidationConfig struct { - AllowLocalhost bool - AllowHTTP bool - MaxRedirects int - Timeout time.Duration - BlockPrivateIPs bool -} -``` - -Add the new field after `BlockPrivateIPs`: -```go -// AllowRFC1918 permits URLs that resolve to RFC 1918 private addresses. -// Does not disable blocking of loopback, link-local, cloud metadata, or reserved ranges. -AllowRFC1918 bool -``` - -**Change 2 — Add `WithAllowRFC1918` functional option** - -Insert after the `WithMaxRedirects` function: - -```go -// WithAllowRFC1918 permits hostnames that resolve to RFC 1918 private IP ranges -// (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). -// -// SECURITY NOTE: Use only for admin-controlled call sites such as uptime monitors. -// Cloud metadata (169.254.169.254), link-local, loopback, and all reserved ranges -// are still blocked when this option is active. -func WithAllowRFC1918() ValidationOption { - return func(c *ValidationConfig) { c.AllowRFC1918 = true } -} -``` - -**`ValidateExternalURL` initializer** — ensure the default `ValidationConfig` sets `AllowRFC1918: false` explicitly. The current initialization block already defaults unlisted bools to `false`, so no line change is required here. - -**Change 3 — Update Phase 4 private IP blocking loop in `ValidateExternalURL`** - -This is the critical logic change. Locate the IPv4-mapped IPv6 check block and the `IsPrivateIP` call inside the `if config.BlockPrivateIPs` block: - -```go -if config.BlockPrivateIPs { - for _, ip := range ips { - if ip.To4() != nil && ip.To16() != nil && isIPv4MappedIPv6(ip) { - ipv4 := ip.To4() - if network.IsPrivateIP(ipv4) { - return "", fmt.Errorf("connection to private ip addresses is blocked for security (detected IPv4-mapped IPv6: %s)", ip.String()) - } - } - - if network.IsPrivateIP(ip) { - sanitizedIP := sanitizeIPForError(ip.String()) - if ip.String() == "169.254.169.254" { - return "", fmt.Errorf("access to cloud metadata endpoints is blocked for security (detected: %s)", sanitizedIP) - } - return "", fmt.Errorf("connection to private ip addresses is blocked for security (detected: %s)", sanitizedIP) - } - } -} -``` - -Replace with: - -```go -if config.BlockPrivateIPs { - for _, ip := range ips { - // Handle IPv4-mapped IPv6 form (::ffff:a.b.c.d) to prevent SSRF bypass. - if ip.To4() != nil && ip.To16() != nil && isIPv4MappedIPv6(ip) { - ipv4 := ip.To4() - // RFC 1918 bypass applies even in IPv4-mapped IPv6 form. - if config.AllowRFC1918 && network.IsRFC1918(ipv4) { - continue - } - if network.IsPrivateIP(ipv4) { - return "", fmt.Errorf("connection to private ip addresses is blocked for security (detected IPv4-mapped IPv6: %s)", ip.String()) - } - } - - // Admin-configured monitors may target LAN services; allow RFC 1918 only. - // Link-local (169.254.x.x), loopback, cloud metadata, and reserved ranges - // remain blocked unconditionally even when AllowRFC1918 is set. - if config.AllowRFC1918 && network.IsRFC1918(ip) { - continue - } - - if network.IsPrivateIP(ip) { - sanitizedIP := sanitizeIPForError(ip.String()) - if ip.String() == "169.254.169.254" { - return "", fmt.Errorf("access to cloud metadata endpoints is blocked for security (detected: %s)", sanitizedIP) - } - return "", fmt.Errorf("connection to private ip addresses is blocked for security (detected: %s)", sanitizedIP) - } - } -} -``` - -*No other functions in `url_validator.go` require changes.* - ---- - -#### File 3: `backend/internal/services/uptime_service.go` - -**Package:** `services` - -**Change 1 — `checkMonitor`: add `WithAllowRFC1918()` to URL validation** - -Locate the `security.ValidateExternalURL` call inside the `case "http", "https":` branch: - -```go -validatedURL, err := security.ValidateExternalURL( - monitor.URL, - // Uptime monitors are an explicit admin-configured feature and commonly - // target loopback in local/dev setups (and in unit tests). - security.WithAllowLocalhost(), - security.WithAllowHTTP(), - security.WithTimeout(3*time.Second), -) -``` - -Replace with: - -```go -validatedURL, err := security.ValidateExternalURL( - monitor.URL, - // Uptime monitors are admin-configured; LAN targets are a legitimate use-case. - security.WithAllowLocalhost(), - security.WithAllowHTTP(), - // Allow RFC 1918 private ranges for LAN service monitoring. Cloud metadata - // (169.254.169.254), link-local, and loopback remain blocked. - security.WithAllowRFC1918(), - security.WithTimeout(3*time.Second), -) -``` - -**Change 2 — `checkMonitor`: add `WithAllowRFC1918()` to HTTP client** - -Locate the `network.NewSafeHTTPClient` call immediately below the URL validation block: - -```go -client := network.NewSafeHTTPClient( - network.WithTimeout(10*time.Second), - network.WithDialTimeout(5*time.Second), - // Explicit redirect policy per call site: disable. - network.WithMaxRedirects(0), - // Uptime monitors are an explicit admin-configured feature and commonly - // target loopback in local/dev setups (and in unit tests). - network.WithAllowLocalhost(), -) -``` - -Replace with: - -```go -client := network.NewSafeHTTPClient( - network.WithTimeout(10*time.Second), - network.WithDialTimeout(5*time.Second), - // Explicit redirect policy per call site: disable. - network.WithMaxRedirects(0), - // Uptime monitors are admin-configured; LAN targets are a legitimate use-case. - network.WithAllowLocalhost(), - // Must mirror the WithAllowRFC1918() passed to ValidateExternalURL above. - // Both the URL validator (DNS resolution) and the safe dialer (TCP connect) - // enforce SSRF rules independently; both must be relaxed or the fix is partial. - network.WithAllowRFC1918(), -) -``` - -**Change 3 — `checkMonitor`: annotate the TCP bypass** - -Locate the TCP case: - -```go -case "tcp": - conn, err := net.DialTimeout("tcp", monitor.URL, 10*time.Second) -``` - -Add a comment above the dial line: - -```go -case "tcp": - // TCP monitors use net.DialTimeout directly, bypassing the URL validator and - // safe dialer entirely. This is a deliberate design choice: TCP monitors accept - // only admin-configured host:port strings (no URL parsing, no redirects, no DNS - // rebinding surface), so the SSRF attack vector is minimal. If SSRF validation - // is ever added to TCP monitors, it must also receive WithAllowRFC1918() so that - // LAN services continue to be reachable. - conn, err := net.DialTimeout("tcp", monitor.URL, 10*time.Second) -``` - -*No other functions in `uptime_service.go` require changes.* - ---- - -### B. Option Pattern Design - -The implementation uses two parallel functional-option systems that must be kept in sync at the call site. They share identical semantics but live in different packages for separation of concerns. - -#### `ValidationConfig` and `ValidationOption` (in `security` package) - -The existing struct gains one field: - -```go -type ValidationConfig struct { - AllowLocalhost bool - AllowHTTP bool - MaxRedirects int - Timeout time.Duration - BlockPrivateIPs bool - AllowRFC1918 bool // NEW: permits 10/8, 172.16/12, 192.168/16 -} -``` - -New option constructor: - -```go -func WithAllowRFC1918() ValidationOption { - return func(c *ValidationConfig) { c.AllowRFC1918 = true } -} -``` - -#### `ClientOptions` and `Option` (in `network` package) - -The existing struct gains one field: - -```go -type ClientOptions struct { - Timeout time.Duration - AllowLocalhost bool - AllowedDomains []string - MaxRedirects int - DialTimeout time.Duration - AllowRFC1918 bool // NEW: permits 10/8, 172.16/12, 192.168/16 -} -``` - -New option constructor and new exported predicate: - -```go -func WithAllowRFC1918() Option { - return func(opts *ClientOptions) { opts.AllowRFC1918 = true } -} - -func IsRFC1918(ip net.IP) bool { /* see File 1, Change 2 above */ } -``` - -#### Coordination in `uptime_service.go` - -The two options are always activated together. The ordering at the call site makes this explicit: - -```go -security.ValidateExternalURL( - monitor.URL, - security.WithAllowLocalhost(), - security.WithAllowHTTP(), - security.WithAllowRFC1918(), // ← Layer 1 relaxed - security.WithTimeout(3*time.Second), -) - -network.NewSafeHTTPClient( - network.WithTimeout(10*time.Second), - network.WithDialTimeout(5*time.Second), - network.WithMaxRedirects(0), - network.WithAllowLocalhost(), - network.WithAllowRFC1918(), // ← Layer 2 relaxed (must mirror Layer 1) -) -``` - -**Invariant:** Any future call site that enables `WithAllowRFC1918()` at Layer 1 MUST also enable it at Layer 2 (and vice-versa), or the fix will only partially work. The comments at the call site in `uptime_service.go` make this constraint explicit. - ---- - -### C. Test Plan - -All test changes are additive — no existing tests are modified. - -#### `backend/internal/network/safeclient_test.go` - -| # | Test Function | Scenario | Expected Result | -|---|--------------|----------|-----------------| -| 1 | `TestIsRFC1918_RFC1918Addresses` | Table-driven: `10.0.0.1`, `10.255.255.255`, `172.16.0.1`, `172.31.255.255`, `192.168.0.1`, `192.168.255.255` | `IsRFC1918` returns `true` for all | -| 2 | `TestIsRFC1918_NonRFC1918Addresses` | Table-driven: `169.254.169.254`, `127.0.0.1`, `::1`, `8.8.8.8`, `0.0.0.1`, `240.0.0.1`, `fe80::1`, `fc00::1` | `IsRFC1918` returns `false` for all | -| 3 | `TestIsRFC1918_NilIP` | `nil` IP | Returns `false` (nil is not RFC 1918; `IsPrivateIP` handles nil → block) | -| 4 | `TestIsRFC1918_BoundaryAddresses` | `172.15.255.255` (just outside range), `172.32.0.0` (just outside), `192.167.255.255` (just outside), `192.169.0.0` (just outside) | `IsRFC1918` returns `false` for all | -| 5 | `TestSafeDialer_AllowRFC1918_ValidationLoopSkipsRFC1918` | `ClientOptions{AllowRFC1918: true, DialTimeout: 1s}` (no `AllowLocalhost`), call `safeDialer` against a host that resolves to `192.168.1.1`; the TCP connect will fail (nothing listening), but the returned error must NOT contain `"connection to private IP blocked"` — absence of that string confirms the RFC 1918 bypass path was taken. White-box test in `package network` (`safeclient_test.go`). | Error does not contain `"connection to private IP blocked"`; confirms `if opts.AllowRFC1918 && IsRFC1918(ip.IP) { continue }` was executed. | -| 6 | `TestSafeDialer_AllowRFC1918_BlocksLinkLocal` | `ClientOptions{AllowRFC1918: true, DialTimeout: 1s}`, dial to `169.254.169.254:80` | Returns error containing "private IP blocked" | -| 7 | `TestSafeDialer_AllowRFC1918_BlocksLoopbackWithoutAllowLocalhost` | `ClientOptions{AllowRFC1918: true, AllowLocalhost: false, DialTimeout: 1s}`, dial to `127.0.0.1:80` | Returns error; loopback not covered by AllowRFC1918 | -| 8 | `TestNewSafeHTTPClient_AllowRFC1918_BlocksSSRFMetadata` | Create client with `WithAllowRFC1918()`, attempt GET to `http://169.254.169.254/` | Error; cloud metadata endpoint not accessible | -| 9 | `TestNewSafeHTTPClient_WithAllowRFC1918_OptionApplied` | Create client with `WithAllowRFC1918()`, verify `ClientOptions.AllowRFC1918` is `true` | Structural test — option propagates to config | -| 10 | `TestIsRFC1918_IPv4MappedAddresses` | Table-driven: `IsRFC1918(net.ParseIP("::ffff:192.168.1.1"))` → `true`; `IsRFC1918(net.ParseIP("::ffff:169.254.169.254"))` → `false`. White-box test in `package network` (`safeclient_test.go`). | `true` for IPv4-mapped RFC 1918; `false` for IPv4-mapped link-local/non-RFC-1918. Validates `To4()` normalization in `IsRFC1918`. | - -**Implementation note for tests 5–9:** Test 5 (`TestSafeDialer_AllowRFC1918_ValidationLoopSkipsRFC1918`) dials a host resolving to `192.168.1.1` with no `AllowLocalhost`; the expected outcome is that the error does NOT match `"connection to private IP blocked"` (a TCP connection-refused error from nothing listening is acceptable — it means the validation loop passed the IP). For tests 6–9, real TCP connections to RFC 1918 addresses are unavailable in the CI environment. Those tests should use `httptest.NewServer` (which binds to loopback) combined with `AllowLocalhost: true` and `AllowRFC1918: true` to verify no *validation* error occurs. For tests that must confirm a block (e.g., `169.254.169.254`), the dialer is called directly with a short `DialTimeout` — the expected error is the SSRF block error, not a connection-refused error. - -#### `backend/internal/security/url_validator_test.go` - -| # | Test Function | Scenario | Expected Result | -|---|--------------|----------|-----------------| -| 11 | `TestValidateExternalURL_WithAllowRFC1918_Permits10x` | `http://10.0.0.1` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Passes URL validation phase (may fail DNS in test env — accept `dns resolution failed` as OK since that means validation passed) | -| 12 | `TestValidateExternalURL_WithAllowRFC1918_Permits172_16x` | `http://172.16.0.1` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Same as above | -| 13 | `TestValidateExternalURL_WithAllowRFC1918_Permits192_168x` | `http://192.168.1.100` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Same as above | -| 14 | `TestValidateExternalURL_WithAllowRFC1918_BlocksMetadata` | `http://169.254.169.254` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Returns error containing `"cloud metadata endpoints is blocked"` | -| 15 | `TestValidateExternalURL_WithAllowRFC1918_BlocksLinkLocal` | `http://169.254.10.1` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Returns error containing `"private ip addresses is blocked"` | -| 16 | `TestValidateExternalURL_WithAllowRFC1918_BlocksLoopback` | `http://127.0.0.1` with `WithAllowHTTP()` + `WithAllowRFC1918()` (no `WithAllowLocalhost`) | Returns error; loopback not covered | -| 17 | `TestValidateExternalURL_RFC1918BlockedByDefault` | `http://192.168.1.1` with `WithAllowHTTP()` only (no RFC 1918 option) | Returns error containing `"private ip addresses is blocked"` — regression guard | -| 18 | `TestValidateExternalURL_WithAllowRFC1918_IPv4MappedIPv6Allowed` | `http://[::ffff:192.168.1.1]` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Permit (RFC 1918 bypass applies to IPv4-mapped form too) | -| 19 | `TestValidateExternalURL_WithAllowRFC1918_IPv4MappedMetadataBlocked` | `http://[::ffff:169.254.169.254]` with `WithAllowHTTP()` + `WithAllowRFC1918()` | Blocked; cloud metadata remains rejected in mapped form | - -**Implementation note for tests 11–13:** `ValidateExternalURL` calls `net.Resolver.LookupIP` on literal IP strings. In Go, `LookupIP` on a literal IP (e.g., `"192.168.1.1"`) returns that IP immediately without a real DNS query. So tests 11–13 should succeed at validation and return either a normalized URL (success) or a DNS timeout error if the test environment's resolver behaves unexpectedly. Both outcomes are acceptable; the important invariant is that the returned error must NOT contain `"private ip addresses is blocked"`. - -#### `backend/internal/services/uptime_service_test.go` - -| # | Test Function | Scenario | Expected Result | -|---|--------------|----------|-----------------| -| 20 | `TestCheckMonitor_HTTP_LocalhostSucceedsWithPrivateIPBypass` | Start an `httptest.NewServer` on loopback, create HTTP monitor pointing to its URL, call `checkMonitor` — verifies that `WithAllowLocalhost` + `WithAllowRFC1918` together produce a "up" result | Monitor status `"up"`, heartbeat message `"HTTP 200"` | -| 21 | `TestCheckMonitor_TCP_AcceptsRFC1918Address` | TCP monitor with `host:port` format pointing to a server listening on `127.0.0.1`, call `checkMonitor` | Success (TCP uses `net.DialTimeout`, no SSRF layer to relax) | - ---- - -### D. Security Review Checklist - -Every item below is a security property that the implementation must satisfy. Each entry names the property, which code enforces it, and how to verify it. - -| # | Property | Enforced By | Verification Method | -|---|----------|-------------|---------------------| -| 1 | **Cloud metadata remains blocked.** `169.254.169.254` (AWS/GCP/Azure metadata service) is never reachable, even with `AllowRFC1918` active. | `IsRFC1918` returns `false` for `169.254.x.x` (link-local, not RFC 1918). Both `ValidateExternalURL` and `safeDialer` will still call `IsPrivateIP` which blocks `169.254.0.0/16`. | Test 8 + Test 14. | -| 2 | **Full link-local range blocked.** Not just `169.254.169.254` but the entire `169.254.0.0/16` range is blocked. | Same as #1. `IsPrivateIP` covers `169.254.0.0/16`. `IsRFC1918` excludes this range. | Test 6 + Test 15. | -| 3 | **Loopback does not gain blanket bypass.** `127.0.0.0/8` and `::1` are not unblocked by `AllowRFC1918`. Only `AllowLocalhost` can bypass loopback, and it is not added unexpectedly. | `IsRFC1918` only covers the three RFC 1918 ranges. Loopback is handled independently by `AllowLocalhost`. | Test 7 + Test 16. | -| 4 | **IPv6 unique-local and link-local remain blocked.** `fc00::/7` and `fe80::/10` are not unblocked. RFC 1918 is IPv4-only. | `IsRFC1918` converts to IPv4 via `To4()`; it returns `false` for all IPv6 addresses. | Test 2 (`fe80::1`, `fc00::1` in `TestIsRFC1918_NonRFC1918Addresses`). | -| 5 | **Reserved ranges remain blocked.** `0.0.0.0/8`, `240.0.0.0/4`, `255.255.255.255/32` are not unblocked. | Same as above — not in `rfc1918CIDRs`. | Test 2 (`0.0.0.1`, `240.0.0.1` in `TestIsRFC1918_NonRFC1918Addresses`). | -| 6 | **RFC 1918 bypass is bounded precisely.** Addresses just outside the three RFC 1918 ranges (e.g., `172.15.255.255`, `172.32.0.0`) are not treated as RFC 1918. | `net.ParseCIDR` + `block.Contains` provide exact CIDR boundary enforcement. | Test 4 (`TestIsRFC1918_BoundaryAddresses`). | -| 7 | **IPv4-mapped IPv6 addresses are handled.** `::ffff:192.168.1.1` is permitted with `AllowRFC1918`; `::ffff:169.254.169.254` is not. | `IsRFC1918` normalizes to IPv4 via `To4()` before CIDR check. The URL validator's IPv4-mapped branch also checks `IsRFC1918` before `IsPrivateIP`. Unit-level coverage provided by Test 10 (`TestIsRFC1918_IPv4MappedAddresses`). | Test 10 + Test 18 + Test 19. | -| 8 | **Option is not accessible to unauthenticated users.** The uptime monitor check routes are behind `authMiddleware` + `middleware.RequireManagementAccess()`. | `routes.go` nests uptime routes inside `management` group which is `protected.Group("/")` with `RequireManagementAccess()`. | Code review of `backend/internal/api/routes/routes.go` (confirmed: `management.POST("/uptime/monitors/:id/check", ...)` at line 461). | -| 9 | **Option is not applied to user-facing URL validation.** Webhook URLs, notification URLs, and other user-supplied inputs use `ValidateExternalURL` without `WithAllowRFC1918()`. | `WithAllowRFC1918()` is only added in `checkMonitor` in `uptime_service.go`. No other `ValidateExternalURL` call site is modified. | Grep all `ValidateExternalURL` call sites; verify only `uptime_service.go` carries `WithAllowRFC1918()`. | -| 10 | **Both layers are consistently relaxed.** If `WithAllowRFC1918()` is at Layer 1 (URL validator), it is also at Layer 2 (safe dialer). Partial bypass is not possible. | Comment in `uptime_service.go` creates a code-review anchor. | Cross-reference Layer 1 and Layer 2 call sites in `checkMonitor`. | -| 11 | **DNS rebinding is still defeated.** The safe dialer re-resolves the hostname at connect time and re-applies the same RFC 1918 policy. A hostname that resolves to a public IP during validation cannot be swapped for a private non-RFC-1918 IP at connect time. | `safeDialer` validates ALL resolved IPs against the same logic as the URL validator. `IsPrivateIP` is still called for non-RFC-1918 addresses. | Existing `TestSafeDialer_BlocksPrivateIPs` remains unchanged and continues to pass. | -| 12 | **`IsRFC1918(nil)` returns `false`, not `true`.** `IsPrivateIP(nil)` returns `true` (block-by-default safety). `IsRFC1918(nil)` should return `false` because `nil` is not an RFC 1918 address — it would fall through to `IsPrivateIP` which handles the nil case. | Early nil check in `IsRFC1918`: `if ip == nil { return false }`. | Test 3 (`TestIsRFC1918_NilIP`). | -| 13 | **`CheckMonitor` exported wrapper propagates the fix automatically.** The exported `CheckMonitor` method delegates directly to the unexported `checkMonitor`. All RFC 1918 option changes applied inside `checkMonitor` take effect for both entry points without separate configuration. | `uptime_service.go`: `CheckMonitor` calls `checkMonitor` without re-creating the HTTP client or invoking `ValidateExternalURL` independently. | Code review: verify `CheckMonitor` does not construct its own HTTP client or URL validation path outside of `checkMonitor`. | -| 14 | **Coordination invariant is comment-enforced; integration test can assert the contract.** The requirement that Layer 1 (`WithAllowRFC1918()` in `ValidateExternalURL`) and Layer 2 (`WithAllowRFC1918()` in `NewSafeHTTPClient`) are always relaxed together is documented via the inline comment at the `NewSafeHTTPClient` call site in `checkMonitor`. Partial bypass — relaxing only one layer — is not possible silently because the code-review anchor makes the intent explicit. A `TestCheckMonitor` integration test (Test 20) can additionally assert the "up" outcome to confirm both layers cooperate. | Comment in `uptime_service.go`: "Must mirror the `WithAllowRFC1918()` passed to `ValidateExternalURL` above." | Cross-reference both `WithAllowRFC1918()` call sites in `checkMonitor`; any future call site adding only one of the two options is a mis-use detectable at code review. | - ---- - -### E. Commit Message - -``` -fix(uptime): allow RFC 1918 IPs for admin-configured monitors - -HTTP/HTTPS uptime monitors targeting LAN addresses (e.g., 192.168.x.x, -10.x.x.x, 172.16.x.x) permanently reported "down" on fresh installs. -The SSRF protection layer silently blocked private IPs at two independent -checkpoints — URL validation and TCP dial time — causing monitors that -pointed to self-hosted LAN services to always fail. - -Introduce WithAllowRFC1918() as a functional option in both the -url_validator (security package) and NewSafeHTTPClient / safeDialer -(network package). A new IsRFC1918() exported predicate in the network -package covers exactly the three RFC 1918 ranges without touching the -broader IsPrivateIP logic. - -Apply WithAllowRFC1918() exclusively in checkMonitor() (uptime_service.go) -for the http/https case. Both layers are relaxed in concert; relaxing only -one produces a partial fix where URL validation passes but the TCP dialer -still blocks the connection. - -Security properties preserved: -- 169.254.169.254 and the full 169.254.0.0/16 link-local range remain - blocked unconditionally (not RFC 1918) -- Loopback (127.0.0.0/8, ::1) is not affected by this option -- IPv6 unique-local (fc00::/7) and link-local (fe80::/10) remain blocked -- Reserved ranges (0.0.0.0/8, 240.0.0.0/4, broadcast) remain blocked -- The bypass is only reachable by authenticated management-tier users -- No user-facing URL validation call site is modified - -Add an explanatory comment at the TCP net.DialTimeout call site in -checkMonitor documenting the deliberate SSRF bypass for TCP monitors. - -Fixes issues 6 and 7 from the fresh-install bug report. -``` - ---- - -## PR-4: CrowdSec First-Enable UX (Issues 3 & 4) - -**Title:** fix(frontend): stabilize CrowdSec first-enable UX and guard empty-value regression -**Issues Resolved:** Issue 3 (UI bugs on first enabling CrowdSec) + Issue 4 ("required value" error) -**Dependencies:** PR-1 (already merged — confirmed by code inspection) -**Status:** APPROVED (after Supervisor corrections applied) - ---- - -### Overview - -Two bugs compound to produce a broken first-enable experience. **Issue 4** (backend) is already -fixed: `UpdateSettingRequest.Value` no longer carries `binding:"required"` (confirmed in -`backend/internal/api/handlers/settings_handler.go` line 116 — the tag reads `json:"value"` with -no `binding` directive). PR-4 only needs a regression test to preserve this, plus a note in the -plan confirming it is done. - -**Issue 3** (frontend) is the real work. When CrowdSec is first enabled, the -`crowdsecPowerMutation` in `Security.tsx` takes 10–60 seconds to complete. During this window: - -1. **Toggle flicker** — `switch checked` reads `crowdsecStatus?.running ?? status.crowdsec.enabled`. - Both sources lag behind user intent: `crowdsecStatus` is local state that hasn't been - re-fetched yet (`null`), and `status.crowdsec.enabled` is the stale server value (`false` still, - because `queryClient.invalidateQueries` fires only in `onSuccess`, which has not fired). The - toggle therefore immediately reverts to unchecked the moment it is clicked. - -2. **Stale "Disabled" badge** — The `` inside the CrowdSec card reads the same condition - and shows "Disabled" for the entire startup duration even though the user explicitly enabled it. - -3. **Premature `CrowdSecKeyWarning`** — The warning is conditionally rendered at - `Security.tsx` line ~355. The condition is `status.cerberus?.enabled && (crowdsecStatus?.running ?? status.crowdsec.enabled)`. During startup the condition may briefly evaluate `true` after - `crowdsecStatus` is updated by `fetchCrowdsecStatus()` inside the mutation body, before bouncer - registration completes on the backend, causing the key-rejection warning to flash. - -4. **LAPI "not ready" / "not running" alerts in `CrowdSecConfig.tsx`** — If the user navigates to - `/security/crowdsec` while the mutation is running, `lapiStatusQuery` (which polls every 5s) will - immediately return `running: false` or `lapi_ready: false`. The 3-second `initialCheckComplete` - guard is insufficient for a 10–60 second startup. The page shows an alarming red "CrowdSec not - running" banner unnecessarily. - ---- - -### A. Pre-flight: Issue 4 Verification and Regression Test - -#### Confirmed Status - -Open `backend/internal/api/handlers/settings_handler.go` at **line 115–121**. The current struct -is: - -```go -type UpdateSettingRequest struct { - Key string `json:"key" binding:"required"` - Value string `json:"value"` - Category string `json:"category"` - Type string `json:"type"` -} -``` - -`binding:"required"` is absent from `Value`. The backend fix is **complete**. - -#### Handler Compensation: No Additional Key-Specific Validation Needed - -Scan the `UpdateSetting` handler body (lines 127–250). The only value-level validation that exists -targets two specific keys: -- `security.admin_whitelist` → calls `validateAdminWhitelist(req.Value)` (line ~138) -- `caddy.keepalive_idle` / `caddy.keepalive_count` → calls `validateOptionalKeepaliveSetting` (line ~143) - -Both already handle empty values gracefully by returning early or using zero-value defaults. No new -key-specific validation is required for the CrowdSec enable flow. - -#### Regression Test to Add - -**File:** `backend/internal/api/handlers/settings_handler_test.go` - -**Test name:** `TestUpdateSetting_EmptyValueIsAccepted` - -**Location in file:** Add to the existing `TestUpdateSetting*` suite. The file uses `package handlers_test` and already has a `mockCaddyConfigManager` / `mockCacheInvalidator` test harness. - -**What it asserts:** - -``` -POST /settings body: {"key":"security.crowdsec.enabled","value":""} -→ HTTP 200 (not 400) -→ DB contains a Setting row with Key="security.crowdsec.enabled" and Value="" -``` - -**Scaffolding pattern** (mirror the helpers already present in the test file): - -```go -func TestUpdateSetting_EmptyValueIsAccepted(t *testing.T) { - db := setupTestDB(t) // helper already in the test file - h := handlers.NewSettingsHandler(db) - router := setupTestRouter(h) // helper already in the test file - - body := `{"key":"security.crowdsec.enabled","value":""}` - req := httptest.NewRequest(http.MethodPost, "/settings", strings.NewReader(body)) - req.Header.Set("Content-Type", "application/json") - // inject admin role as the existing test helpers do - injectAdminContext(req) // helper pattern used across the file - - w := httptest.NewRecorder() - router.ServeHTTP(w, req) - - assert.Equal(t, http.StatusOK, w.Code, "empty Value must not trigger a 400 validation error") - - var s models.Setting - require.NoError(t, db.Where("key = ?", "security.crowdsec.enabled").First(&s).Error) - assert.Equal(t, "", s.Value) -} -``` - -**Why this test matters:** Gin's `binding:"required"` treats the empty string `""` as a missing -value for `string` fields and returns 400. Without this test, re-adding the tag silently (e.g. by a -future contributor copying the `Key` field's annotation) would regress the fix without any CI -signal. - ---- - -### B. Issue 3 Fix Plan — `frontend/src/pages/Security.tsx` - -**File:** `frontend/src/pages/Security.tsx` -**Affected lines:** ~90 (state block), ~168–228 (crowdsecPowerMutation), ~228–240 (derived vars), -~354–357 (CrowdSecKeyWarning gate), ~413–415 (card Badge), ~418–420 (card icon), ~429–431 (card -body text), ~443 (Switch checked prop). - -#### Change 1 — Derive `crowdsecChecked` from mutation intent - -**Current code block** (lines ~228–232): - -```tsx -const cerberusDisabled = !status.cerberus?.enabled -const crowdsecToggleDisabled = cerberusDisabled || crowdsecPowerMutation.isPending -const crowdsecControlsDisabled = cerberusDisabled || crowdsecPowerMutation.isPending -``` - -**Add immediately before `cerberusDisabled`:** - -```tsx -// During the crowdsecPowerMutation, use the mutation's argument as the authoritative -// checked state. Neither crowdsecStatus (local, stale) nor status.crowdsec.enabled -// (server, not yet invalidated) reflects the user's intent until onSuccess fires. -const crowdsecChecked = crowdsecPowerMutation.isPending - ? (crowdsecPowerMutation.variables ?? (crowdsecStatus?.running ?? status.crowdsec.enabled)) - : (crowdsecStatus?.running ?? status.crowdsec.enabled) -``` - -`crowdsecPowerMutation.variables` holds the `enabled: boolean` argument passed to `mutate()`. When -the user clicks to enable, `variables` is `true`; when they click to disable, it is `false`. This -is the intent variable that must drive the UI. - -#### Change 2 — Replace every occurrence of the raw condition in the CrowdSec card - -There are **six** places in the CrowdSec card (starting at line ~405) that currently read -`(crowdsecStatus?.running ?? status.crowdsec.enabled)`. All must be replaced with `crowdsecChecked`. - -| Location | JSX attribute / expression | Before | After | -|----------|---------------------------|--------|-------| -| Line ~413 | `Badge variant` | `(crowdsecStatus?.running ?? status.crowdsec.enabled) ? 'success' : 'default'` | `crowdsecPowerMutation.isPending && crowdsecPowerMutation.variables ? 'warning' : crowdsecChecked ? 'success' : 'default'` | -| Line ~415 | `Badge text` | `(crowdsecStatus?.running ?? status.crowdsec.enabled) ? t('common.enabled') : t('common.disabled')` | `crowdsecPowerMutation.isPending && crowdsecPowerMutation.variables ? t('security.crowdsec.starting') : crowdsecChecked ? t('common.enabled') : t('common.disabled')` | -| Line ~418 | `div bg class` | `(crowdsecStatus?.running ?? status.crowdsec.enabled) ? 'bg-success/10' : 'bg-surface-muted'` | `crowdsecChecked ? 'bg-success/10' : 'bg-surface-muted'` | -| Line ~420 | `ShieldAlert text class` | `(crowdsecStatus?.running ?? status.crowdsec.enabled) ? 'text-success' : 'text-content-muted'` | `crowdsecChecked ? 'text-success' : 'text-content-muted'` | -| Line ~429 | `CardContent body text` | `(crowdsecStatus?.running ?? status.crowdsec.enabled) ? t('security.crowdsecProtects') : t('security.crowdsecDisabledDescription')` | `crowdsecChecked ? t('security.crowdsecProtects') : t('security.crowdsecDisabledDescription')` | -| Line ~443 | `Switch checked` | `crowdsecStatus?.running ?? status.crowdsec.enabled` | `crowdsecChecked` | - -The `Badge` for the status indicator gets one additional case: the "Starting..." variant. Use -`variant="warning"` (already exists in the Badge component based on other usages in the file). - -#### Change 3 — Suppress `CrowdSecKeyWarning` during mutation - -**Current code** (lines ~353–357): - -```tsx -{/* CrowdSec Key Rejection Warning */} -{status.cerberus?.enabled && (crowdsecStatus?.running ?? status.crowdsec.enabled) && ( - -)} -``` - -**Replace with:** - -```tsx -{/* CrowdSec Key Rejection Warning — suppressed during startup to avoid flashing before bouncer registration completes */} -{status.cerberus?.enabled && !crowdsecPowerMutation.isPending && (crowdsecStatus?.running ?? status.crowdsec.enabled) && ( - -)} -``` - -The only change is `&& !crowdsecPowerMutation.isPending`. This prevents the warning from -rendering during the full 10–60s startup window. - -#### Change 4 — Broadcast "starting" state to the QueryClient cache - -`CrowdSecConfig.tsx` cannot read `crowdsecPowerMutation.isPending` directly — it lives in a -different component tree. The cleanest cross-component coordination mechanism in TanStack Query is -`queryClient.setQueryData` on a synthetic key. This is not an HTTP fetch; no network call occurs. -`CrowdSecConfig.tsx` consumes the value via `useQuery` with a stub `queryFn` and -`staleTime: Infinity`, which means it returns the cache value immediately. - -**Add `onMutate` to `crowdsecPowerMutation`** (insert before `onError` at line ~199): - -```tsx -onMutate: async (enabled: boolean) => { - if (enabled) { - queryClient.setQueryData(['crowdsec-starting'], { isStarting: true, startedAt: Date.now() }) - } -}, -``` - -Note: The `if (enabled)` guard is intentional. The disable path does NOT set `isStartingUp` in CrowdSecConfig.tsx — when disabling CrowdSec, 'LAPI not running' banners are accurate and should not be suppressed. - -**Modify `onError`** — add one line at the beginning of the existing handler body (line ~199): - -```tsx -onError: (err: unknown, enabled: boolean) => { - queryClient.setQueryData(['crowdsec-starting'], { isStarting: false }) - // ...existing error handling unchanged... -``` - -**Modify `onSuccess`** — add one line at the beginning of the existing handler body (line ~205): - -```tsx -onSuccess: async (result: { lapi_ready?: boolean; enabled?: boolean } | boolean) => { - queryClient.setQueryData(['crowdsec-starting'], { isStarting: false }) - // ...existing success handling unchanged... -``` - -The `startedAt` timestamp enables `CrowdSecConfig.tsx` to apply a safety cap: if the cache was -never cleared (e.g., the app crashed mid-mutation), the "is starting" signal expires after 90 -seconds regardless. - ---- - -### C. Issue 3 Fix Plan — `frontend/src/pages/CrowdSecConfig.tsx` - -**File:** `frontend/src/pages/CrowdSecConfig.tsx` -**Affected lines:** ~44 (state block, after `queryClient`), ~582 (LAPI-initializing warning), ~608 -(LAPI-not-running warning). - -#### Change 1 — Read the "starting" signal - -The file already declares `const queryClient = useQueryClient()` (line ~44). Insert immediately -after it: - -```tsx -// Read the "CrowdSec is starting" signal broadcast by Security.tsx via the -// QueryClient cache. No HTTP call is made; this is pure in-memory coordination. -const { data: crowdsecStartingCache } = useQuery<{ isStarting: boolean; startedAt?: number }>({ - queryKey: ['crowdsec-starting'], - queryFn: () => ({ isStarting: false, startedAt: 0 }), - staleTime: Infinity, - gcTime: Infinity, -}) - -// isStartingUp is true only while the mutation is genuinely running. -// The 90-second cap guards against stale cache if Security.tsx onSuccess/onError -// never fired (e.g., browser tab was closed mid-mutation). -const isStartingUp = - (crowdsecStartingCache?.isStarting === true) && - Date.now() - (crowdsecStartingCache.startedAt ?? 0) < 90_000 -``` - -#### Change 2 — Suppress the "LAPI initializing" yellow banner - -**Current condition** (line ~582): - -```tsx -{lapiStatusQuery.data && lapiStatusQuery.data.running && !lapiStatusQuery.data.lapi_ready && initialCheckComplete && ( -``` - -**Replace with:** - -```tsx -{lapiStatusQuery.data && lapiStatusQuery.data.running && !lapiStatusQuery.data.lapi_ready && initialCheckComplete && !isStartingUp && ( -``` - -#### Change 3 — Suppress the "CrowdSec not running" red banner - -**Current condition** (line ~608): - -```tsx -{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && ( -``` - -**Replace with:** - -```tsx -{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && !isStartingUp && ( -``` - -Both suppressions share the same `isStartingUp` flag derived in Change 1. When -`crowdsecPowerMutation` completes (or fails), `isStartingUp` immediately becomes `false`, and the -banners are re-evaluated based on real LAPI state. - ---- - -### D. Issue 3 Fix Plan — `frontend/src/components/CrowdSecKeyWarning.tsx` - -**No changes required to the component itself.** The suppression is fully handled at the call site -in `Security.tsx` (Section B, Change 3 above). The component's own render guard already returns -`null` if `isLoading` or `!keyStatus?.env_key_rejected`, which provides an additional layer of -safety. No new props are needed. - ---- - -### E. i18n Requirements - -#### New key: `security.crowdsec.starting` - -This key drives the CrowdSec card badge text during the startup window. It must be added inside the -`security.crowdsec` namespace object in every locale file. - -**Exact insertion point in each file:** Insert after the `"processStopped"` key inside the -`"crowdsec"` object (line ~252 in `en/translation.json`). - -| Locale file | Key path | Value | -|-------------|----------|-------| -| `frontend/src/locales/en/translation.json` | `security.crowdsec.starting` | `"Starting..."` | -| `frontend/src/locales/de/translation.json` | `security.crowdsec.starting` | `"Startet..."` | -| `frontend/src/locales/es/translation.json` | `security.crowdsec.starting` | `"Iniciando..."` | -| `frontend/src/locales/fr/translation.json` | `security.crowdsec.starting` | `"Démarrage..."` | -| `frontend/src/locales/zh/translation.json` | `security.crowdsec.starting` | `"启动中..."` | - -**Usage in `Security.tsx`:** - -```tsx -t('security.crowdsec.starting') -``` - -No other new i18n keys are required. The `CrowdSecConfig.tsx` changes reuse the existing keys -`t('crowdsecConfig.lapiInitializing')`, `t('crowdsecConfig.notRunning')`, etc. — they are only -suppressed via the `isStartingUp` guard, not replaced. - ---- - -### F. Test Plan - -#### 1. Backend Unit Test (Regression Guard for Issue 4) - -**File:** `backend/internal/api/handlers/settings_handler_test.go` -**Test type:** Go unit test (`go test ./backend/internal/api/handlers/...`) -**Mock requirements:** Uses the existing `setupTestDB` / `setupTestRouter` / `injectAdminContext` -helpers already present in the file. No additional mocks needed. - -| Test name | Assertion | Pass condition | -|-----------|-----------|----------------| -| `TestUpdateSetting_EmptyValueIsAccepted` | POST `{"key":"security.crowdsec.enabled","value":""}` returns HTTP 200 and the DB row has `Value=""` | HTTP 200, no 400 "required" error | -| `TestUpdateSetting_MissingKeyRejected` | POST `{"value":"true"}` (no `key` field) returns HTTP 400 | HTTP 400 — `Key` still requires `binding:"required"` | - -The second test ensures the `binding:"required"` was only removed from `Value`, not accidentally -from `Key` as well. - -#### 2. Frontend RTL Tests - -**Framework:** Vitest + React Testing Library (same as `frontend/src/hooks/__tests__/useSecurity.test.tsx`) - -##### File: `frontend/src/pages/__tests__/Security.crowdsec.test.tsx` (new file) - -**Scaffolding pattern:** Mirror the setup in `useSecurity.test.tsx` — `QueryClientProvider` wrapper, -`vi.mock('../api/crowdsec')`, `vi.mock('../api/settings')`, `vi.mock('../api/security')`, -`vi.mock('../hooks/useSecurity')`. - -| Test name | What is mocked | What is rendered | Assertion | -|-----------|---------------|-----------------|-----------| -| `toggle stays checked while crowdsecPowerMutation is pending` | `startCrowdsec` never resolves (pending promise). `getSecurityStatus` returns `{ cerberus: { enabled: true }, crowdsec: { enabled: false }, ... }`. `statusCrowdsec` returns `{ running: false, pid: 0, lapi_ready: false }`. | `` | Click the CrowdSec toggle → `Switch[data-testid="toggle-crowdsec"]` remains checked (`aria-checked="true"`) while mutation is pending. Without the fix, it would be unchecked. | -| `CrowdSec badge shows "Starting..." while mutation is pending` | Same as above | `` | Click toggle → Badge inside the CrowdSec card contains text "Starting...". | -| `CrowdSecKeyWarning is not rendered while crowdsecPowerMutation is pending` | Same as above. `getCrowdsecKeyStatus` returns `{ env_key_rejected: true, full_key: "abc" }`. | `` | Click toggle → `CrowdSecKeyWarning` (identified by its unique title text or `data-testid` if added) is not present in the DOM. | -| `toggle reflects correct final state after mutation succeeds` | `startCrowdsec` resolves `{ pid: 123, lapi_ready: true }`. `statusCrowdsec` returns `{ running: true, pid: 123, lapi_ready: true }`. | `` | After mutation resolves → toggle is checked, badge shows "Enabled". | -| `toggle reverts to unchecked when mutation fails` | `startCrowdsec` rejects with `new Error("failed")`. | `` | After rejection → toggle is unchecked, badge shows "Disabled". | - -##### File: `frontend/src/pages/__tests__/CrowdSecConfig.crowdsec.test.tsx` (new file) - -**Required mocks:** `vi.mock('../api/crowdsec')`, `vi.mock('../api/security')`, -`vi.mock('../api/featureFlags')`. Seed the `QueryClient` with -`queryClient.setQueryData(['crowdsec-starting'], { isStarting: true, startedAt: Date.now() })` before rendering. - -REQUIRED: Because `initialCheckComplete` is driven by a `setTimeout(..., 3000)` inside a `useEffect`, tests must use Vitest fake timers. Without this, positive-case tests will fail and suppression tests will vacuously pass: - -```ts -beforeEach(() => { - vi.useFakeTimers() -}) -afterEach(() => { - vi.useRealTimers() -}) -// In each test, after render(), advance timers past the 3s guard: -await vi.advanceTimersByTimeAsync(3001) -``` - -| Test name | Setup | Assertion | -|-----------|-------|-----------| -| `LAPI not-running banner suppressed when isStartingUp is true` | `lapiStatusQuery` loaded with `{ running: false, lapi_ready: false }`. `['crowdsec-starting']` cache: `{ isStarting: true, startedAt: Date.now() }`. `initialCheckComplete` timer fires normally. | `[data-testid="lapi-not-running-warning"]` is not present in DOM. | -| `LAPI initializing banner suppressed when isStartingUp is true` | `lapiStatusQuery` loaded with `{ running: true, lapi_ready: false }`. `['crowdsec-starting']` cache: `{ isStarting: true, startedAt: Date.now() }`. | `[data-testid="lapi-warning"]` is not present in DOM. | -| `LAPI not-running banner shows after isStartingUp expires` | `['crowdsec-starting']` cache: `{ isStarting: true, startedAt: Date.now() - 100_000 }` (100s ago, past 90s cap). `lapiStatusQuery` loaded with `{ running: false, lapi_ready: false }`. `initialCheckComplete` = true. | `[data-testid="lapi-not-running-warning"]` is present in DOM. | -| `LAPI not-running banner shows when isStartingUp is false` | `['crowdsec-starting']` cache: `{ isStarting: false }`. `lapiStatusQuery`: `{ running: false, lapi_ready: false }`. `initialCheckComplete` = true. | `[data-testid="lapi-not-running-warning"]` is present in DOM. | - -#### 3. Playwright E2E Tests - -E2E testing for CrowdSec startup UX is constrained because the Docker E2E environment does not have -CrowdSec installed. The mutations will fail immediately, making it impossible to test the "pending" -window with a real startup delay. - -**Recommended approach:** UI-only behavioral tests that mock the mutation pending state at the API -layer (via Playwright route interception), focused on the visible symptoms. - -**File:** `playwright/tests/security/crowdsec-first-enable.spec.ts` (new file) - -| Test title | Playwright intercept | Steps | Assertion | -|------------|---------------------|-------|-----------| -| `CrowdSec toggle stays checked while starting` | Intercept `POST /api/v1/admin/crowdsec/start` — respond after a 2s delay with success | Navigate to `/security`, click CrowdSec toggle | `[data-testid="toggle-crowdsec"]` has `aria-checked="true"` immediately after click (before response) | -| `CrowdSec card shows Starting badge while starting` | Same intercept | Click toggle | A badge with text "Starting..." is visible in the CrowdSec card | -| `CrowdSecKeyWarning absent while starting` | Same intercept; also intercept `GET /api/v1/admin/crowdsec/key-status` → return `{ env_key_rejected: true, full_key: "key123", ... }` | Click toggle | The key-warning alert (ARIA role "alert" with heading "CrowdSec API Key Updated") is not present | -| `Backend rejects empty key for setting` | No intercept | POST `{"key":"security.crowdsec.enabled","value":""}` via `page.evaluate` (or `fetch`) | Response code is 200 | - ---- - -### G. Commit Slicing Strategy - -**Decision:** Single PR (`PR-4`). - -**Rationale:** -- The backend change is a one-line regression test addition — not a fix (the fix is already in). -- The frontend changes are all tightly coupled: the `crowdsecChecked` derived variable feeds both - the toggle fix and the badge fix; the `onMutate` broadcast is consumed by `CrowdSecConfig.tsx`. - Splitting them would produce an intermediate state where `Security.tsx` broadcasts a signal that - nothing reads, or `CrowdSecConfig.tsx` reads a signal that is never set. -- Total file count: 5 files changed (`Security.tsx`, `CrowdSecConfig.tsx`, `en/translation.json`, - `de/translation.json`, `es/translation.json`, `fr/translation.json`, `zh/translation.json`) + - 2 new test files + 1 new test in the backend handler test file. Review surface is small. - -**Files changed:** - -| File | Change type | -|------|-------------| -| `frontend/src/pages/Security.tsx` | Derived state, onMutate, suppression | -| `frontend/src/pages/CrowdSecConfig.tsx` | useQuery cache read, conditional suppression | -| `frontend/src/locales/en/translation.json` | New key `security.crowdsec.starting` | -| `frontend/src/locales/de/translation.json` | New key `security.crowdsec.starting` | -| `frontend/src/locales/es/translation.json` | New key `security.crowdsec.starting` | -| `frontend/src/locales/fr/translation.json` | New key `security.crowdsec.starting` | -| `frontend/src/locales/zh/translation.json` | New key `security.crowdsec.starting` | -| `frontend/src/pages/__tests__/Security.crowdsec.test.tsx` | New RTL test file | -| `frontend/src/pages/__tests__/CrowdSecConfig.crowdsec.test.tsx` | New RTL test file | -| `backend/internal/api/handlers/settings_handler_test.go` | 2 new test functions | -| `playwright/tests/security/crowdsec-first-enable.spec.ts` | New E2E spec file | - -**Rollback:** The PR is independently revertable. No database migrations. No API contract changes. -The `['crowdsec-starting']` QueryClient key is ephemeral (in-memory only); removing the PR removes -the key cleanly. - ---- - -### H. Acceptance Criteria - -| # | Criterion | How to verify | -|---|-----------|---------------| -| 1 | POST `{"key":"any.key","value":""}` returns HTTP 200 | `TestUpdateSetting_EmptyValueIsAccepted` passes | -| 2 | CrowdSec toggle shows the user's intended state immediately after click, for the full pending duration | RTL test `toggle stays checked while crowdsecPowerMutation is pending` passes | -| 3 | CrowdSec card badge shows "Starting..." text while mutation is pending | RTL test `CrowdSec badge shows Starting... while mutation is pending` passes | -| 4 | `CrowdSecKeyWarning` is not rendered while `crowdsecPowerMutation.isPending` | RTL test `CrowdSecKeyWarning is not rendered while crowdsecPowerMutation is pending` passes | -| 5 | LAPI "not running" red banner absent on `CrowdSecConfig` while `isStartingUp` is true | RTL test `LAPI not-running banner suppressed when isStartingUp is true` passes | -| 6 | LAPI "initializing" yellow banner absent on `CrowdSecConfig` while `isStartingUp` is true | RTL test `LAPI initializing banner suppressed when isStartingUp is true` passes | -| 7 | Both banners reappear correctly after the 90s cap or after mutation completes | RTL test `LAPI not-running banner shows after isStartingUp expires` passes | -| 8 | Translation key `security.crowdsec.starting` exists in all 5 locale files | CI lint / i18n-check passes | -| 9 | Playwright: toggle does not flicker on click (stays `aria-checked="true"` during delayed API response) | E2E test `CrowdSec toggle stays checked while starting` passes | -| 10 | No regressions in existing `useSecurity.test.tsx` or other security test suites | Full Vitest suite green | - ---- - -### I. Commit Message - -``` -fix(frontend): stabilize CrowdSec first-enable UX and guard empty-value regression - -When CrowdSec is first enabled, the 10–60 second startup window caused the -toggle to immediately flicker back to unchecked, the card badge to show -"Disabled" throughout startup, the CrowdSecKeyWarning to flash before bouncer -registration completed, and CrowdSecConfig to show alarming LAPI-not-ready -banners at the user. - -Root cause: the toggle, badge, and warning conditions all read from stale -sources (crowdsecStatus local state and status.crowdsec.enabled server data), -neither of which reflects user intent during a pending mutation. - -Derive a crowdsecChecked variable from crowdsecPowerMutation.variables during -the pending window so the UI reflects intent, not lag. Suppress -CrowdSecKeyWarning unconditionally while the mutation is pending. Show a -"Starting..." badge variant (warning) during startup. - -Coordinate the "is starting" state to CrowdSecConfig.tsx via a synthetic -QueryClient cache key ['crowdsec-starting'] set in onMutate and cleared in -onSuccess/onError. CrowdSecConfig reads this key via useQuery and uses it to -suppress the LAPI-not-running and LAPI-initializing alerts during startup. -A 90-second safety cap prevents stale suppression if the mutation never resolves. - -Also add a regression test confirming that UpdateSettingRequest accepts an empty -string Value (the binding:"required" tag was removed in PR-1; this test ensures -it is not re-introduced). - -Adds security.crowdsec.starting i18n key to all 5 supported locales. - -Closes issue 3, closes issue 4 (regression test only, backend fix in PR-1). -``` - ---- - -## PR-5: TCP Monitor UX — Issue 5 - -**Title:** fix(frontend): correct TCP monitor URL placeholder and add per-type UX guidance -**Issue Resolved:** Issue 5 — "monitor tcp port ui can't add" -**Classification:** CONFIRMED BUG -**Dependencies:** None (independent of all other PRs) -**Status:** READY FOR IMPLEMENTATION - ---- - -### 1. Introduction - -A user attempting to create a TCP uptime monitor follows the URL input placeholder text, -which currently reads `"https://example.com or tcp://host:port"`. They enter -`tcp://192.168.1.1:8080` as instructed and submit the form. The creation silently fails -with an HTTP 500 error. The form provides no guidance on why — no inline error, no revised -placeholder, no helper text. - -This plan specifies four layered fixes ranging from the single-line minimum viable fix to -comprehensive per-type UX guidance and client-side validation. All four fixes belong in a -single small PR. - ---- - -### 2. Research Findings - -#### 2.1 Root Cause: `net.SplitHostPort` Chokes on the `tcp://` Scheme - -`net.SplitHostPort` in Go's standard library splits `"host:port"` on the last colon. When the -host portion itself contains a colon (as `"tcp://192.168.1.1"` does after the `://`), Go's -implementation returns `"too many colons in address"`. - -Full trace: - -``` -frontend: placeholder says "tcp://host:port" - ↓ -user enters: tcp://192.168.1.1:8080 - ↓ -POST /api/v1/uptime/monitors { name: "…", url: "tcp://192.168.1.1:8080", type: "tcp", … } - ↓ -uptime_handler.go:42 c.ShouldBindJSON(&req) → OK (all required fields present) - ↓ -uptime_service.go:1107 url.Parse("tcp://192.168.1.1:8080") → OK (valid URL, scheme=tcp) - ↓ -uptime_service.go:1122 net.SplitHostPort("tcp://192.168.1.1:8080") - → host = "tcp://192.168.1.1" ← contains a colon (the : after tcp) - → Go: byteIndex(host, ':') >= 0 → return addrErr(…, "too many colons in address") - ↓ -uptime_service.go:1123 returns fmt.Errorf("TCP URL must be in host:port format: %w", err) - ↓ -uptime_handler.go:51 c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()}) - → HTTP 500 {"error":"TCP URL must be in host:port format: address tcp://192.168.1.1:8080: too many colons in address"} - ↓ -frontend onError: toast.error("TCP URL must be in host:port format: …") - → user sees a cryptic toast, form stays open, no inline guidance -``` - -`net.SplitHostPort` accepts **only** the bare `host:port` form. Valid examples: -`192.168.1.1:8080`, `myserver.local:22`, `[::1]:8080` (IPv6). The scheme prefix `tcp://` -is not valid and is never stripped by the backend. - -#### 2.2 The Unit Test Pre-Empted the Fix But It Was Never Applied - -`frontend/src/pages/__tests__/Uptime.test.tsx` at **line 40** already mocks: -```typescript -'uptime.urlPlaceholder': 'https://example.com or host:port', -``` -The test was written with the corrected value in mind. The actual translation file was -never updated to match. This is a direct discrepancy that CI does not catch because the -unit test mocks the i18n layer — it never reads the real translation file. - -#### 2.3 State of All Locale Files - -`frontend/src/locales/en/translation.json` is the **only** locale file that contains the -newer uptime keys added when the Create Monitor modal was built. The other four locales -(de, fr, es, zh) all fall through to the English key values for these keys (i18next -fallback). The `urlPlaceholder` key does not exist in any non-English locale — they -inherit the English value verbatim, which is the broken one. - -Missing keys across **all** non-English locales (de, fr, es, zh): - -``` -uptime.addMonitor uptime.syncWithHosts uptime.createMonitor -uptime.monitorCreated uptime.syncComplete uptime.syncing -uptime.monitorType uptime.monitorUrl uptime.monitorTypeHttp -uptime.monitorTypeTcp uptime.urlPlaceholder uptime.autoRefreshing -uptime.noMonitorsFound uptime.triggerHealthCheck uptime.monitorSettings -uptime.failedToDeleteMonitor uptime.failedToUpdateMonitor -uptime.failedToTriggerCheck uptime.pendingFirstCheck uptime.last60Checks -uptime.unpaused uptime.deleteConfirmation uptime.loadingMonitors -``` - -**PR-5 addresses all new keys added by this PR** (see §5.1 below). The broader backfill of -missing non-English uptime keys is a separate i18n debt task outside PR-5's scope. - -#### 2.4 Component Architecture — Exact Locations in `Uptime.tsx` - -The `CreateMonitorModal` component occupies lines 342–476 of -`frontend/src/pages/Uptime.tsx`. - -Key lines within the component: - -| Line | Code | -|------|------| -| 342 | `const CreateMonitorModal: FC<{ onClose: () => void; t: (key: string) => string }> = ({ onClose, t }) => {` | -| 345 | `const [url, setUrl] = useState('');` | -| 346 | `const [type, setType] = useState<'http' \| 'tcp'>('http');` | -| 362 | `if (!name.trim() \|\| !url.trim()) return;` (`handleSubmit` guard) | -| 363 | `mutation.mutate({ name: name.trim(), url: url.trim(), type, interval, max_retries: maxRetries });` | -| 395–405 | URL `