Files
Charon/docs/plans/archive/ssrf-remediation.md
2026-02-19 16:34:10 +00:00

7.1 KiB
Raw Blame History

SSRF (go/request-forgery) Remediation — Supervisor-Aligned Plan

Status: Planning (active)

This plan remediates CodeQLs go/request-forgery (CWE-918 / SSRF) in the Go backend by hardening remaining outbound network call sites and removing blanket suppressions.

Constraints (Supervisor requirements)

  • Deny-by-default: Only allow outbound targets that are explicitly allowed by policy.
  • Docker/service-name deployments: Internal dependencies (notably CrowdSec LAPI and Caddy Admin) may be reached via Docker service names (e.g., http://crowdsec:8085, http://caddy:2019), not just localhost. The plan must support this safely via explicit allowlists.
  • No mergeable intermediate state: The global CodeQL exclusion for go/request-forgery in .github/codeql/codeql-config.yml must be removed in the same PR as the fixes.
  • Ignore proxy environment variables: SSRF-sensitive clients must not inherit HTTP_PROXY / HTTPS_PROXY / NO_PROXY behavior.
  • Redirect policy must be explicit: Each call site must either disable redirects or validate each hop against the same SSRF policy.
  • Keep TCP dialing scope minimal: Only touch TCP-dial endpoints if they are directly required for go/request-forgery remediation or explicitly called out by findings.

In-scope call sites (confirmed)

These are the known SSRF-relevant sinks discovered during recon:

Existing primitives (use, dont duplicate)

Target security posture

Default outbound policy (external)

  • Allow https only by default (allow http only where strictly needed).
  • Reject userinfo in URLs.
  • Block private/reserved/link-local/loopback/multicast ranges for both IPv4/IPv6.
  • Validate via DNS resolution and re-validate at connect time (DNS rebinding defense).

Internal service policy (explicit allowlist)

Some internal integrations must target non-public addresses in Docker or localhost scenarios. These are allowed only if:

  • The destination hostname is in an explicit allowlist (exact matches; no wildcards).
  • The port matches the expected service port.
  • Redirects are disabled (recommended) or strictly hop-validated.
  • Proxy env vars are ignored.

Implementation direction (plan-level):

  • Introduce a single internal-service allowlist configuration surface (e.g., an env-backed list) used by both CrowdSec and Caddy admin client validation.
  • Keep defaults strict: allow localhost/127.0.0.1/::1 only, and require explicit config to add Docker service hostnames (crowdsec, caddy).

Redirect policy (per call site)

  • Uptime monitor HTTP checks: redirects should be disabled.
  • CrowdSec LAPI calls: redirects should be disabled.
  • Caddy Admin API calls: redirects should be disabled.
  • URL connectivity test (url_testing.go): keep the current explicit redirect validation behavior (hop-validated) or disable redirects; either is acceptable if CodeQL sees a safe path.

Proxy env var policy

  • All SSRF-sensitive http.Transport instances used by the above call sites must set Proxy: nil (or otherwise guarantee proxy env vars are not consulted).
  • backend/internal/utils/url_testing.go already demonstrates this; use it as a precedent to standardize behavior.

Phases

Phase 0 — Recon (local-only, not mergeable)

  • Temporarily remove the go/request-forgery exclusion locally in .github/codeql/codeql-config.yml to surface the full finding list.
  • Run VS Code task: shell: Security: CodeQL Go Scan (CI-Aligned) [~60s].
  • Record findings with file/function mapping.

Exit criteria: A concrete list of findings drives Phase 1+. This phase alone must not be merged.

Phase 1 — Remediate uptime monitor outbound HTTP

  • Update the uptime HTTP/HTTPS path to:
    • Validate the URL via ValidateExternalURL (and use the validated result, not the raw input).
    • Use a safe HTTP client (dial-time checks) with:
      • proxy env vars ignored
      • redirects disabled

Acceptance criteria: CodeQL no longer flags the uptime monitor request path.

Phase 2 — Remediate CrowdSec handler direct LAPI calls

  • Replace plain http.Client usage in:
    • (*CrowdsecHandler).GetLAPIDecisions
    • (*CrowdsecHandler).CheckLAPIHealth

With:

  • strict validation of the configured LAPI base URL using the internal-service allowlist policy
  • safe URL construction (net/url for query strings)
  • safe HTTP client configuration:
    • redirects disabled
    • proxy env vars ignored

Acceptance criteria: CodeQL no longer flags these requests; behavior works for both localhost and Docker service-name deployments when explicitly allowlisted.

Phase 3 — Remediate Caddy admin client base URL and HTTP client

  • Validate adminAPIURL in caddy.NewClient using the internal-service allowlist policy.
  • Use a safe HTTP client:
    • redirects disabled
    • proxy env vars ignored
    • dial-time validation enabled

Acceptance criteria: Admin API client cannot be pointed at arbitrary destinations.

Phase 4 — Remove suppressions (atomic with phases 13)

  • Remove the global query filter exclusion for go/request-forgery from .github/codeql/codeql-config.yml.
  • Remove any inline suppressions related to go/request-forgery that are no longer required (notably any lgtm [go/request-forgery] style comments).

Acceptance criteria: CodeQL Go scan passes without the global exclusion.

Tests

  • Add deterministic unit tests for:
    • URL validation rules in ValidateExternalURL
    • redirect behavior (disabled vs hop-validated) where applicable
    • internal-service allowlist behavior (localhost + explicit service-name allow)

Do not rely on real DNS/network. Prefer httptest.Server and validation-only failure cases.

Definition of Done (DoD)

  • VS Code task: shell: Lint: Pre-commit (All Files) passes.
  • VS Code task: shell: Test: Backend with Coverage passes and backend coverage ≥ 85%.
  • VS Code task: shell: Security: CodeQL Go Scan (CI-Aligned) [~60s] passes with go/request-forgery enabled (no global exclusion).
  • Proxy env var behavior is deterministic for SSRF-sensitive clients.
  • Redirect behavior is explicit per call site (disabled or hop-validated).