Proxy TLS & IP Login Recovery Plan ================================== Context - Proxy hosts return ERR_SSL_PROTOCOL_ERROR after container build succeeds; TLS handshake likely broken in generated Caddy config or certificate provisioning. - Charon login fails with “invalid credentials” when UI is accessed via raw IP/port; likely cookie or header handling across HTTP/non-SNI scenarios. - Security scans can wait until connectivity and login paths are stable. Goals - Restore HTTPS/HTTP reachability for proxy hosts and admin UI without TLS protocol errors. - Make login succeed when using IP:port access while preserving secure defaults for domain-based HTTPS. - Keep changes minimal per request; batch verification runs. Phase 1 — Fast Repro & Evidence (single command batch) - Build is running remotely; use the deployed host [http://100.98.12.109:8080](http://100.98.12.109:8080) (not localhost) for repro. If HTTPS is exposed, also probe [https://100.98.12.109](https://100.98.12.109). - Capture logs remotely: docker logs (Caddy + Charon) to logs/build/proxy-ssl.log and logs/build/login-ip.log on the remote node. - From the remote container, fetch live Caddy config: curl [http://127.0.0.1:2019/config](http://127.0.0.1:2019/config) > logs/build/caddy-live.json. - Snapshot TLS handshake from a reachable vantage point: openssl s_client -connect 100.98.12.109:443 -servername {first-proxy-domain} -tls1_2 to capture protocol/alert. Phase 2 — Diagnose ERR_SSL_PROTOCOL_ERROR in Caddy pipeline - Inspect generation path: [backend/internal/caddy/manager.go](backend/internal/caddy/manager.go) ApplyConfig → GenerateConfig; ensure ACME email/provider/flags are loaded from settings. - Review server wiring: [backend/internal/caddy/config.go](backend/internal/caddy/config.go) sets servers to listen on :80/:443 with AutoHTTPS enabled. Check whether hosts with IP literals are being treated like domain names (Caddy cannot issue ACME for IP; may yield protocol alerts). - Inspect per-host TLS inputs: models.ProxyHost.CertificateID/Certificate.Provider (custom vs ACME), DomainNames normalization, and AdvancedConfig WAF handlers that might inject broken handlers. - Validate stored config at runtime: data/caddy/caddy.json (if persisted) vs live admin API to see if TLS automation policies or certificates are missing. - Verify entrypoint sequencing: [docker-entrypoint.sh](docker-entrypoint.sh) seeds empty Caddy config then relies on charon to push config; ensure ApplyConfig runs before first request. Phase 3 — Plan fixes for TLS/HTTPS reachability - Add IP-aware TLS handling in [backend/internal/caddy/config.go](backend/internal/caddy/config.go): detect hosts whose DomainNames are IPs; for those, set explicit HTTP listener only or `tls internal` to avoid failed ACME, and skip AutoHTTPS redirect for IP-only sites. - Add guardrails/tests: extend [backend/internal/caddy/config_test.go](backend/internal/caddy/config_test.go) with a table case for IP hosts (expects HTTP route present, no AutoHTTPS redirect, optional internal TLS when requested). - If admin UI also rides on :443, consider a fallback self-signed cert for bare IP by injecting a static certificate loader (same file) or disabling redirect when no hostname SNI is present. - Re-apply config through [backend/internal/caddy/manager.go](backend/internal/caddy/manager.go) and confirm via admin API; ensure rollback still works if validation fails. Phase 4 — Diagnose login failures on IP:port - Backend cookie issuance: [backend/internal/api/handlers/auth_handler.go](backend/internal/api/handlers/auth_handler.go) `setSecureCookie` forces `Secure` when CHARON_ENV=production; on HTTP/IP this prevents cookie storage → follow-up /auth/me returns 401, surfaced as “Login failed/invalid credentials”. - Request-aware secure flag: derive `Secure` from request scheme or `X-Forwarded-Proto`, and relax SameSite to Lax for forward_auth flows; keep Strict for HTTPS hostnames. - Auth flow: [backend/internal/services/auth_service.go](backend/internal/services/auth_service.go) handles credentials; [backend/internal/api/middleware/auth.go](backend/internal/api/middleware/auth.go) accepts cookie/Authorization/query token. Ensure fallback to Authorization header using login response token when cookie is absent (IP/HTTP). - Frontend: [frontend/src/api/client.ts](frontend/src/api/client.ts) uses withCredentials; [frontend/src/pages/Login.tsx](frontend/src/pages/Login.tsx) currently ignores returned token. Add optional storage/Authorization injection when cookie not set (feature-flagged), and surface clearer error when /auth/me fails post-login. - Security headers: review [backend/internal/api/middleware/security_headers.go](backend/internal/api/middleware/security_headers.go) (HSTS/CSP) to ensure HTTP over IP is not force-upgraded to HTTPS unexpectedly during troubleshooting. Phase 5 — Validation & Regression - Unit tests: add table-driven cases for setSecureCookie in auth handler (HTTP vs HTTPS, IP vs hostname) and AuthMiddleware behavior when token is supplied via header instead of cookie. - Caddy config tests: ensure IP host generation passes validation and does not emit duplicate routes or ghost hosts. - Frontend tests: extend [frontend/src/pages/__tests__/Login.test.tsx](frontend/src/pages/__tests__/Login.test.tsx) to cover the no-cookie fallback path. - Manual: rerun "Go: Build Backend", `npm run build`, task "Build & Run Local Docker", then verify login via IP:8080 and HTTPS domain, and re-run a narrow Caddy integration test if available (e.g., "Coraza: Run Integration Go Test"). Phase 6 — Hygiene (.gitignore / .dockerignore / .codecov.yml / Dockerfile) - .gitignore: add frontend/.cache, frontend/.eslintcache, data/geoip/ (downloaded in Dockerfile), and backend/.vscode/ if it appears locally. - .dockerignore: mirror the new ignores (frontend/.cache, frontend/.eslintcache, data/geoip/) to keep context slim; keep docs exclusions as-is. - .codecov.yml: reconsider excluding backend/cmd/api/** if we touch startup or ApplyConfig wiring so coverage reflects new logic. - Dockerfile: after TLS/login fixes, assess adding a healthcheck or a post-start verification curl to :2019 and :8080; keep current multi-stage caching intact. Exit Criteria - Proxy hosts and admin UI respond over HTTP/HTTPS without ERR_SSL_PROTOCOL_ERROR; TLS handshake succeeds for domain hosts, HTTP works for IP-only access. - Login succeeds via IP:port and via domain/HTTPS; cookies or header-based fallback maintain session across /auth/me. - Updated ignore lists prevent new artifacts from leaking; coverage targets remain achievable after test additions. Build Failure & Security Scan Battle Plan ========================================= Phasing principle: collapse the effort into the fewest high-signal requests by batching commands (backend + frontend + container + scans) and only re-running the narrowest slice after each fix. Keep evidence artifacts for every step. Phase 1 — Reproduce and Capture the Failure (single pass) - Run the workspace tasks in this order to get a complete signal stack: "Go: Build Backend", then "Frontend: Type Check", then `npm run build` inside frontend (captures Vite/React errors near [frontend/src/main.tsx](frontend/src/main.tsx) and `App`), then "Build & Run Local Docker" to surface multi-stage Dockerfile issues. - Preserve raw outputs to `logs/build/`: backend (`backend/build.log`), frontend (`frontend/build.log`), docker (`docker/build.log`). If a stage fails, stop and annotate the failing command, module, and package. - If Docker fails before build, try `docker build --progress=plain --no-cache` once to expose failing layer context (Caddy build, Golang, or npm). Keep the resulting layer logs. Phase 2 — Backend Compilation & Test Rehab (one request) - Inspect error stack for the Go layer; focus on imports and CGO flags in [backend/cmd/api/main.go](backend/cmd/api/main.go) and router bootstrap [backend/internal/server/server.go](backend/internal/server/server.go). - If module resolution fails, run "Go: Mod Tidy (Backend)" once, then re-run "Go: Build Backend"; avoid extra tidies to limit churn. - If CGO/SQLite headers are missing, verify `apk add --no-cache gcc musl-dev sqlite-dev` step in Dockerfile backend-builder stage; mirror locally via `apk add` or `sudo apt-get` equivalents depending on host env. - Run "Go: Test Backend" (or narrower `go test ./internal/...` if failure is localized) to ensure handlers (e.g., `routes.Register`, `handlers.CheckMountedImport`) still compile after fixes; capture coverage deltas if touched. Phase 3 — Frontend Build & Type Discipline (one request) - If type-check passes but build fails, inspect Vite config and rollup native skip flags in Dockerfile frontend-builder; cross-check `npm_config_rollup_skip_nodejs_native` and `ROLLUP_SKIP_NODEJS_NATIVE` envs. - Validate entry composition in [frontend/src/main.tsx](frontend/src/main.tsx) and any failing component stack (e.g., `ThemeProvider`, `App`). Run `npm run lint -- --fix` only after root cause is understood to avoid masking errors. - Re-run `npm run build` only after code fixes; stash bundle warnings for later size/security audits. Phase 4 — Container Build Reliability (one request) - Reproduce Docker failure with `--progress=plain`; pinpoint failing stage: `frontend-builder` (npm ci/build), `backend-builder` (xx-go build of `cmd/api`), or `caddy-builder` (xcaddy patch loop). - If failure is in Caddy patch block, test with a narrowed build arg (e.g., `--build-arg CADDY_VERSION=2.10.2`) and confirm the fallback path works. Consider pinning quic-go/expr/smallstep versions if Renovate lagged. - Verify entrypoint expectations in [docker-entrypoint.sh](docker-entrypoint.sh) align with built assets (`/app/frontend/dist`, `/app/charon`). Ensure symlink `cpmp` creation does not fail when `/app` is read-only. Phase 5 — CodeQL Scan & Triage (single run, then focused reruns) - Execute "Run CodeQL Scan (Local)" task once the code builds. Preserve SARIF to `codeql-agent-results/` and convert critical findings into issues. - Triage hotspots: server middleware (`RequestID`, `RequestLogger`, `Recovery`), auth handlers under `internal/api/handlers`, and config loader `internal/config`. Prioritize SQL injections, path traversal in `handlers.CheckMountedImport`, and logging of secrets. - After fixes, re-run only the affected language pack (Go or JS) to minimize cycle time; attach SARIF diff to the plan. Phase 6 — Trivy Image Scan & Triage (single run) - After a successful Docker build (`charon:local`), run "Run Trivy Scan (Local)". Persist report in `.trivy_logs/trivy-report.txt` (already ignored). - Bucket findings: base image vulns (alpine), Caddy plugins, CrowdSec bundle, Go binary CVEs. Cross-check with Dockerfile upgrade levers (`CADDY_VERSION`, `CROWDSEC_VERSION`, `golang:1.25.5-alpine`). - For OS-level CVEs, prefer `apk --no-cache upgrade` (already present) and version bumps; for Go deps, adjust go.mod and rebuild. Phase 7 — Coverage & Quality Gates - Ensure Codecov target (85%) still reachable; if exclusions are too broad (e.g., entire `backend/cmd/api`), reassess in [.codecov.yml](.codecov.yml) after fixes to keep new logic covered. - If new backend logic lands in handlers or middleware, add table-driven tests under `backend/internal/api/...` to keep coverage from regressing. Phase 8 — Hygiene Checks (.gitignore, .dockerignore, Dockerfile, Codecov) - .gitignore: consider adding `frontend/.cache/` and `backend/.vscode/` artifacts if they appear during debugging; keep `.trivy_logs/` already present. - .dockerignore: keep build context lean; add `frontend/.cache/`, `backend/.vscode/`; `codeql-results*.sarif` is already excluded. Ensure `docs/` exclusion is acceptable (only README/CONTRIBUTING/LICENSE kept) so Docker builds stay small. - .codecov.yml: exclusions already cover e2e/integration and configs; if we add security helpers, avoid excluding them to keep visibility. Review whether ignoring `backend/cmd/api/**` is desired; we may want to include it if main wiring changes. - Dockerfile: if builds fail due to xcaddy patch drift, add guard logs or split the patch block into a script under `scripts/` for clearer diffing. Consider caching npm and go modules via `--mount=type=cache` already present; avoid expanding build args further to limit attack surface. Exit Criteria - All four commands succeed in sequence: "Go: Build Backend", `npm run build`, `docker build` (local multi-stage), "Run CodeQL Scan (Local)", and "Run Trivy Scan (Local)" on `charon:local`. - Logs captured and linked; actionable items opened for any CodeQL/Trivy HIGH/CRITICAL. - No new untracked artifacts thanks to updated ignore lists.