Refactor security headers workflow and improve user feedback

- Removed the Badge component displaying the preset type in SecurityHeaders.tsx for a cleaner UI.
- Added a detailed analysis of the "Apply Preset" workflow, highlighting user confusion and its root causes.
- Proposed fixes to improve the user experience, including clearer toast messages, loading indicators, and better naming for profile sections.
- Documented the complete workflow trace for applying security header presets, emphasizing the need for per-host assignment.

@@ -1,124 +0,0 @@

Proxy TLS & IP Login Recovery Plan
==================================

Context

- Proxy hosts return ERR_SSL_PROTOCOL_ERROR even though the container build succeeds; the TLS handshake is likely broken by the generated Caddy config or by certificate provisioning.
- Charon login fails with “invalid credentials” when the UI is accessed via a raw IP and port; the likely cause is cookie or header handling in plain-HTTP or non-SNI scenarios.
- Security scans can wait until the connectivity and login paths are stable.

Goals

- Restore HTTPS/HTTP reachability for proxy hosts and the admin UI without TLS protocol errors.
- Make login succeed over IP:port access while preserving secure defaults for domain-based HTTPS.
- Keep changes minimal per request; batch verification runs.

Phase 1 — Fast Repro & Evidence (single command batch)

- The build runs remotely, so reproduce against the deployed host [http://100.98.12.109:8080](http://100.98.12.109:8080) rather than localhost. If HTTPS is exposed, also probe [https://100.98.12.109](https://100.98.12.109).
- Capture logs on the remote node: save `docker logs` output (Caddy + Charon) to `logs/build/proxy-ssl.log` and `logs/build/login-ip.log`.
- From inside the remote container, fetch the live Caddy config: `curl http://127.0.0.1:2019/config/ > logs/build/caddy-live.json`.
- Snapshot the TLS handshake from a reachable vantage point with `openssl s_client -connect 100.98.12.109:443 -servername {first-proxy-domain} -tls1_2` to capture the negotiated protocol or the alert.

Phase 2 — Diagnose ERR_SSL_PROTOCOL_ERROR in Caddy pipeline

- Inspect the generation path: [backend/internal/caddy/manager.go](backend/internal/caddy/manager.go) `ApplyConfig` → `GenerateConfig`; ensure the ACME email, provider, and flags are loaded from settings.
- Review the server wiring: [backend/internal/caddy/config.go](backend/internal/caddy/config.go) sets servers to listen on :80/:443 with AutoHTTPS enabled. Check whether hosts with IP literals are being treated like domain names (Caddy cannot issue an ACME certificate for an IP, which may yield protocol alerts).
- Inspect per-host TLS inputs: `models.ProxyHost.CertificateID` / `Certificate.Provider` (custom vs ACME), `DomainNames` normalization, and `AdvancedConfig` WAF settings that might inject broken handlers.
- Validate the stored config at runtime: compare `data/caddy/caddy.json` (if persisted) with the live admin API to see whether TLS automation policies or certificates are missing.
- Verify entrypoint sequencing: [docker-entrypoint.sh](docker-entrypoint.sh) seeds an empty Caddy config, then relies on Charon to push the real config; ensure `ApplyConfig` runs before the first request.

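To answer the IP-literal question quickly, each host can be classified with `net.ParseIP`. This is a minimal sketch, assuming `DomainNames` is stored as a comma-separated string (check `models.ProxyHost` for the real format); `splitHosts` is a hypothetical name. Any host that lands in the IP bucket is one that Caddy should not attempt ACME issuance for.

```go
package main

import (
	"net"
	"strings"
)

// splitHosts is a hypothetical helper that separates a DomainNames value into
// DNS names and IP literals. Assumption: DomainNames is comma-separated.
func splitHosts(domainNames string) (names, ips []string) {
	for _, raw := range strings.Split(domainNames, ",") {
		host := strings.ToLower(strings.TrimSpace(raw))
		if host == "" {
			continue // tolerate stray commas
		}
		// Accept bracketed IPv6 literals such as "[::1]".
		candidate := strings.TrimSuffix(strings.TrimPrefix(host, "["), "]")
		if ip := net.ParseIP(candidate); ip != nil {
			ips = append(ips, ip.String()) // canonical form, e.g. "::1"
			continue
		}
		names = append(names, host)
	}
	return names, ips
}
```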
Phase 3 — Plan fixes for TLS/HTTPS reachability

- Add IP-aware TLS handling in [backend/internal/caddy/config.go](backend/internal/caddy/config.go): detect hosts whose `DomainNames` are IP literals; for those, configure an explicit HTTP-only listener or `tls internal` to avoid failed ACME attempts, and skip the AutoHTTPS redirect for IP-only sites.
- Add guardrails/tests: extend [backend/internal/caddy/config_test.go](backend/internal/caddy/config_test.go) with a table case for IP hosts (expecting an HTTP route, no AutoHTTPS redirect, and internal TLS only when requested).
- If the admin UI is also served on :443, consider a fallback self-signed certificate for the bare IP, either by injecting a static certificate loader (in the same file) or by disabling the redirect when no SNI hostname is present.
- Re-apply the config through [backend/internal/caddy/manager.go](backend/internal/caddy/manager.go) and confirm it via the admin API; ensure rollback still works if validation fails.

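One way to express "skip AutoHTTPS for IP-only sites" is the `skip` list in the `automatic_https` object of a Caddy HTTP server's JSON config, which exempts the listed hosts from both certificate management and the HTTP-to-HTTPS redirect. The sketch assumes the generator builds server JSON from Go structs; `automaticHTTPS`, `autoHTTPSFor`, and `autoHTTPSJSON` are hypothetical, and only the `automatic_https`/`skip` field names come from Caddy. The IP hosts would still need a route on an explicit :80 listener, which is not shown.

```go
package main

import (
	"encoding/json"
	"net"
)

// automaticHTTPS is a hypothetical Go mirror of just the `skip` field of the
// `automatic_https` object on a Caddy HTTP server
// (apps.http.servers.<name>.automatic_https).
type automaticHTTPS struct {
	// Hosts listed here get neither managed certificates nor HTTP->HTTPS redirects.
	Skip []string `json:"skip,omitempty"`
}

// autoHTTPSFor keeps DNS names under AutoHTTPS and lists IP-literal hosts in
// `skip`, so Caddy does not attempt ACME or redirect for them.
func autoHTTPSFor(hosts []string) automaticHTTPS {
	var cfg automaticHTTPS
	for _, h := range hosts {
		if net.ParseIP(h) != nil {
			cfg.Skip = append(cfg.Skip, h)
		}
	}
	return cfg
}

// autoHTTPSJSON renders the block for splicing into the generated server config.
func autoHTTPSJSON(hosts []string) (string, error) {
	b, err := json.Marshal(autoHTTPSFor(hosts))
	return string(b), err
}
```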
Phase 4 — Diagnose login failures on IP:port

- Backend cookie issuance: in [backend/internal/api/handlers/auth_handler.go](backend/internal/api/handlers/auth_handler.go), `setSecureCookie` forces `Secure` when `CHARON_ENV=production`. Over plain HTTP on an IP, the browser then refuses to store the cookie, so the follow-up `/auth/me` call returns 401, which the UI surfaces as “Login failed/invalid credentials”.
- Request-aware secure flag: derive `Secure` from the request scheme or `X-Forwarded-Proto`, and relax `SameSite` to `Lax` for forward_auth flows; keep `Strict` for HTTPS hostnames.
- Auth flow: [backend/internal/services/auth_service.go](backend/internal/services/auth_service.go) handles credentials; [backend/internal/api/middleware/auth.go](backend/internal/api/middleware/auth.go) accepts a token from the cookie, the Authorization header, or a query parameter. When the cookie is absent (IP/HTTP), ensure the client can fall back to sending the login response token in the Authorization header.
- Frontend: [frontend/src/api/client.ts](frontend/src/api/client.ts) uses `withCredentials`; [frontend/src/pages/Login.tsx](frontend/src/pages/Login.tsx) currently ignores the returned token. Add optional, feature-flagged token storage and Authorization header injection for when the cookie is not set, and surface a clearer error when `/auth/me` fails after login.
- Security headers: review [backend/internal/api/middleware/security_headers.go](backend/internal/api/middleware/security_headers.go) (HSTS/CSP) to ensure plain HTTP over IP is not unexpectedly force-upgraded to HTTPS during troubleshooting.

Phase 5 — Validation & Regression

- Unit tests: add table-driven cases for `setSecureCookie` in the auth handler (HTTP vs HTTPS, IP vs hostname) and for `AuthMiddleware` behavior when the token is supplied via header instead of cookie.
- Caddy config tests: ensure config generation for IP hosts passes validation and does not emit duplicate routes or ghost hosts.
- Frontend tests: extend [frontend/src/pages/__tests__/Login.test.tsx](frontend/src/pages/__tests__/Login.test.tsx) to cover the no-cookie fallback path.
- Manual: rerun "Go: Build Backend", `npm run build`, and the "Build & Run Local Docker" task; then verify login via IP:8080 and via the HTTPS domain, and re-run a narrow Caddy integration test if available (e.g., "Coraza: Run Integration Go Test").

Phase 6 — Hygiene (.gitignore / .dockerignore / .codecov.yml / Dockerfile)

- .gitignore: add `frontend/.cache`, `frontend/.eslintcache`, `data/geoip/` (downloaded by the Dockerfile), and `backend/.vscode/` if it appears locally.
- .dockerignore: mirror the new ignores (`frontend/.cache`, `frontend/.eslintcache`, `data/geoip/`) to keep the build context slim; keep the docs exclusions as-is.
- .codecov.yml: reconsider excluding `backend/cmd/api/**` if we touch startup or `ApplyConfig` wiring, so coverage reflects the new logic.
- Dockerfile: after the TLS/login fixes, assess adding a `HEALTHCHECK` or a post-start verification `curl` against :2019 and :8080; keep the current multi-stage caching intact.

Exit Criteria

- Proxy hosts and the admin UI respond over HTTP/HTTPS without ERR_SSL_PROTOCOL_ERROR; the TLS handshake succeeds for domain hosts, and HTTP works for IP-only access.
- Login succeeds via IP:port and via domain/HTTPS; cookies or the header-based fallback keep the session valid across `/auth/me`.
- Updated ignore lists prevent new artifacts from leaking; coverage targets remain achievable after the test additions.

Build Failure & Security Scan Battle Plan
=========================================

Phasing principle: collapse the effort into the fewest high-signal requests by batching commands (backend + frontend + container + scans) and re-running only the narrowest slice after each fix. Keep evidence artifacts for every step.

Phase 1 — Reproduce and Capture the Failure (single pass)

- Run the workspace tasks in this order to get a complete signal stack: "Go: Build Backend", then "Frontend: Type Check", then `npm run build` inside `frontend/` (captures Vite/React errors around [frontend/src/main.tsx](frontend/src/main.tsx) and `App`), then "Build & Run Local Docker" to surface multi-stage Dockerfile issues.
- Preserve raw outputs under `logs/build/`: backend (`backend/build.log`), frontend (`frontend/build.log`), and Docker (`docker/build.log`). If a stage fails, stop and annotate the failing command, module, and package.
- If the Docker build fails early, try `docker build --progress=plain --no-cache` once to expose the failing layer's context (Caddy build, Go, or npm). Keep the resulting layer logs.

Phase 2 — Backend Compilation & Test Rehab (one request)

- Inspect the error stack for the Go layer; focus on imports and CGO flags in [backend/cmd/api/main.go](backend/cmd/api/main.go) and the router bootstrap in [backend/internal/server/server.go](backend/internal/server/server.go).
- If module resolution fails, run "Go: Mod Tidy (Backend)" once, then re-run "Go: Build Backend"; avoid extra tidies to limit churn.
- If CGO/SQLite headers are missing, verify the `apk add --no-cache gcc musl-dev sqlite-dev` step in the Dockerfile's `backend-builder` stage; mirror it locally with `apk add` or the equivalent `sudo apt-get install` packages, depending on the host environment.
- Run "Go: Test Backend" (or the narrower `go test ./internal/...` if the failure is localized) to ensure handlers (e.g., `routes.Register`, `handlers.CheckMountedImport`) still compile after fixes; capture coverage deltas for any touched packages.

Phase 3 — Frontend Build & Type Discipline (one request)

- If the type check passes but the build fails, inspect the Vite config and the Rollup native-skip flags in the Dockerfile's `frontend-builder` stage; cross-check the `npm_config_rollup_skip_nodejs_native` and `ROLLUP_SKIP_NODEJS_NATIVE` environment variables.
- Validate the entry composition in [frontend/src/main.tsx](frontend/src/main.tsx) and any failing component stack (e.g., `ThemeProvider`, `App`). Run `npm run lint -- --fix` only after the root cause is understood, to avoid masking errors.
- Re-run `npm run build` only after code fixes; stash bundle warnings for later size and security audits.

Phase 4 — Container Build Reliability (one request)

- Reproduce the Docker failure with `--progress=plain` and pinpoint the failing stage: `frontend-builder` (npm ci/build), `backend-builder` (xx-go build of `cmd/api`), or `caddy-builder` (xcaddy patch loop).
- If the failure is in the Caddy patch block, test with a narrowed build arg (e.g., `--build-arg CADDY_VERSION=2.10.2`) and confirm the fallback path works. Consider pinning the quic-go, expr, and smallstep versions if Renovate has lagged.
- Verify that the entrypoint expectations in [docker-entrypoint.sh](docker-entrypoint.sh) align with the built assets (`/app/frontend/dist`, `/app/charon`). Ensure creating the `cpmp` symlink does not fail when `/app` is read-only.

Phase 5 — CodeQL Scan & Triage (single run, then focused reruns)

- Run the "Run CodeQL Scan (Local)" task once the code builds. Save the SARIF output to `codeql-agent-results/` and convert critical findings into issues.
- Triage hotspots: server middleware (`RequestID`, `RequestLogger`, `Recovery`), auth handlers under `internal/api/handlers`, and the config loader in `internal/config`. Prioritize SQL injection, path traversal in `handlers.CheckMountedImport`, and secrets written to logs.
- After fixes, re-run only the affected language pack (Go or JS) to minimize cycle time; attach the SARIF diff to the plan.

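Turning critical findings into issues starts with joining each SARIF result to its rule's `security-severity` property, which CodeQL publishes as a numeric string. This is a minimal Go sketch over a subset of SARIF 2.1.0. It reads rules from both `tool.driver.rules` and `tool.extensions[].rules`, since where rules appear can depend on how the queries were packaged. `severeFindings` and `Finding` are hypothetical names; a `minScore` of 7.0 selects what GitHub labels high or critical.

```go
package main

import (
	"encoding/json"
	"sort"
	"strconv"
)

// sarifRule and sarifResult model only the SARIF 2.1.0 fields used here.
type sarifRule struct {
	ID         string `json:"id"`
	Properties struct {
		SecuritySeverity string `json:"security-severity"` // e.g. "7.5"
	} `json:"properties"`
}

type sarifResult struct {
	RuleID  string `json:"ruleId"`
	Message struct {
		Text string `json:"text"`
	} `json:"message"`
	Locations []struct {
		PhysicalLocation struct {
			ArtifactLocation struct {
				URI string `json:"uri"`
			} `json:"artifactLocation"`
			Region struct {
				StartLine int `json:"startLine"`
			} `json:"region"`
		} `json:"physicalLocation"`
	} `json:"locations"`
}

type sarifLog struct {
	Runs []struct {
		Tool struct {
			Driver     struct{ Rules []sarifRule `json:"rules"` }   `json:"driver"`
			Extensions []struct{ Rules []sarifRule `json:"rules"` } `json:"extensions"`
		} `json:"tool"`
		Results []sarifResult `json:"results"`
	} `json:"runs"`
}

// Finding is one result worth turning into an issue (hypothetical type).
type Finding struct {
	RuleID  string
	Score   float64
	File    string
	Line    int
	Message string
}

// severeFindings returns results whose rule score is >= minScore, most severe first.
func severeFindings(data []byte, minScore float64) ([]Finding, error) {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var out []Finding
	for _, run := range doc.Runs {
		rules := run.Tool.Driver.Rules
		for _, ext := range run.Tool.Extensions {
			rules = append(rules, ext.Rules...)
		}
		scores := map[string]float64{}
		for _, rule := range rules {
			if s, err := strconv.ParseFloat(rule.Properties.SecuritySeverity, 64); err == nil {
				scores[rule.ID] = s
			}
		}
		for _, res := range run.Results {
			score, ok := scores[res.RuleID]
			if !ok || score < minScore {
				continue
			}
			f := Finding{RuleID: res.RuleID, Score: score, Message: res.Message.Text}
			if len(res.Locations) > 0 {
				loc := res.Locations[0].PhysicalLocation
				f.File, f.Line = loc.ArtifactLocation.URI, loc.Region.StartLine
			}
			out = append(out, f)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out, nil
}
```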
Phase 6 — Trivy Image Scan & Triage (single run)

- After a successful Docker build (`charon:local`), run the "Run Trivy Scan (Local)" task. Persist the report to `.trivy_logs/trivy-report.txt` (already git-ignored).
- Bucket findings into base-image vulnerabilities (Alpine), Caddy plugins, the CrowdSec bundle, and Go binary CVEs. Cross-check them against the Dockerfile's upgrade levers (`CADDY_VERSION`, `CROWDSEC_VERSION`, `golang:1.25.5-alpine`).
- For OS-level CVEs, prefer `apk --no-cache upgrade` (already present) and version bumps; for Go dependencies, adjust `go.mod` and rebuild.

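The bucketing can be done mechanically if Trivy also writes JSON, for example with `trivy image --format json -o .trivy_logs/trivy-report.json charon:local` (an assumption, alongside the existing text report). In that output each entry in `Results` carries `Class` and `Type`, so OS packages (`os-pkgs`) map to the base image and `gobinary` targets map to individual binaries. The sketch is hypothetical; binary paths inside the image, and therefore the bucket names, depend on the Dockerfile.

```go
package main

import (
	"encoding/json"
	"path"
)

// trivyReport models the subset of Trivy's JSON output used for bucketing.
type trivyReport struct {
	Results []struct {
		Target          string `json:"Target"`
		Class           string `json:"Class"`
		Type            string `json:"Type"`
		Vulnerabilities []struct {
			FixedVersion string `json:"FixedVersion"`
			Severity     string `json:"Severity"`
		} `json:"Vulnerabilities"`
	} `json:"Results"`
}

// bucketFor groups a result by upgrade lever: OS packages come from the base
// image; Go binaries are grouped by file name (caddy, charon, crowdsec, ...).
func bucketFor(class, typ, target string) string {
	switch {
	case class == "os-pkgs":
		return "base-image"
	case typ == "gobinary":
		return "go-binary/" + path.Base(target)
	default:
		return typ
	}
}

// countHighCritical is a hypothetical helper returning HIGH/CRITICAL counts per
// bucket, plus how many of those already have a fixed version available.
func countHighCritical(data []byte) (total, fixable map[string]int, err error) {
	var rep trivyReport
	if err = json.Unmarshal(data, &rep); err != nil {
		return nil, nil, err
	}
	total, fixable = map[string]int{}, map[string]int{}
	for _, res := range rep.Results {
		b := bucketFor(res.Class, res.Type, res.Target)
		for _, v := range res.Vulnerabilities {
			if v.Severity != "HIGH" && v.Severity != "CRITICAL" {
				continue
			}
			total[b]++
			if v.FixedVersion != "" {
				fixable[b]++
			}
		}
	}
	return total, fixable, nil
}
```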
Phase 7 — Coverage & Quality Gates

- Ensure the Codecov target (85%) is still reachable; if exclusions are too broad (e.g., the entire `backend/cmd/api`), reassess them in [.codecov.yml](.codecov.yml) after fixes so new logic stays covered.
- If new backend logic lands in handlers or middleware, add table-driven tests under `backend/internal/api/...` to keep coverage from regressing.

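A local pre-check of the 85% target can read the `total:` line that `go tool cover -func=coverage.out` prints. A minimal sketch with hypothetical names (`totalCoverage`, `meetsTarget`). Codecov computes its number after applying `.codecov.yml` ignores and its own line accounting, so the local statement total is only an approximation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// totalCoverage extracts the percentage from the "total:" line printed by
// `go tool cover -func=coverage.out`, e.g. "total:  (statements)  84.7%".
func totalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == "total:" {
			return strconv.ParseFloat(strings.TrimSuffix(fields[len(fields)-1], "%"), 64)
		}
	}
	return 0, fmt.Errorf("no total line found")
}

// meetsTarget is a hypothetical gate: it reports whether the local total
// reaches target (e.g. 85, mirroring the Codecov target).
func meetsTarget(report string, target float64) (bool, float64, error) {
	pct, err := totalCoverage(report)
	if err != nil {
		return false, 0, err
	}
	return pct >= target, pct, nil
}
```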
Phase 8 — Hygiene Checks (.gitignore, .dockerignore, Dockerfile, Codecov)

- .gitignore: consider adding `frontend/.cache/` and `backend/.vscode/` if they appear during debugging; keep the existing `.trivy_logs/` entry.
- .dockerignore: keep the build context lean by adding `frontend/.cache/` and `backend/.vscode/` (`codeql-results*.sarif` is already excluded). Confirm that the `docs/` exclusion is acceptable (only README/CONTRIBUTING/LICENSE are kept) so Docker builds stay small.
- .codecov.yml: exclusions already cover e2e/integration tests and configs; if we add security helpers, do not exclude them, so they stay visible in coverage. Review whether ignoring `backend/cmd/api/**` is desired; we may want to include it if the `main` wiring changes.
- Dockerfile: if builds fail due to xcaddy patch drift, add guard logs or split the patch block into a script under `scripts/` for clearer diffing. Keep the existing `--mount=type=cache` caching for npm and Go modules, and avoid adding more build args to limit attack surface.

Exit Criteria

- All five steps succeed in sequence: "Go: Build Backend", `npm run build`, `docker build` (local multi-stage), "Run CodeQL Scan (Local)", and "Run Trivy Scan (Local)" on `charon:local`.
- Logs are captured and linked; actionable issues are opened for any CodeQL/Trivy HIGH/CRITICAL findings.
- No new untracked artifacts appear, thanks to the updated ignore lists.