# Security Dashboard Live Logs - Complete Trace Analysis **Date:** December 16, 2025 **Status:** ✅ ALL ISSUES FIXED & VERIFIED **Severity:** Was Critical (WebSocket reconnection loop) → Now Resolved --- ## 0. FULL TRACE ANALYSIS ### File-by-File Data Flow | Step | File | Lines | Purpose | Status | |------|------|-------|---------|--------| | 1 | `frontend/src/pages/Security.tsx` | 36, 421 | Renders LiveLogViewer with memoized filters | ✅ Fixed | | 2 | `frontend/src/components/LiveLogViewer.tsx` | 138-143, 183-268 | Manages WebSocket lifecycle in useEffect | ✅ Fixed | | 3 | `frontend/src/api/logs.ts` | 177-237 | `connectSecurityLogs()` - builds WS URL with auth | ✅ Working | | 4 | `backend/internal/api/routes/routes.go` | 373-394 | Registers `/cerberus/logs/ws` in protected group | ✅ Working | | 5 | `backend/internal/api/middleware/auth.go` | 12-39 | Validates JWT from header/cookie/query param | ✅ Working | | 6 | `backend/internal/api/handlers/cerberus_logs_ws.go` | 27-120 | WebSocket handler with filter parsing | ✅ Working | | 7 | `backend/internal/services/log_watcher.go` | 44-237 | Tails Caddy access log, broadcasts to subscribers | ✅ Working | ### Authentication Flow ```text Frontend Backend ──────── ─────── localStorage.getItem('charon_auth_token') │ ▼ Query param: ?token= ────────► AuthMiddleware: 1. Check Authorization header 2. Check auth_token cookie 3. Check token query param ◄── MATCHES │ ▼ ValidateToken(jwt) → OK │ ▼ Upgrade to WebSocket ``` ### Logic Gap Analysis **ANSWER: NO - There is NO logic gap between Frontend and Backend.** | Question | Answer | |----------|--------| | Frontend auth method | Query param `?token=` from `localStorage.getItem('charon_auth_token')` | | Backend auth method | Accepts: Header → Cookie → Query param `token` ✅ | | Filter params | Both use `source`, `level`, `ip`, `host`, `blocked_only` ✅ | | Data format | `SecurityLogEntry` struct matches frontend TypeScript type ✅ | --- ## 1. VERIFICATION STATUS ### ✅ localStorage Key IS Correct Both WebSocket functions in `frontend/src/api/logs.ts` correctly use `charon_auth_token`: - **Line 119-122** (`connectLiveLogs`): `localStorage.getItem('charon_auth_token')` - **Line 178-181** (`connectSecurityLogs`): `localStorage.getItem('charon_auth_token')` --- ## 2. ALL ISSUES FOUND (NOW FIXED) ### Issue #1: CRITICAL - Object Reference Instability in Props (ROOT CAUSE) ✅ FIXED **Problem:** `Security.tsx` passed `securityFilters={{}}` inline, creating a new object on every render. This triggered useEffect cleanup/reconnection on every parent re-render. **Fix Applied:** ```tsx // frontend/src/pages/Security.tsx line 36 const emptySecurityFilters = useMemo(() => ({}), []) // frontend/src/pages/Security.tsx line 421 ``` ### Issue #2: Default Props Had Same Problem ✅ FIXED **Problem:** Default empty objects `filters = {}` in function params created new objects on each call. **Fix Applied:** ```typescript // frontend/src/components/LiveLogViewer.tsx lines 138-143 const EMPTY_LIVE_FILTER: LiveLogFilter = {}; const EMPTY_SECURITY_FILTER: SecurityLogFilter = {}; export function LiveLogViewer({ filters = EMPTY_LIVE_FILTER, securityFilters = EMPTY_SECURITY_FILTER, // ... }) ``` ### Issue #3: `showBlockedOnly` Toggle (INTENTIONAL) The `showBlockedOnly` state in useEffect dependencies causes reconnection when toggled. This is **intentional** for server-side filtering - not a bug. --- ## 3. ROOT CAUSE ANALYSIS ### The Reconnection Loop (Before Fix) 1. User navigates to Security Dashboard 2. `Security.tsx` renders with `` 3. `LiveLogViewer` mounts → useEffect runs → WebSocket connects 4. React Query refetches security status 5. `Security.tsx` re-renders → **new `{}` object created** 6. `LiveLogViewer` re-renders → useEffect sees "changed" `securityFilters` 7. useEffect cleanup runs → **WebSocket closes** 8. useEffect body runs → **WebSocket opens** 9. Repeat steps 4-8 every ~100ms ### Evidence from Docker Logs (Before Fix) ```text {"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"} {"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"} {"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"} {"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"} ``` --- ## 4. COMPONENT DEEP DIVE ### Frontend: Security.tsx - Renders the Security Dashboard with 4 security layer cards (CrowdSec, ACL, Coraza, Rate Limiting) - Contains multiple `useQuery`/`useMutation` hooks that trigger re-renders - **Line 36:** Creates stable filter reference with `useMemo` - **Line 421:** Passes stable reference to `LiveLogViewer` ### Frontend: LiveLogViewer.tsx - Dual-mode log viewer (application logs vs security logs) - **Lines 138-139:** Stable default filter objects defined outside component - **Lines 183-268:** useEffect that manages WebSocket lifecycle - **Line 268:** Dependencies: `[currentMode, filters, securityFilters, maxLogs, showBlockedOnly]` - Uses `isPausedRef` to avoid reconnection when pausing ### Frontend: logs.ts (API Client) - **`connectSecurityLogs()`** (lines 177-237): - Builds URLSearchParams from filter object - Gets auth token from `localStorage.getItem('charon_auth_token')` - Appends token as query param - Constructs URL: `wss://host/api/v1/cerberus/logs/ws?...&token=` ### Backend: routes.go - **Line 380-389:** Creates LogWatcher service pointing to `/var/log/caddy/access.log` - **Line 393:** Creates `CerberusLogsHandler` - **Line 394:** Registers route in protected group (auth required) ### Backend: auth.go (Middleware) - **Lines 14-28:** Auth flow: Header → Cookie → Query param - **Line 25-28:** Query param fallback: `if token := c.Query("token"); token != ""` - WebSocket connections use query param auth (browsers can't set headers on WS) ### Backend: cerberus_logs_ws.go (Handler) - **Lines 42-48:** Upgrades HTTP to WebSocket - **Lines 53-59:** Parses filter query params - **Lines 61-62:** Subscribes to LogWatcher - **Lines 80-109:** Main loop broadcasting filtered entries ### Backend: log_watcher.go (Service) - Singleton service tailing Caddy access log - Parses JSON log lines into `SecurityLogEntry` - Broadcasts to all WebSocket subscribers - Detects security events (WAF, CrowdSec, ACL, rate limit) --- ## 5. SUMMARY TABLE | Component | Status | Notes | |-----------|--------|-------| | localStorage key | ✅ Fixed | Now uses `charon_auth_token` | | Auth middleware | ✅ Working | Accepts query param `token` | | WebSocket endpoint | ✅ Working | Protected route, upgrades correctly | | LogWatcher service | ✅ Working | Tails access.log successfully | | **Frontend memoization** | ✅ Fixed | `useMemo` in Security.tsx | | **Stable default props** | ✅ Fixed | Constants in LiveLogViewer.tsx | --- ## 6. VERIFICATION STEPS After any changes, verify with: ```bash # 1. Rebuild and restart docker build -t charon:local . && docker compose -f docker-compose.override.yml up -d # 2. Check for stable connection (should see ONE connect, no rapid cycling) docker logs charon 2>&1 | grep -i "cerberus.*websocket" | tail -10 # 3. Browser DevTools → Console # Should see: "Cerberus logs WebSocket connection established" # Should NOT see repeated connection attempts ``` --- ## 7. CONCLUSION **Root Cause:** React reference instability (`{}` creates new object on every render) **Solution Applied:** Memoize filter objects to maintain stable references **Logic Gap Between Frontend/Backend:** **NO** - Both are correctly aligned **Current Status:** ✅ All fixes applied and working --- # Health Check 401 Auth Failures - Investigation Report **Date:** December 16, 2025 **Status:** ✅ ANALYZED - NOT A BUG **Severity:** Informational (Log Noise) --- ## 1. INVESTIGATION SUMMARY ### What the User Observed The user reported recurring 401 auth failures in Docker logs: ``` 01:03:10 AUTH 172.20.0.1 GET / → 401 [401] 133.6ms { "auth_failure": true } 01:04:10 AUTH 172.20.0.1 GET / → 401 [401] 112.9ms { "auth_failure": true } ``` ### Initial Hypothesis vs Reality | Hypothesis | Reality | |------------|---------| | Docker health check hitting `/` | ❌ Docker health check hits `/api/v1/health` and works correctly (200) | | Charon backend auth issue | ❌ Charon backend auth is working fine | | Missing health endpoint | ❌ `/api/v1/health` exists and is public | --- ## 2. ROOT CAUSE IDENTIFIED ### The 401s are FROM Plex, NOT Charon **Evidence from logs:** ```json { "host": "plex.hatfieldhosted.com", "uri": "/", "status": 401, "resp_headers": { "X-Plex-Protocol": ["1.0"], "X-Plex-Content-Compressed-Length": ["157"], "Cache-Control": ["no-cache"] } } ``` The 401 responses contain **Plex-specific headers** (`X-Plex-Protocol`, `X-Plex-Content-Compressed-Length`). This proves: 1. The request goes through Caddy to **Plex backend** 2. **Plex** returns 401 because the request has no auth token 3. Caddy logs this as a handled request ### What's Making These Requests? **Charon's Uptime Monitoring Service** (`backend/internal/services/uptime_service.go`) The `checkMonitor()` function performs HTTP GET requests to proxied hosts: ```go case "http", "https": client := http.Client{Timeout: 10 * time.Second} resp, err := client.Get(monitor.URL) // e.g., https://plex.hatfieldhosted.com/ ``` Key behaviors: - Runs every 60 seconds (`interval: 60`) - Checks the **public URL** of each proxy host - Uses `Go-http-client/2.0` User-Agent (visible in logs) - **Correctly treats 401/403 as "service is up"** (lines 471-474 of uptime_service.go) --- ## 3. ARCHITECTURE FLOW ```text ┌─────────────────────────────────────────────────────────────┐ │ Charon Container (172.20.0.1 from Docker's perspective) │ ├─────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────┐ │ │ │ Uptime Service │ │ │ │ (Go-http-client/2.0)│ │ │ └──────────┬──────────┘ │ │ │ GET https://plex.hatfieldhosted.com/ │ │ ▼ │ │ ┌─────────────────────┐ │ │ │ Caddy Reverse Proxy │ │ │ │ (ports 80/443) │ │ │ └──────────┬──────────┘ │ │ │ Logs request to access.log │ └─────────────┼───────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────┐ │ Plex Container (172.20.0.x) │ ├─────────────────────────────────────────────────────────────┤ │ GET / → 401 Unauthorized (no X-Plex-Token) │ └─────────────────────────────────────────────────────────────┘ ``` --- ## 4. DOCKER HEALTH CHECK STATUS ### ✅ Docker Health Check is WORKING CORRECTLY **Configuration** (from all docker-compose files): ```yaml healthcheck: test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"] interval: 30s timeout: 10s retries: 3 start_period: 40s ``` **Evidence:** ``` [GIN] 2025/12/16 - 01:04:45 | 200 | 304.212µs | ::1 | GET "/api/v1/health" ``` - Hits `/api/v1/health` (not `/`) - Returns `200` (not `401`) - Source IP is `::1` (localhost) - Interval is 30s (matches config) ### Health Endpoint Details **Route Registration** ([routes.go#L86](backend/internal/api/routes/routes.go#L86)): ```go router.GET("/api/v1/health", handlers.HealthHandler) ``` This is registered **before** any auth middleware, making it a public endpoint. **Handler Response** ([health_handler.go#L29-L37](backend/internal/api/handlers/health_handler.go#L29-L37)): ```go func HealthHandler(c *gin.Context) { c.JSON(http.StatusOK, gin.H{ "status": "ok", "service": version.Name, "version": version.Version, "git_commit": version.GitCommit, "build_time": version.BuildTime, "internal_ip": getLocalIP(), }) } ``` --- ## 5. WHY THIS IS NOT A BUG ### Uptime Service Design is Correct From [uptime_service.go#L471-L474](backend/internal/services/uptime_service.go#L471-L474): ```go // Accept 2xx, 3xx, and 401/403 (Unauthorized/Forbidden often means the service is up but protected) if (resp.StatusCode >= 200 && resp.StatusCode < 400) || resp.StatusCode == 401 || resp.StatusCode == 403 { success = true msg = fmt.Sprintf("HTTP %d", resp.StatusCode) } ``` **Rationale:** A 401 response proves: - The service is running - The network path is functional - The application is responding This is industry-standard practice for uptime monitoring of auth-protected services. --- ## 6. RECOMMENDATIONS ### Option A: Do Nothing (Recommended) The current behavior is correct: - Docker health checks work ✅ - Uptime monitoring works ✅ - Plex is correctly marked as "up" despite 401 ✅ The 401s in Caddy access logs are informational noise, not errors. ### Option B: Reduce Log Verbosity (Optional) If the log noise is undesirable, options include: 1. **Configure Caddy to not log uptime checks:** Add a log filter for `Go-http-client` User-Agent 2. **Use backend health endpoints:** Some services like Plex have health endpoints (`/identity`, `/status`) that don't require auth 3. **Add per-monitor health path option:** Extend `UptimeMonitor` model to allow custom health check paths ### Option C: Already Implemented The Uptime Service already logs status changes only, not every check: ```go if statusChanged { logger.Log().WithFields(map[string]interface{}{ "host_name": host.Name, // ... }).Info("Host status changed") } ``` --- ## 7. SUMMARY TABLE | Question | Answer | |----------|--------| | What is making the requests? | Charon's Uptime Service (`Go-http-client/2.0`) | | Should `/` be accessible without auth? | N/A - this is hitting proxied backends, not Charon | | Is there a dedicated health endpoint? | Yes: `/api/v1/health` (public, returns 200) | | Is Docker health check working? | ✅ Yes, every 30s, returns 200 | | Are the 401s a bug? | ❌ No, they're expected from auth-protected backends | | What's the fix? | None needed - working as designed | --- ## 8. CONCLUSION **The 401s are NOT from Docker health checks or Charon auth failures.** They are normal responses from **auth-protected backend services** (like Plex) being monitored by Charon's uptime service. The uptime service correctly interprets 401/403 as "service is up but requires authentication." **No fix required.** The system is working as designed.