Security Dashboard Live Logs - Complete Trace Analysis
Date: December 16, 2025 | Status: ✅ ALL ISSUES FIXED & VERIFIED | Severity: Was Critical (WebSocket reconnection loop) → Now Resolved
0. FULL TRACE ANALYSIS
File-by-File Data Flow
| Step | File | Lines | Purpose | Status |
|---|---|---|---|---|
| 1 | frontend/src/pages/Security.tsx | 36, 421 | Renders LiveLogViewer with memoized filters | ✅ Fixed |
| 2 | frontend/src/components/LiveLogViewer.tsx | 138-143, 183-268 | Manages WebSocket lifecycle in useEffect | ✅ Fixed |
| 3 | frontend/src/api/logs.ts | 177-237 | connectSecurityLogs() - builds WS URL with auth | ✅ Working |
| 4 | backend/internal/api/routes/routes.go | 373-394 | Registers /cerberus/logs/ws in protected group | ✅ Working |
| 5 | backend/internal/api/middleware/auth.go | 12-39 | Validates JWT from header/cookie/query param | ✅ Working |
| 6 | backend/internal/api/handlers/cerberus_logs_ws.go | 27-120 | WebSocket handler with filter parsing | ✅ Working |
| 7 | backend/internal/services/log_watcher.go | 44-237 | Tails Caddy access log, broadcasts to subscribers | ✅ Working |
Authentication Flow
```
Frontend                                    Backend
────────                                    ───────
localStorage.getItem('charon_auth_token')
        │
        ▼
Query param: ?token=<jwt>  ────────►  AuthMiddleware:
                                       1. Check Authorization header
                                       2. Check auth_token cookie
                                       3. Check token query param ◄── MATCHES
                                              │
                                              ▼
                                       ValidateToken(jwt) → OK
                                              │
                                              ▼
                                       Upgrade to WebSocket
```
Logic Gap Analysis
ANSWER: NO - There is NO logic gap between Frontend and Backend.
| Question | Answer |
|---|---|
| Frontend auth method | Query param ?token=<jwt> from localStorage.getItem('charon_auth_token') |
| Backend auth method | Accepts: Header → Cookie → Query param token ✅ |
| Filter params | Both use source, level, ip, host, blocked_only ✅ |
| Data format | SecurityLogEntry struct matches frontend TypeScript type ✅ |
1. VERIFICATION STATUS
✅ localStorage Key IS Correct
Both WebSocket functions in frontend/src/api/logs.ts correctly use charon_auth_token:
- Lines 119-122 (connectLiveLogs): localStorage.getItem('charon_auth_token')
- Lines 178-181 (connectSecurityLogs): localStorage.getItem('charon_auth_token')
2. ALL ISSUES FOUND (NOW FIXED)
Issue #1: CRITICAL - Object Reference Instability in Props (ROOT CAUSE) ✅ FIXED
Problem: Security.tsx passed securityFilters={{}} inline, creating a new object on every render. This triggered useEffect cleanup/reconnection on every parent re-render.
Fix Applied:

```tsx
// frontend/src/pages/Security.tsx line 36
const emptySecurityFilters = useMemo(() => ({}), [])

// frontend/src/pages/Security.tsx line 421
<LiveLogViewer mode="security" securityFilters={emptySecurityFilters} className="w-full" />
```
Issue #2: Default Props Had Same Problem ✅ FIXED
Problem: Default empty objects filters = {} in function params created new objects on each call.
Fix Applied:

```tsx
// frontend/src/components/LiveLogViewer.tsx lines 138-143
const EMPTY_LIVE_FILTER: LiveLogFilter = {};
const EMPTY_SECURITY_FILTER: SecurityLogFilter = {};

export function LiveLogViewer({
  filters = EMPTY_LIVE_FILTER,
  securityFilters = EMPTY_SECURITY_FILTER,
  // ...
})
```
Issue #3: showBlockedOnly Toggle (INTENTIONAL)
The showBlockedOnly state in useEffect dependencies causes reconnection when toggled. This is intentional for server-side filtering - not a bug.
3. ROOT CAUSE ANALYSIS
The Reconnection Loop (Before Fix)
1. User navigates to Security Dashboard
2. Security.tsx renders with <LiveLogViewer securityFilters={{}} />
3. LiveLogViewer mounts → useEffect runs → WebSocket connects
4. React Query refetches security status
5. Security.tsx re-renders → new {} object created
6. LiveLogViewer re-renders → useEffect sees "changed" securityFilters
7. useEffect cleanup runs → WebSocket closes
8. useEffect body runs → WebSocket opens
9. Repeat steps 4-8 every ~100ms
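React compares each slot of a useEffect dependency array with Object.is, and a freshly created `{}` never equals the previous one, which is what drives the loop above. A minimal plain-TypeScript sketch of that comparison (no React involved; `depsChanged` is an illustrative stand-in for React's internal areHookInputsEqual):

```typescript
// Simplified model of React's dependency comparison: each slot is compared
// with Object.is, so a freshly created object literal counts as "changed"
// even when it is structurally identical to the previous one.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  if (prev.length !== next.length) return true;
  return prev.some((value, i) => !Object.is(value, next[i]));
}

// Inline securityFilters={{}}: a new object every render → effect re-runs.
const unstable = depsChanged(["security", {}], ["security", {}]);

// Memoized filters: the same reference survives re-renders → effect stays put.
const stableFilters = {};
const stable = depsChanged(["security", stableFilters], ["security", stableFilters]);
```

This is why the useMemo fix works: it preserves the object's identity across renders rather than its shape.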
Evidence from Docker Logs (Before Fix)
```json
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"}
```
4. COMPONENT DEEP DIVE
Frontend: Security.tsx
- Renders the Security Dashboard with 4 security layer cards (CrowdSec, ACL, Coraza, Rate Limiting)
- Contains multiple useQuery/useMutation hooks that trigger re-renders
- Line 36: Creates stable filter reference with useMemo
- Line 421: Passes stable reference to LiveLogViewer
Frontend: LiveLogViewer.tsx
- Dual-mode log viewer (application logs vs security logs)
- Lines 138-139: Stable default filter objects defined outside component
- Lines 183-268: useEffect that manages WebSocket lifecycle
- Line 268: Dependencies: [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]
- Uses isPausedRef to avoid reconnection when pausing
Frontend: logs.ts (API Client)
connectSecurityLogs() (lines 177-237):
- Builds URLSearchParams from the filter object
- Gets auth token from localStorage.getItem('charon_auth_token')
- Appends token as query param
- Constructs URL: wss://host/api/v1/cerberus/logs/ws?...&token=<jwt>
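The URL construction can be sketched as follows. This is illustrative rather than the actual source of connectSecurityLogs; `buildSecurityLogsUrl` is a hypothetical helper, though the query params and the token-as-query-param step follow the flow described in this document:

```typescript
// Hypothetical sketch of the URL construction performed by connectSecurityLogs().
interface SecurityLogFilter {
  source?: string;
  level?: string;
  ip?: string;
  host?: string;
  blocked_only?: boolean;
}

function buildSecurityLogsUrl(
  wsBase: string,
  filters: SecurityLogFilter,
  token: string | null,
): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== "") params.set(key, String(value));
  }
  // Browsers cannot attach an Authorization header to a WebSocket upgrade,
  // so the token rides along as a query param.
  if (token) params.set("token", token);
  return `${wsBase}/api/v1/cerberus/logs/ws?${params.toString()}`;
}

const url = buildSecurityLogsUrl("wss://host", { source: "waf", blocked_only: true }, "jwt-123");
// → "wss://host/api/v1/cerberus/logs/ws?source=waf&blocked_only=true&token=jwt-123"
```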
Backend: routes.go
- Lines 380-389: Creates LogWatcher service pointing to /var/log/caddy/access.log
- Line 393: Creates CerberusLogsHandler
- Line 394: Registers route in protected group (auth required)
Backend: auth.go (Middleware)
- Lines 14-28: Auth flow: Header → Cookie → Query param
- Lines 25-28: Query param fallback: if token := c.Query("token"); token != ""
- WebSocket connections use query param auth (browsers can't set headers on WS)
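The Header → Cookie → Query fallback can be sketched in TypeScript for illustration (the real implementation is Go/Gin middleware in auth.go; `extractToken` and the Bearer-prefix handling here are assumptions):

```typescript
// Illustrative model of the middleware's token-extraction fallback order.
interface AuthRequest {
  headers: Record<string, string | undefined>;
  cookies: Record<string, string | undefined>;
  query: Record<string, string | undefined>;
}

function extractToken(req: AuthRequest): string | null {
  // 1. Authorization header
  const header = req.headers["authorization"];
  if (header !== undefined && header.startsWith("Bearer ")) {
    return header.slice("Bearer ".length);
  }
  // 2. auth_token cookie
  const cookie = req.cookies["auth_token"];
  if (cookie) return cookie;
  // 3. token query param (the path WebSocket connections take)
  const queryToken = req.query["token"];
  if (queryToken) return queryToken;
  return null;
}
```

In the flow traced above, the frontend supplies only the query param on WebSocket upgrades, so steps 1 and 2 fall through.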
Backend: cerberus_logs_ws.go (Handler)
- Lines 42-48: Upgrades HTTP to WebSocket
- Lines 53-59: Parses filter query params
- Lines 61-62: Subscribes to LogWatcher
- Lines 80-109: Main loop broadcasting filtered entries
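The per-entry filter check applied before broadcasting might look like the sketch below. This is hypothetical TypeScript (the real handler is Go); the entry field names are assumptions derived from the filter query params listed in this document:

```typescript
// Hypothetical sketch of the server-side filter check applied per log entry.
interface LogEntry {
  source: string;   // e.g. "waf", "crowdsec", "acl", "rate_limit"
  level: string;
  ip: string;
  host: string;
  blocked: boolean;
}

interface SecurityLogFilterParams {
  source?: string;
  level?: string;
  ip?: string;
  host?: string;
  blocked_only?: boolean;
}

function matchesFilter(entry: LogEntry, f: SecurityLogFilterParams): boolean {
  if (f.source && entry.source !== f.source) return false;
  if (f.level && entry.level !== f.level) return false;
  if (f.ip && entry.ip !== f.ip) return false;
  if (f.host && entry.host !== f.host) return false;
  if (f.blocked_only && !entry.blocked) return false;
  return true; // an empty filter matches every entry
}
```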
Backend: log_watcher.go (Service)
- Singleton service tailing Caddy access log
- Parses JSON log lines into SecurityLogEntry
- Broadcasts to all WebSocket subscribers
- Detects security events (WAF, CrowdSec, ACL, rate limit)
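Parsing one Caddy JSON access-log line might be sketched as below. `parseCaddyLine` is hypothetical (the real parser is Go in log_watcher.go); the `request.host`/`request.uri`/`status` fields follow Caddy's documented JSON access-log shape:

```typescript
// Hypothetical sketch of turning one Caddy JSON access-log line into a
// minimal entry; malformed lines are skipped rather than crashing the tailer.
function parseCaddyLine(line: string): { host: string; uri: string; status: number } | null {
  try {
    const raw = JSON.parse(line);
    return {
      host: raw?.request?.host ?? "",
      uri: raw?.request?.uri ?? "",
      status: raw?.status ?? 0,
    };
  } catch {
    return null; // not valid JSON → skip the line
  }
}
```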
5. SUMMARY TABLE
| Component | Status | Notes |
|---|---|---|
| localStorage key | ✅ Fixed | Now uses charon_auth_token |
| Auth middleware | ✅ Working | Accepts query param token |
| WebSocket endpoint | ✅ Working | Protected route, upgrades correctly |
| LogWatcher service | ✅ Working | Tails access.log successfully |
| Frontend memoization | ✅ Fixed | useMemo in Security.tsx |
| Stable default props | ✅ Fixed | Constants in LiveLogViewer.tsx |
6. VERIFICATION STEPS
After any changes, verify with:
```bash
# 1. Rebuild and restart
docker build -t charon:local . && docker compose -f docker-compose.override.yml up -d

# 2. Check for stable connection (should see ONE connect, no rapid cycling)
docker logs charon 2>&1 | grep -i "cerberus.*websocket" | tail -10

# 3. Browser DevTools → Console
# Should see: "Cerberus logs WebSocket connection established"
# Should NOT see repeated connection attempts
```
7. CONCLUSION
Root Cause: React reference instability ({} creates new object on every render)
Solution Applied: Memoize filter objects to maintain stable references
Logic Gap Between Frontend/Backend: NO - Both are correctly aligned
Current Status: ✅ All fixes applied and working
Health Check 401 Auth Failures - Investigation Report
Date: December 16, 2025 | Status: ✅ ANALYZED - NOT A BUG | Severity: Informational (Log Noise)
1. INVESTIGATION SUMMARY
What the User Observed
The user reported recurring 401 auth failures in Docker logs:
```
01:03:10 AUTH 172.20.0.1 GET / → 401 [401] 133.6ms
  { "auth_failure": true }
01:04:10 AUTH 172.20.0.1 GET / → 401 [401] 112.9ms
  { "auth_failure": true }
```
Initial Hypothesis vs Reality
| Hypothesis | Reality |
|---|---|
| Docker health check hitting / | ❌ Docker health check hits /api/v1/health and works correctly (200) |
| Charon backend auth issue | ❌ Charon backend auth is working fine |
| Missing health endpoint | ❌ /api/v1/health exists and is public |
2. ROOT CAUSE IDENTIFIED
The 401s are FROM Plex, NOT Charon
Evidence from logs:
```json
{
  "host": "plex.hatfieldhosted.com",
  "uri": "/",
  "status": 401,
  "resp_headers": {
    "X-Plex-Protocol": ["1.0"],
    "X-Plex-Content-Compressed-Length": ["157"],
    "Cache-Control": ["no-cache"]
  }
}
```
The 401 responses contain Plex-specific headers (X-Plex-Protocol, X-Plex-Content-Compressed-Length). This proves:
- The request goes through Caddy to Plex backend
- Plex returns 401 because the request has no auth token
- Caddy logs this as a handled request
What's Making These Requests?
Charon's Uptime Monitoring Service (backend/internal/services/uptime_service.go)
The checkMonitor() function performs HTTP GET requests to proxied hosts:
```go
case "http", "https":
    client := http.Client{Timeout: 10 * time.Second}
    resp, err := client.Get(monitor.URL) // e.g., https://plex.hatfieldhosted.com/
```
Key behaviors:
- Runs every 60 seconds (interval: 60)
- Checks the public URL of each proxy host
- Uses Go-http-client/2.0 User-Agent (visible in logs)
- Correctly treats 401/403 as "service is up" (lines 471-474 of uptime_service.go)
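The 401/403-as-up rule reduces to a one-line predicate. A TypeScript sketch of that classification for illustration (the actual implementation is Go, uptime_service.go lines 471-474; `isServiceUp` is an illustrative name):

```typescript
// Up/down classification: 2xx/3xx mean healthy; 401/403 mean the service is
// reachable and responding, just auth-protected → still counted as "up".
function isServiceUp(statusCode: number): boolean {
  return (statusCode >= 200 && statusCode < 400) || statusCode === 401 || statusCode === 403;
}
```

Note that under this rule a 404 or 5xx still counts as down, so genuine backend failures are not masked.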
3. ARCHITECTURE FLOW
```
┌─────────────────────────────────────────────────────────────┐
│  Charon Container (172.20.0.1 from Docker's perspective)    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────┐                                    │
│  │ Uptime Service      │                                    │
│  │ (Go-http-client/2.0)│                                    │
│  └──────────┬──────────┘                                    │
│             │ GET https://plex.hatfieldhosted.com/          │
│             ▼                                               │
│  ┌─────────────────────┐                                    │
│  │ Caddy Reverse Proxy │                                    │
│  │ (ports 80/443)      │                                    │
│  └──────────┬──────────┘                                    │
│             │ Logs request to access.log                    │
└─────────────┼───────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────┐
│  Plex Container (172.20.0.x)                                │
├─────────────────────────────────────────────────────────────┤
│  GET / → 401 Unauthorized (no X-Plex-Token)                 │
└─────────────────────────────────────────────────────────────┘
```
4. DOCKER HEALTH CHECK STATUS
✅ Docker Health Check is WORKING CORRECTLY
Configuration (from all docker-compose files):
```yaml
healthcheck:
  test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
Evidence:
```
[GIN] 2025/12/16 - 01:04:45 | 200 | 304.212µs | ::1 | GET "/api/v1/health"
```
- Hits /api/v1/health (not /)
- Returns 200 (not 401)
- Source IP is ::1 (localhost)
- Interval is 30s (matches config)
Health Endpoint Details
Route Registration (routes.go#L86):
```go
router.GET("/api/v1/health", handlers.HealthHandler)
```
This is registered before any auth middleware, making it a public endpoint.
Handler Response (health_handler.go#L29-L37):
```go
func HealthHandler(c *gin.Context) {
    c.JSON(http.StatusOK, gin.H{
        "status":      "ok",
        "service":     version.Name,
        "version":     version.Version,
        "git_commit":  version.GitCommit,
        "build_time":  version.BuildTime,
        "internal_ip": getLocalIP(),
    })
}
```
5. WHY THIS IS NOT A BUG
Uptime Service Design is Correct
From uptime_service.go#L471-L474:
```go
// Accept 2xx, 3xx, and 401/403 (Unauthorized/Forbidden often means the service is up but protected)
if (resp.StatusCode >= 200 && resp.StatusCode < 400) || resp.StatusCode == 401 || resp.StatusCode == 403 {
    success = true
    msg = fmt.Sprintf("HTTP %d", resp.StatusCode)
}
```
Rationale: A 401 response proves:
- The service is running
- The network path is functional
- The application is responding
This is industry-standard practice for uptime monitoring of auth-protected services.
6. RECOMMENDATIONS
Option A: Do Nothing (Recommended)
The current behavior is correct:
- Docker health checks work ✅
- Uptime monitoring works ✅
- Plex is correctly marked as "up" despite 401 ✅
The 401s in Caddy access logs are informational noise, not errors.
Option B: Reduce Log Verbosity (Optional)
If the log noise is undesirable, options include:
- Configure Caddy to not log uptime checks: add a log filter for the Go-http-client User-Agent
- Use backend health endpoints: some services like Plex have health endpoints (/identity, /status) that don't require auth
- Add a per-monitor health path option: extend the UptimeMonitor model to allow custom health check paths
Option C: Already Implemented
The Uptime Service already logs status changes only, not every check:
```go
if statusChanged {
    logger.Log().WithFields(map[string]interface{}{
        "host_name": host.Name,
        // ...
    }).Info("Host status changed")
}
```
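The change-only behavior above can be sketched as a small tracker. `makeStatusTracker` is hypothetical (the real logic is Go), and whether the very first observation of a host logs is an assumption:

```typescript
// Change-only logging: emit a line when a host's status flips, not on every
// check. The closure keeps the last observed status per host.
function makeStatusTracker() {
  const last = new Map<string, boolean>();
  return function record(host: string, up: boolean): string | null {
    const prev = last.get(host);
    last.set(host, up);
    if (prev === undefined || prev !== up) {
      return `Host status changed: ${host} is now ${up ? "up" : "down"}`;
    }
    return null; // status unchanged → stay quiet
  };
}
```

This is why the 60-second uptime checks don't flood Charon's own logs even though they run continuously.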
7. SUMMARY TABLE
| Question | Answer |
|---|---|
| What is making the requests? | Charon's Uptime Service (Go-http-client/2.0) |
| Should / be accessible without auth? | N/A - this is hitting proxied backends, not Charon |
| Is there a dedicated health endpoint? | Yes: /api/v1/health (public, returns 200) |
| Is Docker health check working? | ✅ Yes, every 30s, returns 200 |
| Are the 401s a bug? | ❌ No, they're expected from auth-protected backends |
| What's the fix? | None needed - working as designed |
8. CONCLUSION
The 401s are NOT from Docker health checks or Charon auth failures.
They are normal responses from auth-protected backend services (like Plex) being monitored by Charon's uptime service. The uptime service correctly interprets 401/403 as "service is up but requires authentication."
No fix required. The system is working as designed.