Charon/docs/plans/current_spec.md


Security Dashboard Live Logs - Complete Trace Analysis

Date: December 16, 2025
Status: ALL ISSUES FIXED & VERIFIED
Severity: Was Critical (WebSocket reconnection loop) → Now Resolved


0. FULL TRACE ANALYSIS

File-by-File Data Flow

| Step | File | Lines | Purpose | Status |
|------|------|-------|---------|--------|
| 1 | frontend/src/pages/Security.tsx | 36, 421 | Renders LiveLogViewer with memoized filters | Fixed |
| 2 | frontend/src/components/LiveLogViewer.tsx | 138-143, 183-268 | Manages WebSocket lifecycle in useEffect | Fixed |
| 3 | frontend/src/api/logs.ts | 177-237 | connectSecurityLogs() - builds WS URL with auth | Working |
| 4 | backend/internal/api/routes/routes.go | 373-394 | Registers /cerberus/logs/ws in protected group | Working |
| 5 | backend/internal/api/middleware/auth.go | 12-39 | Validates JWT from header/cookie/query param | Working |
| 6 | backend/internal/api/handlers/cerberus_logs_ws.go | 27-120 | WebSocket handler with filter parsing | Working |
| 7 | backend/internal/services/log_watcher.go | 44-237 | Tails Caddy access log, broadcasts to subscribers | Working |

Authentication Flow

Frontend                              Backend
────────                              ───────
localStorage.getItem('charon_auth_token')
        │
        ▼
Query param: ?token=<jwt>  ────────►  AuthMiddleware:
                                      1. Check Authorization header
                                      2. Check auth_token cookie
                                      3. Check token query param ◄── MATCHES
                                              │
                                              ▼
                                      ValidateToken(jwt) → OK
                                              │
                                              ▼
                                      Upgrade to WebSocket

Logic Gap Analysis

ANSWER: NO - There is NO logic gap between Frontend and Backend.

| Question | Answer |
|----------|--------|
| Frontend auth method | Query param ?token=<jwt> from localStorage.getItem('charon_auth_token') |
| Backend auth method | Accepts: Header → Cookie → Query param token |
| Filter params | Both use source, level, ip, host, blocked_only |
| Data format | SecurityLogEntry struct matches frontend TypeScript type |
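
The lookup order in the authentication flow can be modeled in a few lines. This is a sketch in TypeScript for illustration only, since the real middleware is Go. The `RequestLike` shape, the `extractToken` name, and the `Bearer` header scheme are assumptions, while the `auth_token` cookie and `token` query parameter come from the flow above.

```typescript
// Hypothetical request shape for illustration only; the real middleware is Go.
interface RequestLike {
  headers: Record<string, string | undefined>;
  cookies: Record<string, string | undefined>;
  query: Record<string, string | undefined>;
}

// Returns the first non-empty token, checking header, then cookie, then query param.
function extractToken(req: RequestLike): string | null {
  const header = req.headers["authorization"];
  // Assumption: the header uses the "Bearer <jwt>" scheme.
  if (header !== undefined && header.startsWith("Bearer ")) {
    const fromHeader = header.slice("Bearer ".length).trim();
    if (fromHeader !== "") return fromHeader;
  }
  const fromCookie = req.cookies["auth_token"];
  if (fromCookie) return fromCookie;
  const fromQuery = req.query["token"];
  if (fromQuery) return fromQuery;
  return null;
}

// In the flow above the frontend supplies the token only as a query parameter.
const wsRequest: RequestLike = {
  headers: {},
  cookies: {},
  query: { token: "jwt-from-localStorage" },
};
console.log(extractToken(wsRequest)); // prints jwt-from-localStorage
```

Because the query parameter is the last fallback, a WebSocket request that carries no header or cookie token still authenticates, which matches the "MATCHES" step in the diagram.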

1. VERIFICATION STATUS

localStorage Key IS Correct

Both WebSocket functions in frontend/src/api/logs.ts correctly use charon_auth_token:

  • Lines 119-122 (connectLiveLogs): localStorage.getItem('charon_auth_token')
  • Lines 178-181 (connectSecurityLogs): localStorage.getItem('charon_auth_token')

2. ALL ISSUES FOUND (NOW FIXED)

Issue #1: CRITICAL - Object Reference Instability in Props (ROOT CAUSE) [FIXED]

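Problem: Security.tsx passed securityFilters={{}} inline, creating a new object on every render. React compares effect dependencies by reference (Object.is), so each parent re-render looked like a filter change and triggered the useEffect cleanup and a WebSocket reconnection.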

Fix Applied:

// frontend/src/pages/Security.tsx line 36
const emptySecurityFilters = useMemo(() => ({}), [])

// frontend/src/pages/Security.tsx line 421
<LiveLogViewer mode="security" securityFilters={emptySecurityFilters} className="w-full" />
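
The difference between the inline literal and the memoized value comes down to reference equality. The sketch below reproduces the comparison outside React; `depsChanged`, `inlineDeps`, and `stableDeps` are illustrative names, not code from the repository.

```typescript
// React treats a dependency as changed when Object.is(prev, next) is false.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  if (prev.length !== next.length) return true;
  return prev.some((value, i) => !Object.is(value, next[i]));
}

// Inline literal, as in securityFilters={{}}: a distinct object per render.
const inlineDeps = (): unknown[] => [{}];

// Stable reference, the effect of useMemo(() => ({}), []): one object reused.
const stableFilters = {};
const stableDeps = (): unknown[] => [stableFilters];

console.log(depsChanged(inlineDeps(), inlineDeps())); // true
console.log(depsChanged(stableDeps(), stableDeps())); // false
```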

Issue #2: Default Props Had the Same Problem [FIXED]

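Problem: Default parameter values such as filters = {} are evaluated on every call in which the prop is omitted, so each render produced a new object and the same spurious dependency change as Issue #1.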

Fix Applied:

// frontend/src/components/LiveLogViewer.tsx lines 138-143
const EMPTY_LIVE_FILTER: LiveLogFilter = {};
const EMPTY_SECURITY_FILTER: SecurityLogFilter = {};

export function LiveLogViewer({
  filters = EMPTY_LIVE_FILTER,
  securityFilters = EMPTY_SECURITY_FILTER,
  // ...
})
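
The same behavior can be seen without React. In this small sketch, `withInlineDefault` and `withHoistedDefault` are hypothetical functions that only return their argument, which is enough to show when the default expression is evaluated.

```typescript
type Filter = Record<string, string>;

// The default expression is evaluated on every call that omits the argument.
function withInlineDefault(filters: Filter = {}): Filter {
  return filters;
}

// Module-level constant: the default always resolves to the same object.
const EMPTY_FILTER: Filter = {};
function withHoistedDefault(filters: Filter = EMPTY_FILTER): Filter {
  return filters;
}

console.log(withInlineDefault() === withInlineDefault());   // false
console.log(withHoistedDefault() === withHoistedDefault()); // true
```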

Issue #3: showBlockedOnly Toggle (INTENTIONAL)

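The showBlockedOnly state is in the useEffect dependencies, so toggling it causes a reconnection. This is intentional, not a bug: the blocked_only filter is applied server-side through the connection's query parameters, so changing it requires a new connection.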


3. ROOT CAUSE ANALYSIS

The Reconnection Loop (Before Fix)

  1. User navigates to Security Dashboard
  2. Security.tsx renders with <LiveLogViewer securityFilters={{}} />
  3. LiveLogViewer mounts → useEffect runs → WebSocket connects
  4. React Query refetches security status
  5. Security.tsx re-renders → new {} object created
  6. LiveLogViewer re-renders → useEffect sees "changed" securityFilters
  7. useEffect cleanup runs → WebSocket closes
  8. useEffect body runs → WebSocket opens
  9. Repeat steps 4-8 every ~100ms
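
The loop above can be reproduced with a minimal stand-in for React's effect scheduling. This is a simplified model rather than React itself: `createEffectRunner` and `simulateRenders` are hypothetical helpers, and the counters stand in for real WebSocket connections.

```typescript
type Cleanup = () => void;

// Minimal stand-in for React's effect scheduling: re-run the effect when any
// dependency differs from the previous render under Object.is.
function createEffectRunner(effect: () => Cleanup): (deps: readonly unknown[]) => void {
  let prevDeps: readonly unknown[] | null = null;
  let cleanup: Cleanup | null = null;
  return (deps) => {
    const prev = prevDeps;
    const changed =
      prev === null ||
      prev.length !== deps.length ||
      deps.some((dep, i) => !Object.is(dep, prev[i]));
    if (!changed) return;
    if (cleanup !== null) cleanup(); // step 7: old socket closes
    cleanup = effect();              // step 8: new socket opens
    prevDeps = deps;
  };
}

// Runs the effect once per simulated render and counts opens and closes.
function simulateRenders(renders: number, makeFilters: () => object): { opened: number; closed: number } {
  const stats = { opened: 0, closed: 0 };
  const runEffect = createEffectRunner(() => {
    stats.opened += 1;
    return () => {
      stats.closed += 1;
    };
  });
  for (let i = 0; i < renders; i++) runEffect([makeFilters()]);
  return stats;
}

const sharedFilters = {};
console.log(simulateRenders(5, () => ({})));          // { opened: 5, closed: 4 }
console.log(simulateRenders(5, () => sharedFilters)); // { opened: 1, closed: 0 }
```

With a fresh object per render, every render after the first closes one connection and opens another. With a shared reference, the connection opened on mount is the only one.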

Evidence from Docker Logs (Before Fix)

{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"}

4. COMPONENT DEEP DIVE

Frontend: Security.tsx

  • Renders the Security Dashboard with 4 security layer cards (CrowdSec, ACL, Coraza, Rate Limiting)
  • Contains multiple useQuery/useMutation hooks that trigger re-renders
  • Line 36: Creates stable filter reference with useMemo
  • Line 421: Passes stable reference to LiveLogViewer

Frontend: LiveLogViewer.tsx

  • Dual-mode log viewer (application logs vs security logs)
  • Lines 138-139: Stable default filter objects defined outside component
  • Lines 183-268: useEffect that manages WebSocket lifecycle
  • Line 268: Dependencies: [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]
  • Uses isPausedRef to avoid reconnection when pausing
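
The ref-based pause can be illustrated without React. In this sketch a plain object plays the role of the ref, and the handler reads it at call time. It assumes entries that arrive while paused are dropped, which the notes above do not confirm, and `isPausedRefSketch`, `received`, and `handleEntry` are illustrative names.

```typescript
// Same shape as a React ref: a stable container whose .current can change
// without being an effect dependency.
const isPausedRefSketch = { current: false };
const received: string[] = [];

// Handler registered once when the socket opens; it reads the ref at call time.
function handleEntry(entry: string): void {
  if (isPausedRefSketch.current) return; // assumption: entries are dropped while paused
  received.push(entry);
}

handleEntry("a");
isPausedRefSketch.current = true;  // pause: no reconnect, handler unchanged
handleEntry("b");
isPausedRefSketch.current = false; // resume
handleEntry("c");
console.log(received); // [ 'a', 'c' ]
```

Because the container object never changes identity, toggling the pause state does not alter any effect dependency, so the connection stays open.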

Frontend: logs.ts (API Client)

  • connectSecurityLogs() (lines 177-237):
    • Builds URLSearchParams from filter object
    • Gets auth token from localStorage.getItem('charon_auth_token')
    • Appends token as query param
    • Constructs URL: wss://host/api/v1/cerberus/logs/ws?...&token=<jwt>
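
The URL construction steps above can be sketched as follows. `buildSecurityLogsUrl` is a hypothetical helper covering only URL building; the token is passed in as an argument so the sketch runs outside a browser, whereas the real function reads it from localStorage. Skipping unset, empty, and false filters is an assumption.

```typescript
// Filter fields named in this document; the real type in logs.ts may differ.
interface SecurityLogFilterSketch {
  source?: string;
  level?: string;
  ip?: string;
  host?: string;
  blocked_only?: boolean;
}

// Hypothetical helper: builds the WebSocket URL only, without opening a socket.
function buildSecurityLogsUrl(
  origin: string,
  filters: SecurityLogFilterSketch,
  token: string | null,
): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    // Assumption: unset, empty, and false filters are omitted from the query.
    if (value !== undefined && value !== "" && value !== false) {
      params.append(key, String(value));
    }
  }
  if (token) params.append("token", token);

  const base = new URL(origin);
  const scheme = base.protocol === "https:" ? "wss:" : "ws:";
  const query = params.toString();
  const path = "/api/v1/cerberus/logs/ws";
  return query ? `${scheme}//${base.host}${path}?${query}` : `${scheme}//${base.host}${path}`;
}

console.log(
  buildSecurityLogsUrl("https://charon.example.test", { level: "warn", blocked_only: true }, "abc"),
);
// wss://charon.example.test/api/v1/cerberus/logs/ws?level=warn&blocked_only=true&token=abc
```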

Backend: routes.go

  • Lines 380-389: Creates LogWatcher service pointing to /var/log/caddy/access.log
  • Line 393: Creates CerberusLogsHandler
  • Line 394: Registers route in protected group (auth required)

Backend: auth.go (Middleware)

  • Lines 14-28: Auth flow: Header → Cookie → Query param
  • Lines 25-28: Query param fallback: if token := c.Query("token"); token != ""
  • WebSocket connections use query param auth because the browser WebSocket API cannot set custom request headers such as Authorization

Backend: cerberus_logs_ws.go (Handler)

  • Lines 42-48: Upgrades HTTP to WebSocket
  • Lines 53-59: Parses filter query params
  • Lines 61-62: Subscribes to LogWatcher
  • Lines 80-109: Main loop broadcasting filtered entries
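
The per-connection filtering in the main loop can be pictured with a small predicate. This TypeScript sketch is a model, not the Go handler: the entry field names (`client_ip`, `blocked`) and the exact-match semantics are assumptions, while the filter names come from the query params listed earlier.

```typescript
// Assumed entry fields; the Go SecurityLogEntry struct may use different names.
interface LogEntrySketch {
  source: string;
  level: string;
  client_ip: string;
  host: string;
  blocked: boolean;
}

// Filter names taken from the query params: source, level, ip, host, blocked_only.
interface EntryFilterSketch {
  source?: string;
  level?: string;
  ip?: string;
  host?: string;
  blocked_only?: boolean;
}

// Assumption: each set filter must match exactly; unset filters match everything.
function matchesFilter(entry: LogEntrySketch, filter: EntryFilterSketch): boolean {
  if (filter.source && entry.source !== filter.source) return false;
  if (filter.level && entry.level !== filter.level) return false;
  if (filter.ip && entry.client_ip !== filter.ip) return false;
  if (filter.host && entry.host !== filter.host) return false;
  if (filter.blocked_only && !entry.blocked) return false;
  return true;
}

const sampleEntry: LogEntrySketch = {
  source: "waf",
  level: "warn",
  client_ip: "203.0.113.7",
  host: "app.example.test",
  blocked: false,
};
console.log(matchesFilter(sampleEntry, {}));                     // true
console.log(matchesFilter(sampleEntry, { blocked_only: true })); // false
```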

Backend: log_watcher.go (Service)

  • Singleton service tailing Caddy access log
  • Parses JSON log lines into SecurityLogEntry
  • Broadcasts to all WebSocket subscribers
  • Detects security events (WAF, CrowdSec, ACL, rate limit)
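
The parse-and-broadcast behavior described above follows a publish/subscribe pattern. The sketch below shows that pattern in TypeScript under stated assumptions: `LogBroadcasterSketch` and its methods are hypothetical, and the Go service may be structured quite differently.

```typescript
type Listener = (entry: Record<string, unknown>) => void;

class LogBroadcasterSketch {
  private listeners = new Map<number, Listener>();
  private nextId = 1;

  // Registers a listener and returns a function that removes it again.
  subscribe(listener: Listener): () => void {
    const id = this.nextId++;
    this.listeners.set(id, listener);
    return () => {
      this.listeners.delete(id);
    };
  }

  // Parses one JSON log line and hands it to every current listener.
  // Returns false for lines that are not JSON objects, which are skipped.
  publishLine(line: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return false;
    }
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return false;
    const entry = parsed as Record<string, unknown>;
    this.listeners.forEach((listener) => listener(entry));
    return true;
  }
}

const broadcaster = new LogBroadcasterSketch();
const seenByA: unknown[] = [];
const seenByB: unknown[] = [];
const unsubscribeA = broadcaster.subscribe((e) => { seenByA.push(e["status"]); });
broadcaster.subscribe((e) => { seenByB.push(e["status"]); });

broadcaster.publishLine('{"status":401}');
unsubscribeA(); // like a WebSocket client disconnecting
broadcaster.publishLine('{"status":200}');
console.log(seenByA, seenByB); // [ 401 ] [ 401, 200 ]
```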

5. SUMMARY TABLE

| Component | Status | Notes |
|-----------|--------|-------|
| localStorage key | Fixed | Now uses charon_auth_token |
| Auth middleware | Working | Accepts query param token |
| WebSocket endpoint | Working | Protected route, upgrades correctly |
| LogWatcher service | Working | Tails access.log successfully |
| Frontend memoization | Fixed | useMemo in Security.tsx |
| Stable default props | Fixed | Constants in LiveLogViewer.tsx |

6. VERIFICATION STEPS

After any changes, verify with:

# 1. Rebuild and restart
docker build -t charon:local . && docker compose -f docker-compose.override.yml up -d

# 2. Check for stable connection (should see ONE connect, no rapid cycling)
docker logs charon 2>&1 | grep -i "cerberus.*websocket" | tail -10

# 3. Browser DevTools → Console
# Should see: "Cerberus logs WebSocket connection established"
# Should NOT see repeated connection attempts
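
The check in step 2 can also be done programmatically by counting connect and disconnect events in the backend's JSON log lines. This is a minimal sketch: `summarizeConnections` is a hypothetical helper, and the two message strings are the ones shown in the Docker log evidence earlier.

```typescript
interface ConnectionSummary {
  connected: number;
  disconnected: number;
}

// Counts Cerberus WebSocket connect/disconnect events in JSON log lines.
function summarizeConnections(logLines: string[]): ConnectionSummary {
  const summary: ConnectionSummary = { connected: 0, disconnected: 0 };
  for (const line of logLines) {
    let msg: unknown;
    try {
      msg = (JSON.parse(line) as { msg?: unknown }).msg;
    } catch {
      continue; // skip lines that are not JSON objects
    }
    if (msg === "Cerberus logs WebSocket connected") summary.connected += 1;
    if (msg === "Cerberus logs WebSocket client disconnected") summary.disconnected += 1;
  }
  return summary;
}

const beforeFix = [
  '{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"}',
  '{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"}',
  '{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"}',
  '{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"}',
];
console.log(summarizeConnections(beforeFix)); // { connected: 2, disconnected: 2 }
```

A stable session should show a single connect and no disconnects while the dashboard stays open, so several connect/disconnect pairs in a short window suggest the loop has returned.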

7. CONCLUSION

Root Cause: React reference instability ({} creates new object on every render)

Solution Applied: Memoize filter objects to maintain stable references

Logic Gap Between Frontend/Backend: NO - Both are correctly aligned

Current Status: All fixes applied and working


Health Check 401 Auth Failures - Investigation Report

Date: December 16, 2025
Status: ANALYZED - NOT A BUG
Severity: Informational (Log Noise)


1. INVESTIGATION SUMMARY

What the User Observed

The user reported recurring 401 auth failures in Docker logs:

01:03:10 AUTH 172.20.0.1 GET / → 401 [401] 133.6ms
{ "auth_failure": true }
01:04:10 AUTH 172.20.0.1 GET / → 401 [401] 112.9ms
{ "auth_failure": true }

Initial Hypothesis vs Reality

| Hypothesis | Reality |
|------------|---------|
| Docker health check hitting / | Docker health check hits /api/v1/health and works correctly (200) |
| Charon backend auth issue | Charon backend auth is working fine |
| Missing health endpoint | /api/v1/health exists and is public |

2. ROOT CAUSE IDENTIFIED

The 401s are FROM Plex, NOT Charon

Evidence from logs:

{
  "host": "plex.hatfieldhosted.com",
  "uri": "/",
  "status": 401,
  "resp_headers": {
    "X-Plex-Protocol": ["1.0"],
    "X-Plex-Content-Compressed-Length": ["157"],
    "Cache-Control": ["no-cache"]
  }
}

The 401 responses contain Plex-specific headers (X-Plex-Protocol, X-Plex-Content-Compressed-Length), so they were generated by Plex rather than by Charon. This shows that:

  1. The request goes through Caddy to the Plex backend
  2. Plex returns 401 because the request has no auth token
  3. Caddy logs this as a handled request

What's Making These Requests?

Charon's Uptime Monitoring Service (backend/internal/services/uptime_service.go)

The checkMonitor() function performs HTTP GET requests to proxied hosts:

case "http", "https":
    client := http.Client{Timeout: 10 * time.Second}
    resp, err := client.Get(monitor.URL)  // e.g., https://plex.hatfieldhosted.com/

Key behaviors:

  • Runs every 60 seconds (interval: 60)
  • Checks the public URL of each proxy host
  • Uses Go-http-client/2.0 User-Agent (visible in logs)
  • Correctly treats 401/403 as "service is up" (lines 471-474 of uptime_service.go)

3. ARCHITECTURE FLOW

┌─────────────────────────────────────────────────────────────┐
│ Charon Container (172.20.0.1 from Docker's perspective)    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────────┐                                   │
│  │ Uptime Service      │                                   │
│  │ (Go-http-client/2.0)│                                   │
│  └──────────┬──────────┘                                   │
│             │ GET https://plex.hatfieldhosted.com/         │
│             ▼                                              │
│  ┌─────────────────────┐                                   │
│  │ Caddy Reverse Proxy │                                   │
│  │ (ports 80/443)      │                                   │
│  └──────────┬──────────┘                                   │
│             │ Logs request to access.log                   │
└─────────────┼───────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────┐
│ Plex Container (172.20.0.x)                                │
├─────────────────────────────────────────────────────────────┤
│  GET / → 401 Unauthorized (no X-Plex-Token)               │
└─────────────────────────────────────────────────────────────┘

4. DOCKER HEALTH CHECK STATUS

Docker Health Check is WORKING CORRECTLY

Configuration (from all docker-compose files):

healthcheck:
  test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s

Evidence:

[GIN] 2025/12/16 - 01:04:45 | 200 |     304.212µs |             ::1 | GET      "/api/v1/health"

  • Hits /api/v1/health (not /)
  • Returns 200 (not 401)
  • Source IP is ::1 (localhost)
  • Interval is 30s (matches config)

Health Endpoint Details

Route Registration (routes.go#L86):

router.GET("/api/v1/health", handlers.HealthHandler)

This is registered before any auth middleware, making it a public endpoint.

Handler Response (health_handler.go#L29-L37):

func HealthHandler(c *gin.Context) {
    c.JSON(http.StatusOK, gin.H{
        "status":      "ok",
        "service":     version.Name,
        "version":     version.Version,
        "git_commit":  version.GitCommit,
        "build_time":  version.BuildTime,
        "internal_ip": getLocalIP(),
    })
}

5. WHY THIS IS NOT A BUG

Uptime Service Design is Correct

From uptime_service.go#L471-L474:

// Accept 2xx, 3xx, and 401/403 (Unauthorized/Forbidden often means the service is up but protected)
if (resp.StatusCode >= 200 && resp.StatusCode < 400) || resp.StatusCode == 401 || resp.StatusCode == 403 {
    success = true
    msg = fmt.Sprintf("HTTP %d", resp.StatusCode)
}

Rationale: A 401 response proves:

  • The service is running
  • The network path is functional
  • The application is responding

This is common practice for uptime monitoring of auth-protected services.


6. RECOMMENDATIONS

Option A: No Change Needed

The current behavior is correct:

  • Docker health checks work
  • Uptime monitoring works
  • Plex is correctly marked as "up" despite 401

The 401s in Caddy access logs are informational noise, not errors.

Option B: Reduce Log Verbosity (Optional)

If the log noise is undesirable, options include:

  1. Configure Caddy to not log uptime checks: Add a log filter for Go-http-client User-Agent

  2. Use backend health endpoints: Some services like Plex have health endpoints (/identity, /status) that don't require auth

  3. Add per-monitor health path option: Extend UptimeMonitor model to allow custom health check paths
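
Option 3 could look roughly like the sketch below. Nothing here exists in the codebase: the `healthPath` field, the `UptimeMonitorSketch` type, and `resolveCheckUrl` are hypothetical, and the real model is a Go struct. The sketch only shows how an optional path would be resolved against the monitor's URL.

```typescript
// Hypothetical shape; the real UptimeMonitor model is a Go struct.
interface UptimeMonitorSketch {
  url: string;
  healthPath?: string; // hypothetical new field, e.g. "/identity" for Plex
}

// Resolves the URL to probe: the custom health path if set, else the monitor URL.
function resolveCheckUrl(monitor: UptimeMonitorSketch): string {
  if (!monitor.healthPath) return monitor.url;
  return new URL(monitor.healthPath, monitor.url).toString();
}

console.log(resolveCheckUrl({ url: "https://plex.example.test/" }));
// https://plex.example.test/
console.log(resolveCheckUrl({ url: "https://plex.example.test/", healthPath: "/identity" }));
// https://plex.example.test/identity
```

Keeping the field optional would leave existing monitors unchanged, since they would continue to probe their configured URL.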

Option C: Already Implemented

The Uptime Service already logs status changes only, not every check:

if statusChanged {
    logger.Log().WithFields(map[string]interface{}{
        "host_name": host.Name,
        // ...
    }).Info("Host status changed")
}

7. SUMMARY TABLE

| Question | Answer |
|----------|--------|
| What is making the requests? | Charon's Uptime Service (Go-http-client/2.0) |
| Should / be accessible without auth? | N/A - this is hitting proxied backends, not Charon |
| Is there a dedicated health endpoint? | Yes: /api/v1/health (public, returns 200) |
| Is Docker health check working? | Yes, every 30s, returns 200 |
| Are the 401s a bug? | No, they're expected from auth-protected backends |
| What's the fix? | None needed - working as designed |

8. CONCLUSION

The 401s are NOT from Docker health checks or Charon auth failures.

They are normal responses from auth-protected backend services (like Plex) being monitored by Charon's uptime service. The uptime service correctly interprets 401/403 as "service is up but requires authentication."

No fix required. The system is working as designed.