Charon/docs/plans/prev_spec_websocket_fix_dec16.md

# Security Dashboard Live Logs - Complete Trace Analysis
**Date:** December 16, 2025
**Status:** ✅ ALL ISSUES FIXED & VERIFIED
**Severity:** Was Critical (WebSocket reconnection loop) → Now Resolved
---
## 0. FULL TRACE ANALYSIS
### File-by-File Data Flow
| Step | File | Lines | Purpose | Status |
|------|------|-------|---------|--------|
| 1 | `frontend/src/pages/Security.tsx` | 36, 421 | Renders LiveLogViewer with memoized filters | ✅ Fixed |
| 2 | `frontend/src/components/LiveLogViewer.tsx` | 138-143, 183-268 | Manages WebSocket lifecycle in useEffect | ✅ Fixed |
| 3 | `frontend/src/api/logs.ts` | 177-237 | `connectSecurityLogs()` - builds WS URL with auth | ✅ Working |
| 4 | `backend/internal/api/routes/routes.go` | 373-394 | Registers `/cerberus/logs/ws` in protected group | ✅ Working |
| 5 | `backend/internal/api/middleware/auth.go` | 12-39 | Validates JWT from header/cookie/query param | ✅ Working |
| 6 | `backend/internal/api/handlers/cerberus_logs_ws.go` | 27-120 | WebSocket handler with filter parsing | ✅ Working |
| 7 | `backend/internal/services/log_watcher.go` | 44-237 | Tails Caddy access log, broadcasts to subscribers | ✅ Working |
### Authentication Flow
```text
Frontend                                     Backend
────────                                     ───────
User logs in
Backend sets HttpOnly auth_token cookie ───► AuthMiddleware:
        │                                      1. Check Authorization header
        │                                      2. Check auth_token cookie ◄── SECURE METHOD
        │                                      3. (Deprecated) Check token query param
        ▼                                            │
WebSocket connection initiated                       ▼
(Cookie sent automatically by browser)       ValidateToken(jwt) → OK
        │                                            │
        └──────────────────────────────────► Upgrade to WebSocket
```
**Security Note:** Authentication now uses HttpOnly cookies instead of query parameters.
This prevents JWT tokens from being logged in access logs, proxies, and other telemetry.
The browser automatically sends the cookie with WebSocket upgrade requests.
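Because the cookie rides on the upgrade request itself, the client only has to build a token-free URL. A minimal sketch (the endpoint path is taken from the table above; `buildSecurityLogsUrl` is an illustrative helper, not the real `connectSecurityLogs` implementation):

```typescript
// Build the WS URL with filters only — no token query param.
function buildSecurityLogsUrl(
  host: string,
  secure: boolean,
  filters: Record<string, string>,
): string {
  const params = new URLSearchParams(filters).toString();
  const scheme = secure ? "wss" : "ws";
  // In the browser, new WebSocket(url) attaches the HttpOnly auth_token
  // cookie to the upgrade request automatically.
  return `${scheme}://${host}/api/v1/cerberus/logs/ws${params ? `?${params}` : ""}`;
}
```

In the browser the result is passed straight to `new WebSocket(url)`; nothing secret ever appears in the URL, so nothing secret reaches access logs.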
### Logic Gap Analysis
**ANSWER: NO - There is NO logic gap between Frontend and Backend.**
| Question | Answer |
|----------|--------|
| Frontend auth method | HttpOnly cookie (`auth_token`) sent automatically by browser ✅ SECURE |
| Backend auth method | Accepts: Header → Cookie (preferred) → Query param (deprecated) ✅ |
| Filter params | Both use `source`, `level`, `ip`, `host`, `blocked_only` ✅ |
| Data format | `SecurityLogEntry` struct matches frontend TypeScript type ✅ |
| Security | Tokens no longer logged in access logs or exposed to XSS ✅ |
---
## 1. VERIFICATION STATUS
### ✅ Authentication Method Updated for Security
WebSocket authentication now uses HttpOnly cookies instead of query parameters:
- **`connectLiveLogs`** (frontend/src/api/logs.ts): Uses browser's automatic cookie transmission
- **`connectSecurityLogs`** (frontend/src/api/logs.ts): Uses browser's automatic cookie transmission
- **Backend middleware**: Prioritizes cookie-based auth, query param is deprecated
This change prevents JWT tokens from appearing in access logs, proxy logs, and other telemetry.
---
## 2. ALL ISSUES FOUND (NOW FIXED)
### Issue #1: CRITICAL - Object Reference Instability in Props (ROOT CAUSE) ✅ FIXED
**Problem:** `Security.tsx` passed `securityFilters={{}}` inline, creating a new object on every render. This triggered useEffect cleanup/reconnection on every parent re-render.
**Fix Applied:**
```tsx
// frontend/src/pages/Security.tsx line 36
const emptySecurityFilters = useMemo(() => ({}), [])
// frontend/src/pages/Security.tsx line 421
<LiveLogViewer mode="security" securityFilters={emptySecurityFilters} className="w-full" />
```
### Issue #2: Default Props Had Same Problem ✅ FIXED
**Problem:** Default empty objects `filters = {}` in function params created new objects on each call.
**Fix Applied:**
```typescript
// frontend/src/components/LiveLogViewer.tsx lines 138-143
const EMPTY_LIVE_FILTER: LiveLogFilter = {};
const EMPTY_SECURITY_FILTER: SecurityLogFilter = {};
export function LiveLogViewer({
filters = EMPTY_LIVE_FILTER,
securityFilters = EMPTY_SECURITY_FILTER,
// ...
})
```
### Issue #3: `showBlockedOnly` Toggle (INTENTIONAL)
The `showBlockedOnly` state in useEffect dependencies causes reconnection when toggled. This is **intentional** for server-side filtering - not a bug.
---
## 3. ROOT CAUSE ANALYSIS
### The Reconnection Loop (Before Fix)
1. User navigates to Security Dashboard
2. `Security.tsx` renders with `<LiveLogViewer securityFilters={{}} />`
3. `LiveLogViewer` mounts → useEffect runs → WebSocket connects
4. React Query refetches security status
5. `Security.tsx` re-renders → **new `{}` object created**
6. `LiveLogViewer` re-renders → useEffect sees "changed" `securityFilters`
7. useEffect cleanup runs → **WebSocket closes**
8. useEffect body runs → **WebSocket opens**
9. Repeat steps 4-8 every ~100ms
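The loop above comes down to JavaScript reference semantics, which React's `Object.is`-based dependency comparison exposes. A minimal non-React sketch:

```typescript
// Two identical-looking object literals are never the same reference —
// exactly what React compares (via Object.is) for each useEffect dependency.
const renderOne: Record<string, never> = {};
const renderTwo: Record<string, never> = {};
console.log(Object.is(renderOne, renderTwo)); // false → effect re-runs

// A reference created once and reused — what useMemo(() => ({}), []) gives
// you across renders — compares equal, so the effect is not re-triggered.
const stableFilters: Record<string, never> = {};
const nextRender = stableFilters;
console.log(Object.is(stableFilters, nextRender)); // true → effect skipped
```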
### Evidence from Docker Logs (Before Fix)
```text
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"}
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"}
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"}
```
---
## 4. COMPONENT DEEP DIVE
### Frontend: Security.tsx
- Renders the Security Dashboard with 4 security layer cards (CrowdSec, ACL, Coraza, Rate Limiting)
- Contains multiple `useQuery`/`useMutation` hooks that trigger re-renders
- **Line 36:** Creates stable filter reference with `useMemo`
- **Line 421:** Passes stable reference to `LiveLogViewer`
### Frontend: LiveLogViewer.tsx
- Dual-mode log viewer (application logs vs security logs)
- **Lines 138-139:** Stable default filter objects defined outside component
- **Lines 183-268:** useEffect that manages WebSocket lifecycle
- **Line 268:** Dependencies: `[currentMode, filters, securityFilters, maxLogs, showBlockedOnly]`
- Uses `isPausedRef` to avoid reconnection when pausing
### Frontend: logs.ts (API Client)
- **`connectSecurityLogs()`** (lines 177-237):
- Builds `URLSearchParams` from the filter object
- Constructs URL: `wss://host/api/v1/cerberus/logs/ws?...`
- Relies on the browser attaching the HttpOnly `auth_token` cookie to the upgrade request; the old path (reading the token from `localStorage` and appending it as a `token` query param) is deprecated
### Backend: routes.go
- **Line 380-389:** Creates LogWatcher service pointing to `/var/log/caddy/access.log`
- **Line 393:** Creates `CerberusLogsHandler`
- **Line 394:** Registers route in protected group (auth required)
### Backend: auth.go (Middleware)
- **Lines 14-28:** Auth flow: Header → Cookie → Query param
- **Line 25-28:** Query param fallback: `if token := c.Query("token"); token != ""`
- WebSocket upgrades authenticate via the HttpOnly `auth_token` cookie, which browsers send automatically (they cannot set custom headers on WS upgrade requests); the `token` query param remains only as a deprecated fallback
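The header → cookie → query precedence can be sketched framework-agnostically (illustrative TypeScript, not the real Gin middleware; `AuthRequest` and `extractToken` are assumed names):

```typescript
// Shape of an incoming request, reduced to the three places a token may live.
interface AuthRequest {
  headers: Record<string, string | undefined>;
  cookies: Record<string, string | undefined>;
  query: Record<string, string | undefined>;
}

// Try each auth source in priority order; report where the token came from.
function extractToken(req: AuthRequest): { token: string; source: string } | null {
  const header = req.headers["authorization"];
  if (header?.startsWith("Bearer ")) return { token: header.slice(7), source: "header" };
  const cookie = req.cookies["auth_token"];
  if (cookie) return { token: cookie, source: "cookie" };
  const query = req.query["token"]; // deprecated fallback
  if (query) return { token: query, source: "query" };
  return null; // middleware would respond 401 here
}
```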
### Backend: cerberus_logs_ws.go (Handler)
- **Lines 42-48:** Upgrades HTTP to WebSocket
- **Lines 53-59:** Parses filter query params
- **Lines 61-62:** Subscribes to LogWatcher
- **Lines 80-109:** Main loop broadcasting filtered entries
### Backend: log_watcher.go (Service)
- Singleton service tailing Caddy access log
- Parses JSON log lines into `SecurityLogEntry`
- Broadcasts to all WebSocket subscribers
- Detects security events (WAF, CrowdSec, ACL, rate limit)
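The subscribe/broadcast shape the service implements can be sketched as follows (the real service is Go; `LogBroadcaster` and the entry fields here are illustrative):

```typescript
// Minimal entry shape for illustration; the real struct is SecurityLogEntry.
type EntrySketch = { ts: number; host: string; status: number; blocked: boolean };
type Subscriber = (entry: EntrySketch) => void;

class LogBroadcaster {
  private subs = new Map<string, Subscriber>();

  // Register a subscriber; the returned function unsubscribes it
  // (called when the WebSocket client disconnects).
  subscribe(id: string, fn: Subscriber): () => void {
    this.subs.set(id, fn);
    return () => this.subs.delete(id);
  }

  // Fan a parsed log entry out to every live subscriber.
  broadcast(entry: EntrySketch): void {
    for (const fn of this.subs.values()) fn(entry);
  }
}
```

The stable-connection fix in §2 matters precisely because each reconnect cycles a subscriber through `subscribe`/unsubscribe, which is what produced the connect/disconnect log churn.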
---
## 5. SUMMARY TABLE
| Component | Status | Notes |
|-----------|--------|-------|
| WebSocket authentication | ✅ Secured | Now uses HttpOnly cookies instead of query parameters |
| Auth middleware | ✅ Updated | Cookie-based auth prioritized, query param deprecated |
| WebSocket endpoint | ✅ Working | Protected route, upgrades correctly |
| LogWatcher service | ✅ Working | Tails access.log successfully |
| **Frontend memoization** | ✅ Fixed | `useMemo` in Security.tsx |
| **Stable default props** | ✅ Fixed | Constants in LiveLogViewer.tsx |
| **Security improvement** | ✅ Complete | Tokens no longer exposed in logs |
---
## 6. VERIFICATION STEPS
After any changes, verify with:
```bash
# 1. Rebuild and restart
docker build -t charon:local . && docker compose -f docker-compose.override.yml up -d
# 2. Check for stable connection (should see ONE connect, no rapid cycling)
docker logs charon 2>&1 | grep -i "cerberus.*websocket" | tail -10
# 3. Browser DevTools → Console
# Should see: "Cerberus logs WebSocket connection established"
# Should NOT see repeated connection attempts
```
---
## 7. CONCLUSION
**Root Cause:** React reference instability (`{}` creates new object on every render)
**Solution Applied:** Memoize filter objects to maintain stable references
**Logic Gap Between Frontend/Backend:** **NO** - Both are correctly aligned
**Security Enhancement:** WebSocket authentication now uses HttpOnly cookies instead of query parameters, preventing token leakage in logs
**Current Status:** ✅ All fixes applied and working securely
---
# Health Check 401 Auth Failures - Investigation Report
**Date:** December 16, 2025
**Status:** ✅ ANALYZED - NOT A BUG
**Severity:** Informational (Log Noise)
---
## 1. INVESTIGATION SUMMARY
### What the User Observed
The user reported recurring 401 auth failures in Docker logs:
```
01:03:10 AUTH 172.20.0.1 GET / → 401 [401] 133.6ms
{ "auth_failure": true }
01:04:10 AUTH 172.20.0.1 GET / → 401 [401] 112.9ms
{ "auth_failure": true }
```
### Initial Hypothesis vs Reality
| Hypothesis | Reality |
|------------|---------|
| Docker health check hitting `/` | ❌ Docker health check hits `/api/v1/health` and works correctly (200) |
| Charon backend auth issue | ❌ Charon backend auth is working fine |
| Missing health endpoint | ❌ `/api/v1/health` exists and is public |
---
## 2. ROOT CAUSE IDENTIFIED
### The 401s are FROM Plex, NOT Charon
**Evidence from logs:**
```json
{
"host": "plex.hatfieldhosted.com",
"uri": "/",
"status": 401,
"resp_headers": {
"X-Plex-Protocol": ["1.0"],
"X-Plex-Content-Compressed-Length": ["157"],
"Cache-Control": ["no-cache"]
}
}
```
The 401 responses contain **Plex-specific headers** (`X-Plex-Protocol`, `X-Plex-Content-Compressed-Length`). This proves:
1. The request goes through Caddy to **Plex backend**
2. **Plex** returns 401 because the request has no auth token
3. Caddy logs this as a handled request
### What's Making These Requests?
**Charon's Uptime Monitoring Service** (`backend/internal/services/uptime_service.go`)
The `checkMonitor()` function performs HTTP GET requests to proxied hosts:
```go
case "http", "https":
client := http.Client{Timeout: 10 * time.Second}
resp, err := client.Get(monitor.URL) // e.g., https://plex.hatfieldhosted.com/
```
Key behaviors:
- Runs every 60 seconds (`interval: 60`)
- Checks the **public URL** of each proxy host
- Uses `Go-http-client/2.0` User-Agent (visible in logs)
- **Correctly treats 401/403 as "service is up"** (lines 471-474 of uptime_service.go)
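That last classification rule is a one-liner; here is the Go check from `uptime_service.go`, ported to TypeScript for illustration:

```typescript
// 2xx/3xx mean healthy; 401/403 mean "up, but auth-protected".
function isUp(status: number): boolean {
  return (status >= 200 && status < 400) || status === 401 || status === 403;
}

console.log([200, 301, 401, 403, 500].map(isUp)); // [true, true, true, true, false]
```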
---
## 3. ARCHITECTURE FLOW
```text
┌─────────────────────────────────────────────────────────────┐
│ Charon Container (172.20.0.1 from Docker's perspective) │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────┐ │
│ │ Uptime Service │ │
│ │ (Go-http-client/2.0)│ │
│ └──────────┬──────────┘ │
│ │ GET https://plex.hatfieldhosted.com/ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ Caddy Reverse Proxy │ │
│ │ (ports 80/443) │ │
│ └──────────┬──────────┘ │
│ │ Logs request to access.log │
└─────────────┼───────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────────┐
│ Plex Container (172.20.0.x) │
├─────────────────────────────────────────────────────────────┤
│ GET / → 401 Unauthorized (no X-Plex-Token) │
└─────────────────────────────────────────────────────────────┘
```
---
## 4. DOCKER HEALTH CHECK STATUS
### ✅ Docker Health Check is WORKING CORRECTLY
**Configuration** (from all docker-compose files):
```yaml
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
```
**Evidence:**
```
[GIN] 2025/12/16 - 01:04:45 | 200 | 304.212µs | ::1 | GET "/api/v1/health"
```
- Hits `/api/v1/health` (not `/`)
- Returns `200` (not `401`)
- Source IP is `::1` (localhost)
- Interval is 30s (matches config)
### Health Endpoint Details
**Route Registration** ([routes.go#L86](backend/internal/api/routes/routes.go#L86)):
```go
router.GET("/api/v1/health", handlers.HealthHandler)
```
This is registered **before** any auth middleware, making it a public endpoint.
**Handler Response** ([health_handler.go#L29-L37](backend/internal/api/handlers/health_handler.go#L29-L37)):
```go
func HealthHandler(c *gin.Context) {
c.JSON(http.StatusOK, gin.H{
"status": "ok",
"service": version.Name,
"version": version.Version,
"git_commit": version.GitCommit,
"build_time": version.BuildTime,
"internal_ip": getLocalIP(),
})
}
```
---
## 5. WHY THIS IS NOT A BUG
### Uptime Service Design is Correct
From [uptime_service.go#L471-L474](backend/internal/services/uptime_service.go#L471-L474):
```go
// Accept 2xx, 3xx, and 401/403 (Unauthorized/Forbidden often means the service is up but protected)
if (resp.StatusCode >= 200 && resp.StatusCode < 400) || resp.StatusCode == 401 || resp.StatusCode == 403 {
success = true
msg = fmt.Sprintf("HTTP %d", resp.StatusCode)
}
```
**Rationale:** A 401 response proves:
- The service is running
- The network path is functional
- The application is responding
This is industry-standard practice for uptime monitoring of auth-protected services.
---
## 6. RECOMMENDATIONS
### Option A: Do Nothing (Recommended)
The current behavior is correct:
- Docker health checks work ✅
- Uptime monitoring works ✅
- Plex is correctly marked as "up" despite 401 ✅
The 401s in Caddy access logs are informational noise, not errors.
### Option B: Reduce Log Verbosity (Optional)
If the log noise is undesirable, options include:
1. **Configure Caddy to not log uptime checks:**
Add a log filter for `Go-http-client` User-Agent
2. **Use backend health endpoints:**
Some services like Plex have health endpoints (`/identity`, `/status`) that don't require auth
3. **Add per-monitor health path option:**
Extend `UptimeMonitor` model to allow custom health check paths
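Option 3 could look roughly like this (hypothetical sketch: `healthPath` does not exist on the real `UptimeMonitor` model, and the actual implementation would live in the Go backend):

```typescript
interface UptimeMonitorSketch {
  url: string;         // public URL of the proxied host
  healthPath?: string; // optional unauthenticated health path, e.g. Plex's /identity
}

// Resolve the URL the uptime checker should actually probe.
function checkUrl(m: UptimeMonitorSketch): string {
  const u = new URL(m.url);
  if (m.healthPath) u.pathname = m.healthPath;
  return u.toString();
}

console.log(checkUrl({ url: "https://plex.example/", healthPath: "/identity" }));
// → https://plex.example/identity
```

With a health path configured, the monitor would get a clean 200 instead of a 401, removing the log noise without changing the up/down verdict.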
### Option C: Already Implemented
The Uptime Service already logs status changes only, not every check:
```go
if statusChanged {
logger.Log().WithFields(map[string]interface{}{
"host_name": host.Name,
// ...
}).Info("Host status changed")
}
```
---
## 7. SUMMARY TABLE
| Question | Answer |
|----------|--------|
| What is making the requests? | Charon's Uptime Service (`Go-http-client/2.0`) |
| Should `/` be accessible without auth? | N/A - this is hitting proxied backends, not Charon |
| Is there a dedicated health endpoint? | Yes: `/api/v1/health` (public, returns 200) |
| Is Docker health check working? | ✅ Yes, every 30s, returns 200 |
| Are the 401s a bug? | ❌ No, they're expected from auth-protected backends |
| What's the fix? | None needed - working as designed |
---
## 8. CONCLUSION
**The 401s are NOT from Docker health checks or Charon auth failures.**
They are normal responses from **auth-protected backend services** (like Plex) being monitored by Charon's uptime service. The uptime service correctly interprets 401/403 as "service is up but requires authentication."
**No fix required.** The system is working as designed.