chore: clean .gitignore cache
This commit is contained in:
@@ -1,482 +0,0 @@
|
||||
# Security Dashboard Live Logs - Complete Trace Analysis
|
||||
|
||||
**Date:** December 16, 2025
|
||||
**Status:** ✅ ALL ISSUES FIXED & VERIFIED
|
||||
**Severity:** Was Critical (WebSocket reconnection loop) → Now Resolved
|
||||
|
||||
---
|
||||
|
||||
## 0. FULL TRACE ANALYSIS
|
||||
|
||||
### File-by-File Data Flow
|
||||
|
||||
| Step | File | Lines | Purpose | Status |
|
||||
|------|------|-------|---------|--------|
|
||||
| 1 | `frontend/src/pages/Security.tsx` | 36, 421 | Renders LiveLogViewer with memoized filters | ✅ Fixed |
|
||||
| 2 | `frontend/src/components/LiveLogViewer.tsx` | 138-143, 183-268 | Manages WebSocket lifecycle in useEffect | ✅ Fixed |
|
||||
| 3 | `frontend/src/api/logs.ts` | 177-237 | `connectSecurityLogs()` - builds WS URL with auth | ✅ Working |
|
||||
| 4 | `backend/internal/api/routes/routes.go` | 373-394 | Registers `/cerberus/logs/ws` in protected group | ✅ Working |
|
||||
| 5 | `backend/internal/api/middleware/auth.go` | 12-39 | Validates JWT from header/cookie/query param | ✅ Working |
|
||||
| 6 | `backend/internal/api/handlers/cerberus_logs_ws.go` | 27-120 | WebSocket handler with filter parsing | ✅ Working |
|
||||
| 7 | `backend/internal/services/log_watcher.go` | 44-237 | Tails Caddy access log, broadcasts to subscribers | ✅ Working |
|
||||
|
||||
### Authentication Flow
|
||||
|
||||
```text
|
||||
Frontend Backend
|
||||
──────── ───────
|
||||
User logs in
|
||||
│
|
||||
▼
|
||||
Backend sets HttpOnly auth_token cookie ──► AuthMiddleware:
|
||||
│ 1. Check Authorization header
|
||||
│ 2. Check auth_token cookie ◄── SECURE METHOD
|
||||
│ 3. (Deprecated) Check token query param
|
||||
▼ │
|
||||
WebSocket connection initiated ▼
|
||||
(Cookie sent automatically by browser) ValidateToken(jwt) → OK
|
||||
│ │
|
||||
│ ▼
|
||||
└──────────────────────────────────► Upgrade to WebSocket
|
||||
```
|
||||
|
||||
**Security Note:** Authentication now uses HttpOnly cookies instead of query parameters.
|
||||
This prevents JWT tokens from being logged in access logs, proxies, and other telemetry.
|
||||
The browser automatically sends the cookie with WebSocket upgrade requests.
|
||||
|
||||
### Logic Gap Analysis
|
||||
|
||||
**ANSWER: NO - There is NO logic gap between Frontend and Backend.**
|
||||
|
||||
| Question | Answer |
|
||||
|----------|--------|
|
||||
| Frontend auth method | HttpOnly cookie (`auth_token`) sent automatically by browser ✅ SECURE |
|
||||
| Backend auth method | Accepts: Header → Cookie (preferred) → Query param (deprecated) ✅ |
|
||||
| Filter params | Both use `source`, `level`, `ip`, `host`, `blocked_only` ✅ |
|
||||
| Data format | `SecurityLogEntry` struct matches frontend TypeScript type ✅ |
|
||||
| Security | Tokens no longer logged in access logs or exposed to XSS ✅ |
|
||||
|
||||
---
|
||||
|
||||
## 1. VERIFICATION STATUS
|
||||
|
||||
### ✅ Authentication Method Updated for Security
|
||||
|
||||
WebSocket authentication now uses HttpOnly cookies instead of query parameters:
|
||||
|
||||
- **`connectLiveLogs`** (frontend/src/api/logs.ts): Uses browser's automatic cookie transmission
|
||||
- **`connectSecurityLogs`** (frontend/src/api/logs.ts): Uses browser's automatic cookie transmission
|
||||
- **Backend middleware**: Prioritizes cookie-based auth, query param is deprecated
|
||||
|
||||
This change prevents JWT tokens from appearing in access logs, proxy logs, and other telemetry.
|
||||
|
||||
---
|
||||
|
||||
## 2. ALL ISSUES FOUND (NOW FIXED)
|
||||
|
||||
### Issue #1: CRITICAL - Object Reference Instability in Props (ROOT CAUSE) ✅ FIXED
|
||||
|
||||
**Problem:** `Security.tsx` passed `securityFilters={{}}` inline, creating a new object on every render. This triggered useEffect cleanup/reconnection on every parent re-render.
|
||||
|
||||
**Fix Applied:**
|
||||
|
||||
```tsx
|
||||
// frontend/src/pages/Security.tsx line 36
|
||||
const emptySecurityFilters = useMemo(() => ({}), [])
|
||||
|
||||
// frontend/src/pages/Security.tsx line 421
|
||||
<LiveLogViewer mode="security" securityFilters={emptySecurityFilters} className="w-full" />
|
||||
```
|
||||
|
||||
### Issue #2: Default Props Had Same Problem ✅ FIXED
|
||||
|
||||
**Problem:** Default empty objects `filters = {}` in function params created new objects on each call.
|
||||
|
||||
**Fix Applied:**
|
||||
|
||||
```typescript
|
||||
// frontend/src/components/LiveLogViewer.tsx lines 138-143
|
||||
const EMPTY_LIVE_FILTER: LiveLogFilter = {};
|
||||
const EMPTY_SECURITY_FILTER: SecurityLogFilter = {};
|
||||
|
||||
export function LiveLogViewer({
|
||||
filters = EMPTY_LIVE_FILTER,
|
||||
securityFilters = EMPTY_SECURITY_FILTER,
|
||||
// ...
|
||||
})
|
||||
```
|
||||
|
||||
### Issue #3: `showBlockedOnly` Toggle (INTENTIONAL)
|
||||
|
||||
The `showBlockedOnly` state in useEffect dependencies causes reconnection when toggled. This is **intentional** for server-side filtering - not a bug.
|
||||
|
||||
---
|
||||
|
||||
## 3. ROOT CAUSE ANALYSIS
|
||||
|
||||
### The Reconnection Loop (Before Fix)
|
||||
|
||||
1. User navigates to Security Dashboard
|
||||
2. `Security.tsx` renders with `<LiveLogViewer securityFilters={{}} />`
|
||||
3. `LiveLogViewer` mounts → useEffect runs → WebSocket connects
|
||||
4. React Query refetches security status
|
||||
5. `Security.tsx` re-renders → **new `{}` object created**
|
||||
6. `LiveLogViewer` re-renders → useEffect sees "changed" `securityFilters`
|
||||
7. useEffect cleanup runs → **WebSocket closes**
|
||||
8. useEffect body runs → **WebSocket opens**
|
||||
9. Repeat steps 4-8 every ~100ms
|
||||
|
||||
### Evidence from Docker Logs (Before Fix)
|
||||
|
||||
```text
|
||||
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"xxx"}
|
||||
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"xxx"}
|
||||
{"level":"info","msg":"Cerberus logs WebSocket connected","subscriber_id":"yyy"}
|
||||
{"level":"info","msg":"Cerberus logs WebSocket client disconnected","subscriber_id":"yyy"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. COMPONENT DEEP DIVE
|
||||
|
||||
### Frontend: Security.tsx
|
||||
|
||||
- Renders the Security Dashboard with 4 security layer cards (CrowdSec, ACL, Coraza, Rate Limiting)
|
||||
- Contains multiple `useQuery`/`useMutation` hooks that trigger re-renders
|
||||
- **Line 36:** Creates stable filter reference with `useMemo`
|
||||
- **Line 421:** Passes stable reference to `LiveLogViewer`
|
||||
|
||||
### Frontend: LiveLogViewer.tsx
|
||||
|
||||
- Dual-mode log viewer (application logs vs security logs)
|
||||
- **Lines 138-139:** Stable default filter objects defined outside component
|
||||
- **Lines 183-268:** useEffect that manages WebSocket lifecycle
|
||||
- **Line 268:** Dependencies: `[currentMode, filters, securityFilters, maxLogs, showBlockedOnly]`
|
||||
- Uses `isPausedRef` to avoid reconnection when pausing
|
||||
|
||||
### Frontend: logs.ts (API Client)
|
||||
|
||||
- **`connectSecurityLogs()`** (lines 177-237):
|
||||
- Builds URLSearchParams from filter object
|
||||
- Gets auth token from `localStorage.getItem('charon_auth_token')`
|
||||
- Appends token as query param
|
||||
- Constructs URL: `wss://host/api/v1/cerberus/logs/ws?...&token=<jwt>`
|
||||
|
||||
### Backend: routes.go
|
||||
|
||||
- **Line 380-389:** Creates LogWatcher service pointing to `/var/log/caddy/access.log`
|
||||
- **Line 393:** Creates `CerberusLogsHandler`
|
||||
- **Line 394:** Registers route in protected group (auth required)
|
||||
|
||||
### Backend: auth.go (Middleware)
|
||||
|
||||
- **Lines 14-28:** Auth flow: Header → Cookie → Query param
|
||||
- **Line 25-28:** Query param fallback: `if token := c.Query("token"); token != ""`
|
||||
- WebSocket connections use query param auth (browsers can't set headers on WS)
|
||||
|
||||
### Backend: cerberus_logs_ws.go (Handler)
|
||||
|
||||
- **Lines 42-48:** Upgrades HTTP to WebSocket
|
||||
- **Lines 53-59:** Parses filter query params
|
||||
- **Lines 61-62:** Subscribes to LogWatcher
|
||||
- **Lines 80-109:** Main loop broadcasting filtered entries
|
||||
|
||||
### Backend: log_watcher.go (Service)
|
||||
|
||||
- Singleton service tailing Caddy access log
|
||||
- Parses JSON log lines into `SecurityLogEntry`
|
||||
- Broadcasts to all WebSocket subscribers
|
||||
- Detects security events (WAF, CrowdSec, ACL, rate limit)
|
||||
|
||||
---
|
||||
|
||||
## 5. SUMMARY TABLE
|
||||
|
||||
| Component | Status | Notes |
|
||||
|-----------|--------|-------|
|
||||
| WebSocket authentication | ✅ Secured | Now uses HttpOnly cookies instead of query parameters |
|
||||
| Auth middleware | ✅ Updated | Cookie-based auth prioritized, query param deprecated |
|
||||
| WebSocket endpoint | ✅ Working | Protected route, upgrades correctly |
|
||||
| LogWatcher service | ✅ Working | Tails access.log successfully |
|
||||
| **Frontend memoization** | ✅ Fixed | `useMemo` in Security.tsx |
|
||||
| **Stable default props** | ✅ Fixed | Constants in LiveLogViewer.tsx |
|
||||
| **Security improvement** | ✅ Complete | Tokens no longer exposed in logs |
|
||||
|
||||
---
|
||||
|
||||
## 6. VERIFICATION STEPS
|
||||
|
||||
After any changes, verify with:
|
||||
|
||||
```bash
|
||||
# 1. Rebuild and restart
|
||||
docker build -t charon:local . && docker compose -f docker-compose.override.yml up -d
|
||||
|
||||
# 2. Check for stable connection (should see ONE connect, no rapid cycling)
|
||||
docker logs charon 2>&1 | grep -i "cerberus.*websocket" | tail -10
|
||||
|
||||
# 3. Browser DevTools → Console
|
||||
# Should see: "Cerberus logs WebSocket connection established"
|
||||
# Should NOT see repeated connection attempts
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. CONCLUSION
|
||||
|
||||
**Root Cause:** React reference instability (`{}` creates new object on every render)
|
||||
|
||||
**Solution Applied:** Memoize filter objects to maintain stable references
|
||||
|
||||
**Logic Gap Between Frontend/Backend:** **NO** - Both are correctly aligned
|
||||
|
||||
**Security Enhancement:** WebSocket authentication now uses HttpOnly cookies instead of query parameters, preventing token leakage in logs
|
||||
|
||||
**Current Status:** ✅ All fixes applied and working securely
|
||||
|
||||
---
|
||||
|
||||
# Health Check 401 Auth Failures - Investigation Report
|
||||
|
||||
**Date:** December 16, 2025
|
||||
**Status:** ✅ ANALYZED - NOT A BUG
|
||||
**Severity:** Informational (Log Noise)
|
||||
|
||||
---
|
||||
|
||||
## 1. INVESTIGATION SUMMARY
|
||||
|
||||
### What the User Observed
|
||||
|
||||
The user reported recurring 401 auth failures in Docker logs:
|
||||
|
||||
```
|
||||
01:03:10 AUTH 172.20.0.1 GET / → 401 [401] 133.6ms
|
||||
{ "auth_failure": true }
|
||||
01:04:10 AUTH 172.20.0.1 GET / → 401 [401] 112.9ms
|
||||
{ "auth_failure": true }
|
||||
```
|
||||
|
||||
### Initial Hypothesis vs Reality
|
||||
|
||||
| Hypothesis | Reality |
|
||||
|------------|---------|
|
||||
| Docker health check hitting `/` | ❌ Docker health check hits `/api/v1/health` and works correctly (200) |
|
||||
| Charon backend auth issue | ❌ Charon backend auth is working fine |
|
||||
| Missing health endpoint | ❌ `/api/v1/health` exists and is public |
|
||||
|
||||
---
|
||||
|
||||
## 2. ROOT CAUSE IDENTIFIED
|
||||
|
||||
### The 401s are FROM Plex, NOT Charon
|
||||
|
||||
**Evidence from logs:**
|
||||
|
||||
```json
|
||||
{
|
||||
"host": "plex.hatfieldhosted.com",
|
||||
"uri": "/",
|
||||
"status": 401,
|
||||
"resp_headers": {
|
||||
"X-Plex-Protocol": ["1.0"],
|
||||
"X-Plex-Content-Compressed-Length": ["157"],
|
||||
"Cache-Control": ["no-cache"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The 401 responses contain **Plex-specific headers** (`X-Plex-Protocol`, `X-Plex-Content-Compressed-Length`). This proves:
|
||||
|
||||
1. The request goes through Caddy to **Plex backend**
|
||||
2. **Plex** returns 401 because the request has no auth token
|
||||
3. Caddy logs this as a handled request
|
||||
|
||||
### What's Making These Requests?
|
||||
|
||||
**Charon's Uptime Monitoring Service** (`backend/internal/services/uptime_service.go`)
|
||||
|
||||
The `checkMonitor()` function performs HTTP GET requests to proxied hosts:
|
||||
|
||||
```go
|
||||
case "http", "https":
|
||||
client := http.Client{Timeout: 10 * time.Second}
|
||||
resp, err := client.Get(monitor.URL) // e.g., https://plex.hatfieldhosted.com/
|
||||
```
|
||||
|
||||
Key behaviors:
|
||||
|
||||
- Runs every 60 seconds (`interval: 60`)
|
||||
- Checks the **public URL** of each proxy host
|
||||
- Uses `Go-http-client/2.0` User-Agent (visible in logs)
|
||||
- **Correctly treats 401/403 as "service is up"** (lines 471-474 of uptime_service.go)
|
||||
|
||||
---
|
||||
|
||||
## 3. ARCHITECTURE FLOW
|
||||
|
||||
```text
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Charon Container (172.20.0.1 from Docker's perspective) │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────────────┐ │
|
||||
│ │ Uptime Service │ │
|
||||
│ │ (Go-http-client/2.0)│ │
|
||||
│ └──────────┬──────────┘ │
|
||||
│ │ GET https://plex.hatfieldhosted.com/ │
|
||||
│ ▼ │
|
||||
│ ┌─────────────────────┐ │
|
||||
│ │ Caddy Reverse Proxy │ │
|
||||
│ │ (ports 80/443) │ │
|
||||
│ └──────────┬──────────┘ │
|
||||
│ │ Logs request to access.log │
|
||||
└─────────────┼───────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Plex Container (172.20.0.x) │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ GET / → 401 Unauthorized (no X-Plex-Token) │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. DOCKER HEALTH CHECK STATUS
|
||||
|
||||
### ✅ Docker Health Check is WORKING CORRECTLY
|
||||
|
||||
**Configuration** (from all docker-compose files):
|
||||
|
||||
```yaml
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 40s
|
||||
```
|
||||
|
||||
**Evidence:**
|
||||
|
||||
```
|
||||
[GIN] 2025/12/16 - 01:04:45 | 200 | 304.212µs | ::1 | GET "/api/v1/health"
|
||||
```
|
||||
|
||||
- Hits `/api/v1/health` (not `/`)
|
||||
- Returns `200` (not `401`)
|
||||
- Source IP is `::1` (localhost)
|
||||
- Interval is 30s (matches config)
|
||||
|
||||
### Health Endpoint Details
|
||||
|
||||
**Route Registration** ([routes.go#L86](backend/internal/api/routes/routes.go#L86)):
|
||||
|
||||
```go
|
||||
router.GET("/api/v1/health", handlers.HealthHandler)
|
||||
```
|
||||
|
||||
This is registered **before** any auth middleware, making it a public endpoint.
|
||||
|
||||
**Handler Response** ([health_handler.go#L29-L37](backend/internal/api/handlers/health_handler.go#L29-L37)):
|
||||
|
||||
```go
|
||||
func HealthHandler(c *gin.Context) {
|
||||
c.JSON(http.StatusOK, gin.H{
|
||||
"status": "ok",
|
||||
"service": version.Name,
|
||||
"version": version.Version,
|
||||
"git_commit": version.GitCommit,
|
||||
"build_time": version.BuildTime,
|
||||
"internal_ip": getLocalIP(),
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. WHY THIS IS NOT A BUG
|
||||
|
||||
### Uptime Service Design is Correct
|
||||
|
||||
From [uptime_service.go#L471-L474](backend/internal/services/uptime_service.go#L471-L474):
|
||||
|
||||
```go
|
||||
// Accept 2xx, 3xx, and 401/403 (Unauthorized/Forbidden often means the service is up but protected)
|
||||
if (resp.StatusCode >= 200 && resp.StatusCode < 400) || resp.StatusCode == 401 || resp.StatusCode == 403 {
|
||||
success = true
|
||||
msg = fmt.Sprintf("HTTP %d", resp.StatusCode)
|
||||
}
|
||||
```
|
||||
|
||||
**Rationale:** A 401 response proves:
|
||||
|
||||
- The service is running
|
||||
- The network path is functional
|
||||
- The application is responding
|
||||
|
||||
This is industry-standard practice for uptime monitoring of auth-protected services.
|
||||
|
||||
---
|
||||
|
||||
## 6. RECOMMENDATIONS
|
||||
|
||||
### Option A: Do Nothing (Recommended)
|
||||
|
||||
The current behavior is correct:
|
||||
|
||||
- Docker health checks work ✅
|
||||
- Uptime monitoring works ✅
|
||||
- Plex is correctly marked as "up" despite 401 ✅
|
||||
|
||||
The 401s in Caddy access logs are informational noise, not errors.
|
||||
|
||||
### Option B: Reduce Log Verbosity (Optional)
|
||||
|
||||
If the log noise is undesirable, options include:
|
||||
|
||||
1. **Configure Caddy to not log uptime checks:**
|
||||
Add a log filter for `Go-http-client` User-Agent
|
||||
|
||||
2. **Use backend health endpoints:**
|
||||
Some services like Plex have health endpoints (`/identity`, `/status`) that don't require auth
|
||||
|
||||
3. **Add per-monitor health path option:**
|
||||
Extend `UptimeMonitor` model to allow custom health check paths
|
||||
|
||||
### Option C: Already Implemented
|
||||
|
||||
The Uptime Service already logs status changes only, not every check:
|
||||
|
||||
```go
|
||||
if statusChanged {
|
||||
logger.Log().WithFields(map[string]interface{}{
|
||||
"host_name": host.Name,
|
||||
// ...
|
||||
}).Info("Host status changed")
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. SUMMARY TABLE
|
||||
|
||||
| Question | Answer |
|
||||
|----------|--------|
|
||||
| What is making the requests? | Charon's Uptime Service (`Go-http-client/2.0`) |
|
||||
| Should `/` be accessible without auth? | N/A - this is hitting proxied backends, not Charon |
|
||||
| Is there a dedicated health endpoint? | Yes: `/api/v1/health` (public, returns 200) |
|
||||
| Is Docker health check working? | ✅ Yes, every 30s, returns 200 |
|
||||
| Are the 401s a bug? | ❌ No, they're expected from auth-protected backends |
|
||||
| What's the fix? | None needed - working as designed |
|
||||
|
||||
---
|
||||
|
||||
## 8. CONCLUSION
|
||||
|
||||
**The 401s are NOT from Docker health checks or Charon auth failures.**
|
||||
|
||||
They are normal responses from **auth-protected backend services** (like Plex) being monitored by Charon's uptime service. The uptime service correctly interprets 401/403 as "service is up but requires authentication."
|
||||
|
||||
**No fix required.** The system is working as designed.
|
||||
Reference in New Issue
Block a user