diff --git a/docs/reports/HOTFIX_CROWDSEC_INTEGRATION_ISSUES.md b/docs/reports/HOTFIX_CROWDSEC_INTEGRATION_ISSUES.md
new file mode 100644
index 00000000..1a38ca5b
--- /dev/null
+++ b/docs/reports/HOTFIX_CROWDSEC_INTEGRATION_ISSUES.md
@@ -0,0 +1,627 @@
+# CrowdSec Integration Issues - Hotfix Plan
+
+**Date:** December 14, 2025
+**Priority:** HOTFIX - Critical
+**Status:** Investigation Complete, Ready for Implementation
+
+## Executive Summary
+
+Three critical issues have been identified in the CrowdSec integration that prevent proper operation:
+
+1. **CrowdSec process not actually running** - The "not running" message is accurate: the process is never started
+2. **Toggle state management broken** - CrowdSec toggle on Cerberus Dashboard won't turn off
+3. **Security log viewer shows wrong logs** - Displays Plex/application logs instead of security logs
+
+## Investigation Findings
+
+### Container Status
+
+```text
+Container: charon (1cc717562976)
+Status: Up 4 hours (healthy)
+Processes Running:
+ - PID 1: /bin/sh /docker-entrypoint.sh
+ - PID 31: caddy run --config /config/caddy.json
+ - PID 43: /usr/local/bin/dlv exec /app/charon (debugger)
+ - PID 52: /app/charon (main process)
+
+CrowdSec Process: NOT RUNNING ❌
+No PID file found at: /app/data/crowdsec/crowdsec.pid
+```
+
+### Issue #1: CrowdSec Not Running
+
+**Root Cause:**
+- The error message "CrowdSec is not running" is **accurate**
+- `crowdsec` binary process is not executing in the container
+- PID file `/app/data/crowdsec/crowdsec.pid` does not exist
+- Process detection in `crowdsec_exec.go:Status()` correctly returns `running=false`
+
+**Code Path:**
+```
+backend/internal/api/handlers/crowdsec_exec.go:85
+├── Status() checks PID file at: filepath.Join(configDir, "crowdsec.pid")
+├── PID file missing → returns (running=false, pid=0, err=nil)
+└── Frontend displays: "CrowdSec is not running"
+```
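+
+The PID-file check described above can be sketched as follows. This is a minimal, Unix-only illustration, not the code in `crowdsec_exec.go`. The function name `pidFileStatus` is hypothetical. It treats a missing PID file as "not running" with no error. It then uses signal 0 to confirm that the recorded PID still belongs to a live process, which also catches stale PID files.
+
+```go
+package main
+
+import (
+	"errors"
+	"os"
+	"path/filepath"
+	"strconv"
+	"strings"
+	"syscall"
+)
+
+// pidFileStatus is a hypothetical stand-in for the Status() PID-file check.
+// It returns (false, 0, nil) when the PID file does not exist.
+func pidFileStatus(configDir string) (running bool, pid int, err error) {
+	data, err := os.ReadFile(filepath.Join(configDir, "crowdsec.pid"))
+	if errors.Is(err, os.ErrNotExist) {
+		return false, 0, nil // no PID file: "not running", not an error
+	}
+	if err != nil {
+		return false, 0, err
+	}
+	pid, err = strconv.Atoi(strings.TrimSpace(string(data)))
+	if err != nil || pid <= 0 {
+		return false, 0, errors.New("invalid PID file contents")
+	}
+	// Signal 0 performs existence and permission checks without delivering a signal.
+	if err := syscall.Kill(pid, 0); err != nil {
+		if errors.Is(err, syscall.EPERM) {
+			return true, pid, nil // process exists but belongs to another user
+		}
+		return false, pid, nil // stale PID file: the process is gone
+	}
+	return true, pid, nil
+}
+```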
+
+**Why CrowdSec Isn't Starting:**
+1. `ReconcileCrowdSecOnStartup()` runs at container boot (routes.go:360)
+2. Checks `SecurityConfig` table for `crowdsec_mode = "local"`
+3. **However**, either the mode is not set to `"local"` or the process start is failing silently
+4. No CrowdSec startup errors appear in the container logs
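+
+A skipped start currently leaves no trace in the logs, so one option is to give every reconciliation outcome an explicit reason. The sketch below is hypothetical: `startDecision` and its inputs are not from the codebase, and it assumes only what step 2 states, that `"local"` is the mode that should start CrowdSec.
+
+```go
+package main
+
+import "fmt"
+
+// startDecision is a hypothetical pure helper. It decides whether startup
+// reconciliation should launch CrowdSec and always returns a reason, so the
+// caller can log every skip instead of returning silently.
+func startDecision(mode string, binaryExists, alreadyRunning bool) (start bool, reason string) {
+	switch {
+	case mode != "local":
+		return false, fmt.Sprintf("crowdsec_mode is %q, not \"local\"", mode)
+	case !binaryExists:
+		return false, "crowdsec binary not found"
+	case alreadyRunning:
+		return false, "crowdsec already running"
+	default:
+		return true, "mode is local and process is not running"
+	}
+}
+```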
+
+**Files Involved:**
+- `backend/internal/services/crowdsec_startup.go` - Reconciliation logic
+- `backend/internal/api/handlers/crowdsec_exec.go` - Process executor
+- `backend/internal/api/handlers/crowdsec_handler.go` - Status endpoint
+
+---
+
+### Issue #2: Toggle Won't Turn Off
+
+**Root Cause:**
+Frontend state management has optimistic updates that don't properly reconcile with backend state.
+
+**Code Path:**
+```
+frontend/src/pages/Security.tsx:94-113 (crowdsecPowerMutation)
+├── onMutate: Optimistically sets crowdsec.enabled = new value
+├── mutationFn: Calls updateSetting() then startCrowdsec() or stopCrowdsec()
+├── onError: Reverts optimistic update but may not fully sync
+└── onSuccess: Calls fetchCrowdsecStatus() but state may be stale
+```
+
+**The Problem:**
+```typescript
+// Optimistic update sets `enabled` immediately
+queryClient.setQueryData(['security-status'], (old) => {
+  const copy = { ...old }
+  copy.crowdsec = { ...copy.crowdsec, enabled } // ← State updated BEFORE the API call resolves
+  return copy
+})
+
+// If the API call fails or times out, the toggle appears stuck
+
+**Why Toggle Appears Stuck:**
+1. User clicks toggle → Frontend immediately updates UI to "enabled"
+2. Backend API is called to start CrowdSec
+3. CrowdSec process fails to start (see Issue #1)
+4. API returns success (because the *setting* was updated)
+5. Frontend thinks CrowdSec is enabled, but `Status()` API says `running=false`
+6. The toggle is now in an inconsistent state: it shows "on", but the status endpoint reports "not running"
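+
+The underlying fix is to derive the toggle from the observed process status rather than from the saved setting. The rule is written in Go here for consistency with the other sketches in this report. The frontend would implement the same logic in TypeScript. `toggleView` is a hypothetical name.
+
+```go
+package main
+
+// toggleView is a hypothetical helper. The toggle's checked state comes from
+// the observed process status, and any mismatch with the saved setting is
+// surfaced as a warning instead of being hidden.
+func toggleView(settingEnabled, processRunning bool) (checked bool, warning string) {
+	checked = processRunning
+	switch {
+	case settingEnabled && !processRunning:
+		warning = "CrowdSec is enabled in settings but the process is not running"
+	case !settingEnabled && processRunning:
+		warning = "CrowdSec is disabled in settings but the process is still running"
+	}
+	return checked, warning
+}
+```
+
+With this rule, the state in step 6 renders as an unchecked toggle plus a visible warning instead of a toggle that claims CrowdSec is on.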
+
+**Files Involved:**
+- `frontend/src/pages/Security.tsx:94-136` - Toggle mutation logic
+- `frontend/src/pages/CrowdSecConfig.tsx:105` - Status check
+- `backend/internal/api/handlers/security_handler.go:60-175` - GetStatus priority chain
+
+---
+
+### Issue #3: Security Log Viewer Shows Wrong Logs
+
+**Root Cause:**
+The `LiveLogViewer` component connects to the correct `/api/v1/cerberus/logs/ws` endpoint, but the `LogWatcher` service is reading from `/var/log/caddy/access.log` which may not exist or may contain the wrong logs.
+
+**Code Path:**
+```
+frontend/src/pages/Security.tsx:411
+├── Renders LiveLogViewer
+└── Connects to: ws://localhost:8080/api/v1/cerberus/logs/ws
+
+backend/internal/api/routes/routes.go:362-390
+├── LogWatcher initialized with: accessLogPath = "/var/log/caddy/access.log"
+├── File exists check: Creates empty file if missing
+└── Starts tailing: services.LogWatcher.tailFile()
+
+backend/internal/services/log_watcher.go:139-186
+├── Opens /var/log/caddy/access.log
+├── Seeks to end of file
+└── Reads new lines, parses as Caddy JSON logs
+```
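+
+The tailing step (seek to end, then read new lines) can be illustrated with a small polling helper. This is a sketch under assumptions, not the actual `tailFile()` implementation. `readNewLines` is hypothetical and returns only complete lines. It restarts from offset 0 if the file shrinks, for example after in-place truncation. Passing the file's current size as the first offset reproduces "seek to end".
+
+```go
+package main
+
+import (
+	"bufio"
+	"io"
+	"os"
+)
+
+// readNewLines returns the complete lines appended to path since offset, plus
+// the offset to resume from. A trailing partial line (no '\n' yet) is left for
+// the next call. A file that shrank is re-read from the start.
+func readNewLines(path string, offset int64) ([]string, int64, error) {
+	f, err := os.Open(path)
+	if err != nil {
+		return nil, offset, err
+	}
+	defer f.Close()
+
+	info, err := f.Stat()
+	if err != nil {
+		return nil, offset, err
+	}
+	if info.Size() < offset {
+		offset = 0 // truncated or rotated in place
+	}
+	if _, err := f.Seek(offset, io.SeekStart); err != nil {
+		return nil, offset, err
+	}
+
+	r := bufio.NewReader(f)
+	var lines []string
+	for {
+		line, err := r.ReadString('\n')
+		if err == io.EOF {
+			return lines, offset, nil // any partial line is not consumed
+		}
+		if err != nil {
+			return lines, offset, err
+		}
+		offset += int64(len(line))
+		lines = append(lines, line[:len(line)-1]) // drop the trailing '\n'
+	}
+}
+```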
+
+**The Problem:**
+The log file path `/var/log/caddy/access.log` is hardcoded and may not match where Caddy is actually writing logs. The user reports seeing Plex logs, which suggests:
+
+1. **Wrong log file** - The LogWatcher might be reading an old/wrong log file
+2. **Parsing issue** - Caddy logs aren't properly formatted as expected
+3. **Source detection broken** - Logs are being classified as "normal" instead of security events
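+
+To tell hypotheses 1 and 2 apart, it helps to check whether the tailed file contains Caddy access-log entries at all. The hypothetical helper below assumes Caddy's JSON access-log format. In that format, entries use a logger name under `http.log.access` (for example `http.log.access.log0`) and include `request` and `status` fields. It rejects plain-text lines and unrelated JSON, such as another application's logs.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"strings"
+)
+
+// looksLikeCaddyAccessLog reports whether a line is a JSON Caddy access-log
+// entry. Assumption: the logger name starts with "http.log.access" and the
+// entry carries "request" and "status" fields.
+func looksLikeCaddyAccessLog(line []byte) bool {
+	var probe struct {
+		Logger  string          `json:"logger"`
+		Request json.RawMessage `json:"request"`
+		Status  *int            `json:"status"`
+	}
+	if err := json.Unmarshal(line, &probe); err != nil {
+		return false // not JSON at all, e.g. a plain-text application log
+	}
+	return strings.HasPrefix(probe.Logger, "http.log.access") &&
+		len(probe.Request) > 0 && probe.Status != nil
+}
+```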
+
+**Verification Needed:**
+```bash
+# Check where Caddy is actually logging
+docker exec charon cat /config/caddy.json | jq '.logging'
+
+# Check if the access.log file exists and contains recent entries
+docker exec charon tail -50 /var/log/caddy/access.log
+
+# Check Caddy data directory
+docker exec charon ls -la /app/data/caddy/
+```
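+
+The first command can also be scripted. The sketch below, using a hypothetical `caddyFileLogs` helper, reads a Caddy JSON config and lists every log whose writer has `"output": "file"`. It relies on the `logging.logs.<name>.writer.filename` layout of Caddy's JSON config. Caddy only emits access logs for servers that have access logging enabled (`apps.http.servers.<name>.logs`), so a file writer alone does not prove that entries are being written.
+
+```go
+package main
+
+import "encoding/json"
+
+// caddyFileLogs returns logger name -> filename for every log in a Caddy JSON
+// config whose writer outputs to a file.
+func caddyFileLogs(config []byte) (map[string]string, error) {
+	var cfg struct {
+		Logging struct {
+			Logs map[string]struct {
+				Writer struct {
+					Output   string `json:"output"`
+					Filename string `json:"filename"`
+				} `json:"writer"`
+			} `json:"logs"`
+		} `json:"logging"`
+	}
+	if err := json.Unmarshal(config, &cfg); err != nil {
+		return nil, err
+	}
+	files := make(map[string]string)
+	for name, l := range cfg.Logging.Logs {
+		if l.Writer.Output == "file" {
+			files[name] = l.Writer.Filename
+		}
+	}
+	return files, nil
+}
+```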
+
+**Files Involved:**
+- `backend/internal/api/routes/routes.go:366` - accessLogPath definition
+- `backend/internal/services/log_watcher.go` - File tailing and parsing
+- `backend/internal/api/handlers/cerberus_logs_ws.go` - WebSocket handler
+- `frontend/src/components/LiveLogViewer.tsx` - Frontend component
+
+---
+
+## Root Cause Summary
+
+| Issue | Root Cause | Impact |
+|-------|------------|--------|
+| CrowdSec not running | Process start fails silently OR mode not set to "local" in DB | User cannot use CrowdSec features |
+| Toggle stuck | Optimistic UI updates + API success despite process failure | Confusing UX, user can't disable |
+| Wrong logs displayed | LogWatcher reading wrong file OR parsing application logs | User can't monitor security events |
+
+---
+
+## Proposed Fixes
+
+### Fix #1: CrowdSec Process Start Issues
+
+**Change X → Y Impact:**
+
+```text
+File: backend/internal/services/crowdsec_startup.go
+
+IF Change: Add detailed logging + retry mechanism
+THEN Impact:
+ ✓ Startup failures become visible in logs
+ ✓ Transient failures (DB not ready) are retried
+ ✓ CrowdSec has a better chance of starting on boot
+ ⚠ Retry logic could delay boot by a few seconds
+
+IF Change: Validate binPath exists before calling Start()
+THEN Impact:
+ ✓ Prevent calling Start() if crowdsec binary missing
+ ✓ Clear error message to user
+ ⚠ Additional filesystem check on every reconcile
+```
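+
+The implementation below does not include the retry mechanism mentioned above. A minimal sketch with a hypothetical `retry` helper follows. It assumes a fixed attempt budget with exponential backoff, and both values are illustrative rather than tuned.
+
+```go
+package main
+
+import (
+	"context"
+	"time"
+)
+
+// retry calls fn up to attempts times, doubling the delay after each failure,
+// and returns the last error (or ctx.Err() if the context ends while waiting).
+func retry(ctx context.Context, attempts int, delay time.Duration, fn func() error) error {
+	var err error
+	for i := 0; i < attempts; i++ {
+		if err = fn(); err == nil {
+			return nil
+		}
+		if i == attempts-1 {
+			break // no point sleeping after the final attempt
+		}
+		select {
+		case <-ctx.Done():
+			return ctx.Err()
+		case <-time.After(delay):
+		}
+		delay *= 2
+	}
+	return err
+}
+```
+
+For example, the reconciliation could wrap its database lookup in `retry(ctx, 5, 500*time.Millisecond, ...)` so that a database that is still starting does not abort CrowdSec startup. Keeping the total wait small limits the boot delay noted above.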
+
+**Implementation:**
+
+```go
+// backend/internal/services/crowdsec_startup.go
+
+import (
+	"context"
+	"os"
+	"time"
+
+	"github.com/sirupsen/logrus"
+	"gorm.io/gorm"
+	// ...plus the project's internal logger package, which provides logger.Log()
+)
+
+func ReconcileCrowdSecOnStartup(db *gorm.DB, executor CrowdsecProcessManager, binPath, dataDir string) {
+	logger.Log().Info("Starting CrowdSec reconciliation on startup")
+
+	// ... existing checks ...
+
+	// VALIDATE: Ensure the binary exists
+	if _, err := os.Stat(binPath); os.IsNotExist(err) {
+		logger.Log().WithField("path", binPath).Error("CrowdSec binary not found, cannot start")
+		return
+	}
+
+	// VALIDATE: Ensure the config directory exists
+	if _, err := os.Stat(dataDir); os.IsNotExist(err) {
+		logger.Log().WithField("path", dataDir).Error("CrowdSec config directory not found, cannot start")
+		return
+	}
+
+	// ... existing status check ...
+
+	// START with better error handling
+	logger.Log().WithFields(logrus.Fields{
+		"bin_path": binPath,
+		"data_dir": dataDir,
+	}).Info("Attempting to start CrowdSec process")
+
+	startCtx, startCancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer startCancel()
+
+	newPid, err := executor.Start(startCtx, binPath, dataDir)
+	if err != nil {
+		logger.Log().WithError(err).WithFields(logrus.Fields{
+			"bin_path": binPath,
+			"data_dir": dataDir,
+		}).Error("CrowdSec reconciliation: FAILED to start CrowdSec - check binary path and config")
+		return
+	}
+
+	// VERIFY: Give the process time to write its PID file, then re-check status
+	time.Sleep(2 * time.Second)
+
+	// Dedicated short-lived context for the status check (timeout is illustrative)
+	statusCtx, statusCancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer statusCancel()
+
+	running, pid, err := executor.Status(statusCtx, dataDir)
+	if err != nil || !running {
+		logger.Log().WithError(err).WithFields(logrus.Fields{
+			"expected_pid": newPid,
+			"actual_pid":   pid,
+			"running":      running,
+		}).Error("CrowdSec process started but not running - process may have crashed")
+		return
+	}
+
+	logger.Log().WithField("pid", newPid).Info("CrowdSec reconciliation: successfully started and verified CrowdSec")
+}
+```
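+
+The fixed `time.Sleep(2 * time.Second)` above is a heuristic. It waits longer than needed when CrowdSec starts quickly and not long enough when it starts slowly. A bounded poll is one alternative. The sketch assumes the `Status(ctx, dataDir) (running, pid, err)` shape used above, and `waitForRunning` and `statusFunc` are hypothetical names.
+
+```go
+package main
+
+import (
+	"context"
+	"time"
+)
+
+// statusFunc matches the assumed shape of executor.Status: (running, pid, err).
+type statusFunc func(ctx context.Context, dataDir string) (bool, int, error)
+
+// waitForRunning polls status until the process reports running or ctx
+// expires, replacing a fixed sleep with a bounded wait.
+func waitForRunning(ctx context.Context, status statusFunc, dataDir string, interval time.Duration) (int, error) {
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+	for {
+		running, pid, err := status(ctx, dataDir)
+		if err == nil && running {
+			return pid, nil
+		}
+		select {
+		case <-ctx.Done():
+			return 0, ctx.Err()
+		case <-ticker.C:
+		}
+	}
+}
+```
+
+If the executor's `Status` method has exactly this signature, the sleep and the single status call could become `waitForRunning(statusCtx, executor.Status, dataDir, 250*time.Millisecond)`. The poll interval here is illustrative.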
+
+---
+
+### Fix #2: Toggle State Management
+
+**Change X → Y Impact:**
+
+```text
+File: frontend/src/pages/Security.tsx
+
+IF Change: Remove optimistic updates, wait for API confirmation
+THEN Impact:
+ ✓ Toggle always reflects actual backend state
+ ✓ No "stuck toggle" UX issue
+ ⚠ Toggle feels slower: the UI waits for the full API round trip (starting CrowdSec can take up to 30 seconds)
+ ⚠ User must wait for API response before seeing change
+
+IF Change: Add explicit error handling + status reconciliation
+THEN Impact:
+ ✓ Errors are clearly shown to user
+ ✓ Toggle reverts on failure
+ ✓ Status check after mutation ensures consistency
+ ⚠ Additional API call overhead
+```
+
+**Implementation:**
+
+```typescript
+// frontend/src/pages/Security.tsx
+
+const crowdsecPowerMutation = useMutation({
+  mutationFn: async (enabled: boolean) => {
+    // Update setting first
+    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')
+
+    if (enabled) {
+      toast.info('Starting CrowdSec... This may take up to 30 seconds')
+      const result = await startCrowdsec()
+
+      // VERIFY: Check if it actually started
+      const status = await statusCrowdsec()
+      if (!status.running) {
+        throw new Error('CrowdSec setting enabled but process failed to start. Check server logs.')
+      }
+
+      return result
+    } else {
+      await stopCrowdsec()
+
+      // VERIFY: Check if it actually stopped
+      const status = await statusCrowdsec()
+      if (status.running) {
+        throw new Error('CrowdSec setting disabled but process still running. Check server logs.')
+      }
+
+      return { enabled: false }
+    }
+  },
+
+  // REMOVE OPTIMISTIC UPDATES
+  onMutate: undefined,
+
+  onError: (err: unknown, enabled: boolean) => {
+    const msg = err instanceof Error ? err.message : String(err)
+    toast.error(enabled ? `Failed to start CrowdSec: ${msg}` : `Failed to stop CrowdSec: ${msg}`)
+
+    // Force refresh status from backend
+    queryClient.invalidateQueries({ queryKey: ['security-status'] })
+    fetchCrowdsecStatus()
+  },
+
+  onSuccess: async () => {
+    // Refresh all related queries to ensure consistency
+    await Promise.all([
+      queryClient.invalidateQueries({ queryKey: ['security-status'] }),
+      queryClient.invalidateQueries({ queryKey: ['settings'] }),
+      fetchCrowdsecStatus(),
+    ])
+
+    toast.success('CrowdSec status updated successfully')
+  },
+})
+```
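+
+The mutation enforces one rule: an action only counts as successful if the observed process state matches the request. The same rule can back an automated version of Test Scenario 2. Below is a hypothetical Go helper for such a test, with the start/stop and status calls injected as functions. None of these names exist in the codebase.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+)
+
+// applyAndVerify runs an action (start or stop), then re-reads the real
+// process state and fails if it does not match what was requested.
+func applyAndVerify(ctx context.Context, want bool,
+	action func(context.Context) error,
+	isRunning func(context.Context) (bool, error)) error {
+	if err := action(ctx); err != nil {
+		return fmt.Errorf("action failed: %w", err)
+	}
+	running, err := isRunning(ctx)
+	if err != nil {
+		return fmt.Errorf("status check failed: %w", err)
+	}
+	if running != want {
+		return fmt.Errorf("requested running=%t but process reports running=%t", want, running)
+	}
+	return nil
+}
+```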
+
+---
+
+### Fix #3: Security Log Viewer
+
+**Change X → Y Impact:**
+
+```text
+File: backend/internal/api/routes/routes.go + backend/internal/services/log_watcher.go
+
+IF Change: Make log path configurable + validate it exists
+THEN Impact:
+ ✓ Can specify correct log file via env var
+ ✓ Graceful fallback if file doesn't exist
+ ✓ Clear error logging about file path issues
+ ⚠ Requires updating deployment/env vars
+
+IF Change: Improve log parsing + source detection
+THEN Impact:
+ ✓ Better classification of security events
+ ✓ Clearer distinction between app logs and security logs
+ ⚠ More CPU overhead for per-line string matching
+```
+
+**Implementation Plan:**
+
+1. **Verify Current Log Configuration:**
+```bash
+# Check Caddy config for logging directive
+docker exec charon cat /config/caddy.json | jq '.logging.logs'
+
+# Find where Caddy is actually writing logs
+docker exec charon find /app/data /var/log -name "*.log" -type f 2>/dev/null
+
+# Check if access.log has recent entries
+docker exec charon tail -20 /var/log/caddy/access.log
+```
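+
+The same search can be scripted in Go, for example as part of a startup diagnostic. The sketch below uses a hypothetical `findLogFiles` helper. Like the `find` command, it matches only regular files whose names end in `.log`, and it ignores missing or unreadable paths.
+
+```go
+package main
+
+import (
+	"io/fs"
+	"os"
+	"path/filepath"
+	"strings"
+)
+
+// findLogFiles walks each root and returns regular files ending in ".log".
+// Missing roots and unreadable entries are skipped, like `2>/dev/null` above.
+func findLogFiles(roots ...string) []string {
+	var found []string
+	for _, root := range roots {
+		if _, err := os.Stat(root); err != nil {
+			continue // root does not exist or is not accessible
+		}
+		_ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
+			if err != nil {
+				return nil // skip entries we cannot read
+			}
+			if d.Type().IsRegular() && strings.HasSuffix(d.Name(), ".log") {
+				found = append(found, path)
+			}
+			return nil
+		})
+	}
+	return found
+}
+```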
+
+2. **Add Log Path Validation:**
+```go
+// backend/internal/api/routes/routes.go:366
+
+accessLogPath := os.Getenv("CHARON_CADDY_ACCESS_LOG")
+if accessLogPath == "" {
+	// Try multiple paths in order of preference
+	candidatePaths := []string{
+		"/var/log/caddy/access.log",
+		filepath.Join(cfg.CaddyConfigDir, "logs", "access.log"),
+		filepath.Join(dataDir, "logs", "access.log"),
+	}
+
+	for _, path := range candidatePaths {
+		if _, err := os.Stat(path); err == nil {
+			accessLogPath = path
+			logger.Log().WithField("path", path).Info("Found existing Caddy access log")
+			break
+		}
+	}
+
+	// If none exist, use default and create it
+	if accessLogPath == "" {
+		accessLogPath = "/var/log/caddy/access.log"
+		logger.Log().WithField("path", accessLogPath).Warn("No existing access log found, will create at default path")
+	}
+}
+
+logger.Log().WithField("path", accessLogPath).Info("Initializing LogWatcher with access log path")
+```
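+
+Moving the selection order into a pure function makes it unit-testable. Below is a sketch with the hypothetical name `resolveAccessLogPath`.
+
+```go
+package main
+
+import "os"
+
+// resolveAccessLogPath applies the selection order: explicit override, then
+// the first candidate that exists on disk, then the fallback default.
+func resolveAccessLogPath(override string, candidates []string, fallback string) string {
+	if override != "" {
+		return override
+	}
+	for _, p := range candidates {
+		if _, err := os.Stat(p); err == nil {
+			return p
+		}
+	}
+	return fallback
+}
+```
+
+Calling it as `resolveAccessLogPath(os.Getenv("CHARON_CADDY_ACCESS_LOG"), candidatePaths, "/var/log/caddy/access.log")` reproduces the block above, apart from the log messages.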
+
+3. **Improve Source Detection:**
+```go
+// backend/internal/services/log_watcher.go:221
+
+func (w *LogWatcher) detectSecurityEvent(entry *models.SecurityLogEntry, caddyLog *models.CaddyAccessLog) {
+	// Enhanced logger name checking
+	loggerLower := strings.ToLower(caddyLog.Logger)
+	is403 := caddyLog.Status == 403
+
+	// Check for WAF/Coraza (hasHeader is an assumed case-insensitive helper)
+	if is403 && (strings.Contains(loggerLower, "waf") ||
+		strings.Contains(loggerLower, "coraza") ||
+		hasHeader(caddyLog.RespHeaders, "X-Coraza-Id")) {
+		entry.Blocked = true
+		entry.Source = "waf"
+		entry.Level = "warn"
+		entry.BlockReason = "WAF rule triggered"
+		// ... extract rule ID ...
+		return
+	}
+
+	// Check for CrowdSec
+	if is403 && (strings.Contains(loggerLower, "crowdsec") ||
+		strings.Contains(loggerLower, "bouncer") ||
+		hasHeader(caddyLog.RespHeaders, "X-Crowdsec-Decision")) {
+		entry.Blocked = true
+		entry.Source = "crowdsec"
+		entry.Level = "warn"
+		entry.BlockReason = "CrowdSec decision"
+		return
+	}
+
+	// Check for ACL
+	if is403 && (strings.Contains(loggerLower, "acl") ||
+		hasHeader(caddyLog.RespHeaders, "X-Acl-Denied")) {
+		entry.Blocked = true
+		entry.Source = "acl"
+		entry.Level = "warn"
+		entry.BlockReason = "Access list denied"
+		return
+	}
+
+	// Check for rate limiting
+	if caddyLog.Status == 429 {
+		entry.Blocked = true
+		entry.Source = "ratelimit"
+		entry.Level = "warn"
+		entry.BlockReason = "Rate limit exceeded"
+		// ... extract rate limit headers ...
+		return
+	}
+
+	// Default for unclassified 403s. This must run before the "normal traffic"
+	// branch; otherwise a 403 logged by the access/proxy logger would be marked
+	// as normal and never reach this check.
+	if is403 {
+		entry.Blocked = true
+		entry.Source = "cerberus"
+		entry.Level = "warn"
+		entry.BlockReason = "Access denied"
+		return
+	}
+
+	// Unblocked proxy/access logs are normal traffic. Caddy's access loggers
+	// are named under "http.log.access" (e.g. "http.log.access.log0").
+	if strings.Contains(loggerLower, "reverse_proxy") ||
+		strings.Contains(loggerLower, "access_log") ||
+		strings.HasPrefix(loggerLower, "http.log.access") {
+		entry.Source = "normal"
+		entry.Blocked = false
+		// Don't set level to warn for successful requests
+		if caddyLog.Status < 400 {
+			entry.Level = "info"
+		}
+	}
+}
+```
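+
+`hasHeader` is used above but not defined. Below is one possible implementation. It assumes `RespHeaders` is decoded from Caddy's `resp_headers` object into a `map[string][]string` that maps header names to their values.
+
+```go
+package main
+
+import (
+	"net/http"
+	"strings"
+)
+
+// hasHeader reports whether name is present in headers, ignoring case.
+// It tries the canonical form first (how net/http stores keys), then scans.
+func hasHeader(headers map[string][]string, name string) bool {
+	if _, ok := headers[http.CanonicalHeaderKey(name)]; ok {
+		return true
+	}
+	for k := range headers {
+		if strings.EqualFold(k, name) {
+			return true
+		}
+	}
+	return false
+}
+```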
+
+---
+
+## Testing Plan
+
+### Pre-Checks
+```bash
+# 1. Verify container is running
+docker ps | grep charon
+
+# 2. Check if crowdsec binary exists
+docker exec charon which crowdsec
+docker exec charon ls -la /usr/bin/crowdsec # Or wherever it's installed
+
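+# 3. Check database config (requires the sqlite3 CLI in the container; `cat` on a
+#    SQLite file only prints binary data). Table name assumes GORM's default naming.
+docker exec charon sqlite3 /app/data/charon.db "SELECT crowdsec_mode FROM security_configs;"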
+
+# 4. Check Caddy log configuration
+docker exec charon cat /config/caddy.json | jq '.logging'
+
+# 5. Find actual log files
+docker exec charon find /var/log /app/data -name "*.log" -type f 2>/dev/null
+```
+
+### Test Scenario 1: CrowdSec Startup
+```bash
+# Given: Container restarts
+docker restart charon
+
+# When: Container boots
+# Then:
+# - Check logs for CrowdSec reconciliation messages
+# - Verify PID file created: /app/data/crowdsec/crowdsec.pid
+# - Verify process running: docker exec charon ps aux | grep crowdsec
+# - Verify status API returns running=true
+
+docker logs charon --tail 100 | grep -i "crowdsec"
+docker exec charon ps aux | grep crowdsec
+docker exec charon ls -la /app/data/crowdsec/crowdsec.pid
+```
+
+### Test Scenario 2: Toggle Behavior
+```bash
+# Given: CrowdSec is running
+# When: User clicks toggle to disable
+# Then:
+# - Frontend shows loading state
+# - API call succeeds
+# - Process stops (no crowdsec in ps)
+# - PID file removed
+# - Toggle reflects OFF state
+# - Status API returns running=false
+
+# When: User clicks toggle to enable
+# Then:
+# - Frontend shows loading state
+# - API call succeeds
+# - Process starts
+# - PID file created
+# - Toggle reflects ON state
+# - Status API returns running=true
+```
+
+### Test Scenario 3: Security Log Viewer
+```bash
+# Given: CrowdSec is enabled and blocking traffic
+# When: User opens Cerberus Dashboard
+# Then:
+# - WebSocket connects successfully (check browser console)
+# - Logs appear in real-time
+# - Blocked requests show with red indicator
+# - Source badges show correct module (crowdsec, waf, etc.)
+
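+# Test a blocked request. A single request with an arbitrary User-Agent is unlikely
+# to trigger a CrowdSec decision, so add a temporary manual decision for the client IP
+# (assumes cscli is installed alongside crowdsec):
+docker exec charon cscli decisions add --ip <your-client-ip> --duration 5m --reason "hotfix test"
+curl -i https://your-charon-instance.com
+# Should be blocked and appear as a crowdsec entry in the dashboard
+docker exec charon cscli decisions delete --ip <your-client-ip>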
+```
+
+---
+
+## Implementation Order
+
+1. **Phase 1: Diagnostics** (15 minutes)
+ - Run all pre-checks
+ - Document actual state of system
+ - Identify which issue is the primary blocker
+
+2. **Phase 2: CrowdSec Startup** (30 minutes)
+ - Implement enhanced logging in `crowdsec_startup.go`
+ - Add binary/config validation
+ - Test container restart
+
+3. **Phase 3: Toggle Fix** (20 minutes)
+ - Remove optimistic updates from `Security.tsx`
+ - Add status verification
+ - Test toggle on/off cycle
+
+4. **Phase 4: Log Viewer** (30 minutes)
+ - Verify log file path
+ - Implement log path detection
+ - Improve source detection
+ - Test with actual traffic
+
+5. **Phase 5: Integration Testing** (30 minutes)
+ - Full end-to-end test
+ - Verify all three issues resolved
+ - Check for regressions
+
+**Total Estimated Time:** about 2 hours (125 minutes across the five phases)
+
+---
+
+## Success Criteria
+
+✅ **CrowdSec Running:**
+- `docker exec charon ps aux | grep crowdsec` shows running process
+- PID file exists at `/app/data/crowdsec/crowdsec.pid`
+- `/api/v1/admin/crowdsec/status` returns `{"running": true, "pid": <pid>}` with a non-zero PID
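+
+This criterion can be checked mechanically. The sketch assumes the endpoint returns JSON containing at least the `running` and `pid` fields listed above. `checkRunning` is a hypothetical helper.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// crowdsecStatus holds the fields this check needs; other fields are ignored.
+type crowdsecStatus struct {
+	Running bool `json:"running"`
+	PID     int  `json:"pid"`
+}
+
+// checkRunning fails unless the body reports a running process with a PID.
+func checkRunning(body []byte) error {
+	var s crowdsecStatus
+	if err := json.Unmarshal(body, &s); err != nil {
+		return fmt.Errorf("decode status: %w", err)
+	}
+	if !s.Running || s.PID <= 0 {
+		return fmt.Errorf("crowdsec not running (running=%t, pid=%d)", s.Running, s.PID)
+	}
+	return nil
+}
+```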
+
+✅ **Toggle Working:**
+- Toggle can be turned on and off without getting stuck
+- UI state matches backend process state
+- Clear error messages if operations fail
+
+✅ **Logs Correct:**
+- Security log viewer shows Caddy access logs
+- Blocked requests appear with proper indicators
+- Source badges correctly identify security module
+- WebSocket stays connected
+
+---
+
+## Rollback Plan
+
+If the hotfix causes issues:
+
+1. **Revert Commits:**
+```bash
+git revert HEAD~3..HEAD # Revert last 3 commits
+git push origin feature/beta-release
+```
+
+2. **Restart Container:**
+```bash
+docker restart charon
+```
+
+3. **Verify Basic Functionality:**
+- Proxy hosts still work
+- SSL still works
+- No new errors in logs
+
+---
+
+## Notes for QA
+
+- Test on clean container (no previous CrowdSec state)
+- Test with existing CrowdSec config
+- Test rapid toggle on/off cycles
+- Monitor container logs during testing
+- Check browser console for WebSocket errors
+- Verify memory usage doesn't spike (log file tailing)