# CrowdSec Critical Hotfix Remediation Plan

**Date**: December 15, 2025
**Priority**: CRITICAL
**Issue Count**: 4 reported issues after 17 failed commit attempts
**Affected Components**: Backend (handlers, services), Frontend (pages, hooks, components)

---

## Executive Summary

After exhaustive analysis of the CrowdSec functionality across both backend and frontend, I have identified the **root causes** of all four reported issues.

The core problem is a **dual-state architecture conflict**: CrowdSec's enabled state is managed by TWO independent systems that don't synchronize properly:

1. **Settings Table** (`security.crowdsec.enabled` and `security.crowdsec.mode`) - Runtime overrides
2. **SecurityConfig Table** (`CrowdSecMode` column) - User configuration

Additionally, the Live Log Viewer has a **WebSocket lifecycle bug**, and the deprecated mode UI causes state conflicts.

---

## The 4 Reported Issues

| # | Issue | Root Cause | Severity |
|---|-------|------------|----------|
| 1 | CrowdSec card toggle broken - shows "active" but not actually on | Dual-state conflict: `security.crowdsec.mode` overrides `security.crowdsec.enabled` | CRITICAL |
| 2 | Live logs show "disconnected" but logs appear; navigation clears logs | WebSocket reconnection lifecycle bug + state not persisted | HIGH |
| 3 | Deprecated mode toggle still in UI, causing confusion | UI component not removed after deprecation | MEDIUM |
| 4 | Enrollment shows "not running" while LAPI is initializing | Race condition between process start and LAPI readiness | HIGH |

---

## Current State Analysis

### Backend Data Flow

#### 1. SecurityConfig Model

**File**: [backend/internal/models/security_config.go](../../backend/internal/models/security_config.go)

```go
type SecurityConfig struct {
	CrowdSecMode string `json:"crowdsec_mode"` // "disabled" or "local" - DEPRECATED
	Enabled      bool   `json:"enabled"`       // Cerberus master switch
	// ...
}
```

#### 2. GetStatus Handler - THE BUG

**File**: [backend/internal/api/handlers/security_handler.go#L75-175](../../backend/internal/api/handlers/security_handler.go#L75-175)

The `GetStatus` endpoint has a **three-tier priority chain** that causes the bug:

```go
// PRIORITY 1 (highest): Settings table overrides

// Lines 135-140: Check security.crowdsec.enabled
if strings.EqualFold(setting.Value, "true") {
	crowdSecMode = "local"
} else {
	crowdSecMode = "disabled"
}

// Lines 143-148: THEN check security.crowdsec.mode - THIS OVERRIDES THE ABOVE!
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	crowdSecMode = setting.Value // <-- BUG: This can override the enabled check!
}
```

**The Bug Flow**:

1. User toggles CrowdSec ON → `security.crowdsec.enabled = "true"` → `crowdSecMode = "local"` ✓
2. BUT if `security.crowdsec.mode = "disabled"` was previously set (by the deprecated UI), it OVERRIDES step 1
3. Final result: `crowdSecMode = "disabled"` even though the user just toggled it ON

#### 3. CrowdSec Start Handler - INCONSISTENT STATE UPDATE

**File**: [backend/internal/api/handlers/crowdsec_handler.go#L184-240](../../backend/internal/api/handlers/crowdsec_handler.go#L184-240)

```go
func (h *CrowdsecHandler) Start(c *gin.Context) {
	// ... (excerpt: loading cfg is omitted)

	// Updates SecurityConfig table
	cfg.CrowdSecMode = "local"
	cfg.Enabled = true
	h.DB.Save(&cfg) // Saves to security_configs table

	// BUT: Does NOT update settings table!
	// Missing: h.DB.Create/Update(&models.Setting{Key: "security.crowdsec.enabled", Value: "true"})
}
```

**Problem**: `Start()` updates `SecurityConfig.CrowdSecMode`, but the frontend toggle updates `settings.security.crowdsec.enabled`. These are TWO DIFFERENT tables that both affect CrowdSec state.

#### 4. Feature Flags Handler

**File**: [backend/internal/api/handlers/feature_flags_handler.go](../../backend/internal/api/handlers/feature_flags_handler.go)

Only manages THREE flags:

- `feature.cerberus.enabled` (Cerberus master switch)
- `feature.uptime.enabled`
- `feature.crowdsec.console_enrollment`

**Missing**: There is no `feature.crowdsec.enabled`. CrowdSec uses `security.crowdsec.enabled` in the settings table, which is NOT a feature flag.

### Frontend Data Flow

#### 1. Security.tsx (Cerberus Dashboard)

**File**: [frontend/src/pages/Security.tsx#L65-110](../../frontend/src/pages/Security.tsx#L65-110)

```typescript
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    // Step 1: Update settings table
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')

    if (enabled) {
      // Step 2: Start process (which updates SecurityConfig table)
      const result = await startCrowdsec()
      // ...
    }
  }
})
```

The mutation updates TWO places:

1. `settings` table via `updateSetting()` → sets `security.crowdsec.enabled`
2. `security_configs` table via the `startCrowdsec()` backend call → sets `CrowdSecMode`

But `GetStatus` reads from BOTH and can get conflicting values.

#### 2. CrowdSecConfig.tsx - DEPRECATED MODE TOGGLE

**File**: [frontend/src/pages/CrowdSecConfig.tsx#L69-90](../../frontend/src/pages/CrowdSecConfig.tsx#L69-90)

```typescript
const updateModeMutation = useMutation({
  mutationFn: async (mode: string) => updateSetting('security.crowdsec.mode', mode, 'security', 'string'),
  // This updates security.crowdsec.mode, which OVERRIDES security.crowdsec.enabled!
})
```

**This is the deprecated toggle that should not exist.** It sets `security.crowdsec.mode`, which takes precedence over `security.crowdsec.enabled` in `GetStatus`.

#### 3. LiveLogViewer.tsx - WEBSOCKET BUGS

**File**: [frontend/src/components/LiveLogViewer.tsx#L100-150](../../frontend/src/components/LiveLogViewer.tsx#L100-150)

```typescript
useEffect(() => {
  // Close existing connection
  if (closeConnectionRef.current) {
    closeConnectionRef.current();
    closeConnectionRef.current = null;
  }
  // ... reconnect logic
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// BUG: isPaused is in the dependency array, so merely pausing tears down and reopens the WebSocket!
```

**Problems**:

1. `isPaused` in deps → toggling pause causes a WebSocket disconnect/reconnect
2. Navigating away unmounts the component → `logs` state is lost
3. `isConnected` is local state → lost on unmount, starts as `false` on remount
4. No reconnection retry logic

#### 4. Console Enrollment LAPI Check

**File**: [frontend/src/pages/CrowdSecConfig.tsx#L85-120](../../frontend/src/pages/CrowdSecConfig.tsx#L85-120)

```typescript
// Wait 3 seconds before first LAPI check
const timer = setTimeout(() => {
  setInitialCheckComplete(true)
}, 3000)
```

**Problem**: 3 seconds may not be enough. CrowdSec LAPI typically takes 5-10 seconds to initialize, so users see the "not running" error during this window.
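Which enrollment banner to show depends on two separate signals, process `running` and `lapi_ready`, plus whether the initial grace period (`initialCheckComplete`) has elapsed. Below is a minimal sketch of that decision as a pure function. It assumes a status payload containing only those two booleans. `LapiStatus`, `LapiBanner`, and `lapiBannerState` are hypothetical names, not existing code in `CrowdSecConfig.tsx`.

```typescript
// Hypothetical status shape; the field names mirror the `running` and
// `lapi_ready` fields used by the LAPI status query in this plan.
interface LapiStatus {
  running: boolean;
  lapi_ready: boolean;
}

// Hypothetical banner states for the enrollment page.
type LapiBanner = "none" | "not-running" | "initializing" | "ready";

// Returns the banner to render. Nothing is shown until the status has loaded
// and the grace period has passed. A running process whose LAPI is not ready
// yet maps to "initializing", never to "not-running".
function lapiBannerState(
  status: LapiStatus | undefined,
  initialCheckComplete: boolean,
): LapiBanner {
  if (!status || !initialCheckComplete) return "none";
  if (!status.running) return "not-running";
  if (!status.lapi_ready) return "initializing";
  return "ready";
}
```

For example, `lapiBannerState({ running: true, lapi_ready: false }, true)` yields `"initializing"`, the case that currently surfaces as "not running". Keeping the rule in a pure function lets the three-way distinction be unit-tested without rendering the page.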
---

## Identified Problems

### Problem 1: Dual-State Conflict (Toggle Shows Active But Not Working)

**Evidence Chain**:

```
User toggles ON
  → updateSetting('security.crowdsec.enabled', 'true')
  → startCrowdsec() → sets SecurityConfig.CrowdSecMode = 'local'

User refreshes page
  → getSecurityStatus()
  → Reads security.crowdsec.enabled = 'true' → crowdSecMode = 'local'
  → Reads security.crowdsec.mode (if it exists) → OVERRIDES with whatever value it holds

If security.crowdsec.mode = 'disabled' (from the deprecated UI)
  → Final: crowdSecMode = 'disabled'
```

**Locations**:

- Backend: [security_handler.go#L135-148](../../backend/internal/api/handlers/security_handler.go#L135-148)
- Backend: [crowdsec_handler.go#L195-215](../../backend/internal/api/handlers/crowdsec_handler.go#L195-215)
- Frontend: [Security.tsx#L65-110](../../frontend/src/pages/Security.tsx#L65-110)

### Problem 2: Live Log Viewer State Issues

**Evidence**:

- Shows "Disconnected" immediately after page load (the initial `isConnected` state is `false`)
- Logs appear because the WebSocket connects quickly, but the `isConnected` state update races
- Navigating away loses all log entries (they live in component state)
- Pausing causes a reconnection flicker

**Location**: [LiveLogViewer.tsx#L100-150](../../frontend/src/components/LiveLogViewer.tsx#L100-150)

### Problem 3: Deprecated Mode Toggle Still Present

**Evidence**: CrowdSecConfig.tsx still renders:

```tsx
<div>
  <h3>CrowdSec Mode</h3>
  <div>
    <input type="checkbox" onChange={(e) => handleModeToggle(e.target.checked)} /> {/* Disabled/Local toggle - DEPRECATED */}
  </div>
</div>
```

**Location**: [CrowdSecConfig.tsx#L395-420](../../frontend/src/pages/CrowdSecConfig.tsx#L395-420)

### Problem 4: Enrollment "Not Running" Error

**Evidence**: The user enables CrowdSec, immediately tries to enroll, and sees an error because:

1. The process starts (`running=true`)
2. LAPI takes 5-10s to initialize (`lapi_ready=false`)
3. The frontend shows "not running" because it checks `lapi_ready`

**Locations**:

- Frontend: [CrowdSecConfig.tsx#L85-120](../../frontend/src/pages/CrowdSecConfig.tsx#L85-120)
- Backend: [console_enroll.go#L165-190](../../backend/internal/crowdsec/console_enroll.go#L165-190)

---

## Remediation Plan

### Phase 1: Backend Fixes (CRITICAL)

#### 1.1 Fix GetStatus Priority Chain

**File**: `backend/internal/api/handlers/security_handler.go`
**Lines**: 143-148

**Current Code (BUGGY)**:

```go
// CrowdSec mode override (AFTER enabled check - causes override bug)
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	crowdSecMode = setting.Value
}
```

**Fix**: Remove the mode override, OR make `enabled` take precedence:

```go
// OPTION A: Remove mode override entirely (recommended)
// DELETE lines 143-148

// OPTION B: Make enabled take precedence over mode
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	// Only use mode if enabled wasn't explicitly set
	var enabledSetting struct{ Value string }
	if h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&enabledSetting).Error != nil || enabledSetting.Value == "" {
		crowdSecMode = setting.Value
	}
	// If enabled was set, ignore the deprecated mode setting
}
```

#### 1.2 Update Start/Stop to Sync State

**File**: `backend/internal/api/handlers/crowdsec_handler.go`

**In Start() after line 215**:

```go
// Sync settings table (source of truth for UI)
if h.DB != nil {
	settingEnabled := models.Setting{
		Key:      "security.crowdsec.enabled",
		Value:    "true",
		Type:     "bool",
		Category: "security",
	}
	h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(settingEnabled).FirstOrCreate(&settingEnabled)

	// Clear deprecated mode setting to prevent conflicts
	h.DB.Where("key = ?", "security.crowdsec.mode").Delete(&models.Setting{})
}
```

**In Stop() after line 260**:

```go
// Sync settings table
if h.DB != nil {
	settingEnabled := models.Setting{
		Key:      "security.crowdsec.enabled",
		Value:    "false",
		Type:     "bool",
		Category: "security",
	}
	h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(settingEnabled).FirstOrCreate(&settingEnabled)
}
```

#### 1.3 Add Deprecation Warning for Mode Setting

**File**: `backend/internal/api/handlers/settings_handler.go`

Log a warning in the update handler:

```go
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	// ... existing code ...

	if setting.Key == "security.crowdsec.mode" {
		logger.Log().Warn("DEPRECATED: security.crowdsec.mode is deprecated and will be removed. Use security.crowdsec.enabled instead.")
	}

	// ... rest of existing code ...
}
```

### Phase 2: Frontend Fixes

#### 2.1 Remove Deprecated Mode Toggle

**File**: `frontend/src/pages/CrowdSecConfig.tsx`

**Remove these sections**:

1. **Lines 69-78** - Remove `updateModeMutation`:

   ```typescript
   // DELETE THIS ENTIRE MUTATION
   const updateModeMutation = useMutation({
     mutationFn: async (mode: string) => updateSetting('security.crowdsec.mode', mode, 'security', 'string'),
     onSuccess: (_data, mode) => {
       queryClient.invalidateQueries({ queryKey: ['security-status'] })
       toast.success(mode === 'disabled' ? 'CrowdSec disabled' : 'CrowdSec set to Local mode')
     },
     onError: (err: unknown) => {
       const msg = err instanceof Error ? err.message : 'Failed to update mode'
       toast.error(msg)
     },
   })
   ```

2. **Lines ~395-420** - Remove the Mode Card from render:

   ```tsx
   // DELETE THIS ENTIRE CARD
   <div>
     <h3>CrowdSec Mode</h3>
     ...
     <div>
       <span>Disabled</span>
       <input type="checkbox" onChange={(e) => handleModeToggle(e.target.checked)} />
       <span>Local</span>
     </div>
   </div>
   ```

3. **Replace with an informational banner**:

   ```tsx
   <div>
     <p>
       CrowdSec is controlled from the Security Dashboard. Use the toggle there to enable or disable CrowdSec protection.
     </p>
   </div>
   ```

#### 2.2 Fix Live Log Viewer

**File**: `frontend/src/components/LiveLogViewer.tsx`

**Fix 1**: Remove `isPaused` from dependencies (line 148):

```typescript
// BEFORE:
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);

// AFTER:
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]);
```

**Fix 2**: Use a ref for pause state in the message handler:

```typescript
// Add ref near other refs (around line 70):
const isPausedRef = useRef(isPaused);

// Sync ref with state (add useEffect around line 95):
useEffect(() => {
  isPausedRef.current = isPaused;
}, [isPaused]);

// Update message handler (lines 110-120):
const handleSecurityMessage = (entry: SecurityLogEntry) => {
  if (!isPausedRef.current) { // Use ref instead of state
    const displayEntry = toDisplayFromSecurity(entry);
    setLogs((prev) => {
      const updated = [...prev, displayEntry];
      return updated.length > maxLogs ? updated.slice(-maxLogs) : updated;
    });
  }
};
```

**Fix 3**: Add reconnection retry logic. The attempt counter lives in a ref, so resetting it after a successful connect does not itself re-run the effect (which would drop the fresh connection). The cleanup cancels any pending retry so that an intentional close (unmount or dependency change) is never retried:

```typescript
// Add near the other state/refs (around line 50):
const [reconnectToken, setReconnectToken] = useState(0); // bumping this re-runs the connection effect
const retryCountRef = useRef(0); // attempts since the last successful open
const maxRetries = 5;
const retryDelay = 2000; // 2 seconds base delay

// Update connection effect (around line 100):
useEffect(() => {
  // ... existing close logic ...

  let disposed = false; // set by cleanup so intentional closes are not retried
  let retryTimer: ReturnType<typeof setTimeout> | undefined;

  const handleOpen = () => {
    console.log(`${currentMode} log viewer connected`);
    setIsConnected(true);
    retryCountRef.current = 0; // reset without triggering a re-render
  };

  const handleClose = () => {
    console.log(`${currentMode} log viewer disconnected`);
    setIsConnected(false);
    // Schedule a retry with exponential backoff (2s, 3s, 4.5s, ...)
    if (!disposed && retryCountRef.current < maxRetries) {
      const delay = retryDelay * Math.pow(1.5, retryCountRef.current);
      retryCountRef.current += 1;
      retryTimer = setTimeout(() => setReconnectToken((t) => t + 1), delay);
    }
  };

  // ... rest of effect (open the connection with handleOpen/handleClose) ...

  return () => {
    disposed = true;
    if (retryTimer !== undefined) clearTimeout(retryTimer); // cancel any pending retry
    if (closeConnectionRef.current) {
      closeConnectionRef.current();
      closeConnectionRef.current = null;
    }
    setIsConnected(false);
  };
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly, reconnectToken]);
```

#### 2.3 Improve Enrollment LAPI Messaging

**File**: `frontend/src/pages/CrowdSecConfig.tsx`

**Fix 1**: Increase the initial delay (line 85):

```typescript
// BEFORE:
}, 3000) // Wait 3 seconds

// AFTER:
}, 5000) // Wait 5 seconds for LAPI to initialize
```

**Fix 2**: Improve warning messages (around lines 200-250):

```tsx
{/* Show LAPI initializing warning when process running but LAPI not ready */}
{lapiStatusQuery.data && lapiStatusQuery.data.running && !lapiStatusQuery.data.lapi_ready && initialCheckComplete && (
  <div>
    <p>CrowdSec Local API is initializing...</p>
    <p>
      The CrowdSec process is running but LAPI takes 5-10 seconds to become ready. Console enrollment will be available once LAPI is ready.
      {lapiStatusQuery.isRefetching && ' Checking status...'}
    </p>
  </div>
)}

{/* Show not running warning when process not running */}
{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
  <div>
    <p>CrowdSec is not running</p>
    <p>
      Enable CrowdSec from the Security Dashboard first. The process typically takes 5-10 seconds to start and LAPI another 5-10 seconds to initialize.
    </p>
  </div>
)}
```

### Phase 3: Cleanup & Testing

#### 3.1 Database Cleanup Migration (Optional)

Create a one-time migration to remove conflicting settings:

```sql
-- Remove deprecated mode setting to prevent conflicts
DELETE FROM settings WHERE key = 'security.crowdsec.mode';
```

#### 3.2 Backend Test Updates

Add test cases for:

1. `GetStatus` returns the correct enabled state when only `security.crowdsec.enabled` is set
2. `GetStatus` returns the correct state when the deprecated `security.crowdsec.mode` exists (it should be ignored)
3. `Start()` updates the `settings` table
4. `Stop()` updates the `settings` table

#### 3.3 Frontend Test Updates

Add test cases for:

1. `LiveLogViewer` doesn't reconnect when pause is toggled
2. `LiveLogViewer` retries the connection on disconnect
3. `CrowdSecConfig` doesn't render the mode toggle

---

## Test Plan

### Manual QA Checklist

- [ ] **Toggle Test**:
   1. Go to Security Dashboard
   2. Toggle CrowdSec ON
   3. Verify card shows "Active"
   4. Verify `docker exec charon ps aux | grep crowdsec` shows the process
   5. Toggle CrowdSec OFF
   6. Verify card shows "Disabled"
   7. Verify process stopped
- [ ] **State Persistence Test**:
   1. Toggle CrowdSec ON
   2. Refresh page
   3. Verify toggle still shows ON
   4. Check database: `SELECT * FROM settings WHERE key LIKE '%crowdsec%'`
- [ ] **Live Logs Test**:
   1. Go to Security Dashboard
   2. Verify "Connected" status appears
   3. Generate some traffic
   4. Verify logs appear
   5. Click "Pause" - verify NO flicker/reconnect
   6. Navigate to another page
   7. Navigate back
   8. Verify reconnection happens (status goes from Disconnected → Connected)
- [ ] **Enrollment Test**:
   1. Enable CrowdSec
   2. Go to CrowdSecConfig
   3. Verify warning shows "LAPI initializing" (not "not running")
   4. Wait for LAPI ready
   5. Enter enrollment key
   6. Click Enroll
   7. Verify success
- [ ] **Deprecated UI Removed**:
   1. Go to CrowdSecConfig page
   2. Verify there is NO "CrowdSec Mode" card with a Disabled/Local toggle
   3. Verify the informational banner points to the Security Dashboard

### Integration Test Commands

```bash
# Test 1: Backend state consistency
# Enable via API
curl -X POST http://localhost:8080/api/v1/admin/crowdsec/start

# Check settings table
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.enabled'"
# Expected: value = "true"

# Check status endpoint
curl http://localhost:8080/api/v1/security/status | jq '.crowdsec'
# Expected: {"mode":"local","enabled":true,...}

# Test 2: No deprecated mode conflict
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.mode'"
# Expected: No rows (or deprecated warning logged)

# Test 3: Disable and verify
curl -X POST http://localhost:8080/api/v1/admin/crowdsec/stop
curl http://localhost:8080/api/v1/security/status | jq '.crowdsec'
# Expected: {"mode":"disabled","enabled":false,...}
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.enabled'"
# Expected: value = "false"
```

---

## Implementation Order

| Order | Phase | Task | Priority | Est. Time |
|-------|-------|------|----------|-----------|
| 1 | 1.1 | Fix GetStatus to ignore deprecated mode | CRITICAL | 15 min |
| 2 | 1.2 | Update Start/Stop to sync settings table | CRITICAL | 20 min |
| 3 | 2.1 | Remove deprecated mode toggle from UI | HIGH | 15 min |
| 4 | 2.2 | Fix LiveLogViewer pause/reconnection | HIGH | 30 min |
| 5 | 2.3 | Improve enrollment LAPI messaging | MEDIUM | 15 min |
| 6 | 1.3 | Add deprecation warning for mode setting | LOW | 10 min |
| 7 | 3.1 | Database cleanup migration | LOW | 10 min |
| 8 | 3.2-3.3 | Update tests | MEDIUM | 30 min |

**Total Estimated Time**: ~2.5 hours

---

## Success Criteria

1. ✅ Toggling CrowdSec ON shows "Active" AND the process is actually running
2. ✅ Toggling CrowdSec OFF shows "Disabled" AND the process is stopped
3. ✅ State persists across page refresh
4. ✅ No deprecated mode toggle visible on the CrowdSecConfig page
5. ✅ Live logs show "Connected" when the WebSocket connects
6. ✅ Pausing logs does NOT cause reconnection
7. ✅ Enrollment shows the appropriate LAPI status message
8. ✅ All existing tests pass
9. ✅ No errors in the browser console related to CrowdSec

---

## Appendix: File Reference

| Issue | Backend Files | Frontend Files |
|-------|---------------|----------------|
| Toggle Bug | `security_handler.go#L135-148`, `crowdsec_handler.go#L184-265` | `Security.tsx#L65-110` |
| Deprecated Mode | `security_handler.go#L143-148` | `CrowdSecConfig.tsx#L69-90, L395-420` |
| Live Logs | `cerberus_logs_ws.go` | `LiveLogViewer.tsx#L100-150`, `logs.ts` |
| Enrollment | `console_enroll.go#L165-190` | `CrowdSecConfig.tsx#L85-120` |