# CrowdSec Critical Hotfix Remediation Plan
**Date**: December 15, 2025
**Priority**: CRITICAL
**Issue Count**: 4 reported issues after 17 failed commit attempts
**Affected Components**: Backend (handlers, services), Frontend (pages, hooks, components)
---
## Executive Summary
After a full review of the CrowdSec code paths in both backend and frontend, I have identified the **root causes** of all four reported issues. The core problem is a **dual-state architecture conflict**: CrowdSec's enabled state is managed by TWO independent systems that are not kept in sync:
1. **Settings Table** (`security.crowdsec.enabled` and `security.crowdsec.mode`) - Runtime overrides
2. **SecurityConfig Table** (`CrowdSecMode` column) - User configuration
Additionally, the Live Log Viewer has a **WebSocket lifecycle bug**, the deprecated mode UI writes a setting that conflicts with the dashboard toggle, and console enrollment races LAPI startup.
---
## The 4 Reported Issues
| # | Issue | Root Cause | Severity |
|---|-------|------------|----------|
| 1 | CrowdSec card toggle broken - shows "active" but not actually on | Dual-state conflict: `security.crowdsec.mode` overrides `security.crowdsec.enabled` | CRITICAL |
| 2 | Live logs show "disconnected" but logs appear; navigation clears logs | WebSocket reconnection lifecycle bug + state not persisted | HIGH |
| 3 | Deprecated mode toggle still in UI causing confusion | UI component not removed after deprecation | MEDIUM |
| 4 | Enrollment shows "not running" when LAPI initializing | Race condition between process start and LAPI readiness | HIGH |
---
## Current State Analysis
### Backend Data Flow
#### 1. SecurityConfig Model
**File**: [backend/internal/models/security_config.go](../../backend/internal/models/security_config.go)
```go
type SecurityConfig struct {
	CrowdSecMode string `json:"crowdsec_mode"` // "disabled" or "local" - DEPRECATED
	Enabled      bool   `json:"enabled"`       // Cerberus master switch
	// ...
}
```
#### 2. GetStatus Handler - THE BUG
**File**: [backend/internal/api/handlers/security_handler.go#L75-175](../../backend/internal/api/handlers/security_handler.go#L75-175)
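The `GetStatus` endpoint resolves `crowdSecMode` through a **three-tier priority chain**. Each tier is a plain assignment that runs after the previous one, so whichever setting is read last wins: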
```go
// PRIORITY 1 (highest): Settings table overrides
// Lines 135-140: Check security.crowdsec.enabled
if strings.EqualFold(setting.Value, "true") {
	crowdSecMode = "local"
} else {
	crowdSecMode = "disabled"
}

// Lines 143-148: THEN check security.crowdsec.mode - THIS OVERRIDES THE ABOVE!
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	crowdSecMode = setting.Value // <-- BUG: This can override the enabled check!
}
```
**The Bug Flow**:
1. User toggles CrowdSec ON → `security.crowdsec.enabled = "true"` → `crowdSecMode = "local"` ✓
2. BUT if `security.crowdsec.mode = "disabled"` was previously set (by deprecated UI), it OVERRIDES step 1
3. Final result: `crowdSecMode = "disabled"` even though user just toggled it ON
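To make the ordering concrete, here is a minimal TypeScript model of the current resolution. It is a sketch under stated assumptions: the Go handler is reduced to a pure function over a key/value map, `resolveCrowdSecModeCurrent` is a hypothetical name, and falling back to `SecurityConfig.CrowdSecMode` when neither setting exists is assumed to be the lowest tier of the chain.

```typescript
// Settings rows keyed by setting name (simplified stand-in for the settings table).
type SettingsRows = Map<string, string>;

// Hypothetical model of the current GetStatus order: each later read overwrites
// crowdSecMode, so whichever setting is read last wins.
function resolveCrowdSecModeCurrent(configMode: string, settings: SettingsRows): string {
  // Assumed lowest tier: SecurityConfig.CrowdSecMode
  let crowdSecMode = configMode;

  // Lines 135-140: security.crowdsec.enabled maps to "local" / "disabled"
  const enabled = settings.get("security.crowdsec.enabled") ?? "";
  if (enabled !== "") {
    crowdSecMode = enabled.toLowerCase() === "true" ? "local" : "disabled";
  }

  // Lines 143-148: security.crowdsec.mode is read last and overwrites the result above
  const mode = settings.get("security.crowdsec.mode") ?? "";
  if (mode !== "") {
    crowdSecMode = mode;
  }
  return crowdSecMode;
}

// The reported scenario: the user just toggled ON, but a stale row from the
// deprecated Disabled/Local toggle is still present.
const rows: SettingsRows = new Map([
  ["security.crowdsec.enabled", "true"],
  ["security.crowdsec.mode", "disabled"],
]);
console.log(resolveCrowdSecModeCurrent("local", rows)); // disabled
```

Removing the `security.crowdsec.mode` entry from `rows` makes the same call return `local`, which is the behavior the user expected after step 1.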
#### 3. CrowdSec Start Handler - INCONSISTENT STATE UPDATE
**File**: [backend/internal/api/handlers/crowdsec_handler.go#L184-240](../../backend/internal/api/handlers/crowdsec_handler.go#L184-240)
```go
func (h *CrowdsecHandler) Start(c *gin.Context) {
	// Updates SecurityConfig table
	cfg.CrowdSecMode = "local"
	cfg.Enabled = true
	h.DB.Save(&cfg) // Saves to security_configs table

	// BUT: Does NOT update settings table!
	// Missing: h.DB.Create/Update(&models.Setting{Key: "security.crowdsec.enabled", Value: "true"})
}
```
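**Problem**: `Start()` updates `SecurityConfig.CrowdSecMode`, but not the `security.crowdsec.enabled` key in the `settings` table, which is what the frontend toggle writes and what `GetStatus` gives priority to. These are TWO DIFFERENT tables that both affect CrowdSec state, so calling `Start()` through any path other than the dashboard toggle leaves them out of sync.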
#### 4. Feature Flags Handler
**File**: [backend/internal/api/handlers/feature_flags_handler.go](../../backend/internal/api/handlers/feature_flags_handler.go)
Only manages THREE flags:
- `feature.cerberus.enabled` (Cerberus master switch)
- `feature.uptime.enabled`
- `feature.crowdsec.console_enrollment`
**Missing**: No `feature.crowdsec.enabled`. CrowdSec uses `security.crowdsec.enabled` in the settings table, which is NOT a feature flag.
### Frontend Data Flow
#### 1. Security.tsx (Cerberus Dashboard)
**File**: [frontend/src/pages/Security.tsx#L65-110](../../frontend/src/pages/Security.tsx#L65-110)
```typescript
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    // Step 1: Update settings table
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')
    if (enabled) {
      // Step 2: Start process (which updates SecurityConfig table)
      const result = await startCrowdsec()
      // ...
    }
  }
})
```
The mutation updates TWO places:
1. `settings` table via `updateSetting()` → sets `security.crowdsec.enabled`
2. `security_configs` table via `startCrowdsec()` backend → sets `CrowdSecMode`
But `GetStatus` reads from BOTH and can get conflicting values.
#### 2. CrowdSecConfig.tsx - DEPRECATED MODE TOGGLE
**File**: [frontend/src/pages/CrowdSecConfig.tsx#L69-90](../../frontend/src/pages/CrowdSecConfig.tsx#L69-90)
```typescript
const updateModeMutation = useMutation({
  mutationFn: async (mode: string) => updateSetting('security.crowdsec.mode', mode, 'security', 'string'),
  // This updates security.crowdsec.mode which OVERRIDES security.crowdsec.enabled!
})
```
**This is the deprecated toggle that should not exist.** It sets `security.crowdsec.mode` which takes precedence over `security.crowdsec.enabled` in `GetStatus`.
#### 3. LiveLogViewer.tsx - WEBSOCKET BUGS
**File**: [frontend/src/components/LiveLogViewer.tsx#L100-150](../../frontend/src/components/LiveLogViewer.tsx#L100-150)
```typescript
useEffect(() => {
  // Close existing connection
  if (closeConnectionRef.current) {
    closeConnectionRef.current();
    closeConnectionRef.current = null;
  }
  // ... reconnect logic
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// BUG: isPaused in the dependency array causes a reconnection when the user just wants to pause!
```
**Problems**:
1. `isPaused` in deps → toggling pause causes WebSocket disconnect/reconnect
2. Navigation away unmounts component → `logs` state is lost
3. `isConnected` is local state → lost on unmount, starts as `false` on remount
4. No reconnection retry logic
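Problem 1 follows from how React schedules effects: after a render, an effect's cleanup and body run again whenever any entry in its dependency array differs from the previous render's entry by `Object.is`. The framework-free sketch below models only that comparison (it is not React's implementation; `depsChanged`, `countReconnects`, and the sample state values are hypothetical) and counts how many socket reconnects a pause followed by a resume triggers.

```typescript
// True when any dependency differs from the previous render's value by Object.is,
// the comparison React uses to decide whether to re-run an effect.
function depsChanged(prev: readonly unknown[], next: readonly unknown[]): boolean {
  return prev.length !== next.length || prev.some((dep, i) => !Object.is(dep, next[i]));
}

// Counts effect re-runs (each one closes the old socket and opens a new one)
// across a sequence of renders, given which state fields are listed as deps.
function countReconnects<S>(renders: readonly S[], selectDeps: (state: S) => unknown[]): number {
  let reconnects = 0;
  for (let i = 1; i < renders.length; i++) {
    if (depsChanged(selectDeps(renders[i - 1]), selectDeps(renders[i]))) {
      reconnects++;
    }
  }
  return reconnects;
}

// Hypothetical viewer state: the user pauses and then resumes; nothing else changes.
interface ViewerState { currentMode: string; isPaused: boolean; maxLogs: number }
const renders: ViewerState[] = [
  { currentMode: "security", isPaused: false, maxLogs: 500 },
  { currentMode: "security", isPaused: true, maxLogs: 500 },
  { currentMode: "security", isPaused: false, maxLogs: 500 },
];

const withPause = countReconnects(renders, (s) => [s.currentMode, s.isPaused, s.maxLogs]);
const withoutPause = countReconnects(renders, (s) => [s.currentMode, s.maxLogs]);
console.log(withPause, withoutPause); // 2 0
```

The same rule applies to `filters` and `securityFilters`: if either is an object rebuilt on every parent render, `Object.is` sees a new value each time and the socket reconnects even though nothing meaningful changed, so those props need to be referentially stable (for example, memoized by the parent).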
#### 4. Console Enrollment LAPI Check
**File**: [frontend/src/pages/CrowdSecConfig.tsx#L85-120](../../frontend/src/pages/CrowdSecConfig.tsx#L85-120)
```typescript
// Wait 3 seconds before first LAPI check
const timer = setTimeout(() => {
  setInitialCheckComplete(true)
}, 3000)
```
**Problem**: 3 seconds may not be enough. CrowdSec LAPI typically takes 5-10 seconds to initialize. Users see "not running" error during this window.
---
## Identified Problems
### Problem 1: Dual-State Conflict (Toggle Shows Active But Not Working)
**Evidence Chain**:
```
User toggles ON → updateSetting('security.crowdsec.enabled', 'true')
→ startCrowdsec() → sets SecurityConfig.CrowdSecMode = 'local'
User refreshes page → getSecurityStatus()
→ Reads security.crowdsec.enabled = 'true' → crowdSecMode = 'local'
→ Reads security.crowdsec.mode (if exists) → OVERRIDES to whatever value
If security.crowdsec.mode = 'disabled' (from deprecated UI) → Final: crowdSecMode = 'disabled'
```
**Locations**:
- Backend: [security_handler.go#L135-148](../../backend/internal/api/handlers/security_handler.go#L135-148)
- Backend: [crowdsec_handler.go#L195-215](../../backend/internal/api/handlers/crowdsec_handler.go#L195-215)
- Frontend: [Security.tsx#L65-110](../../frontend/src/pages/Security.tsx#L65-110)
### Problem 2: Live Log Viewer State Issues
**Evidence**:
- Shows "Disconnected" immediately after page load (initial state = false)
- Logs appear because WebSocket connects quickly, but `isConnected` state update races
- Navigation away loses all log entries (component state)
- Pausing causes reconnection flicker
**Location**: [LiveLogViewer.tsx#L100-150](../../frontend/src/components/LiveLogViewer.tsx#L100-150)
### Problem 3: Deprecated Mode Toggle Still Present
**Evidence**: CrowdSecConfig.tsx still renders:
```tsx
{/* Card title: "CrowdSec Mode" (markup abbreviated) */}
{/* Disabled/Local toggle - DEPRECATED: its change handler calls
    handleModeToggle(e.target.checked) */}
```
**Location**: [CrowdSecConfig.tsx#L395-420](../../frontend/src/pages/CrowdSecConfig.tsx#L395-420)
### Problem 4: Enrollment "Not Running" Error
**Evidence**: User enables CrowdSec, immediately tries to enroll, sees error because:
1. Process starts (running=true)
2. LAPI takes 5-10s to initialize (lapi_ready=false)
3. Frontend shows "not running" because it checks lapi_ready
**Locations**:
- Frontend: [CrowdSecConfig.tsx#L85-120](../../frontend/src/pages/CrowdSecConfig.tsx#L85-120)
- Backend: [console_enroll.go#L165-190](../../backend/internal/crowdsec/console_enroll.go#L165-190)
---
## Remediation Plan
### Phase 1: Backend Fixes (CRITICAL)
#### 1.1 Fix GetStatus Priority Chain
**File**: `backend/internal/api/handlers/security_handler.go`
**Lines**: 143-148
**Current Code (BUGGY)**:
```go
// CrowdSec mode override (AFTER enabled check - causes override bug)
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	crowdSecMode = setting.Value
}
```
**Fix**: Remove the mode override OR make enabled take precedence:
```go
// OPTION A: Remove mode override entirely (recommended)
// DELETE lines 143-148

// OPTION B: Make enabled take precedence over mode
setting = struct{ Value string }{}
if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.mode").Scan(&setting).Error; err == nil && setting.Value != "" {
	// Only use mode if enabled wasn't explicitly set
	var enabledSetting struct{ Value string }
	if h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&enabledSetting).Error != nil || enabledSetting.Value == "" {
		crowdSecMode = setting.Value
	}
	// If enabled was set, ignore deprecated mode setting
}
```
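The Option B precedence can be checked in isolation with a similarly simplified model: a hypothetical pure function over a key/value map rather than the handler itself, with the `SecurityConfig.CrowdSecMode` fallback assumed as the lowest tier. The case table lines up with the `GetStatus` test cases listed in 3.2.

```typescript
type SettingsRows = Map<string, string>;

// Option B: security.crowdsec.enabled, when present, always decides the mode.
// The deprecated security.crowdsec.mode is consulted only if enabled was never written.
function resolveCrowdSecModeFixed(configMode: string, settings: SettingsRows): string {
  const enabled = settings.get("security.crowdsec.enabled") ?? "";
  if (enabled !== "") {
    return enabled.toLowerCase() === "true" ? "local" : "disabled";
  }
  const mode = settings.get("security.crowdsec.mode") ?? "";
  if (mode !== "") {
    return mode; // legacy fallback only
  }
  return configMode; // assumed lowest tier: SecurityConfig.CrowdSecMode
}

// [description, configMode, settings rows, expected mode]
const cases: Array<[string, string, SettingsRows, string]> = [
  ["enabled only", "disabled", new Map([["security.crowdsec.enabled", "true"]]), "local"],
  ["enabled wins over stale mode", "disabled",
    new Map([["security.crowdsec.enabled", "true"], ["security.crowdsec.mode", "disabled"]]), "local"],
  ["legacy mode without enabled", "disabled", new Map([["security.crowdsec.mode", "local"]]), "local"],
  ["nothing set", "disabled", new Map(), "disabled"],
];

for (const [name, configMode, settings, expected] of cases) {
  const got = resolveCrowdSecModeFixed(configMode, settings);
  console.log(`${name}: ${got === expected ? "ok" : `got ${got}, want ${expected}`}`);
}
```

Option A is the same function with the `security.crowdsec.mode` branch deleted; under Option A the "legacy mode without enabled" case would return the `SecurityConfig` value instead.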
#### 1.2 Update Start/Stop to Sync State
**File**: `backend/internal/api/handlers/crowdsec_handler.go`
**In Start() after line 215**:
```go
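// Sync settings table (source of truth for UI)
if h.DB != nil {
	settingEnabled := models.Setting{
		Key:      "security.crowdsec.enabled",
		Value:    "true",
		Type:     "bool",
		Category: "security",
	}
	// Upsert: update the row if the key exists, insert it otherwise
	if err := h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(settingEnabled).FirstOrCreate(&settingEnabled).Error; err != nil {
		logger.Log().Warn("failed to sync security.crowdsec.enabled setting")
	}
	// Clear deprecated mode setting to prevent conflicts. Unscoped() forces a hard
	// delete in case models.Setting uses soft deletes, because GetStatus reads the
	// settings table with raw SQL that does not filter on deleted_at.
	if err := h.DB.Unscoped().Where("key = ?", "security.crowdsec.mode").Delete(&models.Setting{}).Error; err != nil {
		logger.Log().Warn("failed to clear deprecated security.crowdsec.mode setting")
	}
}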
```
**In Stop() after line 260**:
```go
// Sync settings table
if h.DB != nil {
	settingEnabled := models.Setting{
		Key:      "security.crowdsec.enabled",
		Value:    "false",
		Type:     "bool",
		Category: "security",
	}
	if err := h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(settingEnabled).FirstOrCreate(&settingEnabled).Error; err != nil {
		logger.Log().Warn("failed to sync security.crowdsec.enabled setting")
	}
}
```
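Fix 1.2 also removes the reported symptom on its own, because `Start()` deletes the stale row that the current `GetStatus` order would otherwise read last. The sketch below checks that with an in-memory map standing in for the `settings` table; `statusMode`, `syncOnStart`, and `syncOnStop` are hypothetical helpers, `Map.set` plays the role of the `Assign(...).FirstOrCreate(...)` upsert, and the `"disabled"` default when no key exists is an assumption.

```typescript
// Simplified stand-in for the settings table: key -> value.
type Settings = Map<string, string>;

// Current (unfixed) GetStatus order, reduced to the two settings reads; the read
// that runs last wins. The "disabled" default when neither key exists is assumed.
function statusMode(settings: Settings): string {
  let mode = "disabled";
  const enabled = settings.get("security.crowdsec.enabled");
  if (enabled) mode = enabled.toLowerCase() === "true" ? "local" : "disabled";
  const legacy = settings.get("security.crowdsec.mode");
  if (legacy) mode = legacy;
  return mode;
}

// Proposed Start() sync: upsert enabled=true (Map.set inserts or overwrites)
// and drop the deprecated key.
function syncOnStart(settings: Settings): void {
  settings.set("security.crowdsec.enabled", "true");
  settings.delete("security.crowdsec.mode");
}

// Proposed Stop() sync: upsert enabled=false.
function syncOnStop(settings: Settings): void {
  settings.set("security.crowdsec.enabled", "false");
}

// The dashboard toggle already wrote enabled=true, but a stale row from the
// deprecated Disabled/Local toggle is still present.
const settings: Settings = new Map([
  ["security.crowdsec.enabled", "true"],
  ["security.crowdsec.mode", "disabled"],
]);

const beforeSync = statusMode(settings); // stale mode row wins
syncOnStart(settings);
const afterStart = statusMode(settings);
syncOnStop(settings);
const afterStop = statusMode(settings);
console.log(beforeSync, afterStart, afterStop); // disabled local disabled
```

Fix 1.1 is still worth shipping: without it, any later write to `security.crowdsec.mode` (for example, through the generic settings endpoint) would reintroduce the override.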
#### 1.3 Add Deprecation Warning for Mode Setting
**File**: `backend/internal/api/handlers/settings_handler.go`
Log a deprecation warning in the update handler:
```go
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	// ... existing code ...
	if setting.Key == "security.crowdsec.mode" {
		logger.Log().Warn("DEPRECATED: security.crowdsec.mode is deprecated and will be removed. Use security.crowdsec.enabled instead.")
	}
	// ... rest of existing code ...
}
```
### Phase 2: Frontend Fixes
#### 2.1 Remove Deprecated Mode Toggle
**File**: `frontend/src/pages/CrowdSecConfig.tsx`
**Remove these sections**:
1. **Lines 69-78** - Remove `updateModeMutation`:
```typescript
// DELETE THIS ENTIRE MUTATION
const updateModeMutation = useMutation({
  mutationFn: async (mode: string) => updateSetting('security.crowdsec.mode', mode, 'security', 'string'),
  onSuccess: (_data, mode) => {
    queryClient.invalidateQueries({ queryKey: ['security-status'] })
    toast.success(mode === 'disabled' ? 'CrowdSec disabled' : 'CrowdSec set to Local mode')
  },
  onError: (err: unknown) => {
    const msg = err instanceof Error ? err.message : 'Failed to update mode'
    toast.error(msg)
  },
})
```
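2. **Lines ~395-420** - Remove the Mode Card from render, together with the `handleModeToggle` handler it calls (that handler presumably invokes the deleted `updateModeMutation` and would no longer compile):
```tsx
{/* DELETE THIS ENTIRE CARD (markup abbreviated in this plan):
    - the "CrowdSec Mode" title
    - the "Disabled" / "Local" labels around the toggle whose change
      handler calls handleModeToggle(e.target.checked) */}
```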
3. **Replace with informational banner** (the wrapper markup below is a placeholder):
```tsx
<div role="note">
  <p>CrowdSec is controlled from the Security Dashboard.</p>
  <p>Use the toggle there to enable or disable CrowdSec protection.</p>
</div>
```
#### 2.2 Fix Live Log Viewer
**File**: `frontend/src/components/LiveLogViewer.tsx`
**Fix 1**: Remove `isPaused` from dependencies (line 148):
```typescript
// BEFORE:
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// AFTER:
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]);
```
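**Fix 2** (ship together with Fix 1): Use a ref for pause state in the message handler. Once `isPaused` leaves the dependency array, the handler keeps the value captured when the connection was opened, so without the ref pausing would stop working: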
```typescript
// Add ref near other refs (around line 70):
const isPausedRef = useRef(isPaused);

// Sync ref with state (add useEffect around line 95):
useEffect(() => {
  isPausedRef.current = isPaused;
}, [isPaused]);

// Update message handler (lines 110-120):
const handleSecurityMessage = (entry: SecurityLogEntry) => {
  if (!isPausedRef.current) { // Use ref instead of state
    const displayEntry = toDisplayFromSecurity(entry);
    setLogs((prev) => {
      const updated = [...prev, displayEntry];
      return updated.length > maxLogs ? updated.slice(-maxLogs) : updated;
    });
  }
};
```
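Fix 2 is needed because the message handler is created once per connection, and a plain variable captured at that moment never changes afterwards. This framework-free sketch contrasts a handler that captures the value with one that reads a mutable `{ current }` box, which is the shape `useRef` returns; `Box`, `makeCapturingHandler`, and `makeRefHandler` are hypothetical names.

```typescript
// Minimal stand-in for the object useRef returns (hypothetical type name).
interface Box<T> { current: T }

// Captures `paused` once, when the handler is created (like the effect closure).
function makeCapturingHandler(paused: boolean, sink: string[]): (line: string) => void {
  return (line) => {
    if (!paused) sink.push(line);
  };
}

// Reads the box on every call, so later updates are visible (like isPausedRef.current).
function makeRefHandler(pausedRef: Box<boolean>, sink: string[]): (line: string) => void {
  return (line) => {
    if (!pausedRef.current) sink.push(line);
  };
}

const viaCapture: string[] = [];
const viaRef: string[] = [];
let isPaused = false;
const isPausedRef: Box<boolean> = { current: isPaused };

// Handlers are built once per connection; with Fix 1, pausing no longer rebuilds them.
const onCapture = makeCapturingHandler(isPaused, viaCapture);
const onRef = makeRefHandler(isPausedRef, viaRef);

onCapture("entry 1");
onRef("entry 1");

// User clicks Pause: state changes and the syncing effect copies it into the ref.
isPaused = true;
isPausedRef.current = isPaused;

onCapture("entry 2");
onRef("entry 2");

console.log(viaCapture.length, viaRef.length); // 2 1
```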
**Fix 3**: Add reconnection retry logic:
```typescript
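// Module scope (outside the component), so the constants need not be effect deps:
const MAX_RETRIES = 5;
const BASE_RETRY_DELAY_MS = 2000; // 2 seconds base delay

// Inside the component (around line 50). The attempt counter lives in a ref so that
// resetting it on a successful connect does not re-run the effect; bumping the
// nonce state is what triggers a reconnect.
const retryAttemptRef = useRef(0);
const [reconnectNonce, setReconnectNonce] = useState(0);

// Update connection effect (around line 100):
useEffect(() => {
  // ... existing close logic ...
  let disposed = false; // set by cleanup: intentional closes must not schedule retries
  let retryTimer: ReturnType<typeof setTimeout> | undefined;

  const handleOpen = () => {
    if (disposed) return;
    console.log(`${currentMode} log viewer connected`);
    setIsConnected(true);
    retryAttemptRef.current = 0; // reset without re-running this effect
  };

  const handleClose = () => {
    // Ignore close events from a connection this effect already tore down; a late
    // close from the old socket must not flip the new connection to "Disconnected".
    if (disposed) return;
    console.log(`${currentMode} log viewer disconnected`);
    setIsConnected(false);
    // Schedule retry with exponential backoff: 2s, 3s, 4.5s, 6.75s, 10.125s
    if (retryAttemptRef.current < MAX_RETRIES) {
      const delay = BASE_RETRY_DELAY_MS * Math.pow(1.5, retryAttemptRef.current);
      retryAttemptRef.current += 1;
      retryTimer = setTimeout(() => setReconnectNonce((n) => n + 1), delay);
    }
  };

  // ... existing connect logic, passing handleOpen/handleClose to the connection ...

  return () => {
    disposed = true;
    if (retryTimer !== undefined) clearTimeout(retryTimer);
    if (closeConnectionRef.current) {
      closeConnectionRef.current();
      closeConnectionRef.current = null;
    }
    setIsConnected(false);
  };
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly, reconnectNonce]);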
```
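With the Fix 3 parameters (2 s base delay, growth factor 1.5, 5 retries) the full retry budget is easy to compute. This sketch uses hypothetical helper names and reproduces only the backoff arithmetic, not the connection handling:

```typescript
// Fix 3 parameters.
const MAX_RETRIES = 5;
const BASE_RETRY_DELAY_MS = 2000;
const BACKOFF_FACTOR = 1.5;

// Delay before retry number `attempt` (0-based): base * 1.5^attempt.
function retryDelayMs(attempt: number): number {
  return BASE_RETRY_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempt);
}

// All delays the viewer waits through before it stops retrying.
function retrySchedule(): number[] {
  return Array.from({ length: MAX_RETRIES }, (_, attempt) => retryDelayMs(attempt));
}

const schedule = retrySchedule();
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);
console.log(schedule.join(", ")); // 2000, 3000, 4500, 6750, 10125
console.log(totalMs); // 26375
```

So the viewer gives up after roughly 26 seconds of consecutive failures and stays disconnected until it remounts or one of the effect's dependencies changes. If the backend can take longer than that to come back, a capped delay with unlimited retries may suit better.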
#### 2.3 Improve Enrollment LAPI Messaging
**File**: `frontend/src/pages/CrowdSecConfig.tsx`
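**Fix 1**: Increase the initial delay (line 85). Five seconds still falls inside the 5-10 second LAPI startup window, so the "initializing" message in Fix 2 is what covers the rest of it: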
```typescript
// BEFORE:
}, 3000) // Wait 3 seconds
// AFTER:
}, 5000) // Wait 5 seconds for LAPI to initialize
```
**Fix 2**: Improve warning messages (around lines 200-250):
```tsx
{/* Show LAPI initializing warning when process running but LAPI not ready */}
{lapiStatusQuery.data && lapiStatusQuery.data.running && !lapiStatusQuery.data.lapi_ready && initialCheckComplete && (
  <div role="status">
    {/* Wrapper markup and styling are placeholders */}
    <p>CrowdSec Local API is initializing...</p>
    <p>
      The CrowdSec process is running but LAPI takes 5-10 seconds to become ready.
      Console enrollment will be available once LAPI is ready.
      {lapiStatusQuery.isRefetching && ' Checking status...'}
    </p>
  </div>
)}
{/* Show not running warning when process not running */}
{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
  <div role="alert">
    <p>CrowdSec is not running</p>
    <p>
      Enable CrowdSec from the Security Dashboard first.
      The process typically takes 5-10 seconds to start and LAPI another 5-10 seconds to initialize.
    </p>
  </div>
)}
```
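The two JSX conditions reduce to a small decision function that can be unit tested without rendering. `LapiStatus` mirrors only the two fields the conditions read (an assumed subset of the real status response), and `enrollmentBanner` is a hypothetical helper name:

```typescript
// Only the fields the two banner conditions read (assumed subset of the LAPI status response).
interface LapiStatus {
  running: boolean;
  lapi_ready: boolean;
}

type EnrollmentBanner = "none" | "initializing" | "not-running";

// Same conditions as the JSX: no banner until the initial delay has elapsed,
// "not-running" when the process is down, "initializing" while LAPI is not ready yet.
function enrollmentBanner(status: LapiStatus | undefined, initialCheckComplete: boolean): EnrollmentBanner {
  if (status === undefined || !initialCheckComplete) return "none";
  if (!status.running) return "not-running";
  if (!status.lapi_ready) return "initializing";
  return "none"; // LAPI ready: enrollment can proceed
}

console.log(enrollmentBanner({ running: true, lapi_ready: false }, true)); // initializing
```

As long as the process reports `running`, a slow LAPI produces the initializing banner rather than "not running", so the exact length of the initial delay matters less.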
### Phase 3: Cleanup & Testing
#### 3.1 Database Cleanup Migration (Optional)
Create a one-time migration to remove conflicting settings:
```sql
-- Remove deprecated mode setting to prevent conflicts
DELETE FROM settings WHERE key = 'security.crowdsec.mode';
```
#### 3.2 Backend Test Updates
Add test cases for:
1. `GetStatus` returns correct enabled state when only `security.crowdsec.enabled` is set
2. `GetStatus` returns correct state when deprecated `security.crowdsec.mode` exists (should be ignored)
3. `Start()` updates `settings` table
4. `Stop()` updates `settings` table
#### 3.3 Frontend Test Updates
Add test cases for:
1. `LiveLogViewer` doesn't reconnect when pause toggled
2. `LiveLogViewer` retries connection on disconnect
3. `CrowdSecConfig` doesn't render mode toggle
---
## Test Plan
### Manual QA Checklist
- [ ] **Toggle Test**:
1. Go to Security Dashboard
2. Toggle CrowdSec ON
3. Verify card shows "Active"
4. Verify `docker exec charon ps aux | grep crowdsec` shows process
5. Toggle CrowdSec OFF
6. Verify card shows "Disabled"
7. Verify process stopped
- [ ] **State Persistence Test**:
1. Toggle CrowdSec ON
2. Refresh page
3. Verify toggle still shows ON
4. Check database: `SELECT * FROM settings WHERE key LIKE '%crowdsec%'`
- [ ] **Live Logs Test**:
1. Go to Security Dashboard
2. Verify "Connected" status appears
3. Generate some traffic
4. Verify logs appear
5. Click "Pause" - verify NO flicker/reconnect
6. Navigate to another page
7. Navigate back
8. Verify reconnection happens (status goes from Disconnected → Connected)
- [ ] **Enrollment Test**:
1. Enable CrowdSec
2. Go to CrowdSecConfig
3. Verify warning shows "LAPI initializing" (not "not running")
4. Wait for LAPI ready
5. Enter enrollment key
6. Click Enroll
7. Verify success
- [ ] **Deprecated UI Removed**:
1. Go to CrowdSecConfig page
2. Verify NO "CrowdSec Mode" card with Disabled/Local toggle
3. Verify informational banner points to Security Dashboard
### Integration Test Commands
```bash
# Test 1: Backend state consistency
# Enable via API
curl -X POST http://localhost:8080/api/v1/admin/crowdsec/start
# Check settings table
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.enabled'"
# Expected: value = "true"
# Check status endpoint
curl http://localhost:8080/api/v1/security/status | jq '.crowdsec'
# Expected: {"mode":"local","enabled":true,...}
# Test 2: No deprecated mode conflict
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.mode'"
# Expected: No rows (Start() deletes the deprecated key; see 1.2)
# Test 3: Disable and verify
curl -X POST http://localhost:8080/api/v1/admin/crowdsec/stop
curl http://localhost:8080/api/v1/security/status | jq '.crowdsec'
# Expected: {"mode":"disabled","enabled":false,...}
sqlite3 data/charon.db "SELECT * FROM settings WHERE key = 'security.crowdsec.enabled'"
# Expected: value = "false"
```
---
## Implementation Order
| Order | Phase | Task | Priority | Est. Time |
|-------|-------|------|----------|-----------|
| 1 | 1.1 | Fix GetStatus to ignore deprecated mode | CRITICAL | 15 min |
| 2 | 1.2 | Update Start/Stop to sync settings table | CRITICAL | 20 min |
| 3 | 2.1 | Remove deprecated mode toggle from UI | HIGH | 15 min |
| 4 | 2.2 | Fix LiveLogViewer pause/reconnection | HIGH | 30 min |
| 5 | 2.3 | Improve enrollment LAPI messaging | MEDIUM | 15 min |
| 6 | 1.3 | Add deprecation warning for mode setting | LOW | 10 min |
| 7 | 3.1 | Database cleanup migration | LOW | 10 min |
| 8 | 3.2-3.3 | Update tests | MEDIUM | 30 min |
**Total Estimated Time**: ~2.5 hours
---
## Success Criteria
1. ✅ Toggling CrowdSec ON shows "Active" AND process is actually running
2. ✅ Toggling CrowdSec OFF shows "Disabled" AND process is stopped
3. ✅ State persists across page refresh
4. ✅ No deprecated mode toggle visible on CrowdSecConfig page
5. ✅ Live logs show "Connected" when WebSocket connects
6. ✅ Pausing logs does NOT cause reconnection
7. ✅ Enrollment shows appropriate LAPI status message
8. ✅ All existing tests pass
9. ✅ No errors in browser console related to CrowdSec
---
## Appendix: File Reference
| Issue | Backend Files | Frontend Files |
|-------|---------------|----------------|
| Toggle Bug | `security_handler.go#L135-148`, `crowdsec_handler.go#L184-265` | `Security.tsx#L65-110` |
| Deprecated Mode | `security_handler.go#L143-148` | `CrowdSecConfig.tsx#L69-90, L395-420` |
| Live Logs | `cerberus_logs_ws.go` | `LiveLogViewer.tsx#L100-150`, `logs.ts` |
| Enrollment | `console_enroll.go#L165-190` | `CrowdSecConfig.tsx#L85-120` |