- Backend: Start/Stop handlers now sync both settings and security_configs tables
- Frontend: CrowdSec toggle uses actual process status (crowdsecStatus.running)
- Frontend: Fixed LiveLogViewer WebSocket race condition by using isPausedRef
- Frontend: Removed deprecated mode toggle from CrowdSecConfig page
- Frontend: Added info banner directing users to Security Dashboard
- Frontend: Added "Start CrowdSec" button to enrollment warning panel

Fixes dual-source state conflict causing toggle to show incorrect state.
Fixes live log "disconnected" status appearing while logs stream.
Simplifies CrowdSec control to single source (Security Dashboard toggle).
Includes comprehensive test updates for new architecture.
Comprehensive Bug Analysis: CrowdSec & Live Logs Issues
Date: December 15, 2025
Status: Ready for Implementation
Executive Summary
Four user-reported issues all stem from configuration state synchronization problems between:
- The `settings` table (runtime toggles)
- The `security_configs` table (SecurityConfig model)
- The actual CrowdSec process state
- Frontend display state
Issue 1: CrowdSec Card Toggle Broken on Cerberus Dashboard
Symptoms
- CrowdSec card shows "Active" but toggle doesn't work properly
- Shows "on and active" but CrowdSec is NOT actually on
Root Cause Analysis
Files Involved:
- frontend/src/pages/Security.tsx - `crowdsecPowerMutation`
- frontend/src/api/crowdsec.ts - `startCrowdsec`, `stopCrowdsec`, `statusCrowdsec`
- backend/internal/api/handlers/security_handler.go - `GetStatus()`
- backend/internal/api/handlers/crowdsec_handler.go - `Start()`, `Stop()`, `Status()`
The Problem:
1. Dual-Source State Conflict: The `GetStatus()` endpoint in security_handler.go#L61-L137 combines state from TWO sources:
   - `settings` table: `security.crowdsec.enabled` and `security.crowdsec.mode`
   - `security_configs` table: `CrowdSecMode` field
2. Toggle Updates Wrong Store: When the user toggles CrowdSec via `crowdsecPowerMutation`:
   - It calls `updateSetting('security.crowdsec.enabled', ...)`, which updates the `settings` table
   - It calls `startCrowdsec()` / `stopCrowdsec()`, which updates `security_configs.CrowdSecMode`
3. State Priority Mismatch: In security_handler.go#L100-L108:

   ```go
   // CrowdSec enabled override (from settings table)
   if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
       if strings.EqualFold(setting.Value, "true") {
           crowdSecMode = "local"
       } else {
           crowdSecMode = "disabled"
       }
   }
   ```

   The `settings` table overrides `security_configs`, but the `Start()` handler updates `security_configs`.
4. Process State Not Verified: The frontend shows "Active" based on `status.crowdsec.enabled` from the API, but this is computed from DB settings, NOT from actual process status. The `crowdsecStatus` state (lines 43-44) fetches the real process status, but this is a separate query displayed below the card.
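The precedence problem can be reproduced in isolation. The sketch below models the override in TypeScript rather than Go; `resolveMode` and its parameters are hypothetical names, and only the branching is meant to mirror the handler excerpt above.

```typescript
// Hypothetical model of the precedence in GetStatus(); not the project's code.
// settingsEnabled: value of security.crowdsec.enabled in the settings table,
//                  or undefined/"" when the row is missing or empty.
// configMode:      security_configs.CrowdSecMode.
function resolveMode(settingsEnabled: string | undefined, configMode: string): string {
  if (settingsEnabled !== undefined && settingsEnabled !== "") {
    // The settings table wins whenever it holds a value.
    return settingsEnabled.toLowerCase() === "true" ? "local" : "disabled";
  }
  return configMode;
}

// Stop() wrote "disabled" to security_configs, but settings still says "true":
const reported = resolveMode("true", "disabled");
console.log(reported); // "local"
```

Under these assumptions the API keeps reporting CrowdSec as enabled until something also rewrites the `settings` row, which is the mismatch described in item 3.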
The Fix
Backend (security_handler.go):
- `GetStatus()` should check actual CrowdSec process status via the `CrowdsecExecutor.Status()` call, not just DB state

Frontend (Security.tsx):
- The toggle's `checked` state should use `crowdsecStatus?.running` (actual process state) instead of `status.crowdsec.enabled` (DB setting)
- Or sync both states properly after toggle
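A minimal sketch of the frontend option, assuming the two status objects have the shapes implied by the field names in this document (`ProcessStatus`, `SecurityStatus` and `toggleChecked` are illustrative names, not taken from the codebase):

```typescript
// Shapes are assumptions based on the fields named in this document.
interface ProcessStatus { running: boolean }
interface SecurityStatus { crowdsec: { enabled: boolean } }

// Prefer the real process state; fall back to the DB-derived flag only
// while the process status query has not returned anything yet.
function toggleChecked(processStatus: ProcessStatus | undefined, status: SecurityStatus): boolean {
  return processStatus?.running ?? status.crowdsec.enabled;
}

console.log(toggleChecked({ running: false }, { crowdsec: { enabled: true } })); // false
console.log(toggleChecked(undefined, { crowdsec: { enabled: true } })); // true
```

Because `??` only falls back on `null` or `undefined`, a known `running: false` is never overridden by a stale `enabled: true` from the database.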
Issue 2: Live Log Viewer Shows "Disconnected" But Logs Appear
Symptoms
- Shows "Disconnected" status badge but logs ARE appearing
- Navigating away and back causes logs to disappear
Root Cause Analysis
Files Involved:
- frontend/src/components/LiveLogViewer.tsx
- frontend/src/api/logs.ts - `connectLiveLogs`, `connectSecurityLogs`
The Problem:
1. Connection State Race Condition: In LiveLogViewer.tsx#L165-L240:

   ```tsx
   useEffect(() => {
     // Close existing connection
     if (closeConnectionRef.current) {
       closeConnectionRef.current();
       closeConnectionRef.current = null;
     }
     // ... setup handlers ...
     return () => {
       if (closeConnectionRef.current) {
         closeConnectionRef.current();
         closeConnectionRef.current = null;
       }
       setIsConnected(false); // <-- Issue: cleanup runs AFTER effect re-runs
     };
   }, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
   ```
2. Dependency Array Includes `isPaused`: When `isPaused` changes, the entire effect re-runs, creating a new WebSocket. But the cleanup of the old connection sets `isConnected(false)` AFTER the new connection's `onOpen` sets `isConnected(true)`, causing a flash of "Disconnected".
3. Logs Disappear on Navigation: The `logs` state is stored locally in the component via `useState<DisplayLogEntry[]>([])`. When the component unmounts (navigation) and remounts, the state resets to an empty array. There's no persistence or caching.
The Fix
1. Fix State Race: Use a ref to track connection state transitions:

   ```tsx
   const connectionIdRef = useRef(0);
   // In effect: increment connectionId, check it in callbacks
   ```
2. Remove `isPaused` from Dependencies: Pausing should NOT close/reopen the WebSocket. Instead, just skip adding messages when paused:

   ```tsx
   // Current (wrong): isPaused is in the dependency array, so toggling it reconnects
   // Fixed: only filter/process messages based on isPaused flag
   ```
3. Persist Logs Across Navigation: Either:
   - Store logs in React Query cache
   - Use a global store (zustand/context)
   - Accept the limitation with a "Logs cleared on navigation" note
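The connection-id idea from the first fix can be sketched outside React. `ConnectionGuard` is a hypothetical helper, not code from the component; in LiveLogViewer the counter would live in `connectionIdRef` and `connected` would be the `isConnected` state.

```typescript
// Hypothetical guard that ignores callbacks from superseded connections.
class ConnectionGuard {
  private current = 0;
  connected = false;

  // Called when the effect starts a new connection; returns that connection's id.
  begin(): number {
    this.current += 1;
    this.connected = false;
    return this.current;
  }

  // Callbacks pass the id they were created with; stale ids are ignored.
  onOpen(id: number): void {
    if (id === this.current) this.connected = true;
  }

  onClose(id: number): void {
    if (id === this.current) this.connected = false;
  }
}

const guard = new ConnectionGuard();
const first = guard.begin();
guard.onOpen(first);
const second = guard.begin(); // effect re-ran, e.g. a filter changed
guard.onOpen(second);
guard.onClose(first); // late close from the old socket is ignored
console.log(guard.connected); // true
```

With this guard, only the most recent connection can change the displayed status, so a late teardown of an old socket cannot flip the badge to "Disconnected".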
Issue 3: DEPRECATED CrowdSec Mode Toggle Still in UI
Symptoms
- CrowdSec config page shows "Disabled/Local/External" mode toggle
- This is confusing because CrowdSec should run based SOLELY on the Feature Flag in System Settings
Root Cause Analysis
Files Involved:
- frontend/src/pages/CrowdSecConfig.tsx - Mode toggle UI
- frontend/src/pages/SystemSettings.tsx - Feature flag toggle
- backend/internal/models/security_config.go - `CrowdSecMode` field
The Problem:
1. Redundant Control Surfaces: There are THREE ways to control CrowdSec:
   - Feature Flag: `feature.cerberus.enabled` in Settings (System Settings page)
   - Per-Service Toggle: `security.crowdsec.enabled` in Settings (Security Dashboard)
   - Mode Toggle: `CrowdSecMode` in SecurityConfig (CrowdSec Config page)
2. Deprecated UI Still Present: In CrowdSecConfig.tsx#L68-L100:

   ```tsx
   <Card>
     <div className="flex items-center justify-between gap-4 flex-wrap">
       <div className="space-y-1">
         <h2 className="text-lg font-semibold">CrowdSec Mode</h2>
         <p className="text-sm text-gray-400">
           {isLocalMode ? 'CrowdSec runs locally...' : 'CrowdSec decisions are paused...'}
         </p>
       </div>
       <div className="flex items-center gap-3">
         <span className="text-sm text-gray-400">Disabled</span>
         <Switch
           checked={isLocalMode}
           onChange={(e) => handleModeToggle(e.target.checked)}
           ...
         />
         <span className="text-sm text-gray-200">Local</span>
       </div>
     </div>
   </Card>
   ```
3. `isLocalMode` Derived from Wrong Source: Line 28:

   ```tsx
   const isLocalMode = !!status && status.crowdsec?.mode !== 'disabled'
   ```

   This checks `mode` from `security_configs.CrowdSecMode`, not the feature flag.
4. `handleModeToggle` Updates Wrong Setting: Lines 72-77:

   ```tsx
   const handleModeToggle = (nextEnabled: boolean) => {
     const mode = nextEnabled ? 'local' : 'disabled'
     updateModeMutation.mutate(mode) // Updates security.crowdsec.mode in settings
   }
   ```
The Fix
- Remove the Mode Toggle Card entirely (lines 68-100)
- Add a notice: "CrowdSec is controlled via the toggle on the Security Dashboard or System Settings"
Backend Cleanup (optional future work):
- Remove `CrowdSecMode` field from SecurityConfig model
- Migrate all state to use only `security.crowdsec.enabled` setting
Issue 4: Enrollment Shows "CrowdSec is not running"
Symptoms
- CrowdSec enrollment shows error even when enabled
- Red warning box: "CrowdSec is not running"
Root Cause Analysis
Files Involved:
- frontend/src/pages/CrowdSecConfig.tsx - `lapiStatusQuery`
- frontend/src/pages/CrowdSecConfig.tsx - Warning display logic
- backend/internal/api/handlers/crowdsec_handler.go - `Status()`
The Problem:
1. LAPI Status Query Uses Wrong Condition: In CrowdSecConfig.tsx#L30-L40:

   ```tsx
   const lapiStatusQuery = useQuery<CrowdSecStatus>({
     queryKey: ['crowdsec-lapi-status'],
     queryFn: statusCrowdsec,
     enabled: consoleEnrollmentEnabled && initialCheckComplete,
     refetchInterval: 5000,
     retry: false,
   })
   ```

   The query is `enabled` only when `consoleEnrollmentEnabled` is set (the feature flag for console enrollment).
2. Warning Shows When Process Not Running: In CrowdSecConfig.tsx#L172-L196:

   ```tsx
   {lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
     <div className="..." data-testid="lapi-not-running-warning">
       <p>CrowdSec is not running</p>
       ...
     </div>
   )}
   ```

   This shows when `lapiStatusQuery.data.running === false`.
3. Status Check May Return Stale Data: The `Status()` backend handler checks:
   - PID file existence
   - Process status via `kill -0`
   - LAPI health via `cscli lapi status`

   But if CrowdSec was just enabled, there may be a race condition where the settings say "enabled" but the process hasn't started yet.
4. Startup Reconciliation Timing: `ReconcileCrowdSecOnStartup()` in crowdsec_startup.go runs at container start, but if the user enables CrowdSec AFTER startup, the process won't auto-start.
The Fix
1. Improve Warning Message: The "not running" warning should include:
   - A "Start CrowdSec" button that calls the `startCrowdsec()` API
   - Or a link to the Security Dashboard where the toggle is
2. Check Both States: Show the warning only when:
   - User has enabled CrowdSec (via either toggle)
   - AND the process is not running
3. Add Auto-Retry: After enabling CrowdSec, poll status more aggressively for 30 seconds
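The "check both states" and "auto-retry" points can be expressed as two small pure functions. This is a sketch under stated assumptions: `WarningInputs`, `shouldWarnNotRunning` and `pollIntervalMs` are hypothetical names, and the 1-second fast interval is an illustrative choice; only the 30-second window and the 5-second default come from this document.

```typescript
// Hypothetical inputs; names do not come from the codebase.
interface WarningInputs {
  enabledByUser: boolean;        // either toggle reports CrowdSec as enabled
  running: boolean | undefined;  // process status, undefined until the first fetch
  initialCheckComplete: boolean;
}

// Warn only when the user asked for CrowdSec and the process is known to be down.
function shouldWarnNotRunning(i: WarningInputs): boolean {
  return i.initialCheckComplete && i.enabledByUser && i.running === false;
}

// Poll quickly for 30 s after enabling, then fall back to the 5 s interval.
// The 1000 ms fast interval is an assumption, not a value from the source.
function pollIntervalMs(nowMs: number, enabledAtMs: number | null): number {
  const FAST_WINDOW_MS = 30000;
  if (enabledAtMs !== null && nowMs - enabledAtMs < FAST_WINDOW_MS) return 1000;
  return 5000;
}
```

The result of `pollIntervalMs` could feed the query's `refetchInterval`, so the warning clears shortly after the process comes up instead of waiting for the next 5-second tick.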
Implementation Plan
Phase 1: Backend Fixes (Priority: High)
1.1 Unify State Source
File: backend/internal/api/handlers/security_handler.go
Change: Modify GetStatus() to include actual process status:
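```go
// Add after line 137:
// Check actual CrowdSec process status
crowdsecProcessRunning := false
if h.crowdsecExecutor != nil {
    ctx := c.Request.Context()
    // The PID is not needed here; on error the process is reported as not running.
    if running, _, err := h.crowdsecExecutor.Status(ctx, h.dataDir); err == nil {
        // Override enabled state based on actual process
        crowdsecProcessRunning = running
    }
}
```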
Add crowdsecExecutor field to SecurityHandler struct and inject it during initialization.
1.2 Consistent Mode Updates
File: backend/internal/api/handlers/crowdsec_handler.go
Change: In Start() and Stop(), also update the settings table:
// In Start(), after updating SecurityConfig (line ~165):
if h.DB != nil {
setting := models.Setting{Key: "security.crowdsec.enabled", Value: "true", Category: "security", Type: "bool"}
h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}
// In Stop(), after updating SecurityConfig (line ~228):
if h.DB != nil {
setting := models.Setting{Key: "security.crowdsec.enabled", Value: "false", Category: "security", Type: "bool"}
h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}
Phase 2: Frontend Fixes (Priority: High)
2.1 Fix CrowdSec Toggle State
File: frontend/src/pages/Security.tsx
Change 1: Use actual process status for toggle (around line 203):
```tsx
// Replace: checked={status.crowdsec.enabled}
// With:
checked={crowdsecStatus?.running ?? status.crowdsec.enabled}
```
Change 2: After successful toggle, refetch both status and process status
2.2 Fix LiveLogViewer Connection State
File: frontend/src/components/LiveLogViewer.tsx
Change 1: Remove isPaused from useEffect dependencies (line 237):
```tsx
// Change from:
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// To:
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]);
```
Change 2: Handle pause inside message handler (line 192):
```tsx
const handleMessage = (entry: SecurityLogEntry) => {
  // isPaused check stays here, not in effect
  if (isPausedRef.current) return; // Use ref instead of state
  // ... rest of handler
};
```
Change 3: Add ref for isPaused:
```tsx
const isPausedRef = useRef(isPaused);
useEffect(() => { isPausedRef.current = isPaused; }, [isPaused]);
```
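To show why reading a ref avoids the reconnect, here is a small sketch that runs outside React. `Ref` stands in for the object returned by `useRef`, and `makeHandler` is a hypothetical factory; trimming to `maxLogs` is an assumption about how the component bounds its buffer.

```typescript
// Minimal stand-in for the object returned by React's useRef.
interface Ref<T> { current: T }

// The handler reads the ref at call time, so toggling pause never requires
// recreating the handler or the WebSocket it is attached to.
function makeHandler<T>(pausedRef: Ref<boolean>, logs: T[], maxLogs: number) {
  return (entry: T): void => {
    if (pausedRef.current) return; // drop entries while paused
    logs.push(entry);
    if (logs.length > maxLogs) logs.shift(); // keep only the newest maxLogs entries
  };
}

const pausedRef: Ref<boolean> = { current: false };
const logs: string[] = [];
const handle = makeHandler(pausedRef, logs, 2);
handle("a");
pausedRef.current = true;
handle("b"); // ignored while paused
pausedRef.current = false;
handle("c");
handle("d");
console.log(logs); // ["c", "d"]
```

The same handler instance serves the whole lifetime of the connection; only the value behind the ref changes when the user pauses or resumes.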
2.3 Remove Deprecated Mode Toggle
File: frontend/src/pages/CrowdSecConfig.tsx
Change: Remove the entire "CrowdSec Mode" Card (lines 291-311 in current render):
```tsx
// DELETE: The entire <Card> block containing "CrowdSec Mode"
```
Add informational banner instead:
```tsx
{/* Replace mode toggle with info banner */}
<div className="bg-blue-900/20 border border-blue-700 rounded-lg p-4">
  <p className="text-sm text-blue-200">
    <strong>Note:</strong> CrowdSec is controlled via the toggle on the{' '}
    <Link to="/security" className="underline">Security Dashboard</Link>.
    Enable/disable CrowdSec there, then configure presets and files here.
  </p>
</div>
```
2.4 Fix Enrollment Warning
File: frontend/src/pages/CrowdSecConfig.tsx
Change: Add "Start CrowdSec" button to the warning (around line 185):
```tsx
<Button
  variant="primary"
  size="sm"
  onClick={async () => {
    try {
      await startCrowdsec();
      toast.info('Starting CrowdSec...');
      lapiStatusQuery.refetch();
    } catch (err) {
      toast.error('Failed to start CrowdSec');
    }
  }}
>
  Start CrowdSec
</Button>
```
Phase 3: Remove Deprecated Mode (Priority: Medium)
3.1 Backend Model Cleanup (Future)
File: backend/internal/models/security_config.go
Mark CrowdSecMode as deprecated with migration path.
3.2 Settings Migration
Create migration to ensure all users have security.crowdsec.enabled setting derived from CrowdSecMode.
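One possible mapping for that migration is sketched below. It is an assumption, not a decided rule: `deriveEnabled` and `migrateSetting` are hypothetical names, and treating every mode other than "disabled" as enabled follows the `mode !== 'disabled'` check quoted earlier for `isLocalMode`.

```typescript
// Hypothetical mapping for the migration; the real rule is a project decision.
function deriveEnabled(crowdSecMode: string | null | undefined): "true" | "false" {
  const mode = (crowdSecMode ?? "").trim().toLowerCase();
  // A missing or "disabled" mode counts as off; any other mode counts as on.
  return mode === "" || mode === "disabled" ? "false" : "true";
}

// Keep an existing security.crowdsec.enabled value; derive one only if it is absent.
function migrateSetting(existing: string | undefined, crowdSecMode: string | null | undefined): string {
  return existing !== undefined && existing !== "" ? existing : deriveEnabled(crowdSecMode);
}

console.log(migrateSetting(undefined, "local")); // "true"
console.log(migrateSetting("false", "local"));   // "false"
```

Preserving an existing value keeps the migration idempotent, so running it twice does not overwrite a choice the user made after the first run.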
Files to Modify Summary
Backend
| File | Changes |
|---|---|
| backend/internal/api/handlers/security_handler.go | Add process status check to `GetStatus()` |
| backend/internal/api/handlers/crowdsec_handler.go | Sync `settings` table in `Start()`/`Stop()` |
Frontend
| File | Changes |
|---|---|
| frontend/src/pages/Security.tsx | Use `crowdsecStatus?.running` for toggle state |
| frontend/src/components/LiveLogViewer.tsx | Fix `isPaused` dependency, use ref |
| frontend/src/pages/CrowdSecConfig.tsx | Remove mode toggle, add info banner, add "Start CrowdSec" button |
Testing Checklist
- Toggle CrowdSec on Security Dashboard → verify process starts
- Toggle CrowdSec off → verify process stops
- Refresh page → verify toggle state matches process state
- Open LiveLogViewer → verify "Connected" status
- Pause logs → verify connection remains open
- Navigate away and back → logs are cleared (expected) but connection re-establishes
- CrowdSec Config page → no mode toggle, info banner present
- Enrollment section → shows "Start CrowdSec" button when process not running