Investigation Report: CrowdSec Enrollment & Live Log Viewer Issues
Date: December 15, 2025
Investigator: GitHub Copilot
Status: ✅ Issue A RESOLVED - Issue B Analysis Pending
Executive Summary (Updated December 16, 2025)
This document covers TWO issues:
- CrowdSec Enrollment ✅ FIXED: Shows success locally but engine doesn't appear in CrowdSec.net dashboard
  - Root Cause: Code incorrectly set status to enrolled after cscli console enroll succeeded, but CrowdSec's help explicitly states users must "validate the enrollment in the webapp"
  - Fix Applied: Changed status to pending_acceptance and updated frontend to inform users they must accept on app.crowdsec.net
- Live Log Viewer: Shows "Disconnected" status (Analysis pending implementation)
✅ RESOLVED Issue A: CrowdSec Console Enrollment Not Working
Symptoms
- User submits enrollment with valid key
- Charon shows "Enrollment submitted" success message
- No engine appears in CrowdSec.net dashboard
- User reports: "The CrowdSec enrollment request NEVER reached crowdsec.net"
Root Cause (CONFIRMED)
The Bug: The code treated a successful cscli console enroll <key> command (exit code 0) as a completed enrollment, but CrowdSec's help explicitly states:
"After running this command you will need to validate the enrollment in the webapp."
Exit code 0 = enrollment REQUEST sent, NOT enrollment COMPLETE.
The code incorrectly set status = enrolled when it should have been status = pending_acceptance.
Fixes Applied (December 16, 2025)
Fix A1: Backend Status Semantics
File: backend/internal/crowdsec/console_enroll.go
- Added consoleStatusPendingAcceptance = "pending_acceptance" constant
- Changed success status from enrolled to pending_acceptance
- Fixed idempotency check to also skip re-enrollment when status is pending_acceptance
- Fixed config path check to look in config/config.yaml subdirectory first
- Updated log message to say "pending acceptance on crowdsec.net"
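The status semantics described in Fix A1 can be sketched as follows. This is a minimal illustration, not the actual Go code in console_enroll.go: the ConsoleStatus type, the "not_enrolled" and "failed" values, the function names, and the force flag are all hypothetical and exist only to show the intended mapping.

```typescript
// Hypothetical status values; only "enrolled" and "pending_acceptance"
// are named in this report.
type ConsoleStatus = "not_enrolled" | "pending_acceptance" | "enrolled" | "failed";

// Exit code 0 means the enrollment REQUEST was sent, not that it is complete.
function statusAfterEnroll(exitCode: number): ConsoleStatus {
  return exitCode === 0 ? "pending_acceptance" : "failed";
}

// Idempotency check: skip re-enrollment when a request is already accepted
// or still awaiting acceptance, unless the caller explicitly forces it
// (the force flag is an assumption made for this sketch).
function shouldRunEnroll(current: ConsoleStatus, force: boolean): boolean {
  if (force) return true;
  return current !== "enrolled" && current !== "pending_acceptance";
}
```

With this mapping, a second enrollment call made while the first is still awaiting acceptance is skipped rather than sent again.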
Fix A2: Frontend User Guidance
File: frontend/src/pages/CrowdSecConfig.tsx
- Updated success toast to say "Accept the enrollment on app.crowdsec.net to complete registration"
- Added isConsolePendingAcceptance variable
- Updated canRotateKey to include pending_acceptance status
- Added info box with link to app.crowdsec.net when status is pending_acceptance
Fix A3: Test Updates
Files: backend/internal/crowdsec/console_enroll_test.go, backend/internal/api/handlers/crowdsec_handler_test.go
- Updated all tests expecting enrolled to expect pending_acceptance
- Updated test for idempotency to verify second call is blocked for pending_acceptance
- Changed EnrolledAt assertion to LastAttemptAt (enrollment is not complete yet)
Verification
All backend tests pass:
- TestConsoleEnrollSuccess ✅
- TestConsoleEnrollIdempotentWhenAlreadyEnrolled ✅
- TestConsoleEnrollNormalizesFullCommand ✅
- TestConsoleEnrollDoesNotPassTenant ✅
- TestConsoleEnrollmentStatus/returns_pending_acceptance_status_after_enrollment ✅
- TestConsoleStatusAfterEnroll ✅
Frontend type-check passes ✅
NEW Issue B: Live Log Viewer Shows "Disconnected"
Symptoms
- Live Log Viewer component shows "Disconnected" status badge
- No logs appear (even when there should be logs)
- WebSocket connection may not be establishing
Root Cause Analysis
Primary Finding: WebSocket Connection Works But Logs Are Sparse
The WebSocket implementation is correct. The issue is likely:
- No logs being generated - If CrowdSec/Caddy aren't actively processing requests, there are no logs
- Initial connection timing - The isConnected state depends on the onOpen callback
Verified Working Components:
- Backend WebSocket Handler: backend/internal/api/handlers/logs_ws.go
  - Properly upgrades HTTP to WebSocket
  - Subscribes to BroadcastHook for log entries
  - Sends ping messages every 30 seconds
- Frontend Connection Logic: frontend/src/api/logs.ts
  - connectLiveLogs() correctly builds WebSocket URL
  - Properly handles onOpen, onClose, onError callbacks
- Frontend Component: frontend/src/components/LiveLogViewer.tsx
  - isConnected state is set in handleOpen callback
  - Connection effect runs on mount and mode changes
Potential Issues Found
Issue B1: WebSocket Route May Be Protected
Location: backend/internal/api/routes/routes.go Line 158
The WebSocket endpoint is under the protected route group, meaning it requires authentication:
protected.GET("/logs/live", handlers.LogsWebSocketHandler)
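Problem: WebSocket connections may fail silently if the auth token isn't being passed. The browser's native WebSocket API cannot set custom headers such as Authorization, so a token kept in localStorage never reaches the server unless it is passed explicitly. Cookies are sent with the handshake only when they apply to the request.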
Verification Steps:
- Check browser DevTools Network tab for WebSocket connection
- Look for 401/403 responses
- Check if token query parameter is being sent
Issue B2: No Error Display to User
Location: frontend/src/components/LiveLogViewer.tsx Lines 170-172
const handleError = (error: Event) => {
  console.error('WebSocket error:', error);
  setIsConnected(false);
};
Problem: Errors are only logged to console, not displayed to user. User sees "Disconnected" without knowing why.
Required Fixes for Issue B
Fix B1: Add Error State Display
File: frontend/src/components/LiveLogViewer.tsx
Add error state tracking:
const [connectionError, setConnectionError] = useState<string | null>(null);

const handleError = (error: Event) => {
  console.error('WebSocket error:', error);
  setIsConnected(false);
  setConnectionError('Failed to connect to log stream. Check authentication.');
};

const handleOpen = () => {
  console.log(`${currentMode} log viewer connected`);
  setIsConnected(true);
  setConnectionError(null); // Clear any previous errors
};
Display error in UI:
{connectionError && (
  <div className="text-red-400 text-xs p-2">{connectionError}</div>
)}
Fix B2: Add Authentication to WebSocket URL
File: frontend/src/api/logs.ts
The WebSocket needs to pass auth token as query parameter since WebSocket API doesn't support custom headers:
export const connectLiveLogs = (
  filters: LiveLogFilter,
  onMessage: (log: LiveLogEntry) => void,
  onOpen?: () => void,
  onError?: (error: Event) => void,
  onClose?: () => void
): (() => void) => {
  const params = new URLSearchParams();
  if (filters.level) params.append('level', filters.level);
  if (filters.source) params.append('source', filters.source);

  // Add auth token from localStorage if available
  const token = localStorage.getItem('token');
  if (token) {
    params.append('token', token);
  }

  const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
  const wsUrl = `${protocol}//${window.location.host}/api/v1/logs/live?${params.toString()}`;
  // ...
};
Backend Auth Check (verify this exists):
The backend auth middleware must check for token query parameter in addition to headers/cookies for WebSocket connections.
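The lookup order the middleware would need can be sketched as below. This is an illustration only: the real middleware is Go and has not been verified here, the HandshakeRequest shape and extractToken name are hypothetical, header keys are assumed to be lowercased already, and cookie handling is left out for brevity.

```typescript
// Hypothetical, framework-free view of an incoming WebSocket handshake.
interface HandshakeRequest {
  headers: Record<string, string | undefined>; // keys assumed lowercased
  query: Record<string, string | undefined>;
}

// Prefer the Authorization header; fall back to the token query parameter,
// which is the only channel a browser WebSocket client can populate itself.
function extractToken(req: HandshakeRequest): string | null {
  const auth = req.headers["authorization"];
  if (auth && auth.startsWith("Bearer ")) {
    const fromHeader = auth.slice("Bearer ".length).trim();
    if (fromHeader !== "") return fromHeader;
  }
  const fromQuery = req.query["token"];
  return fromQuery ? fromQuery : null;
}
```

A request carrying neither source yields null, which the caller would translate into a 401 before upgrading the connection.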
Fix B3: Add Reconnection Logic
File: frontend/src/components/LiveLogViewer.tsx
Add automatic reconnection with exponential backoff:
const [reconnectAttempts, setReconnectAttempts] = useState(0);
const maxReconnectAttempts = 5;

const handleClose = () => {
  console.log(`${currentMode} log viewer disconnected`);
  setIsConnected(false);

  // Auto-reconnect logic
  if (reconnectAttempts < maxReconnectAttempts) {
    const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
    setTimeout(() => {
      setReconnectAttempts(prev => prev + 1);
      // Trigger reconnection by updating a dependency
    }, delay);
  }
};
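To make the delay arithmetic above concrete, the backoff can be isolated into a pure function. The helper names backoffDelayMs and reconnectSchedule are hypothetical; the constants (1000 ms base, 30000 ms cap, 5 attempts) are the ones used in the snippet above.

```typescript
// Delay before reconnect attempt number `attempt` (0-based):
// doubles each time, capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 30000): number {
  return Math.min(baseMs * Math.pow(2, attempt), capMs);
}

// Full list of delays for a given attempt budget.
function reconnectSchedule(maxAttempts: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}
```

With five attempts the delays are 1 s, 2 s, 4 s, 8 s, and 16 s, so the 30-second cap only takes effect if the attempt budget is raised above five.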
Summary of All Fixes
Issue A: CrowdSec Enrollment
| File | Change |
|---|---|
| frontend/src/pages/CrowdSecConfig.tsx | Update success toast to mention acceptance step |
| frontend/src/pages/CrowdSecConfig.tsx | Add info box with link to crowdsec.net |
| backend/internal/crowdsec/console_enroll.go | Add pending_acceptance status constant |
| docs/cerberus.md | Add documentation about acceptance requirement |
Issue B: Live Log Viewer
| File | Change |
|---|---|
| frontend/src/components/LiveLogViewer.tsx | Add error state display |
| frontend/src/api/logs.ts | Pass auth token in WebSocket URL |
| frontend/src/components/LiveLogViewer.tsx | Add reconnection logic with backoff |
Testing Checklist
Enrollment Testing
- Submit enrollment with valid key
- Verify success message mentions acceptance step
- Verify UI shows guidance to accept on crowdsec.net
- Accept enrollment on crowdsec.net
- Verify engine appears in dashboard
Live Logs Testing
- Open Live Log Viewer page
- Verify WebSocket connects (check Network tab)
- Verify "Connected" badge shows
- Generate some logs (make HTTP request to proxy)
- Verify logs appear in viewer
- Test disconnect/reconnect behavior
PREVIOUS ANALYSIS (Resolved Issues - Kept for Reference)
Issue 1: CrowdSec Card Toggle Broken on Cerberus Dashboard
Symptoms
- CrowdSec card shows "Active" but toggle doesn't work properly
- Shows "on and active" but CrowdSec is NOT actually on
Root Cause Analysis
Files Involved:
- frontend/src/pages/Security.tsx - crowdsecPowerMutation
- frontend/src/api/crowdsec.ts - startCrowdsec, stopCrowdsec, statusCrowdsec
- backend/internal/api/handlers/security_handler.go - GetStatus()
- backend/internal/api/handlers/crowdsec_handler.go - Start(), Stop(), Status()
The Problem:
- Dual-Source State Conflict: The GetStatus() endpoint in security_handler.go#L61-L137 combines state from TWO sources:
  - settings table: security.crowdsec.enabled and security.crowdsec.mode
  - security_configs table: CrowdSecMode field
- Toggle Updates Wrong Store: When the user toggles CrowdSec via crowdsecPowerMutation:
  - It calls updateSetting('security.crowdsec.enabled', ...) which updates the settings table
  - It calls startCrowdsec() / stopCrowdsec() which updates security_configs.CrowdSecMode
- State Priority Mismatch: In security_handler.go#L100-L108:

  // CrowdSec enabled override (from settings table)
  if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
      if strings.EqualFold(setting.Value, "true") {
          crowdSecMode = "local"
      } else {
          crowdSecMode = "disabled"
      }
  }

  The settings table overrides security_configs, but the Start() handler updates security_configs.
- Process State Not Verified: The frontend shows "Active" based on status.crowdsec.enabled from the API, but this is computed from DB settings, NOT from actual process status. The crowdsecStatus state (line 43-44) fetches real process status but this is a separate query displayed below the card.
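The priority mismatch can be reproduced with a small model of the override rule. This sketch mirrors the Go snippet quoted above in TypeScript; resolveCrowdSecMode is a hypothetical name, and a null or empty setting value stands in for "no row found".

```typescript
// Mirrors the override in GetStatus(): a non-empty settings value always wins
// over security_configs.CrowdSecMode.
function resolveCrowdSecMode(configMode: string, settingValue: string | null): string {
  if (settingValue !== null && settingValue !== "") {
    // Case-insensitive comparison, like strings.EqualFold in the Go code.
    return settingValue.toLowerCase() === "true" ? "local" : "disabled";
  }
  return configMode;
}
```

If Start() writes "local" to security_configs while the settings row still says "false", the resolved mode is "disabled", which is exactly the stale state the dashboard ends up showing.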
The Fix
Backend (security_handler.go):
- GetStatus() should check actual CrowdSec process status via the CrowdsecExecutor.Status() call, not just DB state
Frontend (Security.tsx):
- The toggle's checked state should use crowdsecStatus?.running (actual process state) instead of status.crowdsec.enabled (DB setting)
- Or sync both states properly after toggle
Issue 2: Live Log Viewer Shows "Disconnected" But Logs Appear
Symptoms
- Shows "Disconnected" status badge but logs ARE appearing
- Navigating away and back causes logs to disappear
Root Cause Analysis
Files Involved:
- frontend/src/components/LiveLogViewer.tsx
- frontend/src/api/logs.ts - connectLiveLogs, connectSecurityLogs
The Problem:
- Connection State Race Condition: In LiveLogViewer.tsx#L165-L240:

  useEffect(() => {
    // Close existing connection
    if (closeConnectionRef.current) {
      closeConnectionRef.current();
      closeConnectionRef.current = null;
    }
    // ... setup handlers ...
    return () => {
      if (closeConnectionRef.current) {
        closeConnectionRef.current();
        closeConnectionRef.current = null;
      }
      setIsConnected(false); // <-- Issue: cleanup runs AFTER effect re-runs
    };
  }, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);

- Dependency Array Includes isPaused: When isPaused changes, the entire effect re-runs, creating a new WebSocket. But the cleanup of the old connection sets isConnected(false) AFTER the new connection's onOpen sets isConnected(true), causing a flash of "Disconnected".
- Logs Disappear on Navigation: The logs state is stored locally in the component via useState<DisplayLogEntry[]>([]). When the component unmounts (navigation) and remounts, state resets to empty array. There's no persistence or caching.
The Fix
- Fix State Race: Use a ref to track connection state transitions:

  const connectionIdRef = useRef(0);
  // In effect: increment connectionId, check it in callbacks

- Remove isPaused from Dependencies: Pausing should NOT close/reopen the WebSocket. Instead, just skip adding messages when paused:

  // Current (wrong): connection is in dependency array
  // Fixed: only filter/process messages based on isPaused flag

- Persist Logs Across Navigation: Either:
  - Store logs in React Query cache
  - Use a global store (zustand/context)
  - Accept the limitation with a "Logs cleared on navigation" note
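The connection-ID idea from the first fix can be sketched without React. ConnectionTracker is a hypothetical class standing in for connectionIdRef plus the isConnected state; in the component, the ID would be captured by each effect run and checked inside its callbacks.

```typescript
// Each new connection gets a fresh ID; callbacks from older connections
// are ignored so they cannot overwrite the state of the current one.
class ConnectionTracker {
  private currentId = 0;
  private connected = false;

  // Called when a new WebSocket is created; returns the ID its callbacks must carry.
  begin(): number {
    this.currentId += 1;
    this.connected = false;
    return this.currentId;
  }

  handleOpen(id: number): void {
    if (id === this.currentId) this.connected = true;
  }

  handleClose(id: number): void {
    if (id === this.currentId) this.connected = false;
  }

  isConnected(): boolean {
    return this.connected;
  }
}
```

A late close event from a superseded connection carries an old ID and is dropped, so the badge stays on "Connected" for the live socket.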
Issue 3: DEPRECATED CrowdSec Mode Toggle Still in UI
Symptoms
- CrowdSec config page shows "Disabled/Local/External" mode toggle
- This is confusing because CrowdSec should run based SOLELY on the Feature Flag in System Settings
Root Cause Analysis
Files Involved:
- frontend/src/pages/CrowdSecConfig.tsx - Mode toggle UI
- frontend/src/pages/SystemSettings.tsx - Feature flag toggle
- backend/internal/models/security_config.go - CrowdSecMode field
The Problem:
- Redundant Control Surfaces: There are THREE ways to control CrowdSec:
  - Feature Flag: feature.cerberus.enabled in Settings (System Settings page)
  - Per-Service Toggle: security.crowdsec.enabled in Settings (Security Dashboard)
  - Mode Toggle: CrowdSecMode in SecurityConfig (CrowdSec Config page)
- Deprecated UI Still Present: In CrowdSecConfig.tsx#L68-L100:

  <Card>
    <div className="flex items-center justify-between gap-4 flex-wrap">
      <div className="space-y-1">
        <h2 className="text-lg font-semibold">CrowdSec Mode</h2>
        <p className="text-sm text-gray-400">
          {isLocalMode ? 'CrowdSec runs locally...' : 'CrowdSec decisions are paused...'}
        </p>
      </div>
      <div className="flex items-center gap-3">
        <span className="text-sm text-gray-400">Disabled</span>
        <Switch checked={isLocalMode} onChange={(e) => handleModeToggle(e.target.checked)} ... />
        <span className="text-sm text-gray-200">Local</span>
      </div>
    </div>
  </Card>

- isLocalMode Derived from Wrong Source: Line 28:

  const isLocalMode = !!status && status.crowdsec?.mode !== 'disabled'

  This checks mode from security_configs.CrowdSecMode, not the feature flag.
- handleModeToggle Updates Wrong Setting: Lines 72-77:

  const handleModeToggle = (nextEnabled: boolean) => {
    const mode = nextEnabled ? 'local' : 'disabled'
    updateModeMutation.mutate(mode) // Updates security.crowdsec.mode in settings
  }
The Fix
- Remove the Mode Toggle Card entirely (lines 68-100)
- Add a notice: "CrowdSec is controlled via the toggle on the Security Dashboard or System Settings"
Backend Cleanup (optional future work):
- Remove CrowdSecMode field from SecurityConfig model
- Migrate all state to use only security.crowdsec.enabled setting
Issue 4: Enrollment Shows "CrowdSec is not running"
Symptoms
- CrowdSec enrollment shows error even when enabled
- Red warning box: "CrowdSec is not running"
Root Cause Analysis
Files Involved:
- frontend/src/pages/CrowdSecConfig.tsx - lapiStatusQuery
- frontend/src/pages/CrowdSecConfig.tsx - Warning display logic
- backend/internal/api/handlers/crowdsec_handler.go - Status()
The Problem:
- LAPI Status Query Uses Wrong Condition: In CrowdSecConfig.tsx#L30-L40:

  const lapiStatusQuery = useQuery<CrowdSecStatus>({
    queryKey: ['crowdsec-lapi-status'],
    queryFn: statusCrowdsec,
    enabled: consoleEnrollmentEnabled && initialCheckComplete,
    refetchInterval: 5000,
    retry: false,
  })

  The query is enabled only when consoleEnrollmentEnabled (feature flag for console enrollment) is true.
- Warning Shows When Process Not Running: In CrowdSecConfig.tsx#L172-L196:

  {lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
    <div className="..." data-testid="lapi-not-running-warning">
      <p>CrowdSec is not running</p>
      ...
    </div>
  )}

  This shows when lapiStatusQuery.data.running === false.
- Status Check May Return Stale Data: The Status() backend handler checks:
  - PID file existence
  - Process status via kill -0
  - LAPI health via cscli lapi status

  But if CrowdSec was just enabled, there may be a race condition where the settings say "enabled" but the process hasn't started yet.
- Startup Reconciliation Timing: ReconcileCrowdSecOnStartup() in crowdsec_startup.go runs at container start, but if the user enables CrowdSec AFTER startup, the process won't auto-start.
The Fix
- Improve Warning Message: The "not running" warning should include:
  - A "Start CrowdSec" button that calls startCrowdsec() API
  - Or a link to the Security Dashboard where the toggle is
- Check Both States: Show the warning only when:
  - User has enabled CrowdSec (via either toggle)
  - AND the process is not running
- Add Auto-Retry: After enabling CrowdSec, poll status more aggressively for 30 seconds
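The second and third points can be expressed as two small pure functions. This is a sketch under assumptions: the function and field names are hypothetical, the 5000 ms normal interval matches the existing refetchInterval, the 30-second window comes from the fix above, and the 1000 ms fast interval is an illustrative choice rather than a measured value.

```typescript
interface WarningInputs {
  enabledInSettings: boolean;    // user has enabled CrowdSec via either toggle
  processRunning: boolean;       // actual process state from the status endpoint
  initialCheckComplete: boolean; // first status fetch has finished
}

// Show the "not running" warning only when the user expects CrowdSec to run.
function shouldShowNotRunningWarning(inputs: WarningInputs): boolean {
  return inputs.initialCheckComplete && inputs.enabledInSettings && !inputs.processRunning;
}

// Poll faster for a short window right after CrowdSec is enabled.
// msSinceEnable is null when no enable action has happened in this session.
function statusPollIntervalMs(msSinceEnable: number | null): number {
  const fastMs = 1000;        // assumption for this sketch
  const normalMs = 5000;      // current refetchInterval
  const fastWindowMs = 30000; // "30 seconds" from the fix above
  if (msSinceEnable !== null && msSinceEnable >= 0 && msSinceEnable < fastWindowMs) {
    return fastMs;
  }
  return normalMs;
}
```

The interval function could feed refetchInterval directly, so the warning clears within about a second of the process coming up instead of waiting for the next 5-second poll.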
Implementation Plan
Phase 1: Backend Fixes (Priority: High)
1.1 Unify State Source
File: backend/internal/api/handlers/security_handler.go
Change: Modify GetStatus() to include actual process status:
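// Add after line 137:
// Check actual CrowdSec process status
if h.crowdsecExecutor != nil {
    ctx := c.Request.Context()
    // The PID is not needed here; discard it so the Go compiler does not
    // reject an unused variable.
    running, _, _ := h.crowdsecExecutor.Status(ctx, h.dataDir)
    // Override enabled state based on actual process
    // (crowdsecProcessRunning must be declared earlier in GetStatus())
    crowdsecProcessRunning = running
}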
Add crowdsecExecutor field to SecurityHandler struct and inject it during initialization.
1.2 Consistent Mode Updates
File: backend/internal/api/handlers/crowdsec_handler.go
Change: In Start() and Stop(), also update the settings table:
// In Start(), after updating SecurityConfig (line ~165):
if h.DB != nil {
    setting := models.Setting{Key: "security.crowdsec.enabled", Value: "true", Category: "security", Type: "bool"}
    h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}

// In Stop(), after updating SecurityConfig (line ~228):
if h.DB != nil {
    setting := models.Setting{Key: "security.crowdsec.enabled", Value: "false", Category: "security", Type: "bool"}
    h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}
Phase 2: Frontend Fixes (Priority: High)
2.1 Fix CrowdSec Toggle State
File: frontend/src/pages/Security.tsx
Change 1: Use actual process status for toggle (around line 203):
// Replace: checked={status.crowdsec.enabled}
// With:
checked={crowdsecStatus?.running ?? status.crowdsec.enabled}
Change 2: After successful toggle, refetch both status and process status
2.2 Fix LiveLogViewer Connection State
File: frontend/src/components/LiveLogViewer.tsx
Change 1: Remove isPaused from useEffect dependencies (line 237):
// Change from:
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// To:
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]);
Change 2: Handle pause inside message handler (line 192):
const handleMessage = (entry: SecurityLogEntry) => {
  // isPaused check stays here, not in effect
  if (isPausedRef.current) return; // Use ref instead of state
  // ... rest of handler
};
Change 3: Add ref for isPaused:
const isPausedRef = useRef(isPaused);
useEffect(() => { isPausedRef.current = isPaused; }, [isPaused]);
2.3 Remove Deprecated Mode Toggle
File: frontend/src/pages/CrowdSecConfig.tsx
Change: Remove the entire "CrowdSec Mode" Card (lines 291-311 in current render):
// DELETE: The entire <Card> block containing "CrowdSec Mode"
Add informational banner instead:
{/* Replace mode toggle with info banner */}
<div className="bg-blue-900/20 border border-blue-700 rounded-lg p-4">
  <p className="text-sm text-blue-200">
    <strong>Note:</strong> CrowdSec is controlled via the toggle on the{' '}
    <Link to="/security" className="underline">Security Dashboard</Link>.
    Enable/disable CrowdSec there, then configure presets and files here.
  </p>
</div>
2.4 Fix Enrollment Warning
File: frontend/src/pages/CrowdSecConfig.tsx
Change: Add "Start CrowdSec" button to the warning (around line 185):
<Button
  variant="primary"
  size="sm"
  onClick={async () => {
    try {
      await startCrowdsec();
      toast.info('Starting CrowdSec...');
      lapiStatusQuery.refetch();
    } catch (err) {
      toast.error('Failed to start CrowdSec');
    }
  }}
>
  Start CrowdSec
</Button>
Phase 3: Remove Deprecated Mode (Priority: Medium)
3.1 Backend Model Cleanup (Future)
File: backend/internal/models/security_config.go
Mark CrowdSecMode as deprecated with migration path.
3.2 Settings Migration
Create migration to ensure all users have security.crowdsec.enabled setting derived from CrowdSecMode.
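The derivation rule for such a migration might look like the sketch below. It is not the actual migration: deriveEnabledSetting is a hypothetical name, and the rule that any non-empty mode other than "disabled" counts as enabled is an assumption borrowed from the isLocalMode check in CrowdSecConfig.tsx. An existing setting value is left untouched.

```typescript
// Returns the value security.crowdsec.enabled should have after migration.
// existing is the current settings value, or null when the row is missing.
function deriveEnabledSetting(crowdSecMode: string, existing: string | null): string {
  // Never overwrite a value the user has already set.
  if (existing !== null && existing !== "") return existing;
  const mode = crowdSecMode.trim().toLowerCase();
  // Assumption: every mode except "disabled" (or an empty mode) means enabled.
  return mode !== "" && mode !== "disabled" ? "true" : "false";
}
```

Keeping the rule in one pure function makes it straightforward to cover the edge cases (missing row, empty mode, mixed case) before running it against real data.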
Files to Modify Summary
Backend
| File | Changes |
|---|---|
| backend/internal/api/handlers/security_handler.go | Add process status check to GetStatus() |
| backend/internal/api/handlers/crowdsec_handler.go | Sync settings table in Start()/Stop() |
Frontend
| File | Changes |
|---|---|
| frontend/src/pages/Security.tsx | Use crowdsecStatus?.running for toggle state |
| frontend/src/components/LiveLogViewer.tsx | Fix isPaused dependency, use ref |
| frontend/src/pages/CrowdSecConfig.tsx | Remove mode toggle, add info banner, add "Start CrowdSec" button |
Testing Checklist
- Toggle CrowdSec on Security Dashboard → verify process starts
- Toggle CrowdSec off → verify process stops
- Refresh page → verify toggle state matches process state
- Open LiveLogViewer → verify "Connected" status
- Pause logs → verify connection remains open
- Navigate away and back → logs are cleared (expected) but connection re-establishes
- CrowdSec Config page → no mode toggle, info banner present
- Enrollment section → shows "Start CrowdSec" button when process not running