Charon/docs/plans/current_spec.md
GitHub Actions 45102ae312 feat: Add CrowdSec console re-enrollment support
- Add logging when enrollment is silently skipped due to existing state
- Add DELETE /admin/crowdsec/console/enrollment endpoint to clear state
- Add re-enrollment UI section with guidance and crowdsec.net link
- Add useClearConsoleEnrollment hook for state clearing

Fixes silent idempotency bug where backend returned 200 OK without
actually executing cscli when status was already enrolled.
2025-12-16 03:39:08 +00:00


Investigation Report: CrowdSec Enrollment & Live Log Viewer Issues

Date: December 15, 2025 (Updated: December 16, 2025)
Investigator: GitHub Copilot
Status: Analysis Complete - Re-Enrollment UX Options Evaluated


📋 CrowdSec Re-Enrollment UX Research (December 16, 2025)

CrowdSec CLI Capabilities

Available Console Commands (cscli console --help):

Available Commands:
  disable     Disable a console option
  enable      Enable a console option
  enroll      Enroll this instance to https://app.crowdsec.net
  status      Shows status of the console options

Enroll Command Flags (cscli console enroll --help):

Flags:
  -d, --disable strings   Disable console options
  -e, --enable strings    Enable console options
  -h, --help              help for enroll
  -n, --name string       Name to display in the console
      --overwrite         Force enroll the instance  ← KEY FLAG FOR RE-ENROLLMENT
  -t, --tags strings      Tags to display in the console

Key Finding: NO "unenroll" or "disconnect" command exists in CrowdSec CLI.

The disable --all command only disables data sharing options (custom, tainted, manual, context, console_management) - it does NOT unenroll from the console.

Current Data Model Analysis

Model: CrowdsecConsoleEnrollment (crowdsec_console_enrollment.go):

type CrowdsecConsoleEnrollment struct {
    ID                 uint       // Primary key
    UUID               string     // Unique identifier
    Status             string     // not_enrolled, enrolling, pending_acceptance, enrolled, failed
    Tenant             string     // Organization identifier
    AgentName          string     // Display name in console
    EncryptedEnrollKey string     // ← KEY IS STORED (encrypted with AES-GCM)
    LastError          string     // Error message if failed
    LastCorrelationID  string     // For debugging
    LastAttemptAt      *time.Time
    EnrolledAt         *time.Time
    LastHeartbeatAt    *time.Time
    CreatedAt          time.Time
    UpdatedAt          time.Time
}

Current Implementation Already Stores Enrollment Key:

  • The key is encrypted using AES-256-GCM with a key derived from a secret
  • Stored in EncryptedEnrollKey field (excluded from JSON via json:"-")
  • Encryption implemented in console_enroll.go lines 377-409
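For reference, here is a minimal TypeScript (Node.js crypto) sketch of AES-256-GCM encryption for the enrollment key. It is not a port of console_enroll.go. The key derivation (SHA-256 of the secret) and the storage layout (12-byte nonce, then ciphertext, then 16-byte tag, base64-encoded) are assumptions, and encryptEnrollKey/decryptEnrollKey are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

const NONCE_LEN = 12; // standard 96-bit GCM nonce
const TAG_LEN = 16;   // default GCM auth tag length

// ASSUMPTION: 32-byte key = SHA-256(secret). Charon's real derivation may differ.
function deriveKey(secret: string): Buffer {
  return createHash("sha256").update(secret, "utf8").digest();
}

// Hypothetical helper. Layout (assumed): nonce || ciphertext || tag, base64-encoded.
export function encryptEnrollKey(secret: string, plaintext: string): string {
  const nonce = randomBytes(NONCE_LEN); // fresh nonce per encryption; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret), nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([nonce, ciphertext, tag]).toString("base64");
}

// Hypothetical helper. Throws if the data was tampered with or the secret is wrong.
export function decryptEnrollKey(secret: string, encoded: string): string {
  const raw = Buffer.from(encoded, "base64");
  if (raw.length < NONCE_LEN + TAG_LEN) throw new Error("ciphertext too short");
  const nonce = raw.subarray(0, NONCE_LEN);
  const tag = raw.subarray(raw.length - TAG_LEN);
  const ciphertext = raw.subarray(NONCE_LEN, raw.length - TAG_LEN);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret), nonce);
  decipher.setAuthTag(tag); // final() throws if the tag does not verify
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

This layout matches the common Go pattern of prepending the nonce to the output of gcm.Seal, which appends the tag. Check it against lines 377-409 before relying on it.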

Enrollment Key Lifecycle (from crowdsec.net)

  1. Generation: User generates enrollment key on app.crowdsec.net
  2. Usage: Key is used with cscli console enroll <key> to request enrollment
  3. Validation: CrowdSec validates the key against their API
  4. Acceptance: User must accept enrollment request on app.crowdsec.net
  5. Reusability: The SAME key can be used multiple times with --overwrite flag
  6. Expiration: Keys do not expire but may be revoked by user on console

UX Options Evaluation

Option A: "Re-enroll" with NEW Key

How it works:

  • User provides a new enrollment key from crowdsec.net
  • Backend sends cscli console enroll --overwrite --name <agent> <new_key>
  • User accepts on crowdsec.net
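The Option A command can be built as an argument array. This TypeScript sketch uses only the flags shown in the help output above (--overwrite, --name). buildReenrollArgs is a hypothetical helper, and the real backend is Go.

```typescript
// Hypothetical helper: argv for `cscli console enroll --overwrite --name <agent> <key>`.
export function buildReenrollArgs(agentName: string, enrollmentKey: string): string[] {
  const name = agentName.trim();
  const key = enrollmentKey.trim();
  if (!key) throw new Error("enrollment key is required");
  const args = ["console", "enroll", "--overwrite"]; // --overwrite forces re-enrollment
  if (name) args.push("--name", name);               // display name in the console
  args.push(key);                                     // the key is the positional argument, last
  return args;
}
```

Pass the array to a process API that does not spawn a shell, such as Node's child_process.execFile or exec.Command in Go. This avoids quoting and injection problems if the agent name contains spaces or shell metacharacters.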

Pros:

  • Simple implementation (already supported via force: true)
  • Secure - no key storage concerns beyond current encrypted storage
  • Fresh key guarantees user has console access
  • Matches CrowdSec's intended workflow

Cons:

  • ⚠️ Requires user to visit crowdsec.net to get new key
  • ⚠️ Extra step for user

Current UI Support:

  • "Rotate key" button already calls submitConsoleEnrollment(true) with force=true
  • "Retry enrollment" button appears when status is degraded

Option B: "Re-enroll" with STORED Key

How it works:

  • Use the encrypted key already stored in EncryptedEnrollKey
  • Decrypt and re-send enrollment request

Pros:

  • Simplest UX - one-click re-enrollment
  • Key is already stored and encrypted

Cons:

  • ⚠️ Security concern: Re-using stored keys increases exposure window
  • ⚠️ Key may have been revoked on crowdsec.net without Charon knowing
  • ⚠️ Old key may belong to different CrowdSec account
  • ⚠️ Violates principle of least privilege

Current Implementation Gap:

  • decrypt() method exists but is marked as "only used in tests"
  • Would need new endpoint to retrieve stored key for re-enrollment

Option C: "Unenroll" + Manual Re-enroll NOT SUPPORTED

How it would work:

  • Clear local enrollment state
  • User goes through fresh enrollment

Blockers:

  • CrowdSec CLI has NO unenroll/disconnect command
  • Would require manual deletion of config files
  • May leave orphaned engine on crowdsec.net console

Files that would need cleanup:

/app/data/crowdsec/config/console.yaml     # Console options
/app/data/crowdsec/config/online_api_credentials.yaml  # CAPI credentials

Note: Deleting these files would also affect CAPI registration, not just console enrollment.

Recommendation: Option A ("Re-enroll" with NEW key)

Justification:

  1. Security First: CrowdSec enrollment keys should be treated as sensitive credentials
  2. User Intent: Re-enrollment implies user wants fresh connection to console
  3. Minimal Risk: User must actively obtain new key, preventing accidental re-enrollments
  4. CrowdSec Best Practice: The --overwrite flag is CrowdSec's designed mechanism for this

UI Flow Enhancement:

┌─────────────────────────────────────────────────────────────────┐
│  Console Enrollment                                    [?] Help │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Status: ● Enrolled                                             │
│  Agent: Charon-Home                                             │
│  Tenant: my-organization                                        │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ Need to re-enroll?                                       │   │
│  │                                                          │   │
│  │ To connect to a different CrowdSec console account or   │   │
│  │ reset your enrollment, you'll need a new enrollment key │   │
│  │ from app.crowdsec.net.                                   │   │
│  │                                                          │   │
│  │ [Get new key ↗] [Re-enroll with new key]                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ New Enrollment Key:  [________________________]          │   │
│  │ Agent Name:          [Charon-Home_____________]          │   │
│  │ Tenant:              [my-organization_________]          │   │
│  │                                                          │   │
│  │ [Re-enroll]                                              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Implementation Steps

Step 1: Update Frontend UI (Priority: HIGH)

File: frontend/src/pages/CrowdSecConfig.tsx

Changes:

  1. Add "Re-enroll" section visible when status === 'enrolled'
  2. Add expandable/collapsible panel for re-enrollment
  3. Add link to app.crowdsec.net/enrollment-keys
  4. Rename "Rotate key" button to "Re-enroll" for clarity
  5. Add explanatory text about why re-enrollment requires new key

Step 2: Improve Backend Logging (Priority: MEDIUM)

File: backend/internal/crowdsec/console_enroll.go

Changes:

  1. Add logging when enrollment is skipped due to existing status
  2. Return skipped: true field in response when idempotency check triggers
  3. Consider adding reason field to explain why enrollment was skipped

Step 3: Add "Clear Enrollment" Admin Function (Priority: LOW)

File: backend/internal/api/handlers/crowdsec_handler.go

New endpoint: DELETE /api/v1/admin/crowdsec/console/enrollment

Purpose: Reset local enrollment state to not_enrolled without touching CrowdSec config files.

Note: This does NOT unenroll from crowdsec.net - that must be done manually on the console.
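A client call for the proposed DELETE endpoint could look like the sketch below. It assumes an injected fetch-style function, so it runs without a browser, and Bearer-token auth. clearConsoleEnrollment and Fetcher are hypothetical names.

```typescript
// Minimal fetch-like signature so the sketch does not depend on DOM typings.
type Fetcher = (
  url: string,
  init: { method: string; headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical client call: resets only Charon's local enrollment record.
// The engine on app.crowdsec.net must still be removed manually.
export async function clearConsoleEnrollment(fetcher: Fetcher, token?: string): Promise<void> {
  const headers: Record<string, string> = {};
  if (token) headers["Authorization"] = `Bearer ${token}`;
  const resp = await fetcher("/api/v1/admin/crowdsec/console/enrollment", { method: "DELETE", headers });
  if (!resp.ok) {
    throw new Error(`clearing enrollment failed with HTTP ${resp.status}`);
  }
}
```

In the frontend, the existing client in consoleEnrollment.ts would likely be used instead, assuming it exposes an axios-style delete method.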

Step 4: Documentation Update (Priority: MEDIUM)

File: docs/cerberus.md

Add section explaining:

  • Why re-enrollment requires new key
  • How to get new enrollment key from crowdsec.net
  • What happens to old engine on crowdsec.net (must be manually removed)
  • Troubleshooting common enrollment issues

Executive Summary

This document covers THREE issues:

  1. CrowdSec Enrollment Backend 🔴 CRITICAL BUG FOUND: Backend returns 200 OK but cscli is NEVER executed

    • Root Cause: Silent idempotency check returns success without running enrollment command
    • Evidence: POST returns 200 OK with 137ms latency, but NO cscli logs appear
    • Fix Required: Add logging for skipped enrollments and clear guidance to use force=true
  2. Live Log Viewer: Shows "Disconnected" status (Analysis pending implementation)

  3. Stale Database State: Old enrolled status from pre-fix deployment blocks new enrollments

    • Symptoms: User clicks Enroll, sees 200 OK, but nothing happens on crowdsec.net
    • Root Cause: Database has status=enrolled from before the pending_acceptance fix was deployed

🔴 CRITICAL BUG: Silent Idempotency Check (December 16, 2025)

Problem Statement

User submits enrollment form, backend returns 200 OK (confirmed in Docker logs), but the enrollment NEVER appears on crowdsec.net. No cscli command execution visible in logs.

Docker Log Evidence

POST /api/v1/admin/crowdsec/console/enroll → 200 OK (137ms latency)
NO "starting crowdsec console enrollment" log ← cscli NEVER executed
NO cscli output logs

Code Path Analysis

File: backend/internal/crowdsec/console_enroll.go

Step 1: Handler calls service (crowdsec_handler.go, lines 865-920)

// crowdsec_handler.go:888-895
status, err := h.Console.Enroll(ctx, crowdsec.ConsoleEnrollRequest{
    EnrollmentKey: payload.EnrollmentKey,
    Tenant:        payload.Tenant,
    AgentName:     payload.AgentName,
    Force:         payload.Force,  // <-- User did NOT check Force checkbox
})

Step 2: Idempotency Check (lines 155-165) ⚠️ BUG HERE

// console_enroll.go:155-165
if rec.Status == consoleStatusEnrolling {
    return s.statusFromModel(rec), fmt.Errorf("enrollment already in progress")
}
// If already enrolled or pending acceptance, skip unless Force is set
if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force {
    return s.statusFromModel(rec), nil  // <-- RETURNS SUCCESS WITHOUT LOGGING OR RUNNING CSCLI!
}

Step 3: Database State (confirmed via container inspection)

uuid: fb129bb5-d223-4c66-941c-a30e2e2b3040
status: enrolled  ← SET BY OLD CODE BEFORE pending_acceptance FIX
tenant: 5e045b3c-5196-406b-99cd-503bc64c7b0d
agent_name: Charon

Root Cause

  1. Historical State: User enrolled BEFORE the pending_acceptance fix was deployed
  2. Old Code Bug: Previous code set status = enrolled immediately after cscli returned exit 0
  3. Silent Skip: Current code silently skips enrollment when status is enrolled (or pending_acceptance)
  4. No User Feedback: Returns 200 OK without logging or informing user enrollment was skipped

Manual Test Results from Container

# cscli is available and working
docker exec charon cscli console enroll --help
# ✅ Shows help

# LAPI is running
docker exec charon cscli lapi status
# ✅ "You can successfully interact with Local API (LAPI)"

# Console status
docker exec charon cscli console status
# ✅ Shows options table (custom=true, tainted=true)

# Manual enrollment with invalid key shows proper error
docker exec charon cscli console enroll --name test TESTINVALIDKEY123
# ✅ Error: "the attachment key provided is not valid"

# Config path exists and is correct
docker exec charon ls /app/data/crowdsec/config/config.yaml
# ✅ File exists

Required Fixes

Fix 1: Add Logging for Skipped Enrollments

File: backend/internal/crowdsec/console_enroll.go lines 162-165

Current:

if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force {
    return s.statusFromModel(rec), nil
}

Fixed:

if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force {
    logger.Log().
        WithField("status", rec.Status).
        WithField("agent", rec.AgentName).
        WithField("tenant", rec.Tenant).
        Info("enrollment skipped: already enrolled or pending - use force=true to re-enroll")
    return s.statusFromModel(rec), nil
}

Fix 2: Add "Skipped" Indicator to Response

Add a field to indicate enrollment was skipped vs actually submitted:

type ConsoleEnrollmentStatus struct {
    Status          string     `json:"status"`
    Skipped         bool       `json:"skipped,omitempty"`  // <-- NEW
    // ... other fields
}

And in the idempotency return:

status := s.statusFromModel(rec)
status.Skipped = true
return status, nil
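The guard with both fixes applied can be modeled in TypeScript as follows. This illustrates the Go logic in console_enroll.go and is not the actual code. runEnroll stands in for executing cscli, and enrollWithGuard, EnrollmentRecord and the log callback are hypothetical.

```typescript
type ConsoleStatus = "not_enrolled" | "enrolling" | "pending_acceptance" | "enrolled" | "failed";

interface EnrollmentRecord { status: ConsoleStatus; agentName: string; tenant: string }
interface EnrollResult { status: ConsoleStatus; skipped: boolean }

// Model of the fixed idempotency check. runEnroll executes cscli and returns the new status.
export function enrollWithGuard(
  rec: EnrollmentRecord,
  force: boolean,
  runEnroll: () => ConsoleStatus,
  log: (msg: string, fields: Record<string, string>) => void = () => {},
): EnrollResult {
  if (rec.status === "enrolling") {
    throw new Error("enrollment already in progress"); // checked before force, as in the Go code
  }
  if ((rec.status === "enrolled" || rec.status === "pending_acceptance") && !force) {
    // Previously a silent success; now logged and flagged so the UI can react.
    log("enrollment skipped: already enrolled or pending - use force=true to re-enroll", {
      status: rec.status, agent: rec.agentName, tenant: rec.tenant,
    });
    return { status: rec.status, skipped: true };
  }
  return { status: runEnroll(), skipped: false };
}
```

With skipped in the response, the frontend can show an "already enrolled, use Force re-enrollment" message instead of a generic success toast.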

Fix 3: Frontend Should Show "Already Enrolled" State

File: frontend/src/pages/CrowdSecConfig.tsx

When consoleStatusQuery.data?.status === 'enrolled' or 'pending_acceptance':

  • Show "You are already enrolled" message
  • Show "Force Re-Enrollment" button with checkbox
  • Explain that acceptance on crowdsec.net may be required

Fix 4: Migrate Stale "enrolled" Status to "pending_acceptance"

Either:

  1. Add a database migration to change all enrolled rows to pending_acceptance
  2. Have users click "Force Re-Enroll" once
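Option 1 is sketched below as a pure TypeScript function over rows. A real migration would be a Go or SQL change, and migrateStaleEnrolled and StoredEnrollment are hypothetical names. The idempotency check in Step 2 implies two caveats. First, pending_acceptance is also skipped unless Force is set, so this migration corrects the displayed status and guidance but does not by itself unblock re-enrollment. Second, it cannot distinguish stale rows from enrollments the user actually accepted, so those rows are relabeled too.

```typescript
interface StoredEnrollment { uuid: string; status: string }

// Rewrite legacy `enrolled` rows (written by pre-fix code that never waited for
// acceptance) to `pending_acceptance`. Other rows are returned untouched; input is not mutated.
export function migrateStaleEnrolled(rows: StoredEnrollment[]): { rows: StoredEnrollment[]; changed: number } {
  let changed = 0;
  const out = rows.map((row) => {
    if (row.status !== "enrolled") return row;
    changed++;
    return { ...row, status: "pending_acceptance" };
  });
  return { rows: out, changed };
}
```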

Workaround for User

Until the fix is deployed, the user can re-enroll using the Force option:

  1. In the UI: Check "Force re-enrollment" checkbox before clicking Enroll
  2. Or via curl:
curl -X POST http://localhost:8080/api/v1/admin/crowdsec/console/enroll \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"enrollment_key":"<key>", "agent_name":"Charon", "force":true}'

Previous Frontend Analysis (Still Valid for Reference)

Enrollment Flow Path

User clicks "Enroll" button
    ↓
CrowdSecConfig.tsx: <Button onClick={() => submitConsoleEnrollment(false)} ...>
    ↓
submitConsoleEnrollment() function (lines 269-299)
    ↓
validateConsoleEnrollment() check (lines 254-267)
    ↓
enrollConsoleMutation.mutateAsync(payload)
    ↓
useConsoleEnrollment.ts: enrollConsole(payload)
    ↓
consoleEnrollment.ts: client.post('/admin/crowdsec/console/enroll', payload)

Conditions That Block the Enrollment Request

1. Feature Flag Disabled (POSSIBLE BLOCKER)

File: CrowdSecConfig.tsx

const { data: featureFlags } = useQuery({ queryKey: ['feature-flags'], queryFn: getFeatureFlags })
const consoleEnrollmentEnabled = Boolean(featureFlags?.['feature.crowdsec.console_enrollment'])

Impact: If feature.crowdsec.console_enrollment is false or undefined, the entire enrollment card is not rendered:

{consoleEnrollmentEnabled && (
  <Card data-testid="console-enrollment-card">
    ... enrollment UI ...
  </Card>
)}

2. Enroll Button Disabled Conditions ⚠️ HIGH PROBABILITY

File: CrowdSecConfig.tsx

disabled={isConsolePending || (lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready) || !enrollmentToken.trim()}

The button is disabled when:

  • isConsolePending: the enrollment mutation is already in progress OR status is 'enrolling'
  • lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready: the LAPI status query returned data but lapi_ready is false
  • !enrollmentToken.trim(): the enrollment token input is empty

⚠️ CRITICAL FINDING: The LAPI ready check can block enrollment:

  • If lapiStatusQuery.data exists AND lapi_ready is false, button is DISABLED
  • This can happen if CrowdSec process is running but LAPI hasn't fully initialized
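The disabled expression can be isolated as a pure function to make the LAPI behavior explicit. While lapiStatusQuery.data is undefined because the query has not resolved, the LAPI clause is falsy and does not disable the button. isEnrollDisabled and LapiStatus are hypothetical names for this TypeScript sketch.

```typescript
interface LapiStatus { lapi_ready: boolean }

// Mirrors: isConsolePending || (lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready) || !enrollmentToken.trim()
export function isEnrollDisabled(
  isConsolePending: boolean,
  lapiStatus: LapiStatus | undefined, // lapiStatusQuery.data; undefined until the query resolves
  enrollmentToken: string,
): boolean {
  const lapiNotReady = lapiStatus !== undefined && !lapiStatus.lapi_ready;
  return isConsolePending || lapiNotReady || !enrollmentToken.trim();
}
```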

3. Validation Blocks in submitConsoleEnrollment() ⚠️ HIGH PROBABILITY

File: CrowdSecConfig.tsx

const submitConsoleEnrollment = async (force = false) => {
  const allowMissingTenant = force && !consoleTenant.trim()
  const requireAck = normalizedConsoleStatus === 'not_enrolled'
  if (!validateConsoleEnrollment({ allowMissingTenant, requireAck })) return  // <-- EARLY RETURN
  ...
}

Validation function (lines 254-267):

const validateConsoleEnrollment = (options?: { allowMissingTenant?: boolean; requireAck?: boolean }) => {
  const nextErrors: Record<string, string> = {}
  if (!enrollmentToken.trim()) {
    nextErrors.token = 'Enrollment token is required'
  }
  if (!consoleAgentName.trim()) {
    nextErrors.agent = 'Agent name is required'
  }
  if (!consoleTenant.trim() && !options?.allowMissingTenant) {
    nextErrors.tenant = 'Tenant / organization is required'  // <-- BLOCKS if tenant empty
  }
  if (options?.requireAck && !consoleAck) {
    nextErrors.ack = 'You must acknowledge...'  // <-- BLOCKS if checkbox unchecked
  }
  setConsoleErrors(nextErrors)
  return Object.keys(nextErrors).length === 0
}

Validation will SILENTLY block the request if:

  1. enrollmentToken is empty
  2. consoleAgentName is empty
  3. consoleTenant is empty (for non-force enrollment)
  4. consoleAck checkbox is unchecked (for first-time enrollment where status is not_enrolled)

Summary of Blocking Conditions

  • Feature flag disabled (lines 44-45): entire enrollment card not rendered
  • LAPI not ready (line 692): button disabled
  • Token empty (line 692, validation): button disabled and validation blocks
  • Agent name empty (validation, line 260): validation silently blocks
  • Tenant empty (validation, line 262): validation silently blocks
  • Acknowledgment unchecked (validation, line 265): validation silently blocks
  • Already enrolling (line 692): button disabled

Most Likely Root Causes (Ordered by Probability)

1. LAPI Not Ready Check ⚠️ HIGH PROBABILITY

The condition (lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready) will disable the button if:

  • The status query has completed (data exists)
  • But lapi_ready is false

Check: Call GET /api/v1/admin/crowdsec/status and verify lapi_ready field.

2. Acknowledgment Checkbox Not Checked ⚠️ HIGH PROBABILITY

For first-time enrollment (status === 'not_enrolled'), the checkbox MUST be checked. The validation will silently return without making the API call.

Check: Ensure checkbox with data-testid="console-ack-checkbox" is checked.

3. Tenant Field Empty

For non-force enrollment, the tenant field is required. An empty tenant will block the request silently.

Check: Ensure tenant input has a value.

Code Sections That Need Fixes

Fix 1: Add Debug Logging (Temporary)

Add to submitConsoleEnrollment():

const submitConsoleEnrollment = async (force = false) => {
  console.log('[DEBUG] submitConsoleEnrollment called', {
    force,
    enrollmentToken: enrollmentToken.trim() ? 'present' : 'empty',
    consoleTenant,
    consoleAgentName,
    consoleAck,
    normalizedConsoleStatus,
    lapiReady: lapiStatusQuery.data?.lapi_ready,
  })
  // ... rest
}

Fix 2: Improve Validation Feedback

The validation currently sets consoleErrors but these may not be visible to the user. Ensure error messages are displayed.
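One way to keep a blocked submit from being silent is to combine the per-field errors into one message for a toast or banner. summarizeConsoleErrors is a hypothetical helper, and its error keys match those set in validateConsoleEnrollment.

```typescript
type ConsoleErrors = Partial<Record<"token" | "agent" | "tenant" | "ack", string>>;

// Hypothetical helper: one summary line, or null when validation passed.
export function summarizeConsoleErrors(errors: ConsoleErrors): string | null {
  const messages = Object.values(errors).filter((m): m is string => Boolean(m));
  if (messages.length === 0) return null;
  return `Enrollment not submitted: ${messages.join("; ")}`;
}
```

It could be called inside validateConsoleEnrollment right after setConsoleErrors(nextErrors), with the result passed to the page's existing toast helper.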

Fix 3: Check LAPI Status Polling

The LAPI status query starts only after 3 seconds (initialCheckComplete). If the user clicks before then, the button is still enabled because lapiStatusQuery.data is undefined, but LAPI may not actually be ready, in which case the backend enrollment call will fail.

Debugging Steps

  1. Open browser DevTools → Console
  2. Check if enrollment card is rendered (look for data-testid="console-enrollment-card")
  3. Inspect button element - check if disabled attribute is present
  4. Check Network tab for:
    • GET /api/v1/feature-flags response
    • GET /api/v1/admin/crowdsec/status response (check lapi_ready)
  5. Verify form state:
    • Token field has value
    • Agent name has value
    • Tenant has value
    • Checkbox is checked

API Client Verification

File: consoleEnrollment.ts

export async function enrollConsole(payload: ConsoleEnrollPayload): Promise<ConsoleEnrollmentStatus> {
  const resp = await client.post<ConsoleEnrollmentStatus>('/admin/crowdsec/console/enroll', payload)
  return resp.data
}

The API client is correctly implemented. The issue is upstream - the function is never being called because conditions are blocking it.


RESOLVED Issue A: CrowdSec Console Enrollment Not Working

Symptoms

  • User submits enrollment with valid key
  • Charon shows "Enrollment submitted" success message
  • No engine appears in CrowdSec.net dashboard
  • User reports: "The CrowdSec enrollment request NEVER reached crowdsec.net"

Root Cause (CONFIRMED)

The Bug: After a successful cscli console enroll <key> command (exit code 0), CrowdSec's help explicitly states:

"After running this command you will need to validate the enrollment in the webapp."

Exit code 0 = enrollment REQUEST sent, NOT enrollment COMPLETE.

The code incorrectly set status = enrolled when it should have been status = pending_acceptance.
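The corrected mapping from the cscli exit code to the stored status is shown in this small TypeScript sketch. statusAfterCscli is hypothetical. Treating any non-zero exit as failed, with stderr recorded in lastError, is an assumption based on the Status and LastError fields of the model.

```typescript
type EnrollOutcome = { status: "pending_acceptance" | "failed"; lastError?: string };

// Exit 0 only means the enrollment REQUEST was sent, so the record moves to
// pending_acceptance, never straight to enrolled.
export function statusAfterCscli(exitCode: number, stderr = ""): EnrollOutcome {
  if (exitCode === 0) return { status: "pending_acceptance" };
  return { status: "failed", lastError: stderr.trim() || `cscli exited with code ${exitCode}` };
}
```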

Fixes Applied (December 16, 2025)

Fix A1: Backend Status Semantics

File: backend/internal/crowdsec/console_enroll.go

  • Added consoleStatusPendingAcceptance = "pending_acceptance" constant
  • Changed success status from enrolled to pending_acceptance
  • Fixed idempotency check to also skip re-enrollment when status is pending_acceptance
  • Fixed config path check to look in config/config.yaml subdirectory first
  • Updated log message to say "pending acceptance on crowdsec.net"

Fix A2: Frontend User Guidance

File: frontend/src/pages/CrowdSecConfig.tsx

  • Updated success toast to say "Accept the enrollment on app.crowdsec.net to complete registration"
  • Added isConsolePendingAcceptance variable
  • Updated canRotateKey to include pending_acceptance status
  • Added info box with link to app.crowdsec.net when status is pending_acceptance

Fix A3: Test Updates

Files: backend/internal/crowdsec/console_enroll_test.go, backend/internal/api/handlers/crowdsec_handler_test.go

  • Updated all tests expecting enrolled to expect pending_acceptance
  • Updated test for idempotency to verify second call is blocked for pending_acceptance
  • Changed EnrolledAt assertion to LastAttemptAt (enrollment is not complete yet)

Verification

All backend tests pass:

  • TestConsoleEnrollSuccess
  • TestConsoleEnrollIdempotentWhenAlreadyEnrolled
  • TestConsoleEnrollNormalizesFullCommand
  • TestConsoleEnrollDoesNotPassTenant
  • TestConsoleEnrollmentStatus/returns_pending_acceptance_status_after_enrollment
  • TestConsoleStatusAfterEnroll

Frontend type-check passes


NEW Issue B: Live Log Viewer Shows "Disconnected"

Symptoms

  • Live Log Viewer component shows "Disconnected" status badge
  • No logs appear (even when there should be logs)
  • WebSocket connection may not be establishing

Root Cause Analysis

Primary Finding: WebSocket Connection Works But Logs Are Sparse

The WebSocket implementation is correct. The issue is likely:

  1. No logs being generated - If CrowdSec/Caddy aren't actively processing requests, there are no logs
  2. Initial connection timing - The isConnected state depends on onOpen callback

Verified Working Components:

  1. Backend WebSocket Handler: backend/internal/api/handlers/logs_ws.go

    • Properly upgrades HTTP to WebSocket
    • Subscribes to BroadcastHook for log entries
    • Sends ping messages every 30 seconds
  2. Frontend Connection Logic: frontend/src/api/logs.ts

    • connectLiveLogs() correctly builds WebSocket URL
    • Properly handles onOpen, onClose, onError callbacks
  3. Frontend Component: frontend/src/components/LiveLogViewer.tsx

    • isConnected state is set in handleOpen callback
    • Connection effect runs on mount and mode changes

Potential Issues Found

Issue B1: WebSocket Route May Be Protected

Location: backend/internal/api/routes/routes.go Line 158

The WebSocket endpoint is under the protected route group, meaning it requires authentication:

protected.GET("/logs/live", handlers.LogsWebSocketHandler)

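Problem: WebSocket connections may fail silently if the auth token isn't being passed. The browser's native WebSocket API cannot set custom headers such as Authorization. Cookies for the origin are sent with the handshake, so cookie-based auth would still work. A token kept in localStorage and normally sent as a Bearer header, however, is never transmitted.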

Verification Steps:

  1. Check browser DevTools Network tab for WebSocket connection
  2. Look for 401/403 responses
  3. Check if token query parameter is being sent

Issue B2: No Error Display to User

Location: frontend/src/components/LiveLogViewer.tsx Lines 170-172

const handleError = (error: Event) => {
  console.error('WebSocket error:', error);
  setIsConnected(false);
};

Problem: Errors are only logged to console, not displayed to user. User sees "Disconnected" without knowing why.

Required Fixes for Issue B

Fix B1: Add Error State Display

File: frontend/src/components/LiveLogViewer.tsx

Add error state tracking:

const [connectionError, setConnectionError] = useState<string | null>(null);

const handleError = (error: Event) => {
  console.error('WebSocket error:', error);
  setIsConnected(false);
  setConnectionError('Failed to connect to log stream. Check authentication.');
};

const handleOpen = () => {
  console.log(`${currentMode} log viewer connected`);
  setIsConnected(true);
  setConnectionError(null); // Clear any previous errors
};

Display error in UI:

{connectionError && (
  <div className="text-red-400 text-xs p-2">{connectionError}</div>
)}

Fix B2: Add Authentication to WebSocket URL

File: frontend/src/api/logs.ts

The WebSocket needs to pass auth token as query parameter since WebSocket API doesn't support custom headers:

export const connectLiveLogs = (
  filters: LiveLogFilter,
  onMessage: (log: LiveLogEntry) => void,
  onOpen?: () => void,
  onError?: (error: Event) => void,
  onClose?: () => void
): (() => void) => {
  const params = new URLSearchParams();
  if (filters.level) params.append('level', filters.level);
  if (filters.source) params.append('source', filters.source);

  // Add auth token from localStorage if available
  const token = localStorage.getItem('token');
  if (token) {
    params.append('token', token);
  }

  const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
  const wsUrl = `${protocol}//${window.location.host}/api/v1/logs/live?${params.toString()}`;
  // ...
};

Backend Auth Check (verify this exists): The backend auth middleware must check for token query parameter in addition to headers/cookies for WebSocket connections.
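The lookup order the middleware needs is modeled in TypeScript below. The real middleware is Go, resolveAuthToken is a hypothetical name, and cookie-based auth, if used, would be a further source not shown.

```typescript
// Prefer the Authorization header (REST calls), then fall back to ?token= (WebSocket handshakes).
export function resolveAuthToken(
  authorizationHeader: string | undefined,
  query: URLSearchParams,
): string | null {
  if (authorizationHeader && authorizationHeader.startsWith("Bearer ")) {
    const token = authorizationHeader.slice("Bearer ".length).trim();
    if (token) return token;
  }
  const fromQuery = query.get("token");
  return fromQuery && fromQuery.trim() ? fromQuery.trim() : null;
}
```

A token in the query string may end up in proxy and access logs, so a short-lived, WebSocket-only token is a safer variant of the same approach.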

Fix B3: Add Reconnection Logic

File: frontend/src/components/LiveLogViewer.tsx

Add automatic reconnection with exponential backoff:

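// Keep the counter in a ref: handleClose is created inside the connection effect,
// so a useState value read there would be stale.
const reconnectAttemptsRef = useRef(0);
const reconnectTimerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
const maxReconnectAttempts = 5;
// Include reconnectKey in the connection effect's dependency array; bumping it reconnects.
const [reconnectKey, setReconnectKey] = useState(0);
// In handleOpen, reset the backoff: reconnectAttemptsRef.current = 0;
// In the effect cleanup, clearTimeout(reconnectTimerRef.current) so nothing fires after unmount.

const handleClose = () => {
  console.log(`${currentMode} log viewer disconnected`);
  setIsConnected(false);

  // Auto-reconnect with exponential backoff: 1s, 2s, 4s, 8s, 16s (capped at 30s)
  if (reconnectAttemptsRef.current < maxReconnectAttempts) {
    const delay = Math.min(1000 * Math.pow(2, reconnectAttemptsRef.current), 30000);
    reconnectAttemptsRef.current += 1;
    reconnectTimerRef.current = setTimeout(() => setReconnectKey((k) => k + 1), delay);
  }
};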
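The delay schedule in handleClose can be extracted into a pure function so the cap and attempt limit are easy to test. reconnectDelayMs is a hypothetical name, and its defaults match the values above (5 attempts, 1 s base, 30 s cap).

```typescript
// Delay before 0-based reconnect attempt `attempt`: base * 2^attempt, capped at maxDelayMs.
// Returns null once maxAttempts is reached so the caller can stop and show an error.
export function reconnectDelayMs(
  attempt: number,
  maxAttempts = 5,
  baseMs = 1000,
  maxDelayMs = 30000,
): number | null {
  if (attempt >= maxAttempts) return null;
  return Math.min(baseMs * 2 ** attempt, maxDelayMs);
}
```

Adding random jitter to each delay is a common refinement when many clients reconnect at once. It is omitted here to keep the schedule deterministic.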

Summary of All Fixes

Issue A: CrowdSec Enrollment

  • frontend/src/pages/CrowdSecConfig.tsx: update success toast to mention acceptance step
  • frontend/src/pages/CrowdSecConfig.tsx: add info box with link to crowdsec.net
  • backend/internal/crowdsec/console_enroll.go: add pending_acceptance status constant
  • docs/cerberus.md: add documentation about acceptance requirement

Issue B: Live Log Viewer

  • frontend/src/components/LiveLogViewer.tsx: add error state display
  • frontend/src/api/logs.ts: pass auth token in WebSocket URL
  • frontend/src/components/LiveLogViewer.tsx: add reconnection logic with backoff

Testing Checklist

Enrollment Testing

  • Submit enrollment with valid key
  • Verify success message mentions acceptance step
  • Verify UI shows guidance to accept on crowdsec.net
  • Accept enrollment on crowdsec.net
  • Verify engine appears in dashboard

Live Logs Testing

  • Open Live Log Viewer page
  • Verify WebSocket connects (check Network tab)
  • Verify "Connected" badge shows
  • Generate some logs (make HTTP request to proxy)
  • Verify logs appear in viewer
  • Test disconnect/reconnect behavior



PREVIOUS ANALYSIS (Resolved Issues - Kept for Reference)


Issue 1: CrowdSec Card Toggle Broken on Cerberus Dashboard

Symptoms

  • CrowdSec card shows "Active" but toggle doesn't work properly
  • Shows "on and active" but CrowdSec is NOT actually on

Root Cause Analysis

Files Involved:

The Problem:

  1. Dual-Source State Conflict: The GetStatus() endpoint in security_handler.go#L61-L137 combines state from TWO sources:

    • settings table: security.crowdsec.enabled and security.crowdsec.mode
    • security_configs table: CrowdSecMode field
  2. Toggle Updates Wrong Store: When the user toggles CrowdSec via crowdsecPowerMutation:

    • It calls updateSetting('security.crowdsec.enabled', ...) which updates the settings table
    • It calls startCrowdsec() / stopCrowdsec() which updates security_configs.CrowdSecMode
  3. State Priority Mismatch: In security_handler.go#L100-L108:

    // CrowdSec enabled override (from settings table)
    if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
        if strings.EqualFold(setting.Value, "true") {
            crowdSecMode = "local"
        } else {
            crowdSecMode = "disabled"
        }
    }
    

    The settings table overrides security_configs, but the Start() handler updates security_configs.

  4. Process State Not Verified: The frontend shows "Active" based on status.crowdsec.enabled from the API, but this value is computed from DB settings, NOT from actual process status. The crowdsecStatus state (lines 43-44) fetches real process status, but it is a separate query displayed below the card.

The Fix

Backend (security_handler.go):

  • GetStatus() should check actual CrowdSec process status via the CrowdsecExecutor.Status() call, not just DB state

Frontend (Security.tsx):

  • The toggle's checked state should use crowdsecStatus?.running (actual process state) instead of status.crowdsec.enabled (DB setting)
  • Or sync both states properly after toggle
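The frontend side of the fix can be sketched as a pure function. Once crowdsecStatus has loaded, the switch follows the process state, and a mismatch flag marks the case this issue describes (setting enabled, process not running). resolveCrowdSecToggle and its return shape are hypothetical.

```typescript
interface CrowdSecToggleView { checked: boolean; mismatch: boolean }

// Drive the switch from the real process state; fall back to the DB setting while loading.
export function resolveCrowdSecToggle(
  settingEnabled: boolean,             // status.crowdsec.enabled (derived from the settings table)
  processRunning: boolean | undefined, // crowdsecStatus?.running; undefined until loaded
): CrowdSecToggleView {
  if (processRunning === undefined) return { checked: settingEnabled, mismatch: false };
  return { checked: processRunning, mismatch: processRunning !== settingEnabled };
}
```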

Issue 2: Live Log Viewer Shows "Disconnected" But Logs Appear

Symptoms

  • Shows "Disconnected" status badge but logs ARE appearing
  • Navigating away and back causes logs to disappear

Root Cause Analysis

Files Involved:

The Problem:

  1. Connection State Race Condition: In LiveLogViewer.tsx#L165-L240:

    useEffect(() => {
      // Close existing connection
      if (closeConnectionRef.current) {
        closeConnectionRef.current();
        closeConnectionRef.current = null;
      }
      // ... setup handlers ...
      return () => {
        if (closeConnectionRef.current) {
          closeConnectionRef.current();
          closeConnectionRef.current = null;
        }
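        setIsConnected(false);  // runs BEFORE the next effect; the stale update comes from the old socket's async close event
      };
    }, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
    
  2. Dependency Array Includes isPaused: When isPaused changes, the entire effect re-runs and creates a new WebSocket. React runs the old effect's cleanup before the new effect, but closing the old socket fires its close event asynchronously. If the close function does not detach the old handlers, the stale onClose callback can call setIsConnected(false) after the new connection's onOpen has already set isConnected(true). The badge then shows "Disconnected" even though the new connection is delivering logs.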

  3. Logs Disappear on Navigation: The logs state is stored locally in the component via useState<DisplayLogEntry[]>([]). When the component unmounts (navigation) and remounts, state resets to empty array. There's no persistence or caching.

The Fix

LiveLogViewer.tsx:

  1. Fix State Race: Use a ref to track connection state transitions:

    const connectionIdRef = useRef(0);
    // In effect: increment connectionId, check it in callbacks
    
  2. Remove isPaused from Dependencies: Pausing should NOT close/reopen the WebSocket. Instead, just skip adding messages when paused:

    // Current (wrong): connection is in dependency array
    // Fixed: only filter/process messages based on isPaused flag
    
  3. Persist Logs Across Navigation: Either:

    • Store logs in React Query cache
    • Use a global store (zustand/context)
    • Accept the limitation with a "Logs cleared on navigation" note
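Fix 1 (the connection-generation guard) can be sketched without React, assuming the old socket's close callback is what clears the badge. `Ref` mimics the object React's useRef returns. `createGuardedHandlers` is a hypothetical helper, not existing code: each effect run would call it and pass the returned handlers to the WebSocket setup. The effect's cleanup could also increment `connectionIdRef.current`, so that callbacks still in flight after unmount are ignored.

```typescript
// Stand-in for the object React's useRef returns.
interface Ref<T> {
  current: T;
}

interface ConnectionHandlers {
  onOpen: () => void;
  onClose: () => void;
}

// Hypothetical helper, called once per effect run. It claims a new
// generation number and returns handlers that only update state while
// their connection is still the newest one.
function createGuardedHandlers(
  connectionIdRef: Ref<number>,
  setIsConnected: (connected: boolean) => void,
): { id: number; handlers: ConnectionHandlers } {
  const id = ++connectionIdRef.current;
  const isCurrent = (): boolean => connectionIdRef.current === id;

  return {
    id,
    handlers: {
      onOpen: () => {
        if (isCurrent()) setIsConnected(true);
      },
      onClose: () => {
        // A late close event from a superseded socket is ignored,
        // so it cannot flip the badge back to "Disconnected".
        if (isCurrent()) setIsConnected(false);
      },
    },
  };
}
```

Comparing generation numbers avoids tracking socket identity. Any callback that captured an older `id` becomes inert as soon as a newer connection is created.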

Issue 3: DEPRECATED CrowdSec Mode Toggle Still in UI

Symptoms

  • CrowdSec config page shows "Disabled/Local/External" mode toggle
  • This is confusing because CrowdSec should run based SOLELY on the Feature Flag in System Settings

Root Cause Analysis

Files Involved:

The Problem:

  1. Redundant Control Surfaces: There are THREE ways to control CrowdSec:

    • Feature Flag: feature.cerberus.enabled in Settings (System Settings page)
    • Per-Service Toggle: security.crowdsec.enabled in Settings (Security Dashboard)
    • Mode Toggle: CrowdSecMode in SecurityConfig (CrowdSec Config page)
  2. Deprecated UI Still Present: In CrowdSecConfig.tsx#L68-L100:

    <Card>
      <div className="flex items-center justify-between gap-4 flex-wrap">
        <div className="space-y-1">
          <h2 className="text-lg font-semibold">CrowdSec Mode</h2>
          <p className="text-sm text-gray-400">
            {isLocalMode ? 'CrowdSec runs locally...' : 'CrowdSec decisions are paused...'}
          </p>
        </div>
        <div className="flex items-center gap-3">
          <span className="text-sm text-gray-400">Disabled</span>
          <Switch
            checked={isLocalMode}
            onChange={(e) => handleModeToggle(e.target.checked)}
            ...
          />
          <span className="text-sm text-gray-200">Local</span>
        </div>
      </div>
    </Card>
    
  3. isLocalMode Derived from Wrong Source: Line 28:

    const isLocalMode = !!status && status.crowdsec?.mode !== 'disabled'
    

    This checks mode from security_configs.CrowdSecMode, not the feature flag.

  4. handleModeToggle Updates Wrong Setting: Lines 72-77:

    const handleModeToggle = (nextEnabled: boolean) => {
      const mode = nextEnabled ? 'local' : 'disabled'
      updateModeMutation.mutate(mode)  // Updates security.crowdsec.mode in settings
    }
    

The Fix

CrowdSecConfig.tsx:

  1. Remove the Mode Toggle Card entirely (lines 68-100)
  2. Add a notice: "CrowdSec is controlled via the toggle on the Security Dashboard or System Settings"

Backend Cleanup (optional future work):

  • Remove CrowdSecMode field from SecurityConfig model
  • Migrate all state to use only security.crowdsec.enabled setting

Issue 4: Enrollment Shows "CrowdSec is not running"

Symptoms

  • CrowdSec enrollment shows error even when enabled
  • Red warning box: "CrowdSec is not running"

Root Cause Analysis

Files Involved:

The Problem:

  1. LAPI Status Query Uses Wrong Condition: In CrowdSecConfig.tsx#L30-L40:

    const lapiStatusQuery = useQuery<CrowdSecStatus>({
      queryKey: ['crowdsec-lapi-status'],
      queryFn: statusCrowdsec,
      enabled: consoleEnrollmentEnabled && initialCheckComplete,
      refetchInterval: 5000,
      retry: false,
    })
    

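    The query is enabled only when consoleEnrollmentEnabled (the console-enrollment feature flag) and initialCheckComplete are both true. Nothing in the condition reflects whether the user has actually enabled CrowdSec.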

  2. Warning Shows When Process Not Running: In CrowdSecConfig.tsx#L172-L196:

    {lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
      <div className="..." data-testid="lapi-not-running-warning">
        <p>CrowdSec is not running</p>
        ...
      </div>
    )}
    

    This shows when lapiStatusQuery.data.running === false.

  3. Status Check May Return Stale Data: The Status() backend handler checks:

    • PID file existence
    • Process status via kill -0
    • LAPI health via cscli lapi status

    But if CrowdSec was just enabled, there may be a race condition where the settings say "enabled" but the process hasn't started yet.

  4. Startup Reconciliation Timing: ReconcileCrowdSecOnStartup() in crowdsec_startup.go runs at container start, but if the user enables CrowdSec AFTER startup, the process won't auto-start.

The Fix

CrowdSecConfig.tsx:

  1. Improve Warning Message: The "not running" warning should include:

    • A "Start CrowdSec" button that calls startCrowdsec() API
    • Or a link to the Security Dashboard where the toggle is
  2. Check Both States: Show the warning only when:

    • User has enabled CrowdSec (via either toggle)
    • AND the process is not running
  3. Add Auto-Retry: After enabling CrowdSec, poll status more aggressively for 30 seconds
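Fixes 2 and 3 can be expressed as two pure functions, sketched below under several assumptions:

- `crowdsecEnabled` stands for "enabled via either toggle".
- `enabledAtMs` is a hypothetical timestamp recorded when the user enables CrowdSec.
- The 1-second fast interval is an arbitrary choice; the plan only says "more aggressively".

If the page uses TanStack Query (the useQuery options suggest it), recent versions accept a function for refetchInterval. The interval helper could be called from there in place of the fixed 5000.

```typescript
// Hypothetical inputs; the real page would derive these from its queries.
interface WarningInputs {
  crowdsecEnabled: boolean; // CrowdSec enabled via either toggle
  lapiRunning: boolean | undefined; // undefined until the first status response arrives
  initialCheckComplete: boolean;
}

// Fix 2: warn only when CrowdSec is supposed to run but is not running.
function shouldShowNotRunningWarning(inputs: WarningInputs): boolean {
  return (
    inputs.initialCheckComplete &&
    inputs.crowdsecEnabled &&
    inputs.lapiRunning === false
  );
}

// Fix 3: poll quickly for 30 s after enabling, then return to the normal 5 s.
const FAST_POLL_MS = 1000; // assumed value
const NORMAL_POLL_MS = 5000; // matches the existing refetchInterval
const FAST_POLL_WINDOW_MS = 30000;

function lapiPollInterval(nowMs: number, enabledAtMs: number | null): number {
  if (enabledAtMs !== null && nowMs - enabledAtMs < FAST_POLL_WINDOW_MS) {
    return FAST_POLL_MS;
  }
  return NORMAL_POLL_MS;
}
```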


Implementation Plan

Phase 1: Backend Fixes (Priority: High)

1.1 Unify State Source

File: backend/internal/api/handlers/security_handler.go

Change: Modify GetStatus() to include actual process status:

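// Add after line 137:
// Check the actual CrowdSec process status; the DB setting alone can be stale
crowdsecProcessRunning := false
if h.crowdsecExecutor != nil {
    // The PID is not needed here; discarding it avoids Go's "declared and not used" error
    running, _, err := h.crowdsecExecutor.Status(c.Request.Context(), h.dataDir)
    if err == nil {
        crowdsecProcessRunning = running
    }
}
// Override the DB-derived enabled state with crowdsecProcessRunning in the response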

Add crowdsecExecutor field to SecurityHandler struct and inject it during initialization.

1.2 Consistent Mode Updates

File: backend/internal/api/handlers/crowdsec_handler.go

Change: In Start() and Stop(), also update the settings table:

// In Start(), after updating SecurityConfig (line ~165):
if h.DB != nil {
    setting := models.Setting{Key: "security.crowdsec.enabled", Value: "true", Category: "security", Type: "bool"}
    h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}

// In Stop(), after updating SecurityConfig (line ~228):
if h.DB != nil {
    setting := models.Setting{Key: "security.crowdsec.enabled", Value: "false", Category: "security", Type: "bool"}
    h.DB.Where(models.Setting{Key: "security.crowdsec.enabled"}).Assign(setting).FirstOrCreate(&setting)
}

Phase 2: Frontend Fixes (Priority: High)

2.1 Fix CrowdSec Toggle State

File: frontend/src/pages/Security.tsx

Change 1: Use actual process status for toggle (around line 203):

// Replace: checked={status.crowdsec.enabled}
// With:
checked={crowdsecStatus?.running ?? status.crowdsec.enabled}

Change 2: After successful toggle, refetch both status and process status
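Change 1 relies on `??` rather than `||`, and the sketch below shows why. The interfaces are hypothetical reductions of the real status types, keeping only the fields the toggle reads.

```typescript
// Hypothetical reductions of the real types: only the fields the toggle reads.
interface SecurityStatus {
  crowdsec: { enabled: boolean };
}
interface CrowdSecProcessStatus {
  running: boolean;
}

// Same expression as Change 1, wrapped in a function for testing.
function crowdsecToggleChecked(
  status: SecurityStatus,
  crowdsecStatus: CrowdSecProcessStatus | undefined,
): boolean {
  // `??` falls back to the DB setting only while the process status is
  // still loading (undefined). A definite `running: false` wins, which is
  // the point of the fix; `||` would let `enabled: true` override it.
  return crowdsecStatus?.running ?? status.crowdsec.enabled;
}
```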

2.2 Fix LiveLogViewer Connection State

File: frontend/src/components/LiveLogViewer.tsx

Change 1: Remove isPaused from useEffect dependencies (line 237):

// Change from:
}, [currentMode, filters, securityFilters, isPaused, maxLogs, showBlockedOnly]);
// To:
}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]);

Change 2: Handle pause inside message handler (line 192):

const handleMessage = (entry: SecurityLogEntry) => {
  // isPaused check stays here, not in effect
  if (isPausedRef.current) return;  // Use ref instead of state
  // ... rest of handler
};

Change 3: Add ref for isPaused:

const isPausedRef = useRef(isPaused);
useEffect(() => { isPausedRef.current = isPaused; }, [isPaused]);
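The pause handling in Changes 2 and 3 can be exercised outside React with the sketch below, under these assumptions:

- `Ref` stands in for the useRef object.
- `LogEntry` is a simplified, hypothetical substitute for SecurityLogEntry.
- Trimming to `maxLogs` is an assumption about how the component's maxLogs prop is applied.

Entries that arrive while paused are dropped, matching the plan's "skip adding messages when paused".

```typescript
interface Ref<T> {
  current: T;
}

// Simplified, hypothetical stand-in for SecurityLogEntry.
interface LogEntry {
  timestamp: string;
  message: string;
}

// Builds the WebSocket message handler. Because it reads the pause flag
// from a ref at call time, toggling pause needs no effect re-run and the
// connection stays open.
function createMessageHandler(
  isPausedRef: Ref<boolean>,
  setLogs: (update: (prev: LogEntry[]) => LogEntry[]) => void,
  maxLogs: number,
): (entry: LogEntry) => void {
  return (entry) => {
    if (isPausedRef.current) return; // drop entries while paused
    setLogs((prev) => {
      const next = [...prev, entry];
      // Keep only the newest `maxLogs` entries.
      return next.length > maxLogs ? next.slice(next.length - maxLogs) : next;
    });
  };
}
```

Buffering paused entries and replaying them on resume would be an alternative. That would change what users see after unpausing, so it is a product decision rather than part of this fix.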

2.3 Remove Deprecated Mode Toggle

File: frontend/src/pages/CrowdSecConfig.tsx

Change: Remove the entire "CrowdSec Mode" Card (lines 291-311 in current render):

// DELETE: The entire <Card> block containing "CrowdSec Mode"

Add informational banner instead:

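// Assumes Link is imported from the app's router at the top of the file, e.g.:
// import { Link } from 'react-router-dom'
{/* Replace mode toggle with info banner */}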
<div className="bg-blue-900/20 border border-blue-700 rounded-lg p-4">
  <p className="text-sm text-blue-200">
    <strong>Note:</strong> CrowdSec is controlled via the toggle on the{' '}
    <Link to="/security" className="underline">Security Dashboard</Link>.
    Enable/disable CrowdSec there, then configure presets and files here.
  </p>
</div>

2.4 Fix Enrollment Warning

File: frontend/src/pages/CrowdSecConfig.tsx

Change: Add "Start CrowdSec" button to the warning (around line 185):

<Button
  variant="primary"
  size="sm"
  onClick={async () => {
    try {
      await startCrowdsec();
      // The start request succeeded; LAPI may need a few seconds to report running
      toast.info('Starting CrowdSec...');
      await lapiStatusQuery.refetch();
    } catch {
      toast.error('Failed to start CrowdSec');
    }
  }}
>
  Start CrowdSec
</Button>

Phase 3: Remove Deprecated Mode (Priority: Medium)

3.1 Backend Model Cleanup (Future)

File: backend/internal/models/security_config.go

Mark CrowdSecMode as deprecated and document the migration path.

3.2 Settings Migration

Create a migration to ensure all users have a security.crowdsec.enabled setting derived from CrowdSecMode.


Files to Modify Summary

Backend

| File | Changes |
|------|---------|
| backend/internal/api/handlers/security_handler.go | Add process status check to GetStatus() |
| backend/internal/api/handlers/crowdsec_handler.go | Sync settings table in Start()/Stop() |

Frontend

| File | Changes |
|------|---------|
| frontend/src/pages/Security.tsx | Use crowdsecStatus?.running for toggle state |
| frontend/src/components/LiveLogViewer.tsx | Fix isPaused dependency, use ref |
| frontend/src/pages/CrowdSecConfig.tsx | Remove mode toggle, add info banner, add "Start CrowdSec" button |

Testing Checklist

  • Toggle CrowdSec on Security Dashboard → verify process starts
  • Toggle CrowdSec off → verify process stops
  • Refresh page → verify toggle state matches process state
  • Open LiveLogViewer → verify "Connected" status
  • Pause logs → verify connection remains open
  • Navigate away and back → logs are cleared (expected) but connection re-establishes
  • CrowdSec Config page → no mode toggle, info banner present
  • Enrollment section → shows "Start CrowdSec" button when process not running