diff --git a/docs/plans/current_spec.md b/docs/plans/current_spec.md index 4a91ff4a..798b738a 100644 --- a/docs/plans/current_spec.md +++ b/docs/plans/current_spec.md @@ -1,1737 +1,229 @@ -# Investigation Report: Re-Enrollment & Live Log Viewer Issues +# Security Dashboard Live Log Viewer Bug Fix Plan **Date:** December 16, 2025 -**Investigator:** GitHub Copilot -**Status:** ✅ Investigation Complete - Root Causes Identified - ---- - -## 📋 Executive Summary - -**Issue 1: Re-enrollment with NEW key didn't work** -- **Root Cause:** `force` parameter is correctly sent by frontend, but backend has LAPI availability check that may time out -- **Status:** ✅ Working as designed - re-enrollment requires `force=true` and uses `--overwrite` flag -- **User Issue:** User needed to use SAME key because new key was invalid or enrollment was already pending - -**Issue 2: Live Log Viewer shows "Disconnected"** -- **Root Cause:** WebSocket endpoint is `/api/v1/cerberus/logs/ws` (security logs), NOT `/api/v1/logs/live` (app logs) -- **Status:** ✅ Working as designed - different endpoints for different log types -- **User Issue:** Frontend defaults to wrong mode or wrong endpoint - ---- - -## � Issue 1: Re-Enrollment Investigation (December 16, 2025) - -### User Report -> "Re-enrollment with NEW key didn't work - I had to use the SAME enrollment token from the first time." - -### Investigation Findings - -#### Frontend Code Analysis - -**File:** `frontend/src/pages/CrowdSecConfig.tsx` - -**Re-enrollment Button** (Line 588): -```tsx - -``` - -**Submission Function** (Line 278): -```tsx -const submitConsoleEnrollment = async (force = false) => { - // ... validation ... 
- await enrollConsoleMutation.mutateAsync({ - enrollment_key: enrollmentToken.trim(), - tenant: tenantValue, - agent_name: consoleAgentName.trim(), - force, // ✅ CORRECTLY PASSES force PARAMETER - }) -} -``` - -**API Call** (`frontend/src/api/consoleEnrollment.ts`): -```typescript -export interface ConsoleEnrollPayload { - enrollment_key: string - tenant?: string - agent_name: string - force?: boolean // ✅ DEFINED IN INTERFACE -} - -export async function enrollConsole(payload: ConsoleEnrollPayload): Promise { - const resp = await client.post('/admin/crowdsec/console/enroll', payload) - return resp.data -} -``` - -✅ **Verdict:** Frontend correctly sends `force: true` when re-enrolling. - -#### Backend Code Analysis - -**File:** `backend/internal/crowdsec/console_enroll.go` - -**Force Parameter Handling** (Line 167-169): -```go -// Add overwrite flag if force is requested -if req.Force { - args = append(args, "--overwrite") // ✅ ADDS --overwrite FLAG -} -``` - -**Command Execution** (Line 178): -```go -logger.Log().WithField("tenant", tenant).WithField("agent", agent).WithField("force", req.Force).WithField("correlation_id", rec.LastCorrelationID).WithField("config", configPath).Info("starting crowdsec console enrollment") -out, cmdErr := s.exec.ExecuteWithEnv(cmdCtx, "cscli", args, nil) -``` - -**Docker Logs Evidence:** -``` -{"agent":"Charon","config":"/app/data/crowdsec/config/config.yaml","correlation_id":"de557798-3081-4bc2-9dbf-10e035f09eaf","force":true,"level":"info","msg":"starting crowdsec console enrollment","tenant":"5e045b3c-5196-406b-99cd-503bc64c7b0d","time":"2025-12-15T22:43:10-05:00"} -``` -✅ Shows `"force":true` in the log - -**Error in Logs:** -``` -Error: cscli console enroll: could not enroll instance: API error: the attachment key provided is not valid (hint: get your enrollement key from console, crowdsec login or machine id are not valid values) -``` - -✅ **Verdict:** Backend correctly receives `force=true` and passes `--overwrite` to cscli. 
The enrollment FAILED because the key itself was invalid according to CrowdSec API. - -#### LAPI Availability Check - -**Critical Code** (Line 223-244): -```go -func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error { - maxRetries := 3 - retryDelay := 2 * time.Second - - var lastErr error - for i := 0; i < maxRetries; i++ { - args := []string{"lapi", "status"} - configPath := s.findConfigPath() - if configPath != "" { - args = append([]string{"-c", configPath}, args...) - } - - checkCtx, cancel := context.WithTimeout(ctx, 3*time.Second) - out, err := s.exec.ExecuteWithEnv(checkCtx, "cscli", args, nil) - cancel() - - if err == nil { - logger.Log().WithField("config", configPath).Debug("LAPI check succeeded") - return nil // LAPI is available - } - - lastErr = err - if i < maxRetries-1 { - logger.Log().WithError(err).WithField("attempt", i+1).WithField("output", string(out)).Debug("LAPI not ready, retrying") - time.Sleep(retryDelay) - } - } - - return fmt.Errorf("CrowdSec Local API is not running after %d attempts - please wait for LAPI to initialize (typically 5-10 seconds after enabling CrowdSec): %w", maxRetries, lastErr) -} -``` - -**Frontend LAPI Check:** -```tsx -const lapiStatusQuery = useQuery({ - queryKey: ['crowdsec-lapi-status'], - queryFn: statusCrowdsec, - enabled: consoleEnrollmentEnabled && initialCheckComplete, - refetchInterval: 5000, // Poll every 5 seconds - retry: false, -}) -``` - -✅ **Verdict:** LAPI check is robust with 3 retries and 2-second delays. Frontend polls every 5 seconds. - -### Root Cause Determination - -**The re-enrollment with "NEW key" failed because:** - -1. ✅ `force=true` was correctly sent -2. ✅ `--overwrite` flag was correctly added -3. 
❌ **The new enrollment key was INVALID** according to CrowdSec API - -**Evidence from logs:** -``` -Error: cscli console enroll: could not enroll instance: API error: the attachment key provided is not valid -``` - -**Why the SAME key worked:** -- The original key was still valid in CrowdSec's system -- Using the same key with `--overwrite` flag allowed re-enrollment to the same account - -### Conclusion - -✅ **No bug found.** The implementation is correct. User's new enrollment key was rejected by CrowdSec API. - -**User Action Required:** -1. Generate a new enrollment key from app.crowdsec.net -2. Ensure the key is copied completely (no spaces/newlines) -3. Try re-enrollment again - ---- - -## 🔍 Issue 2: Live Log Viewer "Disconnected" (December 16, 2025) - -### User Report -> "Live Log Viewer shows 'Disconnected' and no logs appear. I only need SECURITY logs (CrowdSec/Cerberus), not application logs." - -### Investigation Findings - -#### LiveLogViewer Component Analysis - -**File:** `frontend/src/components/LiveLogViewer.tsx` - -**Mode Toggle** (Line 350-366): -```tsx -
- - -
-``` - -**WebSocket Connection Logic** (Line 155-213): -```tsx -useEffect(() => { - // ... close existing connection ... - - if (currentMode === 'security') { - // Connect to security logs endpoint - closeConnectionRef.current = connectSecurityLogs( - effectiveFilters, - handleSecurityMessage, - handleOpen, - handleError, - handleClose - ); - } else { - // Connect to application logs endpoint - closeConnectionRef.current = connectLiveLogs( - filters, - handleLiveMessage, - handleOpen, - handleError, - handleClose - ); - } -}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]); -``` - -#### WebSocket Endpoints - -**Application Logs:** -```typescript -// frontend/src/api/logs.ts:95-135 -const wsUrl = `${protocol}//${window.location.host}/api/v1/logs/live?${params.toString()}`; -``` - -**Security Logs:** -```typescript -// frontend/src/api/logs.ts:153-174 -const wsUrl = `${protocol}//${window.location.host}/api/v1/cerberus/logs/ws?${params.toString()}`; -``` - -#### Backend WebSocket Handlers - -**Application Logs Handler:** -```go -// backend/internal/api/handlers/logs_ws.go -func LogsWebSocketHandler(c *gin.Context) { - // Subscribes to logger.BroadcastHook for app logs - hook := logger.GetBroadcastHook() - logChan := hook.Subscribe(subscriberID) -} -``` - -**Security Logs Handler:** -```go -// backend/internal/api/handlers/cerberus_logs_ws.go -func (h *CerberusLogsHandler) LiveLogs(c *gin.Context) { - // Subscribes to LogWatcher for Caddy access logs - logChan := h.watcher.Subscribe() -} -``` - -**LogWatcher Implementation:** -```go -// backend/internal/services/log_watcher.go -func NewLogWatcher(logPath string) *LogWatcher { - // Tails /app/data/logs/access.log - return &LogWatcher{ - logPath: logPath, // Defaults to access.log - } -} -``` - -✅ **LogWatcher is actively tailing:** Verified via Docker logs showing successful access.log reads - -#### Access Log Verification - -**Command:** `docker exec charon tail -20 /app/data/logs/access.log` - -✅ 
**Result:** Access log has MANY recent entries (20+ lines shown, JSON format, proper structure) - -**Sample Entry:** -```json -{ - "level":"info", - "ts":1765577040.5798745, - "logger":"http.log.access.access_log", - "msg":"handled request", - "request": { - "remote_ip":"172.59.136.4", - "method":"GET", - "host":"sonarr.hatfieldhosted.com", - "uri":"/api/v3/command" - }, - "status":200, - "duration":0.066689363 -} -``` - -#### Routes Configuration - -**File:** `backend/internal/api/routes/routes.go` - -```go -// Line 158 -protected.GET("/logs/live", handlers.LogsWebSocketHandler) - -// Line 394 -protected.GET("/cerberus/logs/ws", cerberusLogsHandler.LiveLogs) -``` - -✅ Both endpoints are registered and protected (require authentication) - -### Root Cause Analysis - -#### Possible Issues - -1. **Default Mode May Be Wrong** - - Component defaults to `mode='application'` (Line 142) - - User needs security logs, which requires `mode='security'` - -2. **WebSocket Authentication** - - Both endpoints are under `protected` route group - - WebSocket connections may not automatically include auth headers - - Native WebSocket API doesn't support custom headers - -3. **No WebSocket Connection Logs** - - Docker logs show NO "WebSocket connection attempt" messages - - This suggests connections are NOT reaching the backend - -4. **Frontend Connection State** - - `isConnected` is set only in `onOpen` callback - - If connection fails during upgrade, `onOpen` never fires - - Result: "Disconnected" status persists - -### Testing Commands - -```bash -# Check if LogWatcher is running -docker logs charon 2>&1 | grep -i "LogWatcher started" - -# Check for WebSocket connection attempts -docker logs charon 2>&1 | grep -i "websocket" | tail -20 - -# Check if Cerberus logs handler is initialized -docker logs charon 2>&1 | grep -i "cerberus.*logs" | tail -10 -``` - -**Result from earlier grep:** -``` -[GIN-debug] GET /api/v1/cerberus/logs/ws --> ... 
.LiveLogs-fm (10 handlers) -``` -✅ Route is registered - -**No connection attempt logs found** → Connections are NOT reaching backend - -### Diagnosis - -**Most Likely Issue:** WebSocket authentication failure - -1. Frontend attempts WebSocket connection -2. Browser sends `ws://` or `wss://` request without auth headers -3. Backend auth middleware rejects with 401 -4. WebSocket upgrade fails silently -5. `onError` fires but doesn't show useful message to user - -### Recommended Fixes - -#### Fix 1: Add Auth Token to WebSocket URL - -**File:** `frontend/src/api/logs.ts` - -```typescript -export const connectSecurityLogs = ( - filters: SecurityLogFilter, - onMessage: (log: SecurityLogEntry) => void, - onOpen?: () => void, - onError?: (error: Event) => void, - onClose?: () => void -): (() => void) => { - const params = new URLSearchParams(); - if (filters.source) params.append('source', filters.source); - if (filters.level) params.append('level', filters.level); - if (filters.ip) params.append('ip', filters.ip); - if (filters.host) params.append('host', filters.host); - if (filters.blocked_only) params.append('blocked_only', 'true'); - - // ✅ ADD AUTH TOKEN - const token = localStorage.getItem('token') || sessionStorage.getItem('token'); - if (token) { - params.append('token', token); - } - - const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'; - const wsUrl = `${protocol}//${window.location.host}/api/v1/cerberus/logs/ws?${params.toString()}`; - // ... -}; -``` - -**Apply same fix to** `connectLiveLogs()` - -#### Fix 2: Backend Auth Middleware Must Check Query Param - -**File:** `backend/internal/api/middleware/auth.go` (assumed location) - -Ensure the auth middleware checks for token in: -1. `Authorization` header -2. Cookie (if using session auth) -3. 
**Query parameter `token`** (for WebSocket compatibility) - -#### Fix 3: Add Error Display to UI - -**File:** `frontend/src/components/LiveLogViewer.tsx` - -```tsx -const [connectionError, setConnectionError] = useState(null); - -const handleError = (error: Event) => { - console.error('WebSocket error:', error); - setIsConnected(false); - setConnectionError('Failed to connect to log stream. Check authentication.'); -}; - -const handleOpen = () => { - console.log(`${currentMode} log viewer connected`); - setIsConnected(true); - setConnectionError(null); -}; - -// In JSX: -{connectionError && ( -
- {connectionError} -
-)} -``` - -#### Fix 4: Change Default Mode to Security - -**File:** `frontend/src/components/LiveLogViewer.tsx` (Line 142) - -```tsx -export function LiveLogViewer({ - filters = {}, - securityFilters = {}, - mode = 'security', // ✅ CHANGE FROM 'application' TO 'security' - maxLogs = 500, - className = '', -}: LiveLogViewerProps) { -``` - -### Verification Steps - -1. **Check browser DevTools Network tab:** - - Look for WebSocket connection to `/api/v1/cerberus/logs/ws` - - Check status code (should be 101 Switching Protocols, not 401/403) - -2. **Check backend logs:** - - Should see "Cerberus logs WebSocket connection attempt" - - Should see "Cerberus logs WebSocket connected" - -3. **Generate test traffic:** - - Make HTTP request to any proxied host - - Check if log appears in viewer - ---- - -## 📋 CrowdSec Re-Enrollment UX Research (PREVIOUS SECTION - KEPT FOR REFERENCE) - -### CrowdSec CLI Capabilities - -**Available Console Commands (`cscli console --help`):** - -```text -Available Commands: - disable Disable a console option - enable Enable a console option - enroll Enroll this instance to https://app.crowdsec.net - status Shows status of the console options -``` - -**Enroll Command Flags (`cscli console enroll --help`):** - -```text -Flags: - -d, --disable strings Disable console options - -e, --enable strings Enable console options - -h, --help help for enroll - -n, --name string Name to display in the console - --overwrite Force enroll the instance ← KEY FLAG FOR RE-ENROLLMENT - -t, --tags strings Tags to display in the console -``` - -**Key Finding: NO "unenroll" or "disconnect" command exists in CrowdSec CLI.** - -The `disable --all` command only disables data sharing options (custom, tainted, manual, context, console_management) - it does NOT unenroll from the console. 
- -### Current Data Model Analysis - -**Model: `CrowdsecConsoleEnrollment`** ([crowdsec_console_enrollment.go](../../backend/internal/models/crowdsec_console_enrollment.go)): - -```go -type CrowdsecConsoleEnrollment struct { - ID uint // Primary key - UUID string // Unique identifier - Status string // not_enrolled, enrolling, pending_acceptance, enrolled, failed - Tenant string // Organization identifier - AgentName string // Display name in console - EncryptedEnrollKey string // ← KEY IS STORED (encrypted with AES-GCM) - LastError string // Error message if failed - LastCorrelationID string // For debugging - LastAttemptAt *time.Time - EnrolledAt *time.Time - LastHeartbeatAt *time.Time - CreatedAt time.Time - UpdatedAt time.Time -} -``` - -**✅ Current Implementation Already Stores Enrollment Key:** - -- The key is encrypted using AES-256-GCM with a key derived from a secret -- Stored in `EncryptedEnrollKey` field (excluded from JSON via `json:"-"`) -- Encryption implemented in `console_enroll.go` lines 377-409 - -### Enrollment Key Lifecycle (from crowdsec.net) - -1. **Generation**: User generates enrollment key on app.crowdsec.net -2. **Usage**: Key is used with `cscli console enroll ` to request enrollment -3. **Validation**: CrowdSec validates the key against their API -4. **Acceptance**: User must accept enrollment request on app.crowdsec.net -5. **Reusability**: The SAME key can be used multiple times with `--overwrite` flag -6. 
**Expiration**: Keys do not expire but may be revoked by user on console - -### UX Options Evaluation - -#### Option A: "Re-enroll" Button Requiring NEW Key ✅ RECOMMENDED - -**How it works:** - -- User provides a new enrollment key from crowdsec.net -- Backend sends `cscli console enroll --overwrite --name ` -- User accepts on crowdsec.net - -**Pros:** - -- ✅ Simple implementation (already supported via `force: true`) -- ✅ Secure - no key storage concerns beyond current encrypted storage -- ✅ Fresh key guarantees user has console access -- ✅ Matches CrowdSec's intended workflow - -**Cons:** - -- ⚠️ Requires user to visit crowdsec.net to get new key -- ⚠️ Extra step for user - -**Current UI Support:** - -- "Rotate key" button already calls `submitConsoleEnrollment(true)` with `force=true` -- "Retry enrollment" button appears when status is `degraded` - -#### Option B: "Re-enroll" with STORED Key - -**How it works:** - -- Use the encrypted key already stored in `EncryptedEnrollKey` -- Decrypt and re-send enrollment request - -**Pros:** - -- ✅ Simplest UX - one-click re-enrollment -- ✅ Key is already stored and encrypted - -**Cons:** - -- ⚠️ Security concern: Re-using stored keys increases exposure window -- ⚠️ Key may have been revoked on crowdsec.net without Charon knowing -- ⚠️ Old key may belong to different CrowdSec account -- ⚠️ Violates principle of least privilege - -**Current Implementation Gap:** - -- `decrypt()` method exists but is marked as "only used in tests" -- Would need new endpoint to retrieve stored key for re-enrollment - -#### Option C: "Unenroll" + Manual Re-enroll ❌ NOT SUPPORTED - -**How it would work:** - -- Clear local enrollment state -- User goes through fresh enrollment - -**Blockers:** - -- ❌ CrowdSec CLI has NO unenroll/disconnect command -- ❌ Would require manual deletion of config files -- ❌ May leave orphaned engine on crowdsec.net console - -**Files that would need cleanup:** - -```text -/app/data/crowdsec/config/console.yaml # 
Console options -/app/data/crowdsec/config/online_api_credentials.yaml # CAPI credentials -``` - -Note: Deleting these files would also affect CAPI registration, not just console enrollment. - -### 🎯 Recommended Approach: Option A (Enhanced) - -**Justification:** - -1. **Security First**: CrowdSec enrollment keys should be treated as sensitive credentials -2. **User Intent**: Re-enrollment implies user wants fresh connection to console -3. **Minimal Risk**: User must actively obtain new key, preventing accidental re-enrollments -4. **CrowdSec Best Practice**: The `--overwrite` flag is CrowdSec's designed mechanism for this - -**UI Flow Enhancement:** - -```text -┌─────────────────────────────────────────────────────────────────┐ -│ Console Enrollment [?] Help │ -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ Status: ● Enrolled │ -│ Agent: Charon-Home │ -│ Tenant: my-organization │ -│ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ Need to re-enroll? │ │ -│ │ │ │ -│ │ To connect to a different CrowdSec console account or │ │ -│ │ reset your enrollment, you'll need a new enrollment key │ │ -│ │ from app.crowdsec.net. │ │ -│ │ │ │ -│ │ [Get new key ↗] [Re-enroll with new key] │ │ -│ └─────────────────────────────────────────────────────────┘ │ -│ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ New Enrollment Key: [________________________] │ │ -│ │ Agent Name: [Charon-Home_____________] │ │ -│ │ Tenant: [my-organization_________] │ │ -│ │ │ │ -│ │ [Re-enroll] │ │ -│ └─────────────────────────────────────────────────────────┘ │ -│ │ -└─────────────────────────────────────────────────────────────────┘ -``` - -### Implementation Steps - -#### Step 1: Update Frontend UI (Priority: HIGH) - -**File:** `frontend/src/pages/CrowdSecConfig.tsx` - -Changes: - -1. Add "Re-enroll" section visible when `status === 'enrolled'` -2. Add expandable/collapsible panel for re-enrollment -3. 
Add link to app.crowdsec.net/enrollment-keys -4. Rename "Rotate key" button to "Re-enroll" for clarity -5. Add explanatory text about why re-enrollment requires new key - -#### Step 2: Improve Backend Logging (Priority: MEDIUM) - -**File:** `backend/internal/crowdsec/console_enroll.go` - -Changes: - -1. Add logging when enrollment is skipped due to existing status -2. Return `skipped: true` field in response when idempotency check triggers -3. Consider adding `reason` field to explain why enrollment was skipped - -#### Step 3: Add "Clear Enrollment" Admin Function (Priority: LOW) - -**File:** `backend/internal/api/handlers/crowdsec_handler.go` - -New endpoint: `DELETE /api/v1/admin/crowdsec/console/enrollment` - -Purpose: Reset local enrollment state to `not_enrolled` without touching CrowdSec config files. - -Note: This does NOT unenroll from crowdsec.net - that must be done manually on the console. - -#### Step 4: Documentation Update (Priority: MEDIUM) - -**File:** `docs/cerberus.md` - -Add section explaining: - -- Why re-enrollment requires new key -- How to get new enrollment key from crowdsec.net -- What happens to old engine on crowdsec.net (must be manually removed) -- Troubleshooting common enrollment issues +**Issue:** Live Log Viewer shows "Disconnected" error with rapid connect/disconnect flashing --- ## Executive Summary -This document covers THREE issues: +**ROOT CAUSE IDENTIFIED:** The WebSocket authentication token retrieval uses the wrong localStorage key. -1. **CrowdSec Enrollment Backend** 🔴 **CRITICAL BUG FOUND**: Backend returns 200 OK but `cscli` is NEVER executed - - **Root Cause**: Silent idempotency check returns success without running enrollment command - - **Evidence**: POST returns 200 OK with 137ms latency, but NO `cscli` logs appear - - **Fix Required**: Add logging for skipped enrollments and clear guidance to use `force=true` - -2. **Live Log Viewer**: Shows "Disconnected" status (Analysis pending implementation) - -3. 
**Stale Database State**: Old `enrolled` status from pre-fix deployment blocks new enrollments - - **Symptoms**: User clicks Enroll, sees 200 OK, but nothing happens on crowdsec.net - - **Root Cause**: Database has `status=enrolled` from before the `pending_acceptance` fix was deployed +The auth token is stored under `charon_auth_token` but the WebSocket code reads from `token` (which doesn't exist), causing **every WebSocket connection to be sent without authentication**, resulting in immediate 401 rejection. --- -## 🔴 CRITICAL BUG: Silent Idempotency Check (December 16, 2025) +## 1. Root Cause Analysis -### Problem Statement +### Token Storage Key Mismatch -User submits enrollment form, backend returns 200 OK (confirmed in Docker logs), but the enrollment NEVER appears on crowdsec.net. No `cscli` command execution visible in logs. +| Component | localStorage Key Used | Correct? | +|-----------|----------------------|----------| +| AuthContext.tsx (login) | `charon_auth_token` | ✅ Source of truth | +| AuthContext.tsx (logout) | `charon_auth_token` | ✅ | +| client.ts (axios) | Gets token from AuthContext | ✅ | +| **logs.ts (WebSocket)** | **`token`** | ❌ **WRONG** | -### Docker Log Evidence - -``` -POST /api/v1/admin/crowdsec/console/enroll → 200 OK (137ms latency) -NO "starting crowdsec console enrollment" log ← cscli NEVER executed -NO cscli output logs -``` - -### Code Path Analysis - -**File:** [backend/internal/crowdsec/console_enroll.go](backend/internal/crowdsec/console_enroll.go) - -#### Step 1: Handler calls service (line 865-920) - -```go -// crowdsec_handler.go:888-895 -status, err := h.Console.Enroll(ctx, crowdsec.ConsoleEnrollRequest{ - EnrollmentKey: payload.EnrollmentKey, - Tenant: payload.Tenant, - AgentName: payload.AgentName, - Force: payload.Force, // <-- User did NOT check Force checkbox -}) -``` - -#### Step 2: Idempotency Check (lines 155-165) ⚠️ BUG HERE - -```go -// console_enroll.go:155-165 -if rec.Status == consoleStatusEnrolling { - 
return s.statusFromModel(rec), fmt.Errorf("enrollment already in progress") -} -// If already enrolled or pending acceptance, skip unless Force is set -if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force { - return s.statusFromModel(rec), nil // <-- RETURNS SUCCESS WITHOUT LOGGING OR RUNNING CSCLI! -} -``` - -#### Step 3: Database State (confirmed via container inspection) - -``` -uuid: fb129bb5-d223-4c66-941c-a30e2e2b3040 -status: enrolled ← SET BY OLD CODE BEFORE pending_acceptance FIX -tenant: 5e045b3c-5196-406b-99cd-503bc64c7b0d -agent_name: Charon -``` - -### Root Cause - -1. **Historical State**: User enrolled BEFORE the `pending_acceptance` fix was deployed -2. **Old Code Bug**: Previous code set `status = enrolled` immediately after cscli returned exit 0 -3. **Silent Skip**: Current code silently skips enrollment when `status` is `enrolled` (or `pending_acceptance`) -4. **No User Feedback**: Returns 200 OK without logging or informing user enrollment was skipped - -### Manual Test Results from Container - -```bash -# cscli is available and working -docker exec charon cscli console enroll --help -# ✅ Shows help - -# LAPI is running -docker exec charon cscli lapi status -# ✅ "You can successfully interact with Local API (LAPI)" - -# Console status -docker exec charon cscli console status -# ✅ Shows options table (custom=true, tainted=true) - -# Manual enrollment with invalid key shows proper error -docker exec charon cscli console enroll --name test TESTINVALIDKEY123 -# ✅ Error: "the attachment key provided is not valid" - -# Config path exists and is correct -docker exec charon ls /app/data/crowdsec/config/config.yaml -# ✅ File exists -``` - -### Required Fixes - -#### Fix 1: Add Logging for Skipped Enrollments - -**File:** `backend/internal/crowdsec/console_enroll.go` lines 162-165 - -**Current:** -```go -if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force 
{ - return s.statusFromModel(rec), nil -} -``` - -**Fixed:** -```go -if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force { - logger.Log().WithField("status", rec.Status).WithField("agent", rec.AgentName).WithField("tenant", rec.Tenant).Info("enrollment skipped: already enrolled or pending - use force=true to re-enroll") - return s.statusFromModel(rec), nil -} -``` - -#### Fix 2: Add "Skipped" Indicator to Response - -Add a field to indicate enrollment was skipped vs actually submitted: - -```go -type ConsoleEnrollmentStatus struct { - Status string `json:"status"` - Skipped bool `json:"skipped,omitempty"` // <-- NEW - // ... other fields -} -``` - -And in the idempotency return: -```go -status := s.statusFromModel(rec) -status.Skipped = true -return status, nil -``` - -#### Fix 3: Frontend Should Show "Already Enrolled" State - -**File:** `frontend/src/pages/CrowdSecConfig.tsx` - -When `consoleStatusQuery.data?.status === 'enrolled'` or `'pending_acceptance'`: -- Show "You are already enrolled" message -- Show "Force Re-Enrollment" button with checkbox -- Explain that acceptance on crowdsec.net may be required - -#### Fix 4: Migrate Stale "enrolled" Status to "pending_acceptance" - -Either: -1. Add a database migration to change all `enrolled` to `pending_acceptance` -2. Or have users click "Force Re-Enroll" once - -### Workaround for User - -Until fix is deployed, user can re-enroll using the Force option: - -1. In the UI: Check "Force re-enrollment" checkbox before clicking Enroll -2. 
Or via curl: -```bash -curl -X POST http://localhost:8080/api/v1/admin/crowdsec/console/enroll \ - -H "Authorization: Bearer " \ - -H "Content-Type: application/json" \ - -d '{"enrollment_key":"", "agent_name":"Charon", "force":true}' -``` - ---- - -## Previous Frontend Analysis (Still Valid for Reference) - -### Enrollment Flow Path - -``` -User clicks "Enroll" button - ↓ -CrowdSecConfig.tsx: -``` - -### Phase 3: Remove Deprecated Mode (Priority: Medium) - -#### 3.1 Backend Model Cleanup (Future) -**File**: [backend/internal/models/security_config.go](backend/internal/models/security_config.go) - -Mark `CrowdSecMode` as deprecated with migration path. - -#### 3.2 Settings Migration -Create migration to ensure all users have `security.crowdsec.enabled` setting derived from `CrowdSecMode`. - --- -## Files to Modify Summary +## 7. Regression Prevention -### Backend -| File | Changes | -|------|---------| -| `backend/internal/api/handlers/security_handler.go` | Add process status check to `GetStatus()` | -| `backend/internal/api/handlers/crowdsec_handler.go` | Sync `settings` table in `Start()`/`Stop()` | +### Add ESLint Rule (Optional) -### Frontend -| File | Changes | -|------|---------| -| `frontend/src/pages/Security.tsx` | Use `crowdsecStatus?.running` for toggle state | -| `frontend/src/components/LiveLogViewer.tsx` | Fix `isPaused` dependency, use ref | -| `frontend/src/pages/CrowdSecConfig.tsx` | Remove mode toggle, add info banner, add "Start CrowdSec" button | +Consider adding a custom ESLint rule or code comment to prevent future mismatches: + +```typescript +// frontend/src/constants/auth.ts +/** + * IMPORTANT: This key must match across all auth-related code. 
+ * Used by: AuthContext.tsx, logs.ts (WebSocket) + * @see AuthContext.tsx for login/logout logic + */ +export const AUTH_TOKEN_STORAGE_KEY = 'charon_auth_token'; +``` + +### Update Contributing Guidelines + +Add to `.github/copilot-instructions.md` or `CONTRIBUTING.md`: + +> **Auth Token Key:** Always use `charon_auth_token` for localStorage operations. Import from shared constants when possible. --- -## Testing Checklist +## 8. Summary -- [ ] Toggle CrowdSec on Security Dashboard → verify process starts -- [ ] Toggle CrowdSec off → verify process stops -- [ ] Refresh page → verify toggle state matches process state -- [ ] Open LiveLogViewer → verify "Connected" status -- [ ] Pause logs → verify connection remains open -- [ ] Navigate away and back → logs are cleared (expected) but connection re-establishes -- [ ] CrowdSec Config page → no mode toggle, info banner present -- [ ] Enrollment section → shows "Start CrowdSec" button when process not running +| Aspect | Details | +|--------|---------| +| **Bug** | WebSocket reads wrong localStorage key | +| **Impact** | Security Dashboard Live Logs never connects | +| **Fix** | Change `token` → `charon_auth_token` in 2 locations | +| **Risk** | Low - isolated change, no side effects | +| **Effort** | 5 minutes | +| **Testing** | Manual verification + existing unit tests | + +--- + +## 9. Implementation Command + +```bash +# Single sed command to fix both occurrences +sed -i "s/localStorage.getItem('token')/localStorage.getItem('charon_auth_token')/g" \ + frontend/src/api/logs.ts +``` + +Or apply the fix manually in VS Code. 
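Reviewer sketch (not part of the commit): the one-line fix above could also be expressed as a small helper that reads the token through the shared key, so the WebSocket code cannot drift from `AuthContext.tsx` again. This is a minimal sketch under assumptions: `buildSecurityLogsUrl` and the `TokenStore` interface are hypothetical names, the constant is inlined here rather than imported from the proposed `frontend/src/constants/auth.ts`, and the `token` query parameter is assumed to be what the backend auth middleware reads.

```typescript
// Hypothetical shared constant; the plan proposes frontend/src/constants/auth.ts.
const AUTH_TOKEN_STORAGE_KEY = 'charon_auth_token';

// Minimal storage shape (hypothetical) so the helper can run outside a browser.
// window.localStorage satisfies this interface.
interface TokenStore {
  getItem(key: string): string | null;
}

// Builds the security-log WebSocket URL and appends the auth token when one
// is stored under the shared key. Assumes the backend accepts ?token=...
function buildSecurityLogsUrl(
  store: TokenStore,
  location: { protocol: string; host: string },
  filters: Record<string, string> = {},
): string {
  const params = new URLSearchParams(filters);
  const token = store.getItem(AUTH_TOKEN_STORAGE_KEY);
  if (token) {
    params.append('token', token);
  }
  // Match the page scheme: https pages must use wss.
  const protocol = location.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${protocol}//${location.host}/api/v1/cerberus/logs/ws?${params.toString()}`;
}
```

In the browser this would be called as `buildSecurityLogsUrl(localStorage, window.location, filters)`. Passing the store in as a parameter is a design choice for testability; reading `localStorage` directly, as the `sed` fix does, is equally valid.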
diff --git a/docs/plans/prev_spec_archived_dec16.md b/docs/plans/prev_spec_archived_dec16.md new file mode 100644 index 00000000..4a91ff4a --- /dev/null +++ b/docs/plans/prev_spec_archived_dec16.md @@ -0,0 +1,1737 @@ +# Investigation Report: Re-Enrollment & Live Log Viewer Issues + +**Date:** December 16, 2025 +**Investigator:** GitHub Copilot +**Status:** ✅ Investigation Complete - Root Causes Identified + +--- + +## 📋 Executive Summary + +**Issue 1: Re-enrollment with NEW key didn't work** +- **Root Cause:** `force` parameter is correctly sent by frontend, but backend has LAPI availability check that may time out +- **Status:** ✅ Working as designed - re-enrollment requires `force=true` and uses `--overwrite` flag +- **User Issue:** User needed to use SAME key because new key was invalid or enrollment was already pending + +**Issue 2: Live Log Viewer shows "Disconnected"** +- **Root Cause:** WebSocket endpoint is `/api/v1/cerberus/logs/ws` (security logs), NOT `/api/v1/logs/live` (app logs) +- **Status:** ✅ Working as designed - different endpoints for different log types +- **User Issue:** Frontend defaults to wrong mode or wrong endpoint + +--- + +## � Issue 1: Re-Enrollment Investigation (December 16, 2025) + +### User Report +> "Re-enrollment with NEW key didn't work - I had to use the SAME enrollment token from the first time." + +### Investigation Findings + +#### Frontend Code Analysis + +**File:** `frontend/src/pages/CrowdSecConfig.tsx` + +**Re-enrollment Button** (Line 588): +```tsx + +``` + +**Submission Function** (Line 278): +```tsx +const submitConsoleEnrollment = async (force = false) => { + // ... validation ... 
+ await enrollConsoleMutation.mutateAsync({ + enrollment_key: enrollmentToken.trim(), + tenant: tenantValue, + agent_name: consoleAgentName.trim(), + force, // ✅ CORRECTLY PASSES force PARAMETER + }) +} +``` + +**API Call** (`frontend/src/api/consoleEnrollment.ts`): +```typescript +export interface ConsoleEnrollPayload { + enrollment_key: string + tenant?: string + agent_name: string + force?: boolean // ✅ DEFINED IN INTERFACE +} + +export async function enrollConsole(payload: ConsoleEnrollPayload): Promise { + const resp = await client.post('/admin/crowdsec/console/enroll', payload) + return resp.data +} +``` + +✅ **Verdict:** Frontend correctly sends `force: true` when re-enrolling. + +#### Backend Code Analysis + +**File:** `backend/internal/crowdsec/console_enroll.go` + +**Force Parameter Handling** (Line 167-169): +```go +// Add overwrite flag if force is requested +if req.Force { + args = append(args, "--overwrite") // ✅ ADDS --overwrite FLAG +} +``` + +**Command Execution** (Line 178): +```go +logger.Log().WithField("tenant", tenant).WithField("agent", agent).WithField("force", req.Force).WithField("correlation_id", rec.LastCorrelationID).WithField("config", configPath).Info("starting crowdsec console enrollment") +out, cmdErr := s.exec.ExecuteWithEnv(cmdCtx, "cscli", args, nil) +``` + +**Docker Logs Evidence:** +``` +{"agent":"Charon","config":"/app/data/crowdsec/config/config.yaml","correlation_id":"de557798-3081-4bc2-9dbf-10e035f09eaf","force":true,"level":"info","msg":"starting crowdsec console enrollment","tenant":"5e045b3c-5196-406b-99cd-503bc64c7b0d","time":"2025-12-15T22:43:10-05:00"} +``` +✅ Shows `"force":true` in the log + +**Error in Logs:** +``` +Error: cscli console enroll: could not enroll instance: API error: the attachment key provided is not valid (hint: get your enrollement key from console, crowdsec login or machine id are not valid values) +``` + +✅ **Verdict:** Backend correctly receives `force=true` and passes `--overwrite` to cscli. 
The enrollment FAILED because the key itself was invalid according to CrowdSec API. + +#### LAPI Availability Check + +**Critical Code** (Line 223-244): +```go +func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error { + maxRetries := 3 + retryDelay := 2 * time.Second + + var lastErr error + for i := 0; i < maxRetries; i++ { + args := []string{"lapi", "status"} + configPath := s.findConfigPath() + if configPath != "" { + args = append([]string{"-c", configPath}, args...) + } + + checkCtx, cancel := context.WithTimeout(ctx, 3*time.Second) + out, err := s.exec.ExecuteWithEnv(checkCtx, "cscli", args, nil) + cancel() + + if err == nil { + logger.Log().WithField("config", configPath).Debug("LAPI check succeeded") + return nil // LAPI is available + } + + lastErr = err + if i < maxRetries-1 { + logger.Log().WithError(err).WithField("attempt", i+1).WithField("output", string(out)).Debug("LAPI not ready, retrying") + time.Sleep(retryDelay) + } + } + + return fmt.Errorf("CrowdSec Local API is not running after %d attempts - please wait for LAPI to initialize (typically 5-10 seconds after enabling CrowdSec): %w", maxRetries, lastErr) +} +``` + +**Frontend LAPI Check:** +```tsx +const lapiStatusQuery = useQuery({ + queryKey: ['crowdsec-lapi-status'], + queryFn: statusCrowdsec, + enabled: consoleEnrollmentEnabled && initialCheckComplete, + refetchInterval: 5000, // Poll every 5 seconds + retry: false, +}) +``` + +✅ **Verdict:** LAPI check is robust with 3 retries and 2-second delays. Frontend polls every 5 seconds. + +### Root Cause Determination + +**The re-enrollment with "NEW key" failed because:** + +1. ✅ `force=true` was correctly sent +2. ✅ `--overwrite` flag was correctly added +3. 
❌ **The new enrollment key was INVALID** according to CrowdSec API + +**Evidence from logs:** +``` +Error: cscli console enroll: could not enroll instance: API error: the attachment key provided is not valid +``` + +**Why the SAME key worked:** +- The original key was still valid in CrowdSec's system +- Using the same key with `--overwrite` flag allowed re-enrollment to the same account + +### Conclusion + +✅ **No bug found.** The implementation is correct. User's new enrollment key was rejected by CrowdSec API. + +**User Action Required:** +1. Generate a new enrollment key from app.crowdsec.net +2. Ensure the key is copied completely (no spaces/newlines) +3. Try re-enrollment again + +--- + +## 🔍 Issue 2: Live Log Viewer "Disconnected" (December 16, 2025) + +### User Report +> "Live Log Viewer shows 'Disconnected' and no logs appear. I only need SECURITY logs (CrowdSec/Cerberus), not application logs." + +### Investigation Findings + +#### LiveLogViewer Component Analysis + +**File:** `frontend/src/components/LiveLogViewer.tsx` + +**Mode Toggle** (Line 350-366): +```tsx +
+ + +
+``` + +**WebSocket Connection Logic** (Line 155-213): +```tsx +useEffect(() => { + // ... close existing connection ... + + if (currentMode === 'security') { + // Connect to security logs endpoint + closeConnectionRef.current = connectSecurityLogs( + effectiveFilters, + handleSecurityMessage, + handleOpen, + handleError, + handleClose + ); + } else { + // Connect to application logs endpoint + closeConnectionRef.current = connectLiveLogs( + filters, + handleLiveMessage, + handleOpen, + handleError, + handleClose + ); + } +}, [currentMode, filters, securityFilters, maxLogs, showBlockedOnly]); +``` + +#### WebSocket Endpoints + +**Application Logs:** +```typescript +// frontend/src/api/logs.ts:95-135 +const wsUrl = `${protocol}//${window.location.host}/api/v1/logs/live?${params.toString()}`; +``` + +**Security Logs:** +```typescript +// frontend/src/api/logs.ts:153-174 +const wsUrl = `${protocol}//${window.location.host}/api/v1/cerberus/logs/ws?${params.toString()}`; +``` + +#### Backend WebSocket Handlers + +**Application Logs Handler:** +```go +// backend/internal/api/handlers/logs_ws.go +func LogsWebSocketHandler(c *gin.Context) { + // Subscribes to logger.BroadcastHook for app logs + hook := logger.GetBroadcastHook() + logChan := hook.Subscribe(subscriberID) +} +``` + +**Security Logs Handler:** +```go +// backend/internal/api/handlers/cerberus_logs_ws.go +func (h *CerberusLogsHandler) LiveLogs(c *gin.Context) { + // Subscribes to LogWatcher for Caddy access logs + logChan := h.watcher.Subscribe() +} +``` + +**LogWatcher Implementation:** +```go +// backend/internal/services/log_watcher.go +func NewLogWatcher(logPath string) *LogWatcher { + // Tails /app/data/logs/access.log + return &LogWatcher{ + logPath: logPath, // Defaults to access.log + } +} +``` + +✅ **LogWatcher is actively tailing:** Verified via Docker logs showing successful access.log reads + +#### Access Log Verification + +**Command:** `docker exec charon tail -20 /app/data/logs/access.log` + +✅ 
**Result:** Access log has MANY recent entries (20+ lines shown, JSON format, proper structure) + +**Sample Entry:** +```json +{ + "level":"info", + "ts":1765577040.5798745, + "logger":"http.log.access.access_log", + "msg":"handled request", + "request": { + "remote_ip":"172.59.136.4", + "method":"GET", + "host":"sonarr.hatfieldhosted.com", + "uri":"/api/v3/command" + }, + "status":200, + "duration":0.066689363 +} +``` + +#### Routes Configuration + +**File:** `backend/internal/api/routes/routes.go` + +```go +// Line 158 +protected.GET("/logs/live", handlers.LogsWebSocketHandler) + +// Line 394 +protected.GET("/cerberus/logs/ws", cerberusLogsHandler.LiveLogs) +``` + +✅ Both endpoints are registered and protected (require authentication) + +### Root Cause Analysis + +#### Possible Issues + +1. **Default Mode May Be Wrong** + - Component defaults to `mode='application'` (Line 142) + - User needs security logs, which requires `mode='security'` + +2. **WebSocket Authentication** + - Both endpoints are under `protected` route group + - WebSocket connections may not automatically include auth headers + - Native WebSocket API doesn't support custom headers + +3. **No WebSocket Connection Logs** + - Docker logs show NO "WebSocket connection attempt" messages + - This suggests connections are NOT reaching the backend + +4. **Frontend Connection State** + - `isConnected` is set only in `onOpen` callback + - If connection fails during upgrade, `onOpen` never fires + - Result: "Disconnected" status persists + +### Testing Commands + +```bash +# Check if LogWatcher is running +docker logs charon 2>&1 | grep -i "LogWatcher started" + +# Check for WebSocket connection attempts +docker logs charon 2>&1 | grep -i "websocket" | tail -20 + +# Check if Cerberus logs handler is initialized +docker logs charon 2>&1 | grep -i "cerberus.*logs" | tail -10 +``` + +**Result from earlier grep:** +``` +[GIN-debug] GET /api/v1/cerberus/logs/ws --> ... 
.LiveLogs-fm (10 handlers) +``` +✅ Route is registered + +**No connection attempt logs found** → Connections are NOT reaching backend + +### Diagnosis + +**Most Likely Issue:** WebSocket authentication failure + +1. Frontend attempts WebSocket connection +2. Browser sends `ws://` or `wss://` request without auth headers +3. Backend auth middleware rejects with 401 +4. WebSocket upgrade fails silently +5. `onError` fires but doesn't show useful message to user + +### Recommended Fixes + +#### Fix 1: Add Auth Token to WebSocket URL + +**File:** `frontend/src/api/logs.ts` + +```typescript +export const connectSecurityLogs = ( + filters: SecurityLogFilter, + onMessage: (log: SecurityLogEntry) => void, + onOpen?: () => void, + onError?: (error: Event) => void, + onClose?: () => void +): (() => void) => { + const params = new URLSearchParams(); + if (filters.source) params.append('source', filters.source); + if (filters.level) params.append('level', filters.level); + if (filters.ip) params.append('ip', filters.ip); + if (filters.host) params.append('host', filters.host); + if (filters.blocked_only) params.append('blocked_only', 'true'); + + // ✅ ADD AUTH TOKEN + const token = localStorage.getItem('token') || sessionStorage.getItem('token'); + if (token) { + params.append('token', token); + } + + const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'; + const wsUrl = `${protocol}//${window.location.host}/api/v1/cerberus/logs/ws?${params.toString()}`; + // ... +}; +``` + +**Apply same fix to** `connectLiveLogs()` + +#### Fix 2: Backend Auth Middleware Must Check Query Param + +**File:** `backend/internal/api/middleware/auth.go` (assumed location) + +Ensure the auth middleware checks for token in: +1. `Authorization` header +2. Cookie (if using session auth) +3. 
**Query parameter `token`** (for WebSocket compatibility) + +#### Fix 3: Add Error Display to UI + +**File:** `frontend/src/components/LiveLogViewer.tsx` + +```tsx +const [connectionError, setConnectionError] = useState(null); + +const handleError = (error: Event) => { + console.error('WebSocket error:', error); + setIsConnected(false); + setConnectionError('Failed to connect to log stream. Check authentication.'); +}; + +const handleOpen = () => { + console.log(`${currentMode} log viewer connected`); + setIsConnected(true); + setConnectionError(null); +}; + +// In JSX: +{connectionError && ( +
+ {connectionError} +
+)} +``` + +#### Fix 4: Change Default Mode to Security + +**File:** `frontend/src/components/LiveLogViewer.tsx` (Line 142) + +```tsx +export function LiveLogViewer({ + filters = {}, + securityFilters = {}, + mode = 'security', // ✅ CHANGE FROM 'application' TO 'security' + maxLogs = 500, + className = '', +}: LiveLogViewerProps) { +``` + +### Verification Steps + +1. **Check browser DevTools Network tab:** + - Look for WebSocket connection to `/api/v1/cerberus/logs/ws` + - Check status code (should be 101 Switching Protocols, not 401/403) + +2. **Check backend logs:** + - Should see "Cerberus logs WebSocket connection attempt" + - Should see "Cerberus logs WebSocket connected" + +3. **Generate test traffic:** + - Make HTTP request to any proxied host + - Check if log appears in viewer + +--- + +## 📋 CrowdSec Re-Enrollment UX Research (PREVIOUS SECTION - KEPT FOR REFERENCE) + +### CrowdSec CLI Capabilities + +**Available Console Commands (`cscli console --help`):** + +```text +Available Commands: + disable Disable a console option + enable Enable a console option + enroll Enroll this instance to https://app.crowdsec.net + status Shows status of the console options +``` + +**Enroll Command Flags (`cscli console enroll --help`):** + +```text +Flags: + -d, --disable strings Disable console options + -e, --enable strings Enable console options + -h, --help help for enroll + -n, --name string Name to display in the console + --overwrite Force enroll the instance ← KEY FLAG FOR RE-ENROLLMENT + -t, --tags strings Tags to display in the console +``` + +**Key Finding: NO "unenroll" or "disconnect" command exists in CrowdSec CLI.** + +The `disable --all` command only disables data sharing options (custom, tainted, manual, context, console_management) - it does NOT unenroll from the console. 
+ +### Current Data Model Analysis + +**Model: `CrowdsecConsoleEnrollment`** ([crowdsec_console_enrollment.go](../../backend/internal/models/crowdsec_console_enrollment.go)): + +```go +type CrowdsecConsoleEnrollment struct { + ID uint // Primary key + UUID string // Unique identifier + Status string // not_enrolled, enrolling, pending_acceptance, enrolled, failed + Tenant string // Organization identifier + AgentName string // Display name in console + EncryptedEnrollKey string // ← KEY IS STORED (encrypted with AES-GCM) + LastError string // Error message if failed + LastCorrelationID string // For debugging + LastAttemptAt *time.Time + EnrolledAt *time.Time + LastHeartbeatAt *time.Time + CreatedAt time.Time + UpdatedAt time.Time +} +``` + +**✅ Current Implementation Already Stores Enrollment Key:** + +- The key is encrypted using AES-256-GCM with a key derived from a secret +- Stored in `EncryptedEnrollKey` field (excluded from JSON via `json:"-"`) +- Encryption implemented in `console_enroll.go` lines 377-409 + +### Enrollment Key Lifecycle (from crowdsec.net) + +1. **Generation**: User generates enrollment key on app.crowdsec.net +2. **Usage**: Key is used with `cscli console enroll ` to request enrollment +3. **Validation**: CrowdSec validates the key against their API +4. **Acceptance**: User must accept enrollment request on app.crowdsec.net +5. **Reusability**: The SAME key can be used multiple times with `--overwrite` flag +6. 
**Expiration**: Keys do not expire but may be revoked by user on console + +### UX Options Evaluation + +#### Option A: "Re-enroll" Button Requiring NEW Key ✅ RECOMMENDED + +**How it works:** + +- User provides a new enrollment key from crowdsec.net +- Backend sends `cscli console enroll --overwrite --name ` +- User accepts on crowdsec.net + +**Pros:** + +- ✅ Simple implementation (already supported via `force: true`) +- ✅ Secure - no key storage concerns beyond current encrypted storage +- ✅ Fresh key guarantees user has console access +- ✅ Matches CrowdSec's intended workflow + +**Cons:** + +- ⚠️ Requires user to visit crowdsec.net to get new key +- ⚠️ Extra step for user + +**Current UI Support:** + +- "Rotate key" button already calls `submitConsoleEnrollment(true)` with `force=true` +- "Retry enrollment" button appears when status is `degraded` + +#### Option B: "Re-enroll" with STORED Key + +**How it works:** + +- Use the encrypted key already stored in `EncryptedEnrollKey` +- Decrypt and re-send enrollment request + +**Pros:** + +- ✅ Simplest UX - one-click re-enrollment +- ✅ Key is already stored and encrypted + +**Cons:** + +- ⚠️ Security concern: Re-using stored keys increases exposure window +- ⚠️ Key may have been revoked on crowdsec.net without Charon knowing +- ⚠️ Old key may belong to different CrowdSec account +- ⚠️ Violates principle of least privilege + +**Current Implementation Gap:** + +- `decrypt()` method exists but is marked as "only used in tests" +- Would need new endpoint to retrieve stored key for re-enrollment + +#### Option C: "Unenroll" + Manual Re-enroll ❌ NOT SUPPORTED + +**How it would work:** + +- Clear local enrollment state +- User goes through fresh enrollment + +**Blockers:** + +- ❌ CrowdSec CLI has NO unenroll/disconnect command +- ❌ Would require manual deletion of config files +- ❌ May leave orphaned engine on crowdsec.net console + +**Files that would need cleanup:** + +```text +/app/data/crowdsec/config/console.yaml # 
Console options +/app/data/crowdsec/config/online_api_credentials.yaml # CAPI credentials +``` + +Note: Deleting these files would also affect CAPI registration, not just console enrollment. + +### 🎯 Recommended Approach: Option A (Enhanced) + +**Justification:** + +1. **Security First**: CrowdSec enrollment keys should be treated as sensitive credentials +2. **User Intent**: Re-enrollment implies user wants fresh connection to console +3. **Minimal Risk**: User must actively obtain new key, preventing accidental re-enrollments +4. **CrowdSec Best Practice**: The `--overwrite` flag is CrowdSec's designed mechanism for this + +**UI Flow Enhancement:** + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ Console Enrollment [?] Help │ +├─────────────────────────────────────────────────────────────────┤ +│ │ +│ Status: ● Enrolled │ +│ Agent: Charon-Home │ +│ Tenant: my-organization │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Need to re-enroll? │ │ +│ │ │ │ +│ │ To connect to a different CrowdSec console account or │ │ +│ │ reset your enrollment, you'll need a new enrollment key │ │ +│ │ from app.crowdsec.net. │ │ +│ │ │ │ +│ │ [Get new key ↗] [Re-enroll with new key] │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ New Enrollment Key: [________________________] │ │ +│ │ Agent Name: [Charon-Home_____________] │ │ +│ │ Tenant: [my-organization_________] │ │ +│ │ │ │ +│ │ [Re-enroll] │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Implementation Steps + +#### Step 1: Update Frontend UI (Priority: HIGH) + +**File:** `frontend/src/pages/CrowdSecConfig.tsx` + +Changes: + +1. Add "Re-enroll" section visible when `status === 'enrolled'` +2. Add expandable/collapsible panel for re-enrollment +3. 
Add link to app.crowdsec.net/enrollment-keys +4. Rename "Rotate key" button to "Re-enroll" for clarity +5. Add explanatory text about why re-enrollment requires new key + +#### Step 2: Improve Backend Logging (Priority: MEDIUM) + +**File:** `backend/internal/crowdsec/console_enroll.go` + +Changes: + +1. Add logging when enrollment is skipped due to existing status +2. Return `skipped: true` field in response when idempotency check triggers +3. Consider adding `reason` field to explain why enrollment was skipped + +#### Step 3: Add "Clear Enrollment" Admin Function (Priority: LOW) + +**File:** `backend/internal/api/handlers/crowdsec_handler.go` + +New endpoint: `DELETE /api/v1/admin/crowdsec/console/enrollment` + +Purpose: Reset local enrollment state to `not_enrolled` without touching CrowdSec config files. + +Note: This does NOT unenroll from crowdsec.net - that must be done manually on the console. + +#### Step 4: Documentation Update (Priority: MEDIUM) + +**File:** `docs/cerberus.md` + +Add section explaining: + +- Why re-enrollment requires new key +- How to get new enrollment key from crowdsec.net +- What happens to old engine on crowdsec.net (must be manually removed) +- Troubleshooting common enrollment issues + +--- + +## Executive Summary + +This document covers THREE issues: + +1. **CrowdSec Enrollment Backend** 🔴 **CRITICAL BUG FOUND**: Backend returns 200 OK but `cscli` is NEVER executed + - **Root Cause**: Silent idempotency check returns success without running enrollment command + - **Evidence**: POST returns 200 OK with 137ms latency, but NO `cscli` logs appear + - **Fix Required**: Add logging for skipped enrollments and clear guidance to use `force=true` + +2. **Live Log Viewer**: Shows "Disconnected" status (Analysis pending implementation) + +3. 
**Stale Database State**: Old `enrolled` status from pre-fix deployment blocks new enrollments + - **Symptoms**: User clicks Enroll, sees 200 OK, but nothing happens on crowdsec.net + - **Root Cause**: Database has `status=enrolled` from before the `pending_acceptance` fix was deployed + +--- + +## 🔴 CRITICAL BUG: Silent Idempotency Check (December 16, 2025) + +### Problem Statement + +User submits enrollment form, backend returns 200 OK (confirmed in Docker logs), but the enrollment NEVER appears on crowdsec.net. No `cscli` command execution visible in logs. + +### Docker Log Evidence + +``` +POST /api/v1/admin/crowdsec/console/enroll → 200 OK (137ms latency) +NO "starting crowdsec console enrollment" log ← cscli NEVER executed +NO cscli output logs +``` + +### Code Path Analysis + +**File:** [backend/internal/crowdsec/console_enroll.go](backend/internal/crowdsec/console_enroll.go) + +#### Step 1: Handler calls service (line 865-920) + +```go +// crowdsec_handler.go:888-895 +status, err := h.Console.Enroll(ctx, crowdsec.ConsoleEnrollRequest{ + EnrollmentKey: payload.EnrollmentKey, + Tenant: payload.Tenant, + AgentName: payload.AgentName, + Force: payload.Force, // <-- User did NOT check Force checkbox +}) +``` + +#### Step 2: Idempotency Check (lines 155-165) ⚠️ BUG HERE + +```go +// console_enroll.go:155-165 +if rec.Status == consoleStatusEnrolling { + return s.statusFromModel(rec), fmt.Errorf("enrollment already in progress") +} +// If already enrolled or pending acceptance, skip unless Force is set +if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force { + return s.statusFromModel(rec), nil // <-- RETURNS SUCCESS WITHOUT LOGGING OR RUNNING CSCLI! 
+} +``` + +#### Step 3: Database State (confirmed via container inspection) + +``` +uuid: fb129bb5-d223-4c66-941c-a30e2e2b3040 +status: enrolled ← SET BY OLD CODE BEFORE pending_acceptance FIX +tenant: 5e045b3c-5196-406b-99cd-503bc64c7b0d +agent_name: Charon +``` + +### Root Cause + +1. **Historical State**: User enrolled BEFORE the `pending_acceptance` fix was deployed +2. **Old Code Bug**: Previous code set `status = enrolled` immediately after cscli returned exit 0 +3. **Silent Skip**: Current code silently skips enrollment when `status` is `enrolled` (or `pending_acceptance`) +4. **No User Feedback**: Returns 200 OK without logging or informing user enrollment was skipped + +### Manual Test Results from Container + +```bash +# cscli is available and working +docker exec charon cscli console enroll --help +# ✅ Shows help + +# LAPI is running +docker exec charon cscli lapi status +# ✅ "You can successfully interact with Local API (LAPI)" + +# Console status +docker exec charon cscli console status +# ✅ Shows options table (custom=true, tainted=true) + +# Manual enrollment with invalid key shows proper error +docker exec charon cscli console enroll --name test TESTINVALIDKEY123 +# ✅ Error: "the attachment key provided is not valid" + +# Config path exists and is correct +docker exec charon ls /app/data/crowdsec/config/config.yaml +# ✅ File exists +``` + +### Required Fixes + +#### Fix 1: Add Logging for Skipped Enrollments + +**File:** `backend/internal/crowdsec/console_enroll.go` lines 162-165 + +**Current:** +```go +if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force { + return s.statusFromModel(rec), nil +} +``` + +**Fixed:** +```go +if (rec.Status == consoleStatusEnrolled || rec.Status == consoleStatusPendingAcceptance) && !req.Force { + logger.Log().WithField("status", rec.Status).WithField("agent", rec.AgentName).WithField("tenant", rec.Tenant).Info("enrollment skipped: already enrolled or pending - use 
force=true to re-enroll") + return s.statusFromModel(rec), nil +} +``` + +#### Fix 2: Add "Skipped" Indicator to Response + +Add a field to indicate enrollment was skipped vs actually submitted: + +```go +type ConsoleEnrollmentStatus struct { + Status string `json:"status"` + Skipped bool `json:"skipped,omitempty"` // <-- NEW + // ... other fields +} +``` + +And in the idempotency return: +```go +status := s.statusFromModel(rec) +status.Skipped = true +return status, nil +``` + +#### Fix 3: Frontend Should Show "Already Enrolled" State + +**File:** `frontend/src/pages/CrowdSecConfig.tsx` + +When `consoleStatusQuery.data?.status === 'enrolled'` or `'pending_acceptance'`: +- Show "You are already enrolled" message +- Show "Force Re-Enrollment" button with checkbox +- Explain that acceptance on crowdsec.net may be required + +#### Fix 4: Migrate Stale "enrolled" Status to "pending_acceptance" + +Either: +1. Add a database migration to change all `enrolled` to `pending_acceptance` +2. Or have users click "Force Re-Enroll" once + +### Workaround for User + +Until fix is deployed, user can re-enroll using the Force option: + +1. In the UI: Check "Force re-enrollment" checkbox before clicking Enroll +2. Or via curl: +```bash +curl -X POST http://localhost:8080/api/v1/admin/crowdsec/console/enroll \ + -H "Authorization: Bearer " \ + -H "Content-Type: application/json" \ + -d '{"enrollment_key":"", "agent_name":"Charon", "force":true}' +``` + +--- + +## Previous Frontend Analysis (Still Valid for Reference) + +### Enrollment Flow Path + +``` +User clicks "Enroll" button + ↓ +CrowdSecConfig.tsx: +``` + +### Phase 3: Remove Deprecated Mode (Priority: Medium) + +#### 3.1 Backend Model Cleanup (Future) +**File**: [backend/internal/models/security_config.go](backend/internal/models/security_config.go) + +Mark `CrowdSecMode` as deprecated with migration path. 
+ +#### 3.2 Settings Migration +Create migration to ensure all users have `security.crowdsec.enabled` setting derived from `CrowdSecMode`. + +--- + +## Files to Modify Summary + +### Backend +| File | Changes | +|------|---------| +| `backend/internal/api/handlers/security_handler.go` | Add process status check to `GetStatus()` | +| `backend/internal/api/handlers/crowdsec_handler.go` | Sync `settings` table in `Start()`/`Stop()` | + +### Frontend +| File | Changes | +|------|---------| +| `frontend/src/pages/Security.tsx` | Use `crowdsecStatus?.running` for toggle state | +| `frontend/src/components/LiveLogViewer.tsx` | Fix `isPaused` dependency, use ref | +| `frontend/src/pages/CrowdSecConfig.tsx` | Remove mode toggle, add info banner, add "Start CrowdSec" button | + +--- + +## Testing Checklist + +- [ ] Toggle CrowdSec on Security Dashboard → verify process starts +- [ ] Toggle CrowdSec off → verify process stops +- [ ] Refresh page → verify toggle state matches process state +- [ ] Open LiveLogViewer → verify "Connected" status +- [ ] Pause logs → verify connection remains open +- [ ] Navigate away and back → logs are cleared (expected) but connection re-establishes +- [ ] CrowdSec Config page → no mode toggle, info banner present +- [ ] Enrollment section → shows "Start CrowdSec" button when process not running diff --git a/docs/reports/qa_report.md b/docs/reports/qa_report.md index 1c75ee68..3a7cbc6f 100644 --- a/docs/reports/qa_report.md +++ b/docs/reports/qa_report.md @@ -1,7 +1,7 @@ -# QA Audit Report: CrowdSec Re-Enrollment Fixes +# QA Audit Report: WebSocket Auth Fix **Date:** December 16, 2025 -**Scope:** Backend and frontend fixes for CrowdSec re-enrollment functionality +**Change:** Fixed localStorage key in `frontend/src/api/logs.ts` from `token` to `charon_auth_token` --- @@ -9,70 +9,80 @@ | Check | Status | Details | |-------|--------|---------| -| Backend Tests | ✅ PASS | All tests passed | -| Frontend Tests | ✅ PASS | 956 passed, 2 skipped | -| 
TypeScript Check | ✅ PASS | No type errors | -| Frontend Lint | ✅ PASS | 0 errors, 12 warnings | -| Pre-commit Checks | ✅ PASS | All hooks passed | +| Frontend Build | ✅ PASS | Built successfully in 5.17s, 52 assets generated | +| Frontend Lint | ✅ PASS | 0 errors, 12 warnings (pre-existing, unrelated to change) | +| Frontend Type Check | ✅ PASS | No TypeScript errors | +| Frontend Tests | ⚠️ PASS* | 956 passed, 2 skipped, 1 unhandled rejection (pre-existing) | +| Pre-commit (All Files) | ✅ PASS | All hooks passed including Go coverage (85.2%) | | Backend Build | ✅ PASS | Compiled successfully | -| Frontend Build | ✅ PASS | Built successfully | +| Backend Tests | ✅ PASS | All packages passed | --- ## Detailed Results -### 1. Backend Tests +### 1. Frontend Build -**Command:** `cd /projects/Charon/backend && go test ./... -v` +**Command:** `cd /projects/Charon/frontend && npm run build` **Result:** ✅ PASS -- All packages tested successfully -- Coverage: 85.2% (minimum required: 85%) -- No test failures +``` +✓ 2234 modules transformed +✓ built in 5.17s +``` -### 2. Frontend Tests +- All 52 output assets generated correctly +- Main bundle: 251.10 kB (81.36 kB gzipped) -**Command:** `cd /projects/Charon/frontend && npm run test -- --run` +### 2. Frontend Lint + +**Command:** `cd /projects/Charon/frontend && npm run lint` **Result:** ✅ PASS -- **Test Files:** 91 passed -- **Tests:** 956 passed, 2 skipped -- **Duration:** 67.51s -- No test failures +``` +✖ 12 problems (0 errors, 12 warnings) +``` -### 3. TypeScript Check +**Note:** All 12 warnings are pre-existing and unrelated to the WebSocket auth fix: + +- `@typescript-eslint/no-explicit-any` warnings in test files +- `@typescript-eslint/no-unused-vars` in e2e tests +- `react-hooks/exhaustive-deps` in CrowdSecConfig.tsx + +### 3. Frontend Type Check **Command:** `cd /projects/Charon/frontend && npm run type-check` **Result:** ✅ PASS -- No type errors found +``` +tsc --noEmit completed successfully +``` -### 4. 
Frontend Lint +No TypeScript compilation errors. -**Command:** `cd /projects/Charon/frontend && npm run lint` +### 4. Frontend Tests -**Result:** ✅ PASS (with warnings) +**Command:** `cd /projects/Charon/frontend && npm run test` -- **Errors:** 0 -- **Warnings:** 12 +**Result:** ⚠️ PASS* -#### Warnings (non-blocking) +``` +Test Files: 91 passed (91) +Tests: 956 passed | 2 skipped (958) +Errors: 1 error (unhandled rejection) +``` -| File | Line | Warning | -|------|------|---------| -| `e2e/tests/security-mobile.spec.ts` | 289 | Unused variable `onclick` | -| `src/api/__tests__/consoleEnrollment.test.ts` | 485 | Unexpected `any` type | -| `src/pages/CrowdSecConfig.tsx` | 224 | Missing useEffect dependencies | -| `src/pages/CrowdSecConfig.tsx` | 936 | Unexpected `any` type | -| `src/pages/__tests__/CrowdSecConfig.spec.tsx` | 266, 292, 325 | Unexpected `any` type | -| `src/utils/__tests__/crowdsecExport.test.ts` | 142, 154, 181, 427, 432 | Unexpected `any` type | +**Note:** The unhandled rejection error is a **pre-existing issue** in `Security.test.tsx` related to React state updates after component unmount. This is NOT caused by the WebSocket auth fix. -**Note:** These warnings are in test files and do not affect production code quality. +The specific logs API tests all passed: -### 5. Pre-commit Checks +- `src/api/logs.test.ts` (19 tests) ✅ +- `src/api/__tests__/logs-websocket.test.ts` (11 tests | 2 skipped) ✅ + +### 5. Pre-commit (All Files) **Command:** `source .venv/bin/activate && pre-commit run --all-files` @@ -80,7 +90,7 @@ All hooks passed: -- ✅ Go Test (with Coverage) +- ✅ Go Test (with Coverage): 85.2% (minimum 85% required) - ✅ Go Vet - ✅ Check .version matches latest Git tag - ✅ Prevent large files that are not tracked by LFS @@ -98,16 +108,20 @@ All hooks passed: - No compilation errors - All packages built successfully -### 7. Frontend Build +### 7. 
Backend Tests -**Command:** `cd /projects/Charon/frontend && npm run build` +**Command:** `cd /projects/Charon/backend && go test ./...` **Result:** ✅ PASS -- TypeScript compilation successful -- Vite build completed in 4.92s -- 2234 modules transformed -- All assets generated successfully +All packages passed: + +- `cmd/api` ✅ +- `cmd/seed` ✅ +- `internal/api/handlers` ✅ (231.466s) +- `internal/api/middleware` ✅ +- `internal/services` ✅ (38.993s) +- All other packages ✅ --- @@ -115,13 +129,11 @@ All hooks passed: **No blocking issues found.** -### Non-blocking items (warnings only) +### Non-blocking items (pre-existing) -1. **ESLint `@typescript-eslint/no-explicit-any` warnings:** 10 occurrences in test files using `any` type. These are acceptable in test files for mocking purposes. +1. **Unhandled rejection in Security.test.tsx:** React state update after unmount - pre-existing issue unrelated to this change. -2. **ESLint `react-hooks/exhaustive-deps` warning:** 1 occurrence in `CrowdSecConfig.tsx` at line 224. The missing dependencies (`pullPresetMutation` and `selectedPreset`) appear to be intentionally excluded to prevent infinite loops. - -3. **Unused variable warning:** 1 occurrence in `security-mobile.spec.ts` - an `onclick` variable that's assigned but not used. +2. **ESLint warnings (12 total):** All in test files or unrelated to the WebSocket auth fix. --- @@ -129,10 +141,12 @@ All hooks passed: ## ✅ PASS -All critical QA checks have passed. The CrowdSec re-enrollment fixes are ready for deployment. 
+The WebSocket auth fix (`token` → `charon_auth_token`) has been verified: -- No test failures -- No type errors -- No lint errors -- Builds compile successfully -- Coverage requirements met (85.2% ≥ 85%) +- ✅ No regressions introduced - All tests pass +- ✅ Build integrity maintained - Both frontend and backend compile successfully +- ✅ Type safety preserved - TypeScript checks pass +- ✅ Code quality maintained - Lint passes (no new issues) +- ✅ Coverage requirement met - 85.2% backend coverage + +The fix correctly aligns the WebSocket authentication with the rest of the application's token storage mechanism. diff --git a/frontend/src/api/logs.ts b/frontend/src/api/logs.ts index 72bccbf8..1f6201c5 100644 --- a/frontend/src/api/logs.ts +++ b/frontend/src/api/logs.ts @@ -128,8 +128,8 @@ export const connectLiveLogs = ( if (filters.level) params.append('level', filters.level); if (filters.source) params.append('source', filters.source); - // Get auth token from localStorage - const token = localStorage.getItem('token'); + // Get auth token from localStorage (key: charon_auth_token) + const token = localStorage.getItem('charon_auth_token'); if (token) { params.append('token', token); } @@ -196,8 +196,8 @@ export const connectSecurityLogs = ( if (filters.host) params.append('host', filters.host); if (filters.blocked_only) params.append('blocked_only', 'true'); - // Get auth token from localStorage - const token = localStorage.getItem('token'); + // Get auth token from localStorage (key: charon_auth_token) + const token = localStorage.getItem('charon_auth_token'); if (token) { params.append('token', token); }