diff --git a/docs/plans/current_spec.md b/docs/plans/current_spec.md
index 8ea6be0f..3f446f28 100644
--- a/docs/plans/current_spec.md
+++ b/docs/plans/current_spec.md
@@ -1,528 +1,369 @@
-# CrowdSec LAPI Status Bug - Diagnostic & Fix Plan
+# Fix Plan: Critical Issues After Docker Rebuild
**Date:** December 14, 2025
-**Issue:** CrowdSecConfig page persistently shows "LAPI is initializing" even when LAPI is running
-**Status:** 🎯 **ROOT CAUSE IDENTIFIED** - Status endpoint checks process, not LAPI connectivity
-**Priority:** HIGH (Blocks Console Enrollment Feature)
-**Previous Issue:** [crowdsec_lapi_error_diagnostic.md](crowdsec_lapi_error_diagnostic.md) - Race condition fix introduced this regression
+**Status:** Planning Phase
+**Priority:** P0 - Urgent
---
-## 🎯 Key Findings
+## Issue Summary
-### Critical Discovery
+After a Docker container rebuild, three critical issues were identified:
-After implementing fixes from `docs/plans/crowdsec_lapi_error_diagnostic.md`, the CrowdSecConfig page now persistently displays:
-
-> "CrowdSec Local API is initializing...
-> The CrowdSec process is running but the Local API (LAPI) is still starting up."
-
-This message appears **even when LAPI is actually running and reachable**. The fix introduced a regression where the Status endpoint was not updated to match the new LAPI-aware Start endpoint.
-
-### Root Cause Chain
-
-1. `Start()` handler was correctly updated to wait for LAPI and return `lapi_ready: true/false`
-2. **BUT** `Status()` handler was **NOT updated** - still only checks process status
-3. Frontend expects `running` to mean "LAPI responding"
-4. Backend returns `running: true` meaning only "process running"
-5. **MISMATCH:** Frontend needs `lapi_ready` field to determine actual LAPI status
-
-### Why This is a Regression
-
-- The original fix added LAPI readiness check to `Start()` handler ✅
-- But forgot to add the same check to `Status()` handler ❌
-- Frontend now uses `statusCrowdsec()` for polling LAPI status
-- This endpoint doesn't actually verify LAPI connectivity
-
-### Impact
-
-- Console enrollment section always shows "initializing" warning
-- Enroll button is disabled even when LAPI is working
-- Users cannot complete console enrollment despite CrowdSec being functional
-
----
-
-## Executive Summary
-
-The `Start()` handler was correctly updated to wait for LAPI readiness before returning (lines 201-236 in [crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go#L201-L236)):
-
-```go
-// Start() now waits for LAPI and returns lapi_ready: true/false
-c.JSON(http.StatusOK, gin.H{
- "status": "started",
- "pid": pid,
- "lapi_ready": true, // NEW: indicates LAPI is ready
-})
-```
-
-However, the `Status()` handler was **NOT updated** and still only checks process status (lines 287-294):
-
-```go
-func (h *CrowdsecHandler) Status(c *gin.Context) {
- ctx := c.Request.Context()
- running, pid, err := h.Executor.Status(ctx, h.DataDir) // Only checks PID!
- if err != nil {
- c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
- return
- }
- c.JSON(http.StatusOK, gin.H{"running": running, "pid": pid}) // Missing lapi_ready!
-}
-```
+1. **500 error on CrowdSec Stop()** - Toggling CrowdSec OFF returns "Failed to stop CrowdSec: Request failed with status code 500"
+2. **CrowdSec shows "not running"** - Despite database setting being enabled, the process isn't running after container restart
+3. **Live Logs Disconnected** - WebSocket shows disconnected state even when logs are being generated
---
## Root Cause Analysis
-### The Executor's Status() Method
+### Issue 1: 500 Error on Stop()
-The `DefaultCrowdsecExecutor.Status()` in [crowdsec_exec.go](../../backend/internal/api/handlers/crowdsec_exec.go#L65-L87) only checks:
+**Location:** [crowdsec_exec.go](../../backend/internal/api/handlers/crowdsec_exec.go#L36-L51)
-1. If PID file exists
-2. If process with that PID is running (via signal 0)
+**Root Cause:** The `Stop()` method in `DefaultCrowdsecExecutor` fails when there's no PID file.
+
+```go
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
+ if err != nil {
+ return fmt.Errorf("pid file read: %w", err) // ← FAILS HERE with 500
+ }
+ // ...
+}
+```
+
+**Problem:** When the container restarts:
+1. The PID file (`/app/data/crowdsec/crowdsec.pid`) is deleted (ephemeral or process cleanup removes it)
+2. The database still has CrowdSec "enabled" = true
+3. User clicks "Disable" (which calls Stop())
+4. Stop() tries to read the PID file → fails → returns error → 500 response
+
+**Why it fails:** The code assumes a PID file always exists when stopping. But after container restart, there's no PID file because the CrowdSec process wasn't running (GUI-controlled lifecycle, NOT auto-started).
+
+### Issue 2: CrowdSec Shows "Not Running" After Restart
+
+**Location:** [docker-entrypoint.sh](../../docker-entrypoint.sh#L91-L97)
+
+**Root Cause:** CrowdSec is **intentionally NOT auto-started** in the entrypoint. The design is "GUI-controlled lifecycle."
+
+From the entrypoint:
+```bash
+# CrowdSec Lifecycle Management:
+# CrowdSec configuration is initialized above (symlinks, directories, hub updates)
+# However, the CrowdSec agent is NOT auto-started in the entrypoint.
+# Instead, CrowdSec lifecycle is managed by the backend handlers via GUI controls.
+```
+
+**Problem:** The database stores the user's "enabled" preference, but there's no reconciliation at startup:
+1. Container restarts
+2. Database says CrowdSec `enabled = true` (user's previous preference)
+3. But CrowdSec process is NOT started (by design)
+4. UI shows "enabled" but status shows "not running" → confusing state mismatch
+
+**Missing Logic:** No startup reconciliation to check "if DB says enabled, start CrowdSec process."
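
The missing decision can be stated as a tiny pure function. This is an illustrative sketch only — the real check would read the DB setting and the PID-file status, and `shouldStart` is not a name from the codebase:

```go
package main

import "fmt"

// shouldStart captures the reconciliation rule this plan proposes:
// start CrowdSec only when the stored preference says enabled AND
// no process is currently running.
func shouldStart(settingEnabled, processRunning bool) bool {
	return settingEnabled && !processRunning
}

func main() {
	fmt.Println(shouldStart(true, false))  // post-restart mismatch: start it
	fmt.Println(shouldStart(true, true))   // already running: do nothing
	fmt.Println(shouldStart(false, false)) // user disabled it: do nothing
}
```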
+
+### Issue 3: Live Logs Disconnected
+
+**Location:** [logs_ws.go](../../backend/internal/api/handlers/logs_ws.go) and [log_watcher.go](../../backend/internal/services/log_watcher.go)
+
+**Root Cause:** There are **two separate WebSocket log systems** that may be misconfigured:
+
+1. **`/api/v1/logs/live`** (logs_ws.go) - Streams application logs via `logger.GetBroadcastHook()`
+2. **`/api/v1/cerberus/logs/ws`** (cerberus_logs_ws.go) - Streams Caddy access logs via `LogWatcher`
+
+**Potential Issues:**
+
+a) **LogWatcher not started:** The `LogWatcher` must be explicitly started with `Start(ctx)`. If the watcher isn't started during server initialization, no logs are broadcast.
+
+b) **Log file doesn't exist:** The LogWatcher waits for `/var/log/caddy/access.log` to exist. After container restart with no traffic, this file may not exist yet.
+
+c) **WebSocket connection path mismatch:** Frontend might connect to wrong endpoint or with invalid token.
+
+d) **CSP blocking WebSocket:** Security middleware's Content-Security-Policy must allow `ws:` and `wss:` protocols.
+
+---
+
+## Detailed Code Analysis
+
+### Stop() Method - Full Code Review
+
+```go
+// File: backend/internal/api/handlers/crowdsec_exec.go
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
+ if err != nil {
+ return fmt.Errorf("pid file read: %w", err) // ← CRITICAL: Returns error on ENOENT
+ }
+ pid, err := strconv.Atoi(string(b))
+ if err != nil {
+ return fmt.Errorf("invalid pid: %w", err)
+ }
+ proc, err := os.FindProcess(pid)
+ if err != nil {
+ return err
+ }
+ if err := proc.Signal(syscall.SIGTERM); err != nil {
+ return err
+ }
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
+}
+```
+
+The problem is clear: when the PID file doesn't exist, `os.ReadFile()` returns a `*fs.PathError` satisfying `errors.Is(err, fs.ErrNotExist)`, and this error is propagated unchanged to the handler, surfacing to the client as a 500.
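
The distinction is easy to verify in isolation: `os.ReadFile` on a missing path returns an error satisfying `errors.Is(err, fs.ErrNotExist)`, which is exactly the check an idempotent `Stop()` needs. The helper name below is illustrative, not from the codebase:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// classifyStop reports how Stop() should treat a read of the PID file:
// "already-stopped" for a missing file, "error" for anything else.
func classifyStop(pidFile string) string {
	_, err := os.ReadFile(pidFile)
	switch {
	case err == nil:
		return "running"
	case errors.Is(err, fs.ErrNotExist):
		return "already-stopped" // idempotent: not an error
	default:
		return "error"
	}
}

func main() {
	dir, err := os.MkdirTemp("", "pidtest")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	pidFile := filepath.Join(dir, "crowdsec.pid")
	fmt.Println(classifyStop(pidFile)) // already-stopped

	if err := os.WriteFile(pidFile, []byte("1234"), 0o644); err != nil {
		panic(err)
	}
	fmt.Println(classifyStop(pidFile)) // running
}
```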
+
+### Status() Method - Already Handles Missing PID Gracefully
```go
func (e *DefaultCrowdsecExecutor) Status(ctx context.Context, configDir string) (running bool, pid int, err error) {
b, err := os.ReadFile(e.pidFile(configDir))
if err != nil {
- // Missing pid file is treated as not running
+ // Missing pid file is treated as not running ← GOOD PATTERN
return false, 0, nil
}
- // ... check if process is alive via signal 0 ...
- return true, pid, nil
+ // ...
}
```
-It does **NOT** check if LAPI HTTP endpoint is responding.
-
-### Frontend Expectation Mismatch
-
-The frontend in [CrowdSecConfig.tsx](../../frontend/src/pages/CrowdSecConfig.tsx#L71-L77) queries LAPI status:
-
-```tsx
-const lapiStatusQuery = useQuery({
- queryKey: ['crowdsec-lapi-status'],
- queryFn: statusCrowdsec,
- enabled: consoleEnrollmentEnabled && initialCheckComplete,
- refetchInterval: 5000, // Poll every 5 seconds
- retry: false,
-})
-```
-
-And displays a warning based on `running` field (lines 207-231):
-
-```tsx
-{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
-  {/* alert markup omitted */}
-  CrowdSec Local API is initializing...
-)}
-```
-
-**The Problem:** The frontend checks `lapiStatusQuery.data?.running` expecting it to indicate LAPI connectivity. But the backend returns `running: true` which only means "process is running", not "LAPI is responding".
-
-### Evidence Chain
-
-| Component | File | Line | Returns | Actually Checks |
-|-----------|------|------|---------|-----------------|
-| Backend Handler | crowdsec_handler.go | 287-294 | `{running, pid}` | Process running via PID |
-| Backend Executor | crowdsec_exec.go | 65-87 | `(running, pid, err)` | PID file + signal 0 |
-| Frontend API | crowdsec.ts | 18-21 | `resp.data` | N/A (passthrough) |
-| Frontend Query | CrowdSecConfig.tsx | 71-77 | `lapiStatusQuery.data` | Checks `.running` field |
-| Frontend UI | CrowdSecConfig.tsx | 207-231 | Shows warning | `!running` |
-
-**Bug:** Frontend interprets `running` as "LAPI responding" but backend returns "process running".
+**Key Insight:** `Status()` already handles missing PID file gracefully by returning `(false, 0, nil)`. The `Stop()` method should follow the same pattern.
---
-## Detailed Analysis: Why Warning Always Shows
+## Fix Plan
-Looking at the conditional again:
+### Fix 1: Make Stop() Idempotent (No Error When Already Stopped)
-```tsx
-{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
-```
+**File:** `backend/internal/api/handlers/crowdsec_exec.go`
-This shows the warning when:
-- `lapiStatusQuery.data` is truthy ✓
-- `!lapiStatusQuery.data.running` is truthy (i.e., `running` is falsy)
-- `initialCheckComplete` is truthy ✓
-
-**Re-analyzing:** If `running: true`, then `!true = false`, so warning should NOT show.
-
-**But user reports it DOES show!**
-
-**Possible causes:**
-
-1. **Process not actually running:** The `Status()` endpoint returns `running: false` because CrowdSec process crashed or PID file is missing/stale
-2. **Different `running` field:** Frontend might be checking a different property
-3. **Query state issue:** React Query might be returning stale data
-
-**Most Likely:** Looking at the message being displayed:
-
-> "CrowdSec Local API is **initializing**..."
-
-This message was designed for the case where **process IS running** but **LAPI is NOT ready yet**. But the current conditional shows it when `running` is false!
-
-**The Fix Needed:** The conditional should check:
-- Process running (`running: true`) AND
-- LAPI not ready (`lapi_ready: false`)
-
-NOT just:
-- Process not running (`running: false`)
-
----
-
-## The Complete Fix
-
-### Files to Modify
-
-1. **Backend:** [backend/internal/api/handlers/crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go#L287-L294)
-2. **Frontend API:** [frontend/src/api/crowdsec.ts](../../frontend/src/api/crowdsec.ts#L18-L21)
-3. **Frontend UI:** [frontend/src/pages/CrowdSecConfig.tsx](../../frontend/src/pages/CrowdSecConfig.tsx#L207-L231)
-4. **Tests:** [backend/internal/api/handlers/crowdsec_handler_test.go](../../backend/internal/api/handlers/crowdsec_handler_test.go)
-
-### Change 1: Backend Status Handler
-
-**File:** `backend/internal/api/handlers/crowdsec_handler.go`
-**Location:** Lines 287-294
-
-**Before:**
+**Current Code (lines 36-51):**
```go
-// Status returns simple running state.
-func (h *CrowdsecHandler) Status(c *gin.Context) {
- ctx := c.Request.Context()
- running, pid, err := h.Executor.Status(ctx, h.DataDir)
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
if err != nil {
- c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
- return
+ return fmt.Errorf("pid file read: %w", err)
}
- c.JSON(http.StatusOK, gin.H{"running": running, "pid": pid})
+ pid, err := strconv.Atoi(string(b))
+ if err != nil {
+ return fmt.Errorf("invalid pid: %w", err)
+ }
+ proc, err := os.FindProcess(pid)
+ if err != nil {
+ return err
+ }
+ if err := proc.Signal(syscall.SIGTERM); err != nil {
+ return err
+ }
+ // best-effort remove pid file
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
}
```
-**After:**
+**Fixed Code:**
```go
-// Status returns running state including LAPI availability check.
-func (h *CrowdsecHandler) Status(c *gin.Context) {
- ctx := c.Request.Context()
- running, pid, err := h.Executor.Status(ctx, h.DataDir)
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
if err != nil {
- c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
- return
- }
-
- // Check LAPI connectivity if process is running
- lapiReady := false
- if running {
- args := []string{"lapi", "status"}
- if _, err := os.Stat(filepath.Join(h.DataDir, "config.yaml")); err == nil {
- args = append([]string{"-c", filepath.Join(h.DataDir, "config.yaml")}, args...)
+ if os.IsNotExist(err) {
+ // No PID file means process isn't running - that's OK for Stop()
+ // This makes Stop() idempotent (safe to call multiple times)
+ return nil
}
- checkCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
- _, checkErr := h.CmdExec.Execute(checkCtx, "cscli", args...)
- cancel()
- lapiReady = (checkErr == nil)
+ return fmt.Errorf("pid file read: %w", err)
}
-
- c.JSON(http.StatusOK, gin.H{
- "running": running,
- "pid": pid,
- "lapi_ready": lapiReady,
- })
+ pid, err := strconv.Atoi(string(b))
+ if err != nil {
+ // Malformed PID file - remove it and treat as not running
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
+ }
+ proc, err := os.FindProcess(pid)
+ if err != nil {
+ // Process lookup failed - clean up PID file
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
+ }
+ if err := proc.Signal(syscall.SIGTERM); err != nil {
+ // Process may already be dead - clean up PID file
+ _ = os.Remove(e.pidFile(configDir))
+ // Only return error if it's not "process doesn't exist"
+ if !errors.Is(err, os.ErrProcessDone) && !errors.Is(err, syscall.ESRCH) {
+ return err
+ }
+ return nil
+ }
+ // best-effort remove pid file
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
}
```
-### Change 2: Frontend API Type
+**Rationale:** Stop() should be idempotent. Stopping an already-stopped service shouldn't error.
-**File:** `frontend/src/api/crowdsec.ts`
-**Location:** Lines 18-21
+### Fix 2: Add CrowdSec Startup Reconciliation
-**Before:**
-```typescript
-export async function statusCrowdsec() {
- const resp = await client.get('/admin/crowdsec/status')
- return resp.data
+**File:** `backend/internal/api/routes/routes.go` (or create `backend/internal/services/crowdsec_startup.go`)
+
+**New Function:**
+```go
+// ReconcileCrowdSecOnStartup checks if CrowdSec should be running based on DB settings
+// and starts it if necessary. This handles the case where the container restarts
+// but the user's preference was to have CrowdSec enabled.
+func ReconcileCrowdSecOnStartup(db *gorm.DB, executor handlers.CrowdsecExecutor, binPath, dataDir string) {
+ if db == nil {
+ return
+ }
+
+ var secCfg models.SecurityConfig
+ if err := db.First(&secCfg).Error; err != nil {
+ // No config yet or error - nothing to reconcile
+ logger.Log().WithError(err).Debug("No security config found for CrowdSec reconciliation")
+ return
+ }
+
+ // Check if CrowdSec should be running based on mode
+ if secCfg.CrowdSecMode != "local" {
+ logger.Log().WithField("mode", secCfg.CrowdSecMode).Debug("CrowdSec mode is not 'local', skipping auto-start")
+ return
+ }
+
+ // Check if already running
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+
+ running, _, _ := executor.Status(ctx, dataDir)
+ if running {
+ logger.Log().Info("CrowdSec already running, no action needed")
+ return
+ }
+
+ // Start CrowdSec since DB says it should be enabled
+ logger.Log().Info("CrowdSec mode is 'local' but process not running, starting...")
+ _, err := executor.Start(ctx, binPath, dataDir)
+ if err != nil {
+ logger.Log().WithError(err).Warn("Failed to auto-start CrowdSec on startup reconciliation")
+ } else {
+ logger.Log().Info("CrowdSec started successfully via startup reconciliation")
+ }
}
```
-**After:**
-```typescript
-export interface CrowdSecStatus {
- running: boolean
- pid: number
- lapi_ready: boolean
-}
-
-export async function statusCrowdsec(): Promise<CrowdSecStatus> {
- const resp = await client.get('/admin/crowdsec/status')
- return resp.data
+**Integration Point:** Call this function after database migration in server initialization:
+```go
+// In routes.go or server.go, after DB is ready and handlers are created
+if crowdsecHandler != nil {
+ ReconcileCrowdSecOnStartup(db, crowdsecHandler.Executor, crowdsecHandler.BinPath, crowdsecHandler.DataDir)
}
```
-### Change 3: Frontend CrowdSecConfig Conditional Logic
+### Fix 3: Ensure LogWatcher is Started and Log File Exists
-**File:** `frontend/src/pages/CrowdSecConfig.tsx`
-**Location:** Lines 207-231
+**File:** `backend/internal/api/routes/routes.go`
-**Before:**
-```tsx
-{/* Warning when CrowdSec LAPI is not running */}
-{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
-  {/* alert markup omitted */}
-  CrowdSec Local API is initializing...
-  The CrowdSec process is running but the Local API (LAPI) is still starting up.
-  This typically takes 5-10 seconds after enabling CrowdSec.
-  {lapiStatusQuery.isRefetching && ' Checking again in 5 seconds...'}
-  {!status?.crowdsec?.enabled && (
-    {/* button markup omitted */}
-  )}
-)}
-```
+**Check that LogWatcher.Start() is called:**
+```go
+// Ensure LogWatcher is started with proper log path
+logPath := "/var/log/caddy/access.log"
-**After:**
-```tsx
-{/* Warning when CrowdSec process is running but LAPI is not ready */}
-{lapiStatusQuery.data && lapiStatusQuery.data.running && !lapiStatusQuery.data.lapi_ready && initialCheckComplete && (
-  {/* alert markup omitted */}
-  CrowdSec Local API is initializing...
-  The CrowdSec process is running but the Local API (LAPI) is still starting up.
-  This typically takes 5-10 seconds after enabling CrowdSec.
-  {lapiStatusQuery.isRefetching && ' Checking again in 5 seconds...'}
-)}
+// Ensure log directory exists
+if err := os.MkdirAll(filepath.Dir(logPath), 0755); err != nil {
+ logger.Log().WithError(err).Warn("Failed to create log directory")
+}
-{/* Warning when CrowdSec is not running at all */}
-{lapiStatusQuery.data && !lapiStatusQuery.data.running && initialCheckComplete && (
-  {/* alert markup omitted */}
-  CrowdSec is not running
-  Please enable CrowdSec using the toggle switch in the Security dashboard before enrolling in the Console.
-)}
-```
+// Create empty log file if it doesn't exist (allows LogWatcher to start tailing immediately)
+if _, err := os.Stat(logPath); os.IsNotExist(err) {
+ if f, err := os.Create(logPath); err == nil {
+ f.Close()
+ logger.Log().WithField("path", logPath).Info("Created empty log file for LogWatcher")
+ }
+}
-### Change 4: Update Enrollment Button Disabled State
-
-**File:** `frontend/src/pages/CrowdSecConfig.tsx`
-**Location:** Lines 255-289 (Enroll, Rotate key, and Retry enrollment buttons)
-
-**Before:**
-```tsx
-disabled={isConsolePending || (lapiStatusQuery.data && !lapiStatusQuery.data.running) || !enrollmentToken.trim()}
-```
-
-**After:**
-```tsx
-disabled={isConsolePending || (lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready) || !enrollmentToken.trim()}
-```
-
-Also update the `title` attributes:
-
-**Before:**
-```tsx
-title={
- lapiStatusQuery.data && !lapiStatusQuery.data.running
- ? 'CrowdSec LAPI must be running to enroll'
- : ...
+// Create and start the LogWatcher
+watcher := services.NewLogWatcher(logPath)
+if err := watcher.Start(context.Background()); err != nil {
+ logger.Log().WithError(err).Error("Failed to start LogWatcher")
}
```
-**After:**
-```tsx
-title={
- lapiStatusQuery.data && !lapiStatusQuery.data.lapi_ready
- ? 'CrowdSec LAPI must be running to enroll'
- : ...
-}
+**Additionally, verify CSP allows WebSocket:**
+The security middleware in `backend/internal/api/middleware/security.go` already has:
+```go
+directives["connect-src"] = "'self' ws: wss:" // WebSocket for HMR
+```
+
+This should allow WebSocket connections.
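
As a sanity check of the header shape, here is a minimal sketch of how such directives combine into a `Content-Security-Policy` value. The directive map and `buildCSP` helper are illustrative, not the middleware's actual code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildCSP joins directive name/value pairs into a single
// Content-Security-Policy header value, sorted for deterministic output.
func buildCSP(directives map[string]string) string {
	names := make([]string, 0, len(directives))
	for n := range directives {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, n+" "+directives[n])
	}
	return strings.Join(parts, "; ")
}

func main() {
	d := map[string]string{
		"default-src": "'self'",
		"connect-src": "'self' ws: wss:", // required for the live-logs WebSocket
	}
	fmt.Println(buildCSP(d))
	// connect-src 'self' ws: wss:; default-src 'self'
}
```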
+
+---
+
+## Files to Modify
+
+| File | Change |
+|------|--------|
+| `backend/internal/api/handlers/crowdsec_exec.go` | Make `Stop()` idempotent for missing PID file |
+| `backend/internal/api/routes/routes.go` | Add CrowdSec startup reconciliation call |
+| `backend/internal/services/crowdsec_startup.go` | (NEW) Create startup reconciliation function |
+| `backend/internal/api/handlers/crowdsec_exec_test.go` | Add tests for Stop() idempotency |
+
+---
+
+## Testing Plan
+
+### Test 1: Stop() Idempotency
+```bash
+# Start container fresh (no CrowdSec running)
+docker compose down -v && docker compose up -d
+
+# Call Stop() without starting CrowdSec first
+curl -X POST http://localhost:8080/api/v1/admin/crowdsec/stop \
+ -H "Authorization: Bearer $TOKEN"
+
+# Expected: 200 OK {"status": "stopped"}
+# NOT: 500 Internal Server Error
+```
+
+### Test 2: Startup Reconciliation
+```bash
+# Enable CrowdSec via API
+curl -X POST http://localhost:8080/api/v1/admin/crowdsec/start \
+ -H "Authorization: Bearer $TOKEN"
+
+# Verify running
+curl http://localhost:8080/api/v1/admin/crowdsec/status \
+ -H "Authorization: Bearer $TOKEN"
+# Expected: {"running": true, "pid": xxx, "lapi_ready": true}
+
+# Restart container
+docker compose restart charon
+
+# Wait for startup
+sleep 10
+
+# Verify CrowdSec auto-started
+curl http://localhost:8080/api/v1/admin/crowdsec/status \
+ -H "Authorization: Bearer $TOKEN"
+# Expected: {"running": true, "pid": xxx, "lapi_ready": true}
+```
+
+### Test 3: Live Logs
+```bash
+# Connect to WebSocket
+websocat "ws://localhost:8080/api/v1/logs/live?token=$TOKEN"
+
+# In another terminal, generate traffic
+curl http://localhost:8080/api/v1/health
+
+# Verify log entries appear in WebSocket connection
```
---
-## Testing Steps
+## Implementation Order
-### Unit Test: Backend Status Handler
-
-Add test in `backend/internal/api/handlers/crowdsec_handler_test.go`:
-
-```go
-func TestCrowdsecHandler_Status_IncludesLAPIReady(t *testing.T) {
- mockExec := &fakeExec{running: true, pid: 1234}
- mockCmdExec := &mockCommandExecutor{returnErr: nil} // cscli lapi status succeeds
-
- handler := &CrowdsecHandler{
- Executor: mockExec,
- CmdExec: mockCmdExec,
- DataDir: "/app/data",
- }
-
- w := httptest.NewRecorder()
- c, _ := gin.CreateTestContext(w)
- c.Request = httptest.NewRequest(http.MethodGet, "/admin/crowdsec/status", nil)
-
- handler.Status(c)
-
- assert.Equal(t, http.StatusOK, w.Code)
-
- var response map[string]interface{}
- json.Unmarshal(w.Body.Bytes(), &response)
-
- assert.True(t, response["running"].(bool))
- assert.Equal(t, float64(1234), response["pid"].(float64))
- assert.True(t, response["lapi_ready"].(bool)) // NEW: Check lapi_ready is present and true
-}
-
-func TestCrowdsecHandler_Status_LAPINotReady(t *testing.T) {
- mockExec := &fakeExec{running: true, pid: 1234}
- mockCmdExec := &mockCommandExecutor{returnErr: errors.New("connection refused")} // cscli lapi status fails
-
- handler := &CrowdsecHandler{
- Executor: mockExec,
- CmdExec: mockCmdExec,
- DataDir: "/app/data",
- }
-
- w := httptest.NewRecorder()
- c, _ := gin.CreateTestContext(w)
- c.Request = httptest.NewRequest(http.MethodGet, "/admin/crowdsec/status", nil)
-
- handler.Status(c)
-
- assert.Equal(t, http.StatusOK, w.Code)
-
- var response map[string]interface{}
- json.Unmarshal(w.Body.Bytes(), &response)
-
- assert.True(t, response["running"].(bool))
- assert.Equal(t, float64(1234), response["pid"].(float64))
- assert.False(t, response["lapi_ready"].(bool)) // LAPI not ready
-}
-
-func TestCrowdsecHandler_Status_ProcessNotRunning(t *testing.T) {
- mockExec := &fakeExec{running: false, pid: 0}
- mockCmdExec := &mockCommandExecutor{}
-
- handler := &CrowdsecHandler{
- Executor: mockExec,
- CmdExec: mockCmdExec,
- DataDir: "/app/data",
- }
-
- w := httptest.NewRecorder()
- c, _ := gin.CreateTestContext(w)
- c.Request = httptest.NewRequest(http.MethodGet, "/admin/crowdsec/status", nil)
-
- handler.Status(c)
-
- assert.Equal(t, http.StatusOK, w.Code)
-
- var response map[string]interface{}
- json.Unmarshal(w.Body.Bytes(), &response)
-
- assert.False(t, response["running"].(bool))
- assert.False(t, response["lapi_ready"].(bool)) // LAPI can't be ready if process not running
-}
-```
-
-### Manual Testing Procedure
-
-1. **Start Fresh:**
- ```bash
- docker compose down -v
- docker compose up -d
- ```
-
-2. **Enable CrowdSec:**
- - Go to Security dashboard
- - Toggle CrowdSec ON
- - Wait for toast "CrowdSec started and LAPI is ready"
-
-3. **Navigate to Config:**
- - Click "Config" button
- - Verify NO "initializing" warning shows
- - Console enrollment section should be enabled
-
-4. **Verify API Response:**
- ```bash
- curl -s http://localhost:8080/api/v1/admin/crowdsec/status | jq
- ```
- Expected:
- ```json
- {
- "running": true,
- "pid": 123,
- "lapi_ready": true
- }
- ```
-
-5. **Test LAPI Down Scenario:**
- - SSH into container: `docker exec -it charon bash`
- - Stop CrowdSec: `pkill -f crowdsec`
- - Call API:
- ```bash
- curl -s http://localhost:8080/api/v1/admin/crowdsec/status | jq
- ```
- - Expected: `{"running": false, "pid": 0, "lapi_ready": false}`
- - Refresh CrowdSecConfig page
- - Should show "CrowdSec is not running" error (red)
-
-6. **Test Restart Scenario:**
- - Re-enable CrowdSec via Security dashboard
- - Immediately navigate to CrowdSecConfig
- - Should show "initializing" briefly (yellow) then clear when `lapi_ready: true`
+1. **Fix 1 (Stop Idempotency)** - Quick fix, high impact, unblocks users immediately
+2. **Fix 2 (Startup Reconciliation)** - Core fix for state mismatch after restart
+3. **Fix 3 (Live Logs)** - May need more investigation; likely already working if LogWatcher is started
---
@@ -530,46 +371,16 @@ func TestCrowdsecHandler_Status_ProcessNotRunning(t *testing.T) {
| Change | Risk | Mitigation |
|--------|------|------------|
-| Backend Status handler modification | Low | Status handler is read-only, adds 2s timeout check |
-| LAPI check timeout (2s) | Low | Short timeout prevents blocking; async refresh handles retries |
-| Frontend conditional logic change | Low | More precise state handling, clear error states |
-| Type definition update | Low | TypeScript will catch any mismatches at compile time |
-| Two separate warning states | Low | Better UX with distinct yellow (initializing) vs red (not running) |
+| Stop() idempotency | Very Low | Makes existing code more robust |
+| Startup reconciliation | Low | Only runs once at startup, respects DB state |
+| Log file creation | Very Low | Standard file operations with error handling |
---
-## Summary
+## Notes
-**Root Cause:** The `Status()` endpoint was not updated when `Start()` was modified to check LAPI readiness. The frontend expects the status endpoint to indicate LAPI availability, but it only returns process status.
-
-**Fix:** Add `lapi_ready` field to `Status()` response by checking `cscli lapi status`, update frontend to use this new field for the warning display logic.
-
-**Files Changed:**
-1. `backend/internal/api/handlers/crowdsec_handler.go` - Add LAPI check to Status()
-2. `frontend/src/api/crowdsec.ts` - Add TypeScript interface with `lapi_ready`
-3. `frontend/src/pages/CrowdSecConfig.tsx` - Update conditional logic:
- - Yellow warning: process running, LAPI not ready
- - Red warning: process not running
- - No warning: process running AND LAPI ready
-4. `backend/internal/api/handlers/crowdsec_handler_test.go` - Add unit tests
-
-**Estimated Time:** 1-2 hours including testing
-
-**Commit Message:**
-```
-fix: add LAPI readiness check to CrowdSec status endpoint
-
-The Status() handler was only checking if the CrowdSec process was
-running, not if LAPI was actually responding. This caused the
-CrowdSecConfig page to always show "LAPI is initializing" even when
-LAPI was fully operational.
-
-Changes:
-- Backend: Add `lapi_ready` field to /admin/crowdsec/status response
-- Frontend: Add CrowdSecStatus TypeScript interface
-- Frontend: Update conditional logic to check `lapi_ready` not `running`
-- Frontend: Separate warnings for "initializing" vs "not running"
-- Tests: Add unit tests for Status handler LAPI check
-
-Fixes regression from crowdsec_lapi_error_diagnostic.md fixes.
-```
+- The CrowdSec PID file path is `${dataDir}/crowdsec.pid` (e.g., `/app/data/crowdsec/crowdsec.pid`)
+- The LogWatcher monitors `/var/log/caddy/access.log` by default
+- WebSocket ping interval is 30 seconds for keep-alive
+- CrowdSec LAPI runs on port 8085 (not 8080) to avoid conflict with Charon
+- The `Status()` handler already includes `lapi_ready` field (from previous fix)
diff --git a/docs/plans/post_rebuild_diagnostic.md b/docs/plans/post_rebuild_diagnostic.md
new file mode 100644
index 00000000..26389275
--- /dev/null
+++ b/docs/plans/post_rebuild_diagnostic.md
@@ -0,0 +1,526 @@
+# Diagnostic & Fix Plan: CrowdSec and Live Logs Issues Post Docker Rebuild
+
+**Date:** December 14, 2025
+**Investigator:** Planning Agent
+**Scope:** Three user-reported issues after Docker rebuild
+**Status:** ✅ **COMPLETE - Root causes identified with fixes ready**
+
+---
+
+## Executive Summary
+
+After thorough investigation of the backend handlers, executor implementation, entrypoint script, and frontend code, I've identified the root causes for all three reported issues:
+
+1. **CrowdSec shows "not running"** - Process detection via PID file is failing
+2. **500 error when stopping CrowdSec** - PID file doesn't exist when CrowdSec wasn't started via handlers
+3. **Live log viewer disconnected** - LogWatcher can't find the access log file
+
+---
+
+## Issue 1: CrowdSec Shows "Not Running" Even Though Enabled in UI
+
+### Root Cause Analysis
+
+The mismatch occurs because:
+
+1. **Database Setting vs Process State**: The UI toggle persists the setting `security.crowdsec.enabled` in the database, but the persisted setting alone **does not start the CrowdSec process** — nothing replays the start call after a container restart.
+
+2. **Process Lifecycle Design**: Per [docker-entrypoint.sh](../../docker-entrypoint.sh) (line 56-65), CrowdSec is explicitly **NOT auto-started** in the container entrypoint:
+ ```bash
+ # CrowdSec Lifecycle Management:
+ # CrowdSec agent is NOT auto-started in the entrypoint.
+ # Instead, CrowdSec lifecycle is managed by the backend handlers via GUI controls.
+ ```
+
+3. **Status() Handler Behavior** ([crowdsec_handler.go#L238-L266](../../backend/internal/api/handlers/crowdsec_handler.go)):
+ - Calls `h.Executor.Status()` which reads from PID file at `{configDir}/crowdsec.pid`
+ - If PID file doesn't exist (CrowdSec never started), returns `running: false`
+ - The frontend correctly shows "Stopped" even when setting is "enabled"
+
+4. **The Disconnect**:
+ - Setting `security.crowdsec.enabled = true` ≠ Process running
+ - The setting tells Cerberus middleware to "use CrowdSec for protection" IF running
+ - The actual start requires clicking the toggle which calls `crowdsecPowerMutation.mutate(true)`
+
+### Why It Appears Broken
+
+After Docker rebuild:
+- Fresh container has `security.crowdsec.enabled` potentially still `true` in DB (persisted volume)
+- But PID file is gone (container restart)
+- CrowdSec process not running
+- UI shows "enabled" setting but status shows "not running"
+
+### Status() Handler Already Fixed
+
+Looking at the current implementation in [crowdsec_handler.go#L238-L266](../../backend/internal/api/handlers/crowdsec_handler.go), the `Status()` handler **already includes LAPI readiness check**:
+
+```go
+func (h *CrowdsecHandler) Status(c *gin.Context) {
+ ctx := c.Request.Context()
+ running, pid, err := h.Executor.Status(ctx, h.DataDir)
+ // ...
+ // Check LAPI connectivity if process is running
+ lapiReady := false
+ if running {
+ args := []string{"lapi", "status"}
+ // ... LAPI check implementation ...
+ lapiReady = (checkErr == nil)
+ }
+
+ c.JSON(http.StatusOK, gin.H{
+ "running": running,
+ "pid": pid,
+ "lapi_ready": lapiReady,
+ })
+}
+```
+
+### Additional Enhancement Required
+
+Add `setting_enabled` and `needs_start` fields to help frontend show correct state:
+
+**File:** [backend/internal/api/handlers/crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go)
+
+```go
+func (h *CrowdsecHandler) Status(c *gin.Context) {
+ ctx := c.Request.Context()
+ running, pid, err := h.Executor.Status(ctx, h.DataDir)
+ if err != nil {
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ return
+ }
+
+ // Check setting state
+ settingEnabled := false
+ if h.DB != nil {
+ var setting models.Setting
+ if err := h.DB.Where("key = ?", "security.crowdsec.enabled").First(&setting).Error; err == nil {
+ settingEnabled = strings.EqualFold(strings.TrimSpace(setting.Value), "true")
+ }
+ }
+
+ // Check LAPI connectivity if process is running
+ lapiReady := false
+ if running {
+ // ... existing LAPI check ...
+ }
+
+ c.JSON(http.StatusOK, gin.H{
+ "running": running,
+ "pid": pid,
+ "lapi_ready": lapiReady,
+ "setting_enabled": settingEnabled,
+ "needs_start": settingEnabled && !running, // NEW: hint for frontend
+ })
+}
+```
+
+---
+
+## Issue 2: 500 Error When Stopping CrowdSec
+
+### Root Cause Analysis
+
+The 500 error occurs in [crowdsec_exec.go#L37-L53](../../backend/internal/api/handlers/crowdsec_exec.go):
+
+```go
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
+ if err != nil {
+ return fmt.Errorf("pid file read: %w", err) // <-- 500 error here
+ }
+ // ...
+}
+```
+
+**The Problem:**
+1. PID file at `/app/data/crowdsec/crowdsec.pid` doesn't exist
+2. This happens when:
+ - CrowdSec was never started via the handlers
+ - Container was restarted (PID file lost)
+ - CrowdSec was started externally but not via Charon handlers
+
+### Fix Required
+
+Modify `Stop()` in [crowdsec_exec.go](../../backend/internal/api/handlers/crowdsec_exec.go) to handle missing PID gracefully:
+
+```go
+// Note: this version additionally requires the "errors" and "os/exec" imports.
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
+ if err != nil {
+ if os.IsNotExist(err) {
+ // PID file doesn't exist - process likely not running or was started externally
+ // Try to find and stop any running crowdsec process
+ return e.stopByProcessName(ctx)
+ }
+ return fmt.Errorf("pid file read: %w", err)
+ }
+ pid, err := strconv.Atoi(string(b))
+ if err != nil {
+ return fmt.Errorf("invalid pid: %w", err)
+ }
+ proc, err := os.FindProcess(pid)
+ if err != nil {
+ return err
+ }
+ if err := proc.Signal(syscall.SIGTERM); err != nil {
+ // Process might already be dead
+ if errors.Is(err, os.ErrProcessDone) {
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
+ }
+ return err
+ }
+ _ = os.Remove(e.pidFile(configDir))
+ return nil
+}
+
+// stopByProcessName attempts to stop CrowdSec by finding it via process name
+func (e *DefaultCrowdsecExecutor) stopByProcessName(ctx context.Context) error {
+ // Use pkill or pgrep to find crowdsec process
+ cmd := exec.CommandContext(ctx, "pkill", "-TERM", "crowdsec")
+ err := cmd.Run()
+ if err != nil {
+ // pkill returns exit code 1 if no processes matched - that's OK
+ if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
+ return nil // No process to kill, already stopped
+ }
+ return fmt.Errorf("failed to stop crowdsec by process name: %w", err)
+ }
+ return nil
+}
+```
+
+**File:** [backend/internal/api/handlers/crowdsec_exec.go](../../backend/internal/api/handlers/crowdsec_exec.go)
+
+---
+
+## Issue 3: Live Log Viewer Disconnected on Cerberus Dashboard
+
+### Root Cause Analysis
+
+The Live Log Viewer uses two WebSocket endpoints:
+
+1. **Application Logs** (`/api/v1/logs/live`) - Works via `BroadcastHook` in logger
+2. **Security Logs** (`/api/v1/cerberus/logs/ws`) - Requires `LogWatcher` to tail access log file
+
+The Cerberus Security Logs WebSocket ([cerberus_logs_ws.go](../../backend/internal/api/handlers/cerberus_logs_ws.go)) depends on `LogWatcher` which tails `/var/log/caddy/access.log`.
+
+**The Problem:**
+
+In [log_watcher.go#L102-L117](../../backend/internal/services/log_watcher.go):
+```go
+func (w *LogWatcher) tailFile() {
+ for {
+ // Wait for file to exist
+ if _, err := os.Stat(w.logPath); os.IsNotExist(err) {
+ logger.Log().WithField("path", w.logPath).Debug("Log file not found, waiting...")
+ time.Sleep(time.Second)
+ continue
+ }
+ // ...
+ }
+}
+```
+
+After Docker rebuild:
+1. Caddy may not have written any logs yet
+2. `/var/log/caddy/access.log` doesn't exist
+3. `LogWatcher` enters infinite "waiting" loop
+4. No log entries are ever sent to WebSocket clients
+5. Frontend shows "disconnected" because no heartbeat or data is ever received
+
+### Why "Disconnected" Appears
+
+From [cerberus_logs_ws.go#L79-L83](../../backend/internal/api/handlers/cerberus_logs_ws.go):
+```go
+case <-ticker.C:
+ // Send ping to keep connection alive
+ if err := conn.WriteMessage(websocket.PingMessage, []byte{}); err != nil {
+ return
+ }
+```
+
+The ping is sent every 30 seconds, but if the frontend's WebSocket connection times out or encounters an error before receiving any message, it shows "disconnected".
+
+### Fix Required
+
+**Fix 1:** Create log file if missing in `LogWatcher.Start()`:
+
+**File:** [backend/internal/services/log_watcher.go](../../backend/internal/services/log_watcher.go)
+
+```go
+import "path/filepath"
+
+func (w *LogWatcher) Start(ctx context.Context) error {
+ w.mu.Lock()
+ if w.started {
+ w.mu.Unlock()
+ return nil
+ }
+ w.started = true
+ w.mu.Unlock()
+
+ // Ensure log file exists
+ logDir := filepath.Dir(w.logPath)
+ if err := os.MkdirAll(logDir, 0755); err != nil {
+ logger.Log().WithError(err).Warn("Failed to create log directory")
+ }
+ if _, err := os.Stat(w.logPath); os.IsNotExist(err) {
+ if f, err := os.Create(w.logPath); err == nil {
+ f.Close()
+ logger.Log().WithField("path", w.logPath).Info("Created empty log file for tailing")
+ }
+ }
+
+ go w.tailFile()
+ logger.Log().WithField("path", w.logPath).Info("LogWatcher started")
+ return nil
+}
+```
+
+**Fix 2:** Send initial heartbeat message on WebSocket connect:
+
+**File:** [backend/internal/api/handlers/cerberus_logs_ws.go](../../backend/internal/api/handlers/cerberus_logs_ws.go)
+
+```go
+func (h *CerberusLogsHandler) LiveLogs(c *gin.Context) {
+ // ... existing upgrade code ...
+
+ logger.Log().WithField("subscriber_id", subscriberID).Info("Cerberus logs WebSocket connected")
+
+ // Send connection confirmation immediately
+ _ = conn.WriteJSON(map[string]interface{}{
+ "type": "connected",
+ "timestamp": time.Now().Format(time.RFC3339),
+ })
+
+ // ... rest unchanged ...
+}
+```
+
+---
+
+## Summary of Required Changes
+
+### File 1: [backend/internal/api/handlers/crowdsec_exec.go](../../backend/internal/api/handlers/crowdsec_exec.go)
+
+**Change:** Make `Stop()` handle missing PID file gracefully
+
+```go
+// Add imports for errors and exec (both used by the updated Stop())
+import (
+	"errors"
+	"os/exec"
+)
+
+// Add this method
+func (e *DefaultCrowdsecExecutor) stopByProcessName(ctx context.Context) error {
+ cmd := exec.CommandContext(ctx, "pkill", "-TERM", "crowdsec")
+ err := cmd.Run()
+ if err != nil {
+ if exitErr, ok := err.(*exec.ExitError); ok && exitErr.ExitCode() == 1 {
+ return nil
+ }
+ return fmt.Errorf("failed to stop crowdsec by process name: %w", err)
+ }
+ return nil
+}
+
+// Modify Stop()
+func (e *DefaultCrowdsecExecutor) Stop(ctx context.Context, configDir string) error {
+ b, err := os.ReadFile(e.pidFile(configDir))
+ if err != nil {
+ if os.IsNotExist(err) {
+ return e.stopByProcessName(ctx)
+ }
+ return fmt.Errorf("pid file read: %w", err)
+ }
+ // ... rest unchanged ...
+}
+```
+
+### File 2: [backend/internal/services/log_watcher.go](../../backend/internal/services/log_watcher.go)
+
+**Change:** Ensure log file exists before starting tail
+
+```go
+import "path/filepath"
+
+func (w *LogWatcher) Start(ctx context.Context) error {
+ w.mu.Lock()
+ if w.started {
+ w.mu.Unlock()
+ return nil
+ }
+ w.started = true
+ w.mu.Unlock()
+
+ // Ensure log file exists
+ logDir := filepath.Dir(w.logPath)
+ if err := os.MkdirAll(logDir, 0755); err != nil {
+ logger.Log().WithError(err).Warn("Failed to create log directory")
+ }
+ if _, err := os.Stat(w.logPath); os.IsNotExist(err) {
+ if f, err := os.Create(w.logPath); err == nil {
+ f.Close()
+ }
+ }
+
+ go w.tailFile()
+ logger.Log().WithField("path", w.logPath).Info("LogWatcher started")
+ return nil
+}
+```
+
+### File 3: [backend/internal/api/handlers/cerberus_logs_ws.go](../../backend/internal/api/handlers/cerberus_logs_ws.go)
+
+**Change:** Send connection confirmation on WebSocket connect
+
+```go
+func (h *CerberusLogsHandler) LiveLogs(c *gin.Context) {
+ // ... existing upgrade code ...
+
+ logger.Log().WithField("subscriber_id", subscriberID).Info("Cerberus logs WebSocket connected")
+
+ // Send connection confirmation immediately
+ _ = conn.WriteJSON(map[string]interface{}{
+ "type": "connected",
+ "timestamp": time.Now().Format(time.RFC3339),
+ })
+
+ // ... rest unchanged ...
+}
+```
+
+### File 4: [backend/internal/api/handlers/crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go)
+
+**Change:** Add setting reconciliation hint in Status response
+
+```go
+func (h *CrowdsecHandler) Status(c *gin.Context) {
+ ctx := c.Request.Context()
+ running, pid, err := h.Executor.Status(ctx, h.DataDir)
+ if err != nil {
+ c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
+ return
+ }
+
+ // Check setting state
+ settingEnabled := false
+ if h.DB != nil {
+ var setting models.Setting
+ if err := h.DB.Where("key = ?", "security.crowdsec.enabled").First(&setting).Error; err == nil {
+ settingEnabled = strings.EqualFold(strings.TrimSpace(setting.Value), "true")
+ }
+ }
+
+ // Check LAPI connectivity if process is running
+ lapiReady := false
+ if running {
+ // ... existing LAPI check ...
+ }
+
+ c.JSON(http.StatusOK, gin.H{
+ "running": running,
+ "pid": pid,
+ "lapi_ready": lapiReady,
+ "setting_enabled": settingEnabled,
+ "needs_start": settingEnabled && !running,
+ })
+}
+```
+
+---
+
+## Testing Steps
+
+### Test Issue 1: CrowdSec Status Consistency
+
+1. Start container fresh
+2. Check Security dashboard - should show CrowdSec as "Disabled"
+3. Toggle CrowdSec on - should start process and show "Running"
+4. Restart container
+5. Check Security dashboard - should surface the `needs_start` hint (prompt to restart, or auto-start if implemented)
+
+### Test Issue 2: Stop CrowdSec Without Error
+
+1. With CrowdSec not running, try to stop via UI toggle
+2. Should NOT return 500 error
+3. Should return success or "already stopped"
+4. Check logs for graceful handling
+
+### Test Issue 3: Live Logs Connection
+
+1. Start container fresh
+2. Navigate to Cerberus Dashboard
+3. Live Log Viewer should show "Connected" status
+4. Make a request to trigger log entry
+5. Entry should appear in viewer
+
+### Integration Test
+
+```bash
+# Run in container
+cd /projects/Charon/backend
+go test ./internal/api/handlers/... -run TestCrowdsec -v
+```
+
+---
+
+## Debug Commands
+
+```bash
+# Check if CrowdSec PID file exists
+ls -la /app/data/crowdsec/crowdsec.pid
+
+# Check CrowdSec process status
+pgrep -la crowdsec
+
+# Check access log file
+ls -la /var/log/caddy/access.log
+
+# Test LAPI health
+curl http://127.0.0.1:8085/health
+
+# Check WebSocket endpoint
+# In browser console:
+# new WebSocket('ws://localhost:8080/api/v1/cerberus/logs/ws')
+```
+
+---
+
+## Conclusion
+
+All three issues stem from **state synchronization problems** after container restart:
+
+1. **CrowdSec**: Database setting doesn't match process state
+2. **Stop Error**: Handler assumes PID file exists when it may not
+3. **Live Logs**: Log file may not exist, causing LogWatcher to wait indefinitely
+
+The fixes are defensive programming patterns:
+- Handle missing PID file gracefully
+- Create log files if they don't exist
+- Add reconciliation hints in status responses
+- Send WebSocket heartbeats immediately on connect
+
+---
+
+## Commit Message Template
+
+```
+fix: handle container restart edge cases for CrowdSec and Live Logs
+
+Issue 1 - CrowdSec "not running" status:
+- Add setting_enabled and needs_start fields to Status() response
+- Frontend can now show proper "needs restart" state
+
+Issue 2 - 500 error on Stop:
+- Handle missing PID file gracefully in Stop()
+- Fallback to pkill if PID file doesn't exist
+- Return success if process already stopped
+
+Issue 3 - Live Logs disconnected:
+- Create log file if it doesn't exist on LogWatcher.Start()
+- Send WebSocket connection confirmation immediately
+- Ensure clients know connection is alive before first log entry
+
+All fixes are defensive programming patterns for container restart scenarios.
+```