# CrowdSec Toggle Integration Fix Plan

**Date**: December 15, 2025
**Issue**: CrowdSec toggle stuck ON, reconciliation silently exits, process not starting
**Root Cause**: Database disconnect between frontend (Settings table) and reconciliation (SecurityConfig table)

---
## Executive Summary

The CrowdSec toggle shows "ON" but the process is NOT running. The reconciliation function silently exits without starting CrowdSec because:

1. **Frontend writes to Settings table** (`security.crowdsec.enabled`)
2. **Backend reconciliation reads from SecurityConfig table** (`crowdsec_mode = "local"`)
3. **No synchronization** between the two tables
4. **Auto-initialization code EXISTS** (lines 46-71 in crowdsec_startup.go) but creates config with `crowdsec_mode = "disabled"`
5. **Reconciliation sees "disabled"** and exits silently with no logs

---
## Root Cause Analysis (DETAILED)

### Evidence Trail

**Container Logs Show Silent Exit**:
```
{"bin_path":"crowdsec","data_dir":"/app/data/crowdsec","level":"info","msg":"CrowdSec reconciliation: starting startup check","time":"2025-12-14T23:32:33-05:00"}
[NO FURTHER LOGS - Function exited here]
```

**Database State on Fresh Start**:
```
SELECT * FROM security_configs → record not found
{"level":"info","msg":"CrowdSec reconciliation: no SecurityConfig found, creating default config"}
```

**Process Check**:
```bash
$ docker exec charon ps aux | grep -i crowdsec
[NO RESULTS - Process not running]
```
### Why Reconciliation Exits Silently

**FILE**: `backend/internal/services/crowdsec_startup.go`

**Execution Flow**:
```
1. User clicks toggle ON in Security.tsx
2. Frontend calls updateSetting('security.crowdsec.enabled', 'true')
3. Settings table updated → security.crowdsec.enabled = "true"
4. Frontend calls startCrowdsec() → Handler updates SecurityConfig
5. CrowdSec starts successfully, toggle shows ON
6. Container restarts (docker restart or reboot)
7. ReconcileCrowdSecOnStartup() executes at line 26:

   Line 44: db.First(&cfg) → returns gorm.ErrRecordNotFound

   Lines 46-71: Auto-initialization block executes:
   - Creates SecurityConfig with crowdsec_mode = "disabled"
   - Logs "default SecurityConfig created successfully"
   - Returns early (line 70) WITHOUT checking Settings table
   - CrowdSec is NEVER started

Result: Toggle shows "ON" (Settings table), but process is "OFF" (not running)
```
**THE BUG (Lines 46-71)**:
```go
if err == gorm.ErrRecordNotFound {
    // AUTO-INITIALIZE: Create default SecurityConfig on first startup
    logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, creating default config")

    defaultCfg := models.SecurityConfig{
        UUID: "default",
        Name: "Default Security Config",
        Enabled: false,
        CrowdSecMode: "disabled", // ← PROBLEM: Ignores Settings table state
        WAFMode: "disabled",
        WAFParanoiaLevel: 1,
        RateLimitMode: "disabled",
        RateLimitBurst: 10,
        RateLimitRequests: 100,
        RateLimitWindowSec: 60,
    }

    if err := db.Create(&defaultCfg).Error; err != nil {
        logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
        return
    }

    logger.Log().Info("CrowdSec reconciliation: default SecurityConfig created successfully")
    // Don't start CrowdSec on fresh install - user must enable via UI
    return // ← EXITS WITHOUT checking Settings table or starting process
}
```
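The early return can be reproduced without a database. The following is a minimal sketch, not the project's code: `store` and `buggyReconcile` are hypothetical names, and an in-memory map stands in for the Settings table while a nil pointer models an empty `security_configs` table.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical in-memory stand-in for the two tables.
type store struct {
	settings map[string]string // Settings table: key -> value
	mode     *string           // SecurityConfig.crowdsec_mode; nil means no record
}

// buggyReconcile models the current flow: when no SecurityConfig exists it
// creates a "disabled" record and returns before reading Settings.
func buggyReconcile(s *store) (started bool) {
	if s.mode == nil {
		disabled := "disabled"
		s.mode = &disabled
		return false // early return: Settings is never consulted
	}
	enabled := strings.EqualFold(s.settings["security.crowdsec.enabled"], "true")
	return *s.mode == "local" || enabled
}

func main() {
	// The user enabled the toggle earlier, but the SecurityConfig record is missing.
	s := &store{settings: map[string]string{"security.crowdsec.enabled": "true"}}
	started := buggyReconcile(s)

	// Prints: toggle=true mode=disabled started=false
	fmt.Printf("toggle=%s mode=%s started=%t\n",
		s.settings["security.crowdsec.enabled"], *s.mode, started)
}
```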
**Why This Causes the Issue**:

1. **First Container Start**: User enables CrowdSec via toggle
   - Settings: `security.crowdsec.enabled = "true"` ✅
   - SecurityConfig: `crowdsec_mode = "local"` ✅ (via Start handler)
   - Process: Running ✅

2. **Container Restart**: Database persists but SecurityConfig table may be empty (migration issue or corruption)
   - Reconciliation runs
   - SecurityConfig table: **EMPTY** (record lost or never migrated)
   - Auto-init creates SecurityConfig with `crowdsec_mode = "disabled"`
   - Returns early without checking Settings table
   - Settings: Still shows `"true"` (UI says ON)
   - SecurityConfig: Says `"disabled"` (reconciliation source)
   - Process: NOT started ❌

3. **Result**: **State Mismatch**
   - Frontend toggle: **ON** (reads Settings table)
   - Backend reconciliation: **OFF** (reads SecurityConfig table)
   - Process: **NOT RUNNING** (reconciliation didn't start it)

---
## Current Code Analysis

### 1. Reconciliation Function (crowdsec_startup.go)

**Location**: `backend/internal/services/crowdsec_startup.go`

**Lines 44-71 (Auto-initialization - THE BUG)**:
```go
var cfg models.SecurityConfig
if err := db.First(&cfg).Error; err != nil {
    if err == gorm.ErrRecordNotFound {
        // AUTO-INITIALIZE: Create default SecurityConfig on first startup
        logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, creating default config")

        defaultCfg := models.SecurityConfig{
            UUID: "default",
            Name: "Default Security Config",
            Enabled: false,
            CrowdSecMode: "disabled", // ← IGNORES Settings table
            WAFMode: "disabled",
            WAFParanoiaLevel: 1,
            RateLimitMode: "disabled",
            RateLimitBurst: 10,
            RateLimitRequests: 100,
            RateLimitWindowSec: 60,
        }

        if err := db.Create(&defaultCfg).Error; err != nil {
            logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
            return
        }

        logger.Log().Info("CrowdSec reconciliation: default SecurityConfig created successfully")
        // Don't start CrowdSec on fresh install - user must enable via UI
        return // ← EARLY EXIT - Never checks Settings table
    }
    logger.Log().WithError(err).Warn("CrowdSec reconciliation: failed to read SecurityConfig")
    return
}
```
**Lines 74-90 (Runtime Setting Override - UNREACHABLE after auto-init)**:
```go
// Also check for runtime setting override in settings table
var settingOverride struct{ Value string }
crowdSecEnabled := false
if err := db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&settingOverride).Error; err == nil && settingOverride.Value != "" {
    crowdSecEnabled = strings.EqualFold(settingOverride.Value, "true")
    logger.Log().WithFields(map[string]interface{}{
        "setting_value": settingOverride.Value,
        "crowdsec_enabled": crowdSecEnabled,
    }).Debug("CrowdSec reconciliation: found runtime setting override")
}
```

**This code is NEVER REACHED** when SecurityConfig doesn't exist because line 70 returns early!

**Lines 91-98 (Decision Logic)**:
```go
// Only auto-start if CrowdSecMode is "local" OR runtime setting is enabled
if cfg.CrowdSecMode != "local" && !crowdSecEnabled {
    logger.Log().WithFields(map[string]interface{}{
        "db_mode": cfg.CrowdSecMode,
        "setting_enabled": crowdSecEnabled,
    }).Debug("CrowdSec reconciliation skipped: mode is not 'local' and setting not enabled")
    return
}
```

**Also UNREACHABLE** during auto-init scenario!
### 2. Start Handler (crowdsec_handler.go)

**Location**: `backend/internal/api/handlers/crowdsec_handler.go`

**Lines 167-192 - CORRECT IMPLEMENTATION**:
```go
func (h *CrowdsecHandler) Start(c *gin.Context) {
    ctx := c.Request.Context()

    // UPDATE SecurityConfig to persist user's intent
    var cfg models.SecurityConfig
    if err := h.DB.First(&cfg).Error; err != nil {
        if err == gorm.ErrRecordNotFound {
            // Create default config with CrowdSec enabled
            cfg = models.SecurityConfig{
                UUID: "default",
                Name: "Default Security Config",
                Enabled: true,
                CrowdSecMode: "local", // ← CORRECT: Sets mode to "local"
            }
            if err := h.DB.Create(&cfg).Error; err != nil {
                logger.Log().WithError(err).Error("Failed to create SecurityConfig")
                c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist configuration"})
                return
            }
        } else {
            logger.Log().WithError(err).Error("Failed to read SecurityConfig")
            c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read configuration"})
            return
        }
    } else {
        // Update existing config
        cfg.CrowdSecMode = "local"
        cfg.Enabled = true
        if err := h.DB.Save(&cfg).Error; err != nil {
            logger.Log().WithError(err).Error("Failed to update SecurityConfig")
            c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist configuration"})
            return
        }
    }

    // Start the process...
}
```

**Analysis**: This is CORRECT. The Start handler properly updates SecurityConfig when user clicks "Start" from the CrowdSec config page (/security/crowdsec).
### 3. Frontend Toggle (Security.tsx)

**Location**: `frontend/src/pages/Security.tsx`

**Lines 64-120 - THE DISCONNECT**:
```tsx
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    // Step 1: Update Settings table
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')

    if (enabled) {
      // Step 2: Call Start() which updates SecurityConfig
      const result = await startCrowdsec()

      // Step 3: Verify running
      const status = await statusCrowdsec()
      if (!status.running) {
        await updateSetting('security.crowdsec.enabled', 'false', 'security', 'bool')
        throw new Error('CrowdSec process failed to start')
      }

      return result
    } else {
      // Step 2: Call Stop(), which updates SecurityConfig only if a record already exists
      await stopCrowdsec()

      // Step 3: Verify stopped
      await new Promise(resolve => setTimeout(resolve, 500))
      const status = await statusCrowdsec()
      if (status.running) {
        throw new Error('CrowdSec process still running')
      }

      return { enabled: false }
    }
  },
})
```
**Analysis**:
- **Enable Path**: Updates Settings → Calls Start() → Start() updates SecurityConfig → ✅ Both tables synced
- **Disable Path**: Updates Settings → Calls Stop() → Stop() **does NOT always update SecurityConfig** → ❌ Tables out of sync

Looking at the Stop handler:
```go
func (h *CrowdsecHandler) Stop(c *gin.Context) {
    ctx := c.Request.Context()
    if err := h.Executor.Stop(ctx, h.DataDir); err != nil {
        c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
        return
    }

    // UPDATE SecurityConfig to persist user's intent
    var cfg models.SecurityConfig
    if err := h.DB.First(&cfg).Error; err == nil {
        cfg.CrowdSecMode = "disabled"
        cfg.Enabled = false
        if err := h.DB.Save(&cfg).Error; err != nil {
            logger.Log().WithError(err).Warn("Failed to update SecurityConfig after stopping CrowdSec")
        }
    }

    c.JSON(http.StatusOK, gin.H{"status": "stopped"})
}
```
**This IS CORRECT** - Stop() handler updates SecurityConfig when it can find it. BUT:

**Scenario Where It Fails**:
1. SecurityConfig table gets corrupted/cleared/migrated incorrectly
2. User clicks toggle OFF
3. Stop() tries to update SecurityConfig → record not found → skips update
4. Settings table still updated to "false"
5. Container restarts → auto-init creates SecurityConfig with "disabled"
6. Both tables say "disabled" but UI might show stale state

---
## Comprehensive Fix Strategy

### Phase 1: Fix Auto-Initialization (CRITICAL - IMMEDIATE)

**FILE**: `backend/internal/services/crowdsec_startup.go`

**CHANGE**: Lines 46-71 (auto-initialization block)

**AFTER** (with Settings table check):
```go
if err == gorm.ErrRecordNotFound {
    // AUTO-INITIALIZE: Create default SecurityConfig by checking Settings table
    logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, checking Settings table for user preference")

    // Check if user has already enabled CrowdSec via Settings table (from toggle or legacy config)
    var settingOverride struct{ Value string }
    crowdSecEnabledInSettings := false
    if err := db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&settingOverride).Error; err == nil && settingOverride.Value != "" {
        crowdSecEnabledInSettings = strings.EqualFold(settingOverride.Value, "true")
        logger.Log().WithFields(map[string]interface{}{
            "setting_value": settingOverride.Value,
            "enabled": crowdSecEnabledInSettings,
        }).Info("CrowdSec reconciliation: found existing Settings table preference")
    }

    // Create SecurityConfig that matches Settings table state
    crowdSecMode := "disabled"
    if crowdSecEnabledInSettings {
        crowdSecMode = "local"
    }

    defaultCfg := models.SecurityConfig{
        UUID: "default",
        Name: "Default Security Config",
        Enabled: crowdSecEnabledInSettings,
        CrowdSecMode: crowdSecMode, // ← NOW RESPECTS Settings table
        WAFMode: "disabled",
        WAFParanoiaLevel: 1,
        RateLimitMode: "disabled",
        RateLimitBurst: 10,
        RateLimitRequests: 100,
        RateLimitWindowSec: 60,
    }

    if err := db.Create(&defaultCfg).Error; err != nil {
        logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
        return
    }

    logger.Log().WithFields(map[string]interface{}{
        "crowdsec_mode": defaultCfg.CrowdSecMode,
        "enabled": defaultCfg.Enabled,
        "source": "settings_table",
    }).Info("CrowdSec reconciliation: default SecurityConfig created from Settings preference")

    // Continue to process the config (DON'T return early)
    cfg = defaultCfg
} else {
    // Any other read error still aborts reconciliation. The existing Warn + return
    // must live in this else branch; left after the block, the auto-init path
    // would fall through to it and exit anyway.
    logger.Log().WithError(err).Warn("CrowdSec reconciliation: failed to read SecurityConfig")
    return
}
```
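The mapping from the Settings value to the SecurityConfig mode is the core of this change, and it may be easier to unit test once pulled out of the database code. A minimal sketch, assuming a hypothetical helper named `crowdSecModeFromSetting` (not present in the codebase) that receives the raw value and whether a row was found:

```go
package main

import (
	"fmt"
	"strings"
)

// crowdSecModeFromSetting is a hypothetical helper. It maps the raw value of
// security.crowdsec.enabled to the mode and enabled flag that auto-initialization
// would write into the new SecurityConfig record.
func crowdSecModeFromSetting(value string, found bool) (mode string, enabled bool) {
	// A missing row or an empty value means the user never touched the toggle.
	if !found || value == "" {
		return "disabled", false
	}
	// Same case-insensitive comparison the reconciliation code uses.
	if strings.EqualFold(value, "true") {
		return "local", true
	}
	return "disabled", false
}

func main() {
	for _, v := range []string{"true", "TRUE", "false", ""} {
		mode, enabled := crowdSecModeFromSetting(v, v != "")
		fmt.Printf("%q -> mode=%s enabled=%t\n", v, mode, enabled)
	}
}
```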
**KEY CHANGES**:
1. **Check Settings table** during auto-initialization
2. **Create SecurityConfig matching Settings state** (not hardcoded "disabled")
3. **Don't return early** - let the rest of the function process the config
4. **Assign to cfg variable** so flow continues to line 74+
### Phase 2: Enhance Logging (IMMEDIATE)

**FILE**: `backend/internal/services/crowdsec_startup.go`

**CHANGE**: Lines 91-98 (decision logic - better logging)

**AFTER**:
```go
// Start when EITHER SecurityConfig has mode="local" OR Settings table has enabled=true
// Exit only when BOTH are disabled
if cfg.CrowdSecMode != "local" && !crowdSecEnabled {
    logger.Log().WithFields(map[string]interface{}{
        "db_mode": cfg.CrowdSecMode,
        "setting_enabled": crowdSecEnabled,
    }).Info("CrowdSec reconciliation skipped: both SecurityConfig and Settings indicate disabled")
    return
}

// Log which source triggered the start
if cfg.CrowdSecMode == "local" {
    logger.Log().WithField("mode", cfg.CrowdSecMode).Info("CrowdSec reconciliation: starting based on SecurityConfig mode='local'")
} else if crowdSecEnabled {
    logger.Log().WithField("setting", "true").Info("CrowdSec reconciliation: starting based on Settings table override")
}
```
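The decision and its source attribution form a small truth table, sketched here as a pure function so each branch can be checked independently of logging. The `decision` type and `decideStart` function are illustrative names, not part of the existing code:

```go
package main

import "fmt"

// decision is a hypothetical value describing the reconciliation outcome.
type decision struct {
	Start  bool
	Source string // "security_config", "settings_override", or "none"
}

// decideStart mirrors the condition above: start when either source says
// enabled, and report which source triggered it. SecurityConfig wins the
// attribution when both are enabled, matching the if / else-if order.
func decideStart(dbMode string, settingEnabled bool) decision {
	switch {
	case dbMode == "local":
		return decision{Start: true, Source: "security_config"}
	case settingEnabled:
		return decision{Start: true, Source: "settings_override"}
	default:
		return decision{Start: false, Source: "none"}
	}
}

func main() {
	cases := []struct {
		mode    string
		setting bool
	}{
		{"local", false},
		{"local", true},
		{"disabled", true},
		{"disabled", false},
	}
	for _, c := range cases {
		d := decideStart(c.mode, c.setting)
		fmt.Printf("db_mode=%-8s setting_enabled=%-5t -> start=%-5t source=%s\n",
			c.mode, c.setting, d.Start, d.Source)
	}
}
```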
**KEY CHANGES**:
1. **Change log level** from Debug to Info (so we see it in logs)
2. **Add source attribution** (which table triggered the start)
3. **Clarify condition** (exit only when BOTH are disabled)
### Phase 3: Add Unified Toggle Endpoint (OPTIONAL BUT RECOMMENDED)

**WHY**: Currently the toggle updates Settings, then calls Start/Stop which updates SecurityConfig. This creates potential race conditions. A unified endpoint is safer.

**FILE**: `backend/internal/api/handlers/crowdsec_handler.go`

**ADD**: New method (after Stop(), around line 260)

```go
// ToggleCrowdSec enables or disables CrowdSec, synchronizing Settings and SecurityConfig atomically
func (h *CrowdsecHandler) ToggleCrowdSec(c *gin.Context) {
    var payload struct {
        Enabled bool `json:"enabled"`
    }
    if err := c.ShouldBindJSON(&payload); err != nil {
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid payload"})
        return
    }

    logger.Log().WithField("enabled", payload.Enabled).Info("CrowdSec toggle: received request")

    // Use a transaction to ensure Settings and SecurityConfig stay in sync
    tx := h.DB.Begin()
    defer func() {
        if r := recover(); r != nil {
            tx.Rollback()
        }
    }()

    // STEP 1: Update Settings table
    settingKey := "security.crowdsec.enabled"
    settingValue := "false"
    if payload.Enabled {
        settingValue = "true"
    }

    // Look the row up by key only. Passing the full struct to FirstOrCreate would
    // also match on Value, so an existing row holding the opposite value would not
    // be found. Attrs supplies the columns used only when the row must be created.
    var settingModel models.Setting
    if err := tx.Where("key = ?", settingKey).Attrs(models.Setting{
        Key: settingKey,
        Value: settingValue,
        Type: "bool",
        Category: "security",
    }).FirstOrCreate(&settingModel).Error; err != nil {
        tx.Rollback()
        logger.Log().WithError(err).Error("CrowdSec toggle: failed to update Settings table")
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to update settings"})
        return
    }
    settingModel.Value = settingValue
    if err := tx.Save(&settingModel).Error; err != nil {
        tx.Rollback()
        logger.Log().WithError(err).Error("CrowdSec toggle: failed to save Settings table")
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to update settings"})
        return
    }

    // STEP 2: Update SecurityConfig table
    var cfg models.SecurityConfig
    if err := tx.First(&cfg).Error; err != nil {
        if err == gorm.ErrRecordNotFound {
            // Create config matching toggle state
            crowdSecMode := "disabled"
            if payload.Enabled {
                crowdSecMode = "local"
            }

            cfg = models.SecurityConfig{
                UUID: "default",
                Name: "Default Security Config",
                Enabled: payload.Enabled,
                CrowdSecMode: crowdSecMode,
                WAFMode: "disabled",
                WAFParanoiaLevel: 1,
                RateLimitMode: "disabled",
                RateLimitBurst: 10,
                RateLimitRequests: 100,
                RateLimitWindowSec: 60,
            }
            if err := tx.Create(&cfg).Error; err != nil {
                tx.Rollback()
                logger.Log().WithError(err).Error("CrowdSec toggle: failed to create SecurityConfig")
                c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist configuration"})
                return
            }
        } else {
            tx.Rollback()
            logger.Log().WithError(err).Error("CrowdSec toggle: failed to read SecurityConfig")
            c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to read configuration"})
            return
        }
    } else {
        // Update existing config
        if payload.Enabled {
            cfg.CrowdSecMode = "local"
            cfg.Enabled = true
        } else {
            cfg.CrowdSecMode = "disabled"
            cfg.Enabled = false
        }
        if err := tx.Save(&cfg).Error; err != nil {
            tx.Rollback()
            logger.Log().WithError(err).Error("CrowdSec toggle: failed to update SecurityConfig")
            c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist configuration"})
            return
        }
    }

    // Commit the transaction before starting/stopping process
    if err := tx.Commit().Error; err != nil {
        logger.Log().WithError(err).Error("CrowdSec toggle: transaction commit failed")
        c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to commit changes"})
        return
    }

    logger.Log().WithFields(map[string]interface{}{
        "enabled": cfg.Enabled,
        "crowdsec_mode": cfg.CrowdSecMode,
    }).Info("CrowdSec toggle: synchronized Settings and SecurityConfig successfully")

    // STEP 3: Start or stop the process
    ctx := c.Request.Context()
    if payload.Enabled {
        // Start CrowdSec
        pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
        if err != nil {
            logger.Log().WithError(err).Error("CrowdSec toggle: failed to start process, reverting DB changes")

            // Revert both tables (in new transaction)
            revertTx := h.DB.Begin()
            cfg.CrowdSecMode = "disabled"
            cfg.Enabled = false
            revertTx.Save(&cfg)
            settingModel.Value = "false"
            revertTx.Save(&settingModel)
            if err := revertTx.Commit().Error; err != nil {
                logger.Log().WithError(err).Error("CrowdSec toggle: failed to revert DB changes")
            }

            c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
            return
        }

        // Wait for LAPI readiness
        lapiReady := false
        maxWait := 30 * time.Second
        pollInterval := 500 * time.Millisecond
        deadline := time.Now().Add(maxWait)

        for time.Now().Before(deadline) {
            args := []string{"lapi", "status"}
            if _, err := os.Stat(filepath.Join(h.DataDir, "config.yaml")); err == nil {
                args = append([]string{"-c", filepath.Join(h.DataDir, "config.yaml")}, args...)
            }

            checkCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
            _, err := h.CmdExec.Execute(checkCtx, "cscli", args...)
            cancel()

            if err == nil {
                lapiReady = true
                break
            }

            time.Sleep(pollInterval)
        }

        logger.Log().WithFields(map[string]interface{}{
            "pid": pid,
            "lapi_ready": lapiReady,
        }).Info("CrowdSec toggle: started successfully")

        c.JSON(http.StatusOK, gin.H{
            "enabled": true,
            "pid": pid,
            "lapi_ready": lapiReady,
        })
        return
    } else {
        // Stop CrowdSec
        if err := h.Executor.Stop(ctx, h.DataDir); err != nil {
            logger.Log().WithError(err).Error("CrowdSec toggle: failed to stop process")
            c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
            return
        }

        logger.Log().Info("CrowdSec toggle: stopped successfully")
        c.JSON(http.StatusOK, gin.H{"enabled": false})
        return
    }
}
```
**Register Route**:
```go
// In RegisterRoutes() method
rg.POST("/admin/crowdsec/toggle", h.ToggleCrowdSec)
```

**Frontend API Client** (`frontend/src/api/crowdsec.ts`):
```typescript
export async function toggleCrowdsec(enabled: boolean): Promise<{ enabled: boolean; pid?: number; lapi_ready?: boolean }> {
  const response = await client.post('/admin/crowdsec/toggle', { enabled })
  return response.data
}
```

**Frontend Toggle Update** (`frontend/src/pages/Security.tsx`):
```tsx
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    if (enabled) {
      toast.info('Starting CrowdSec... This may take up to 30 seconds')
    }

    // Use unified toggle endpoint (handles Settings + SecurityConfig + Process)
    const result = await toggleCrowdsec(enabled)

    // Backend already verified state, just do final status check
    const status = await statusCrowdsec()
    if (enabled && !status.running) {
      throw new Error('CrowdSec process failed to start. Check server logs for details.')
    }
    if (!enabled && status.running) {
      throw new Error('CrowdSec process still running. Check server logs for details.')
    }

    return result
  },
  // ... rest remains the same
})
```
---

## Testing Plan

### Test 1: Fresh Install

**Scenario**: Brand new Charon installation

1. Start container: `docker compose up -d`
2. Navigate to Security page
3. Verify CrowdSec toggle shows OFF
4. Check status: `curl http://localhost:8080/api/v1/admin/crowdsec/status`
   - Expected: `{"running": false}`
5. Check logs: `docker logs charon 2>&1 | grep "reconciliation"`
   - Expected: "no SecurityConfig found, checking Settings table"
   - Expected: "default SecurityConfig created from Settings preference"
   - Expected: "crowdsec_mode: disabled"

### Test 2: Toggle ON → Container Restart

**Scenario**: User enables CrowdSec, then restarts container

1. Enable toggle in UI (click ON)
2. Verify CrowdSec starts
3. Check status: `{"running": true, "pid": xxx}`
4. Restart: `docker restart charon`
5. Wait 10 seconds
6. Check status again: `{"running": true, "pid": xxx}` (NEW PID)
7. Check logs:
   - Expected: "starting based on SecurityConfig mode='local'"

### Test 3: Legacy Migration (Settings Table Only)

**Scenario**: Existing install with Settings table but no SecurityConfig

1. Manually set: `INSERT INTO settings (key, value, type, category) VALUES ('security.crowdsec.enabled', 'true', 'bool', 'security');`
2. Delete SecurityConfig: `DELETE FROM security_configs;`
3. Restart container
4. Check logs:
   - Expected: "found existing Settings table preference"
   - Expected: "default SecurityConfig created from Settings preference"
   - Expected: "crowdsec_mode: local"
5. Check status: `{"running": true}`

### Test 4: Toggle OFF → Container Restart

**Scenario**: User disables CrowdSec, then restarts container

1. Start with CrowdSec enabled and running
2. Click toggle OFF in UI
3. Verify process stops
4. Restart: `docker restart charon`
5. Wait 10 seconds
6. Check status: `{"running": false}`
7. Verify toggle still shows OFF

### Test 5: Corrupted SecurityConfig Recovery

**Scenario**: SecurityConfig gets deleted but Settings exists

1. Enable CrowdSec via UI
2. Manually delete SecurityConfig: `DELETE FROM security_configs;`
3. Restart container
4. Verify auto-init recreates SecurityConfig matching Settings table
5. Verify CrowdSec auto-starts

---
## Verification Checklist

### Phase 1 (Auto-Initialization Fix)

- [ ] Modified `crowdsec_startup.go` lines 46-71
- [ ] Auto-init checks Settings table for existing preference
- [ ] Auto-init creates SecurityConfig matching Settings state
- [ ] Auto-init does NOT return early (continues to line 74+)
- [ ] Test 1 (Fresh Install) passes
- [ ] Test 3 (Legacy Migration) passes

### Phase 2 (Logging Enhancement)

- [ ] Modified `crowdsec_startup.go` lines 91-98
- [ ] Changed log level from Debug to Info
- [ ] Added source attribution logging
- [ ] Test 2 (Toggle ON → Restart) shows correct log
- [ ] Test 4 (Toggle OFF → Restart) shows correct log

### Phase 3 (Unified Toggle - Optional)

- [ ] Added `ToggleCrowdSec()` method to `crowdsec_handler.go`
- [ ] Registered `/admin/crowdsec/toggle` route
- [ ] Added `toggleCrowdsec()` to `crowdsec.ts`
- [ ] Updated `crowdsecPowerMutation` in `Security.tsx`
- [ ] Test 4 (Toggle OFF → Restart) passes with both tables in sync
- [ ] Test 5 (Corrupted recovery) passes

### Pre-Deployment

- [ ] Pre-commit linters pass: `pre-commit run --all-files`
- [ ] Backend tests pass: `cd backend && go test ./...`
- [ ] Frontend tests pass: `cd frontend && npm run test`
- [ ] Docker build succeeds: `docker build -t charon:local .`
- [ ] Integration test passes: `scripts/crowdsec_integration.sh`

---
## Success Criteria

✅ **Fix is complete when**:

1. Toggle shows correct state (ON = running, OFF = stopped)
2. Toggle persists across container restarts
3. Reconciliation logs clearly show decision reason
4. Auto-initialization respects Settings table preference
5. No "stuck toggle" scenarios
6. All 5 test cases pass
7. Pre-commit checks pass
8. No regressions in existing CrowdSec functionality

---
## Risk Assessment

| Change | Risk Level | Mitigation |
|--------|------------|------------|
| Phase 1 (Auto-init) | **Low** | Only affects fresh installs or corrupted state recovery |
| Phase 2 (Logging) | **Very Low** | Only changes log output, no logic changes |
| Phase 3 (Unified toggle) | **Medium** | New endpoint, requires thorough testing, but backward compatible |

---
## Rollback Plan

If issues arise:

1. **Immediate Revert**: `git revert <commit-hash>` (no DB changes needed)
2. **Manual Fix** (if toggle stuck):
   ```sql
   -- Reset SecurityConfig
   UPDATE security_configs
   SET crowdsec_mode = 'disabled', enabled = 0
   WHERE uuid = 'default';

   -- Reset Settings
   UPDATE settings
   SET value = 'false'
   WHERE key = 'security.crowdsec.enabled';
   ```
3. **Force Stop CrowdSec**: `docker exec charon pkill -SIGTERM crowdsec`

---
## Dependency Impact Analysis

### Phase 1: Auto-Initialization Changes (crowdsec_startup.go)

#### Files Directly Modified
- `backend/internal/services/crowdsec_startup.go` (lines 46-71)

#### Dependencies and Required Updates

**1. Unit Tests - MUST BE UPDATED**
- **File**: `backend/internal/services/crowdsec_startup_test.go`
- **Impact**: Test `TestReconcileCrowdSecOnStartup_NoSecurityConfig` expects the function to skip/return early when no SecurityConfig exists
- **Required Change**: Update test to:
  - Create a Settings table entry with `security.crowdsec.enabled = 'true'`
  - Verify that SecurityConfig is auto-created with `crowdsec_mode = "local"`
  - Verify that CrowdSec process is started (not skipped)
- **Additional Tests Needed**:
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_SettingsDisabled` - Settings='false' → creates config with mode="disabled", does NOT start
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_SettingsEnabled` - Settings='true' → creates config with mode="local", DOES start
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_NoSettingsEntry` - No Settings entry → creates config with mode="disabled", does NOT start

**2. Integration Tests - VERIFICATION NEEDED**
- **Files**:
  - `scripts/crowdsec_integration.sh`
  - `scripts/crowdsec_startup_test.sh`
  - `scripts/crowdsec_decision_integration.sh`
- **Impact**: These scripts may assume specific startup behavior
- **Verification Required**:
  - Do any scripts pre-populate Settings table?
  - Do any scripts expect reconciliation to skip on fresh DB?
  - Do any scripts verify log output from reconciliation?
- **Action**: Review scripts for assumptions about auto-initialization behavior

**3. Migration/Upgrade Path - DATABASE CONCERN**
- **Scenario**: Existing installations with Settings='true' but missing SecurityConfig
- **Impact**: After upgrade, reconciliation will auto-create SecurityConfig from Settings (POSITIVE)
- **Risk**: Low - this is the intended fix
- **Documentation**: Should document this as expected behavior in migration guide

**4. Models - NO CHANGES REQUIRED**
- **File**: `backend/internal/models/security_config.go`
- **Analysis**: SecurityConfig model structure unchanged
- **File**: `backend/internal/models/setting.go`
- **Analysis**: Setting model structure unchanged

**5. Route Registration - NO CHANGES REQUIRED**
- **File**: `backend/internal/api/routes/routes.go` (line 360)
- **Analysis**: Already calls `ReconcileCrowdSecOnStartup`, no signature changes

**6. Handler Dependencies - NO CHANGES REQUIRED**
- **File**: `backend/internal/api/handlers/crowdsec_handler.go`
- **Analysis**: Start/Stop handlers operate independently, no coupling to reconciliation logic
### Phase 2: Logging Enhancement Changes (crowdsec_startup.go)

#### Files Directly Modified
- `backend/internal/services/crowdsec_startup.go` (lines 91-98)

#### Dependencies and Required Updates

**1. Log Aggregation/Parsing - DOCUMENTATION UPDATE**
- **Concern**: Changing log level from Debug → Info increases log volume
- **Impact**:
  - Logs will now appear in production (Info is the default minimum level)
  - Log aggregation tools may need filter updates if they parse specific messages
- **Required**: Update any log parsing scripts or documentation about expected log output

**2. Integration Tests - POTENTIAL GREP PATTERNS**
- **Files**: `scripts/crowdsec_*.sh`
- **Impact**: If scripts `grep` for specific log messages, they may need updates
- **Action**: Search for log message expectations in scripts

**3. Documentation - UPDATE REQUIRED**
- **File**: `docs/features.md`
- **Section**: CrowdSec Integration (line 167+)
- **Required Change**: Add note about reconciliation behavior:
```markdown
#### Startup Behavior

CrowdSec automatically starts on container restart if:
- SecurityConfig has `crowdsec_mode = "local"` OR
- Settings table has `security.crowdsec.enabled = "true"`

Check container logs for reconciliation decisions:
- "CrowdSec reconciliation: starting based on SecurityConfig mode='local'"
- "CrowdSec reconciliation: starting based on Settings table override"
- "CrowdSec reconciliation skipped: both SecurityConfig and Settings indicate disabled"
```

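The startup rule and the three log messages documented above can be sketched as a single decision function. This is an illustration under stated assumptions, not the actual body of `ReconcileCrowdSecOnStartup`: `reconcileDecision` is a hypothetical name, and the precedence (SecurityConfig checked before Settings) is assumed from the order of the messages.

```go
package main

import "fmt"

// reconcileDecision is a hypothetical helper. Given the SecurityConfig mode
// and the security.crowdsec.enabled Settings value, it reports whether
// CrowdSec should start and which Info-level message explains the decision.
func reconcileDecision(crowdsecMode, settingEnabled string) (bool, string) {
	switch {
	case crowdsecMode == "local":
		return true, "CrowdSec reconciliation: starting based on SecurityConfig mode='local'"
	case settingEnabled == "true":
		// SecurityConfig says disabled, but the user's toggle says ON.
		return true, "CrowdSec reconciliation: starting based on Settings table override"
	default:
		return false, "CrowdSec reconciliation skipped: both SecurityConfig and Settings indicate disabled"
	}
}

func main() {
	start, msg := reconcileDecision("disabled", "true")
	fmt.Println(start, msg)
}
```

Every branch returns a message, so no path exits silently, which is the property the Phase 2 logging change is meant to guarantee.
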
**4. Troubleshooting Guide - UPDATE RECOMMENDED**
- **File**: `docs/troubleshooting/` (if exists) or `docs/security.md`
- **Required Change**: Add section on "CrowdSec Not Starting After Restart"
  - Explain reconciliation logic
  - Show how to check Settings and SecurityConfig tables
  - Show example log output

### Phase 3: Unified Toggle Endpoint (OPTIONAL)

#### Files Directly Modified
- `backend/internal/api/handlers/crowdsec_handler.go` (new method)
- `backend/internal/api/handlers/crowdsec_handler.go` (RegisterRoutes)
- `frontend/src/api/crowdsec.ts` (new function)
- `frontend/src/pages/Security.tsx` (mutation update)

#### Dependencies and Required Updates

**1. Handler Tests - NEW TESTS REQUIRED**
- **File**: `backend/internal/api/handlers/crowdsec_handler_test.go`
- **Required Tests**:
  - `TestCrowdsecHandler_Toggle_EnableSuccess`
  - `TestCrowdsecHandler_Toggle_DisableSuccess`
  - `TestCrowdsecHandler_Toggle_TransactionRollback` (if Start fails)
  - `TestCrowdsecHandler_Toggle_VerifyBothTablesUpdated`

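The behavior these four tests need to pin down can be sketched without a database. The example below is an in-memory illustration only: `toggleState`, `toggle`, and `errStartFailed` are hypothetical names, the `"false"` Settings value for the OFF state is an assumption, and the real handler would wrap both table writes in a database transaction rather than copy a struct.

```go
package main

import (
	"errors"
	"fmt"
)

// toggleState is a hypothetical stand-in for the two persisted values.
type toggleState struct {
	SettingEnabled string // Settings: security.crowdsec.enabled
	CrowdSecMode   string // SecurityConfig: crowdsec_mode
}

// errStartFailed simulates a failure to start the CrowdSec process.
var errStartFailed = errors.New("process start failed")

// toggle updates both values together, then starts or stops the process.
// If the process action fails, both values are restored to their previous
// state, mimicking a transaction rollback.
func toggle(state *toggleState, enable bool, startOrStop func(enable bool) error) error {
	previous := *state
	if enable {
		state.SettingEnabled, state.CrowdSecMode = "true", "local"
	} else {
		state.SettingEnabled, state.CrowdSecMode = "false", "disabled"
	}
	if err := startOrStop(enable); err != nil {
		*state = previous // roll back both values, never just one
		return fmt.Errorf("toggle failed, state rolled back: %w", err)
	}
	return nil
}

func main() {
	s := &toggleState{SettingEnabled: "false", CrowdSecMode: "disabled"}

	// Enable succeeds: both values change together.
	_ = toggle(s, true, func(bool) error { return nil })
	fmt.Println(s.SettingEnabled, s.CrowdSecMode)

	// Disable fails: both values stay at their previous state.
	err := toggle(s, false, func(bool) error { return errStartFailed })
	fmt.Println(err != nil, s.SettingEnabled, s.CrowdSecMode)
}
```

Injecting the start/stop action as a function argument is one way to make the rollback test deterministic; the real tests may prefer a mock executor if the handler already has one.
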
**2. Existing Handlers - DEPRECATION CONSIDERATION**
- **Files**:
  - Start handler (line ~167 in crowdsec_handler.go)
  - Stop handler (line ~260 in crowdsec_handler.go)
- **Impact**: New toggle endpoint duplicates Start/Stop functionality
- **Decision Required**:
  - **Option A**: Keep both for backward compatibility (RECOMMENDED)
  - **Option B**: Deprecate Start/Stop, add deprecation warnings
  - **Option C**: Remove Start/Stop entirely (BREAKING CHANGE - NOT RECOMMENDED)
- **Recommendation**: Keep Start/Stop handlers unchanged, document toggle as "preferred method"

**3. Frontend API Layer - MIGRATION PATH**
- **File**: `frontend/src/api/crowdsec.ts`
- **Current Exports**: `startCrowdsec`, `stopCrowdsec`, `statusCrowdsec`
- **After Change**: Add `toggleCrowdsec` to exports (line 75)
- **Backward Compatibility**: Keep existing functions, don't remove them

**4. Frontend Component - LIMITED SCOPE**
- **File**: `frontend/src/pages/Security.tsx`
- **Impact**: Only `crowdsecPowerMutation` needs updating (lines 86-125)
- **Other Components**: No other components import these functions (verified)
- **Risk**: Low - isolated change

**5. API Documentation - NEW ENDPOINT**
- **File**: `docs/api.md` (if exists)
- **Required Addition**: Document `/admin/crowdsec/toggle` endpoint

**6. Integration Tests - NEW TEST CASE**
- **Files**: `scripts/crowdsec_integration.sh`
- **Required Addition**: Test toggle endpoint directly

**7. Backward Compatibility - ANALYSIS**
- **Frontend**: Existing `/admin/crowdsec/start` and `/admin/crowdsec/stop` endpoints remain functional
- **API Consumers**: External tools using Start/Stop continue to work
- **Risk**: None - purely additive change

### Cross-Cutting Concerns

#### Database Migration
- **No schema changes required** - both Settings and SecurityConfig tables already exist
- **Data migration**: None needed - changes are behavioral only

#### Configuration Files
- **No changes required** - no new environment variables or config files

#### Docker/Deployment
- **No Dockerfile changes** - all changes are code-level
- **No docker-compose changes** - no new services or volumes

#### Security Implications
- **Phase 1**: Improves security by respecting user's intent across restarts
- **Phase 2**: No security impact (logging only)
- **Phase 3**: Transaction safety prevents partial updates (improvement)

#### Performance Considerations
- **Phase 1**: Adds one SQL query during auto-initialization (one-time, on startup)
- **Phase 2**: Minimal - only adds log statements
- **Phase 3**: Minimal - wraps existing logic in transaction

#### Rollback Safety
- **All phases**: No database schema changes, can be rolled back via git revert
- **Data safety**: No data loss risk - only affects process startup behavior

### Summary of Required File Updates

| Phase | Files to Modify | Files to Create | Tests to Add | Docs to Update |
|-------|----------------|-----------------|--------------|----------------|
| **Phase 1** | `crowdsec_startup.go` | None | 3 new unit tests | None (covered in Phase 2) |
| **Phase 2** | `crowdsec_startup.go` | None | None | `features.md`, troubleshooting docs |
| **Phase 3** | `crowdsec_handler.go`, `crowdsec.ts`, `Security.tsx` | None | 4 new handler tests | `api.md` (if exists) |

### Testing Matrix

| Scenario | Phase 1 | Phase 2 | Phase 3 |
|----------|---------|---------|---------|
| Fresh install → toggle ON → restart | ✅ Fixes | ✅ Better logs | ✅ Cleaner code |
| Existing install with Settings='true', missing SecurityConfig | ✅ Fixes | ✅ Better logs | N/A |
| Toggle ON → restart → verify logs | ✅ Works | ✅ MUST verify new messages | ✅ Works |
| Toggle OFF → restart → verify logs | ✅ Works | ✅ MUST verify new messages | ✅ Works |
| Start/Stop handlers (backward compat) | N/A | N/A | ✅ MUST verify still work |

### Missing from Original Plan

The original plan DID NOT explicitly mention:

1. **Unit test updates required** - Critical for Phase 1 (`TestReconcileCrowdSecOnStartup_NoSecurityConfig` needs major refactoring)
2. **Integration script verification** - Scripts may break if they expect the old auto-initialization or log behavior
3. **Documentation updates** - Features and troubleshooting guides need the new reconciliation behavior documented
4. **Backward compatibility analysis for Phase 3** - An explicit decision is needed on whether the Start/Stop handlers are kept, deprecated, or removed
5. **API documentation** - The new toggle endpoint needs docs
6. **Testing matrix for all three phases together** - The phases must be verified to work in combination

---

**END OF SPECIFICATION**