CrowdSec Toggle Integration Fix Plan
Date: December 15, 2025
Issue: CrowdSec toggle stuck ON, reconciliation silently exits, process not starting
Root Cause: Database disconnect between frontend (Settings table) and reconciliation (SecurityConfig table)
Executive Summary
The CrowdSec toggle shows "ON" but the process is NOT running. The reconciliation function silently exits without starting CrowdSec because:
- The frontend writes to the Settings table (`security.crowdsec.enabled`)
- Backend reconciliation reads from the SecurityConfig table (`crowdsec_mode = "local"`)
- Nothing synchronizes the two tables
- Auto-initialization code EXISTS (lines 46-71 in crowdsec_startup.go) but creates the config with `crowdsec_mode = "disabled"`
- Reconciliation sees "disabled" and exits silently with no logs
Root Cause Analysis (DETAILED)
Evidence Trail
Container Logs Show Silent Exit:
{"bin_path":"crowdsec","data_dir":"/app/data/crowdsec","level":"info","msg":"CrowdSec reconciliation: starting startup check","time":"2025-12-14T23:32:33-05:00"}
[NO FURTHER LOGS - Function exited here]
Database State on Fresh Start:
SELECT * FROM security_configs → record not found
{"level":"info","msg":"CrowdSec reconciliation: no SecurityConfig found, creating default config"}
Process Check:
$ docker exec charon ps aux | grep -i crowdsec
[NO RESULTS - Process not running]
Why Reconciliation Exits Silently
FILE: backend/internal/services/crowdsec_startup.go
Execution Flow:
1. User clicks toggle ON in Security.tsx
2. Frontend calls updateSetting('security.crowdsec.enabled', 'true')
3. Settings table updated → security.crowdsec.enabled = "true"
4. Frontend calls startCrowdsec() → handler updates SecurityConfig
5. CrowdSec starts successfully, toggle shows ON
6. Container restarts (docker restart or reboot)
7. ReconcileCrowdSecOnStartup() executes at line 26, and the SecurityConfig row is missing (lost or never migrated):
   - Line 44: db.First(&cfg) → returns gorm.ErrRecordNotFound
   - Lines 46-71: the auto-initialization block executes:
     - Creates SecurityConfig with crowdsec_mode = "disabled"
     - Logs "default SecurityConfig created successfully"
     - Returns early (line 70) WITHOUT checking the Settings table
     - CrowdSec is NEVER started
Result: Toggle shows "ON" (Settings table), but the process is "OFF" (not running)
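The failure sequence above can be reproduced with a small model. This is a hypothetical sketch, not the project's code: `store` stands in for the Settings and SecurityConfig tables, and `reconcileCurrent` keeps only the control flow of the current `ReconcileCrowdSecOnStartup` (auto-init with `"disabled"` followed by an early return).

```go
package main

import (
	"fmt"
	"strings"
)

// securityConfig is a hypothetical, trimmed-down stand-in for models.SecurityConfig.
type securityConfig struct {
	CrowdSecMode string
	Enabled      bool
}

// store is a hypothetical in-memory stand-in for the two tables.
type store struct {
	settings map[string]string // Settings table: key -> value
	secCfg   *securityConfig   // SecurityConfig table: nil means "record not found"
}

// reconcileCurrent keeps only the control flow of the current reconciliation:
// a missing SecurityConfig is replaced by a "disabled" default and the function
// returns before the Settings table is read. It reports whether CrowdSec starts.
func reconcileCurrent(db *store) bool {
	if db.secCfg == nil {
		db.secCfg = &securityConfig{CrowdSecMode: "disabled", Enabled: false}
		return false // early return: Settings table never consulted
	}
	override := strings.EqualFold(db.settings["security.crowdsec.enabled"], "true")
	return db.secCfg.CrowdSecMode == "local" || override
}

func main() {
	// State at step 7: Settings says ON, the SecurityConfig row is missing.
	db := &store{settings: map[string]string{"security.crowdsec.enabled": "true"}}
	started := reconcileCurrent(db)
	fmt.Println("started:", started)                                         // started: false
	fmt.Println("SecurityConfig mode:", db.secCfg.CrowdSecMode)              // SecurityConfig mode: disabled
	fmt.Println("Settings value:", db.settings["security.crowdsec.enabled"]) // Settings value: true
}
```

The mismatch is visible in the last two lines: the toggle's source of truth still says `true` while the reconciliation source says `disabled`.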
THE BUG (Lines 46-71):
```go
if err == gorm.ErrRecordNotFound {
	// AUTO-INITIALIZE: Create default SecurityConfig on first startup
	logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, creating default config")
	defaultCfg := models.SecurityConfig{
		UUID:               "default",
		Name:               "Default Security Config",
		Enabled:            false,
		CrowdSecMode:       "disabled", // ← PROBLEM: Ignores Settings table state
		WAFMode:            "disabled",
		WAFParanoiaLevel:   1,
		RateLimitMode:      "disabled",
		RateLimitBurst:     10,
		RateLimitRequests:  100,
		RateLimitWindowSec: 60,
	}
	if err := db.Create(&defaultCfg).Error; err != nil {
		logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
		return
	}
	logger.Log().Info("CrowdSec reconciliation: default SecurityConfig created successfully")
	// Don't start CrowdSec on fresh install - user must enable via UI
	return // ← EXITS WITHOUT checking Settings table or starting process
}
```
Why This Causes the Issue:
- First Container Start: User enables CrowdSec via toggle
  - Settings: `security.crowdsec.enabled = "true"` ✅
  - SecurityConfig: `crowdsec_mode = "local"` ✅ (via Start handler)
  - Process: Running ✅
- Container Restart: Database persists but SecurityConfig table may be empty (migration issue or corruption)
  - Reconciliation runs
  - SecurityConfig table: EMPTY (record lost or never migrated)
  - Auto-init creates SecurityConfig with `crowdsec_mode = "disabled"`
  - Returns early without checking Settings table
  - Settings: Still shows `"true"` (UI says ON)
  - SecurityConfig: Says `"disabled"` (reconciliation source)
  - Process: NOT started ❌
- Result: State Mismatch
  - Frontend toggle: ON (reads Settings table)
  - Backend reconciliation: OFF (reads SecurityConfig table)
  - Process: NOT RUNNING (reconciliation didn't start it)
Current Code Analysis
1. Reconciliation Function (crowdsec_startup.go)
Location: backend/internal/services/crowdsec_startup.go
Lines 44-71 (Auto-initialization - THE BUG):
```go
var cfg models.SecurityConfig
if err := db.First(&cfg).Error; err != nil {
	if err == gorm.ErrRecordNotFound {
		// AUTO-INITIALIZE: Create default SecurityConfig on first startup
		logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, creating default config")
		defaultCfg := models.SecurityConfig{
			UUID:               "default",
			Name:               "Default Security Config",
			Enabled:            false,
			CrowdSecMode:       "disabled", // ← IGNORES Settings table
			WAFMode:            "disabled",
			WAFParanoiaLevel:   1,
			RateLimitMode:      "disabled",
			RateLimitBurst:     10,
			RateLimitRequests:  100,
			RateLimitWindowSec: 60,
		}
		if err := db.Create(&defaultCfg).Error; err != nil {
			logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
			return
		}
		logger.Log().Info("CrowdSec reconciliation: default SecurityConfig created successfully")
		// Don't start CrowdSec on fresh install - user must enable via UI
		return // ← EARLY EXIT - Never checks Settings table
	}
	logger.Log().WithError(err).Warn("CrowdSec reconciliation: failed to read SecurityConfig")
	return
}
```
Lines 74-90 (Runtime Setting Override - UNREACHABLE after auto-init):
```go
// Also check for runtime setting override in settings table
var settingOverride struct{ Value string }
crowdSecEnabled := false
if err := db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&settingOverride).Error; err == nil && settingOverride.Value != "" {
	crowdSecEnabled = strings.EqualFold(settingOverride.Value, "true")
	logger.Log().WithFields(map[string]interface{}{
		"setting_value":    settingOverride.Value,
		"crowdsec_enabled": crowdSecEnabled,
	}).Debug("CrowdSec reconciliation: found runtime setting override")
}
```
This code is NEVER REACHED when SecurityConfig doesn't exist because line 70 returns early!
Lines 91-98 (Decision Logic):
```go
// Only auto-start if CrowdSecMode is "local" OR runtime setting is enabled
if cfg.CrowdSecMode != "local" && !crowdSecEnabled {
	logger.Log().WithFields(map[string]interface{}{
		"db_mode":         cfg.CrowdSecMode,
		"setting_enabled": crowdSecEnabled,
	}).Debug("CrowdSec reconciliation skipped: mode is not 'local' and setting not enabled")
	return
}
```
Also UNREACHABLE during auto-init scenario!
2. Start Handler (crowdsec_handler.go)
Location: backend/internal/api/handlers/crowdsec_handler.go
Lines 167-192 - CORRECT IMPLEMENTATION:
```go
func (h *CrowdsecHandler) Start(c *gin.Context) {
	ctx := c.Request.Context()

	// UPDATE SecurityConfig to persist user's intent
	var cfg models.SecurityConfig
	if err := h.DB.First(&cfg).Error; err != nil {
		if err == gorm.ErrRecordNotFound {
			// Create default config with CrowdSec enabled
			cfg = models.SecurityConfig{
				UUID:         "default",
				Name:         "Default Security Config",
				Enabled:      true,
				CrowdSecMode: "local", // ← CORRECT: Sets mode to "local"
			}
			if err := h.DB.Create(&cfg).Error; err != nil {
				logger.Log().WithError(err).Error("Failed to create SecurityConfig")
				c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist configuration"})
				return
			}
		} else {
			logger.Log().WithError(err).Error("Failed to read SecurityConfig")
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to read configuration"})
			return
		}
	} else {
		// Update existing config
		cfg.CrowdSecMode = "local"
		cfg.Enabled = true
		if err := h.DB.Save(&cfg).Error; err != nil {
			logger.Log().WithError(err).Error("Failed to update SecurityConfig")
			c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to persist configuration"})
			return
		}
	}

	// Start the process...
}
```
Analysis: This is CORRECT. The Start handler properly updates SecurityConfig when user clicks "Start" from the CrowdSec config page (/security/crowdsec).
3. Frontend Toggle (Security.tsx)
Location: frontend/src/pages/Security.tsx
Lines 64-120 - THE DISCONNECT:
```typescript
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    // Step 1: Update Settings table
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')
    if (enabled) {
      // Step 2: Call Start() which updates SecurityConfig
      const result = await startCrowdsec()
      // Step 3: Verify running
      const status = await statusCrowdsec()
      if (!status.running) {
        await updateSetting('security.crowdsec.enabled', 'false', 'security', 'bool')
        throw new Error('CrowdSec process failed to start')
      }
      return result
    } else {
      // Step 2: Call Stop(), which updates SecurityConfig only if the record exists
      await stopCrowdsec()
      // Step 3: Verify stopped
      await new Promise(resolve => setTimeout(resolve, 500))
      const status = await statusCrowdsec()
      if (status.running) {
        throw new Error('CrowdSec process still running')
      }
      return { enabled: false }
    }
  },
})
```
Analysis:
- Enable Path: Updates Settings → Calls Start() → Start() updates SecurityConfig → ✅ Both tables synced
- Disable Path: Updates Settings → Calls Stop() → Stop() does NOT always update SecurityConfig → ❌ Tables out of sync
Looking at the Stop handler:
```go
func (h *CrowdsecHandler) Stop(c *gin.Context) {
	ctx := c.Request.Context()
	if err := h.Executor.Stop(ctx, h.DataDir); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	// UPDATE SecurityConfig to persist user's intent
	var cfg models.SecurityConfig
	if err := h.DB.First(&cfg).Error; err == nil {
		cfg.CrowdSecMode = "disabled"
		cfg.Enabled = false
		if err := h.DB.Save(&cfg).Error; err != nil {
			logger.Log().WithError(err).Warn("Failed to update SecurityConfig after stopping CrowdSec")
		}
	}

	c.JSON(http.StatusOK, gin.H{"status": "stopped"})
}
```
This IS CORRECT - Stop() handler updates SecurityConfig when it can find it. BUT:
Scenario Where It Fails:
- SecurityConfig table gets corrupted/cleared/migrated incorrectly
- User clicks toggle OFF
- Stop() tries to update SecurityConfig → record not found → skips update
- Settings table still updated to "false"
- Container restarts → auto-init creates SecurityConfig with "disabled"
- Both tables say "disabled" but UI might show stale state
Comprehensive Fix Strategy
Phase 1: Fix Auto-Initialization (CRITICAL - IMMEDIATE)
FILE: backend/internal/services/crowdsec_startup.go
CHANGE: Lines 46-71 (auto-initialization block)
AFTER (with Settings table check). In the current code, the generic "failed to read SecurityConfig" warning and its `return` follow the not-found block inside the same `if err != nil` branch (lines 113-114), so deleting the inner `return` alone would fall through to them and still exit. The version below therefore handles other read errors first and lets the not-found path continue. It uses `errors.Is`, so the file needs the standard `errors` import if it does not already have one:
```go
var cfg models.SecurityConfig
if err := db.First(&cfg).Error; err != nil {
	if !errors.Is(err, gorm.ErrRecordNotFound) {
		logger.Log().WithError(err).Warn("CrowdSec reconciliation: failed to read SecurityConfig")
		return
	}

	// AUTO-INITIALIZE: Create default SecurityConfig by checking Settings table
	logger.Log().Info("CrowdSec reconciliation: no SecurityConfig found, checking Settings table for user preference")

	// Check if user has already enabled CrowdSec via Settings table (from toggle or legacy config)
	var settingOverride struct{ Value string }
	crowdSecEnabledInSettings := false
	if err := db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.crowdsec.enabled").Scan(&settingOverride).Error; err == nil && settingOverride.Value != "" {
		crowdSecEnabledInSettings = strings.EqualFold(settingOverride.Value, "true")
		logger.Log().WithFields(map[string]interface{}{
			"setting_value": settingOverride.Value,
			"enabled":       crowdSecEnabledInSettings,
		}).Info("CrowdSec reconciliation: found existing Settings table preference")
	}

	// Create SecurityConfig that matches Settings table state
	crowdSecMode := "disabled"
	if crowdSecEnabledInSettings {
		crowdSecMode = "local"
	}
	defaultCfg := models.SecurityConfig{
		UUID:               "default",
		Name:               "Default Security Config",
		Enabled:            crowdSecEnabledInSettings,
		CrowdSecMode:       crowdSecMode, // ← NOW RESPECTS Settings table
		WAFMode:            "disabled",
		WAFParanoiaLevel:   1,
		RateLimitMode:      "disabled",
		RateLimitBurst:     10,
		RateLimitRequests:  100,
		RateLimitWindowSec: 60,
	}
	if err := db.Create(&defaultCfg).Error; err != nil {
		logger.Log().WithError(err).Error("CrowdSec reconciliation: failed to create default SecurityConfig")
		return
	}
	logger.Log().WithFields(map[string]interface{}{
		"crowdsec_mode": defaultCfg.CrowdSecMode,
		"enabled":       defaultCfg.Enabled,
		"source":        "settings_table",
	}).Info("CrowdSec reconciliation: default SecurityConfig created from Settings preference")

	// Continue to process the config (DON'T return early)
	cfg = defaultCfg
}
```
KEY CHANGES:
- Check Settings table during auto-initialization
- Create SecurityConfig matching Settings state (not hardcoded "disabled")
- Don't return early - let the rest of the function process the config
- Assign to cfg variable so flow continues to line 74+
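A hypothetical in-memory sketch of the corrected flow follows. `store`, `securityConfig` and `reconcileFixed` are illustrative names, not the real GORM models; the point is that the new row is seeded from the Settings value and execution falls through to the same start decision instead of returning.

```go
package main

import (
	"fmt"
	"strings"
)

const settingKey = "security.crowdsec.enabled"

// Hypothetical stand-ins for the SecurityConfig row and the two tables.
type securityConfig struct {
	CrowdSecMode string
	Enabled      bool
}

type store struct {
	settings map[string]string
	secCfg   *securityConfig // nil means "record not found"
}

// reconcileFixed sketches the Phase 1 flow: a missing SecurityConfig is seeded
// from the Settings value, and execution falls through to the normal start
// decision instead of returning early.
func reconcileFixed(db *store) bool {
	if db.secCfg == nil {
		enabled := strings.EqualFold(db.settings[settingKey], "true")
		mode := "disabled"
		if enabled {
			mode = "local"
		}
		db.secCfg = &securityConfig{CrowdSecMode: mode, Enabled: enabled}
		// no return here: continue to the decision below
	}
	override := strings.EqualFold(db.settings[settingKey], "true")
	return db.secCfg.CrowdSecMode == "local" || override
}

func main() {
	legacy := &store{settings: map[string]string{settingKey: "true"}}
	fmt.Println(reconcileFixed(legacy), legacy.secCfg.CrowdSecMode) // true local

	fresh := &store{settings: map[string]string{}}
	fmt.Println(reconcileFixed(fresh), fresh.secCfg.CrowdSecMode) // false disabled
}
```

The second case matches the fresh-install expectation: with no Settings row, the seeded config is `disabled` and nothing starts.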
Phase 2: Enhance Logging (IMMEDIATE)
FILE: backend/internal/services/crowdsec_startup.go
CHANGE: Lines 91-98 (decision logic - better logging)
AFTER:
```go
// Start when EITHER SecurityConfig has mode="local" OR Settings table has enabled=true
// Exit only when BOTH are disabled
if cfg.CrowdSecMode != "local" && !crowdSecEnabled {
	logger.Log().WithFields(map[string]interface{}{
		"db_mode":         cfg.CrowdSecMode,
		"setting_enabled": crowdSecEnabled,
	}).Info("CrowdSec reconciliation skipped: both SecurityConfig and Settings indicate disabled")
	return
}

// Log which source triggered the start
if cfg.CrowdSecMode == "local" {
	logger.Log().WithField("mode", cfg.CrowdSecMode).Info("CrowdSec reconciliation: starting based on SecurityConfig mode='local'")
} else if crowdSecEnabled {
	logger.Log().WithField("setting", "true").Info("CrowdSec reconciliation: starting based on Settings table override")
}
```
KEY CHANGES:
- Change log level from Debug to Info (so we see it in logs)
- Add source attribution (which table triggered the start)
- Clarify condition (exit only when BOTH are disabled)
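The Phase 2 condition and its log attribution can be expressed as a small pure function, which also makes the three outcomes easy to unit-test. `startDecision` is a hypothetical helper; the messages are the ones proposed above, and SecurityConfig takes precedence in the attribution when both sources say "on".

```go
package main

import "fmt"

// startDecision is a hypothetical helper mirroring the Phase 2 condition:
// start when SecurityConfig says "local" OR the Settings override is true,
// and report which source triggered the decision so it can be logged at Info.
func startDecision(dbMode string, settingEnabled bool) (start bool, reason string) {
	switch {
	case dbMode == "local":
		return true, "starting based on SecurityConfig mode='local'"
	case settingEnabled:
		return true, "starting based on Settings table override"
	default:
		return false, "skipped: both SecurityConfig and Settings indicate disabled"
	}
}

func main() {
	cases := []struct {
		mode    string
		setting bool
	}{
		{"local", false},
		{"disabled", true},
		{"disabled", false},
	}
	for _, c := range cases {
		start, reason := startDecision(c.mode, c.setting)
		fmt.Println(c.mode, c.setting, start, reason)
	}
}
```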
Phase 3: Add Unified Toggle Endpoint (OPTIONAL BUT RECOMMENDED)
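WHY: Currently the toggle updates Settings in one request and then calls Start/Stop, which update SecurityConfig in a separate request. If the second request fails, or two toggles overlap, the two tables can disagree (a race condition). A unified endpoint that updates both tables in one transaction is safer.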
FILE: backend/internal/api/handlers/crowdsec_handler.go
ADD: New method (after Stop(), around line 260)
```go
// ToggleCrowdSec enables or disables CrowdSec, synchronizing Settings and SecurityConfig atomically
func (h *CrowdsecHandler) ToggleCrowdSec(c *gin.Context) {
	var payload struct {
		Enabled bool `json:"enabled"`
	}
	if err := c.ShouldBindJSON(&payload); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid payload"})
		return
	}

	logger.Log().WithField("enabled", payload.Enabled).Info("CrowdSec toggle: received request")

	// Use a transaction to ensure Settings and SecurityConfig stay in sync
	tx := h.DB.Begin()
	if tx.Error != nil {
		logger.Log().WithError(tx.Error).Error("CrowdSec toggle: failed to begin transaction")
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to update settings"})
		return
	}
	defer func() {
		if r := recover(); r != nil {
			tx.Rollback()
			panic(r) // re-raise so the recovery middleware still returns a 500
		}
	}()

	// STEP 1: Update Settings table
	settingKey := "security.crowdsec.enabled"
	settingValue := "false"
	if payload.Enabled {
		settingValue = "true"
	}

	var settingModel models.Setting
	if err := tx.Where("key = ?", settingKey).FirstOrCreate(&settingModel, models.Setting{
		Key:      settingKey,
		Value:    settingValue,
		Type:     "bool",
		Category: "security",
	}).Error; err != nil {
		tx.Rollback()
		logger.Log().WithError(err).Error("CrowdSec toggle: failed to update Settings table")
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to update settings"})
		return
	}
	settingModel.Value = settingValue
	if err := tx.Save(&settingModel).Error; err != nil {
		tx.Rollback()
		logger.Log().WithError(err).Error("CrowdSec toggle: failed to save Settings table")
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to update settings"})
		return
	}

	// STEP 2: Update SecurityConfig table
	var cfg models.SecurityConfig
	if err := tx.First(&cfg).Error; err != nil {
		if errors.Is(err, gorm.ErrRecordNotFound) {
			// Create config matching toggle state
			crowdSecMode := "disabled"
			if payload.Enabled {
				crowdSecMode = "local"
			}
			cfg = models.SecurityConfig{
				UUID:               "default",
				Name:               "Default Security Config",
				Enabled:            payload.Enabled,
				CrowdSecMode:       crowdSecMode,
				WAFMode:            "disabled",
				WAFParanoiaLevel:   1,
				RateLimitMode:      "disabled",
				RateLimitBurst:     10,
				RateLimitRequests:  100,
				RateLimitWindowSec: 60,
			}
			if err := tx.Create(&cfg).Error; err != nil {
				tx.Rollback()
				logger.Log().WithError(err).Error("CrowdSec toggle: failed to create SecurityConfig")
				c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist configuration"})
				return
			}
		} else {
			tx.Rollback()
			logger.Log().WithError(err).Error("CrowdSec toggle: failed to read SecurityConfig")
			c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to read configuration"})
			return
		}
	} else {
		// Update existing config
		if payload.Enabled {
			cfg.CrowdSecMode = "local"
			cfg.Enabled = true
		} else {
			cfg.CrowdSecMode = "disabled"
			cfg.Enabled = false
		}
		if err := tx.Save(&cfg).Error; err != nil {
			tx.Rollback()
			logger.Log().WithError(err).Error("CrowdSec toggle: failed to update SecurityConfig")
			c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to persist configuration"})
			return
		}
	}

	// Commit the transaction before starting/stopping process
	if err := tx.Commit().Error; err != nil {
		logger.Log().WithError(err).Error("CrowdSec toggle: transaction commit failed")
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to commit changes"})
		return
	}

	logger.Log().WithFields(map[string]interface{}{
		"enabled":       cfg.Enabled,
		"crowdsec_mode": cfg.CrowdSecMode,
	}).Info("CrowdSec toggle: synchronized Settings and SecurityConfig successfully")

	// STEP 3: Start or stop the process
	ctx := c.Request.Context()
	if payload.Enabled {
		// Start CrowdSec
		pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
		if err != nil {
			logger.Log().WithError(err).Error("CrowdSec toggle: failed to start process, reverting DB changes")
			// Revert both tables in a new transaction; log (rather than ignore) revert failures
			if revertErr := h.DB.Transaction(func(rtx *gorm.DB) error {
				cfg.CrowdSecMode = "disabled"
				cfg.Enabled = false
				if err := rtx.Save(&cfg).Error; err != nil {
					return err
				}
				settingModel.Value = "false"
				return rtx.Save(&settingModel).Error
			}); revertErr != nil {
				logger.Log().WithError(revertErr).Error("CrowdSec toggle: failed to revert DB changes")
			}
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}

		// Wait for LAPI readiness
		lapiReady := false
		maxWait := 30 * time.Second
		pollInterval := 500 * time.Millisecond
		deadline := time.Now().Add(maxWait)
		for time.Now().Before(deadline) {
			args := []string{"lapi", "status"}
			if _, err := os.Stat(filepath.Join(h.DataDir, "config.yaml")); err == nil {
				args = append([]string{"-c", filepath.Join(h.DataDir, "config.yaml")}, args...)
			}
			checkCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
			_, err := h.CmdExec.Execute(checkCtx, "cscli", args...)
			cancel()
			if err == nil {
				lapiReady = true
				break
			}
			time.Sleep(pollInterval)
		}

		logger.Log().WithFields(map[string]interface{}{
			"pid":        pid,
			"lapi_ready": lapiReady,
		}).Info("CrowdSec toggle: started successfully")
		c.JSON(http.StatusOK, gin.H{
			"enabled":    true,
			"pid":        pid,
			"lapi_ready": lapiReady,
		})
		return
	} else {
		// Stop CrowdSec
		if err := h.Executor.Stop(ctx, h.DataDir); err != nil {
			logger.Log().WithError(err).Error("CrowdSec toggle: failed to stop process")
			c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
			return
		}
		logger.Log().Info("CrowdSec toggle: stopped successfully")
		c.JSON(http.StatusOK, gin.H{"enabled": false})
		return
	}
}
```
Register Route:
```go
// In RegisterRoutes() method
rg.POST("/admin/crowdsec/toggle", h.ToggleCrowdSec)
```
Frontend API Client (frontend/src/api/crowdsec.ts):
```typescript
export async function toggleCrowdsec(enabled: boolean): Promise<{ enabled: boolean; pid?: number; lapi_ready?: boolean }> {
  const response = await client.post('/admin/crowdsec/toggle', { enabled })
  return response.data
}
```
Frontend Toggle Update (frontend/src/pages/Security.tsx):
```typescript
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    if (enabled) {
      toast.info('Starting CrowdSec... This may take up to 30 seconds')
    }
    // Use unified toggle endpoint (handles Settings + SecurityConfig + Process)
    const result = await toggleCrowdsec(enabled)
    // Backend already verified state, just do final status check
    const status = await statusCrowdsec()
    if (enabled && !status.running) {
      throw new Error('CrowdSec process failed to start. Check server logs for details.')
    }
    if (!enabled && status.running) {
      throw new Error('CrowdSec process still running. Check server logs for details.')
    }
    return result
  },
  // ... rest remains the same
})
```
Testing Plan
Test 1: Fresh Install
Scenario: Brand new Charon installation
- Start container: `docker compose up -d`
- Navigate to Security page
- Verify CrowdSec toggle shows OFF
- Check status: `curl http://localhost:8080/api/v1/admin/crowdsec/status`
  - Expected: `{"running": false}`
- Check logs: `docker logs charon 2>&1 | grep "reconciliation"`
  - Expected: "no SecurityConfig found, checking Settings table"
  - Expected: "default SecurityConfig created from Settings preference"
  - Expected: `"crowdsec_mode":"disabled"` in the JSON log line
Test 2: Toggle ON → Container Restart
Scenario: User enables CrowdSec, then restarts container
- Enable toggle in UI (click ON)
- Verify CrowdSec starts
- Check status:
  - Expected: `{"running": true, "pid": xxx}`
- Restart: `docker restart charon`
- Wait 10 seconds
- Check status again:
  - Expected: `{"running": true, "pid": xxx}` (NEW PID)
- Check logs:
  - Expected: "starting based on SecurityConfig mode='local'"
Test 3: Legacy Migration (Settings Table Only)
Scenario: Existing install with Settings table but no SecurityConfig
- Manually set: `INSERT INTO settings (key, value, type, category) VALUES ('security.crowdsec.enabled', 'true', 'bool', 'security');`
- Delete SecurityConfig: `DELETE FROM security_configs;`
- Restart container
- Check logs:
  - Expected: "found existing Settings table preference"
  - Expected: "default SecurityConfig created from Settings preference"
  - Expected: `"crowdsec_mode":"local"` in the JSON log line
- Check status:
  - Expected: `{"running": true}`
Test 4: Toggle OFF → Container Restart
Scenario: User disables CrowdSec, then restarts container
- Start with CrowdSec enabled and running
- Click toggle OFF in UI
- Verify process stops
- Restart: `docker restart charon`
- Wait 10 seconds
- Check status:
  - Expected: `{"running": false}`
- Verify toggle still shows OFF
Test 5: Corrupted SecurityConfig Recovery
Scenario: SecurityConfig gets deleted but Settings exists
- Enable CrowdSec via UI
- Manually delete SecurityConfig: `DELETE FROM security_configs;`
- Restart container
- Verify auto-init recreates SecurityConfig matching Settings table
- Verify CrowdSec auto-starts
Verification Checklist
Phase 1 (Auto-Initialization Fix)
- Modified `crowdsec_startup.go` lines 46-71
- Auto-init checks Settings table for existing preference
- Auto-init creates SecurityConfig matching Settings state
- Auto-init does NOT return early (continues to line 74+)
- Test 1 (Fresh Install) passes
- Test 3 (Legacy Migration) passes
Phase 2 (Logging Enhancement)
- Modified `crowdsec_startup.go` lines 91-98
- Changed log level from Debug to Info
- Added source attribution logging
- Test 2 (Toggle ON → Restart) shows correct log
- Test 4 (Toggle OFF → Restart) shows correct log
Phase 3 (Unified Toggle - Optional)
- Added `ToggleCrowdSec()` method to `crowdsec_handler.go`
- Registered `/admin/crowdsec/toggle` route
- Added `toggleCrowdsec()` to `crowdsec.ts`
- Updated `crowdsecPowerMutation` in `Security.tsx`
- Test 4 (Toggle synchronization) passes
- Test 5 (Corrupted recovery) passes
Pre-Deployment
- Pre-commit linters pass: `pre-commit run --all-files`
- Backend tests pass: `cd backend && go test ./...`
- Frontend tests pass: `cd frontend && npm run test`
- Docker build succeeds: `docker build -t charon:local .`
- Integration test passes: `scripts/crowdsec_integration.sh`
Success Criteria
✅ Fix is complete when:
- Toggle shows correct state (ON = running, OFF = stopped)
- Toggle persists across container restarts
- Reconciliation logs clearly show decision reason
- Auto-initialization respects Settings table preference
- No "stuck toggle" scenarios
- All 5 test cases pass
- Pre-commit checks pass
- No regressions in existing CrowdSec functionality
Risk Assessment
| Change | Risk Level | Mitigation |
|---|---|---|
| Phase 1 (Auto-init) | Low | Only affects fresh installs or corrupted state recovery |
| Phase 2 (Logging) | Very Low | Only changes log output, no logic changes |
| Phase 3 (Unified toggle) | Medium | New endpoint, requires thorough testing, but backward compatible |
Rollback Plan
If issues arise:
- Immediate Revert: `git revert <commit-hash>` (no DB changes needed)
- Manual Fix (if toggle stuck):
  - Reset SecurityConfig: `UPDATE security_configs SET crowdsec_mode = 'disabled', enabled = 0 WHERE uuid = 'default';`
  - Reset Settings: `UPDATE settings SET value = 'false' WHERE key = 'security.crowdsec.enabled';`
- Force Stop CrowdSec: `docker exec charon pkill -SIGTERM crowdsec`
Dependency Impact Analysis
Phase 1: Auto-Initialization Changes (crowdsec_startup.go)
Files Directly Modified
- `backend/internal/services/crowdsec_startup.go` (lines 46-71)
Dependencies and Required Updates
1. Unit Tests - MUST BE UPDATED
- File: `backend/internal/services/crowdsec_startup_test.go`
- Impact: Test `TestReconcileCrowdSecOnStartup_NoSecurityConfig` expects the function to skip/return early when no SecurityConfig exists
- Required Change: Update the test to:
  - Create a Settings table entry with `security.crowdsec.enabled = 'true'`
  - Verify that SecurityConfig is auto-created with `crowdsec_mode = "local"`
  - Verify that the CrowdSec process is started (not skipped)
- Additional Tests Needed:
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_SettingsDisabled` - Settings='false' → creates config with mode="disabled", does NOT start
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_SettingsEnabled` - Settings='true' → creates config with mode="local", DOES start
  - `TestReconcileCrowdSecOnStartup_NoSecurityConfig_NoSettingsEntry` - No Settings entry → creates config with mode="disabled", does NOT start
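The three additional cases map naturally onto a table-driven test. The sketch below is hypothetical: it exercises a pure `autoInit` function so it runs without a database. In `crowdsec_startup_test.go` the same rows would instead drive `t.Run` subtests that seed the Settings table, call `ReconcileCrowdSecOnStartup`, and assert on the created SecurityConfig and on whether a start was attempted.

```go
package main

import (
	"fmt"
	"strings"
)

// autoInit is a hypothetical pure version of the Phase 1 auto-initialization
// decision: given the raw Settings value ("" when no row exists), it returns
// the CrowdSecMode to seed and whether reconciliation should start CrowdSec.
func autoInit(settingValue string) (mode string, start bool) {
	if strings.EqualFold(settingValue, "true") {
		return "local", true
	}
	return "disabled", false
}

func main() {
	tests := []struct {
		name      string
		setting   string
		wantMode  string
		wantStart bool
	}{
		{"NoSecurityConfig_SettingsDisabled", "false", "disabled", false},
		{"NoSecurityConfig_SettingsEnabled", "true", "local", true},
		{"NoSecurityConfig_NoSettingsEntry", "", "disabled", false},
	}
	for _, tc := range tests {
		mode, start := autoInit(tc.setting)
		pass := mode == tc.wantMode && start == tc.wantStart
		fmt.Println(tc.name, mode, start, pass)
	}
}
```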
2. Integration Tests - VERIFICATION NEEDED
- Files:
  - `scripts/crowdsec_integration.sh`
  - `scripts/crowdsec_startup_test.sh`
  - `scripts/crowdsec_decision_integration.sh`
- Impact: These scripts may assume specific startup behavior
- Verification Required:
- Do any scripts pre-populate Settings table?
- Do any scripts expect reconciliation to skip on fresh DB?
- Do any scripts verify log output from reconciliation?
- Action: Review scripts for assumptions about auto-initialization behavior
3. Migration/Upgrade Path - DATABASE CONCERN
- Scenario: Existing installations with Settings='true' but missing SecurityConfig
- Impact: After upgrade, reconciliation will auto-create SecurityConfig from Settings (POSITIVE)
- Risk: Low - this is the intended fix
- Documentation: Should document this as expected behavior in migration guide
4. Models - NO CHANGES REQUIRED
- File: `backend/internal/models/security_config.go`
  - Analysis: SecurityConfig model structure unchanged
- File: `backend/internal/models/setting.go`
  - Analysis: Setting model structure unchanged
5. Route Registration - NO CHANGES REQUIRED
- File: `backend/internal/api/routes/routes.go` (line 360)
- Analysis: Already calls `ReconcileCrowdSecOnStartup`; no signature changes
6. Handler Dependencies - NO CHANGES REQUIRED
- File: `backend/internal/api/handlers/crowdsec_handler.go`
- Analysis: Start/Stop handlers operate independently, no coupling to reconciliation logic
Phase 2: Logging Enhancement Changes (crowdsec_startup.go)
Files Directly Modified
- `backend/internal/services/crowdsec_startup.go` (lines 91-98)
Dependencies and Required Updates
1. Log Aggregation/Parsing - DOCUMENTATION UPDATE
- Concern: Changing log level from Debug → Info increases log volume
- Impact:
- Logs will now appear in production (Info is default minimum level)
- Log aggregation tools may need filter updates if they parse specific messages
- Required: Update any log parsing scripts or documentation about expected log output
2. Integration Tests - POTENTIAL GREP PATTERNS
- Files: `scripts/crowdsec_*.sh`
- Impact: If scripts `grep` for specific log messages, they may need updates
- Action: Search for log message expectations in scripts
3. Documentation - UPDATE REQUIRED
- File: `docs/features.md`
- Section: CrowdSec Integration (line 167+)
- Required Change: Add note about reconciliation behavior:
```markdown
#### Startup Behavior

CrowdSec automatically starts on container restart if:
- SecurityConfig has `crowdsec_mode = "local"` OR
- Settings table has `security.crowdsec.enabled = "true"`

Check container logs for reconciliation decisions:
- "CrowdSec reconciliation: starting based on SecurityConfig mode='local'"
- "CrowdSec reconciliation: starting based on Settings table override"
- "CrowdSec reconciliation skipped: both SecurityConfig and Settings indicate disabled"
```
4. Troubleshooting Guide - UPDATE RECOMMENDED
- File: `docs/troubleshooting/` (if exists) or `docs/security.md`
- Required Change: Add section on "CrowdSec Not Starting After Restart"
  - Explain reconciliation logic
  - Show how to check Settings and SecurityConfig tables
  - Show example log output
Phase 3: Unified Toggle Endpoint (OPTIONAL)
Files Directly Modified
- `backend/internal/api/handlers/crowdsec_handler.go` (new method)
- `backend/internal/api/handlers/crowdsec_handler.go` (RegisterRoutes)
- `frontend/src/api/crowdsec.ts` (new function)
- `frontend/src/pages/Security.tsx` (mutation update)
Dependencies and Required Updates
1. Handler Tests - NEW TESTS REQUIRED
- File: `backend/internal/api/handlers/crowdsec_handler_test.go`
- Required Tests:
  - `TestCrowdsecHandler_Toggle_EnableSuccess`
  - `TestCrowdsecHandler_Toggle_DisableSuccess`
  - `TestCrowdsecHandler_Toggle_TransactionRollback` (if Start fails)
  - `TestCrowdsecHandler_Toggle_VerifyBothTablesUpdated`
2. Existing Handlers - DEPRECATION CONSIDERATION
- Files:
- Start handler (line ~167 in crowdsec_handler.go)
- Stop handler (line ~260 in crowdsec_handler.go)
- Impact: New toggle endpoint duplicates Start/Stop functionality
- Decision Required:
- Option A: Keep both for backward compatibility (RECOMMENDED)
- Option B: Deprecate Start/Stop, add deprecation warnings
- Option C: Remove Start/Stop entirely (BREAKING CHANGE - NOT RECOMMENDED)
- Recommendation: Keep Start/Stop handlers unchanged, document toggle as "preferred method"
3. Frontend API Layer - MIGRATION PATH
- File: `frontend/src/api/crowdsec.ts`
- Current Exports: `startCrowdsec`, `stopCrowdsec`, `statusCrowdsec`
- After Change: Add `toggleCrowdsec` to exports (line 75)
- Backward Compatibility: Keep existing functions, don't remove them
4. Frontend Component - LIMITED SCOPE
- File: `frontend/src/pages/Security.tsx`
- Impact: Only `crowdsecPowerMutation` needs updating (lines 86-125)
- Other Components: No other components import these functions (verified)
- Risk: Low - isolated change
5. API Documentation - NEW ENDPOINT
- File: `docs/api.md` (if exists)
- Required Addition: Document the `/admin/crowdsec/toggle` endpoint
6. Integration Tests - NEW TEST CASE
- Files: `scripts/crowdsec_integration.sh`
- Required Addition: Test toggle endpoint directly
7. Backward Compatibility - ANALYSIS
- Frontend: Existing `/admin/crowdsec/start` and `/admin/crowdsec/stop` endpoints remain functional
- API Consumers: External tools using Start/Stop continue to work
- Risk: None - purely additive change
Cross-Cutting Concerns
Database Migration
- No schema changes required - both Settings and SecurityConfig tables already exist
- Data migration: None needed - changes are behavioral only
Configuration Files
- No changes required - no new environment variables or config files
Docker/Deployment
- No Dockerfile changes - all changes are code-level
- No docker-compose changes - no new services or volumes
Security Implications
- Phase 1: Improves security by respecting user's intent across restarts
- Phase 2: No security impact (logging only)
- Phase 3: Transaction safety prevents partial updates (improvement)
Performance Considerations
- Phase 1: Adds one SQL query during auto-initialization (one-time, on startup)
- Phase 2: Minimal - only adds log statements
- Phase 3: Minimal - wraps existing logic in transaction
Rollback Safety
- All phases: No database schema changes, can be rolled back via git revert
- Data safety: No data loss risk - only affects process startup behavior
Summary of Required File Updates
| Phase | Files to Modify | Files to Create | Tests to Add | Docs to Update |
|---|---|---|---|---|
| Phase 1 | crowdsec_startup.go | None | 3 new unit tests | None (covered in Phase 2) |
| Phase 2 | crowdsec_startup.go | None | None | features.md, troubleshooting docs |
| Phase 3 | crowdsec_handler.go, crowdsec.ts, Security.tsx | None | 4 new handler tests | api.md (if exists) |
Testing Matrix
| Scenario | Phase 1 | Phase 2 | Phase 3 |
|---|---|---|---|
| Fresh install → toggle ON → restart | ✅ Fixes | ✅ Better logs | ✅ Cleaner code |
| Existing install with Settings='true', missing SecurityConfig | ✅ Fixes | ✅ Better logs | N/A |
| Toggle ON → restart → verify logs | ✅ Works | ✅ MUST verify new messages | ✅ Works |
| Toggle OFF → restart → verify logs | ✅ Works | ✅ MUST verify new messages | ✅ Works |
| Start/Stop handlers (backward compat) | N/A | N/A | ✅ MUST verify still work |
Missing from Original Plan
The original plan DID NOT explicitly mention:
- Unit test updates required - Critical for Phase 1 (`TestReconcileCrowdSecOnStartup_NoSecurityConfig` needs major refactoring)
- Integration script verification - May break if they expect specific behavior
- Documentation updates - Features and troubleshooting guides need new reconciliation behavior documented
- Backward compatibility analysis for Phase 3 - Need explicit decision on Start/Stop handler fate
- API documentation - New endpoint needs docs
- Testing matrix for all three phases together - Need to verify they work in combination
END OF SPECIFICATION