# CrowdSec Preset Apply Failure - Fix Plan

**Date:** December 12, 2025
**Status:** Analysis Complete - Ready for Implementation
**Severity:** High

---

## Issue Summary

User reported error when applying a CrowdSec preset:

```
Apply failed: read archive: open /app/data/crowdsec/hub_cache/crowdsecurity/caddy/bundle.tgz: no such file or directory. Backup created at /app/data/crowdsec.backup.20251211-194408
```

---

## Root Cause Analysis

### The Bug

The `Apply()` function in [hub_sync.go](../../backend/internal/crowdsec/hub_sync.go#L535) has a **fatal ordering bug**: it moves the cache out from under its stored path before reading from it.

### Detailed Flow

1. **Pull Phase (Works Correctly)**
   - User pulls preset `crowdsecurity/caddy`
   - `HubCache.Store()` writes to: `/app/data/crowdsec/hub_cache/crowdsecurity/caddy/bundle.tgz`
   - `CachedPreset.ArchivePath` stores this absolute path

2. **Apply Phase (Bug Occurs)**

   ```
   Step 1: loadCacheMeta() → Returns meta.ArchivePath = "/app/data/crowdsec/hub_cache/.../bundle.tgz"
   Step 2: backupExisting() → RENAMES "/app/data/crowdsec" to "/app/data/crowdsec.backup.TIMESTAMP"
           ⚠️ THIS MOVES THE CACHE TOO! hub_cache is INSIDE crowdsec/
   Step 3: cscli fails (not available or preset not in hub)
   Step 4: os.ReadFile(meta.ArchivePath) → FILE NOT FOUND!
           The path still points to "/app/data/crowdsec/..." but that directory was renamed!
   ```

### Visual Representation

**Before Backup:**

```
/app/data/crowdsec/
├── hub_cache/
│   └── crowdsecurity/
│       └── caddy/
│           ├── bundle.tgz          ← meta.ArchivePath points here
│           ├── preview.yaml
│           └── metadata.json
├── config.yaml
└── other_files/
```

**After `backupExisting()` (line 535):**

```
/app/data/crowdsec.backup.20251211-194408/   ← Renamed!
├── hub_cache/
│   └── crowdsecurity/
│       └── caddy/
│           ├── bundle.tgz          ← File is now HERE
│           ├── preview.yaml
│           └── metadata.json
├── config.yaml
└── other_files/

/app/data/crowdsec/                          ← Directory no longer exists!
```

**Result:** `os.ReadFile(meta.ArchivePath)` fails because the path `/app/data/crowdsec/hub_cache/.../bundle.tgz` no longer exists.
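
The failure can be reproduced outside the service with nothing but the standard library. The sketch below is illustrative only: `reproduceStalePath` is a hypothetical helper, the directory names mirror the paths in this report, and it does not call any `HubService` code.

```go
package main

import (
	"errors"
	"os"
	"path/filepath"
)

// reproduceStalePath mimics the failing sequence in a throwaway temp directory:
// it stores an archive under <base>/crowdsec/hub_cache, renames <base>/crowdsec
// to a backup name, then tries to read the archive through the original path.
// It reports whether the archive was readable before the rename and whether
// the same path is reported as missing afterwards.
func reproduceStalePath() (readableBefore, missingAfter bool, err error) {
	base, err := os.MkdirTemp("", "apply-order-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(base)

	dataDir := filepath.Join(base, "crowdsec")
	archivePath := filepath.Join(dataDir, "hub_cache", "crowdsecurity", "caddy", "bundle.tgz")
	if err := os.MkdirAll(filepath.Dir(archivePath), 0o755); err != nil {
		return false, false, err
	}
	if err := os.WriteFile(archivePath, []byte("fake archive"), 0o644); err != nil {
		return false, false, err
	}

	// Step 1: the cached path resolves while the data directory is in place.
	_, readErr := os.ReadFile(archivePath)
	readableBefore = readErr == nil

	// Step 2: the backup renames the whole data directory, cache included.
	if err := os.Rename(dataDir, dataDir+".backup.20251211-194408"); err != nil {
		return readableBefore, false, err
	}

	// Step 4: the stale absolute path no longer resolves.
	_, readErr = os.ReadFile(archivePath)
	missingAfter = errors.Is(readErr, os.ErrNotExist)
	return readableBefore, missingAfter, nil
}
```

Calling `reproduceStalePath()` should report the archive as readable before the rename and missing after it, which matches the "no such file or directory" error in the issue summary.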
|
|
|
|
---
|
|
|
|
## Why This Wasn't Caught Earlier

1. **Tests use temp directories** - Each test creates fresh directories, so the ordering bug doesn't manifest
2. **cscli path succeeds in CI** - When `cscli` is available and works, the code returns early before hitting the bug
3. **Recent changes to backup logic** - The copy-based fallback and backup improvements may have introduced this ordering issue
4. **Cache directory nested inside DataDir** - The architecture decision to put `hub_cache` inside `DataDir` (crowdsec config) creates this coupling
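
The coupling in point 4 is easy to check mechanically. As a rough sketch (the helper name `isNested` is an assumption, not existing code), a startup check or test could detect when the cache directory lives inside the directory that the backup renames:

```go
package main

import (
	"path/filepath"
	"strings"
)

// isNested reports whether child lies strictly inside parent.
// Both paths should be of the same kind (both absolute or both relative);
// otherwise filepath.Rel returns an error and the result is false.
func isNested(parent, child string) bool {
	rel, err := filepath.Rel(parent, child)
	if err != nil {
		return false
	}
	if rel == "." {
		return false // same directory, not nested
	}
	// A relative path that climbs out of parent starts with a ".." element.
	escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
	return !escapes
}
```

With the layout from this report, `isNested("/app/data/crowdsec", "/app/data/crowdsec/hub_cache")` is true, while a sibling cache such as `/app/data/hub_cache` is not nested.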
|
|
|
|
---
|
|
|
|
## Fix Options
|
|
|
|
### Option A: Read Archive Before Backup (Recommended)
|
|
|
|
**Rationale:** Simple, minimal change, maintains existing backup behavior.
|
|
|
|
**File:** [backend/internal/crowdsec/hub_sync.go](../../backend/internal/crowdsec/hub_sync.go)
|
|
|
|
**Changes:**
|
|
|
|
```go
func (s *HubService) Apply(ctx context.Context, slug string) (ApplyResult, error) {
	// ... existing validation code ...

	result := ApplyResult{AppliedPreset: cleanSlug, Status: "failed"}
	meta, metaErr := s.loadCacheMeta(applyCtx, cleanSlug)
	if metaErr == nil {
		result.CacheKey = meta.CacheKey
	}
	hasCS := s.hasCSCLI(applyCtx)

	// === NEW: Read archive BEFORE backup ===
	var archive []byte
	var archiveErr error
	if metaErr == nil {
		archive, archiveErr = os.ReadFile(meta.ArchivePath)
		if archiveErr != nil {
			logger.Log().WithError(archiveErr).WithField("archive_path", meta.ArchivePath).Warn("failed to read cached archive before backup")
		}
	}
	// === END NEW ===

	backupPath := filepath.Clean(s.DataDir) + ".backup." + time.Now().Format("20060102-150405")
	if err := s.backupExisting(backupPath); err != nil {
		return result, fmt.Errorf("backup: %w", err)
	}
	result.BackupPath = backupPath

	// Try cscli first
	if hasCS {
		cscliErr := s.runCSCLI(applyCtx, cleanSlug)
		if cscliErr == nil {
			result.Status = "applied"
			result.ReloadHint = true
			result.UsedCSCLI = true
			return result, nil
		}
		logger.Log().WithField("slug", cleanSlug).WithError(cscliErr).Warn("cscli install failed; attempting cache fallback")
	}

	// === MODIFIED: Use pre-loaded archive or refresh ===
	if metaErr != nil || archiveErr != nil {
		refreshed, refreshErr := s.refreshCache(applyCtx, cleanSlug, metaErr)
		if refreshErr != nil {
			_ = s.rollback(backupPath)
			return result, fmt.Errorf("load cache for %s: %w", cleanSlug, refreshErr)
		}
		meta = refreshed
		result.CacheKey = meta.CacheKey
		// Re-read archive from refreshed cache location
		archive, archiveErr = os.ReadFile(meta.ArchivePath)
		if archiveErr != nil {
			_ = s.rollback(backupPath)
			return result, fmt.Errorf("read archive: %w", archiveErr)
		}
	}

	// Use the pre-loaded archive bytes
	if err := s.extractTarGz(applyCtx, archive, s.DataDir); err != nil {
		_ = s.rollback(backupPath)
		return result, fmt.Errorf("extract: %w", err)
	}
	// === END MODIFIED ===

	result.Status = "applied"
	result.ReloadHint = true
	result.UsedCSCLI = false
	return result, nil
}
```
|
|
|
|
### Option B: Move Cache Outside DataDir
|
|
|
|
**Rationale:** Architectural fix - separates transient cache from operational config.
|
|
|
|
**Files to modify:**
|
|
- [backend/internal/api/handlers/crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go) - Change cache location
|
|
- [backend/internal/crowdsec/hub_sync.go](../../backend/internal/crowdsec/hub_sync.go) - Add cache dir parameter
|
|
|
|
**Changes:**
|
|
```go
|
|
// In NewCrowdsecHandler:
|
|
// BEFORE:
|
|
cacheDir := filepath.Join(dataDir, "hub_cache")
|
|
|
|
// AFTER:
|
|
cacheDir := filepath.Join(filepath.Dir(dataDir), "hub_cache")
|
|
// Results in: /app/data/hub_cache (sibling of crowdsec, not child)
|
|
```
|
|
|
|
**Pros:** Clean separation, cache survives config resets
|
|
**Cons:** Breaking change for existing installs, requires migration
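
The migration mentioned under Cons could look roughly like the sketch below. `migrateCacheDir` and `demoMigration` are hypothetical names that do not exist in the codebase, and the sketch assumes the old and new locations are on the same filesystem, since `os.Rename` does not move directories across filesystems. Because `CachedPreset.ArchivePath` stores an absolute path, any cached metadata that records the old location would also need rewriting or a fresh pull; the sketch does not handle that.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// migrateCacheDir moves a legacy cache directory to its new location.
// It is a no-op when the legacy directory does not exist, and it refuses
// to overwrite a target that is already present.
func migrateCacheDir(oldDir, newDir string) (bool, error) {
	if _, err := os.Stat(oldDir); err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return false, nil // nothing to migrate
		}
		return false, err
	}
	if _, err := os.Stat(newDir); err == nil {
		return false, fmt.Errorf("cache target %s already exists", newDir)
	} else if !errors.Is(err, os.ErrNotExist) {
		return false, err
	}
	if err := os.MkdirAll(filepath.Dir(newDir), 0o755); err != nil {
		return false, err
	}
	if err := os.Rename(oldDir, newDir); err != nil {
		return false, err
	}
	return true, nil
}

// demoMigration exercises migrateCacheDir in a temp directory and reports
// whether the first and the second call moved anything.
func demoMigration() (first, second bool, err error) {
	base, err := os.MkdirTemp("", "cache-migrate-")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(base)

	oldDir := filepath.Join(base, "crowdsec", "hub_cache")
	newDir := filepath.Join(base, "hub_cache")
	if err := os.MkdirAll(oldDir, 0o755); err != nil {
		return false, false, err
	}
	if first, err = migrateCacheDir(oldDir, newDir); err != nil {
		return first, false, err
	}
	second, err = migrateCacheDir(oldDir, newDir)
	return first, second, err
}
```

The second call in `demoMigration` finds no legacy directory and does nothing, so running the migration on every startup would be safe under these assumptions.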
|
|
|
|
### Option C: Selective Backup (Exclude Cache)
|
|
|
|
**Rationale:** Only backup config files, not cache.
|
|
|
|
**Changes to `backupExisting()`:**
```go
func (s *HubService) backupExisting(backupPath string) error {
	// ... existing checks ...

	// Skip hub_cache during backup - it's transient
	return filepath.WalkDir(s.DataDir, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		// Match the directory by name: a substring match on the full path would
		// also hit files, and returning SkipDir for a file skips its siblings.
		if d.IsDir() && d.Name() == "hub_cache" {
			return filepath.SkipDir
		}
		// ... copy logic ...
		return nil
	})
}
```
|
|
|
|
**Pros:** Faster backups, cache preserved
|
|
**Cons:** More complex, backup is no longer complete snapshot
|
|
|
|
---
|
|
|
|
## Recommended Implementation
|
|
|
|
**Choose Option A** for these reasons:
|
|
|
|
1. **Minimal code change** - Single function modification
|
|
2. **No breaking changes** - Existing cache paths remain valid
|
|
3. **No migration needed** - Works immediately
|
|
4. **Maintains complete backups** - Backup still captures full state
|
|
5. **Easy to test** - Clear before/after behavior
|
|
|
|
---
|
|
|
|
## Files to Modify
|
|
|
|
| File | Change |
|------|--------|
| [backend/internal/crowdsec/hub_sync.go](../../backend/internal/crowdsec/hub_sync.go) | Reorder archive read before backup in `Apply()` |
| [backend/internal/crowdsec/hub_sync_test.go](../../backend/internal/crowdsec/hub_sync_test.go) | Add test for apply with backup scenario |
| [backend/internal/crowdsec/hub_pull_apply_test.go](../../backend/internal/crowdsec/hub_pull_apply_test.go) | Add regression test |
|
|
|
|
---
|
|
|
|
## Specific Code Changes
|
|
|
|
### Change 1: hub_sync.go - Apply() Function
|
|
|
|
**Location:** Lines 514-580
|
|
|
|
**Before:**
|
|
```go
func (s *HubService) Apply(ctx context.Context, slug string) (ApplyResult, error) {
	cleanSlug := sanitizeSlug(slug)
	// ... validation ...

	result := ApplyResult{AppliedPreset: cleanSlug, Status: "failed"}
	meta, metaErr := s.loadCacheMeta(applyCtx, cleanSlug)
	if metaErr == nil {
		result.CacheKey = meta.CacheKey
	}
	hasCS := s.hasCSCLI(applyCtx)

	backupPath := filepath.Clean(s.DataDir) + ".backup." + time.Now().Format("20060102-150405")
	if err := s.backupExisting(backupPath); err != nil {
		return result, fmt.Errorf("backup: %w", err)
	}
	result.BackupPath = backupPath

	// Try cscli first
	if hasCS {
		// ... cscli logic ...
	}

	if metaErr != nil {
		// ... refresh cache logic ...
	}

	archive, err := os.ReadFile(meta.ArchivePath) // ❌ FAILS - file moved by backup!
	if err != nil {
		_ = s.rollback(backupPath)
		return result, fmt.Errorf("read archive: %w", err)
	}
	// ...
}
```
|
|
|
|
**After:**
|
|
```go
func (s *HubService) Apply(ctx context.Context, slug string) (ApplyResult, error) {
	cleanSlug := sanitizeSlug(slug)
	// ... validation ...

	result := ApplyResult{AppliedPreset: cleanSlug, Status: "failed"}
	meta, metaErr := s.loadCacheMeta(applyCtx, cleanSlug)
	if metaErr == nil {
		result.CacheKey = meta.CacheKey
	}
	hasCS := s.hasCSCLI(applyCtx)

	// ✅ NEW: Read archive into memory BEFORE backup moves the files
	var archive []byte
	var archiveReadErr error
	if metaErr == nil {
		archive, archiveReadErr = os.ReadFile(meta.ArchivePath)
		if archiveReadErr != nil {
			logger.Log().WithError(archiveReadErr).WithField("archive_path", meta.ArchivePath).
				Warn("failed to read cached archive before backup")
		}
	}

	backupPath := filepath.Clean(s.DataDir) + ".backup." + time.Now().Format("20060102-150405")
	if err := s.backupExisting(backupPath); err != nil {
		return result, fmt.Errorf("backup: %w", err)
	}
	result.BackupPath = backupPath

	// Try cscli first
	if hasCS {
		cscliErr := s.runCSCLI(applyCtx, cleanSlug)
		if cscliErr == nil {
			result.Status = "applied"
			result.ReloadHint = true
			result.UsedCSCLI = true
			return result, nil
		}
		logger.Log().WithField("slug", cleanSlug).WithError(cscliErr).
			Warn("cscli install failed; attempting cache fallback")
	}

	// ✅ MODIFIED: Handle cache miss OR failed archive read
	if metaErr != nil || archiveReadErr != nil {
		// Need to refresh cache (either wasn't cached or file was unreadable)
		originalErr := metaErr
		if originalErr == nil {
			originalErr = archiveReadErr
		}
		refreshed, refreshErr := s.refreshCache(applyCtx, cleanSlug, originalErr)
		if refreshErr != nil {
			_ = s.rollback(backupPath)
			logger.Log().WithError(refreshErr).WithField("slug", cleanSlug).
				WithField("backup_path", backupPath).
				Warn("cache refresh failed; rolled back backup")
			result.ErrorMessage = fmt.Sprintf("load cache for %s: %v", cleanSlug, refreshErr)
			return result, fmt.Errorf("load cache for %s: %w", cleanSlug, refreshErr)
		}
		meta = refreshed
		result.CacheKey = meta.CacheKey

		// Read from the newly refreshed cache
		archive, archiveReadErr = os.ReadFile(meta.ArchivePath)
		if archiveReadErr != nil {
			_ = s.rollback(backupPath)
			return result, fmt.Errorf("read archive after refresh: %w", archiveReadErr)
		}
	}

	// ✅ Use pre-loaded archive bytes (no file read here)
	if err := s.extractTarGz(applyCtx, archive, s.DataDir); err != nil {
		_ = s.rollback(backupPath)
		return result, fmt.Errorf("extract: %w", err)
	}

	result.Status = "applied"
	result.ReloadHint = true
	result.UsedCSCLI = false
	return result, nil
}
```
|
|
|
|
### Change 2: Add Regression Test
|
|
|
|
**File:** [backend/internal/crowdsec/hub_pull_apply_test.go](../../backend/internal/crowdsec/hub_pull_apply_test.go)
|
|
|
|
**New test:**
|
|
```go
func TestApplyReadsArchiveBeforeBackup(t *testing.T) {
	// This test verifies the fix for the bug where Apply() would:
	// 1. Load cache metadata (getting archive path)
	// 2. Backup DataDir (moving the cache!)
	// 3. Try to read archive from original path (FAIL!)

	baseDir := t.TempDir()
	dataDir := filepath.Join(baseDir, "crowdsec")
	cacheDir := filepath.Join(dataDir, "hub_cache")

	// Create cache
	cache, err := NewHubCache(cacheDir, time.Hour)
	require.NoError(t, err)

	// Create a mock hub server
	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.Path, ".tgz") {
			// Return a valid tar.gz
			var buf bytes.Buffer
			gw := gzip.NewWriter(&buf)
			tw := tar.NewWriter(gw)
			content := []byte("test: value\n")
			tw.WriteHeader(&tar.Header{Name: "test.yaml", Size: int64(len(content)), Mode: 0644})
			tw.Write(content)
			tw.Close()
			gw.Close()
			w.Write(buf.Bytes())
			return
		}
		if strings.Contains(r.URL.Path, ".yaml") {
			w.Write([]byte("preview: content"))
			return
		}
		// Index
		w.Write([]byte(`{"items":[{"name":"test/preset","version":"1.0"}]}`))
	}))
	defer server.Close()

	hub := &HubService{
		Cache:         cache,
		DataDir:       dataDir,
		HTTPClient:    server.Client(),
		HubBaseURL:    server.URL,
		MirrorBaseURL: server.URL,
		PullTimeout:   10 * time.Second,
		ApplyTimeout:  10 * time.Second,
	}

	ctx := context.Background()

	// Pull to populate cache
	_, err = hub.Pull(ctx, "test/preset")
	require.NoError(t, err, "pull should succeed")

	// Verify cache exists
	_, err = cache.Load(ctx, "test/preset")
	require.NoError(t, err, "cache should exist after pull")

	// Add some extra files to DataDir to make backup more realistic
	require.NoError(t, os.WriteFile(filepath.Join(dataDir, "config.yaml"), []byte("test: config"), 0644))

	// Apply - this should NOT fail with "read archive: no such file"
	result, err := hub.Apply(ctx, "test/preset")
	require.NoError(t, err, "apply should succeed - archive should be read before backup")
	assert.Equal(t, "applied", result.Status)
	assert.NotEmpty(t, result.BackupPath)

	// Verify backup was created
	_, err = os.Stat(result.BackupPath)
	assert.NoError(t, err, "backup should exist")
}
```
|
|
|
|
---
|
|
|
|
## Edge Cases to Consider
|
|
|
|
| Scenario | Current Behavior | Fixed Behavior |
|----------|-----------------|----------------|
| First-time apply (no cache) | Fails with cache miss | Attempts refresh, same behavior |
| cscli available and works | Returns early, never hits bug | Same - returns early |
| cscli fails, cache exists | **FAILS** - archive moved | Succeeds - archive pre-loaded |
| Archive file unreadable or corrupted | Fails on read | Read failure is logged before backup, then a cache refresh is attempted; a corrupt archive still fails at extraction and is rolled back |
| Network down during refresh | Fails | Same - fails with clear error |
| Large archive (>25MB) | Limited by maxArchiveSize | Same - memory is fine for 25MB |
| Concurrent applies | Potential race | Still potential race (separate issue) |
|
|
|
|
---
|
|
|
|
## Testing Plan
|
|
|
|
1. **Unit Tests**
   - [ ] `TestApplyReadsArchiveBeforeBackup` - New regression test
   - [ ] Existing `TestPullThenApplyFlow` should still pass
   - [ ] `TestApplyWithoutPullFails` should still pass

2. **Integration Tests**
   - [ ] Manual test in Docker container
   - [ ] Pull preset via UI
   - [ ] Apply preset via UI
   - [ ] Verify no "read archive" error

3. **Edge Case Tests**
   - [ ] Apply with expired cache (should refresh)
   - [ ] Apply with network failure (should error gracefully)
   - [ ] Apply with cscli available (should use cscli path)
|
|
|
|
---
|
|
|
|
## Rollout Plan
|
|
|
|
1. **Implement fix** in `hub_sync.go`
2. **Add regression test** in `hub_pull_apply_test.go`
3. **Run full test suite**: `go test ./...`
4. **Run pre-commit**: `pre-commit run --all-files`
5. **Build and test locally**: `docker build -t charon:local .`
6. **Manual verification in container**
7. **Commit with**: `fix: read archive before backup in CrowdSec preset apply`
|
|
|
|
---
|
|
|
|
## Related Files Reference
|
|
|
|
| File | Purpose |
|------|---------|
| [hub_sync.go](../../backend/internal/crowdsec/hub_sync.go) | HubService.Apply() - main fix location |
| [hub_cache.go](../../backend/internal/crowdsec/hub_cache.go) | Cache storage, stores ArchivePath |
| [crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go) | HTTP handler, initializes cache |
| [routes.go](../../backend/internal/api/routes/routes.go) | Sets crowdsecDataDir from config |
| [config.go](../../backend/internal/config/config.go) | CrowdSecConfigDir default |
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
**Root Cause:** The `Apply()` function backs up the entire DataDir (which includes the cache) before reading the cached archive, resulting in a "file not found" error.
|
|
|
|
**Fix:** Read the archive into memory before creating the backup.
|
|
|
|
**Impact:** Low risk - the fix only changes the order of operations and doesn't affect the backup or extraction logic.
|
|
|
|
**Effort:** ~30 minutes implementation + testing

| # | Issue | Severity |
|---|-------|----------|
| 1 | Cerberus shows ON by default on first load (should be OFF) | High |
| 2 | Cerberus dashboard header shows "disabled" even when enabled | Medium |
| 3 | CrowdSec toggle auto-enables when Cerberus is enabled | Medium |
| 4 | CrowdSec toggle unresponsive + Config button grayed out | High |
|
|
|
|
---
|
|
|
|
## Root Cause Analysis
|
|
|
|
### Issue 1: Cerberus Shows ON by Default
|
|
|
|
**Root Cause:** The `feature_flags_handler.go` has a default value of `true` for all feature flags including `feature.cerberus.enabled`.
|
|
|
|
**File:** [backend/internal/api/handlers/feature_flags_handler.go#L39-L42](../../backend/internal/api/handlers/feature_flags_handler.go#L39-L42)
|
|
|
|
```go
// Line 39-42
for _, key := range defaultFlags {
	defaultVal := true // <-- THIS IS THE BUG
	if v, ok := defaultFlagValues[key]; ok {
		defaultVal = v
	}
```

**Problem:** The code sets `defaultVal := true` for all flags, then only overrides it if the key exists in `defaultFlagValues`. However, `feature.cerberus.enabled` is NOT in `defaultFlagValues`:

```go
// Line 29-31
var defaultFlagValues = map[string]bool{
	"feature.crowdsec.console_enrollment": false,
}
```

**Result:** On first load with an empty database, `feature.cerberus.enabled` defaults to `true` instead of `false`.
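
The effect of the implicit fallback can be shown in isolation. This is a simplified sketch, not the handler's actual code: `resolveDefault` is a hypothetical helper that takes the fallback as a parameter so both behaviors can be compared.

```go
package main

// resolveDefault returns the default for key: the explicit entry in
// overrides when one exists, otherwise the supplied fallback.
func resolveDefault(key string, overrides map[string]bool, fallback bool) bool {
	if v, ok := overrides[key]; ok {
		return v
	}
	return fallback
}

// currentOverrides mirrors the map quoted above, which has no Cerberus entry.
var currentOverrides = map[string]bool{
	"feature.crowdsec.console_enrollment": false,
}
```

With `currentOverrides`, `resolveDefault("feature.cerberus.enabled", currentOverrides, true)` yields `true`, which is the reported behavior. Adding an explicit `false` entry for the key makes the fallback irrelevant for Cerberus.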
|
|
|
|
**Additional Context:**
|
|
- The [backend/internal/config/config.go#L60](../../backend/internal/config/config.go#L60) correctly defaults `CerberusEnabled` to `false`:
|
|
```go
|
|
CerberusEnabled: getEnvAny("false", "CERBERUS_SECURITY_CERBERUS_ENABLED", ...) == "true"
|
|
```
|
|
- However, the feature flags handler ignores this config and uses its own default.
|
|
|
|
---
|
|
|
|
### Issue 2: Dashboard Header Shows "Disabled" Even When Enabled
|
|
|
|
**Root Cause:** The header banner logic in `Security.tsx` checks `status.cerberus?.enabled` which comes from the security status API, but there's a **data source mismatch**.
|
|
|
|
**Files:**
|
|
- [frontend/src/pages/Security.tsx#L141-L153](../../frontend/src/pages/Security.tsx#L141-L153) - Header banner logic
|
|
- [backend/internal/api/handlers/security_handler.go#L35-L49](../../backend/internal/api/handlers/security_handler.go#L35-L49) - Security status API
|
|
|
|
**Problem Flow:**
|
|
|
|
1. **Security.tsx** checks `status.cerberus?.enabled` from `/api/v1/security/status`
2. **security_handler.go** reads from config AND settings table:

   ```go
   // Line 36-48
   enabled := h.cfg.CerberusEnabled
   var settingKey = "security.cerberus.enabled" // <-- WRONG KEY!
   if h.db != nil {
   	var setting struct{ Value string }
   	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", settingKey).Scan(&setting).Error; ...
   ```

3. **SystemSettings.tsx** toggles `feature.cerberus.enabled` (via feature flags API)
|
|
|
|
**The Mismatch:**
|
|
|
|
| Component | Key Used |
|-----------|----------|
| SystemSettings toggle | `feature.cerberus.enabled` |
| Security status API | `security.cerberus.enabled` |
|
|
|
|
The toggle writes to `feature.cerberus.enabled` but the security status reads from `security.cerberus.enabled` - **two different keys!**
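
A minimal model of the mismatch, using an in-memory map in place of the settings table. The type `settingsStore` and the helper `readBool` are illustrative assumptions and do not correspond to real code in the handlers.

```go
package main

// settingsStore stands in for the settings table: key -> string value.
type settingsStore map[string]string

// readBool looks up key and interprets "true" as enabled. When the key is
// absent, it returns the fallback (the env-derived config value in the
// real handler).
func readBool(s settingsStore, key string, fallback bool) bool {
	v, ok := s[key]
	if !ok {
		return fallback
	}
	return v == "true"
}

// afterToggle models the table once the UI toggle has enabled Cerberus.
var afterToggle = settingsStore{"feature.cerberus.enabled": "true"}
```

Reading `security.cerberus.enabled` from `afterToggle` with a `false` fallback still returns `false`, while reading `feature.cerberus.enabled` returns `true`. That is why the dashboard keeps reporting Cerberus as disabled after the toggle is switched on.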
|
|
|
|
---
|
|
|
|
### Issue 3: CrowdSec Auto-Enables When Cerberus is Enabled
|
|
|
|
**Root Cause:** The `docker-compose.override.yml` and `docker-compose.local.yml` both set `CHARON_SECURITY_CROWDSEC_MODE=local`:
|
|
|
|
**File:** [docker-compose.override.yml#L21](../../docker-compose.override.yml#L21)
|
|
```yaml
|
|
- CHARON_SECURITY_CROWDSEC_MODE=local
|
|
```
|
|
|
|
**Problem:** When the container starts:
|
|
1. Config loads with `CrowdSecMode: "local"` from env var
|
|
2. Security status API returns `crowdsec.enabled: true` because mode is "local"
|
|
3. Frontend shows CrowdSec as enabled
|
|
|
|
**File:** [backend/internal/api/handlers/security_handler.go#L59-L62](../../backend/internal/api/handlers/security_handler.go#L59-L62)
|
|
```go
|
|
// Allow runtime override for CrowdSec enabled flag via settings table
|
|
crowdsecEnabled := mode == "local" // <-- Auto-true if mode is "local"
|
|
```
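
The derivation can be sketched as a pure function. The override parameter below is an assumption about how the settings-table override mentioned in the comment might be modeled; the real handler may represent it differently.

```go
package main

// crowdsecEnabled derives the enabled flag from the configured mode and an
// optional runtime override. A nil override means no settings row exists.
func crowdsecEnabled(mode string, override *bool) bool {
	enabled := mode == "local" // env-provided mode alone turns CrowdSec on
	if override != nil {
		enabled = *override
	}
	return enabled
}
```

With `CHARON_SECURITY_CROWDSEC_MODE=local` and no override row, the function returns `true` on a fresh install, which matches the auto-enable behavior described in this issue.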
|
|
|
|
---
|
|
|
|
### Issue 4: CrowdSec Toggle Unresponsive + Config Button Grayed Out
|
|
|
|
**Root Cause:** Multiple issues combine to break the toggle:
|
|
|
|
**A. Toggle Disabled Logic:**
|
|
|
|
**File:** [frontend/src/pages/Security.tsx#L127](../../frontend/src/pages/Security.tsx#L127)
|
|
```tsx
|
|
const crowdsecToggleDisabled = cerberusDisabled || crowdsecPowerMutation.isPending
|
|
```
|
|
|
|
**File:** [frontend/src/pages/Security.tsx#L126](../../frontend/src/pages/Security.tsx#L126)
|
|
```tsx
|
|
const cerberusDisabled = !status.cerberus?.enabled
|
|
```
|
|
|
|
Since `status.cerberus?.enabled` is `false` due to Issue 2 (wrong settings key), `cerberusDisabled` is `true`, making the toggle disabled.
|
|
|
|
**B. Config Button Disabled:**
|
|
|
|
**File:** [frontend/src/pages/Security.tsx#L128](../../frontend/src/pages/Security.tsx#L128)
|
|
```tsx
|
|
const crowdsecControlsDisabled = cerberusDisabled || crowdsecPowerMutation.isPending
|
|
```
|
|
|
|
Same logic - the controls are disabled because Cerberus appears disabled.
|
|
|
|
**C. Switch Component Event Handling:**
|
|
|
|
**File:** [frontend/src/components/ui/Switch.tsx#L17-L20](../../frontend/src/components/ui/Switch.tsx#L17-L20)
|
|
|
|
The Switch component passes `disabled` to the native checkbox input, which prevents click events. This is correct behavior - the issue is the `disabled` prop is incorrectly `true`.
|
|
|
|
---
|
|
|
|
## Recommended Fixes
|
|
|
|
### Fix 1: Update Feature Flag Defaults
|
|
|
|
**File:** `backend/internal/api/handlers/feature_flags_handler.go`
|
|
|
|
```go
|
|
// Change defaultFlagValues to include cerberus.enabled as false
|
|
var defaultFlagValues = map[string]bool{
|
|
"feature.cerberus.enabled": false, // ADD THIS
|
|
"feature.crowdsec.console_enrollment": false,
|
|
"feature.uptime.enabled": true, // Uptime can default ON
|
|
}
|
|
```
|
|
|
|
### Fix 2: Align Settings Keys
|
|
|
|
**Option A (Recommended):** Update security_handler.go to read from feature flags key
|
|
|
|
**File:** `backend/internal/api/handlers/security_handler.go`
|
|
|
|
```go
|
|
// Line 37: Change from
|
|
var settingKey = "security.cerberus.enabled"
|
|
// To
|
|
var settingKey = "feature.cerberus.enabled"
|
|
```
|
|
|
|
**Option B:** Create a sync mechanism between feature flags and security settings
|
|
|
|
### Fix 3: Remove CrowdSec Mode Override from Docker Compose
|
|
|
|
**Files:**
|
|
- `docker-compose.override.yml`
|
|
- `docker-compose.local.yml`
|
|
|
|
```yaml
|
|
# Remove or comment out:
|
|
# - CHARON_SECURITY_CROWDSEC_MODE=local
|
|
# Or change to:
|
|
- CHARON_SECURITY_CROWDSEC_MODE=disabled
|
|
```
|
|
|
|
### Fix 4: No Additional Fix Needed
|
|
|
|
Issue 4 is a symptom of Issue 2. Once the settings key is aligned (Fix 2):
- `cerberusDisabled` will be `false` when Cerberus is enabled
- `crowdsecToggleDisabled` will be `false`
- `crowdsecControlsDisabled` will be `false`
- Toggle and Config button will be interactive
|
|
|
|
---
|
|
|
|
## Test Scenarios
|
|
|
|
### Test 1: Fresh Install Default State
|
|
```
|
|
Given: Clean database, no env vars set
|
|
When: User loads the Settings > System page
|
|
Then: Cerberus toggle should be OFF
|
|
And: /api/v1/feature-flags returns { "feature.cerberus.enabled": false }
|
|
```
|
|
|
|
### Test 2: Cerberus Toggle Sync
|
|
```
|
|
Given: User is on Settings > System page
|
|
When: User enables Cerberus toggle
|
|
Then: /api/v1/security/status returns { "cerberus": { "enabled": true } }
|
|
And: Security dashboard header banner is NOT displayed
|
|
```
|
|
|
|
### Test 3: CrowdSec Toggle Interaction
|
|
```
|
|
Given: Cerberus is enabled
|
|
And: User is on Security dashboard
|
|
When: User clicks CrowdSec toggle
|
|
Then: Toggle should respond to click
|
|
And: CrowdSec enabled state should change
|
|
And: Toast notification should appear
|
|
```
|
|
|
|
### Test 4: CrowdSec Config Button
|
|
```
|
|
Given: Cerberus is enabled
|
|
And: User is on Security dashboard
|
|
When: User clicks CrowdSec "Config" button
|
|
Then: User should navigate to /security/crowdsec
|
|
And: Button should NOT be grayed out
|
|
```
|
|
|
|
### Test 5: Environment Variable Override
|
|
```
|
|
Given: CERBERUS_SECURITY_CERBERUS_ENABLED=true set
|
|
When: User loads Settings > System (fresh DB)
|
|
Then: Cerberus toggle should be ON (env override)
|
|
```
|
|
|
|
---
|
|
|
|
## Implementation Priority
|
|
|
|
| Priority | Fix | Effort | Impact |
|----------|-----|--------|--------|
| P0 | Fix 2 (Key alignment) | Low | High - Fixes Issues 2, 4 |
| P1 | Fix 1 (Default values) | Low | High - Fixes Issue 1 |
| P2 | Fix 3 (Docker compose) | Low | Medium - Fixes Issue 3 |
|
|
|
|
---
|
|
|
|
## Files to Modify
|
|
|
|
1. **backend/internal/api/handlers/feature_flags_handler.go** - Add default value for cerberus
|
|
2. **backend/internal/api/handlers/security_handler.go** - Change settings key to `feature.cerberus.enabled`
|
|
3. **docker-compose.override.yml** - Remove or change CrowdSec mode
|
|
4. **docker-compose.local.yml** - Remove or change CrowdSec mode
|
|
|
|
---
|
|
|
|
## Additional Observations
|
|
|
|
1. **Dual Control Systems:** There are two overlapping control systems:
|
|
- Feature flags (`feature.cerberus.enabled`) - toggled in SystemSettings.tsx
|
|
- Security config (`SecurityConfig.Enabled` in DB) - used by Enable/Disable endpoints
|
|
|
|
Consider consolidating to one source of truth.
|
|
|
|
2. **Config vs Settings:** The `config.SecurityConfig` struct loaded from env vars is separate from DB-backed `SecurityConfig` model. This creates confusion about which takes precedence.
|
|
|
|
3. **No Migration:** When updating default values, existing users may need a migration or reset to see the new defaults.
|
|
|
|
---
|
|
|
|
## Code Reference Summary
|
|
|
|
| File | Line | Purpose |
|------|------|---------|
| `feature_flags_handler.go` | L29-31 | Missing cerberus default |
| `feature_flags_handler.go` | L39 | `defaultVal := true` bug |
| `security_handler.go` | L37 | Wrong settings key |
| `Security.tsx` | L126-128 | Disabled state logic |
| `SystemSettings.tsx` | L99-105 | Feature toggle UI |
| `docker-compose.override.yml` | L21 | CrowdSec mode env var |
| `config.go` | L60 | Correct cerberus default |
|