# CrowdSec Console Enrollment Persistence Issue - ARCHITECTURAL ROOT CAUSE

**Date:** December 14, 2025 (Updated with Architectural Analysis)
**Issue:** Console enrollment shows "enrolled" locally but doesn't appear on crowdsec.net
**Status:** 🚨 **ARCHITECTURAL ISSUE IDENTIFIED** - Environment variable dependency breaks GUI control

---

## 🎯 Key Findings

### Critical Discovery
The `CHARON_SECURITY_CROWDSEC_MODE` environment variable is **LEGACY/DEPRECATED** technical debt from when Charon supported external CrowdSec instances (no longer supported). Now that Charon offers the **import config option**, CrowdSec should be **entirely GUI-controlled**, but the entrypoint script still checks the environment variable.

### Root Cause Chain
1. User enables CrowdSec via GUI → Database updated (`security.crowdsec.enabled = true`)
2. Backend sees CrowdSec enabled and allows Console enrollment
3. **BUT** `docker-entrypoint.sh` checks the `SECURITY_CROWDSEC_MODE` environment variable, not the database
4. LAPI never starts because the env var says "disabled"
5. Enrollment command runs but cannot contact LAPI
6. User sees "enrolled" in UI but nothing appears on crowdsec.net

### Why This is an Architecture Problem
- **WAF, ACL, and Rate Limiting** are all GUI-controlled via the Settings table
- **CrowdSec** still has legacy environment variable checks in the entrypoint script
- Backend has proper `Start()` and `Stop()` handlers but they're not integrated with the container lifecycle
- This creates an inconsistent UX where the GUI toggle doesn't actually control the service

### Impact
- **ALL users** attempting Console enrollment are affected unless the legacy env var happens to be set to `local`
- **Not a configuration issue** - users cannot fix this without a workaround
- **Technical debt** preventing proper GUI-based security orchestration

---

## Executive Summary

The CrowdSec console enrollment appears successful locally (green checkmark in Charon UI) but the instance **does not appear on the CrowdSec Console dashboard at crowdsec.net**.

**🚨 CRITICAL ARCHITECTURAL ISSUE:** The `CHARON_SECURITY_CROWDSEC_MODE` environment variable is **LEGACY/DEPRECATED** from when Charon supported external CrowdSec instances. Now that Charon offers the **import config option**, CrowdSec is **always internally managed** and should be **GUI-controlled**, not environment variable controlled.

**✅ TRUE ROOT CAUSE:** The code still checks the legacy `SECURITY_CROWDSEC_MODE` environment variable in `docker-entrypoint.sh`, which prevents LAPI from starting even when the GUI says CrowdSec is enabled. The `cscli console enroll` command **requires LAPI to be running** to complete the enrollment registration with crowdsec.net.

**CORRECTED UNDERSTANDING:** Enrollment tokens are **REUSABLE** (confirmed by user testing). The issue is NOT token exhaustion - it's that the enrollment process cannot complete without an active LAPI connection.

**Key Finding:** The enrollment command executes without error even when LAPI is down, causing the database to show "enrolled" status while the actual Console registration never happens.

---

## Architectural Analysis

### Current Architecture (INCORRECT)

**Environment Variable Dependency:**
```bash
# docker-entrypoint.sh checks this legacy env var:
SECURITY_CROWDSEC_MODE=${CERBERUS_SECURITY_CROWDSEC_MODE:-${CHARON_SECURITY_CROWDSEC_MODE:-$CPM_SECURITY_CROWDSEC_MODE}}

if [ "$SECURITY_CROWDSEC_MODE" = "local" ]; then
    crowdsec -c /etc/crowdsec/config.yaml &
fi
```

**The Problem:**
- User enables CrowdSec via GUI → `security.crowdsec.enabled = true` in database
- Backend sees CrowdSec enabled and allows enrollment
- But `docker-entrypoint.sh` checks **environment variable**, not database
- LAPI never starts because env var says "disabled"
- Enrollment command runs but cannot contact LAPI
- User sees "enrolled" in UI but nothing on crowdsec.net
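
The nested `${A:-${B:-$C}}` expansion above can be hard to read at a glance. As a minimal sketch, written in Go to match the backend, the following models the same precedence and shows that the GUI setting never enters the decision. The names `resolveLegacyMode` and `lapiWouldStart` are hypothetical and exist only for illustration; they are not part of Charon's code.

```go
package main

import "fmt"

// legacyModeVars lists the variables in the same precedence order as the
// ${A:-${B:-$C}} expansion in docker-entrypoint.sh.
var legacyModeVars = []string{
	"CERBERUS_SECURITY_CROWDSEC_MODE",
	"CHARON_SECURITY_CROWDSEC_MODE",
	"CPM_SECURITY_CROWDSEC_MODE",
}

// resolveLegacyMode mirrors the shell ":-" fallback: the first variable that
// is set AND non-empty wins; otherwise the result is the empty string.
func resolveLegacyMode(env map[string]string) string {
	for _, name := range legacyModeVars {
		if v := env[name]; v != "" {
			return v
		}
	}
	return ""
}

// lapiWouldStart reproduces the entrypoint's decision. Note that the GUI
// setting (security.crowdsec.enabled) is not an input at all.
func lapiWouldStart(env map[string]string) bool {
	return resolveLegacyMode(env) == "local"
}

func main() {
	guiEnabled := true // security.crowdsec.enabled = true in the database
	env := map[string]string{"CHARON_SECURITY_CROWDSEC_MODE": "disabled"}
	fmt.Printf("GUI enabled: %v, LAPI started: %v\n", guiEnabled, lapiWouldStart(env))
}
```

With the environment observed in this investigation, the sketch reports the GUI as enabled while LAPI is not started, which is the mismatch described in the list above.
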

### Correct Architecture (GUI-Controlled)

**How Other Security Features Work (Pattern to Follow):**

WAF, Rate Limiting, and ACL are all **GUI-controlled** through the Settings table:
- `security.waf.enabled` → Controls WAF mode
- `security.rate_limit.enabled` → Controls rate limiting
- `security.acl.enabled` → Controls ACL mode

These settings are read by:
1. **Backend handlers** via `security_handler.go:GetStatus()`
2. **Caddy config generator** via `caddy/manager.go:computeEffectiveFlags()`
3. **Frontend** via API calls to `/api/v1/security/status`

**CrowdSec Should Follow Same Pattern:**
- GUI toggle → `security.crowdsec.enabled` in Settings table
- Backend reads setting and manages CrowdSec process lifecycle
- No environment variable dependency
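
One way to picture "backend reads setting and manages CrowdSec process lifecycle" is a small reconcile step that compares the stored setting with the actual process state, for example when the backend boots after a container restart. The sketch below is an assumption about how that could look, not Charon's implementation: `SettingsReader`, `ProcessController`, `reconcileCrowdsec`, and the in-memory fakes are all hypothetical names, and only the setting key `security.crowdsec.enabled` comes from this document.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// SettingsReader and ProcessController are hypothetical interfaces used only
// for this sketch; they are not Charon's real types.
type SettingsReader interface {
	GetBool(ctx context.Context, key string) (bool, error)
}

type ProcessController interface {
	Running(ctx context.Context) (bool, error)
	Start(ctx context.Context) error
	Stop(ctx context.Context) error
}

// reconcileCrowdsec makes the process state match the stored GUI setting and
// returns the action it took: "start", "stop", or "none".
func reconcileCrowdsec(ctx context.Context, s SettingsReader, p ProcessController) (string, error) {
	want, err := s.GetBool(ctx, "security.crowdsec.enabled")
	if err != nil {
		return "none", fmt.Errorf("read setting: %w", err)
	}
	running, err := p.Running(ctx)
	if err != nil {
		return "none", fmt.Errorf("check process: %w", err)
	}
	switch {
	case want && !running:
		return "start", p.Start(ctx)
	case !want && running:
		return "stop", p.Stop(ctx)
	default:
		return "none", nil
	}
}

// In-memory fakes so the sketch runs without a database or a real process.
type fakeSettings map[string]bool

func (f fakeSettings) GetBool(_ context.Context, key string) (bool, error) {
	v, ok := f[key]
	if !ok {
		return false, errors.New("setting not found: " + key)
	}
	return v, nil
}

type fakeProcess struct{ running bool }

func (f *fakeProcess) Running(context.Context) (bool, error) { return f.running, nil }
func (f *fakeProcess) Start(context.Context) error            { f.running = true; return nil }
func (f *fakeProcess) Stop(context.Context) error             { f.running = false; return nil }

// runDemo wires the fakes together and reports the action and final state.
func runDemo(enabled, running bool) (string, bool) {
	proc := &fakeProcess{running: running}
	settings := fakeSettings{"security.crowdsec.enabled": enabled}
	action, err := reconcileCrowdsec(context.Background(), settings, proc)
	if err != nil {
		return "error", proc.running
	}
	return action, proc.running
}

func main() {
	action, running := runDemo(true, false)
	fmt.Printf("action=%s running=%v\n", action, running)
}
```

The point of the design is that the database is the single source of truth: the same function could serve both the boot path and the GUI toggle, and no environment variable is consulted.
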

### Import Config Feature (Why External Mode is Deprecated)

The import config feature (`importCrowdsecConfig`) allows users to:
1. Upload a complete CrowdSec configuration (tar.gz)
2. Import pre-configured settings, collections, and bouncers
3. Manage CrowdSec entirely through Charon's GUI

**This replaced the need for "external" mode:**
- Old way: Set `CROWDSEC_MODE=external` and point to external LAPI
- New way: Import your existing config and let Charon manage it internally

---

## Forensic Investigation Findings

### Environment Status (Verified Dec 14, 2025)

**✅ CAPI Registration:** Working
```bash
$ docker exec charon cscli capi status
✓ Loaded credentials from /etc/crowdsec/online_api_credentials.yaml
✓ You can successfully interact with Central API (CAPI)
```

**❌ LAPI Status:** NOT RUNNING
```bash
$ docker exec charon cscli lapi status
✗ Error: dial tcp 127.0.0.1:8085: connection refused
```

**❌ CrowdSec Agent:** NOT RUNNING
```bash
$ docker exec charon ps aux | grep crowdsec
(no processes found)
```

**Environment Variables:**
```bash
CHARON_SECURITY_CROWDSEC_MODE=disabled  # ← THIS IS THE PROBLEM
```

### Why Enrollment Appears Successful

The enrollment flow in `backend/internal/crowdsec/console_enroll.go`:

1. ✅ Validates token format
2. ✅ Ensures CAPI registered (`ensureCAPIRegistered`)
3. ✅ Updates database to "enrolling" status
4. ✅ Executes `cscli console enroll <token>`
5. **❌ Command exits with code 0 even when LAPI is down**
6. ✅ Updates database to "enrolled" status
7. ✅ Returns success to UI

**The Bug:** `cscli console enroll` does NOT verify LAPI connectivity before returning success. It writes local state but cannot register with crowdsec.net Console API without an active LAPI connection.

---

## Root Cause: Legacy Environment Variable Architecture

### Confirmed (100% Confidence)

**The Issue:** The `docker-entrypoint.sh` script starts CrowdSec LAPI only when a **legacy environment variable** is set to `local`; it never consults the **GUI setting**:

```bash
# docker-entrypoint.sh (INCORRECT ARCHITECTURE)
SECURITY_CROWDSEC_MODE=${CERBERUS_SECURITY_CROWDSEC_MODE:-${CHARON_SECURITY_CROWDSEC_MODE:-$CPM_SECURITY_CROWDSEC_MODE}}

if [ "$SECURITY_CROWDSEC_MODE" = "local" ]; then
    crowdsec -c /etc/crowdsec/config.yaml &
fi
```

**Current State:**
- GUI setting: `security.crowdsec.enabled = true` (in database)
- Environment: `CHARON_SECURITY_CROWDSEC_MODE=disabled`
- Result: LAPI NOT RUNNING

**Correct Architecture:**
- CrowdSec should be started/stopped by **backend handlers** (`Start()` and `Stop()` methods)
- The GUI toggle should call these handlers, so that CrowdSec is GUI-controlled just like WAF and ACL
- No environment variable checks in entrypoint script

**Console Enrollment REQUIRES:**
1. CrowdSec agent running
2. Local API (LAPI) running on port 8085
3. Active connection between LAPI and Console API (api.crowdsec.net)
4. **All controlled by GUI, not environment variables**
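
Requirement 2 can be checked cheaply before anything else is attempted. The following is a minimal sketch, assuming LAPI listens on `127.0.0.1:8085` as seen in the "connection refused" error earlier in this report. `lapiReachable` and `probeLoopback` are hypothetical helper names. A successful TCP connect only proves that something is listening on the port; it does not authenticate against LAPI, so it complements `cscli lapi status` rather than replacing it.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// lapiReachable reports whether something accepts TCP connections at addr
// within the given timeout.
func lapiReachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// probeLoopback demonstrates both outcomes without needing CrowdSec: it opens
// a throwaway listener on a free port, probes it, closes it, and probes again.
func probeLoopback() (whileOpen, afterClose bool) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, false
	}
	addr := ln.Addr().String()
	whileOpen = lapiReachable(addr, time.Second)
	ln.Close()
	afterClose = lapiReachable(addr, time.Second)
	return whileOpen, afterClose
}

func main() {
	// Address taken from the error output shown in this report.
	fmt.Println("LAPI reachable:", lapiReachable("127.0.0.1:8085", 2*time.Second))

	open, closed := probeLoopback()
	fmt.Printf("demo listener: open=%v, after close=%v\n", open, closed)
}
```
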

---

## Comparison: How WAF/ACL Work (Correct Pattern)

### WAF Control Flow (GUI → Backend → Caddy)

1. **Frontend:** User toggles WAF switch → calls `updateSetting('security.waf.enabled', 'true')`
2. **Backend:** Settings table updated → Caddy config regenerated
3. **Caddy Manager:** Reads `security.waf.enabled` from database → enables WAF handlers
4. **No Environment Variable Checks**

### CrowdSec Control Flow (BROKEN - Still Uses Env Vars)

1. **Frontend:** User toggles CrowdSec switch → calls `updateSetting('security.crowdsec.enabled', 'true')`
2. **Backend:** Settings table updated → BUT...
3. **Entrypoint Script:** Checks `SECURITY_CROWDSEC_MODE` env var (LEGACY)
4. **Result:** LAPI never starts because env var says "disabled"

### How CrowdSec SHOULD Work (GUI-Controlled)

1. **Frontend:** User toggles CrowdSec switch → calls `/api/v1/admin/crowdsec/start`
2. **Backend Handler:** `CrowdsecHandler.Start()` executes → starts LAPI process
3. **Process Management:** Backend tracks PID and monitors health
4. **No Environment Variable Dependency**

**Evidence from Code:**

```go
// backend/internal/api/handlers/crowdsec_handler.go
// These handlers already exist but aren't properly integrated!

func (h *CrowdsecHandler) Start(c *gin.Context) {
	ctx := c.Request.Context()
	pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusOK, gin.H{"status": "started", "pid": pid})
}

func (h *CrowdsecHandler) Stop(c *gin.Context) {
	ctx := c.Request.Context()
	if err := h.Executor.Stop(ctx, h.DataDir); err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusOK, gin.H{"status": "stopped"})
}
```

**Frontend Integration:**

```typescript
// frontend/src/pages/Security.tsx
// CrowdSec toggle DOES call start/stop, but LAPI never started by entrypoint!

const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')
    if (enabled) {
      await startCrowdsec()  // ← Calls backend Start() handler
    } else {
      await stopCrowdsec()   // ← Calls backend Stop() handler
    }
    return enabled
  },
})
```

**The Missing Piece:** The `docker-entrypoint.sh` should ALWAYS initialize CrowdSec but NOT start the agent. The backend handlers should control the lifecycle.

---

## Immediate Fix (For User)

**WORKAROUND (Until Architecture Fixed):**

Set the legacy environment variable to match the GUI state:

**Step 1: Enable CrowdSec Local Mode (Environment Variable)**

Update `docker-compose.yml` or `docker-compose.override.yml`:
```yaml
services:
  charon:
    environment:
      - CHARON_SECURITY_CROWDSEC_MODE=local  # Temporary workaround for legacy check
```

**Step 2: Recreate Container**
```bash
docker compose down
docker compose up -d
```

**Step 3: Verify LAPI is Running**
```bash
# Wait 30 seconds for LAPI to start
docker exec charon cscli lapi status
```

Expected output:
```
✓ Loaded credentials from /etc/crowdsec/local_api_credentials.yaml
✓ You can successfully interact with Local API (LAPI)
```

**Step 4: Re-submit Enrollment Token**
- Go to Charon UI → Cerberus → CrowdSec
- Submit enrollment token (same token works!)
- Verify instance appears on crowdsec.net dashboard

---

## Long-Term Fix Implementation Plan (ARCHITECTURE CORRECTION)

### Priority Overview

1. **CRITICAL:** Remove environment variable dependency from entrypoint script
2. **CRITICAL:** Ensure backend handlers control CrowdSec lifecycle
3. **HIGH:** Add LAPI availability check before enrollment
4. **HIGH:** Add UI warning when CrowdSec is disabled
5. **HIGH:** Update documentation to reflect GUI-only control
6. **MEDIUM:** Add migration guide for users with env vars set
7. **MEDIUM:** Add integration test for the GUI-controlled lifecycle

---

### Fix 1: Remove Environment Variable Dependency (CRITICAL PRIORITY)

**Problem:** `docker-entrypoint.sh` checks legacy `SECURITY_CROWDSEC_MODE` env var
**Solution:** Remove env var check, let backend control CrowdSec lifecycle
**Time:** 45 minutes
**Files affected:** `docker-entrypoint.sh`, `backend/internal/api/handlers/crowdsec_handler.go`

**Implementation:**

**Part A: Update docker-entrypoint.sh**

Remove the CrowdSec agent auto-start logic:

```bash
# BEFORE (INCORRECT - Environment Variable Control):
if [ "$SECURITY_CROWDSEC_MODE" = "local" ]; then
    echo "CrowdSec Local Mode enabled."
    crowdsec -c /etc/crowdsec/config.yaml &
    CROWDSEC_PID=$!
fi

# AFTER (CORRECT - Backend Control):
# CrowdSec initialization (config setup) always runs
# But agent startup is controlled by backend handlers via GUI
# No automatic startup based on environment variables
```

**Part B: Ensure Backend Handlers Work Correctly**

The `CrowdsecHandler.Start()` handler already exists; it should only need verification (see Risk Assessment):

```go
// backend/internal/api/handlers/crowdsec_handler.go
func (h *CrowdsecHandler) Start(c *gin.Context) {
	ctx := c.Request.Context()
	pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}
	c.JSON(http.StatusOK, gin.H{"status": "started", "pid": pid})
}
```

**Part C: Frontend Integration Verification**

Verify the frontend correctly calls start/stop:

```typescript
// frontend/src/pages/Security.tsx (ALREADY CORRECT)
const crowdsecPowerMutation = useMutation({
  mutationFn: async (enabled: boolean) => {
    await updateSetting('security.crowdsec.enabled', enabled ? 'true' : 'false', 'security', 'bool')
    if (enabled) {
      await startCrowdsec()  // Calls /api/v1/admin/crowdsec/start
    } else {
      await stopCrowdsec()   // Calls /api/v1/admin/crowdsec/stop
    }
    return enabled
  },
})
```

**Testing:**
1. Remove env var from docker-compose.yml
2. Start container (CrowdSec should NOT auto-start)
3. Toggle CrowdSec in GUI (should start LAPI)
4. Verify `cscli lapi status` shows running
5. Toggle off (should stop LAPI)

---

### Fix 2: Add LAPI Availability Check Before Enrollment (HIGH PRIORITY)

**Problem:** Enrollment command succeeds even when LAPI is down
**Solution:** Verify LAPI connectivity before allowing enrollment
**Time:** 30 minutes
**Files affected:** `backend/internal/crowdsec/console_enroll.go`

**Implementation:**

Add LAPI health check before enrollment:

```go
// checkLAPIAvailable runs `cscli lapi status` and fails if LAPI is unreachable.
func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error {
	args := []string{"lapi", "status"}
	// Use the instance-specific config when one exists in the data directory.
	if _, err := os.Stat(filepath.Join(s.dataDir, "config.yaml")); err == nil {
		args = append([]string{"-c", filepath.Join(s.dataDir, "config.yaml")}, args...)
	}
	_, err := s.exec.ExecuteWithEnv(ctx, "cscli", args, nil)
	if err != nil {
		return fmt.Errorf("CrowdSec Local API is not running - please enable CrowdSec via the GUI toggle first")
	}
	return nil
}
```

Update `Enroll()` method:
```go
// New: fail fast if LAPI is unreachable, before the existing CAPI registration step.
if err := s.checkLAPIAvailable(ctx); err != nil {
	return ConsoleEnrollmentStatus{}, err
}
// Existing step, unchanged.
if err := s.ensureCAPIRegistered(ctx); err != nil {
	return ConsoleEnrollmentStatus{}, err
}
```
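
A single check can fail spuriously if the user enrolls right after flipping the toggle, since this report elsewhere allows 10 to 30 seconds for LAPI to come up. As a hedged sketch of one option, the check could be polled until it passes or a deadline expires. `waitUntilReady` and `simulate` are hypothetical names, and the `check` callback stands in for something like `checkLAPIAvailable`; none of this is existing Charon code. On timeout the helper returns the context error together with the last check error, so the caller can still surface a "LAPI is not running" message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitUntilReady calls check immediately and then once per interval until it
// returns nil or ctx is done.
func waitUntilReady(ctx context.Context, interval time.Duration, check func(context.Context) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		err := check(ctx)
		if err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("LAPI not ready: %w (last error: %v)", ctx.Err(), err)
		case <-ticker.C:
			// Interval elapsed; loop around and check again.
		}
	}
}

// simulate runs waitUntilReady against a fake check that fails
// failuresBeforeOK times and then succeeds. It returns how many times the
// check ran and whether readiness was reached before the timeout.
func simulate(failuresBeforeOK, timeoutMillis int) (attempts int, ok bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()
	err := waitUntilReady(ctx, 5*time.Millisecond, func(context.Context) error {
		attempts++
		if attempts <= failuresBeforeOK {
			return errors.New("connection refused")
		}
		return nil
	})
	return attempts, err == nil
}

func main() {
	attempts, ok := simulate(3, 1000)
	fmt.Printf("ready=%v after %d attempts\n", ok, attempts)
}
```
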

---

### Fix 3: Add UI Warning When CrowdSec is Disabled (HIGH PRIORITY)

**Problem:** Users can attempt enrollment when CrowdSec is disabled
**Solution:** Add status check to enrollment UI with clear instructions
**Time:** 20 minutes
**Files affected:** `frontend/src/pages/CrowdSecConfig.tsx`

**Implementation:**

Add LAPI status detection to enrollment form:

```typescript
const crowdsecStatusQuery = useQuery({
  queryKey: ['crowdsec-status'],
  queryFn: async () => {
    const response = await client.get('/api/v1/admin/crowdsec/status');
    return response.data;
  },
  enabled: consoleEnrollmentEnabled,
  refetchInterval: 5000, // Poll every 5 seconds
});

// In enrollment form JSX:
{!crowdsecStatusQuery.data?.running && (
  <Alert variant="warning">
    <AlertTriangle className="w-4 h-4" />
    <span>
      CrowdSec Local API is not running. Please enable CrowdSec using the toggle switch
      in the Security dashboard before enrolling in the Console.
    </span>
    <Button
      variant="link"
      onClick={() => navigate('/security')}
    >
      Go to Security Dashboard
    </Button>
  </Alert>
)}

<Button
  disabled={!crowdsecStatusQuery.data?.running || !enrollmentToken}
  onClick={handleEnroll}
>
  Enroll Instance
</Button>
```

---

### Fix 4: Update Documentation (HIGH PRIORITY)

**Problem:** Documentation mentions environment variables for CrowdSec control
**Solution:** Update docs to reflect GUI-only control, mark env vars as deprecated
**Time:** 30 minutes
**Files affected:**
- `docs/security.md`
- `docs/cerberus.md`
- `docs/troubleshooting/crowdsec.md`
- `README.md`

**Changes Needed:**

1. **Mark Environment Variables as Deprecated:**
   ```md
   ⚠️ **DEPRECATED:** `CHARON_SECURITY_CROWDSEC_MODE` environment variable is no longer used.
   CrowdSec is now controlled via the GUI in the Security dashboard.
   ```

2. **Add GUI Control Instructions:**
   ```md
   ## Enabling CrowdSec

   1. Navigate to **Security** dashboard
   2. Toggle the **CrowdSec** switch to **ON**
   3. The backend will start the CrowdSec agent and Local API (LAPI)
   4. Verify status shows "Active" with a running PID

   **Note:** CrowdSec is internally managed by Charon. No external setup required.
   ```

3. **Update Console Enrollment Prerequisites:**
   ````md
   ## Console Enrollment Prerequisites

   Before enrolling your Charon instance with CrowdSec Console:

   1. ✅ CrowdSec must be **enabled** in the GUI (toggle switch ON)
   2. ✅ Local API (LAPI) must be **running** (check status)
   3. ✅ Feature flag `feature.crowdsec.console_enrollment` must be enabled
   4. ✅ Valid enrollment token from crowdsec.net

   **Troubleshooting:** If enrollment fails, verify LAPI is running:
   ```bash
   docker exec charon cscli lapi status
   ```
   ````

---

### Fix 5: Add Migration Guide for Existing Users (MEDIUM PRIORITY)

**Problem:** Users may have env vars set that will no longer work
**Solution:** Add migration guide to help users transition
**Time:** 15 minutes
**Files affected:** `docs/migration-guide.md` (new file)

**Content:**

````md
# CrowdSec Control Migration Guide

## What Changed

**Before (v1.x):** CrowdSec was controlled by environment variables:
```yaml
environment:
  - CHARON_SECURITY_CROWDSEC_MODE=local
```

**After (v2.x):** CrowdSec is controlled via GUI toggle in Security dashboard.

## Migration Steps

### Step 1: Remove Environment Variable

Edit your `docker-compose.yml` and remove:
```yaml
# REMOVE THIS LINE:
- CHARON_SECURITY_CROWDSEC_MODE=local
```

### Step 2: Restart Container

```bash
docker compose down
docker compose up -d
```

### Step 3: Enable via GUI

1. Open Charon UI → **Security** dashboard
2. Toggle **CrowdSec** switch to **ON**
3. Verify status shows "Active"

### Step 4: Re-enroll Console (If Applicable)

If you were enrolled in CrowdSec Console before:
1. Your enrollment is preserved in the database
2. No action is needed unless enrollment was incomplete; in that case, re-submit the enrollment token (tokens are reusable)

## Benefits of GUI Control

- ✅ No need to restart container to enable/disable
- ✅ Status visible in real-time
- ✅ Consistent with WAF, ACL, and Rate Limiting controls
- ✅ Better integration with Charon's security orchestration

## Troubleshooting

**Q: CrowdSec won't start after toggling?**
- Check logs: `docker logs charon`
- Verify config exists: `docker exec charon ls -la /app/data/crowdsec/config`

**Q: Console enrollment fails?**
- Verify LAPI is running: `docker exec charon cscli lapi status`
- Check enrollment prerequisites in [docs/security.md](security.md)
````

---

### Fix 6: Add Integration Test (MEDIUM PRIORITY)

**Problem:** No test coverage for enrollment prerequisites
**Solution:** Add test that verifies LAPI requirement and GUI lifecycle
**Time:** 30 minutes
**Files affected:**
- `backend/internal/crowdsec/console_enroll_test.go`
- `scripts/crowdsec_lifecycle_test.sh` (new file)

**Implementation:**

**Unit Test:**
```go
func TestEnroll_RequiresLAPI(t *testing.T) {
	// db, tempDir, and ctx are assumed to be provided by the surrounding test setup.
	exec := &mockExecutor{
		responses: []cmdResponse{
			// With Fix 2, Enroll() checks LAPI before CAPI registration,
			// so the failing `lapi status` call is the first command executed.
			{out: nil, err: errors.New("connection refused")}, // lapi status fails
		},
	}
	svc := NewConsoleEnrollmentService(db, exec, tempDir, "secret")

	_, err := svc.Enroll(ctx, ConsoleEnrollRequest{
		EnrollmentKey: "test123token",
		AgentName:     "agent",
	})

	require.Error(t, err)
	require.Contains(t, err.Error(), "Local API is not running")
}
```

**Integration Test Script:**
```bash
#!/bin/bash
# scripts/crowdsec_lifecycle_test.sh
# Tests GUI-controlled CrowdSec lifecycle
# Requires cookies.txt with an authenticated admin session.

# Abort on the first failed command so a failing grep or curl fails the test.
set -eu

echo "Testing CrowdSec GUI-controlled lifecycle..."

# 1. Start Charon without env var
docker compose up -d
sleep 5

# 2. Verify CrowdSec NOT running by default
docker exec charon cscli lapi status 2>&1 | grep "connection refused"
echo "✓ CrowdSec not auto-started without env var"

# 3. Enable via GUI toggle
curl --fail -X POST -H "Content-Type: application/json" \
  -b cookies.txt \
  -d '{"key": "security.crowdsec.enabled", "value": "true", "category": "security", "type": "bool"}' \
  http://localhost:8080/api/v1/admin/settings

# 4. Call start endpoint (mimics GUI toggle)
curl --fail -X POST -b cookies.txt \
  http://localhost:8080/api/v1/admin/crowdsec/start

sleep 10

# 5. Verify LAPI running
docker exec charon cscli lapi status | grep "successfully interact"
echo "✓ LAPI started via GUI toggle"

# 6. Disable via GUI
curl --fail -X POST -b cookies.txt \
  http://localhost:8080/api/v1/admin/crowdsec/stop

sleep 5

# 7. Verify LAPI stopped
docker exec charon cscli lapi status 2>&1 | grep "connection refused"
echo "✓ LAPI stopped via GUI toggle"

echo "✅ All GUI lifecycle tests passed"
```

---

## Summary of Architectural Changes

### What's Broken Now (Environment Variable Control)

```
┌─────────────────┐
│ docker-compose  │
│ env: MODE=      │ ← Environment variable set here
│      disabled   │
└────────┬────────┘
         │
         v
┌─────────────────┐
│ entrypoint.sh   │
│ if MODE=local   │ ← Checks env var, doesn't start LAPI
│   start crowdsec│
└─────────────────┘
         │
         v
   ❌ LAPI never starts
         │
         v
┌─────────────────┐
│ GUI Toggle      │
│ "CrowdSec: ON"  │ ← User thinks it's enabled
└─────────────────┘
         │
         v
┌─────────────────┐
│ Enroll Console  │ ← Fails silently (LAPI not running)
└─────────────────┘
```

### What Should Happen (GUI Control)

```
┌─────────────────┐
│ docker-compose  │
│ (no env var)    │ ← No environment variable needed
└────────┬────────┘
         │
         v
┌─────────────────┐
│ entrypoint.sh   │
│ Init CrowdSec   │ ← Setup config only, don't start agent
│ (config only)   │
└─────────────────┘
         │
         v
┌─────────────────┐
│ GUI Toggle      │
│ "CrowdSec: ON"  │ ← User enables via GUI
└────────┬────────┘
         │
         v
┌─────────────────┐
│ POST /crowdsec  │
│      /start     │ ← Frontend calls backend handler
└────────┬────────┘
         │
         v
┌─────────────────┐
│ Backend Handler │
│ Start LAPI      │ ← Backend starts the agent
│ (PID tracked)   │
└────────┬────────┘
         │
         v
   ✅ LAPI running
         │
         v
┌─────────────────┐
│ Enroll Console  │ ← Works! LAPI available
└─────────────────┘
```

### Pattern Consistency Across Security Features

| Feature | Control Method | Status Endpoint | Lifecycle Handler |
|---------|---------------|-----------------|-------------------|
| **Cerberus** | GUI Toggle | `/security/status` | N/A (master switch) |
| **WAF** | GUI Toggle | `/security/status` | Config regeneration |
| **ACL** | GUI Toggle | `/security/status` | Config regeneration |
| **Rate Limit** | GUI Toggle | `/security/status` | Config regeneration |
| **CrowdSec** (OLD) | ❌ Env Var | `/security/status` | ❌ Entrypoint script |
| **CrowdSec** (NEW) | ✅ GUI Toggle | `/security/status` | ✅ Start/Stop handlers |

---

## Testing Strategy

### Manual Testing (For User - Workaround)

1. **Set Environment Variable (Temporary)**
   ```yaml
   # docker-compose.override.yml
   services:
     charon:
       environment:
         - CHARON_SECURITY_CROWDSEC_MODE=local
   ```

2. **Restart Container**
   ```bash
   docker compose down && docker compose up -d
   ```

3. **Verify LAPI Running**
   ```bash
   docker exec charon cscli lapi status
   # Should show: "You can successfully interact with Local API (LAPI)"
   ```

4. **Test Enrollment**
   - Submit enrollment token via Charon UI
   - Check crowdsec.net dashboard after 60 seconds
   - Instance should appear

### Automated Testing (For Developers - After Fix)

1. **Unit Test:** LAPI availability check before enrollment
2. **Integration Test:** GUI-controlled CrowdSec lifecycle (start/stop)
3. **End-to-End Test:** Full enrollment flow with GUI toggle
4. **Regression Test:** Verify env var no longer affects behavior

### Post-Fix Validation

1. **Remove Environment Variable**
   ```bash
   # Ensure CHARON_SECURITY_CROWDSEC_MODE is NOT set
   ```

2. **Start Container**
   ```bash
   docker compose up -d
   ```

3. **Verify CrowdSec NOT Running**
   ```bash
   docker exec charon cscli lapi status
   # Should show: "connection refused"
   ```

4. **Enable via GUI**
   - Toggle CrowdSec switch in Security dashboard
   - Wait 10 seconds

5. **Verify LAPI Started**
   ```bash
   docker exec charon cscli lapi status
   # Should show: "successfully interact"
   ```

6. **Test Console Enrollment**
   - Submit enrollment token
   - Verify the instance appears on crowdsec.net

7. **Disable via GUI**
   - Toggle CrowdSec switch off
   - Wait 5 seconds

8. **Verify LAPI Stopped**
   ```bash
   docker exec charon cscli lapi status
   # Should show: "connection refused"
   ```

---

## Files Requiring Changes

### Backend (Go)
1. ✅ `docker-entrypoint.sh` - Remove env var check, initialize config only
2. ✅ `backend/internal/crowdsec/console_enroll.go` - Add LAPI availability check
3. ⚠️ `backend/internal/api/handlers/crowdsec_handler.go` - Already has Start/Stop (verify works)

### Frontend (TypeScript)
1. ✅ `frontend/src/pages/CrowdSecConfig.tsx` - Add LAPI status warning
2. ⚠️ `frontend/src/pages/Security.tsx` - Already calls start/stop (verify integration)

### Documentation
1. ✅ `docs/security.md` - Remove env var instructions, add GUI instructions
2. ✅ `docs/cerberus.md` - Mark env vars deprecated
3. ✅ `docs/troubleshooting/crowdsec.md` - Update enrollment prerequisites
4. ✅ `README.md` - Update quick start to use GUI only
5. ✅ `docs/migration-guide.md` - New file for v1.x → v2.x migration
6. ✅ `docker-compose.yml` - Comment out deprecated env var

### Testing
1. ✅ `backend/internal/crowdsec/console_enroll_test.go` - Add LAPI requirement test
2. ✅ `scripts/crowdsec_lifecycle_test.sh` - New integration test for GUI control

### Configuration (Already Correct)
1. ⚠️ `backend/internal/models/security_config.go` - CrowdSecMode field exists (DB)
2. ⚠️ `backend/internal/api/handlers/security_handler.go` - Already reads from DB
3. ⚠️ `frontend/src/api/crowdsec.ts` - Start/stop API calls already exist

---

## Risk Assessment

### Low Risk Changes
- ✅ Documentation updates
- ✅ Frontend UI warnings
- ✅ Backend LAPI availability check

### Medium Risk Changes
- ⚠️ Removing env var logic from entrypoint (requires thorough testing)
- ⚠️ Integration test for GUI lifecycle

### High Risk Areas (Existing Functionality - Verify)
- ⚠️ Backend Start/Stop handlers (already exist, need to verify)
- ⚠️ Frontend toggle integration (already exists, need to verify)
- ⚠️ CrowdSec config persistence across restarts

### Migration Considerations
- Users with `CHARON_SECURITY_CROWDSEC_MODE=local` set will need to:
  1. Remove environment variable
  2. Enable via GUI toggle
  3. Re-verify enrollment if applicable
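
To make this transition visible to users who still have a legacy variable set, the backend could log a warning at startup. The sketch below is one possible shape under stated assumptions: `deprecationWarnings` and `countWarningsFor` are hypothetical names, the message wording is illustrative, and the three variable names are the ones the entrypoint script falls back through. The lookup function has the same signature as `os.LookupEnv` so a fake can be injected for testing.

```go
package main

import (
	"fmt"
	"os"
)

// deprecatedVars are the legacy variables read by docker-entrypoint.sh.
var deprecatedVars = []string{
	"CERBERUS_SECURITY_CROWDSEC_MODE",
	"CHARON_SECURITY_CROWDSEC_MODE",
	"CPM_SECURITY_CROWDSEC_MODE",
}

// deprecationWarnings returns one message per legacy variable that is set,
// regardless of its value. lookup has the same shape as os.LookupEnv.
func deprecationWarnings(lookup func(string) (string, bool)) []string {
	var warnings []string
	for _, name := range deprecatedVars {
		if value, ok := lookup(name); ok {
			warnings = append(warnings, fmt.Sprintf(
				"%s=%q is deprecated and ignored; enable CrowdSec from the Security dashboard instead",
				name, value))
		}
	}
	return warnings
}

// countWarningsFor treats the given names as "set" and counts the warnings,
// which keeps the demo independent of the real process environment.
func countWarningsFor(setVars ...string) int {
	lookup := func(name string) (string, bool) {
		for _, s := range setVars {
			if s == name {
				return "local", true
			}
		}
		return "", false
	}
	return len(deprecationWarnings(lookup))
}

func main() {
	// In a real startup path the warnings would go to the application logger.
	for _, w := range deprecationWarnings(os.LookupEnv) {
		fmt.Fprintln(os.Stderr, "WARNING:", w)
	}
	fmt.Println("warnings for one legacy var:", countWarningsFor("CHARON_SECURITY_CROWDSEC_MODE"))
}
```
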

---

## Rollback Plan

If the architectural changes cause issues:

1. **Immediate Rollback:** Add env var check back to `docker-entrypoint.sh`
2. **Document Workaround:** Continue using env var for CrowdSec control
3. **Defer Fix:** Mark as "known limitation" in docs until proper fix validated

---

## Files Inspected During Investigation

### Configuration ✅
- `docker-compose.yml` - Volume mounts correct
- `docker-entrypoint.sh` - Conditional CrowdSec startup logic
- `Dockerfile` - CrowdSec installed correctly

### Backend ✅
- `backend/internal/crowdsec/console_enroll.go` - Enrollment flow logic
- `backend/internal/models/crowdsec_console_enrollment.go` - Database model
- `backend/internal/api/handlers/crowdsec_handler.go` - API endpoint

### Runtime Verification ✅
- `/etc/crowdsec` → `/app/data/crowdsec/config` (symlink correct)
- `/app/data/crowdsec/config/online_api_credentials.yaml` exists (CAPI registered)
- `/app/data/crowdsec/config/console.yaml` exists
- `ps aux` shows NO crowdsec processes (LAPI not running)
- Environment: `CHARON_SECURITY_CROWDSEC_MODE=disabled`

---

## Conclusion

**Root Cause (Updated with Architectural Analysis):** Console enrollment fails because of **architectural technical debt** - the legacy environment variable `CHARON_SECURITY_CROWDSEC_MODE` still controls LAPI startup in `docker-entrypoint.sh`, bypassing the GUI control system that users expect.

**The Real Problem:** This is NOT a user configuration issue. It's a **code architecture issue** where:
1. CrowdSec control was never fully migrated to GUI-based management
2. The entrypoint script still checks deprecated environment variables
3. Backend handlers (`Start()`/`Stop()`) exist but aren't properly integrated with container startup
4. Users are misled into thinking the GUI toggle actually controls CrowdSec

**Immediate Fix (User Workaround):** Set the `CHARON_SECURITY_CROWDSEC_MODE=local` environment variable to match the GUI state.

**Proper Fix (Development Required):**
1. **CRITICAL:** Remove environment variable dependency from `docker-entrypoint.sh`
2. **CRITICAL:** Ensure backend handlers control CrowdSec lifecycle (GUI → API → Process)
3. **HIGH:** Add LAPI availability check before enrollment (prevents silent failures)
4. **HIGH:** Add UI warnings when LAPI is not running (improves UX)
5. **HIGH:** Update documentation to reflect GUI-only control
6. **MEDIUM:** Add migration guide for users transitioning from env var control
7. **MEDIUM:** Add integration tests for GUI-controlled lifecycle

**Pattern to Follow:** CrowdSec should work like WAF, ACL, and Rate Limiting - all controlled through the Settings table, with no environment variable dependency.

**Token Reusability:** Confirmed REUSABLE - no need to generate new tokens after fixing LAPI availability.

**Impact:** This architectural issue affects ALL users trying to use Console enrollment, not just the reporter. The fix will benefit the entire user base by providing consistent, GUI-based security feature management.