CrowdSec Enrollment & Console Connectivity Debug Plan
Issue Reference: #586
Problem: CrowdSec engine showing as offline since 12/19/25 in web console
Date Created: 2026-02-03
Status: Research Complete - Ready for Implementation
Executive Summary
This document provides a comprehensive debugging and testing strategy for diagnosing and resolving CrowdSec console enrollment and connectivity issues. The issue manifests as the CrowdSec engine appearing offline in the crowdsec.net web console despite being enrolled locally.
Key Findings from Research
Architecture Components Identified:
- CrowdSec Handler (`backend/internal/api/handlers/crowdsec_handler.go`) - Manages lifecycle, enrollment, status
- Console Enrollment Service (`backend/internal/crowdsec/console_enroll.go`) - Handles enrollment with retry logic
- Startup Service (`backend/internal/services/crowdsec_startup.go`) - Auto-starts CrowdSec on container boot
- Docker Entrypoint (`.docker/docker-entrypoint.sh`) - Initializes CrowdSec configuration
- Database Model (`backend/internal/models/crowdsec_console_enrollment.go`) - Stores enrollment state
- LAPI - Runs on port 8085, health checks via `cscli lapi status`
- Feature Flag - `feature.crowdsec.console_enrollment` controls console enrollment UI visibility
Current Test Coverage:
- ✅ Integration tests for CrowdSec decisions (`backend/integration/crowdsec_decisions_integration_test.go`)
- ✅ Integration tests for CrowdSec startup (`backend/integration/crowdsec_integration_test.go`)
- ✅ E2E tests for CrowdSec configuration page (`tests/security/crowdsec-config.spec.ts`)
- ✅ Unit tests for startup service (`backend/internal/services/crowdsec_startup_test.go`)
- ❌ No E2E tests for console enrollment
- ❌ No integration tests for LAPI heartbeat/connectivity
- ❌ No tests for enrollment token validation
- ❌ No tests for console status polling
Problem Analysis
Symptom: Engine Offline in Console
When CrowdSec shows as "offline" in the crowdsec.net console, it indicates one or more of the following:
- LAPI Not Running - The Local API process is not active
- Enrollment Not Completed - Token accepted locally but not on crowdsec.net
- Heartbeat Failure - LAPI running but not sending heartbeats to console
- Network Connectivity - Container cannot reach crowdsec.net APIs
- Token Expiry - Enrollment token expired or revoked
- CAPI Not Registered - Central API credentials missing or invalid
- Config Corruption - Missing or corrupt `online_api_credentials.yaml`
Known Failure Points
1. Enrollment Token Handling
Location: backend/internal/crowdsec/console_enroll.go:124-149
Current Implementation:
token, err := normalizeEnrollmentKey(req.EnrollmentKey)
if err != nil {
return ConsoleEnrollmentStatus{}, err
}
Issues:
- Token validation only checks format (alphanumeric, 10-64 chars)
- No check for token expiry before attempting enrollment
- Token is encrypted and stored but never re-validated
- No explicit error for expired tokens (generic failure message)
Impact: Users may submit valid-format tokens that are already expired, leading to silent enrollment failures.
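The format-only validation described above can be sketched as follows. This is a minimal illustration, not the project's actual `normalizeEnrollmentKey`: the function name `normalizeKeySketch`, the "take the last whitespace-separated field" rule for pasted commands, and the exact regular expression are assumptions derived from the rules and error strings quoted in this plan.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// enrollKeyPattern encodes the stated rule: alphanumeric, 10 to 64 characters.
var enrollKeyPattern = regexp.MustCompile(`^[A-Za-z0-9]{10,64}$`)

// normalizeKeySketch is a hypothetical stand-in for normalizeEnrollmentKey.
// It accepts either a bare token or a pasted command such as
// "sudo cscli console enroll <token>" and returns the token.
func normalizeKeySketch(raw string) (string, error) {
	fields := strings.Fields(raw)
	if len(fields) == 0 {
		return "", errors.New("enrollment_key required")
	}
	// Assumption: the token is the last field of a pasted command.
	candidate := fields[len(fields)-1]
	if !enrollKeyPattern.MatchString(candidate) {
		return "", errors.New("invalid enrollment key")
	}
	return candidate, nil
}

func main() {
	inputs := []string{
		"abc123xyz789",
		"sudo cscli console enroll abc123xyz789",
		"abc-123_xyz",
		"",
	}
	for _, in := range inputs {
		key, err := normalizeKeySketch(in)
		fmt.Printf("%q -> %q, err=%v\n", in, key, err)
	}
}
```

Note that nothing in this check can tell a fresh token from an expired one, which is why expiry only surfaces later as an enrollment failure.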
2. LAPI Connectivity
Location: backend/internal/crowdsec/console_enroll.go:218-246
Current Implementation:
func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error {
maxRetries := 3
retryDelay := 2 * time.Second
for i := 0; i < maxRetries; i++ {
args := []string{"lapi", "status"}
// ... execute cscli command
if err == nil {
return nil // LAPI is available
}
time.Sleep(retryDelay)
}
return fmt.Errorf("CrowdSec Local API is not running after %d attempts", maxRetries)
}
Issues:
- Only 3 retries with 2-second delays (6 seconds total)
- LAPI initialization can take 10-15 seconds on slow hardware
- No exponential backoff (fixed 2s delay)
- Timeout per attempt is 3 seconds (may be insufficient for cold start)
- No check for LAPI process vs LAPI readiness
Impact: LAPI may still be initializing when enrollment check fails, causing false negatives.
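One way to address the fixed-delay limitation is a retry helper with exponential backoff that also honours context cancellation. The sketch below is illustrative only: `retryWithBackoff` and `runDemo` are hypothetical names, and the probe function stands in for the `cscli lapi status` call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff calls probe up to attempts times, doubling the wait after
// each failure (base, 2*base, 4*base, ...). It returns early if ctx is done.
func retryWithBackoff(ctx context.Context, attempts int, base time.Duration, probe func(context.Context) error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = probe(ctx); lastErr == nil {
			return nil
		}
		if i == attempts-1 {
			break // do not sleep after the final attempt
		}
		select {
		case <-time.After(base << i):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("not ready after %d attempts: %w", attempts, lastErr)
}

// runDemo simulates a probe that fails `failures` times before succeeding
// and reports how many times it was called.
func runDemo(failures int) (calls int, err error) {
	probe := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("connection refused")
		}
		return nil
	}
	err = retryWithBackoff(context.Background(), 5, time.Millisecond, probe)
	return calls, err
}

func main() {
	calls, err := runDemo(2)
	fmt.Println(calls, err)
}
```

With 5 attempts and a 3-second base, the waits would be 3s, 6s, 12s and 24s, which comfortably covers the 10-15 second cold start mentioned above.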
3. Network Configuration
Location: .docker/docker-entrypoint.sh:212-219, docker-compose.yml:47-49
Current Implementation:
services:
charon:
image: wikid82/charon:latest
container_name: charon
ports:
- "80:80"
- "443:443"
- "8080:8080"
# LAPI port 8085 NOT exposed by default
Issues:
- LAPI listens on `127.0.0.1:8085` (localhost only)
- No port mapping for LAPI (intentional for security)
- Console heartbeats must originate from within container
- Network mode defaults to bridge (may block outbound HTTPS to crowdsec.net)
- No explicit DNS resolution configuration
Impact: Enrollment may succeed locally but heartbeats fail due to network restrictions.
4. CAPI Registration
Location: backend/internal/crowdsec/console_enroll.go:248-267
Current Implementation:
func (s *ConsoleEnrollmentService) ensureCAPIRegistered(ctx context.Context) error {
credsPath := filepath.Join(s.dataDir, "config", "online_api_credentials.yaml")
if _, err := os.Stat(credsPath); err == nil {
return nil // Assume registered if file exists
}
// Register with CAPI
args := []string{"capi", "register"}
out, err := s.exec.ExecuteWithEnv(ctx, "cscli", args, nil)
if err != nil {
return fmt.Errorf("capi register: %s: %w", string(out), err)
}
return nil
}
Issues:
- Only checks file existence, not validity
- No validation of credentials format
- No retry logic for CAPI registration failures
- No check for CAPI connectivity before registration
- Credentials may be corrupt or revoked
Impact: CAPI registration failures block console enrollment but error is generic.
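A step beyond the existence check is to confirm that the credentials file at least contains the fields it needs. The sketch below assumes the file is flat YAML with top-level `url`, `login` and `password` keys; that layout, and the helper name `missingCredentialKeys`, are assumptions for illustration. It uses a line scan rather than a YAML parser to stay dependency-free, so it is a sanity check and not a full validation.

```go
package main

import (
	"fmt"
	"strings"
)

// missingCredentialKeys reports which expected top-level keys are absent or
// empty. The key names (url, login, password) are an assumed file layout.
func missingCredentialKeys(content string) []string {
	present := map[string]bool{}
	for _, line := range strings.Split(content, "\n") {
		key, value, ok := strings.Cut(line, ":")
		if ok && strings.TrimSpace(value) != "" {
			present[strings.TrimSpace(key)] = true
		}
	}
	var missing []string
	for _, key := range []string{"url", "login", "password"} {
		if !present[key] {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	complete := "url: https://api.crowdsec.net/\nlogin: machine-id\npassword: secret\n"
	truncated := "url: https://api.crowdsec.net/\nlogin:\n"
	fmt.Println(missingCredentialKeys(complete))
	fmt.Println(missingCredentialKeys(truncated))
}
```

A non-empty result could trigger a fresh `cscli capi register` instead of assuming the agent is registered. Revoked credentials would still need a live check such as `cscli capi status`.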
5. Config File Management
Location: .docker/docker-entrypoint.sh:117-156
Current Implementation:
# Initialize CrowdSec configuration
if [ ! -f "$CS_CONFIG_DIR/config.yaml" ]; then
echo "Initializing persistent CrowdSec configuration..."
if [ -d "/etc/crowdsec.dist" ]; then
cp -r /etc/crowdsec.dist/* "$CS_CONFIG_DIR/"
fi
fi
# Configure LAPI port
sed -i 's|listen_uri: 127.0.0.1:8080|listen_uri: 127.0.0.1:8085|g' /etc/crowdsec/config.yaml
Issues:
- Config only initialized on first run (not validated on restarts)
- Port replacement uses `sed` (brittle if config format changes)
- No validation of `config.yaml` syntax
- No validation of `acquis.yaml` (required for datasources)
- `online_api_credentials.yaml` assumed valid if present
Impact: Config corruption after first run is not detected until CrowdSec fails to start.
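The `sed` expression above only rewrites the literal string `127.0.0.1:8080`, so any other host or port passes through untouched and unnoticed. As an illustration of a less brittle approach, the sketch below rewrites the port of any `listen_uri` line and reports whether a match was found. `setLAPIPort` is a hypothetical helper, and treating the config as text rather than parsed YAML is a simplification.

```go
package main

import (
	"fmt"
	"regexp"
)

// listenURILine matches a listen_uri entry with any host and any port,
// capturing the indentation plus key (group 1) and the host (group 2).
var listenURILine = regexp.MustCompile(`(?m)^([ \t]*listen_uri:[ \t]*)([^\s:]+):\d+[ \t]*$`)

// setLAPIPort rewrites the port of every listen_uri entry and reports whether
// anything matched, so callers can flag an unexpected config layout.
func setLAPIPort(config string, port int) (string, bool) {
	if !listenURILine.MatchString(config) {
		return config, false
	}
	return listenURILine.ReplaceAllString(config, fmt.Sprintf("${1}${2}:%d", port)), true
}

func main() {
	in := "api:\n  server:\n    listen_uri: 0.0.0.0:8080\n"
	out, ok := setLAPIPort(in, 8085)
	fmt.Print(out)
	fmt.Println(ok)
}
```

Returning the boolean lets the caller log a warning on every start when the expected line is missing, rather than discovering the problem only when CrowdSec fails to bind.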
6. Enrollment Status Management
Location: backend/internal/models/crowdsec_console_enrollment.go:7-21
Current Implementation:
type CrowdsecConsoleEnrollment struct {
UUID string `json:"uuid"`
Status string `json:"status"` // not_enrolled, enrolling, pending_acceptance, enrolled, failed
Tenant string `json:"tenant"`
AgentName string `json:"agent_name"`
EncryptedEnrollKey string `json:"-"`
LastError string `json:"last_error"`
LastAttemptAt *time.Time `json:"last_attempt_at"`
EnrolledAt *time.Time `json:"enrolled_at"`
LastHeartbeatAt *time.Time `json:"last_heartbeat_at"` // NOT USED
// ...
}
Issues:
- `LastHeartbeatAt` field exists but is never updated
- No automatic polling of console status
- Status is set to `pending_acceptance` after local enrollment
- No mechanism to detect when user accepts enrollment on crowdsec.net
- No mechanism to detect when engine goes offline
- Status remains `pending_acceptance` indefinitely unless user manually checks
Impact: Database shows "pending" but user has already accepted on console (status never updates).
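The status values in the model comment imply a small state machine. The sketch below makes the permitted transitions explicit. The transition table is an assumption assembled from the statuses listed above and the transitions named in this plan's test outline; the real service may allow more (for example, forced re-enrollment from `enrolled`).

```go
package main

import "fmt"

// allowedTransitions is an assumed transition table, not taken from the
// service code: not_enrolled -> enrolling -> pending_acceptance -> enrolled,
// with enrolling -> failed on error and failed -> enrolling on retry.
var allowedTransitions = map[string][]string{
	"not_enrolled":       {"enrolling"},
	"enrolling":          {"pending_acceptance", "failed"},
	"pending_acceptance": {"enrolled"},
	"failed":             {"enrolling"},
}

// canTransition reports whether moving from one status to another is allowed.
func canTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("enrolling", "pending_acceptance")) // true
	fmt.Println(canTransition("not_enrolled", "enrolled"))        // false
}
```

The `pending_acceptance -> enrolled` edge is the one nothing currently triggers automatically, which is the gap described in the issues above.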
7. Health Check Limitations
Location: backend/internal/api/handlers/crowdsec_handler.go:327-367
Current Implementation:
func (h *CrowdsecHandler) Status(c *gin.Context) {
running, pid, err := h.Executor.Status(ctx, h.DataDir)
lapiReady := false
if running {
args := []string{"lapi", "status"}
_, checkErr := h.CmdExec.Execute(checkCtx, "cscli", args...)
lapiReady = (checkErr == nil)
}
c.JSON(http.StatusOK, gin.H{
"running": running,
"pid": pid,
"lapi_ready": lapiReady,
})
}
Issues:
- Only checks if LAPI responds to `cscli lapi status`
- Does not check if LAPI can reach crowdsec.net
- Does not verify CAPI credentials are valid
- Does not check if console enrollment is active
- No check for heartbeat status
Impact: Status endpoint shows "running" but doesn't detect console connectivity issues.
Root Cause Analysis Framework
Diagnostic Decision Tree
CrowdSec shows as offline in console
│
├─> Is CrowdSec process running?
│   ├─> NO → Check startup logs, fix daemon start
│   └─> YES ↓
│
├─> Is LAPI responding (`cscli lapi status`)?
│   ├─> NO → Check LAPI initialization logs, increase startup wait time
│   └─> YES ↓
│
├─> Is CAPI registered (`online_api_credentials.yaml` exists)?
│   ├─> NO → Run `cscli capi register`, check network connectivity
│   └─> YES ↓
│
├─> Is console enrolled (`cscli console status` shows enrolled)?
│   ├─> NO → Check enrollment token validity, re-enroll
│   └─> YES ↓
│
├─> Can container reach crowdsec.net?
│   ├─> NO → Check DNS, firewall, proxy settings
│   └─> YES ↓
│
├─> Are heartbeats being sent?
│   ├─> NO → Check LAPI logs for heartbeat failures
│   └─> YES ↓
│
└─> Console shows agent offline?
    ├─> Check crowdsec.net for enrollment acceptance
    ├─> Check for token expiry/revocation
    └─> Contact CrowdSec support
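The decision tree can also be expressed as a function that returns the first recommended action for a set of check results. This is a sketch for illustration; the `checkResults` type and `nextAction` function are hypothetical and are not part of the existing handler.

```go
package main

import "fmt"

// checkResults holds one boolean per question in the decision tree.
type checkResults struct {
	ProcessRunning   bool
	LAPIResponding   bool
	CAPIRegistered   bool
	ConsoleEnrolled  bool
	NetworkReachable bool
	HeartbeatsSent   bool
}

// nextAction walks the tree top to bottom and returns the advice attached
// to the first failing check.
func nextAction(r checkResults) string {
	switch {
	case !r.ProcessRunning:
		return "check startup logs and fix the daemon start"
	case !r.LAPIResponding:
		return "check LAPI initialization logs and increase startup wait time"
	case !r.CAPIRegistered:
		return "run cscli capi register and check network connectivity"
	case !r.ConsoleEnrolled:
		return "check enrollment token validity and re-enroll"
	case !r.NetworkReachable:
		return "check DNS, firewall and proxy settings"
	case !r.HeartbeatsSent:
		return "check LAPI logs for heartbeat failures"
	default:
		return "check enrollment acceptance on crowdsec.net, then token expiry or revocation"
	}
}

func main() {
	r := checkResults{ProcessRunning: true, LAPIResponding: true, CAPIRegistered: true}
	fmt.Println(nextAction(r))
}
```

Ordering the cases to match the tree means a failure early in the chain hides later ones, which mirrors how the checks depend on each other.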
Investigation Checklist
Phase 1: Local Process Verification
- Verify CrowdSec process is running: `docker exec charon ps aux | grep crowdsec`
- Check LAPI is listening: `docker exec charon ss -tlnp | grep 8085`
- Verify LAPI responds: `docker exec charon cscli lapi status`
- Check CrowdSec version: `docker exec charon cscli version`
- Review CrowdSec logs: `docker exec charon tail -100 /var/log/crowdsec/crowdsec.log`
Phase 2: Configuration Validation
- Verify `config.yaml` exists: `docker exec charon test -f /etc/crowdsec/config.yaml`
- Validate `config.yaml` syntax: `docker exec charon cscli config check`
- Check LAPI port: `docker exec charon grep listen_uri /etc/crowdsec/config.yaml`
- Verify `acquis.yaml` exists: `docker exec charon test -f /etc/crowdsec/acquis.yaml`
- Check datasource config: `docker exec charon cat /etc/crowdsec/acquis.yaml`
- Validate CAPI credentials: `docker exec charon test -f /etc/crowdsec/config/online_api_credentials.yaml`
- Check machines list: `docker exec charon cscli machines list`
Phase 3: Enrollment State Verification
- Check enrollment status in DB: `docker exec charon sqlite3 /app/data/charon.db "SELECT * FROM crowdsec_console_enrollments;"`
- Verify console enrollment status: `docker exec charon cscli console status`
- Check feature flag: Query `/api/v1/settings?key=feature.crowdsec.console_enrollment`
- Review enrollment logs in Charon: Search logs for "crowdsec console enrollment"
- Check last enrollment attempt timestamp in DB
Phase 4: Network Connectivity
- Test DNS resolution: `docker exec charon nslookup crowdsec.net`
- Test HTTPS connectivity: `docker exec charon curl -I https://api.crowdsec.net/health`
- Check container network mode: `docker inspect charon | grep NetworkMode`
- Verify outbound firewall rules: Check host firewall, corporate proxy
- Test CAPI connectivity: `docker exec charon cscli capi status`
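The HTTPS check in this phase can also be performed from Go with a bounded timeout, which is useful if the same probe is later built into the backend. The sketch below is an assumption-laden illustration: `probeURL` and `demoProbe` are hypothetical names, and the demo targets a local test server so that it runs without internet access. Against the real service, the URL from the checklist would be passed in instead.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeURL issues a HEAD request (the equivalent of curl -I) bounded by
// timeout and returns the HTTP status code.
func probeURL(ctx context.Context, url string, timeout time.Duration) (int, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Covers DNS failures, refused connections, TLS errors and timeouts.
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demoProbe exercises probeURL against a local test server.
func demoProbe() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	return probeURL(context.Background(), srv.URL, 2*time.Second)
}

func main() {
	code, err := demoProbe()
	fmt.Println(code, err)
}
```

Because the error is returned as-is, the caller can distinguish a DNS failure from a timeout when reporting which part of the network path is broken.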
Phase 5: Console Status
- Check crowdsec.net console for agent enrollment
- Verify enrollment was accepted on crowdsec.net
- Check agent last seen timestamp in console
- Review console activity logs for heartbeat failures
- Verify enrollment token hasn't expired
Testing Strategy
Phase 1: Unit Tests for Enrollment Logic
Objective: Validate enrollment service behavior in isolation
Test File: backend/internal/crowdsec/console_enroll_test.go (NEW)
Test Cases:
1.1 Token Validation
func TestConsoleEnrollmentService_TokenValidation(t *testing.T) {
tests := []struct {
name string
token string
wantErr bool
errMsg string
}{
{
name: "Valid token",
token: "abc123xyz789",
wantErr: false,
},
{
name: "Token too short",
token: "abc123",
wantErr: true,
errMsg: "invalid enrollment key",
},
{
name: "Token with special chars",
token: "abc-123_xyz",
wantErr: true,
errMsg: "invalid enrollment key",
},
{
name: "Empty token",
token: "",
wantErr: true,
errMsg: "enrollment_key required",
},
{
name: "Token from cscli command",
token: "sudo cscli console enroll abc123xyz789",
wantErr: false, // Should extract token
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Test normalizeEnrollmentKey function
token, err := normalizeEnrollmentKey(tt.token)
if tt.wantErr {
assert.Error(t, err)
assert.Contains(t, err.Error(), tt.errMsg)
} else {
assert.NoError(t, err)
assert.NotEmpty(t, token)
}
})
}
}
1.2 LAPI Availability Check with Retries
func TestConsoleEnrollmentService_CheckLAPIAvailable(t *testing.T) {
tests := []struct {
name string
execResponses []execResponse // Mock responses for each retry
wantErr bool
errMsg string
}{
{
name: "LAPI available on first try",
execResponses: []execResponse{
{output: "", err: nil},
},
wantErr: false,
},
{
name: "LAPI available on second try",
execResponses: []execResponse{
{output: "connection refused", err: fmt.Errorf("exit status 1")},
{output: "", err: nil},
},
wantErr: false,
},
{
name: "LAPI never becomes available",
execResponses: []execResponse{
{output: "connection refused", err: fmt.Errorf("exit status 1")},
{output: "connection refused", err: fmt.Errorf("exit status 1")},
{output: "connection refused", err: fmt.Errorf("exit status 1")},
},
wantErr: true,
errMsg: "Local API is not running after 3 attempts",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Create mock executor that returns responses in sequence
mockExec := &mockEnvCommandExecutor{
responses: tt.execResponses,
}
svc := &ConsoleEnrollmentService{
exec: mockExec,
dataDir: "/tmp/test",
}
err := svc.checkLAPIAvailable(context.Background())
if tt.wantErr {
assert.Error(t, err)
assert.Contains(t, err.Error(), tt.errMsg)
} else {
assert.NoError(t, err)
}
})
}
}
1.3 CAPI Registration
func TestConsoleEnrollmentService_EnsureCAPIRegistered(t *testing.T) {
tests := []struct {
name string
credsExist bool
registerErr error
wantErr bool
wantRegister bool
}{
{
name: "CAPI already registered",
credsExist: true,
wantErr: false,
wantRegister: false,
},
{
name: "CAPI not registered, success",
credsExist: false,
registerErr: nil,
wantErr: false,
wantRegister: true,
},
{
name: "CAPI registration fails",
credsExist: false,
registerErr: fmt.Errorf("network error"),
wantErr: true,
wantRegister: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Test CAPI registration logic
})
}
}
1.4 Enrollment Status Transitions
func TestConsoleEnrollmentService_StatusTransitions(t *testing.T) {
// Test: not_enrolled → enrolling → pending_acceptance
// Test: pending_acceptance → enrolled (manual transition)
// Test: enrolling → failed (on error)
// Test: failed → enrolling (retry)
}
1.5 Concurrent Enrollment Prevention
func TestConsoleEnrollmentService_ConcurrentEnrollment(t *testing.T) {
// Test: Multiple simultaneous enrollment attempts should be blocked
// Test: Mutex prevents race conditions
}
1.6 Token Encryption/Decryption
func TestConsoleEnrollmentService_TokenEncryption(t *testing.T) {
// Test: Token is encrypted before storage
// Test: Decryption works correctly
// Test: Encryption key derivation
}
Phase 2: Integration Tests for LAPI Connectivity
Objective: Verify LAPI health checks and connectivity in real environment
Test File: backend/integration/crowdsec_lapi_integration_test.go (NEW)
Test Cases:
2.1 LAPI Startup and Health
//go:build integration
func TestCrowdSecLAPIStartup(t *testing.T) {
// 1. Start CrowdSec via API: POST /api/v1/admin/crowdsec/start
// 2. Wait for LAPI to initialize (up to 30s)
// 3. Verify: GET /api/v1/admin/crowdsec/status returns lapi_ready: true
// 4. Verify: docker exec cscli lapi status returns 0
// 5. Verify: LAPI health endpoint responds: curl http://localhost:8085/health
}
2.2 LAPI Readiness After Restart
//go:build integration
func TestCrowdSecLAPIRestartPersistence(t *testing.T) {
// 1. Enroll CrowdSec
// 2. Stop CrowdSec
// 3. Restart container
// 4. Verify LAPI comes back online
// 5. Verify enrollment status persists
}
2.3 LAPI Port Configuration
//go:build integration
func TestCrowdSecLAPIPortConfiguration(t *testing.T) {
// 1. Verify LAPI listens on 8085 (not 8080)
// 2. Verify Charon can reach LAPI at 127.0.0.1:8085
// 3. Verify LAPI is NOT exposed to host
}
2.4 CAPI Connectivity
//go:build integration
func TestCrowdSecCAPIConnectivity(t *testing.T) {
// 1. Verify CAPI can be reached from container
// 2. Verify CAPI registration succeeds
// 3. Verify online_api_credentials.yaml is created
// 4. Verify credentials are valid (cscli capi status)
}
Phase 3: E2E Tests for Console Enrollment
Objective: Verify complete enrollment flow from UI
Test File: tests/security/crowdsec-console-enrollment.spec.ts (NEW)
Test Cases:
3.1 Enrollment Flow (Happy Path)
test('should complete console enrollment successfully', async ({ page, request }) => {
// Prerequisite: Ensure CrowdSec is enabled
await enableCrowdSec(request);
// Step 1: Navigate to CrowdSec configuration
await page.goto('/security/crowdsec');
await waitForLoadingComplete(page);
// Step 2: Verify console enrollment section is visible
const enrollmentSection = page.getByTestId('console-enrollment-section');
await expect(enrollmentSection).toBeVisible();
// Step 3: Enter enrollment token
// The fallback must pass format validation (alphanumeric, 10-64 chars), so it contains no hyphens
const tokenInput = page.getByTestId('enrollment-token-input');
await tokenInput.fill(process.env.TEST_CROWDSEC_ENROLLMENT_TOKEN || 'testtoken123');
// Step 4: Enter agent name
const agentNameInput = page.getByTestId('agent-name-input');
await agentNameInput.fill('test-agent-e2e');
// Step 5: Start waiting for the enrollment response before clicking, so a fast response is not missed
const enrollResponsePromise = page.waitForResponse(
resp => resp.url().includes('/api/v1/admin/crowdsec/console/enrollment') && resp.request().method() === 'POST'
);
const enrollButton = page.getByRole('button', { name: /enroll/i });
await enrollButton.click();
// Step 6: Wait for enrollment request to complete
const enrollResponse = await enrollResponsePromise;
expect(enrollResponse.ok()).toBeTruthy();
// Step 7: Verify status changes to pending_acceptance
await expect(page.getByText(/pending acceptance/i)).toBeVisible({ timeout: 10000 });
// Step 8: Verify enrollment record was created
const statusResponse = await request.get('/api/v1/admin/crowdsec/console/enrollment');
expect(statusResponse.ok()).toBeTruthy();
const status = await statusResponse.json();
expect(status.status).toBe('pending_acceptance');
expect(status.agent_name).toBe('test-agent-e2e');
});
3.2 Enrollment Validation Errors
test('should show validation errors for invalid enrollment data', async ({ page }) => {
await page.goto('/security/crowdsec');
// Test: Empty token
await page.getByRole('button', { name: /enroll/i }).click();
await expect(page.getByText(/enrollment.*required/i)).toBeVisible();
// Test: Invalid token format
await page.getByTestId('enrollment-token-input').fill('invalid@token!');
await page.getByRole('button', { name: /enroll/i }).click();
await expect(page.getByText(/invalid.*enrollment/i)).toBeVisible();
// Test: Empty agent name
await page.getByTestId('enrollment-token-input').fill('validtoken123');
await page.getByRole('button', { name: /enroll/i }).click();
await expect(page.getByText(/agent.*required/i)).toBeVisible();
});
3.3 Enrollment When LAPI Not Running
test('should show error when LAPI is not running', async ({ page, request }) => {
// Prerequisite: Stop CrowdSec
await request.post('/api/v1/admin/crowdsec/stop');
await page.waitForTimeout(2000);
// Attempt enrollment
await page.goto('/security/crowdsec');
const tokenInput = page.getByTestId('enrollment-token-input');
await tokenInput.fill('validtoken123');
await page.getByRole('button', { name: /enroll/i }).click();
// Verify error message
await expect(page.getByText(/Local API is not running/i)).toBeVisible({ timeout: 10000 });
});
3.4 Re-enrollment (Force)
test('should allow re-enrollment with force flag', async ({ page, request }) => {
// Prerequisite: Complete initial enrollment
// ... (same as happy path)
// Attempt enrollment again (should fail without force)
await page.getByTestId('enrollment-token-input').fill('newtoken456');
await page.getByRole('button', { name: /enroll/i }).click();
await expect(page.getByText(/already enrolled/i)).toBeVisible();
// Enable force re-enrollment
await page.getByTestId('force-reenroll-checkbox').click();
await page.getByRole('button', { name: /enroll/i }).click();
// Verify re-enrollment succeeds
await waitForToast(page, /enrollment.*sent/i);
});
3.5 Enrollment Status Display
test('should display current enrollment status correctly', async ({ page, request }) => {
// Test: Not enrolled
await page.goto('/security/crowdsec');
await expect(page.getByText(/not enrolled/i)).toBeVisible();
// Test: Enrolling (in progress)
// (Mock or trigger enrollment)
// Test: Pending acceptance
// (Mock pending state)
await expect(page.getByText(/pending acceptance/i)).toBeVisible();
await expect(page.getByText(/accept.*crowdsec\.net/i)).toBeVisible();
// Test: Enrolled
// (Mock enrolled state)
// Anchor the pattern so it does not also match "Not enrolled"
await expect(page.getByText(/^enrolled$/i)).toBeVisible();
await expect(page.getByTestId('enrollment-success-badge')).toBeVisible();
});
3.6 Clear Enrollment State
test('should clear enrollment state to allow fresh enrollment', async ({ page, request }) => {
// Prerequisite: Complete enrollment
// ...
// Click "Clear Enrollment" button
const clearButton = page.getByRole('button', { name: /clear.*enrollment/i });
await clearButton.click();
// Start waiting for the DELETE request before confirming, so a fast response is not missed
const deleteResponsePromise = page.waitForResponse(
resp => resp.url().includes('/api/v1/admin/crowdsec/console/enrollment') && resp.request().method() === 'DELETE'
);
// Confirm in dialog
await page.getByRole('button', { name: /confirm/i }).click();
const deleteResponse = await deleteResponsePromise;
expect(deleteResponse.ok()).toBeTruthy();
// Verify status resets to not_enrolled
await expect(page.getByText(/not enrolled/i)).toBeVisible();
});
Phase 4: E2E Tests for Console Status Monitoring
Objective: Verify console connectivity monitoring and heartbeat tracking
Test File: tests/security/crowdsec-console-monitoring.spec.ts (NEW)
Test Cases:
4.1 Console Status Endpoint
test('should fetch console enrollment status', async ({ request }) => {
const response = await request.get('/api/v1/admin/crowdsec/console/enrollment');
expect(response.ok()).toBeTruthy();
const status = await response.json();
expect(status).toHaveProperty('status');
expect(status).toHaveProperty('agent_name');
expect(status).toHaveProperty('tenant');
expect(status).toHaveProperty('last_attempt_at');
expect(status).toHaveProperty('key_present');
});
4.2 Heartbeat Tracking (Future Enhancement)
test.skip('should track console heartbeats', async ({ request }) => {
// NOTE: This test is skipped because LastHeartbeatAt is not currently implemented
// Once implemented, this test should:
// 1. Enroll with console
// 2. Wait for heartbeat to be sent (typically every 10s)
// 3. Verify last_heartbeat_at is updated in database
// 4. Verify status endpoint returns heartbeat timestamp
});
Phase 5: Diagnostic and Monitoring Tests
Objective: Verify diagnostic endpoints and log collection
Test File: tests/security/crowdsec-diagnostics.spec.ts (NEW)
Test Cases:
5.1 Config File Validation
test('should validate CrowdSec configuration files', async ({ request }) => {
// GET /api/v1/admin/crowdsec/files (list config files)
const filesResponse = await request.get('/api/v1/admin/crowdsec/files');
expect(filesResponse.ok()).toBeTruthy();
const files = await filesResponse.json();
expect(files.files).toContain('config/config.yaml');
expect(files.files).toContain('config/acquis.yaml');
// GET /api/v1/admin/crowdsec/files?path=config/config.yaml
const configResponse = await request.get('/api/v1/admin/crowdsec/files?path=config/config.yaml');
expect(configResponse.ok()).toBeTruthy();
const config = await configResponse.json();
expect(config.content).toContain('listen_uri: 127.0.0.1:8085');
});
5.2 LAPI Health Endpoint
test('should verify LAPI health endpoint responds', async ({ request }) => {
// Prerequisite: Ensure CrowdSec is running
await enableCrowdSec(request);
await waitForLAPIReady(request, 30000);
// Test LAPI health endpoint directly
// NOTE: This requires port 8085 to be exposed or curl from within container
const statusResponse = await request.get('/api/v1/admin/crowdsec/status');
expect(statusResponse.ok()).toBeTruthy();
const status = await statusResponse.json();
expect(status.lapi_ready).toBe(true);
});
5.3 Export Configuration
test('should export CrowdSec configuration', async ({ request }) => {
const response = await request.get('/api/v1/admin/crowdsec/export');
expect(response.ok()).toBeTruthy();
// Verify response is tar.gz file
const contentType = response.headers()['content-type'];
expect(contentType).toContain('application/gzip');
const contentDisposition = response.headers()['content-disposition'];
expect(contentDisposition).toMatch(/attachment.*crowdsec-config.*\.tar\.gz/);
});
Implementation Plan
Phase 1: Diagnostic Tools (Week 1)
Goal: Implement comprehensive diagnostic endpoints to aid troubleshooting
1.1 Add Console Connectivity Check Endpoint
File: backend/internal/api/handlers/crowdsec_handler.go
// ConsoleConnectivityCheck verifies connectivity to crowdsec.net APIs
func (h *CrowdsecHandler) ConsoleConnectivityCheck(c *gin.Context) {
ctx := c.Request.Context()
checks := map[string]interface{}{
"lapi_running": false,
"capi_registered": false,
"console_enrolled": false,
"capi_reachable": false,
"console_reachable": false,
}
// Check 1: LAPI running
running, _, _ := h.Executor.Status(ctx, h.DataDir)
checks["lapi_running"] = running
// Check 2: LAPI health
if running {
args := []string{"lapi", "status"}
_, err := h.CmdExec.Execute(ctx, "cscli", args...)
checks["lapi_ready"] = (err == nil)
}
// Check 3: CAPI registered
credsPath := filepath.Join(h.DataDir, "config", "online_api_credentials.yaml")
checks["capi_registered"] = fileExists(credsPath)
// Check 4: CAPI reachable
if checks["capi_registered"].(bool) {
args := []string{"capi", "status"}
out, err := h.CmdExec.Execute(ctx, "cscli", args...)
checks["capi_reachable"] = (err == nil)
checks["capi_status_output"] = string(out)
}
// Check 5: Console enrolled
if h.Console != nil {
status, err := h.Console.Status(ctx)
if err == nil {
checks["console_enrolled"] = (status.Status == "enrolled" || status.Status == "pending_acceptance")
checks["console_status"] = status
}
}
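// Check 6: Console API reachable (ping crowdsec.net)
// Bound the request with the handler context and a timeout so that a blocked
// network path cannot hang the diagnostics endpoint (http.Get has no timeout).
consoleURL := "https://api.crowdsec.net/health"
reqCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, consoleURL, nil)
if err == nil {
var resp *http.Response
resp, err = http.DefaultClient.Do(req)
if err == nil {
defer resp.Body.Close()
checks["console_reachable"] = (resp.StatusCode == http.StatusOK)
}
}
if err != nil {
checks["console_reachable"] = false
checks["console_error"] = err.Error()
}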
c.JSON(http.StatusOK, checks)
}
Route: GET /api/v1/admin/crowdsec/diagnostics/connectivity
1.2 Add Config Validation Endpoint
File: backend/internal/api/handlers/crowdsec_handler.go
// ValidateConfig checks CrowdSec configuration files for common issues
func (h *CrowdsecHandler) ValidateConfig(c *gin.Context) {
ctx := c.Request.Context()
validation := map[string]interface{}{
"config_exists": false,
"config_valid": false,
"acquis_exists": false,
"acquis_valid": false,
"lapi_port": "",
"errors": []string{},
}
// Check config.yaml
configPath := filepath.Join(h.DataDir, "config", "config.yaml")
if fileExists(configPath) {
validation["config_exists"] = true
// Read config and check LAPI port
content, err := os.ReadFile(configPath)
if err == nil {
configStr := string(content)
if strings.Contains(configStr, "listen_uri") {
// Extract port
re := regexp.MustCompile(`listen_uri:\s*127\.0\.0\.1:(\d+)`)
matches := re.FindStringSubmatch(configStr)
if len(matches) > 1 {
validation["lapi_port"] = matches[1]
}
}
}
// Validate using cscli
args := []string{"-c", configPath, "config", "check"}
out, err := h.CmdExec.Execute(ctx, "cscli", args...)
if err == nil {
validation["config_valid"] = true
} else {
validation["config_valid"] = false
validation["errors"] = append(validation["errors"].([]string), string(out))
}
}
// Check acquis.yaml
acquisPath := filepath.Join(h.DataDir, "config", "acquis.yaml")
if fileExists(acquisPath) {
validation["acquis_exists"] = true
// Check if it has datasources
content, err := os.ReadFile(acquisPath)
if err == nil {
acquisStr := string(content)
if strings.Contains(acquisStr, "source:") && strings.Contains(acquisStr, "filenames:") {
validation["acquis_valid"] = true
} else {
validation["acquis_valid"] = false
validation["errors"] = append(validation["errors"].([]string), "acquis.yaml missing datasource configuration")
}
}
} else {
validation["errors"] = append(validation["errors"].([]string), "acquis.yaml not found")
}
c.JSON(http.StatusOK, validation)
}
Route: GET /api/v1/admin/crowdsec/diagnostics/config
1.3 Add Heartbeat Status Endpoint (Future)
File: backend/internal/api/handlers/crowdsec_handler.go
// GetHeartbeatStatus returns console heartbeat status
// NOTE: This is a placeholder for future implementation
// Currently, LastHeartbeatAt is not tracked in the database
func (h *CrowdsecHandler) GetHeartbeatStatus(c *gin.Context) {
if h.Console == nil {
c.JSON(http.StatusServiceUnavailable, gin.H{"error": "console service unavailable"})
return
}
ctx := c.Request.Context()
status, err := h.Console.Status(ctx)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
// TODO: Implement heartbeat tracking
// For now, return placeholder
c.JSON(http.StatusOK, gin.H{
"status": status.Status,
"last_heartbeat_at": status.LastHeartbeatAt,
"heartbeat_tracking_implemented": false,
"note": "Heartbeat tracking not yet implemented",
})
}
Route: GET /api/v1/admin/crowdsec/console/heartbeat
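Once `LastHeartbeatAt` is populated, the endpoint could classify the timestamp instead of returning it raw. The sketch below shows one possible classification. The function name `heartbeatState`, the state labels, and the two-minute threshold are all assumptions chosen for illustration, not values taken from CrowdSec or from the existing code.

```go
package main

import (
	"fmt"
	"time"
)

// heartbeatState classifies a last-heartbeat timestamp relative to now.
// A nil timestamp means no heartbeat has ever been recorded.
func heartbeatState(last *time.Time, now time.Time, maxAge time.Duration) string {
	switch {
	case last == nil:
		return "unknown"
	case now.Sub(*last) > maxAge:
		return "stale"
	default:
		return "fresh"
	}
}

// demoStates evaluates three representative cases against a fixed clock.
func demoStates() []string {
	now := time.Date(2026, 2, 3, 12, 0, 0, 0, time.UTC)
	recent := now.Add(-30 * time.Second)
	old := now.Add(-10 * time.Minute)
	maxAge := 2 * time.Minute // hypothetical threshold
	return []string{
		heartbeatState(nil, now, maxAge),
		heartbeatState(&recent, now, maxAge),
		heartbeatState(&old, now, maxAge),
	}
}

func main() {
	fmt.Println(demoStates())
}
```

Passing `now` as a parameter rather than calling `time.Now()` inside the function keeps the classification deterministic in unit tests.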
Phase 2: Enhanced Enrollment Validation (Week 2)
Goal: Improve enrollment validation and error messaging
2.1 Increase LAPI Check Retries
File: backend/internal/crowdsec/console_enroll.go
Change:
// Before:
maxRetries := 3
retryDelay := 2 * time.Second
// After:
maxRetries := 5 // Increased from 3
retryDelay := 3 * time.Second // Increased from 2s
// Add exponential backoff
for i := 0; i < maxRetries; i++ {
// ... existing code ...
if i < maxRetries-1 {
// Exponential backoff: 3s, 6s, 12s, 24s
delay := retryDelay << i // doubles each attempt without float conversion
time.Sleep(delay)
}
}
2.2 Add Token Expiry Detection
File: backend/internal/crowdsec/console_enroll.go
Add new function:
// checkTokenExpiry attempts to detect if enrollment token is expired
// Returns true if token appears expired, false otherwise
func (s *ConsoleEnrollmentService) checkTokenExpiry(token string) (bool, error) {
// Tokens from crowdsec.net have a limited lifetime (typically 24-48 hours)
// We can't validate this client-side without calling the API
// Best we can do is check the error message from cscli
// For now, this is a placeholder
// Future: Could call crowdsec.net API to validate token before enrollment
return false, nil
}
2.3 Improve Error Messages
File: backend/internal/crowdsec/console_enroll.go
Enhance extractCscliErrorMessage:
func extractCscliErrorMessage(output string) string {
output = strings.TrimSpace(output)
if output == "" {
return ""
}
// Check for specific error patterns
// Keys must be lowercase because they are matched against strings.ToLower(output)
errorPatterns := map[string]string{
"token is expired": "Enrollment token has expired. Please generate a new token from crowdsec.net",
"token is invalid": "Enrollment token is invalid. Please verify the token from crowdsec.net",
"agent is already enrolled": "Agent is already enrolled. Use force=true to re-enroll",
"lapi is not reachable": "Cannot reach Local API. Ensure CrowdSec is running",
"capi is not reachable": "Cannot reach Central API. Check network connectivity",
}
for pattern, message := range errorPatterns {
if strings.Contains(strings.ToLower(output), pattern) {
return message
}
}
// Fall back to existing extraction logic
// ...
}
Phase 3: Heartbeat Monitoring Implementation (Week 3)
Goal: Implement console heartbeat tracking and status polling
3.1 Add Heartbeat Polling Service
File: backend/internal/crowdsec/heartbeat_poller.go (NEW)
package crowdsec
import (
"context"
"os"
"path/filepath"
"strings"
"sync"
"time"
"github.com/Wikid82/charon/backend/internal/logger"
"github.com/Wikid82/charon/backend/internal/models"
"gorm.io/gorm"
)
type HeartbeatPoller struct {
db *gorm.DB
exec EnvCommandExecutor
dataDir string
interval time.Duration
stopCh chan struct{}
wg sync.WaitGroup
}
func NewHeartbeatPoller(db *gorm.DB, exec EnvCommandExecutor, dataDir string) *HeartbeatPoller {
return &HeartbeatPoller{
db: db,
exec: exec,
dataDir: dataDir,
interval: 60 * time.Second, // Check every 60 seconds
stopCh: make(chan struct{}),
}
}
func (p *HeartbeatPoller) Start() {
p.wg.Add(1)
go p.poll()
}
func (p *HeartbeatPoller) Stop() {
close(p.stopCh)
p.wg.Wait()
}
func (p *HeartbeatPoller) poll() {
defer p.wg.Done()
ticker := time.NewTicker(p.interval)
defer ticker.Stop()
for {
select {
case <-ticker.C:
p.checkHeartbeat()
case <-p.stopCh:
return
}
}
}
func (p *HeartbeatPoller) checkHeartbeat() {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
// Check if console is enrolled
var enrollment models.CrowdsecConsoleEnrollment
if err := p.db.WithContext(ctx).First(&enrollment).Error; err != nil {
return // Not enrolled, skip
}
if enrollment.Status != "enrolled" && enrollment.Status != "pending_acceptance" {
return // Not enrolled, skip
}
// Check console status via cscli
args := []string{"console", "status"}
configPath := filepath.Join(p.dataDir, "config", "config.yaml")
if _, err := os.Stat(configPath); err == nil {
args = append([]string{"-c", configPath}, args...)
}
out, err := p.exec.ExecuteWithEnv(ctx, "cscli", args, nil)
if err != nil {
logger.Log().WithError(err).WithField("output", string(out)).Warn("Failed to check console status")
return
}
// Parse output to detect status
output := string(out)
now := time.Now().UTC()
if strings.Contains(output, "enabled") && strings.Contains(output, "enrolled") {
// Update heartbeat timestamp
enrollment.LastHeartbeatAt = &now
if enrollment.Status == "pending_acceptance" {
// User has accepted enrollment on console
enrollment.Status = "enrolled"
enrollment.EnrolledAt = &now
}
if err := p.db.WithContext(ctx).Save(&enrollment).Error; err != nil {
logger.Log().WithError(err).Warn("Failed to update heartbeat timestamp")
} else {
logger.Log().Debug("Console heartbeat updated")
}
}
}
3.2 Integrate Heartbeat Poller in Main
File: backend/cmd/api/main.go
// Initialize heartbeat poller
if consoleEnrollmentEnabled {
heartbeatPoller := crowdsec.NewHeartbeatPoller(db, &crowdsec.SecureCommandExecutor{}, crowdsecDataDir)
heartbeatPoller.Start()
defer heartbeatPoller.Stop()
}
Phase 4: Enhanced Logging and Monitoring (Week 3-4)
Goal: Improve observability for troubleshooting
4.1 Add Structured Logging for Enrollment
File: backend/internal/crowdsec/console_enroll.go
Enhance logging throughout enrollment process:
// Log at each critical step
logger.Log().WithFields(map[string]any{
"correlation_id": rec.LastCorrelationID,
"agent_name": agent,
"tenant": tenant,
"step": "lapi_check",
}).Info("Checking LAPI availability")
logger.Log().WithFields(map[string]any{
"correlation_id": rec.LastCorrelationID,
"step": "capi_registration",
}).Info("Ensuring CAPI registration")
logger.Log().WithFields(map[string]any{
"correlation_id": rec.LastCorrelationID,
"step": "enrollment_submit",
"force": req.Force,
}).Info("Submitting enrollment request to CrowdSec console")
4.2 Add Prometheus Metrics
File: backend/internal/metrics/crowdsec_metrics.go (NEW)
package metrics
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
CrowdSecEnrollmentAttempts = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "charon_crowdsec_enrollment_attempts_total",
Help: "Total number of console enrollment attempts",
},
[]string{"status"}, // success, failed, pending
)
CrowdSecEnrollmentDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "charon_crowdsec_enrollment_duration_seconds",
Help: "Duration of console enrollment attempts",
Buckets: []float64{1, 5, 10, 30, 60},
},
[]string{"status"},
)
CrowdSecHeartbeatLatency = promauto.NewHistogram(
prometheus.HistogramOpts{
Name: "charon_crowdsec_heartbeat_latency_seconds",
Help: "Latency of console heartbeat checks",
Buckets: []float64{0.1, 0.5, 1, 2, 5},
},
)
CrowdSecLAPIHealth = promauto.NewGauge(
prometheus.GaugeOpts{
Name: "charon_crowdsec_lapi_healthy",
Help: "1 if LAPI is healthy, 0 otherwise",
},
)
)
Integrate metrics in handlers:
func (h *CrowdsecHandler) ConsoleEnroll(c *gin.Context) {
	start := time.Now()
	status := "failed" // overwritten once enrollment succeeds
	defer func() {
		// Record metrics based on result; the duration is computed here so it
		// is in scope and covers the whole handler.
		metrics.CrowdSecEnrollmentAttempts.WithLabelValues(status).Inc()
		metrics.CrowdSecEnrollmentDuration.WithLabelValues(status).Observe(time.Since(start).Seconds())
	}()
	// ... enrollment logic ...
	status = "success"
}
Phase 5: Documentation and User Guidance (Week 4)
Goal: Provide clear troubleshooting documentation for users
5.1 Update Cerberus Documentation
File: docs/cerberus.md
Add new section:
### Troubleshooting Console Enrollment
#### Symptom: Enrollment shows "pending_acceptance" but I already accepted on crowdsec.net
**Cause:** Status polling not implemented (manual refresh required)
**Solution:**
1. Refresh the Charon UI page
2. Check `/api/v1/admin/crowdsec/console/enrollment` endpoint
3. If still pending, check `cscli console status` inside container:
```bash
docker exec charon cscli console status
```
#### Symptom: "Local API is not running after X attempts"
**Cause:** LAPI is still initializing or failed to start
**Solution:**
- Check CrowdSec process: `docker exec charon ps aux | grep crowdsec`
- Check LAPI logs: `docker exec charon tail -100 /var/log/crowdsec/crowdsec.log`
- Verify LAPI config: `docker exec charon grep listen_uri /etc/crowdsec/config.yaml`
- Test LAPI manually: `docker exec charon cscli lapi status`
#### Symptom: Enrollment fails with "CAPI is not reachable"
**Cause:** Network connectivity issue or CAPI credentials invalid
**Solution:**
- Test connectivity to crowdsec.net: `docker exec charon curl -I https://api.crowdsec.net/health`
- Check CAPI credentials: `docker exec charon test -f /etc/crowdsec/config/online_api_credentials.yaml`
- Re-register with CAPI: `docker exec charon cscli capi register`
#### Symptom: Engine shows offline in console despite successful enrollment
**Possible Causes:**
- Heartbeats not being sent
- Network/firewall blocking outbound HTTPS
- Token expired or revoked
- LAPI process crashed after enrollment
**Solution:**
- Verify LAPI is running: `docker exec charon ps aux | grep crowdsec`
- Check console status: `docker exec charon cscli console status`
- Check CrowdSec logs for heartbeat errors: `docker exec charon tail -100 /var/log/crowdsec/crowdsec.log | grep -i heartbeat`
- Test outbound HTTPS: `docker exec charon curl -v https://api.crowdsec.net/health`
- Check crowdsec.net console for agent last seen timestamp
5.2 Add Diagnostic Script
File: scripts/diagnose-crowdsec.sh (NEW)
```bash
#!/bin/bash
set -e
echo "CrowdSec Console Enrollment Diagnostic Script"
echo "=============================================="
echo ""
# Check 1: CrowdSec process
echo "[1/8] Checking CrowdSec process..."
if docker exec charon ps aux | grep -q '[c]rowdsec'; then
echo "✓ CrowdSec process is running"
CROWDSEC_PID=$(docker exec charon ps aux | grep '[c]rowdsec' | awk '{print $2}')
echo " PID: $CROWDSEC_PID"
else
echo "✗ CrowdSec process is NOT running"
echo " Run: docker exec charon charon crowdsec start"
exit 1
fi
# Check 2: LAPI health
echo "[2/8] Checking LAPI health..."
if docker exec charon cscli lapi status >/dev/null 2>&1; then
echo "✓ LAPI is responding"
else
echo "✗ LAPI is NOT responding"
echo " Check logs: docker exec charon tail -100 /var/log/crowdsec/crowdsec.log"
fi
# Check 3: LAPI port
echo "[3/8] Checking LAPI configuration..."
LAPI_PORT=$(docker exec charon grep listen_uri /etc/crowdsec/config.yaml | sed 's/.*:\([0-9]*\)/\1/')
if [ "$LAPI_PORT" = "8085" ]; then
echo "✓ LAPI configured on port 8085"
else
echo "✗ LAPI port is $LAPI_PORT (expected 8085)"
fi
# Check 4: CAPI registration
echo "[4/8] Checking CAPI registration..."
if docker exec charon test -f /etc/crowdsec/config/online_api_credentials.yaml; then
echo "✓ CAPI credentials file exists"
if docker exec charon cscli capi status >/dev/null 2>&1; then
echo "✓ CAPI is reachable"
else
echo "✗ CAPI is NOT reachable (network issue or invalid credentials)"
fi
else
echo "✗ CAPI credentials file missing"
echo " Run: docker exec charon cscli capi register"
fi
# Check 5: Console enrollment status
echo "[5/8] Checking console enrollment status..."
CONSOLE_STATUS=$(docker exec charon cscli console status 2>&1 || echo "error")
if echo "$CONSOLE_STATUS" | grep -qi "enrolled"; then
echo "✓ Console enrollment detected"
echo " $CONSOLE_STATUS"
else
echo "✗ Console not enrolled"
echo " Enroll via Charon UI: /security/crowdsec"
fi
# Check 6: Config validation
echo "[6/8] Validating configuration files..."
if docker exec charon test -f /etc/crowdsec/config.yaml; then
echo "✓ config.yaml exists"
else
echo "✗ config.yaml missing"
fi
if docker exec charon test -f /etc/crowdsec/acquis.yaml; then
echo "✓ acquis.yaml exists"
if docker exec charon grep -q "source:" /etc/crowdsec/acquis.yaml; then
echo "✓ acquis.yaml has datasource configuration"
else
echo "✗ acquis.yaml missing datasource"
fi
else
echo "✗ acquis.yaml missing"
fi
# Check 7: Network connectivity
echo "[7/8] Checking network connectivity..."
if docker exec charon curl -fsS --connect-timeout 5 https://api.crowdsec.net/health >/dev/null 2>&1; then
echo "✓ Can reach crowdsec.net API"
else
echo "✗ Cannot reach crowdsec.net API"
echo " Check firewall, proxy, DNS configuration"
fi
# Check 8: Database enrollment state
echo "[8/8] Checking database enrollment state..."
ENROLLMENT_STATUS=$(docker exec charon sqlite3 /app/data/charon.db \
"SELECT status FROM crowdsec_console_enrollments LIMIT 1;" 2>/dev/null || echo "")
if [ -n "$ENROLLMENT_STATUS" ]; then
echo "✓ Enrollment record found: $ENROLLMENT_STATUS"
else
echo "✓ No enrollment record (not enrolled)"
fi
echo ""
echo "Diagnostic complete!"
echo ""
echo "If CrowdSec shows as offline in console:"
echo " 1. Verify all checks above passed"
echo " 2. Check crowdsec.net console for agent last seen"
echo " 3. Review CrowdSec logs: docker exec charon tail -100 /var/log/crowdsec/crowdsec.log"
echo " 4. Contact CrowdSec support if issue persists"
```
Test Execution Plan
Phase 1: Unit Tests (Day 1-2)
# Run enrollment service unit tests
cd backend
go test -v ./internal/crowdsec/... -run TestConsoleEnrollment
# Expected: All new token validation, LAPI check, and CAPI registration tests pass
Phase 2: Integration Tests (Day 3-4)
# Run LAPI integration tests
cd backend
go test -v -tags=integration ./integration/... -run TestCrowdSecLAPI
# Expected: LAPI startup, health checks, and CAPI connectivity tests pass
Phase 3: E2E Tests (Day 5-6)
# Rebuild E2E container
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e --clean
# Run console enrollment E2E tests
npx playwright test tests/security/crowdsec-console-enrollment.spec.ts
# Run console monitoring E2E tests
npx playwright test tests/security/crowdsec-console-monitoring.spec.ts
# Run diagnostic E2E tests
npx playwright test tests/security/crowdsec-diagnostics.spec.ts
# Expected: All enrollment flow, validation, and diagnostic tests pass
Phase 4: Manual Verification (Day 7)
# Run diagnostic script
./scripts/diagnose-crowdsec.sh
# Expected: All checks pass, detailed status report generated
Phase 5: Production Validation (Day 8-10)
- Deploy to staging environment
- Complete enrollment with real crowdsec.net console
- Monitor heartbeat status over 24-48 hours
- Verify engine stays online in console
- Document any additional issues discovered
Success Criteria
Short-term (Week 1-2)
- ✅ All diagnostic endpoints implemented and functional
- ✅ Enrollment validation enhanced with better error messages
- ✅ Config validation endpoint reports accurate status
- ✅ Connectivity check endpoint identifies network issues
Medium-term (Week 3-4)
- ✅ Heartbeat polling service implemented and running
- ✅ LastHeartbeatAt field populated correctly
- ✅ Console status transitions from pending_acceptance to enrolled automatically
- ✅ Prometheus metrics tracking enrollment success/failure rates
- ✅ All unit tests pass with 100% coverage for new code
- ✅ All integration tests pass consistently
Long-term (Week 4+)
- ✅ All E2E tests pass on Chromium, Firefox, WebKit
- ✅ Diagnostic script catches 90%+ of common issues
- ✅ Documentation updated with troubleshooting guides
- ✅ Zero false positives in offline detection
- ✅ Engine consistently shows online in console for enrolled instances
- ✅ User-reported enrollment issues reduced by 80%+
Risk Mitigation
Risk 1: LAPI Initialization Timing
Mitigation:
- Increase retry attempts from 3 to 5
- Implement exponential backoff (3s, 6s, 12s, 24s)
- Add detailed logging for each retry attempt
- Document expected initialization time (10-15s on slow hardware)
Risk 2: Network Connectivity Variability
Mitigation:
- Add explicit connectivity checks before enrollment
- Test against both api.crowdsec.net and main crowdsec.net domains
- Document firewall/proxy requirements clearly
- Provide fallback diagnostic commands
Risk 3: Token Expiry Edge Cases
Mitigation:
- Improve error message extraction to detect expiry
- Document token lifetime (24-48 hours)
- Add warning in UI when token is >24 hours old
- Provide clear instructions to regenerate token
Risk 4: Database State Corruption
Mitigation:
- Add validation for enrollment state transitions
- Implement database migration to ensure schema consistency
- Add repair mechanism for corrupted enrollment records
- Document manual DB cleanup procedures
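State-transition validation could take the form of a small lookup table. The sketch below uses the status values listed in Appendix C; the set of allowed transitions is an assumption made for illustration and would need to be confirmed against the actual enrollment flow. `validateTransition` is a hypothetical helper.

```go
package main

import "fmt"

// allowedTransitions maps each enrollment status to the statuses it may move
// to. The status names come from the schema in this plan; the edges are
// assumptions for this sketch.
var allowedTransitions = map[string][]string{
	"not_enrolled":       {"enrolling"},
	"enrolling":          {"pending_acceptance", "enrolled", "failed"},
	"pending_acceptance": {"enrolled", "failed", "enrolling", "not_enrolled"},
	"enrolled":           {"enrolling", "not_enrolled"},
	"failed":             {"enrolling", "not_enrolled"},
}

// validateTransition returns nil when moving from one status to another is
// allowed, and a descriptive error otherwise.
func validateTransition(from, to string) error {
	next, ok := allowedTransitions[from]
	if !ok {
		return fmt.Errorf("unknown enrollment status %q", from)
	}
	for _, s := range next {
		if s == to {
			return nil
		}
	}
	return fmt.Errorf("invalid enrollment transition %q -> %q", from, to)
}

func main() {
	fmt.Println(validateTransition("pending_acceptance", "enrolled"))
	fmt.Println(validateTransition("not_enrolled", "enrolled"))
}
```

Calling such a check before every status update would surface corrupted or out-of-order records as errors instead of silently persisting them.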
Risk 5: Test Flakiness
Mitigation:
- Use deterministic wait strategies (not arbitrary sleeps)
- Implement retry logic for network-dependent tests
- Mock external dependencies where possible
- Run tests in isolated containers to prevent interference
Monitoring and Alerting
Metrics to Track
- Enrollment Success Rate: charon_crowdsec_enrollment_attempts_total{status="success"}
- Enrollment Failure Rate: charon_crowdsec_enrollment_attempts_total{status="failed"}
- LAPI Health: charon_crowdsec_lapi_healthy (1 = healthy, 0 = unhealthy)
- Heartbeat Latency: charon_crowdsec_heartbeat_latency_seconds
- Enrollment Duration: charon_crowdsec_enrollment_duration_seconds
Alerts to Create
- LAPI Down: Alert if charon_crowdsec_lapi_healthy == 0 for >5 minutes
- Enrollment Failures: Alert if enrollment failure rate >20% over 1 hour
- Heartbeat Timeout: Alert if no heartbeat received for >10 minutes (when enrolled)
- Console Offline: Alert if engine shows offline in console for >30 minutes
Log Queries
# Most recent enrollment attempts (last 100 matching log lines)
grep "crowdsec console enrollment" /var/log/charon.log | tail -100
# LAPI health check failures
grep "LAPI check" /var/log/charon.log | grep -i error
# Heartbeat status
grep "heartbeat" /var/log/crowdsec/crowdsec.log | tail -50
Appendix A: File Manifest
New Files to Create
- backend/internal/crowdsec/console_enroll_test.go - Unit tests for enrollment service
- backend/integration/crowdsec_lapi_integration_test.go - LAPI integration tests
- tests/security/crowdsec-console-enrollment.spec.ts - E2E enrollment tests
- tests/security/crowdsec-console-monitoring.spec.ts - E2E monitoring tests
- tests/security/crowdsec-diagnostics.spec.ts - E2E diagnostic tests
- backend/internal/crowdsec/heartbeat_poller.go - Heartbeat polling service
- backend/internal/metrics/crowdsec_metrics.go - Prometheus metrics
- scripts/diagnose-crowdsec.sh - Diagnostic script
Files to Modify
- backend/internal/api/handlers/crowdsec_handler.go - Add diagnostic endpoints
- backend/internal/crowdsec/console_enroll.go - Enhance retry logic, error messages
- backend/cmd/api/main.go - Integrate heartbeat poller
- docs/cerberus.md - Add troubleshooting section
Files to Review (No Changes)
- backend/internal/models/crowdsec_console_enrollment.go - Model is adequate
- .docker/docker-entrypoint.sh - Config initialization is adequate
- docker-compose.yml - Network configuration is adequate
Appendix B: API Endpoint Reference
New Diagnostic Endpoints
| Endpoint | Method | Description | Auth |
|---|---|---|---|
| /api/v1/admin/crowdsec/diagnostics/connectivity | GET | Check connectivity to crowdsec.net APIs | Admin |
| /api/v1/admin/crowdsec/diagnostics/config | GET | Validate CrowdSec configuration files | Admin |
| /api/v1/admin/crowdsec/console/heartbeat | GET | Get console heartbeat status | Admin |
Existing Enrollment Endpoints
| Endpoint | Method | Description | Auth |
|---|---|---|---|
| /api/v1/admin/crowdsec/console/enrollment | GET | Get current enrollment status | Admin |
| /api/v1/admin/crowdsec/console/enrollment | POST | Enroll with CrowdSec console | Admin |
| /api/v1/admin/crowdsec/console/enrollment | DELETE | Clear enrollment state | Admin |
| /api/v1/admin/crowdsec/status | GET | Get CrowdSec running status | Admin |
| /api/v1/admin/crowdsec/start | POST | Start CrowdSec process | Admin |
| /api/v1/admin/crowdsec/stop | POST | Stop CrowdSec process | Admin |
Appendix C: Database Schema
CrowdsecConsoleEnrollment Table
CREATE TABLE crowdsec_console_enrollments (
id INTEGER PRIMARY KEY AUTOINCREMENT,
uuid TEXT UNIQUE NOT NULL,
status TEXT NOT NULL, -- not_enrolled, enrolling, pending_acceptance, enrolled, failed
tenant TEXT,
agent_name TEXT,
encrypted_enroll_key TEXT, -- AES-256 encrypted
last_error TEXT,
last_correlation_id TEXT,
last_attempt_at DATETIME,
enrolled_at DATETIME,
last_heartbeat_at DATETIME, -- NEW: Updated by heartbeat poller
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL
);
CREATE INDEX idx_crowdsec_enrollments_status ON crowdsec_console_enrollments(status);
CREATE INDEX idx_crowdsec_enrollments_correlation_id ON crowdsec_console_enrollments(last_correlation_id);
Appendix D: References
Internal Documentation
External Resources
GitHub Issues
- Issue #586 - CrowdSec offline since 12/19/25
Document Version Control
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-02-03 | GitHub Copilot | Initial comprehensive plan |
End of Document