Cerberus Technical Documentation

This document is for developers and advanced users who want to understand how Cerberus works under the hood.

Looking for the user guide? See Security Features instead.

What Is Cerberus?

Cerberus is the optional security suite built into Charon. It includes:

WAF (Web Application Firewall) — Inspects requests for malicious payloads
CrowdSec — Blocks IPs based on behavior and reputation
Access Lists — Static allow/deny rules (IP, CIDR, geo)
Rate Limiting — Volume-based abuse prevention (placeholder)

All components are disabled by default and can be enabled independently.

Architecture

Request Flow

When a request hits Charon:

Check if Cerberus is enabled (global setting + dynamic database flag)
WAF evaluation (if waf_mode != disabled)
- Increment charon_waf_requests_total metric
- Check payload against loaded rulesets
- If suspicious:
  - block mode: Return 403 + increment charon_waf_blocked_total
  - monitor mode: Log + increment charon_waf_monitored_total
ACL evaluation (if enabled)
- Test client IP against active access lists
- First denial = 403 response
CrowdSec check (placeholder for future)
Rate limit check (placeholder for future)
Pass to downstream handler (if not blocked)
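
The WAF branch of this flow can be sketched as a small pure function. This is an illustration only, not Charon's actual middleware: the `wafCounters` type and the `applyWAFMode` function are hypothetical names, and plain integers stand in for the Prometheus counters. Only the mode values and metric names are taken from this document.

```go
package main

import (
	"fmt"
	"net/http"
)

// wafCounters is a hypothetical stand-in for the three counters named above.
type wafCounters struct {
	requests  int // charon_waf_requests_total
	blocked   int // charon_waf_blocked_total
	monitored int // charon_waf_monitored_total
}

// applyWAFMode returns the HTTP status the WAF step would produce for one
// request and updates the counters as described in the request flow.
// http.StatusOK means "continue to the next check".
func applyWAFMode(mode string, suspicious bool, c *wafCounters) int {
	if mode == "disabled" {
		return http.StatusOK // WAF step is skipped entirely
	}
	c.requests++
	if !suspicious {
		return http.StatusOK
	}
	switch mode {
	case "block":
		c.blocked++
		return http.StatusForbidden
	case "monitor":
		c.monitored++ // logged only; the request continues
	}
	return http.StatusOK
}

func main() {
	var c wafCounters
	fmt.Println(applyWAFMode("block", true, &c))
	fmt.Println(applyWAFMode("monitor", true, &c))
	fmt.Printf("%+v\n", c)
}
```

Keeping the decision separate from the HTTP handler makes the block/monitor distinction easy to unit-test without starting a server.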

Middleware Integration

Cerberus runs as Gin middleware on all /api/v1 routes:

r.Use(cerberusMiddleware.RequestLogger())

This means it protects the management API but does not directly inspect traffic to proxied websites (that happens in Caddy).

Threat Model & Protection Coverage

What Cerberus Protects

Threat Category	CrowdSec	ACL	WAF	Rate Limit
Known attackers (IP reputation)	✅	❌	❌	❌
Geo-based attacks	❌	✅	❌	❌
SQL Injection (SQLi)	❌	❌	✅	❌
Cross-Site Scripting (XSS)	❌	❌	✅	❌
Remote Code Execution (RCE)	❌	❌	✅	❌
Zero-Day Web Exploits	⚠️	❌	✅	❌
DDoS / Volume attacks	❌	❌	❌	✅
Brute-force login attempts	✅	❌	❌	✅
Credential stuffing	✅	❌	❌	✅

Legend:

✅ Full protection
⚠️ Partial protection (time-delayed)
❌ Not designed for this threat

Zero-Day Exploit Protection (WAF)

The WAF provides pattern-based detection for zero-day exploits:

How It Works:

Attacker discovers new vulnerability (e.g., SQLi in your login form)
Attacker crafts exploit: ' OR 1=1--
WAF inspects request → matches SQL injection pattern → BLOCKED
Your application never sees the malicious input

Limitations:

Only protects HTTP/HTTPS traffic
Cannot detect completely novel attack patterns (rare)
Does not protect against logic bugs in application code

Effectiveness:

~90% of zero-day web exploits use known patterns (SQLi, XSS, RCE)
~10% are truly novel and may bypass WAF until rules are updated

Request Processing Pipeline

1. [CrowdSec]      Check IP reputation → Block if known attacker
2. [ACL]           Check IP/Geo rules → Block if not allowed
3. [WAF]           Inspect request payload → Block if malicious pattern
4. [Rate Limit]    Count requests → Block if too many
5. [Proxy]         Forward to upstream service
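
The ordering above amounts to "run each stage in turn and stop at the first block". A minimal sketch of that idea follows. The `request`, `stage`, `runPipeline`, and `demoStages` names are hypothetical, and the toy checks inside `demoStages` are placeholders rather than real CrowdSec, ACL, WAF, or rate-limit logic.

```go
package main

import (
	"fmt"
	"strings"
)

// request carries only the fields this sketch needs (hypothetical type).
type request struct {
	IP      string
	Country string
	Payload string
}

// stage is one pipeline step; check returns a non-empty reason to block.
type stage struct {
	name  string
	check func(r request) string
}

// runPipeline evaluates the stages in order and stops at the first block.
// It returns the blocking stage's name, or "Proxy" when every check passed
// and the request would be forwarded upstream.
func runPipeline(stages []stage, r request) (blockedBy, reason string) {
	for _, s := range stages {
		if why := s.check(r); why != "" {
			return s.name, why
		}
	}
	return "Proxy", ""
}

// demoStages builds the four stages with toy rules, for illustration only.
func demoStages() []stage {
	return []stage{
		{"CrowdSec", func(r request) string {
			if r.IP == "203.0.113.50" {
				return "known attacker"
			}
			return ""
		}},
		{"ACL", func(r request) string {
			if r.Country != "US" {
				return "country not allowed"
			}
			return ""
		}},
		{"WAF", func(r request) string {
			if strings.Contains(r.Payload, "<script>") {
				return "malicious pattern"
			}
			return ""
		}},
		{"Rate Limit", func(r request) string { return "" }}, // never blocks here
	}
}

func main() {
	r := request{IP: "198.51.100.7", Country: "US", Payload: "q=<script>"}
	fmt.Println(runPipeline(demoStages(), r))
}
```

Because cheap IP-based checks come first, a request from a known-bad address never reaches payload inspection.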

Configuration Model

Database Schema

SecurityConfig table:

type SecurityConfig struct {
    ID                   uint   `gorm:"primaryKey"`
    Name                 string `json:"name"`
    Enabled              bool   `json:"enabled"`
    AdminWhitelist       string `json:"admin_whitelist"`        // CSV of IPs/CIDRs
    CrowdsecMode         string `json:"crowdsec_mode"`          // disabled, local, external
    CrowdsecAPIURL       string `json:"crowdsec_api_url"`
    CrowdsecAPIKey       string `json:"crowdsec_api_key"`
    WafMode              string `json:"waf_mode"`               // disabled, monitor, block
    WafRulesSource       string `json:"waf_rules_source"`       // Ruleset identifier
    WafLearning          bool   `json:"waf_learning"`
    RateLimitEnable      bool   `json:"rate_limit_enable"`
    RateLimitBurst       int    `json:"rate_limit_burst"`
    RateLimitRequests    int    `json:"rate_limit_requests"`
    RateLimitWindowSec   int    `json:"rate_limit_window_sec"`
}

Environment Variables (Fallbacks)

If no database config exists, Charon reads from environment:

CERBERUS_SECURITY_WAF_MODE — disabled | monitor | block
🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_MODE — Use GUI toggle instead (see below)
🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_API_URL — External mode is no longer supported
🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_API_KEY — External mode is no longer supported
CERBERUS_SECURITY_ACL_ENABLED — true | false
CERBERUS_SECURITY_RATELIMIT_ENABLED — true | false
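
A fallback reader for these variables might look like the sketch below. The helper names `wafModeFromEnv` and `boolFromEnv` are hypothetical, and two behaviours are assumptions of this sketch rather than documented Charon behaviour: values are trimmed and lower-cased, and anything unset or unrecognised falls back to the disabled default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// wafModeFromEnv reads CERBERUS_SECURITY_WAF_MODE through the supplied lookup
// function and returns "disabled" for unset or unrecognised values.
func wafModeFromEnv(getenv func(string) string) string {
	mode := strings.ToLower(strings.TrimSpace(getenv("CERBERUS_SECURITY_WAF_MODE")))
	switch mode {
	case "disabled", "monitor", "block":
		return mode
	default:
		return "disabled"
	}
}

// boolFromEnv parses true|false flags such as CERBERUS_SECURITY_ACL_ENABLED.
// Unset or unparsable values count as false.
func boolFromEnv(getenv func(string) string, key string) bool {
	v, err := strconv.ParseBool(strings.TrimSpace(getenv(key)))
	return err == nil && v
}

func main() {
	fmt.Println("waf mode:", wafModeFromEnv(os.Getenv))
	fmt.Println("acl enabled:", boolFromEnv(os.Getenv, "CERBERUS_SECURITY_ACL_ENABLED"))
	fmt.Println("rate limit enabled:", boolFromEnv(os.Getenv, "CERBERUS_SECURITY_RATELIMIT_ENABLED"))
}
```

Passing the lookup function as a parameter keeps the parsing testable without touching the process environment.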

⚠️ IMPORTANT: The CHARON_SECURITY_CROWDSEC_MODE (and legacy CERBERUS_SECURITY_CROWDSEC_MODE, CPM_SECURITY_CROWDSEC_MODE) environment variables are DEPRECATED as of version 2.0. CrowdSec is now GUI-controlled through the Security dashboard, just like WAF, ACL, and Rate Limiting.

Why the change?

CrowdSec now works like all other security features (GUI-based)
No need to restart containers to enable/disable CrowdSec
Better integration with Charon's security orchestration
The import config feature replaced the need for external mode

Migration: If you have CHARON_SECURITY_CROWDSEC_MODE=local in your docker-compose.yml, remove it and use the GUI toggle instead. See Migration Guide for step-by-step instructions.

WAF (Web Application Firewall)

Current Implementation

Status: Prototype with placeholder detection

The current WAF checks for <script> tags as a proof-of-concept. Full OWASP CRS integration is planned.

func (w *WAF) EvaluateRequest(r *http.Request) (Decision, error) {
    if strings.Contains(r.URL.Query().Get("q"), "<script>") {
        return Decision{Action: "block", Reason: "XSS detected"}, nil
    }
    return Decision{Action: "allow"}, nil
}

Future: Coraza Integration

Planned integration with Coraza WAF and OWASP Core Rule Set:

waf, err := coraza.NewWAF(coraza.NewWAFConfig().
    WithDirectives(loadedRuleContent))

This will provide production-grade detection of:

SQL injection
Cross-site scripting (XSS)
Remote code execution
File inclusion attacks
And more

Rulesets

SecurityRuleSet table stores rule definitions:

type SecurityRuleSet struct {
    ID         uint   `gorm:"primaryKey"`
    Name       string `json:"name"`
    SourceURL  string `json:"source_url"`  // Optional URL for rule updates
    Mode       string `json:"mode"`        // owasp, custom
    Content    string `json:"content"`     // Raw rule text
}

Manage via /api/v1/security/rulesets.

Prometheus Metrics

charon_waf_requests_total{mode="block|monitor"} — Total requests evaluated
charon_waf_blocked_total{mode="block"} — Requests blocked
charon_waf_monitored_total{mode="monitor"} — Requests logged but not blocked

Scrape from /metrics endpoint (no auth required).

Structured Logging

WAF decisions emit JSON-like structured logs:

{
  "source": "waf",
  "decision": "block",
  "mode": "block",
  "path": "/api/v1/proxy-hosts",
  "query": "name=<script>alert(1)</script>",
  "ip": "203.0.113.50"
}

Use these for dashboard creation and alerting.
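
As a starting point for alerting, the sketch below tallies blocked requests per client IP from log lines shaped like the example above. It assumes each entry is emitted as one valid JSON object per line, which the "JSON-like" wording does not guarantee. The `wafLogEntry` and `countBlocksByIP` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wafLogEntry mirrors the fields shown in the sample log entry.
type wafLogEntry struct {
	Source   string `json:"source"`
	Decision string `json:"decision"`
	Mode     string `json:"mode"`
	Path     string `json:"path"`
	Query    string `json:"query"`
	IP       string `json:"ip"`
}

// countBlocksByIP returns how many WAF block decisions each IP triggered.
// Lines that are not valid JSON, or that come from other sources, are skipped.
func countBlocksByIP(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		var e wafLogEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue // not a JSON object: ignore
		}
		if e.Source == "waf" && e.Decision == "block" {
			counts[e.IP]++
		}
	}
	return counts
}

func main() {
	lines := []string{
		`{"source":"waf","decision":"block","mode":"block","path":"/api/v1/proxy-hosts","query":"name=<script>alert(1)</script>","ip":"203.0.113.50"}`,
		`plain text line that is not JSON`,
	}
	fmt.Println(countBlocksByIP(lines))
}
```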

Access Control Lists (ACLs)

How They Work

Each AccessList defines:

Type: whitelist | blacklist | geo_whitelist | geo_blacklist | local_only
IPs: Comma-separated IPs or CIDR blocks
Countries: Comma-separated ISO country codes (US, GB, FR, etc.)

Evaluation logic:

Whitelist: If IP matches list → allow; else → deny
Blacklist: If IP matches list → deny; else → allow
Geo Whitelist: If country matches → allow; else → deny
Geo Blacklist: If country matches → deny; else → allow
Local Only: If RFC1918 private IP → allow; else → deny

Multiple ACLs can be assigned to a proxy host. The first denial wins.
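
The evaluation rules and the "first denial wins" behaviour can be sketched with the standard `net/netip` package. This is a simplified illustration under stated assumptions: the `accessList` struct and helper names are hypothetical, the country code is assumed to have been resolved from GeoIP beforehand, unknown list types deny, and `IsPrivate` also accepts IPv6 unique local addresses in addition to the RFC1918 ranges.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// accessList holds the three fields described above (hypothetical struct).
type accessList struct {
	Type      string // whitelist | blacklist | geo_whitelist | geo_blacklist | local_only
	IPs       string // comma-separated IPs or CIDR blocks
	Countries string // comma-separated ISO country codes
}

// ipMatches reports whether addr equals an IP or falls inside a CIDR in csv.
func ipMatches(csv string, addr netip.Addr) bool {
	for _, item := range strings.Split(csv, ",") {
		item = strings.TrimSpace(item)
		if p, err := netip.ParsePrefix(item); err == nil {
			if p.Contains(addr) {
				return true
			}
			continue
		}
		if a, err := netip.ParseAddr(item); err == nil && a == addr {
			return true
		}
	}
	return false
}

// countryMatches reports whether country appears in csv (case-insensitive).
func countryMatches(csv, country string) bool {
	for _, c := range strings.Split(csv, ",") {
		c = strings.TrimSpace(c)
		if c != "" && strings.EqualFold(c, country) {
			return true
		}
	}
	return false
}

// allows applies a single list's rule to one client.
func (l accessList) allows(addr netip.Addr, country string) bool {
	switch l.Type {
	case "whitelist":
		return ipMatches(l.IPs, addr)
	case "blacklist":
		return !ipMatches(l.IPs, addr)
	case "geo_whitelist":
		return countryMatches(l.Countries, country)
	case "geo_blacklist":
		return !countryMatches(l.Countries, country)
	case "local_only":
		return addr.IsPrivate()
	}
	return false // unknown type: deny (an assumption of this sketch)
}

// evaluate returns false as soon as any list denies the client.
func evaluate(lists []accessList, ip, country string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false // unparsable client IP: deny
	}
	for _, l := range lists {
		if !l.allows(addr, country) {
			return false
		}
	}
	return true
}

func main() {
	lists := []accessList{
		{Type: "geo_whitelist", Countries: "US, GB"},
		{Type: "blacklist", IPs: "203.0.113.0/24"},
	}
	fmt.Println(evaluate(lists, "198.51.100.7", "US"))
	fmt.Println(evaluate(lists, "203.0.113.50", "US"))
}
```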

GeoIP Database

Uses MaxMind GeoLite2-Country database:

Path configured via CHARON_GEOIP_DB_PATH
Default: /app/data/GeoLite2-Country.mmdb (Docker)
Update monthly from MaxMind for accuracy

CrowdSec Integration

GUI-Based Control (Current Architecture)

CrowdSec is now GUI-controlled, matching the pattern used by WAF, ACL, and Rate Limiting. The environment variable control (CHARON_SECURITY_CROWDSEC_MODE) is deprecated and will be removed in a future version.

LAPI Initialization and Health Checks

Technical Implementation:

When you toggle CrowdSec ON via the GUI, the backend performs the following:

Start CrowdSec Process (/api/v1/admin/crowdsec/start)

pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)

Poll LAPI Health (automatic, server-side)
- Polling interval: 500ms
- Maximum wait: 30 seconds
- Health check command: cscli lapi status
- Expected response: Exit code 0 (success)

Return Status with lapi_ready Flag

{
  "status": "started",
  "pid": 203,
  "lapi_ready": true
}

Response Fields:

status — "started" (process successfully initiated) or "error"
pid — Process ID of running CrowdSec instance
lapi_ready — Boolean indicating if LAPI health check passed
- true — LAPI is fully initialized and accepting requests
- false — CrowdSec is running, but LAPI still initializing (may take 5-10 more seconds)

Backend Implementation (internal/handlers/crowdsec_handler.go:185-230):

func (h *CrowdsecHandler) Start(c *gin.Context) {
    // Start the process
    pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
    if err != nil {
        c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
        return
    }

    // Wait for LAPI to be ready (with timeout)
    lapiReady := false
    maxWait := 30 * time.Second
    pollInterval := 500 * time.Millisecond
    deadline := time.Now().Add(maxWait)

    for time.Now().Before(deadline) {
        checkCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
        _, err := h.CmdExec.Execute(checkCtx, "cscli", []string{"lapi", "status"})
        cancel() // release this attempt's timer now; a defer inside the loop would accumulate until Start returns
        if err == nil {
            lapiReady = true
            break
        }
        time.Sleep(pollInterval)
    }

    // Return status
    c.JSON(http.StatusOK, gin.H{
        "status":     "started",
        "pid":        pid,
        "lapi_ready": lapiReady,
    })
}

Key Technical Details:

Bounded wait: The Start() handler blocks while it polls LAPI, but never longer than the 30-second timeout
Health check: Uses cscli lapi status (exit code 0 = healthy)
Retry logic: Polls every 500ms instead of continuous checks (reduces CPU)
Timeout: 30 seconds maximum wait (prevents infinite loops)
Graceful degradation: Returns lapi_ready: false instead of failing if timeout exceeded
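
The polling behaviour summarised above (fixed interval, hard deadline, boolean result instead of an error) can be isolated in a reusable helper. The sketch below is one way to write it with a ticker and a context deadline. `waitUntilReady` and `demoWait` are hypothetical names, and the demo uses millisecond durations in place of the 500ms and 30s values so that it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitUntilReady calls check immediately and then once per interval until it
// succeeds, maxWait elapses, or ctx is cancelled. It reports whether check
// ever succeeded, so a timeout degrades to "not ready" rather than an error.
func waitUntilReady(ctx context.Context, maxWait, interval time.Duration, check func(context.Context) error) bool {
	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		if check(ctx) == nil {
			return true
		}
		select {
		case <-ctx.Done():
			return false // timed out or cancelled
		case <-ticker.C:
			// next attempt
		}
	}
}

// demoWait runs waitUntilReady against a fake check that fails `failures`
// times before succeeding.
func demoWait(failures int) (ready bool, calls int) {
	check := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("LAPI not ready")
		}
		return nil
	}
	ready = waitUntilReady(context.Background(), 200*time.Millisecond, 5*time.Millisecond, check)
	return ready, calls
}

func main() {
	ready, calls := demoWait(2)
	fmt.Println("ready:", ready, "calls:", calls)
}
```

Unlike a bare `time.Sleep` loop, the `select` also returns promptly when the parent context is cancelled, for example when the client disconnects.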

LAPI Health Endpoint:

LAPI exposes a health endpoint on http://localhost:8085/health:

curl -s http://localhost:8085/health

Response when healthy:

{"status":"up"}

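This endpoint is a lightweight liveness probe. Note that cscli lapi status goes further: it authenticates against LAPI with the local machine credentials, so it can fail even when /health reports up.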

How to Enable CrowdSec

Step 1: Access Security Dashboard

Navigate to Security in the sidebar
Find the CrowdSec card
Toggle the switch to ON
Wait 10-15 seconds for LAPI to start
Verify status shows "Active" with a running PID

Step 2: Verify LAPI is Running

docker exec charon cscli lapi status

Expected output:

✓ You can successfully interact with Local API (LAPI)

Step 3: (Optional) Enroll in CrowdSec Console

Once LAPI is running, you can enroll your instance:

Go to Cerberus → CrowdSec
Enable the Console enrollment feature flag (if not already enabled)
Click Enroll with CrowdSec Console
Paste your enrollment token from crowdsec.net
Submit

Prerequisites for Console Enrollment:

✅ CrowdSec must be enabled via GUI toggle
✅ LAPI must be running (verify with cscli lapi status)
✅ Feature flag feature.crowdsec.console_enrollment must be enabled
✅ Valid enrollment token from crowdsec.net

⚠️ Important: Console enrollment requires an active LAPI connection. If LAPI is not running, the enrollment will appear successful locally but won't register on crowdsec.net.

Enrollment Retry Logic:

The console enrollment service automatically checks LAPI availability with retries:

Implementation (internal/services/console_enroll.go:218-246):

func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error {
    maxRetries := 3
    retryDelay := 2 * time.Second

    for i := 0; i < maxRetries; i++ {
        checkCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
        _, err := s.exec.ExecuteWithEnv(checkCtx, "cscli", []string{"lapi", "status"}, nil)
        cancel() // release this attempt's timer now instead of deferring to function exit
        if err == nil {
            return nil // LAPI is available
        }

        if i < maxRetries-1 {
            logger.Log().WithError(err).WithField("attempt", i+1).Debug("LAPI not ready, retrying")
            time.Sleep(retryDelay)
        }
    }

    return fmt.Errorf("CrowdSec Local API is not running after %d attempts", maxRetries)
}

Retry Parameters:

Max retries: 3 attempts
Retry delay: 2 seconds between attempts
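Total retry window: 4 seconds of delay (2 waits × 2 seconds); worst case about 19 seconds if every attempt also hits its 5-second command timeout (3 × 5 s + 2 × 2 s)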
Command timeout: 5 seconds per attempt

Retry Flow:

Attempt 1 — Immediate LAPI check
Wait 2 seconds (if failed)
Attempt 2 — Retry LAPI check
Wait 2 seconds (if failed)
Attempt 3 — Final LAPI check
Return error — If all 3 attempts fail

This handles most race conditions where LAPI is still initializing after CrowdSec start.
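
The retry flow generalises to a small helper. In the sketch below, every attempt gets its own timeout and the helper sleeps only between attempts, never after the last one. The `retry` and `demoRetry` names are hypothetical, and the sleep function is injected so the example runs instantly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry runs op up to attempts times, sleeping delay between failed attempts.
// Each attempt's context is cancelled as soon as the attempt returns.
func retry(ctx context.Context, attempts int, delay, perAttempt time.Duration,
	sleep func(time.Duration), op func(context.Context) error) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		attemptCtx, cancel := context.WithTimeout(ctx, perAttempt)
		lastErr = op(attemptCtx)
		cancel()
		if lastErr == nil {
			return nil
		}
		if i < attempts-1 {
			sleep(delay)
		}
	}
	return fmt.Errorf("not available after %d attempts: %w", attempts, lastErr)
}

// demoRetry uses the 3-attempt, 2-second, 5-second parameters with a fake
// operation that fails `failures` times and a sleep that only counts calls.
func demoRetry(failures int) (calls, sleeps int, err error) {
	op := func(context.Context) error {
		calls++
		if calls <= failures {
			return errors.New("LAPI not ready")
		}
		return nil
	}
	fakeSleep := func(time.Duration) { sleeps++ }
	err = retry(context.Background(), 3, 2*time.Second, 5*time.Second, fakeSleep, op)
	return calls, sleeps, err
}

func main() {
	calls, sleeps, err := demoRetry(2)
	fmt.Println(calls, sleeps, err)
}
```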

How CrowdSec Works in Charon

Startup Flow:

Container starts → CrowdSec config initialized (but agent NOT started)
User toggles CrowdSec switch in GUI → Frontend calls /api/v1/admin/crowdsec/start
Backend handler starts LAPI process → PID tracked in backend
User can verify status in Security dashboard
User toggles OFF → Backend calls /api/v1/admin/crowdsec/stop

This matches the pattern used by other security features:

Feature	Control Method	Status Endpoint	Lifecycle Handler
Cerberus	GUI Toggle	/security/status	N/A (master switch)
WAF	GUI Toggle	/security/status	Config regeneration
ACL	GUI Toggle	/security/status	Config regeneration
Rate Limit	GUI Toggle	/security/status	Config regeneration
CrowdSec	✅ GUI Toggle	/security/status	Start/Stop handlers

Import Config Feature

The import config feature (importCrowdsecConfig) allows you to:

Upload a complete CrowdSec configuration (tar.gz)
Import pre-configured settings, collections, and bouncers
Manage CrowdSec entirely through Charon's GUI

This replaced the need for "external" mode:

Old way (deprecated): Set CROWDSEC_MODE=external and point to external LAPI
New way: Import your existing config and let Charon manage it internally

Troubleshooting

Problem: Console enrollment shows "enrolled" locally but doesn't appear on crowdsec.net

Technical Analysis: LAPI must be fully initialized before enrollment. Even with automatic retries, there's a window where LAPI might not be ready.

Solution:

Verify LAPI process is running:

docker exec charon ps aux | grep crowdsec

Expected output:

crowdsec  203  0.5  2.3  /usr/local/bin/crowdsec -c /app/data/crowdsec/config/config.yaml

Check LAPI status:

docker exec charon cscli lapi status

Expected output:

✓ You can successfully interact with Local API (LAPI)

If not ready:

ERROR: cannot contact local API

Check LAPI health endpoint:

docker exec charon curl -s http://localhost:8085/health

Expected response:

{"status":"up"}

Check LAPI can process requests:

docker exec charon cscli machines list

Expected output:

Name                     IP Address      Auth Type    Version
charon-local-machine     127.0.0.1      password     v1.x.x

If LAPI is not running:

- Go to Security dashboard
- Toggle CrowdSec OFF, then ON again
- Wait 15 seconds (critical: LAPI needs time to initialize)
- Verify LAPI is running (repeat checks above)
- Re-submit enrollment token

Monitor LAPI startup:

```
# Watch CrowdSec logs in real-time
docker logs -f charon | grep -i crowdsec
```

Look for:

- ✅ "Starting CrowdSec Local API"
- ✅ "CrowdSec Local API listening on 127.0.0.1:8085"
- ✅ "parsers loaded: 4"
- ✅ "scenarios loaded: 46"
- ❌ "error" or "fatal" (indicates startup problem)

Problem: CrowdSec won't start after toggling

Solution:

Check logs for errors:

docker logs charon | grep -i error | tail -20

Common startup issues:

Issue: Config directory missing

# Check directory exists
docker exec charon ls -la /app/data/crowdsec/config

# If missing, restart container to regenerate
docker compose restart

Issue: Port conflict (8085 in use)

# Check port usage
docker exec charon netstat -tulpn | grep 8085

# If another process is using port 8085, stop it or change CrowdSec LAPI port

Issue: Permission errors

# Fix ownership (run on host machine)
sudo chown -R 1000:1000 ./data/crowdsec
docker compose restart

Remove deprecated environment variables:

Edit docker-compose.yml and remove:

# REMOVE THESE DEPRECATED VARIABLES:
- CHARON_SECURITY_CROWDSEC_MODE=local
- CERBERUS_SECURITY_CROWDSEC_MODE=local
- CPM_SECURITY_CROWDSEC_MODE=local

Then restart:

docker compose down
docker compose up -d

Verify CrowdSec binary exists:

docker exec charon which crowdsec
# Expected: /usr/local/bin/crowdsec

docker exec charon which cscli
# Expected: /usr/local/bin/cscli

Expected LAPI Startup Times:

Initial start: 5-10 seconds
First start after container restart: 10-15 seconds
With many scenarios/parsers: Up to 20 seconds
Maximum timeout: 30 seconds (Start() handler limit)

Performance Monitoring:

# Check CrowdSec resource usage
docker exec charon ps aux | grep crowdsec

# Check LAPI response time
time docker exec charon curl -s http://localhost:8085/health

# Monitor LAPI availability over time
watch -n 5 'docker exec charon cscli lapi status'

Security Decisions

The SecurityDecision table logs all security actions:

type SecurityDecision struct {
    ID        uint      `gorm:"primaryKey"`
    Source    string    `json:"source"`    // waf, crowdsec, acl, ratelimit, manual
    IPAddress string    `json:"ip_address"`
    Action    string    `json:"action"`    // allow, block, challenge
    Reason    string    `json:"reason"`
    Timestamp time.Time `json:"timestamp"`
}

Use cases:

Audit trail for compliance
UI visibility into recent blocks
Manual override tracking

Self-Lockout Prevention

Admin Whitelist

Purpose: Prevent admins from blocking themselves

Implementation:

Stored in SecurityConfig.admin_whitelist as CSV
Checked before applying any block decision
If requesting IP matches whitelist → always allow
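
A whitelist check of this kind can be sketched with `net/netip`. The function names `isWhitelisted` and `finalAction` are hypothetical, and the example whitelist value is the one used in the Enable Cerberus request in the API reference.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// isWhitelisted reports whether clientIP equals an IP or falls inside a CIDR
// listed in the comma-separated adminWhitelist value.
func isWhitelisted(adminWhitelist, clientIP string) bool {
	addr, err := netip.ParseAddr(strings.TrimSpace(clientIP))
	if err != nil {
		return false
	}
	for _, entry := range strings.Split(adminWhitelist, ",") {
		entry = strings.TrimSpace(entry)
		if prefix, err := netip.ParsePrefix(entry); err == nil {
			if prefix.Contains(addr) {
				return true
			}
			continue
		}
		if single, err := netip.ParseAddr(entry); err == nil && single == addr {
			return true
		}
	}
	return false
}

// finalAction consults the whitelist before a block decision is applied:
// a whitelisted client is always allowed, anything else keeps the proposal.
func finalAction(adminWhitelist, clientIP, proposed string) string {
	if proposed == "block" && isWhitelisted(adminWhitelist, clientIP) {
		return "allow"
	}
	return proposed
}

func main() {
	wl := "198.51.100.10,203.0.113.0/24"
	fmt.Println(finalAction(wl, "203.0.113.50", "block"))
	fmt.Println(finalAction(wl, "192.0.2.1", "block"))
}
```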

Recommendation: Add your VPN IP, Tailscale IP, or home network before enabling Cerberus.

Break-Glass Token

Purpose: Emergency disable when locked out

How it works:

Generate via POST /api/v1/security/breakglass/generate
Returns a one-time token in plaintext (shown once; only a hash of it is stored)
Token can be used in POST /api/v1/security/disable to turn off Cerberus
Token expires after first use

Storage: Tokens are hashed in database using bcrypt.
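
The one-time, hash-only lifecycle can be illustrated as follows. To stay within the standard library, this sketch substitutes SHA-256 with a constant-time comparison for the bcrypt hashing mentioned above; with bcrypt, the hashing and comparison steps would use `golang.org/x/crypto/bcrypt` instead. The `tokenStore` type and its methods are hypothetical, and a single in-memory field stands in for the database.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// tokenStore keeps only the hash of the outstanding token (nil if none).
type tokenStore struct {
	hash []byte
}

// generate creates a random token, stores its hash, and returns the
// plaintext exactly once to the caller.
func (s *tokenStore) generate() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	sum := sha256.Sum256([]byte(token))
	s.hash = sum[:]
	return token, nil
}

// redeem reports whether token matches the stored hash. A successful match
// clears the hash, so the token cannot be used a second time.
func (s *tokenStore) redeem(token string) bool {
	if s.hash == nil {
		return false
	}
	sum := sha256.Sum256([]byte(token))
	if subtle.ConstantTimeCompare(sum[:], s.hash) != 1 {
		return false
	}
	s.hash = nil
	return true
}

func main() {
	var s tokenStore
	token, err := s.generate()
	if err != nil {
		panic(err)
	}
	fmt.Println("first use:", s.redeem(token))
	fmt.Println("second use:", s.redeem(token))
}
```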

Localhost Bypass

Requests from 127.0.0.1 or ::1 may bypass security checks (configurable). Allows local management access even when locked out.

API Reference

Status

GET /api/v1/security/status

Returns:

{
  "enabled": true,
  "waf_mode": "monitor",
  "crowdsec_mode": "local",
  "acl_enabled": true,
  "ratelimit_enabled": false
}
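
A client could decode this response as shown in the sketch below. The `securityStatus` struct mirrors the JSON fields above. `parseStatus` and `activeLayers` are hypothetical helpers, and treating `enabled: false` as "nothing is enforced" is an assumption based on Cerberus being described as the master switch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// securityStatus mirrors the JSON body returned by the status endpoint.
type securityStatus struct {
	Enabled          bool   `json:"enabled"`
	WafMode          string `json:"waf_mode"`
	CrowdsecMode     string `json:"crowdsec_mode"`
	ACLEnabled       bool   `json:"acl_enabled"`
	RateLimitEnabled bool   `json:"ratelimit_enabled"`
}

// parseStatus decodes a raw response body.
func parseStatus(body []byte) (securityStatus, error) {
	var s securityStatus
	err := json.Unmarshal(body, &s)
	return s, err
}

// activeLayers lists the components that are currently switched on.
func (s securityStatus) activeLayers() []string {
	if !s.Enabled {
		return nil // master switch off (assumption: nothing is enforced)
	}
	var layers []string
	if s.WafMode != "" && s.WafMode != "disabled" {
		layers = append(layers, "waf:"+s.WafMode)
	}
	if s.CrowdsecMode != "" && s.CrowdsecMode != "disabled" {
		layers = append(layers, "crowdsec")
	}
	if s.ACLEnabled {
		layers = append(layers, "acl")
	}
	if s.RateLimitEnabled {
		layers = append(layers, "ratelimit")
	}
	return layers
}

func main() {
	body := []byte(`{"enabled":true,"waf_mode":"monitor","crowdsec_mode":"local","acl_enabled":true,"ratelimit_enabled":false}`)
	s, err := parseStatus(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.activeLayers())
}
```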

Enable Cerberus

POST /api/v1/security/enable
Content-Type: application/json

{
  "admin_whitelist": "198.51.100.10,203.0.113.0/24"
}

Requires either:

admin_whitelist with at least one IP/CIDR
OR valid break-glass token in header

Disable Cerberus

POST /api/v1/security/disable

Requires either:

Request from localhost
OR valid break-glass token in header

Get/Update Config

GET /api/v1/security/config
POST /api/v1/security/config

See SecurityConfig schema above.

Rulesets

GET /api/v1/security/rulesets
POST /api/v1/security/rulesets
DELETE /api/v1/security/rulesets/:id

Decisions (Audit Log)

GET /api/v1/security/decisions?limit=50
POST /api/v1/security/decisions  # Manual override

Testing

Integration Test

Run the Coraza integration test:

bash scripts/coraza_integration.sh

Or via Go:

cd backend
go test -tags=integration ./integration -run TestCorazaIntegration -v

Manual Testing

Enable WAF in monitor mode
Send request with <script> in query string
Check /api/v1/security/decisions for logged attempt
Switch to block mode
Repeat — should receive 403

Observability

Recommended Dashboards

Block Rate:

rate(charon_waf_blocked_total[5m]) / rate(charon_waf_requests_total[5m])

Monitor vs Block Comparison:

rate(charon_waf_monitored_total[5m])
rate(charon_waf_blocked_total[5m])

Alerting Rules

High block rate (potential attack):

alert: HighWAFBlockRate
expr: sum(rate(charon_waf_blocked_total[5m])) / sum(rate(charon_waf_requests_total[5m])) > 0.3
for: 10m
annotations:
  summary: "WAF blocking >30% of requests"

No WAF evaluation (misconfiguration):

alert: WAFNotEvaluating
expr: rate(charon_waf_requests_total[10m]) == 0
for: 15m
annotations:
  summary: "WAF received zero requests, check middleware config"

Development Roadmap

Phase	Feature	Status
1	WAF placeholder + metrics	✅ Complete
2	ACL implementation	✅ Complete
3	Break-glass token	✅ Complete
4	Coraza CRS integration	📋 Planned
5	CrowdSec local agent	📋 Planned
6	Rate limiting enforcement	📋 Planned
7	Adaptive learning/tuning	🔮 Future

FAQ

Why is the WAF just a placeholder?

We wanted to ship the architecture and observability first. This lets you enable monitoring, see the metrics, and prepare dashboards before the full rule engine is integrated.

Can I use my own WAF rules?

Yes, via /api/v1/security/rulesets. Upload custom Coraza-compatible rules.

Does Cerberus protect Caddy's proxy traffic?

Not yet. Currently it only protects the management API (/api/v1). Future versions will integrate directly with Caddy's request pipeline to protect proxied traffic.

Why is monitor mode still blocking?

Known issue with the placeholder implementation. This will be fixed when Coraza integration is complete.
