Cerberus Technical Documentation
This document is for developers and advanced users who want to understand how Cerberus works under the hood.
Looking for the user guide? See Security Features instead.
What Is Cerberus?
Cerberus is the optional security suite built into Charon. It includes:
- WAF (Web Application Firewall) — Inspects requests for malicious payloads
- CrowdSec — Blocks IPs based on behavior and reputation
- Access Lists — Static allow/deny rules (IP, CIDR, geo)
- Rate Limiting — Volume-based abuse prevention (placeholder)
All components are disabled by default and can be enabled independently.
Architecture
Request Flow
When a request hits Charon:
- Check if Cerberus is enabled (global setting + dynamic database flag)
- WAF evaluation (if waf_mode != disabled)
  - Increment charon_waf_requests_total metric
  - Check payload against loaded rulesets
  - If suspicious:
    - block mode: Return 403 + increment charon_waf_blocked_total
    - monitor mode: Log + increment charon_waf_monitored_total
- ACL evaluation (if enabled)
  - Test client IP against active access lists
  - First denial = 403 response
- CrowdSec check (placeholder for future)
- Rate limit check (placeholder for future)
- Pass to downstream handler (if not blocked)
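The WAF branch of the flow above can be sketched in a few lines of Go. This is an illustrative sketch only, not Charon's middleware: wafCounters and evaluateWAF are hypothetical names, and plain integers stand in for the Prometheus counters.

```go
package main

import (
	"fmt"
	"net/http"
)

// wafCounters is a hypothetical in-memory stand-in for the three
// charon_waf_* Prometheus counters named in the request flow.
type wafCounters struct {
	Requests  int // charon_waf_requests_total
	Blocked   int // charon_waf_blocked_total
	Monitored int // charon_waf_monitored_total
}

// evaluateWAF returns the HTTP status the WAF step would produce for one
// request and updates the counters as described in the flow.
func evaluateWAF(mode string, suspicious bool, c *wafCounters) int {
	if mode == "disabled" {
		return http.StatusOK // WAF step is skipped entirely
	}
	c.Requests++
	if !suspicious {
		return http.StatusOK
	}
	switch mode {
	case "block":
		c.Blocked++
		return http.StatusForbidden
	case "monitor":
		c.Monitored++ // record the hit but let the request through
	}
	return http.StatusOK
}

func main() {
	c := &wafCounters{}
	fmt.Println(evaluateWAF("monitor", true, c))
	fmt.Println(evaluateWAF("block", true, c))
	fmt.Printf("%+v\n", *c)
}
```

Keeping the decision in a pure function like this makes the block/monitor distinction easy to test separately from the HTTP layer.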
Middleware Integration
Cerberus runs as Gin middleware on all /api/v1 routes:
r.Use(cerberusMiddleware.RequestLogger())
This means it protects the management API but does not directly inspect traffic to proxied websites (that happens in Caddy).
Threat Model & Protection Coverage
What Cerberus Protects
| Threat Category | CrowdSec | ACL | WAF | Rate Limit |
|---|---|---|---|---|
| Known attackers (IP reputation) | ✅ | ❌ | ❌ | ❌ |
| Geo-based attacks | ❌ | ✅ | ❌ | ❌ |
| SQL Injection (SQLi) | ❌ | ❌ | ✅ | ❌ |
| Cross-Site Scripting (XSS) | ❌ | ❌ | ✅ | ❌ |
| Remote Code Execution (RCE) | ❌ | ❌ | ✅ | ❌ |
| Zero-Day Web Exploits | ⚠️ | ❌ | ✅ | ❌ |
| DDoS / Volume attacks | ❌ | ❌ | ❌ | ✅ |
| Brute-force login attempts | ✅ | ❌ | ❌ | ✅ |
| Credential stuffing | ✅ | ❌ | ❌ | ✅ |
Legend:
- ✅ Full protection
- ⚠️ Partial protection (time-delayed)
- ❌ Not designed for this threat
Zero-Day Exploit Protection (WAF)
The WAF provides pattern-based detection for zero-day exploits:
How It Works:
- Attacker discovers new vulnerability (e.g., SQLi in your login form)
- Attacker crafts exploit: ' OR 1=1--
- WAF inspects request → matches SQL injection pattern → BLOCKED
- Your application never sees the malicious input
Limitations:
- Only protects HTTP/HTTPS traffic
- Cannot detect completely novel attack patterns (rare)
- Does not protect against logic bugs in application code
Effectiveness:
- ~90% of zero-day web exploits use known patterns (SQLi, XSS, RCE)
- ~10% are truly novel and may bypass WAF until rules are updated
Request Processing Pipeline
1. [CrowdSec] Check IP reputation → Block if known attacker
2. [ACL] Check IP/Geo rules → Block if not allowed
3. [WAF] Inspect request payload → Block if malicious pattern
4. [Rate Limit] Count requests → Block if too many
5. [Proxy] Forward to upstream service
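The ordering above amounts to a chain of checks in which the first stage that blocks ends processing. A minimal sketch follows, assuming hypothetical request and stage types and toy predicates in place of the real CrowdSec, ACL, WAF, and rate-limit logic.

```go
package main

import (
	"fmt"
	"strings"
)

// request carries only the fields this sketch needs (hypothetical type).
type request struct {
	IP      string
	Payload string
}

// stage is one pipeline step; check reports whether the request is blocked.
type stage struct {
	name  string
	check func(r request) bool
}

// runPipeline applies the stages in order and returns the name of the first
// stage that blocks, or "" if the request would be forwarded to the proxy.
func runPipeline(stages []stage, r request) string {
	for _, s := range stages {
		if s.check(r) {
			return s.name
		}
	}
	return ""
}

// defaultStages builds the four stages with placeholder predicates.
func defaultStages() []stage {
	return []stage{
		{"CrowdSec", func(r request) bool { return r.IP == "203.0.113.50" }},
		{"ACL", func(r request) bool { return r.IP == "198.51.100.7" }},
		{"WAF", func(r request) bool { return strings.Contains(r.Payload, "<script>") }},
		{"Rate Limit", func(r request) bool { return false }},
	}
}

func main() {
	st := defaultStages()
	fmt.Printf("%q\n", runPipeline(st, request{IP: "192.0.2.1", Payload: "q=<script>"}))
	fmt.Printf("%q\n", runPipeline(st, request{IP: "192.0.2.1", Payload: "q=hello"}))
}
```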
Configuration Model
Database Schema
SecurityConfig table:
type SecurityConfig struct {
ID uint `gorm:"primaryKey"`
Name string `json:"name"`
Enabled bool `json:"enabled"`
AdminWhitelist string `json:"admin_whitelist"` // CSV of IPs/CIDRs
CrowdsecMode string `json:"crowdsec_mode"` // disabled, local, external
CrowdsecAPIURL string `json:"crowdsec_api_url"`
CrowdsecAPIKey string `json:"crowdsec_api_key"`
WafMode string `json:"waf_mode"` // disabled, monitor, block
WafRulesSource string `json:"waf_rules_source"` // Ruleset identifier
WafLearning bool `json:"waf_learning"`
RateLimitEnable bool `json:"rate_limit_enable"`
RateLimitBurst int `json:"rate_limit_burst"`
RateLimitRequests int `json:"rate_limit_requests"`
RateLimitWindowSec int `json:"rate_limit_window_sec"`
}
Environment Variables (Fallbacks)
If no database config exists, Charon reads from environment:
- CERBERUS_SECURITY_WAF_MODE — disabled | monitor | block
- 🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_MODE — Use GUI toggle instead (see below)
- 🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_API_URL — External mode is no longer supported
- 🚨 DEPRECATED: CERBERUS_SECURITY_CROWDSEC_API_KEY — External mode is no longer supported
- CERBERUS_SECURITY_ACL_ENABLED — true | false
- CERBERUS_SECURITY_RATELIMIT_ENABLED — true | false
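A database-first lookup with an environment fallback might look like the sketch below. The resolveSetting helper is hypothetical, and both the prefix order (CHARON_, then CERBERUS_, then CPM_) and the idea that every setting accepts all three prefixes are assumptions of this example rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// prefixes lists the current and legacy variable prefixes mentioned in this
// document. The lookup order is an assumption of this sketch.
var prefixes = []string{"CHARON_", "CERBERUS_", "CPM_"}

// resolveSetting prefers the database value, then the environment, then def.
// getenv is injected so the function can be exercised without touching the
// real process environment.
func resolveSetting(dbValue, suffix, def string, getenv func(string) string) string {
	if dbValue != "" {
		return dbValue
	}
	for _, p := range prefixes {
		if v := getenv(p + suffix); v != "" {
			return v
		}
	}
	return def
}

func main() {
	mode := resolveSetting("", "SECURITY_WAF_MODE", "disabled", os.Getenv)
	fmt.Println("waf_mode:", mode)
}
```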
⚠️ IMPORTANT: The CHARON_SECURITY_CROWDSEC_MODE (and legacy CERBERUS_SECURITY_CROWDSEC_MODE, CPM_SECURITY_CROWDSEC_MODE) environment variables are DEPRECATED as of version 2.0. CrowdSec is now GUI-controlled through the Security dashboard, just like WAF, ACL, and Rate Limiting.
Why the change?
- CrowdSec now works like all other security features (GUI-based)
- No need to restart containers to enable/disable CrowdSec
- Better integration with Charon's security orchestration
- The import config feature replaced the need for external mode
Migration: If you have CHARON_SECURITY_CROWDSEC_MODE=local in your docker-compose.yml, remove it and use the GUI toggle instead. See Migration Guide for step-by-step instructions.
WAF (Web Application Firewall)
Current Implementation
Status: Prototype with placeholder detection
The current WAF checks for <script> tags as a proof-of-concept. Full OWASP CRS integration is planned.
func (w *WAF) EvaluateRequest(r *http.Request) (Decision, error) {
if strings.Contains(r.URL.Query().Get("q"), "<script>") {
return Decision{Action: "block", Reason: "XSS detected"}, nil
}
return Decision{Action: "allow"}, nil
}
Future: Coraza Integration
Planned integration with Coraza WAF and OWASP Core Rule Set:
waf, err := coraza.NewWAF(coraza.NewWAFConfig().
WithDirectives(loadedRuleContent))
This will provide production-grade detection of:
- SQL injection
- Cross-site scripting (XSS)
- Remote code execution
- File inclusion attacks
- And more
Rulesets
SecurityRuleSet table stores rule definitions:
type SecurityRuleSet struct {
ID uint `gorm:"primaryKey"`
Name string `json:"name"`
SourceURL string `json:"source_url"` // Optional URL for rule updates
Mode string `json:"mode"` // owasp, custom
Content string `json:"content"` // Raw rule text
}
Manage via /api/v1/security/rulesets.
Prometheus Metrics
charon_waf_requests_total{mode="block|monitor"} — Total requests evaluated
charon_waf_blocked_total{mode="block"} — Requests blocked
charon_waf_monitored_total{mode="monitor"} — Requests logged but not blocked
Scrape from /metrics endpoint (no auth required).
Structured Logging
WAF decisions emit JSON-like structured logs:
{
"source": "waf",
"decision": "block",
"mode": "block",
"path": "/api/v1/proxy-hosts",
"query": "name=<script>alert(1)</script>",
"ip": "203.0.113.50"
}
Use these for dashboard creation and alerting.
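For alerting, a log consumer can decode these entries and aggregate them. The sketch below assumes one JSON object per log line, which the source does not guarantee; wafLogEntry and countBlocksByIP are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wafLogEntry mirrors the fields shown in the example log above.
type wafLogEntry struct {
	Source   string `json:"source"`
	Decision string `json:"decision"`
	Mode     string `json:"mode"`
	Path     string `json:"path"`
	Query    string `json:"query"`
	IP       string `json:"ip"`
}

// parseWAFLog decodes a single JSON log line.
func parseWAFLog(line string) (wafLogEntry, error) {
	var e wafLogEntry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

// countBlocksByIP tallies WAF block decisions per client IP, skipping
// lines that are malformed or come from another source.
func countBlocksByIP(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		e, err := parseWAFLog(line)
		if err != nil || e.Source != "waf" || e.Decision != "block" {
			continue
		}
		counts[e.IP]++
	}
	return counts
}

func main() {
	lines := []string{
		`{"source":"waf","decision":"block","mode":"block","path":"/api/v1/proxy-hosts","query":"name=<script>alert(1)</script>","ip":"203.0.113.50"}`,
		`not a json line`,
	}
	fmt.Println(countBlocksByIP(lines))
}
```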
Access Control Lists (ACLs)
How They Work
Each AccessList defines:
- Type: whitelist | blacklist | geo_whitelist | geo_blacklist | local_only
- IPs: Comma-separated IPs or CIDR blocks
- Countries: Comma-separated ISO country codes (US, GB, FR, etc.)
Evaluation logic:
- Whitelist: If IP matches list → allow; else → deny
- Blacklist: If IP matches list → deny; else → allow
- Geo Whitelist: If country matches → allow; else → deny
- Geo Blacklist: If country matches → deny; else → allow
- Local Only: If RFC1918 private IP → allow; else → deny
Multiple ACLs can be assigned to a proxy host. The first denial wins.
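The evaluation rules above can be sketched with the standard net/netip package. This is an illustration under stated assumptions, not Charon's code: the accessList struct and helper names are hypothetical, the client's country is passed in rather than looked up in GeoIP, unknown types and unparseable IPs are denied, and netip's IsPrivate also treats IPv6 unique local addresses (fc00::/7) as private.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// accessList holds the three fields described above (hypothetical type).
type accessList struct {
	Type      string // whitelist | blacklist | geo_whitelist | geo_blacklist | local_only
	IPs       string // comma-separated IPs or CIDR blocks
	Countries string // comma-separated ISO country codes
}

// ipInList reports whether addr equals an IP or falls inside a CIDR in list.
func ipInList(list string, addr netip.Addr) bool {
	for _, entry := range strings.Split(list, ",") {
		entry = strings.TrimSpace(entry)
		if p, err := netip.ParsePrefix(entry); err == nil {
			if p.Contains(addr) {
				return true
			}
		} else if a, err := netip.ParseAddr(entry); err == nil && a == addr {
			return true
		}
	}
	return false
}

// countryInList does a case-insensitive match against the country codes.
func countryInList(list, country string) bool {
	for _, c := range strings.Split(list, ",") {
		if c = strings.TrimSpace(c); c != "" && strings.EqualFold(c, country) {
			return true
		}
	}
	return false
}

// allows applies one access list to a client IP and country code.
func allows(acl accessList, ip, country string) bool {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return false // assumption: unparseable client IP is denied
	}
	switch acl.Type {
	case "whitelist":
		return ipInList(acl.IPs, addr)
	case "blacklist":
		return !ipInList(acl.IPs, addr)
	case "geo_whitelist":
		return countryInList(acl.Countries, country)
	case "geo_blacklist":
		return !countryInList(acl.Countries, country)
	case "local_only":
		return addr.IsPrivate()
	}
	return false // assumption: unknown list type is denied
}

// allowedByAll implements "the first denial wins" across several lists.
func allowedByAll(acls []accessList, ip, country string) bool {
	for _, acl := range acls {
		if !allows(acl, ip, country) {
			return false
		}
	}
	return true
}

func main() {
	wl := accessList{Type: "whitelist", IPs: "198.51.100.10, 203.0.113.0/24"}
	geo := accessList{Type: "geo_blacklist", Countries: "US,GB"}
	fmt.Println(allowedByAll([]accessList{wl, geo}, "203.0.113.77", "FR"))
	fmt.Println(allowedByAll([]accessList{wl, geo}, "203.0.113.77", "US"))
}
```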
GeoIP Database
Uses MaxMind GeoLite2-Country database:
- Path configured via CHARON_GEOIP_DB_PATH
- Default: /app/data/GeoLite2-Country.mmdb (Docker)
- Update monthly from MaxMind for accuracy
CrowdSec Integration
GUI-Based Control (Current Architecture)
CrowdSec is now GUI-controlled, matching the pattern used by WAF, ACL, and Rate Limiting. The environment variable control (CHARON_SECURITY_CROWDSEC_MODE) is deprecated and will be removed in a future version.
LAPI Initialization and Health Checks
Technical Implementation:
When you toggle CrowdSec ON via the GUI, the backend performs the following:
- Start CrowdSec Process (/api/v1/admin/crowdsec/start)
  pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
- Poll LAPI Health (automatic, server-side)
  - Polling interval: 500ms
  - Maximum wait: 30 seconds
  - Health check command: cscli lapi status
  - Expected response: Exit code 0 (success)
- Return Status with lapi_ready Flag
  { "status": "started", "pid": 203, "lapi_ready": true }
Response Fields:
- status — "started" (process successfully initiated) or "error"
- pid — Process ID of running CrowdSec instance
- lapi_ready — Boolean indicating if LAPI health check passed
  - true — LAPI is fully initialized and accepting requests
  - false — CrowdSec is running, but LAPI still initializing (may take 5-10 more seconds)
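A client of the start endpoint could decode these fields as sketched below. The startResponse struct and describeStart helper are hypothetical names for this example; only the three JSON fields come from the response shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// startResponse mirrors the JSON fields returned by the start endpoint.
type startResponse struct {
	Status    string `json:"status"`
	PID       int    `json:"pid"`
	LAPIReady bool   `json:"lapi_ready"`
}

// describeStart turns a raw response body into a short human-readable state.
func describeStart(body []byte) (string, error) {
	var r startResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	switch {
	case r.Status != "started":
		return "start failed", nil
	case r.LAPIReady:
		return "ready", nil
	default:
		return "started, LAPI still initializing", nil
	}
}

func main() {
	msg, err := describeStart([]byte(`{ "status": "started", "pid": 203, "lapi_ready": true }`))
	fmt.Println(msg, err)
}
```

A caller that sees the "still initializing" state can simply re-check LAPI status a few seconds later before attempting enrollment.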
Backend Implementation (internal/handlers/crowdsec_handler.go:185-230):
func (h *CrowdsecHandler) Start(c *gin.Context) {
	// Assumption: ctx is derived from the incoming request
	// (the excerpt does not show where ctx is defined).
	ctx := c.Request.Context()

	// Start the process
	pid, err := h.Executor.Start(ctx, h.BinPath, h.DataDir)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	// Wait for LAPI to be ready (with timeout)
	lapiReady := false
	maxWait := 30 * time.Second
	pollInterval := 500 * time.Millisecond
	deadline := time.Now().Add(maxWait)
	for time.Now().Before(deadline) {
		checkCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
		_, err := h.CmdExec.Execute(checkCtx, "cscli", []string{"lapi", "status"})
		// Release the timer right away; a defer inside the loop would keep
		// every context alive until Start returns.
		cancel()
		if err == nil {
			lapiReady = true
			break
		}
		time.Sleep(pollInterval)
	}

	// Return status
	c.JSON(http.StatusOK, gin.H{
		"status":     "started",
		"pid":        pid,
		"lapi_ready": lapiReady,
	})
}
Key Technical Details:
- Bounded wait: The Start() handler blocks while it waits for LAPI, but never longer than the timeout
- Health check: Uses cscli lapi status (exit code 0 = healthy)
- Retry logic: Polls every 500ms instead of continuous checks (reduces CPU)
- Timeout: 30 seconds maximum wait (prevents infinite loops)
- Graceful degradation: Returns lapi_ready: false instead of failing if timeout exceeded
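The same poll-with-timeout pattern can be written as a reusable helper that also stops early when the caller's context is cancelled. The sketch below is a hedged illustration, not the handler's code: waitReady and simulate are hypothetical names, the interval must be positive, and the per-check 2-second timeout is omitted for brevity.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitReady calls check every interval until it succeeds, maxWait elapses,
// or ctx is cancelled. It reports whether check ever succeeded.
func waitReady(ctx context.Context, interval, maxWait time.Duration, check func(context.Context) error) bool {
	ctx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check(ctx) == nil {
			return true
		}
		select {
		case <-ctx.Done():
			return false // timed out or cancelled: degrade gracefully
		case <-ticker.C:
		}
	}
}

// simulate runs waitReady against a fake check that starts succeeding on
// call number succeedOn, and returns the result plus the number of calls.
func simulate(succeedOn, intervalMs, maxWaitMs int) (bool, int) {
	calls := 0
	ok := waitReady(context.Background(),
		time.Duration(intervalMs)*time.Millisecond,
		time.Duration(maxWaitMs)*time.Millisecond,
		func(context.Context) error {
			calls++
			if calls >= succeedOn {
				return nil
			}
			return errors.New("not ready")
		})
	return ok, calls
}

func main() {
	ok, calls := simulate(3, 10, 1000)
	fmt.Println(ok, calls)
}
```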
LAPI Health Endpoint:
LAPI exposes a health endpoint on http://localhost:8085/health:
curl -s http://localhost:8085/health
Response when healthy:
{"status":"up"}
This endpoint is used internally by cscli lapi status.
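The health endpoint can also be queried directly from Go. The following is a minimal sketch using only the standard library; healthUp and checkLAPI are hypothetical helper names, and the 2-second timeout is an arbitrary choice for the example.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// healthUp reports whether a /health response body has status "up".
func healthUp(body []byte) bool {
	var h struct {
		Status string `json:"status"`
	}
	return json.Unmarshal(body, &h) == nil && h.Status == "up"
}

// checkLAPI fetches url with a timeout and applies healthUp to the body.
func checkLAPI(ctx context.Context, url string) (bool, error) {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<16))
	if err != nil {
		return false, err
	}
	return resp.StatusCode == http.StatusOK && healthUp(body), nil
}

func main() {
	ok, err := checkLAPI(context.Background(), "http://localhost:8085/health")
	fmt.Println(ok, err)
}
```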
How to Enable CrowdSec
Step 1: Access Security Dashboard
- Navigate to Security in the sidebar
- Find the CrowdSec card
- Toggle the switch to ON
- Wait 10-15 seconds for LAPI to start
- Verify status shows "Active" with a running PID
Step 2: Verify LAPI is Running
docker exec charon cscli lapi status
Expected output:
✓ You can successfully interact with Local API (LAPI)
Step 3: (Optional) Enroll in CrowdSec Console
Once LAPI is running, you can enroll your instance:
- Go to Cerberus → CrowdSec
- Enable the Console enrollment feature flag (if not already enabled)
- Click Enroll with CrowdSec Console
- Paste your enrollment token from crowdsec.net
- Submit
Prerequisites for Console Enrollment:
- ✅ CrowdSec must be enabled via GUI toggle
- ✅ LAPI must be running (verify with cscli lapi status)
- ✅ Feature flag feature.crowdsec.console_enrollment must be enabled
- ✅ Valid enrollment token from crowdsec.net
⚠️ Important: Console enrollment requires an active LAPI connection. If LAPI is not running, the enrollment will appear successful locally but won't register on crowdsec.net.
Enrollment Retry Logic:
The console enrollment service automatically checks LAPI availability with retries:
Implementation (internal/services/console_enroll.go:218-246):
func (s *ConsoleEnrollmentService) checkLAPIAvailable(ctx context.Context) error {
	maxRetries := 3
	retryDelay := 2 * time.Second
	for i := 0; i < maxRetries; i++ {
		checkCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
		_, err := s.exec.ExecuteWithEnv(checkCtx, "cscli", []string{"lapi", "status"}, nil)
		// Release the timer right away; a defer inside the loop would keep
		// every context alive until the function returns.
		cancel()
		if err == nil {
			return nil // LAPI is available
		}
		// Sleep only between attempts, not after the last one
		if i < maxRetries-1 {
			logger.Log().WithError(err).WithField("attempt", i+1).Debug("LAPI not ready, retrying")
			time.Sleep(retryDelay)
		}
	}
	return fmt.Errorf("CrowdSec Local API is not running after %d attempts", maxRetries)
}
Retry Parameters:
- Max retries: 3 attempts
- Retry delay: 2 seconds between attempts
- Total retry window: 4 seconds of waiting (2 delays × 2 seconds); up to 19 seconds in the worst case, when all 3 attempts also hit the 5-second command timeout
- Command timeout: 5 seconds per attempt
Retry Flow:
- Attempt 1 — Immediate LAPI check
- Wait 2 seconds (if failed)
- Attempt 2 — Retry LAPI check
- Wait 2 seconds (if failed)
- Attempt 3 — Final LAPI check
- Return error — If all 3 attempts fail
This handles most race conditions where LAPI is still initializing after CrowdSec start.
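Because the loop sleeps only between attempts, three attempts involve two delays. The arithmetic can be checked with a small sketch; retryBudget is a hypothetical helper, and the worst case assumes every attempt runs until its command timeout.

```go
package main

import "fmt"

// retryBudget returns the total time spent sleeping between attempts and
// the worst-case wall time, both in seconds.
func retryBudget(attempts, delaySec, timeoutSec int) (sleep, worst int) {
	if attempts < 1 {
		return 0, 0
	}
	sleep = (attempts - 1) * delaySec     // no sleep after the final attempt
	worst = attempts*timeoutSec + sleep   // every attempt times out
	return sleep, worst
}

func main() {
	sleep, worst := retryBudget(3, 2, 5)
	fmt.Printf("sleep=%ds worst=%ds\n", sleep, worst) // sleep=4s worst=19s
}
```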
How CrowdSec Works in Charon
Startup Flow:
- Container starts → CrowdSec config initialized (but agent NOT started)
- User toggles CrowdSec switch in GUI → Frontend calls /api/v1/admin/crowdsec/start
- Backend handler starts LAPI process → PID tracked in backend
- User can verify status in Security dashboard
- User toggles OFF → Backend calls /api/v1/admin/crowdsec/stop
This matches the pattern used by other security features:
| Feature | Control Method | Status Endpoint | Lifecycle Handler |
|---|---|---|---|
| Cerberus | GUI Toggle | /security/status | N/A (master switch) |
| WAF | GUI Toggle | /security/status | Config regeneration |
| ACL | GUI Toggle | /security/status | Config regeneration |
| Rate Limit | GUI Toggle | /security/status | Config regeneration |
| CrowdSec | ✅ GUI Toggle | /security/status | Start/Stop handlers |
Import Config Feature
The import config feature (importCrowdsecConfig) allows you to:
- Upload a complete CrowdSec configuration (tar.gz)
- Import pre-configured settings, collections, and bouncers
- Manage CrowdSec entirely through Charon's GUI
This replaced the need for "external" mode:
- Old way (deprecated): Set CROWDSEC_MODE=external and point to external LAPI
- New way: Import your existing config and let Charon manage it internally
Troubleshooting
Problem: Console enrollment shows "enrolled" locally but doesn't appear on crowdsec.net
Technical Analysis: LAPI must be fully initialized before enrollment. Even with automatic retries, there's a window where LAPI might not be ready.
Solution:
- Verify LAPI process is running:
  docker exec charon ps aux | grep crowdsec
  Expected output:
  crowdsec 203 0.5 2.3 /usr/local/bin/crowdsec -c /app/data/crowdsec/config/config.yaml
- Check LAPI status:
  docker exec charon cscli lapi status
  Expected output:
  ✓ You can successfully interact with Local API (LAPI)
  If not ready:
  ERROR: cannot contact local API
- Check LAPI health endpoint:
  docker exec charon curl -s http://localhost:8085/health
  Expected response:
  {"status":"up"}
- Check LAPI can process requests:
  docker exec charon cscli machines list
  Expected output:
  Name IP Address Auth Type Version
  charon-local-machine 127.0.0.1 password v1.x.x
- If LAPI is not running:
  - Go to Security dashboard
  - Toggle CrowdSec OFF, then ON again
  - Wait 15 seconds (critical: LAPI needs time to initialize)
  - Verify LAPI is running (repeat checks above)
  - Re-submit enrollment token
- Monitor LAPI startup:
  # Watch CrowdSec logs in real-time
  docker logs -f charon | grep -i crowdsec
  Look for:
  - ✅ "Starting CrowdSec Local API"
  - ✅ "CrowdSec Local API listening on 127.0.0.1:8085"
  - ✅ "parsers loaded: 4"
  - ✅ "scenarios loaded: 46"
  - ❌ "error" or "fatal" (indicates startup problem)
Problem: CrowdSec won't start after toggling
Solution:
- Check logs for errors:
  docker logs charon | grep -i error | tail -20
- Common startup issues:
  Issue: Config directory missing
  # Check directory exists
  docker exec charon ls -la /app/data/crowdsec/config
  # If missing, restart container to regenerate
  docker compose restart
  Issue: Port conflict (8085 in use)
  # Check port usage
  docker exec charon netstat -tulpn | grep 8085
  # If another process is using port 8085, stop it or change CrowdSec LAPI port
  Issue: Permission errors
  # Fix ownership (run on host machine)
  sudo chown -R 1000:1000 ./data/crowdsec
  docker compose restart
- Remove deprecated environment variables:
  Edit docker-compose.yml and remove:
  # REMOVE THESE DEPRECATED VARIABLES:
  - CHARON_SECURITY_CROWDSEC_MODE=local
  - CERBERUS_SECURITY_CROWDSEC_MODE=local
  - CPM_SECURITY_CROWDSEC_MODE=local
  Then restart:
  docker compose down
  docker compose up -d
- Verify CrowdSec binary exists:
  docker exec charon which crowdsec
  # Expected: /usr/local/bin/crowdsec
  docker exec charon which cscli
  # Expected: /usr/local/bin/cscli
Expected LAPI Startup Times:
- Initial start: 5-10 seconds
- First start after container restart: 10-15 seconds
- With many scenarios/parsers: Up to 20 seconds
- Maximum timeout: 30 seconds (Start() handler limit)
Performance Monitoring:
# Check CrowdSec resource usage
docker exec charon ps aux | grep crowdsec
# Check LAPI response time
time docker exec charon curl -s http://localhost:8085/health
# Monitor LAPI availability over time
watch -n 5 'docker exec charon cscli lapi status'
See also: CrowdSec Troubleshooting Guide
Security Decisions
The SecurityDecision table logs all security actions:
type SecurityDecision struct {
ID uint `gorm:"primaryKey"`
Source string `json:"source"` // waf, crowdsec, acl, ratelimit, manual
IPAddress string `json:"ip_address"`
Action string `json:"action"` // allow, block, challenge
Reason string `json:"reason"`
Timestamp time.Time `json:"timestamp"`
}
Use cases:
- Audit trail for compliance
- UI visibility into recent blocks
- Manual override tracking
Self-Lockout Prevention
Admin Whitelist
Purpose: Prevent admins from blocking themselves
Implementation:
- Stored in SecurityConfig.admin_whitelist as CSV
- Checked before applying any block decision
- If requesting IP matches whitelist → always allow
Recommendation: Add your VPN IP, Tailscale IP, or home network before enabling Cerberus.
Break-Glass Token
Purpose: Emergency disable when locked out
How it works:
- Generate via POST /api/v1/security/breakglass/generate
- Returns one-time token (shown in plaintext once; only its hash is stored)
- Token can be used in POST /api/v1/security/disable to turn off Cerberus
- Token expires after first use
Storage: Tokens are hashed in database using bcrypt.
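The generate, hash, and redeem-once lifecycle can be sketched with the standard library alone. Note the swap: this example stores a SHA-256 digest as a stand-in for bcrypt, which lives outside the standard library, so it illustrates the flow rather than the actual storage scheme. The tokenStore type is hypothetical and not safe for concurrent use.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// tokenStore keeps only the digest of the single outstanding token.
type tokenStore struct {
	digest []byte // nil when no token is outstanding
}

// generate creates a random token, stores its digest, and returns the
// plaintext exactly once.
func (s *tokenStore) generate() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	sum := sha256.Sum256([]byte(token)) // stand-in for bcrypt in this sketch
	s.digest = sum[:]
	return token, nil
}

// redeem reports whether token matches the stored digest. A successful
// match consumes the token so it cannot be used again.
func (s *tokenStore) redeem(token string) bool {
	if s.digest == nil {
		return false
	}
	sum := sha256.Sum256([]byte(token))
	if subtle.ConstantTimeCompare(sum[:], s.digest) != 1 {
		return false
	}
	s.digest = nil // one-time use
	return true
}

func main() {
	s := &tokenStore{}
	tok, err := s.generate()
	if err != nil {
		panic(err)
	}
	fmt.Println("first use:", s.redeem(tok))
	fmt.Println("second use:", s.redeem(tok))
}
```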
Localhost Bypass
Requests from 127.0.0.1 or ::1 may bypass security checks (configurable). Allows local management access even when locked out.
API Reference
Status
GET /api/v1/security/status
Returns:
{
"enabled": true,
"waf_mode": "monitor",
"crowdsec_mode": "local",
"acl_enabled": true,
"ratelimit_enabled": false
}
Enable Cerberus
POST /api/v1/security/enable
Content-Type: application/json
{
"admin_whitelist": "198.51.100.10,203.0.113.0/24"
}
Requires either:
- admin_whitelist with at least one IP/CIDR
- OR valid break-glass token in header
Disable Cerberus
POST /api/v1/security/disable
Requires either:
- Request from localhost
- OR valid break-glass token in header
Get/Update Config
GET /api/v1/security/config
POST /api/v1/security/config
See SecurityConfig schema above.
Rulesets
GET /api/v1/security/rulesets
POST /api/v1/security/rulesets
DELETE /api/v1/security/rulesets/:id
Decisions (Audit Log)
GET /api/v1/security/decisions?limit=50
POST /api/v1/security/decisions # Manual override
Testing
Integration Test
Run the Coraza integration test:
bash scripts/coraza_integration.sh
Or via Go:
cd backend
go test -tags=integration ./integration -run TestCorazaIntegration -v
Manual Testing
- Enable WAF in monitor mode
- Send request with <script> in query string
- Check /api/v1/security/decisions for logged attempt
- Switch to block mode
- Repeat — should receive 403
Observability
Recommended Dashboards
Block Rate:
rate(charon_waf_blocked_total[5m]) / rate(charon_waf_requests_total[5m])
Monitor vs Block Comparison:
rate(charon_waf_monitored_total[5m])
rate(charon_waf_blocked_total[5m])
Alerting Rules
High block rate (potential attack):
alert: HighWAFBlockRate
expr: sum(rate(charon_waf_blocked_total[5m])) / sum(rate(charon_waf_requests_total[5m])) > 0.3
for: 10m
annotations:
summary: "WAF blocking >30% of requests"
No WAF evaluation (misconfiguration):
alert: WAFNotEvaluating
expr: rate(charon_waf_requests_total[10m]) == 0
for: 15m
annotations:
summary: "WAF received zero requests, check middleware config"
Development Roadmap
| Phase | Feature | Status |
|---|---|---|
| 1 | WAF placeholder + metrics | ✅ Complete |
| 2 | ACL implementation | ✅ Complete |
| 3 | Break-glass token | ✅ Complete |
| 4 | Coraza CRS integration | 📋 Planned |
| 5 | CrowdSec local agent | 📋 Planned |
| 6 | Rate limiting enforcement | 📋 Planned |
| 7 | Adaptive learning/tuning | 🔮 Future |
FAQ
Why is the WAF just a placeholder?
We wanted to ship the architecture and observability first. This lets you enable monitoring, see the metrics, and prepare dashboards before the full rule engine is integrated.
Can I use my own WAF rules?
Yes, via /api/v1/security/rulesets. Upload custom Coraza-compatible rules.
Does Cerberus protect Caddy's proxy traffic?
Not yet. Currently it only protects the management API (/api/v1). Future versions will integrate directly with Caddy's request pipeline to protect proxied traffic.
Why is monitor mode still blocking?
Known issue with the placeholder implementation. This will be fixed when Coraza integration is complete.