feat(docs): enhance documentation for Cerberus security suite, WAF configuration, and API endpoints

2025-12-02 03:05:57 +00:00
parent 34347b1ff5
commit d1731f81dd
5 changed files with 367 additions and 8 deletions
--- a/.github/agents/Doc_Writer.agent.md
+++ b/.github/agents/Doc_Writer.agent.md
@@ -10,7 +10,7 @@ You value clarity, brevity, and accuracy. You translate "Engineer Speak" into "U
 <context>
 - **Project**: Charon
 - **Docs Location**: `docs/` folder and `docs/features.md`.
- **Style**: Professional, concise, using the existing markdown structure.
+- **Style**: Professional and concise, but written with the novice home user in mind. Use an "explain it like I'm five" language style and the existing markdown structure.
 </context>

 <workflow>
--- a/docs/api.md
+++ b/docs/api.md
@@ -51,6 +51,23 @@ Authorization: Bearer <token>

 ## Endpoints

+### Metrics (Prometheus)
+
+Exposes internal Prometheus counters for scraping.
+
+```http
+GET /metrics
+```
+
+No authentication required. Primary WAF metrics:
+```text
+charon_waf_requests_total
+charon_waf_blocked_total
+charon_waf_monitored_total
+```
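Prometheus normally scrapes this endpoint itself, but for a quick manual check the WAF counters can be pulled out of the text exposition format. A minimal Python sketch, assuming the counters are exported without labels (this page does not say either way); `parse_counters` is a hypothetical helper, not part of Charon:

```python
from typing import Dict


def parse_counters(exposition: str, prefix: str = "charon_waf_") -> Dict[str, float]:
    """Return {metric_name: value} for unlabelled samples whose name starts with prefix."""
    counters: Dict[str, float] = {}
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, value = parts[0], parts[1]  # an optional timestamp (3rd field) is ignored
        # Labelled samples such as name{host="a"} are skipped in this sketch.
        if "{" in name or not name.startswith(prefix):
            continue
        counters[name] = float(value)
    return counters
```

Labelled samples are skipped rather than summed to keep the sketch short; extend it if your build attaches labels.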
+
+---
+
 ### Health Check

 Check API health status.
@@ -68,6 +85,108 @@ GET /health

 ---

+### Security Suite (Cerberus)
+
+#### Status
+```http
+GET /security/status
+```
+Returns the global enabled flag plus the current mode of each security module.
+
+#### Get Global Security Config
+```http
+GET /security/config
+```
+Response 200 (no config yet): `{ "config": null }`
+
+#### Upsert Global Security Config
+```http
+POST /security/config
+Content-Type: application/json
+```
+Request Body (example):
+```json
+{
+  "name": "default",
+  "enabled": true,
+  "admin_whitelist": "198.51.100.10,203.0.113.0/24",
+  "crowdsec_mode": "local",
+  "waf_mode": "monitor",
+  "waf_rules_source": "owasp-crs-local"
+}
+```
+Response 200: `{ "config": { ... } }`
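A minimal Python sketch of calling this endpoint with only the standard library. `BASE_URL`, the port, and the `build_json_request` helper are assumptions for illustration; the header follows the `Authorization: Bearer <token>` scheme used by this API. The request is built but not sent:

```python
import json
import urllib.request
from typing import Optional

# Assumed values: replace with your Charon address and API token.
BASE_URL = "http://localhost:8080/api/v1"
API_TOKEN = "<token>"


def build_json_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON request to the Charon API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {API_TOKEN}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Same body as the upsert example for this endpoint.
config = {
    "name": "default",
    "enabled": True,
    "admin_whitelist": "198.51.100.10,203.0.113.0/24",
    "crowdsec_mode": "local",
    "waf_mode": "monitor",
    "waf_rules_source": "owasp-crs-local",
}
req = build_json_request("POST", "/security/config", config)
```

Sending it with `urllib.request.urlopen(req)` should return a body shaped like `{ "config": { ... } }`.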
+
+#### Enable Cerberus
+```http
+POST /security/enable
+```
+Payload (optional; supply a break-glass token when the caller cannot be validated against `admin_whitelist`):
+```json
+{ "break_glass_token": "abcd1234" }
+```
+
+#### Disable Cerberus
+```http
+POST /security/disable
+```
+Payload (required unless the request comes from localhost):
+```json
+{ "break_glass_token": "abcd1234" }
+```
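Both toggles take the same optional JSON body, so one helper can build either request. A Python sketch; `BASE_URL` and `toggle_cerberus` are hypothetical, and you may also need the usual `Authorization` header if your deployment requires it on these routes (not stated here):

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8080/api/v1"  # assumed host and port


def toggle_cerberus(enable: bool, break_glass_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /security/enable or /security/disable."""
    path = "/security/enable" if enable else "/security/disable"
    headers = {}
    data = None
    if break_glass_token:
        # Only send a body when a break-glass token is supplied.
        data = json.dumps({"break_glass_token": break_glass_token}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")
```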
+
+#### Generate Break-Glass Token
+```http
+POST /security/breakglass/generate
+```
+Response 200 (the plaintext token is shown only once; store it securely): `{ "token": "plaintext-token-once" }`
+
+#### List Security Decisions
+```http
+GET /security/decisions?limit=50
+```
+Response 200: `{ "decisions": [ ... ] }`
+
+#### Create Manual Decision
+```http
+POST /security/decisions
+Content-Type: application/json
+```
+Payload:
+```json
+{ "ip": "203.0.113.5", "action": "block", "details": "manual temporary block" }
+```
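Before posting a manual decision it can help to validate the payload client-side. This Python sketch assumes the endpoint accepts the same actions the Cerberus docs list for `SecurityDecision` (`allow`, `block`, `challenge`); the server-side validation rules are not documented here, and `manual_decision` is a hypothetical helper:

```python
import ipaddress
import json

# Actions listed for SecurityDecision; assumed to apply to manual entries too.
ALLOWED_ACTIONS = {"allow", "block", "challenge"}


def manual_decision(ip: str, action: str, details: str = "") -> str:
    """Validate inputs and return the JSON body for POST /security/decisions."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return json.dumps({"ip": ip, "action": action, "details": details})
```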
+
+#### List Rulesets
+```http
+GET /security/rulesets
+```
+Response 200: `{ "rulesets": [ ... ] }`
+
+#### Upsert Ruleset
+```http
+POST /security/rulesets
+Content-Type: application/json
+```
+Payload:
+```json
+{
+  "name": "owasp-crs-quick",
+  "source_url": "https://example.com/owasp-crs.txt",
+  "mode": "owasp",
+  "content": "# raw rules"
+}
+```
+Response 200: `{ "ruleset": { ... } }`
+
+#### Delete Ruleset
+```http
+DELETE /security/rulesets/:id
+```
+Response 200: `{ "deleted": true }`
+
+---
+
 ### Proxy Hosts

 #### List All Proxy Hosts
--- a/docs/cerberus.md
+++ b/docs/cerberus.md
@@ -0,0 +1,137 @@
+# Cerberus Security Suite
+
+Cerberus is Charon's optional, modular security layer bundling a lightweight WAF pipeline, CrowdSec integration, Access Control Lists (ACLs), and future rate limiting. It focuses on *ease of enablement*, *observability first*, and *gradual enforcement* so home and small business users avoid accidental lockouts.
+
+---
+## Architecture Overview
+
+Cerberus runs as Gin middleware on all `/api/v1` routes, so it also indirectly protects the reverse proxy management workflows exposed through that API. Its components are:
+
+| Component | Purpose | Current Status |
+| :--- | :--- | :--- |
+| WAF | Inspect requests, detect payload signatures, optionally block | Prototype (placeholder `<script>` detection) |
+| CrowdSec | Behavior & reputation-based IP decisions | Local agent planned; mode wiring present |
+| ACL | Static allow/deny (IP, CIDR, geo) per host | Implemented (evaluates active lists) |
+| Rate Limiting | Volume-based abuse prevention | Placeholder (API + config stub) |
+| Decisions & Audit | Persist actions for UI visibility | Implemented models + listing |
+| Rulesets | Persist rule content/metadata for dynamic WAF config | CRUD implemented |
+| Break-Glass | Emergency disable token generation & verification | Implemented |
+
+### Request Flow (Simplified)
+1. Cerberus `IsEnabled()` checks the global flags and the dynamic DB setting.
+2. WAF (if `waf_mode != disabled`) increments `charon_waf_requests_total` and evaluates payload.
+3. If the payload is suspicious and the mode is `block` (design intent), the request is rejected with a JSON error; in `monitor` mode it is logged and allowed to continue.
+4. ACL evaluation (if enabled) tests client IP against active lists; may 403.
+5. CrowdSec and rate limiting checks are placeholders reserved for future enforcement phases.
+6. Downstream handler runs if not aborted.
+
+> Note: The current prototype still blocks suspicious payloads even in `monitor` mode; a future refinement will make `monitor` truly log-only. Starting in `monitor` remains the recommended path for a safe rollout.
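To make the intended `waf_mode` semantics concrete, here is a Python sketch of steps 2 and 3 following the design intent (not the current prototype, which also blocks in `monitor`). The class and function names are hypothetical stand-ins for the Go/Gin middleware; the `<script>` check mirrors the placeholder detection, and matching it case-insensitively is an assumption. Whether `charon_waf_monitored_total` counts every monitor-mode request or only suspicious ones is not specified; this sketch counts only suspicious ones:

```python
from dataclasses import dataclass


@dataclass
class WafCounters:
    """Stand-ins for the three Prometheus counters."""
    requests: int = 0
    blocked: int = 0
    monitored: int = 0


def looks_suspicious(payload: str) -> bool:
    # Placeholder detection: a literal <script> tag (case-insensitive here).
    return "<script>" in payload.lower()


def waf_decision(mode: str, payload: str, counters: WafCounters) -> str:
    """Return "pass", "monitor", or "block" following the design intent."""
    if mode == "disabled":
        return "pass"  # WAF not evaluated; counters untouched
    counters.requests += 1
    if not looks_suspicious(payload):
        return "pass"
    if mode == "block":
        counters.blocked += 1
        return "block"  # the middleware would abort with a JSON error
    counters.monitored += 1
    return "monitor"  # log and let the request continue
```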
+
+---
+## Configuration Model
+
+The global config is persisted via `/api/v1/security/config` and maps to the `SecurityConfig` model:
+```json
+{
+  "name": "default",
+  "enabled": true,
+  "admin_whitelist": "198.51.100.10,203.0.113.0/24",
+  "crowdsec_mode": "local",
+  "waf_mode": "monitor",
+  "waf_rules_source": "owasp-crs-local",
+  "waf_learning": true,
+  "rate_limit_enable": false,
+  "rate_limit_burst": 0,
+  "rate_limit_requests": 0,
+  "rate_limit_window_sec": 0
+}
+```
+
+Environment variables (e.g. `CERBERUS_SECURITY_WAF_MODE`) mirror these settings and serve as fallback defaults. At runtime, Cerberus is switched on and off with `POST /security/enable` and `POST /security/disable`, which validate the caller against the admin whitelist or a break-glass token.
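Reading "fallback defaults" as "the persisted config wins, the environment fills gaps", the effective WAF mode could be resolved as below. This Python sketch is an assumption about precedence, not Charon's code; rejecting unknown values such as the legacy `enabled` is also a choice made here:

```python
import os
from typing import Mapping, Optional

VALID_WAF_MODES = ("disabled", "monitor", "block")


def effective_waf_mode(db_config: Optional[Mapping[str, str]],
                       env: Mapping[str, str] = os.environ) -> str:
    """Prefer the persisted config, then the environment variable, then "disabled"."""
    if db_config and db_config.get("waf_mode"):
        mode = db_config["waf_mode"]
    else:
        mode = env.get("CERBERUS_SECURITY_WAF_MODE", "disabled")
    # Assumption: legacy or misspelled values are rejected rather than mapped.
    if mode not in VALID_WAF_MODES:
        raise ValueError(f"unknown waf_mode {mode!r}")
    return mode
```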
+
+---
+## WAF Details
+
+| Field | Meaning |
+| :--- | :--- |
+| `waf_mode` | `disabled`, `monitor`, `block` |
+| `waf_rules_source` | Identifier or URL for ruleset content |
+| `waf_learning` | Flag for future adaptive tuning |
+
+Metrics (Prometheus):
+```text
+charon_waf_requests_total
+charon_waf_blocked_total
+charon_waf_monitored_total
+```
+Structured log fields:
+```text
+source: "waf"
+decision: "block" | "monitor"
+mode: "block" | "monitor" | "disabled"
+path: request path
+query: raw query string
+```
+
+Rulesets (`SecurityRuleSet`) are managed via `/security/rulesets` and store raw rule `content` plus metadata (`name`, `source_url`, `mode`). The Caddy manager applies changes after upsert/delete.
+
+---
+## Access Control Lists
+
+Each ACL defines IP/Geo whitelist/blacklist semantics. Cerberus iterates enabled lists and calls `AccessListService.TestIP()`; the first denial aborts with 403. Use ACLs for *static* restrictions (internal-only, geofencing) and rely on CrowdSec / rate limiting for dynamic attacker behavior.
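The first-denial rule can be sketched in Python as follows. `AccessList`, `allows`, and `acl_status` are hypothetical simplifications of `AccessListService.TestIP()`: they handle IP/CIDR whitelists and blacklists only and ignore geo rules:

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AccessList:
    """Hypothetical, simplified ACL with IP/CIDR entries only (no geo)."""
    name: str
    kind: str  # "whitelist" or "blacklist"
    cidrs: List[str]
    enabled: bool = True

    def allows(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # strict=False lets bare IPs act as /32 (or /128) networks.
        matched = any(addr in ipaddress.ip_network(c, strict=False) for c in self.cidrs)
        # A whitelist admits only matches; a blacklist rejects matches.
        return matched if self.kind == "whitelist" else not matched


def acl_status(ip: str, lists: Iterable[AccessList]) -> int:
    """Return 403 at the first enabled list that denies the IP, else 200."""
    for acl in lists:
        if acl.enabled and not acl.allows(ip):
            return 403
    return 200
```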
+
+---
+## Decisions & Auditing
+
+`SecurityDecision` captures source (`waf`, `crowdsec`, `ratelimit`, `manual`), action (`allow`, `block`, `challenge`), and context. Manual overrides are created via `POST /security/decisions`. Audit entries (`SecurityAudit`) record actor + action for UI timelines (future visualization).
+
+---
+## Break-Glass & Lockout Prevention
+
+- Include at least one trusted IP/CIDR in `admin_whitelist` before enabling.
+- Generate a token with `POST /security/breakglass/generate`; store securely.
+- From localhost, you can disable Cerberus without a token for emergency local access.
+
+Rollout path:
+1. Set `waf_mode=monitor`.
+2. Observe metrics & logs; tune rulesets.
+3. Add `admin_whitelist` entries.
+4. Switch to `block`.
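Before step 4, it is worth confirming that `admin_whitelist` parses and actually contains the address you manage Charon from. A Python sketch assuming the comma-separated IP/CIDR format shown in the config example; the helper names are hypothetical and this is not Charon's own validation:

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_admin_whitelist(raw: str) -> List[Network]:
    """Parse the comma-separated admin_whitelist into networks; reject empty lists."""
    entries = [e.strip() for e in raw.split(",") if e.strip()]
    if not entries:
        raise ValueError("admin_whitelist is empty: enabling Cerberus could lock you out")
    # strict=False accepts bare IPs (as /32 or /128) and host bits in CIDRs.
    return [ipaddress.ip_network(e, strict=False) for e in entries]


def caller_is_whitelisted(caller_ip: str, raw_whitelist: str) -> bool:
    """True if caller_ip falls inside any whitelist entry."""
    addr = ipaddress.ip_address(caller_ip)
    return any(addr in net for net in parse_admin_whitelist(raw_whitelist))
```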
+
+---
+## Observability Patterns
+
+Suggested PromQL queries:
+- Block Rate: `rate(charon_waf_blocked_total[5m]) / rate(charon_waf_requests_total[5m])`
+- Monitor Volume: `rate(charon_waf_monitored_total[5m])`
+- Drift After Enforcement: compare the block and monitor trends before and after switching to `block`.
+
+Alerting:
+- High block rate spike (>30% sustained for 10 minutes).
+- Zero evaluations (a flat requests counter), which indicates middleware misconfiguration.
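The block-rate query divides two per-second rates taken over the same window, so the window length cancels and the result is (approximately, given Prometheus' extrapolation) the ratio of counter increases. The same arithmetic on two manual snapshots of `/metrics`, as a Python sketch with hypothetical helper names and the 30% alert threshold:

```python
from typing import Mapping


def block_ratio(prev: Mapping[str, float], curr: Mapping[str, float]) -> float:
    """Blocked / evaluated over the interval between two counter snapshots."""
    requests = curr["charon_waf_requests_total"] - prev["charon_waf_requests_total"]
    blocked = curr["charon_waf_blocked_total"] - prev["charon_waf_blocked_total"]
    if requests < 0 or blocked < 0:
        # Counters only go down when the process restarts.
        raise ValueError("counter reset between snapshots; take a fresh baseline")
    if requests == 0:
        return 0.0  # no evaluations: the "zero evaluations" alert covers this case
    return blocked / requests


def should_alert(ratio: float, threshold: float = 0.30) -> bool:
    return ratio > threshold
```

A real alert would express the "sustained for 10 minutes" condition with the `for: 10m` field of a Prometheus alerting rule rather than a single comparison.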
+
+---
+## Roadmap Phases
+
+| Phase | Focus | Status |
+| :--- | :--- | :--- |
+| 1 | WAF prototype + observability | Complete |
+| 2 | CrowdSec local agent integration | Pending |
+| 3 | True WAF rule evaluation (Coraza CRS load) | Pending |
+| 4 | Rate limiting enforcement | Pending |
+| 5 | Advanced dashboards + adaptive learning | Planned |
+
+---
+## FAQ
+
+**Why monitor before block?** Prevent accidental service impact; gather baseline.
+
+**Can I scrape `/metrics` securely?** Yes: place it behind network-level controls or a reverse proxy that requires authentication. The endpoint itself is unauthenticated for simplicity.
+
+**Does monitor mode block today?** Partially. The prototype still blocks suspicious `<script>` payloads in `monitor` mode; this will change to pure logging in a future refinement.
+
+---
+## See Also
+- [Security Overview](security.md)
+- [Features](features.md)
+- [API Reference](api.md)
--- a/docs/features.md
+++ b/docs/features.md
@@ -14,8 +14,22 @@ Block malicious IPs automatically using community-driven threat intelligence. Cr
 → [Learn more about CrowdSec](https://www.crowdsec.net/)

 ### Web Application Firewall (WAF)
-Protect your applications from common web attacks like SQL injection and cross-site scripting using Coraza WAF, an enterprise-grade firewall built into Caddy.
-→ [Learn more about Coraza WAF](https://coraza.io/)
+Protect your applications from common web attacks like SQL injection and cross-site scripting using the integrated Coraza WAF pipeline (currently a prototype with placeholder detection).
+
+**Global Modes**:
+- `disabled` – WAF not evaluated.
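+- `monitor` – Evaluate & log every request (increment Prometheus counters) without blocking. The current prototype still blocks suspicious `<script>` payloads in this mode; see the Cerberus deep dive.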
+- `block` – Enforce rules (suspicious payloads are rejected; counters increment).
+
+**Observability**:
+- Prometheus counters: `charon_waf_requests_total`, `charon_waf_blocked_total`, `charon_waf_monitored_total`.
+- Structured logs: fields `source=waf`, `decision=block|monitor`, `mode`, `path`, `query`.
+
+**Rulesets**:
+- Manage rule sources via the Security UI / API (`/api/v1/security/rulesets`). Each ruleset stores `name`, optional `source_url`, `mode`, and raw `content`.
+- Attach a global rules source using `waf_rules_source` in the security config.
+
+→ [Coraza](https://coraza.io/) · [Cerberus Deep Dive](cerberus.md#waf-details)

 ### Access Control Lists (ACLs)
 Control who can access your services with IP whitelists, blacklists, and geo-blocking. Block entire countries or allow only specific networks.
--- a/docs/security.md
+++ b/docs/security.md
@@ -65,17 +65,26 @@ environment:

 ### WAF Configuration

-| Variable | Value | Description |
+| Variable | Values | Description |
 | :--- | :--- | :--- |
 | `CERBERUS_SECURITY_WAF_MODE` | `disabled` | (Default) WAF is turned off. |
-| | `enabled` | Enables Coraza WAF with OWASP CRS. |
+|  | `monitor` | Evaluate requests, emit metrics & structured logs, do not block (the current prototype still blocks suspicious `<script>` payloads). |
+|  | `block` | Evaluate & actively block suspicious payloads. |

-**Example:**
+**Example (Monitor Mode):**
 ```yaml
 environment:
-  - CERBERUS_SECURITY_WAF_MODE=enabled
+  - CERBERUS_SECURITY_WAF_MODE=monitor
 ```

+**Example (Blocking Mode):**
+```yaml
+environment:
+  - CERBERUS_SECURITY_WAF_MODE=block
+```
+
+> Migration Note: Earlier documentation referenced a value `enabled`. Use `block` going forward for enforcement.
+
 ### ACL Configuration

 | Variable | Value | Description |
@@ -107,7 +116,7 @@ When enabling the Cerberus suite (CrowdSec, WAF, ACLs, Rate Limiting) there is a
 - **Localhost Bypass**: Requests from `127.0.0.1` or `::1` may be allowed to manage the system locally without a token (helpful for local management access).
 - **Manager Checks**: Config deployment will be refused if Cerberus is enabled and no admin whitelist is configured — this prevents accidental global lockouts when applying new configurations.

-Follow a phased approach: deploy in `monitor`/`log-only` modes, validate findings, add admin whitelist entries, then switch to `block`/`enforce` mode.
+Follow a phased approach: deploy in `monitor` (log-only) first, validate findings, add admin whitelist entries, then switch to `block` enforcement.

 ## ACL Best Practices by Service Type

@@ -172,6 +181,86 @@ Because IP-based blocklists are dynamic and often incomplete, we removed the IP-

 Use ACLs primarily for explicit or static restrictions such as geofencing or limiting access to your home/office IP ranges.

+---
+
+## Observability & Logging
+
+Charon exposes security observability through Prometheus metrics and structured logs:
+
+### Prometheus Metrics
+| Metric | Description |
+| :--- | :--- |
+| `charon_waf_requests_total` | Total requests evaluated by the WAF. |
+| `charon_waf_blocked_total` | Requests blocked in `block` mode. |
+| `charon_waf_monitored_total` | Requests logged in `monitor` mode. |
+
+Scrape endpoint: `GET /metrics` (no auth). Integrate it with a Prometheus server or a compatible collector, and restrict access at the network or reverse proxy level.
+
+### Structured Logs
+WAF decisions emit JSON-like structured fields:
+```text
+source: "waf"
+decision: "block" | "monitor"
+mode: "block" | "monitor" | "disabled"
+path: "/api/v1/..."
+query: "raw url query string"
+```
+Use these fields to build dashboards and alerting (e.g., block rate spikes).
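To find the paths that trigger the most blocks (a useful dashboard panel), the structured fields are enough on their own. A Python sketch, assuming each entry is written as one JSON object per line (the format is only described as JSON-like, so adapt the parsing to your log output); `top_blocked_paths` is hypothetical:

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_blocked_paths(log_lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count WAF block decisions per path from JSON-lines logs."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain-text startup messages
        if not isinstance(entry, dict):
            continue
        if entry.get("source") == "waf" and entry.get("decision") == "block":
            counts[entry.get("path", "?")] += 1
    return counts.most_common(n)
```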
+
+### Recommended Dashboards
+- Block Rate (% blocked / evaluated)
+- Monitor to Block Transition (verify stability before enforcing)
+- Top Paths Triggering Blocks
+- Recent Security Decisions (from `/api/v1/security/decisions`)
+
+---
+
+## Security API Summary
+
+| Endpoint | Method | Purpose |
+| :--- | :--- | :--- |
+| `/api/v1/security/status` | GET | Current enabled state & modes. |
+| `/api/v1/security/config` | GET | Retrieve persisted global security config. |
+| `/api/v1/security/config` | POST | Upsert global security config. |
+| `/api/v1/security/enable` | POST | Enable Cerberus (requires whitelist or break-glass token). |
+| `/api/v1/security/disable` | POST | Disable Cerberus (localhost or break-glass token). |
+| `/api/v1/security/breakglass/generate` | POST | Generate one-time break-glass token. |
+| `/api/v1/security/decisions` | GET | List recent decisions (supports a `limit` query parameter). |
+| `/api/v1/security/decisions` | POST | Manually log a decision (override). |
+| `/api/v1/security/rulesets` | GET | List uploaded rulesets. |
+| `/api/v1/security/rulesets` | POST | Create/update a ruleset. |
+| `/api/v1/security/rulesets/:id` | DELETE | Remove a ruleset. |
+
+### Sample Security Config Payload
+```json
+{
+  "name": "default",
+  "enabled": true,
+  "admin_whitelist": "198.51.100.10,203.0.113.0/24",
+  "crowdsec_mode": "local",
+  "crowdsec_api_url": "",
+  "waf_mode": "monitor",
+  "waf_rules_source": "owasp-crs-local",
+  "waf_learning": true,
+  "rate_limit_enable": false,
+  "rate_limit_burst": 0,
+  "rate_limit_requests": 0,
+  "rate_limit_window_sec": 0
+}
+```
+
+### Sample Ruleset Upsert Payload
+```json
+{
+  "name": "owasp-crs-quick",
+  "source_url": "https://example.com/owasp-crs.txt",
+  "mode": "owasp",
+  "content": "# raw rules or placeholder"
+}
+```
+
+---
+
 ## Testing ACLs

 Before applying an ACL to a production service: