CrowdSec Integration Final Validation Report
Date: December 15, 2025
Validator: QA_Security Agent
Status: ⚠️ CRITICAL ISSUE FOUND
Executive Summary
The CrowdSec integration implementation has a critical bug that prevents the CrowdSec LAPI (Local API) from starting after container restarts. While the bouncer registration and configuration are correct, a stale PID file causes the reconciliation logic to incorrectly believe CrowdSec is already running, preventing startup.
Test Results
1. ✅ CrowdSec Integration Test (Partial Pass)
Test Command: scripts/crowdsec_startup_test.sh
Results:
- ✅ No fatal 'no datasource enabled' error
- ❌ LAPI health check failed (port 8085 not responding)
- ✅ Acquisition config exists with datasource definition
- ✅ Parsers check passed (with warning)
- ✅ Scenarios check passed (with warning)
- ✅ CrowdSec process check passed (false positive)
Score: 5/6 checks passed, but critical failure in LAPI health
Root Cause Analysis: The CrowdSec process (PID 3469) was running during initial container startup and functioned correctly. However, after a container restart:
- A stale PID file /app/data/crowdsec/crowdsec.pid contains PID 51
- PID 51 does not exist in the process table
- The reconciliation logic checks if PID file exists and assumes CrowdSec is running
- No validation that the PID in the file corresponds to an actual running process
- CrowdSec LAPI never starts, bouncer cannot connect
Evidence:
# PID file shows 51
$ docker exec charon cat /app/data/crowdsec/crowdsec.pid
51
# But no process with PID 51 exists
$ docker exec charon ps aux | grep 51 | grep -v grep
(no results)
# Reconciliation log incorrectly reports "already running"
{"level":"info","msg":"CrowdSec reconciliation: already running","pid":51,"time":"2025-12-15T16:14:44-05:00"}
Bouncer Errors:
{"level":"error","logger":"crowdsec","msg":"auth-api: auth with api key failed return nil response,
error: dial tcp 127.0.0.1:8085: connect: connection refused","instance_id":"2977e81e"}
2. ❌ Traffic Blocking Validation (FAILED)
Test Commands:
# Added test ban
$ docker exec charon cscli decisions add --ip 203.0.113.99 --duration 10m --type ban --reason "Test ban for QA validation"
level=info msg="Decision successfully added"
# Verified ban exists
$ docker exec charon cscli decisions list
+----+--------+-----------------+----------------------------+--------+---------+----+--------+------------+----------+
| ID | Source | Scope:Value | Reason | Action | Country | AS | Events | expiration | Alert ID |
+----+--------+-----------------+----------------------------+--------+---------+----+--------+------------+----------+
| 1 | cscli | Ip:203.0.113.99 | Test ban for QA validation | ban | | | 1 | 9m59s | 1 |
+----+--------+-----------------+----------------------------+--------+---------+----+--------+------------+----------+
# Tested blocked traffic
$ curl -H "X-Forwarded-For: 203.0.113.99" http://localhost:8080/
< HTTP/1.1 200 OK # ❌ SHOULD BE 403 Forbidden
Status: ❌ FAILED - Traffic NOT blocked
Root Cause:
- CrowdSec LAPI is not running (see Test #1)
- Caddy bouncer cannot retrieve decisions from LAPI
- Without active decisions, all traffic passes through
Bouncer Status (Before LAPI Failure):
----------------------------------------------------------------------------------------------
Name IP Address Valid Last API pull Type Version Auth Type
----------------------------------------------------------------------------------------------
caddy-bouncer 127.0.0.1 ✔️ 2025-12-15T21:14:03Z caddy-cs-bouncer v0.9.2 api-key
----------------------------------------------------------------------------------------------
Note: When LAPI was operational (initially), the bouncer successfully authenticated and pulled decisions. The blocking failure is purely due to LAPI unavailability after restart.
3. ✅ Regression Tests
Backend Tests
Command: cd backend && go test ./...
Result: ✅ PASS
All tests passed (cached)
Coverage: 85.1% (meets 85% requirement)
Frontend Tests
Command: cd frontend && npm run test
Result: ✅ PASS
Test Files 91 passed (91)
Tests 956 passed | 2 skipped (958)
Duration 66.45s
4. ✅ Security Scans
Command: cd backend && go run golang.org/x/vuln/cmd/govulncheck@latest ./...
Result: ✅ PASS
No vulnerabilities found.
5. ✅ Pre-commit Checks
Command: source .venv/bin/activate && pre-commit run --all-files
Result: ✅ PASS
Go Vet...................................................................Passed
Check .version matches latest Git tag....................................Passed
Prevent large files that are not tracked by LFS..........................Passed
Prevent committing CodeQL DB artifacts...................................Passed
Prevent committing data/backups files....................................Passed
Frontend TypeScript Check................................................Passed
Frontend Lint (Fix)......................................................Passed
Coverage: 85.1% (minimum required 85%)
Critical Bug: PID Reuse Vulnerability
Issue Location
File: backend/internal/api/handlers/crowdsec_exec.go
Function: DefaultCrowdsecExecutor.Status() (lines 95-122)
Root Cause: PID Reuse Without Process Name Validation
The Status() function checks if a process exists with the stored PID but does NOT verify that it's actually the CrowdSec process. This causes a critical bug when:
- CrowdSec starts with PID X (e.g., 51) and writes PID file
- CrowdSec crashes or is killed
- System reuses PID X for a different process (e.g., Delve telemetry)
- Status() finds PID X is running and returns running=true
- Reconciliation logic thinks CrowdSec is running and skips startup
- CrowdSec never starts, LAPI remains unavailable
Evidence
PID File Content:
$ docker exec charon cat /app/data/crowdsec/crowdsec.pid
51
Actual Process at PID 51:
$ docker exec charon cat /proc/51/cmdline | tr '\0' ' '
/usr/local/bin/dlv ** telemetry **
NOT CrowdSec! The PID was recycled.
Reconciliation Log (Incorrect):
{"level":"info","msg":"CrowdSec reconciliation: already running","pid":51,"time":"2025-12-15T16:14:44-05:00"}
Current Implementation (Buggy)
func (e *DefaultCrowdsecExecutor) Status(ctx context.Context, configDir string) (running bool, pid int, err error) {
	b, err := os.ReadFile(e.pidFile(configDir))
	if err != nil {
		return false, 0, nil
	}
	pid, err = strconv.Atoi(string(b))
	if err != nil {
		return false, 0, nil
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false, pid, nil
	}
	// ❌ BUG: This only checks if *any* process exists with this PID
	// It does NOT verify that the process is CrowdSec!
	if err = proc.Signal(syscall.Signal(0)); err != nil {
		if errors.Is(err, os.ErrProcessDone) {
			return false, pid, nil
		}
		return false, pid, nil
	}
	return true, pid, nil // ❌ Returns true even if PID is recycled!
}
Required Fix
The fix requires process name validation to ensure the PID belongs to CrowdSec:
func (e *DefaultCrowdsecExecutor) Status(ctx context.Context, configDir string) (running bool, pid int, err error) {
	b, err := os.ReadFile(e.pidFile(configDir))
	if err != nil {
		return false, 0, nil
	}
	pid, err = strconv.Atoi(string(b))
	if err != nil {
		return false, 0, nil
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false, pid, nil
	}
	// Check if process exists
	if err = proc.Signal(syscall.Signal(0)); err != nil {
		if errors.Is(err, os.ErrProcessDone) {
			return false, pid, nil
		}
		return false, pid, nil
	}
	// ✅ NEW: Verify the process is actually CrowdSec
	if !isCrowdSecProcess(pid) {
		// PID was recycled - not CrowdSec
		return false, pid, nil
	}
	return true, pid, nil
}

// isCrowdSecProcess checks if the given PID is actually a CrowdSec process
func isCrowdSecProcess(pid int) bool {
	cmdlinePath := filepath.Join("/proc", strconv.Itoa(pid), "cmdline")
	b, err := os.ReadFile(cmdlinePath)
	if err != nil {
		return false
	}
	// cmdline uses null bytes as separators
	cmdline := string(b)
	// Check if this is crowdsec binary (could be /usr/local/bin/crowdsec or similar)
	return strings.Contains(cmdline, "crowdsec")
}
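One caveat with the substring check above: strings.Contains matches "crowdsec" anywhere in the command line, so an unrelated process that merely receives a path such as /app/data/crowdsec/... as an argument would also pass. A stricter variant, sketched below, compares only the basename of argv[0]. The helper name isCrowdSecCmdline and the sample command lines are hypothetical and chosen for illustration; this is one possible refinement, not the project's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
)

// isCrowdSecCmdline reports whether raw (the contents of /proc/<pid>/cmdline)
// describes a process whose executable is named "crowdsec".
// Only argv[0] is inspected, so "crowdsec" appearing in a later argument
// of some other program does not count as a match.
func isCrowdSecCmdline(raw []byte) bool {
	// Arguments are NUL-separated; argv[0] is everything before the first NUL.
	argv0, _, _ := bytes.Cut(raw, []byte{0})
	if len(argv0) == 0 {
		return false // an empty cmdline cannot be identified as CrowdSec
	}
	return filepath.Base(string(argv0)) == "crowdsec"
}

func main() {
	// Made-up sample inputs; real data would come from /proc/<pid>/cmdline.
	crowdsec := []byte("/usr/local/bin/crowdsec\x00-c\x00/app/data/crowdsec/config.yaml\x00")
	other := []byte("/usr/local/bin/dlv\x00/app/data/crowdsec/somefile\x00")

	fmt.Println(isCrowdSecCmdline(crowdsec)) // true
	fmt.Println(isCrowdSecCmdline(other))    // false
}
```

Because the function takes the raw bytes rather than a PID, it can be unit-tested without a /proc filesystem; isCrowdSecProcess would read the file and pass the contents through.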
Implementation Details
The fix requires:
- Process name validation by reading /proc/{pid}/cmdline
- String matching to verify "crowdsec" appears in command line
- PID file cleanup when recycled PID detected (optional, but recommended)
- Logging to track PID reuse events
- Test coverage for PID reuse scenario
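The cleanup and logging items in the list above might look roughly like the following sketch. The function name removeStalePIDFile is hypothetical, and the standard library log package stands in for whatever structured logger the backend actually uses; the demo function writes to a temporary directory only.

```go
package main

import (
	"errors"
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

// removeStalePIDFile deletes pidFile and logs the event. It is meant to be
// called only after the caller has decided that the recorded PID does not
// belong to CrowdSec. A file that is already gone is not an error.
func removeStalePIDFile(pidFile string, pid int) error {
	switch err := os.Remove(pidFile); {
	case err == nil:
		log.Printf("crowdsec: removed stale PID file %s (pid %d is not CrowdSec)", pidFile, pid)
		return nil
	case errors.Is(err, fs.ErrNotExist):
		return nil
	default:
		return err
	}
}

// demo exercises removeStalePIDFile against a throwaway directory.
func demo() error {
	dir, err := os.MkdirTemp("", "pidfile-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	pidFile := filepath.Join(dir, "crowdsec.pid")
	if err := os.WriteFile(pidFile, []byte("51"), 0o644); err != nil {
		return err
	}
	if err := removeStalePIDFile(pidFile, 51); err != nil {
		return err
	}
	if _, err := os.Stat(pidFile); !errors.Is(err, fs.ErrNotExist) {
		return errors.New("PID file still present after cleanup")
	}
	// A second call must be a no-op rather than an error.
	return removeStalePIDFile(pidFile, 51)
}

func main() {
	if err := demo(); err != nil {
		log.Fatal(err)
	}
}
```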
Alternative Approach (More Robust): Store both PID and process start time in the PID file to detect reboots/recycling.
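As a rough sketch of this alternative, on Linux the start time of a process is field 22 of /proc/<pid>/stat, expressed in clock ticks since boot. If that value is written next to the PID at launch, a later mismatch indicates the PID now belongs to a different process. The function names parseStartTime and sameProcess are hypothetical, and the sample stat line is fabricated for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStartTime extracts field 22 (starttime, in clock ticks since boot)
// from the contents of /proc/<pid>/stat.
func parseStartTime(stat string) (uint64, error) {
	// The second field, "(comm)", may itself contain spaces or parentheses,
	// so split on the last ')' rather than on whitespace alone.
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line: no ')' found")
	}
	rest := strings.Fields(stat[end+1:])
	// rest[0] is field 3 (state), so field 22 is rest[19].
	const idx = 22 - 3
	if len(rest) <= idx {
		return 0, fmt.Errorf("malformed stat line: only %d fields after comm", len(rest))
	}
	return strconv.ParseUint(rest[idx], 10, 64)
}

// sameProcess reports whether the start time recorded alongside the PID
// matches the start time of whatever currently owns that PID.
func sameProcess(recorded uint64, currentStat string) bool {
	now, err := parseStartTime(currentStat)
	return err == nil && now == recorded
}

func main() {
	// Fabricated sample; a real line is read from /proc/<pid>/stat.
	sample := "51 (crowdsec) S 1 51 51 0 -1 4194560 100 0 0 0 5 3 0 0 20 0 8 0 12345 1000000 200"

	fmt.Println(sameProcess(12345, sample)) // start time matches the recorded value
	fmt.Println(sameProcess(99999, sample)) // PID reused by a later process
}
```

Unlike the command-line check, this comparison does not depend on the binary's name, at the cost of changing the PID file format.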
Configuration Validation
Environment Variables ✅
CHARON_CROWDSEC_CONFIG_DIR=/app/data/crowdsec
CHARON_SECURITY_CROWDSEC_API_KEY=charonbouncerkey2024
CHARON_SECURITY_CROWDSEC_API_URL=http://localhost:8080
CHARON_SECURITY_CROWDSEC_MODE=local
FEATURE_CERBERUS_ENABLED=true
Status: ✅ All correct
Caddy CrowdSec App Configuration ✅
{
"api_key": "charonbouncerkey2024",
"api_url": "http://127.0.0.1:8085",
"enable_streaming": true,
"ticker_interval": "60s"
}
Status: ✅ Correct configuration
CrowdSec Binary Installation ✅
-rwxr-xr-x 1 root root 71772280 Dec 15 12:50 /usr/local/bin/crowdsec
Status: ✅ Binary installed and executable
Recommendations
Immediate Actions (P0 - Critical)
- Fix Stale PID Detection ⚠️ REQUIRED BEFORE RELEASE
  - Add process validation in reconciliation logic
  - Remove stale PID files automatically
  - Location: backend/internal/crowdsec/service.go (reconciliation function)
  - Estimated Effort: 30 minutes
  - Testing: Unit tests + integration test with restart scenario
- Add Restart Integration Test
  - Create test that stops CrowdSec, restarts container, verifies startup
  - Location: scripts/crowdsec_restart_test.sh
  - Acceptance Criteria: CrowdSec starts successfully after restart
Short-term Improvements (P1 - High)
- Enhanced Health Checks
  - Add LAPI connectivity check to container healthcheck
  - Alert on prolonged bouncer connection failures
  - Impact: Faster detection of CrowdSec issues
- PID File Management
  - Move PID file to /var/run/crowdsec.pid (standard location)
  - Use systemd-style PID management if available
  - Auto-cleanup on graceful shutdown
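The LAPI connectivity check suggested under Enhanced Health Checks could start as a plain TCP probe, sketched below. The function name lapiReachable is hypothetical. The probe only shows that something accepts connections on the port (127.0.0.1:8085 in this report), not that LAPI itself is healthy, and the demo in main uses a throwaway local listener instead of a real LAPI.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// lapiReachable reports whether a TCP connection to addr can be opened
// within timeout. It proves only that a listener exists on the port.
func lapiReachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// Stand-in listener on an ephemeral port so the demo runs anywhere.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("cannot start demo listener:", err)
		return
	}
	addr := ln.Addr().String()

	fmt.Println("listener up:  ", lapiReachable(addr, time.Second))
	ln.Close()
	fmt.Println("listener down:", lapiReachable(addr, time.Second))
}
```

In a container healthcheck, the same function would be called with the LAPI address and the process would exit non-zero when it returns false.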
Long-term Enhancements (P2 - Medium)
- Monitoring Dashboard
  - Add CrowdSec status indicator to UI
  - Show LAPI health, bouncer connection status
  - Display decision count and recent blocks
- Auto-recovery
  - Implement watchdog timer for CrowdSec process
  - Auto-restart on crash detection
  - Exponential backoff for restart attempts
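For the exponential backoff item under Auto-recovery, a minimal sketch of the delay calculation follows. The 1-second base and 60-second cap are assumed values chosen for illustration, and restartDelay is a hypothetical name; the watchdog loop that would call it is not shown.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseDelay = 1 * time.Second  // assumed wait before the first restart attempt
	maxDelay  = 60 * time.Second // assumed upper bound on any single wait
)

// restartDelay returns the wait before restart attempt n (0-based):
// baseDelay doubled n times, capped at maxDelay.
func restartDelay(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: wait %s\n", attempt, restartDelay(attempt))
	}
}
```

With these constants the waits grow as 1s, 2s, 4s, 8s, 16s, 32s and then stay at 60s, which keeps a crash-looping CrowdSec from being restarted in a tight loop.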
Summary
| Category | Status | Score |
|---|---|---|
| Integration Test | ⚠️ Partial | 5/6 (83%) |
| Traffic Blocking | ❌ Failed | 0/1 (0%) |
| Regression Tests | ✅ Pass | 2/2 (100%) |
| Security Scans | ✅ Pass | 1/1 (100%) |
| Pre-commit | ✅ Pass | 1/1 (100%) |
| Overall | ❌ FAIL | 9/11 (82%) |
Verdict
⚠️ VALIDATION FAILED - CRITICAL BUG FOUND
Issue: Stale PID file prevents CrowdSec LAPI from starting after container restart.
Impact:
- ❌ CrowdSec does NOT function after restart
- ❌ Traffic blocking DOES NOT work
- ✅ All other components (tests, security, code quality) pass
Required Before Release:
- Fix stale PID detection in reconciliation logic
- Add restart integration test
- Verify traffic blocking works after container restart
Timeline:
- Fix Implementation: 30-60 minutes
- Testing & Validation: 30 minutes
- Total: ~1.5 hours
Test Evidence
Files Examined
- docker-entrypoint.sh - CrowdSec initialization
- docker-compose.override.yml - Environment variables
- Backend tests: All passed (cached)
- Frontend tests: 956 passed, 2 skipped
Container State
- Container: charon (Up 43 minutes, healthy)
- CrowdSec binary: Installed at /usr/local/bin/crowdsec (71MB)
- LAPI port 8085: Not bound (process not running)
- Bouncer: Registered but cannot connect
Logs Analyzed
- Container logs: 50+ lines analyzed
- CrowdSec logs: Connection refused errors every 10s
- Reconciliation logs: False "already running" messages
Next Steps
- Developer: Implement stale PID fix in backend/internal/crowdsec/service.go
- QA: Re-run validation after fix deployed
- DevOps: Update integration tests to include restart scenario
- Documentation: Add troubleshooting section for PID file issues
Report Generated: 2025-12-15 21:23 UTC
Validation Duration: 45 minutes
Agent: QA_Security
Version: Charon v0.x.x (pre-release)