Charon/docs/testing/crowdsec_auth_manual_verification.md
2026-03-24 01:47:22 +00:00

CrowdSec Authentication Fix - Manual Verification Guide

This document provides step-by-step procedures for manually verifying the Bug #1 fix (CrowdSec LAPI authentication regression).

Prerequisites

  • Docker and Docker Compose (the docker compose plugin) installed
  • Charon container running (either charon-e2e for testing or production container)
  • Access to container logs
  • Basic understanding of CrowdSec bouncer authentication

Test Scenarios

Scenario 1: Invalid Environment Variable Auto-Recovery

Objective: Verify that when CHARON_SECURITY_CROWDSEC_API_KEY or CROWDSEC_API_KEY is set to an invalid key, Charon detects the failure and auto-generates a new valid key.

Steps:

  1. Set Invalid Environment Variable

    Edit your docker-compose.yml or .env file:

    environment:
      CHARON_SECURITY_CROWDSEC_API_KEY: fakeinvalidkey12345
    
  2. Start/Restart Container

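    # Recreate the container so the new environment value is applied.
    # docker restart alone reuses the environment the container was created
    # with, so it will not pick up the change.
    docker compose up -d charon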
    
  3. Enable CrowdSec via API

    # Login first (adjust credentials as needed)
    curl -c cookies.txt -X POST http://localhost:8080/api/v1/auth/login \
      -H "Content-Type: application/json" \
      -d '{"email":"admin@example.com","password":"yourpassword"}'
    
    # Enable CrowdSec
    curl -b cookies.txt -X POST http://localhost:8080/api/v1/admin/crowdsec/start
    
  4. Verify Logs Show Validation Failure

    docker logs charon --tail 100 | grep -i "invalid"
    

    Expected Output:

    time="..." level=warning msg="Environment variable CHARON_SECURITY_CROWDSEC_API_KEY is set but invalid. Either remove it from docker-compose.yml or update it to match the auto-generated key. A new valid key will be generated and saved." masked_key=fake...2345
    
  5. Verify New Key Auto-Generated

    docker exec charon cat /app/data/crowdsec/bouncer_key
    

    Expected: A valid CrowdSec API key (NOT fakeinvalidkey12345)

  6. Verify Caddy Bouncer Connects Successfully

    # Test authentication with the new key (-i prints the HTTP status line)
    NEW_KEY=$(docker exec charon cat /app/data/crowdsec/bouncer_key)
    curl -i -H "X-Api-Key: $NEW_KEY" http://localhost:8080/v1/decisions/stream
    

    Expected: HTTP 200 OK (may return empty {"new":null,"deleted":null})

  7. Verify Logs Show Success

    docker logs charon --tail 50 | grep -i "authentication successful"
    

    Expected Output:

    time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
    

Success Criteria:

  • Warning logged about invalid env var
  • New key auto-generated and saved to /app/data/crowdsec/bouncer_key
  • Bouncer authenticates successfully with new key
  • No "access forbidden" errors in logs
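
The log-related criteria above can be checked in one pass rather than with separate grep calls. The following is a minimal sketch, not part of Charon: check_scenario1_logs is a hypothetical helper name, and the match strings are taken from the expected log messages quoted in this scenario.

```shell
# check_scenario1_logs (hypothetical helper): reads Charon log text on stdin
# and reports whether the log-related Scenario 1 criteria hold.
check_scenario1_logs() {
  logs=$(cat)
  status=0
  # Criterion: a warning about the invalid environment variable was logged.
  if ! printf '%s\n' "$logs" | grep -qi 'is set but invalid'; then
    echo "MISSING: invalid env var warning"
    status=1
  fi
  # Criterion: the bouncer eventually authenticated.
  if ! printf '%s\n' "$logs" | grep -qi 'authentication successful'; then
    echo "MISSING: authentication success message"
    status=1
  fi
  # Criterion: no "access forbidden" errors anywhere in the captured logs.
  if printf '%s\n' "$logs" | grep -qi 'access forbidden'; then
    echo "FOUND: access forbidden error"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "PASS"
  fi
  return "$status"
}
```

To use it, pipe recent container logs into the function, for example: docker logs charon --tail 200 2>&1 | check_scenario1_logs. Choose a tail length that covers only the run under test, otherwise errors from earlier runs will be reported.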

Scenario 2: LAPI Startup Delay Handling

Objective: Verify that when LAPI starts 5+ seconds after Charon, the retry logic succeeds instead of immediately failing.

Steps:

  1. Stop Any Running CrowdSec Instance

    docker exec charon pkill -9 crowdsec || true
    
  2. Enable CrowdSec via API (while LAPI is down)

    curl -b cookies.txt -X POST http://localhost:8080/api/v1/admin/crowdsec/start
    
  3. Monitor Logs for Retry Messages

    docker logs -f charon 2>&1 | grep -i "lapi not ready"
    

    Expected Output:

    time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=1 error="connection refused" next_attempt_ms=500
    time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=2 error="connection refused" next_attempt_ms=750
    time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=3 error="connection refused" next_attempt_ms=1125
    
  4. Wait for LAPI to Start (up to 30 seconds)

    Look for success message:

    time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
    
  5. Verify Bouncer Connection

    KEY=$(docker exec charon cat /app/data/crowdsec/bouncer_key)
    curl -i -H "X-Api-Key: $KEY" http://localhost:8080/v1/decisions/stream
    

    Expected: HTTP 200 OK

Success Criteria:

  • Logs show retry attempts with exponential backoff (500ms → 750ms → 1125ms → ...)
  • Connection succeeds after LAPI starts (within 30s max)
  • No immediate failure on first connection refused error
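
To compare the next_attempt_ms values in the logs against the expected progression, the delays can be computed by hand or with a small helper. This is a sketch under stated assumptions: the 1.5x multiplier is inferred from the 500 → 750 → 1125 sequence shown above, backoff_schedule is a hypothetical name, and the integer rounding used here may differ from the Go implementation for later attempts.

```shell
# backoff_schedule INITIAL_MS CAP_MS COUNT (hypothetical helper)
# Prints COUNT delays in milliseconds, one per line. Each delay is 1.5x the
# previous one (integer math, rounded down) and never exceeds CAP_MS.
backoff_schedule() {
  delay=$1
  cap=$2
  count=$3
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    echo "$delay"
    delay=$((delay * 3 / 2))
    i=$((i + 1))
  done
}

# First three delays, matching the expected log output in step 3:
backoff_schedule 500 5000 3
# prints 500, 750 and 1125 on separate lines
```

Only the first three values are confirmed by this guide; treat later values as approximate and rely on the 5s cap and 30s total when judging a run.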

Scenario 3: No More "Access Forbidden" Errors in Production

Objective: Verify that setting an invalid environment variable no longer causes persistent "access forbidden" errors after the fix.

Steps:

  1. Reproduce Pre-Fix Behavior (for comparison - requires reverting to old code)

    With old code, setting invalid env var would cause:

    time="..." level=error msg="LAPI authentication failed" error="access forbidden (403)" key="[REDACTED]"
    
  2. Apply Fix and Repeat Scenario 1

    With new code, same invalid env var should produce:

    time="..." level=warning msg="Environment variable CHARON_SECURITY_CROWDSEC_API_KEY is set but invalid..."
    time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
    

Success Criteria:

  • No "access forbidden" errors after auto-recovery
  • Bouncer connects successfully with auto-generated key

Scenario 4: Key Source Visibility in Logs

Objective: Verify that logs clearly indicate which key source is used (environment variable vs file vs auto-generated).

Test Cases:

4a. Valid Environment Variable

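# Set a valid key in docker-compose.yml or .env (same place as Scenario 1, step 1):
#   CHARON_SECURITY_CROWDSEC_API_KEY: <valid_key_from_cscli>
# Then recreate the container; docker restart alone does not apply environment changes
docker compose up -d charon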

Expected Log:

time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="vali...test" source=environment_variable

4b. File-Based Key

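# Remove CHARON_SECURITY_CROWDSEC_API_KEY from docker-compose.yml or .env,
# then recreate the container so it starts with the existing key file
# (assumes /app/data is on a persistent volume)
docker compose up -d charon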

Expected Log:

time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file

4c. Auto-Generated Key

# With the env var still removed (see 4b), delete the key file and restart
docker exec charon rm -f /app/data/crowdsec/bouncer_key
docker restart charon

Expected Log:

time="..." level=info msg="Registering new CrowdSec bouncer: caddy-bouncer"
time="..." level=info msg="CrowdSec bouncer registration successful" masked_key="new-...123" source=auto_generated

Success Criteria:

  • Logs clearly show source=environment_variable, source=file, or source=auto_generated
  • User can determine which key is active without reading code
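
When running the three test cases back to back, it helps to pull out only the source= field from the latest success line. The helper below is an illustrative sketch: last_key_source is a hypothetical name, and the patterns are based on the expected log lines shown in 4a to 4c.

```shell
# last_key_source (hypothetical helper): reads log text on stdin and prints
# the source= value from the most recent bouncer authentication or
# registration success line. Prints nothing if no such line is present.
last_key_source() {
  grep -iE 'bouncer (authentication|registration) successful' \
    | sed -n 's/.*source=\([a-z_]*\).*/\1/p' \
    | tail -n 1
}
```

For example, docker logs charon --tail 100 2>&1 | last_key_source should print environment_variable, file, or auto_generated depending on the test case.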

Troubleshooting

Issue: "failed to execute cscli" Errors

Cause: CrowdSec binary not installed in container

Resolution: Ensure CrowdSec is installed via Dockerfile or skip test if binary is intentionally excluded.

Issue: LAPI Timeout After 30 Seconds

Cause: CrowdSec process failed to start or crashed

Debug Steps:

  1. Check LAPI process: docker exec charon ps aux | grep crowdsec
  2. Check LAPI logs: docker exec charon cat /var/log/crowdsec/crowdsec.log
  3. Verify config: docker exec charon cat /etc/crowdsec/config.yaml

Issue: "access forbidden" Despite New Key

Cause: Key not properly registered with LAPI

Resolution:

# List registered bouncers
docker exec charon cscli bouncers list

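# If caddy-bouncer is missing or was registered with a stale key, remove any
# existing registration and restart so Charon registers it again
docker exec charon cscli bouncers delete caddy-bouncer || true
docker restart charon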

Verification Checklist

Before considering the fix complete, verify all scenarios pass:

  • Scenario 1: Invalid env var triggers auto-recovery
  • Scenario 2: LAPI startup delay handled with retry logic
  • Scenario 3: No "access forbidden" errors in production logs
  • Scenario 4a: Env var source logged correctly
  • Scenario 4b: File source logged correctly
  • Scenario 4c: Auto-generated source logged correctly
  • Integration Tests: All 3 tests in backend/integration/crowdsec_lapi_integration_test.go pass
  • Unit Tests: All 10 tests in backend/internal/api/handlers/crowdsec_handler_test.go pass

Additional Validation

Docker Logs Monitoring (Real-Time)

# Watch logs in real-time for auth-related messages
docker logs -f charon 2>&1 | grep -iE "crowdsec|bouncer|lapi|authentication"

LAPI Health Check

# Check if LAPI is responding (-i prints the HTTP status line)
curl -i http://localhost:8080/v1/health

Expected: HTTP 200 OK
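
Because LAPI can take several seconds to come up, a single health check may fail even on a working setup. A small polling wrapper makes the check repeatable. This is a minimal sketch: wait_until is a hypothetical helper, and the 30 second limit in the usage note mirrors the total retry window described in this guide.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...] (hypothetical helper)
# Re-runs COMMAND about once per second until it succeeds or roughly
# TIMEOUT_SECONDS have passed. Returns 0 on success, 1 on timeout.
wait_until() {
  timeout=$1
  shift
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

Example: wait_until 30 curl -fsS http://localhost:8080/v1/health && echo "LAPI is up". The -f flag makes curl exit non-zero on HTTP error responses, so the loop continues until the endpoint answers successfully. The elapsed counter tracks sleeps only, so the real wait can be slightly longer than the timeout.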

Bouncer Registration Status

# Verify bouncer is registered via cscli
docker exec charon cscli bouncers list

# Expected output should include a valid caddy-bouncer row, for example
# (exact columns vary by CrowdSec version):
# Name             │ IP Address │ Valid │ Last API Pull
# ─────────────────┼────────────┼───────┼───────────────
# caddy-bouncer    │            │ ✔️    │ <timestamp>

Notes for QA and Code Review

  • Backward Compatibility: The old name-based validation is preserved in validateBouncerKey(). The new authentication check lives in testKeyAgainstLAPI().
  • Security: API keys are masked in logs (first 4 + last 4 characters only) so that secrets are never written in cleartext (CWE-312; see also CWE-532, sensitive information in log files).
  • File Permissions: The bouncer key file is created with 0600 permissions (owner read/write only) and its directory with 0700.
  • Atomic Writes: saveKeyToFile() uses temp file + rename pattern to prevent corruption.
  • Retry Logic: Connection refused errors trigger exponential backoff (500ms → 750ms → 1125ms → ..., capped at 5s per attempt, 30s total).
  • Fast Fail: 403 Forbidden errors fail immediately without retries (indicates invalid key, not LAPI startup issue).
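
When comparing a key on disk with a masked_key value in the logs, the same first 4 + last 4 masking can be reproduced in the shell. This is an illustrative sketch and not the Go implementation: mask_key is a hypothetical name, and the handling of short keys is an assumption made here so that the whole key is never revealed.

```shell
# mask_key KEY (hypothetical helper)
# Prints the first 4 and last 4 characters of KEY separated by "...".
# Assumption: keys shorter than 9 characters are fully redacted, because
# showing 4 + 4 characters would expose the entire key.
mask_key() {
  key=$1
  if [ "${#key}" -lt 9 ]; then
    echo "[REDACTED]"
    return 0
  fi
  first=$(printf '%s' "$key" | cut -c1-4)
  last=$(printf '%s' "$key" | tail -c 4)
  echo "${first}...${last}"
}

mask_key abcd1234efgh5678
# prints abcd...5678
```

To compare against the logs, run mask_key "$(docker exec charon cat /app/data/crowdsec/bouncer_key)" and check that the result matches the masked_key field of the latest success line.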

References

  • Investigation Report: docs/issues/crowdsec_auth_regression.md
  • Unit Tests: backend/internal/api/handlers/crowdsec_handler_test.go (lines 3970-4294)
  • Integration Tests: backend/integration/crowdsec_lapi_integration_test.go
  • Implementation: backend/internal/api/handlers/crowdsec_handler.go (lines 1548-1720)