Charon/docs/plans/crowdsec_startup_fix.md
GitHub Actions 0543a15344 fix(security): resolve CrowdSec startup permission failures
Fixes CrowdSec failing to start due to multiple permission issues:
- Log directory path was /var/log/ instead of /var/log/crowdsec/
- Database files owned by root (cscli runs as root)
- Config files owned by root after envsubst

Changes to .docker/docker-entrypoint.sh:
- Add sed to fix log_dir path to /var/log/crowdsec/
- Add chown after each envsubst config operation
- Add final chown -R after all cscli commands complete

Testing:
- CrowdSec now starts automatically on container boot
- LAPI listens on port 8085 and responds
- Backend coverage: 85.5%
- All pre-commit checks pass
- 0 security vulnerabilities (Critical/High)
2025-12-23 02:30:22 +00:00


CrowdSec Startup Fix Plan

Date: 2025-12-22
Updated: 2025-12-23 (Post-Implementation Investigation)
Status: FAILED - Requires Additional Fixes
Priority: CRITICAL

Current State (2025-12-23 Investigation)

CrowdSec is NOT starting due to PERMISSION ERRORS. The initial fix was implemented but did NOT address the actual root causes.

Actual Error Messages from Container Logs

Failed to write to log, can't open new logfile: open /var/log/crowdsec.log: permission denied

FATAL unable to create database client: unable to set perms on /var/lib/crowdsec/data/crowdsec.db: chmod /var/lib/crowdsec/data/crowdsec.db: operation not permitted

{"level":"warning","msg":"CrowdSec started but LAPI not ready within timeout","pid":316,"time":"2025-12-22T21:04:00-05:00"}

File Ownership Issues (VERIFIED)

# Database file owned by root - CrowdSec can't chmod it
$ stat -c '%U:%G %n' /var/lib/crowdsec/data/crowdsec.db
root:root /var/lib/crowdsec/data/crowdsec.db

# Config files owned by root - created by entrypoint running as root
$ stat -c '%U:%G %n' /app/data/crowdsec/config/config.yaml /app/data/crowdsec/config/user.yaml
root:root /app/data/crowdsec/config/config.yaml
root:root /app/data/crowdsec/config/user.yaml

CrowdSec Config Problem (CRITICAL)

The config.yaml has log_dir: /var/log/ (wrong path):

common:
  log_dir: /var/log/          # <-- WRONG: Should be /var/log/crowdsec/
  log_media: file

With this setting CrowdSec tries to write /var/log/crowdsec.log, but /var/log/ is owned by root, so the charon user cannot create files there. The path should be /var/log/crowdsec/, which is owned by charon.
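
To confirm which directory CrowdSec will actually log to, the value can be read straight from the config file. A minimal sketch, assuming a single `log_dir:` key laid out as in the snippet above; `get_log_dir` is a hypothetical helper and is not part of the entrypoint:

```shell
# Hypothetical helper (not in the entrypoint): print the log_dir value
# from a CrowdSec config.yaml, ignoring trailing comments and whitespace.
get_log_dir() {
    # -n plus the p flag prints only lines where the substitution matched;
    # head keeps the first match in case the key appears more than once.
    sed -n 's/^[[:space:]]*log_dir:[[:space:]]*\([^#[:space:]]*\).*$/\1/p' "$1" | head -n 1
}
```

Running `get_log_dir /etc/crowdsec/config.yaml` inside the container should print `/var/log/crowdsec/` once the fix is in place.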

Root Cause Analysis (UPDATED)

1. Entrypoint Script Runs CrowdSec Commands as Root

Finding: The entrypoint script runs cscli machines add -a --force and envsubst on config files while still running as root. These operations:

  • Create /var/lib/crowdsec/data/crowdsec.db owned by root
  • Overwrite config.yaml and user.yaml with root ownership

Evidence from entrypoint:

# These run as root BEFORE `su-exec charon` is used
cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"
envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"

2. CrowdSec Log Path Configuration Error

Finding: The distributed config.yaml has log_dir: /var/log/ instead of log_dir: /var/log/crowdsec/.

Evidence:

# Current (WRONG):
log_dir: /var/log/

# Should be:
log_dir: /var/log/crowdsec/

3. ReconcileCrowdSecOnStartup IS Being Called (VERIFIED)

Finding: The reconciliation function is now correctly called in backend/cmd/api/main.go#L144 BEFORE the HTTP server starts:

crowdsecExec := handlers.NewDefaultCrowdsecExecutor()
services.ReconcileCrowdSecOnStartup(db, crowdsecExec, crowdsecBinPath, crowdsecDataDir)

This is CORRECT but CrowdSec still fails due to permission issues.

4. CrowdSec Start Method is Correct (VERIFIED)

Finding: The executor's Start method correctly uses os/exec without context cancellation:

cmd := exec.Command(binPath, "-c", configFile)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}

The binary starts but immediately crashes due to permission denied errors.

What Was Implemented vs What Actually Happened

| Item | Expected | Actual |
|------|----------|--------|
| Reconciliation in main.go | Added | Called on startup |
| Dockerfile chown for CrowdSec dirs | Added | Overwritten at runtime by entrypoint |
| Goroutine removed from routes.go | Removed | Confirmed removed |
| Entrypoint permission fix | Not implemented | Root operations create root-owned files |
| Config log_dir fix | Not implemented | Still pointing to /var/log/ |

REQUIRED FIXES (Specific Code Changes)

FIX 1: Change CrowdSec log_dir in Entrypoint (CRITICAL)

File: .docker/docker-entrypoint.sh
Location: After line 155 (after sed -i 's|listen_uri.*|listen_uri: 127.0.0.1:8085|g')
Add:

# Fix log_dir path - must point to /var/log/crowdsec/ not /var/log/
if [ -f "/etc/crowdsec/config.yaml" ]; then
    # One pattern covers both the bare path and a path followed by trailing whitespace
    sed -i 's|log_dir: /var/log/[[:space:]]*$|log_dir: /var/log/crowdsec/|g' /etc/crowdsec/config.yaml
fi
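
The rewrite can also be wrapped so the entrypoint knows whether it took effect. A minimal sketch, assuming a `sed` that supports `-i` (GNU or BusyBox); `fix_log_dir` is a hypothetical name, not an existing function in the script:

```shell
# Hypothetical helper: rewrite "log_dir: /var/log/" to
# "log_dir: /var/log/crowdsec/" and confirm the result.
# The $ anchor means an already-corrected file is left untouched.
fix_log_dir() {
    cfg=$1
    [ -f "$cfg" ] || return 1
    sed -i 's|log_dir: /var/log/[[:space:]]*$|log_dir: /var/log/crowdsec/|' "$cfg"
    # Succeed only if the corrected path is now present.
    grep -q 'log_dir: /var/log/crowdsec/$' "$cfg"
}
```

A call such as `fix_log_dir /etc/crowdsec/config.yaml || echo "Warning: log_dir not corrected"` would surface a template whose `log_dir` line does not match the expected shape.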

FIX 2: Run cscli Commands as charon User (CRITICAL)

File: .docker/docker-entrypoint.sh
Change: All cscli commands must run as the charon user, not root.

Current (WRONG):

cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"

Required (CORRECT):

su-exec charon cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"
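
If several cscli calls need the same treatment, a small wrapper keeps the privilege drop in one place. This is a sketch under stated assumptions: `run_as` is a hypothetical helper, and it falls back to running the command directly when the script is not root or `su-exec` is not installed.

```shell
# Hypothetical helper: run a command as the given user when we are root
# and su-exec is available; otherwise run it unchanged as the current user.
run_as() {
    user=$1; shift
    if [ "$(id -u)" -eq 0 ] && command -v su-exec >/dev/null 2>&1; then
        su-exec "$user" "$@"
    else
        "$@"
    fi
}
```

With this in place the registration line would read `run_as charon cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"`.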

FIX 3: Restore charon Ownership After envsubst (CRITICAL)

File: .docker/docker-entrypoint.sh
Change: The envsubst step replaces each file with a new root-owned copy, so ownership must be set back to charon afterwards.

Current (WRONG):

for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
    if [ -f "$file" ]; then
        envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
done

Required (CORRECT):

for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
    if [ -f "$file" ]; then
        envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        chown charon:charon "$file" 2>/dev/null || true
    fi
done
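
An alternative worth considering is to avoid the ownership change altogether: writing the filtered bytes back into the existing file keeps its inode, owner, and mode, so no chown is needed. A minimal sketch; `rewrite_in_place` is a hypothetical helper, and unlike `mv` the write is not atomic, which is assumed to be acceptable here because CrowdSec is not yet running.

```shell
# Hypothetical helper: filter a file through a command without replacing
# the file itself, so the original owner and permissions are preserved.
rewrite_in_place() {
    file=$1; shift
    tmp="$file.tmp"
    # Run the filter into a temp file, then copy the bytes back over the
    # original. The original is only truncated if the filter succeeded.
    "$@" < "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Inside the loop this would be called as `rewrite_in_place "$file" envsubst`.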

FIX 4: Fix Ownership AFTER cscli Operations (CRITICAL)

File: .docker/docker-entrypoint.sh
Location: After all cscli operations, before the "CrowdSec configuration initialized" message
Add:

# Fix ownership of files created by cscli (runs as root, creates root-owned files)
# The database and config files must be owned by charon for CrowdSec to start
chown -R charon:charon /var/lib/crowdsec 2>/dev/null || true
chown -R charon:charon /app/data/crowdsec 2>/dev/null || true
chown -R charon:charon /var/log/crowdsec 2>/dev/null || true
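
Because each chown above swallows its own errors, a follow-up check is useful to see whether anything was missed. A minimal sketch using `find` with the standard `-user` test; `list_not_owned_by` is a hypothetical helper name:

```shell
# Hypothetical helper: print every path under the given directories that is
# NOT owned by the given user. Empty output means ownership is consistent.
list_not_owned_by() {
    owner=$1; shift
    for dir in "$@"; do
        # Skip directories that do not exist instead of failing.
        [ -d "$dir" ] && find "$dir" ! -user "$owner" -print
    done
    return 0
}
```

For example, `list_not_owned_by charon /var/lib/crowdsec /app/data/crowdsec /var/log/crowdsec` should print nothing after the fix; any path it does print is still owned by another user.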

FIX 5: Update Default config.yaml in configs/crowdsec/ (PREVENTIVE)

File: configs/crowdsec/config.yaml (if it exists), otherwise the distributed template
Change: Ensure log_dir is correct in the source template:

common:
  daemonize: true
  log_media: file
  log_level: info
  log_dir: /var/log/crowdsec/   # <-- CORRECT PATH

Complete Entrypoint Script Fix

Here's the corrected CrowdSec section for .docker/docker-entrypoint.sh:

# ============================================================================
# CrowdSec Initialization
# ============================================================================

if command -v cscli >/dev/null; then
    echo "Initializing CrowdSec configuration..."

    # Define persistent paths
    CS_PERSIST_DIR="/app/data/crowdsec"
    CS_CONFIG_DIR="$CS_PERSIST_DIR/config"
    CS_DATA_DIR="$CS_PERSIST_DIR/data"
    CS_LOG_DIR="/var/log/crowdsec"

    # Ensure persistent directories exist
    mkdir -p "$CS_CONFIG_DIR" "$CS_DATA_DIR" "$CS_LOG_DIR" 2>/dev/null || true
    mkdir -p /var/lib/crowdsec/data 2>/dev/null || true

    # Initialize persistent config if key files are missing
    if [ ! -f "$CS_CONFIG_DIR/config.yaml" ]; then
        echo "Initializing persistent CrowdSec configuration..."
        if [ -d "/etc/crowdsec.dist" ] && [ -n "$(ls -A /etc/crowdsec.dist 2>/dev/null)" ]; then
            cp -r /etc/crowdsec.dist/* "$CS_CONFIG_DIR/" || exit 1
            echo "Successfully initialized config from .dist directory"
        fi
    fi

    # Create acquisition config
    if [ ! -f "/etc/crowdsec/acquis.yaml" ] || [ ! -s "/etc/crowdsec/acquis.yaml" ]; then
        cat > /etc/crowdsec/acquis.yaml << 'ACQUIS_EOF'
source: file
filenames:
  - /var/log/caddy/access.log
  - /var/log/caddy/*.log
labels:
  type: caddy
ACQUIS_EOF
    fi

    # Environment substitution (preserving ownership after)
    export CFG=/etc/crowdsec
    export DATA="$CS_DATA_DIR"
    export PID=/var/run/crowdsec.pid
    export LOG="$CS_LOG_DIR/crowdsec.log"

    for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
        if [ -f "$file" ]; then
            envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
            chown charon:charon "$file" 2>/dev/null || true
        fi
    done

    # Configure LAPI port (8085 instead of 8080)
    if [ -f "/etc/crowdsec/config.yaml" ]; then
        sed -i 's|listen_uri: 127.0.0.1:8080|listen_uri: 127.0.0.1:8085|g' /etc/crowdsec/config.yaml
        sed -i 's|listen_uri: 0.0.0.0:8080|listen_uri: 127.0.0.1:8085|g' /etc/crowdsec/config.yaml
        # FIX: Correct log_dir path
        sed -i 's|log_dir: /var/log/[[:space:]]*$|log_dir: /var/log/crowdsec/|g' /etc/crowdsec/config.yaml
    fi

    # Update local_api_credentials.yaml to use correct port
    if [ -f "/etc/crowdsec/local_api_credentials.yaml" ]; then
        sed -i 's|url: http://127.0.0.1:8080|url: http://127.0.0.1:8085|g' /etc/crowdsec/local_api_credentials.yaml
        sed -i 's|url: http://localhost:8080|url: http://127.0.0.1:8085|g' /etc/crowdsec/local_api_credentials.yaml
    fi

    # Update hub index
    if [ ! -f "/etc/crowdsec/hub/.index.json" ]; then
        echo "Updating CrowdSec hub index..."
        timeout 60s cscli hub update 2>/dev/null || echo "⚠️ Hub update failed or timed out"
    fi

    # Register local machine (still runs as root here; ownership is corrected below)
    echo "Registering local machine..."
    cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration failed"

    # *** CRITICAL FIX: Fix ownership of ALL CrowdSec files after cscli operations ***
    # cscli runs as root and creates root-owned files (crowdsec.db, config files)
    # CrowdSec process runs as charon and needs write access
    echo "Fixing CrowdSec file ownership..."
    chown -R charon:charon /var/lib/crowdsec 2>/dev/null || true
    chown -R charon:charon /app/data/crowdsec 2>/dev/null || true
    chown -R charon:charon /var/log/crowdsec 2>/dev/null || true

    echo "CrowdSec configuration initialized. Agent lifecycle is GUI-controlled."
fi

Testing After Fix

  1. Rebuild container:

    docker build -t charon:local . && docker compose -f docker-compose.test.yml up -d
    
  2. Verify ownership is correct:

    docker compose -f docker-compose.test.yml exec charon ls -la /var/lib/crowdsec/data/
    # Expected: all files owned by charon:charon
    
  3. Check CrowdSec logs for permission errors:

    docker compose -f docker-compose.test.yml logs charon 2>&1 | grep -i "permission\|denied\|FATAL"
    # Expected: no permission errors
    
  4. Verify LAPI is listening after manual start:

    curl -X POST http://localhost:8080/api/v1/admin/crowdsec/start
    docker compose -f docker-compose.test.yml exec charon ss -tuln | grep 8085
    # Expected: LISTEN on :8085
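
    Since LAPI may take a few seconds to bind after the start request, the check in step 4 can be polled instead of run once. A minimal sketch; `wait_until` is a hypothetical helper that retries any command once per second until it succeeds or the budget runs out:

```shell
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of seconds has elapsed. Returns 0 on success, 1 on timeout.
wait_until() {
    remaining=$1; shift
    while ! "$@"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

    Inside the container this could be used as `wait_until 30 sh -c 'ss -tln | grep -q ":8085 "'`, where the 30-second budget is an assumption and not a value taken from the backend.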
    

Success Criteria (Updated)

  • All files in /var/lib/crowdsec/ owned by charon:charon
  • All files in /app/data/crowdsec/ owned by charon:charon
  • config.yaml has log_dir: /var/log/crowdsec/
  • No "permission denied" errors in container logs
  • CrowdSec LAPI binds to port 8085 successfully
  • Manual start via GUI completes without timeout
  • Reconciliation on startup works when mode=local

Changelog

2025-12-23 - Investigation Update

  • Status: FAILED - Previous implementation did not fix root cause
  • Finding: Permission errors due to entrypoint running cscli as root
  • Finding: log_dir config points to wrong path (/var/log/ vs /var/log/crowdsec/)
  • Action: Updated plan with specific entrypoint script fixes
  • Priority: Escalated to CRITICAL

2025-12-22 - Initial Plan

  • Created initial plan based on code review
  • Identified timing issue with goroutine call
  • Proposed moving reconciliation to main.go (implemented)