CrowdSec Startup Fix Plan
Date: 2025-12-22
Updated: 2025-12-23 (Post-Implementation Investigation)
Status: FAILED - Requires Additional Fixes
Priority: CRITICAL
Current State (2025-12-23 Investigation)
CrowdSec is NOT starting due to PERMISSION ERRORS. The initial fix was implemented but did NOT address the actual root causes.
Actual Error Messages from Container Logs
Failed to write to log, can't open new logfile: open /var/log/crowdsec.log: permission denied
FATAL unable to create database client: unable to set perms on /var/lib/crowdsec/data/crowdsec.db: chmod /var/lib/crowdsec/data/crowdsec.db: operation not permitted
{"level":"warning","msg":"CrowdSec started but LAPI not ready within timeout","pid":316,"time":"2025-12-22T21:04:00-05:00"}
File Ownership Issues (VERIFIED)
# Database file owned by root - CrowdSec can't chmod it
$ stat -c '%U:%G %n' /var/lib/crowdsec/data/crowdsec.db
root:root /var/lib/crowdsec/data/crowdsec.db
# Config files owned by root - created by entrypoint running as root
$ stat -c '%U:%G %n' /app/data/crowdsec/config/config.yaml /app/data/crowdsec/config/user.yaml
root:root /app/data/crowdsec/config/config.yaml
root:root /app/data/crowdsec/config/user.yaml
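The same ownership check can be run over a whole directory tree rather than file by file. The following is a minimal sketch, assuming a POSIX-style find; the helper name list_not_owned_by is hypothetical and is not part of the existing entrypoint.

```shell
#!/bin/sh
# list_not_owned_by DIR OWNER
# Prints every path under DIR whose owner is not OWNER.
# OWNER may be a user name or a numeric UID.
# Empty output means the whole tree is owned by OWNER.
list_not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Hypothetical usage inside the container:
#   list_not_owned_by /var/lib/crowdsec charon
#   list_not_owned_by /app/data/crowdsec charon
```

Any path printed by the helper is a candidate for the same chmod or open failure seen in the logs.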
CrowdSec Config Problem (CRITICAL)
The config.yaml has log_dir: /var/log/ (wrong path):
common:
  log_dir: /var/log/  # <-- WRONG: Should be /var/log/crowdsec/
  log_media: file
With this setting CrowdSec tries to write /var/log/crowdsec.log, but /var/log/ is owned by root, so opening the log file fails with the permission error shown above. The path should be /var/log/crowdsec/, which is owned by charon.
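To confirm which log directory a given config file actually selects, the value can be extracted with a small helper. This is only a sketch: get_log_dir is a hypothetical name, and it uses plain text matching on the first log_dir line rather than real YAML parsing.

```shell
#!/bin/sh
# get_log_dir FILE
# Prints the value of the first "log_dir:" key found in FILE,
# with any trailing whitespace or "# comment" removed.
get_log_dir() {
    sed -n 's/^[[:space:]]*log_dir:[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}

# Hypothetical usage inside the container, run as the charon user:
#   dir=$(get_log_dir /etc/crowdsec/config.yaml)
#   [ -w "$dir" ] || echo "log_dir $dir is not writable by $(id -u)"
```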
Root Cause Analysis (UPDATED)
1. Entrypoint Script Runs CrowdSec Commands as Root
Finding: The entrypoint script runs cscli machines add -a --force and envsubst on config files while still running as root. These operations:
- Create /var/lib/crowdsec/data/crowdsec.db owned by root
- Overwrite config.yaml and user.yaml with root ownership
Evidence from entrypoint:
# These run as root BEFORE `su-exec charon` is used
cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"
envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
2. CrowdSec Log Path Configuration Error
Finding: The distributed config.yaml has log_dir: /var/log/ instead of log_dir: /var/log/crowdsec/.
Evidence:
# Current (WRONG):
log_dir: /var/log/
# Should be:
log_dir: /var/log/crowdsec/
3. ReconcileCrowdSecOnStartup IS Being Called (VERIFIED)
Finding: The reconciliation function is now correctly called in backend/cmd/api/main.go#L144 BEFORE the HTTP server starts:
crowdsecExec := handlers.NewDefaultCrowdsecExecutor()
services.ReconcileCrowdSecOnStartup(db, crowdsecExec, crowdsecBinPath, crowdsecDataDir)
This is CORRECT but CrowdSec still fails due to permission issues.
4. CrowdSec Start Method is Correct (VERIFIED)
Finding: The executor's Start method correctly uses os/exec without context cancellation:
cmd := exec.Command(binPath, "-c", configFile)
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
The binary starts but immediately crashes due to permission denied errors.
What Was Implemented vs What Actually Happened
| Item | Implemented | Actual Result |
|---|---|---|
| Reconciliation in main.go | ✅ Added | ✅ Called on startup |
| Dockerfile chown for CrowdSec dirs | ✅ Added | ❌ Overwritten at runtime by entrypoint |
| Goroutine removed from routes.go | ✅ Removed | ✅ Confirmed removed |
| Entrypoint permission fix | ❌ Not implemented | ❌ Root operations create root-owned files |
| Config log_dir fix | ❌ Not implemented | ❌ Still pointing to /var/log/ |
REQUIRED FIXES (Specific Code Changes)
FIX 1: Change CrowdSec log_dir in Entrypoint (CRITICAL)
File: .docker/docker-entrypoint.sh
Location: After line 155 (after sed -i 's|listen_uri.*|listen_uri: 127.0.0.1:8085|g')
Add:
# Fix log_dir path - must point to /var/log/crowdsec/ not /var/log/
if [ -f "/etc/crowdsec/config.yaml" ]; then
    # [[:space:]]* also matches zero characters, so a single expression covers
    # both the bare path and the path followed by trailing whitespace
    sed -i 's|log_dir: /var/log/[[:space:]]*$|log_dir: /var/log/crowdsec/|g' /etc/crowdsec/config.yaml
fi
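The rewrite can be tried on a scratch copy of the config before it goes into the entrypoint. The sketch below wraps it in a hypothetical helper, fix_log_dir, which is not part of the existing script. As assumptions beyond the plan, it also tolerates a trailing comment after the path, and it writes the result back with cat so the file keeps its current owner.

```shell
#!/bin/sh
# fix_log_dir FILE
# Rewrites "log_dir: /var/log/" to "log_dir: /var/log/crowdsec/".
# The path may be followed by whitespace or by a "# comment".
# Lines that already point at /var/log/crowdsec/ are left alone,
# so running the helper twice is harmless.
fix_log_dir() (
    file=$1
    tmp=$(mktemp) || exit 1
    if sed \
        -e 's|^\([[:space:]]*log_dir:[[:space:]]*\)/var/log/\([[:space:]]*\)$|\1/var/log/crowdsec/\2|' \
        -e 's|^\([[:space:]]*log_dir:[[:space:]]*\)/var/log/\([[:space:]]*#.*\)$|\1/var/log/crowdsec/\2|' \
        "$file" > "$tmp"
    then
        # cat into the existing file instead of mv, so the inode (and its owner) is kept
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    exit "$status"
)

# Hypothetical usage:
#   fix_log_dir /etc/crowdsec/config.yaml
```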
FIX 2: Run cscli Commands as charon User (CRITICAL)
File: .docker/docker-entrypoint.sh
Change: All cscli commands must run as charon user, not root.
Current (WRONG):
cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"
Required (CORRECT):
su-exec charon cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration may have failed"
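If several commands need the same treatment, the privilege drop can be centralized in one helper. The following is a sketch under stated assumptions: run_as is a hypothetical function, not part of the current entrypoint. It calls su-exec only when the script is running as root, and it refuses to run the command as root when it cannot switch users.

```shell
#!/bin/sh
# run_as USER CMD [ARGS...]
# - When running as root: executes CMD as USER through su-exec, or fails
#   if su-exec or USER is missing (never silently runs CMD as root).
# - When already unprivileged: executes CMD directly as the current user.
run_as() (
    user=$1
    shift
    if [ "$(id -u)" -ne 0 ]; then
        "$@"
        exit $?
    fi
    if command -v su-exec >/dev/null 2>&1 && id "$user" >/dev/null 2>&1; then
        su-exec "$user" "$@"
        exit $?
    fi
    echo "run_as: cannot switch to user '$user'; refusing to run as root: $*" >&2
    exit 1
)

# Hypothetical usage in the entrypoint:
#   run_as charon cscli machines add -a --force || echo "Warning: Machine registration may have failed"
```

Leaving stderr visible here, instead of redirecting it to /dev/null, keeps the underlying cscli error in the container logs when registration fails.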
FIX 3: Run envsubst as charon User (CRITICAL)
File: .docker/docker-entrypoint.sh
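Change: envsubst runs as root, and mv replaces each config file with the root-owned temporary copy, so ownership must be restored to charon after the substitution.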
Current (WRONG):
for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
    if [ -f "$file" ]; then
        envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
done
Required (CORRECT):
for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
    if [ -f "$file" ]; then
        envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        chown charon:charon "$file" 2>/dev/null || true
    fi
done
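An alternative to chown after the fact is to avoid replacing the file at all. mv swaps in a new inode owned by whoever created the temporary file, whereas writing back into the existing file keeps its owner, group, and mode. A minimal sketch of that idea follows; filter_in_place is a hypothetical helper, and the trade-off is that the write is no longer atomic.

```shell
#!/bin/sh
# filter_in_place FILE CMD [ARGS...]
# Pipes FILE through CMD and writes the result back into the same inode,
# so the file keeps its existing owner, group, and mode.
# If CMD fails, FILE is left untouched and a non-zero status is returned.
filter_in_place() (
    file=$1
    shift
    tmp=$(mktemp) || exit 1
    if "$@" < "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    exit "$status"
)

# Hypothetical usage in the entrypoint:
#   filter_in_place /etc/crowdsec/config.yaml envsubst
```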
FIX 4: Fix Ownership AFTER cscli Operations (CRITICAL)
File: .docker/docker-entrypoint.sh
Location: After all cscli operations, before "CrowdSec configuration initialized" message
Add:
# Fix ownership of files created by cscli (runs as root, creates root-owned files)
# The database and config files must be owned by charon for CrowdSec to start
chown -R charon:charon /var/lib/crowdsec 2>/dev/null || true
chown -R charon:charon /app/data/crowdsec 2>/dev/null || true
chown -R charon:charon /var/log/crowdsec 2>/dev/null || true
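Because 2>/dev/null || true hides every chown failure, a variant that reports real errors may make the next investigation easier. This is one possible sketch, not the plan's required change; chown_tree is a hypothetical helper that skips paths that do not exist and reports anything else that fails.

```shell
#!/bin/sh
# chown_tree OWNER PATH...
# Recursively chowns each existing PATH to OWNER.
# Missing paths are skipped with a note on stderr; a chown that fails is
# reported on stderr and makes the function return non-zero.
chown_tree() (
    owner=$1
    shift
    status=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "chown_tree: skipping missing path $path" >&2
            continue
        fi
        if ! chown -R "$owner" "$path"; then
            echo "chown_tree: failed to chown $path to $owner" >&2
            status=1
        fi
    done
    exit "$status"
)

# Hypothetical usage in the entrypoint:
#   chown_tree charon:charon /var/lib/crowdsec /app/data/crowdsec /var/log/crowdsec || true
```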
FIX 5: Update Default config.yaml in configs/crowdsec/ (PREVENTIVE)
File: configs/crowdsec/config.yaml (if exists) or modify the distributed template
Change: Ensure log_dir is correct in the source template:
common:
  daemonize: true
  log_media: file
  log_level: info
  log_dir: /var/log/crowdsec/  # <-- CORRECT PATH
Complete Entrypoint Script Fix
Here's the corrected CrowdSec section for .docker/docker-entrypoint.sh:
# ============================================================================
# CrowdSec Initialization
# ============================================================================
if command -v cscli >/dev/null; then
    echo "Initializing CrowdSec configuration..."
    # Define persistent paths
    CS_PERSIST_DIR="/app/data/crowdsec"
    CS_CONFIG_DIR="$CS_PERSIST_DIR/config"
    CS_DATA_DIR="$CS_PERSIST_DIR/data"
    CS_LOG_DIR="/var/log/crowdsec"
    # Ensure persistent directories exist
    mkdir -p "$CS_CONFIG_DIR" "$CS_DATA_DIR" "$CS_LOG_DIR" 2>/dev/null || true
    mkdir -p /var/lib/crowdsec/data 2>/dev/null || true
    # Initialize persistent config if key files are missing
    if [ ! -f "$CS_CONFIG_DIR/config.yaml" ]; then
        echo "Initializing persistent CrowdSec configuration..."
        if [ -d "/etc/crowdsec.dist" ] && [ -n "$(ls -A /etc/crowdsec.dist 2>/dev/null)" ]; then
            cp -r /etc/crowdsec.dist/* "$CS_CONFIG_DIR/" || exit 1
            echo "Successfully initialized config from .dist directory"
        fi
    fi
    # Create acquisition config
    if [ ! -f "/etc/crowdsec/acquis.yaml" ] || [ ! -s "/etc/crowdsec/acquis.yaml" ]; then
        cat > /etc/crowdsec/acquis.yaml << 'ACQUIS_EOF'
source: file
filenames:
  - /var/log/caddy/access.log
  - /var/log/caddy/*.log
labels:
  type: caddy
ACQUIS_EOF
    fi
    # Environment substitution (preserving ownership after)
    export CFG=/etc/crowdsec
    export DATA="$CS_DATA_DIR"
    export PID=/var/run/crowdsec.pid
    export LOG="$CS_LOG_DIR/crowdsec.log"
    for file in /etc/crowdsec/config.yaml /etc/crowdsec/user.yaml; do
        if [ -f "$file" ]; then
            envsubst < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
            chown charon:charon "$file" 2>/dev/null || true
        fi
    done
    # Configure LAPI port (8085 instead of 8080)
    if [ -f "/etc/crowdsec/config.yaml" ]; then
        sed -i 's|listen_uri: 127.0.0.1:8080|listen_uri: 127.0.0.1:8085|g' /etc/crowdsec/config.yaml
        sed -i 's|listen_uri: 0.0.0.0:8080|listen_uri: 127.0.0.1:8085|g' /etc/crowdsec/config.yaml
        # FIX: Correct log_dir path
        sed -i 's|log_dir: /var/log/$|log_dir: /var/log/crowdsec/|g' /etc/crowdsec/config.yaml
    fi
    # Update local_api_credentials.yaml to use correct port
    if [ -f "/etc/crowdsec/local_api_credentials.yaml" ]; then
        sed -i 's|url: http://127.0.0.1:8080|url: http://127.0.0.1:8085|g' /etc/crowdsec/local_api_credentials.yaml
        sed -i 's|url: http://localhost:8080|url: http://127.0.0.1:8085|g' /etc/crowdsec/local_api_credentials.yaml
    fi
    # Update hub index
    if [ ! -f "/etc/crowdsec/hub/.index.json" ]; then
        echo "Updating CrowdSec hub index..."
        timeout 60s cscli hub update 2>/dev/null || echo "⚠️ Hub update timed out"
    fi
    # Register local machine (run as charon or fix ownership after)
    echo "Registering local machine..."
    cscli machines add -a --force 2>/dev/null || echo "Warning: Machine registration failed"
    # *** CRITICAL FIX: Fix ownership of ALL CrowdSec files after cscli operations ***
    # cscli runs as root and creates root-owned files (crowdsec.db, config files)
    # CrowdSec process runs as charon and needs write access
    echo "Fixing CrowdSec file ownership..."
    chown -R charon:charon /var/lib/crowdsec 2>/dev/null || true
    chown -R charon:charon /app/data/crowdsec 2>/dev/null || true
    chown -R charon:charon /var/log/crowdsec 2>/dev/null || true
    echo "CrowdSec configuration initialized. Agent lifecycle is GUI-controlled."
fi
Testing After Fix
- Rebuild container:
    docker build -t charon:local . && docker compose -f docker-compose.test.yml up -d
- Verify ownership is correct:
    docker compose -f docker-compose.test.yml exec charon ls -la /var/lib/crowdsec/data/
    # Expected: all files owned by charon:charon
- Check CrowdSec logs for permission errors:
    docker compose -f docker-compose.test.yml logs charon 2>&1 | grep -i "permission\|denied\|FATAL"
    # Expected: no permission errors
- Verify LAPI is listening after manual start:
    curl -X POST http://localhost:8080/api/v1/admin/crowdsec/start
    docker compose -f docker-compose.test.yml exec charon ss -tuln | grep 8085
    # Expected: LISTEN on :8085
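The LAPI may need a few seconds to bind after the start request, so a single ss check can produce a false negative. A small polling helper avoids that. This is a sketch only: wait_until is a hypothetical name, and the 30-second budget in the usage comment is an assumption, not a value taken from the application.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS CMD [ARGS...]
# Runs CMD once per second until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 as soon as CMD succeeds, 1 on timeout. CMD's output is discarded.
wait_until() (
    remaining=$1
    shift
    while ! "$@" >/dev/null 2>&1; do
        if [ "$remaining" -le 0 ]; then
            exit 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    exit 0
)

# Hypothetical usage after the start request:
#   wait_until 30 sh -c 'docker compose -f docker-compose.test.yml exec charon ss -tuln | grep -q 8085' \
#       || echo "LAPI did not start listening on 8085 in time"
```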
Success Criteria (Updated)
- All files in /var/lib/crowdsec/ owned by charon:charon
- All files in /app/data/crowdsec/ owned by charon:charon
- config.yaml has log_dir: /var/log/crowdsec/
- No "permission denied" errors in container logs
- CrowdSec LAPI binds to port 8085 successfully
- Manual start via GUI completes without timeout
- Reconciliation on startup works when mode=local
References
- CrowdSec Documentation
- CrowdSec LAPI Reference
- Caddy CrowdSec Bouncer Plugin
- Issue #16: ACL Implementation (related security feature)
Changelog
2025-12-23 - Investigation Update
- Status: FAILED - Previous implementation did not fix root cause
- Finding: Permission errors due to entrypoint running cscli as root
- Finding: log_dir config points to wrong path (/var/log/ vs /var/log/crowdsec/)
- Action: Updated plan with specific entrypoint script fixes
- Priority: Escalated to CRITICAL
2025-12-22 - Initial Plan
- Created initial plan based on code review
- Identified timing issue with goroutine call
- Proposed moving reconciliation to main.go (implemented)