Emergency Lockout Recovery Runbook
Version: 1.0
Last Updated: January 26, 2026
Status: Production Ready
Severity: 🔴 CRITICAL
Purpose
This runbook provides step-by-step procedures to regain access to Charon when security modules (ACL, WAF, CrowdSec, Rate Limiting) have blocked legitimate administrative access.
When to use this: You see "403 Forbidden", "Blocked by access control list", or cannot access the Charon web interface.
Symptoms: How to Recognize a Lockout
Symptom 1: ACL Lockout
HTTP 403 Forbidden
{"error": "Blocked by access control list"}
Cause: Your IP address is not in the ACL whitelist, or is in a blacklist.
Symptom 2: WAF Block
HTTP 403 Forbidden
{"error": "Request blocked by Web Application Firewall"}
Cause: Your request triggered a WAF rule (e.g., suspicious pattern in URL or headers).
Symptom 3: CrowdSec Ban
HTTP 403 Forbidden
{"error": "Your IP has been banned"}
Cause: CrowdSec flagged your IP as malicious (brute force, scanning, etc.).
Symptom 4: Rate Limiting
HTTP 429 Too Many Requests
{"error": "Rate limit exceeded"}
Cause: Too many requests from your IP in a short time period.
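The four symptoms above can be told apart from the HTTP status and the error message alone. The following is a minimal sketch, not part of Charon itself: `classify_lockout` is a hypothetical helper name, and it assumes the response bodies contain exactly the messages listed above.

```shell
# Hypothetical helper (not shipped with Charon): map an HTTP status and
# response body to the lockout type described in the symptoms above.
classify_lockout() {
  status="$1"
  body="$2"
  case "$status" in
    429) echo "rate-limit"; return 0 ;;
    403) ;;  # fall through to the message check below
    *) echo "unknown"; return 0 ;;
  esac
  case "$body" in
    *"Blocked by access control list"*) echo "acl" ;;
    *"Web Application Firewall"*) echo "waf" ;;
    *"Your IP has been banned"*) echo "crowdsec" ;;
    *) echo "unknown" ;;
  esac
}
```

One way to feed it is to capture the status with `curl -s -o body.json -w '%{http_code}' <url>` and pass the contents of `body.json` as the second argument.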
Test Environment Configuration
Rate Limiting in Test Environments
For test and development environments (CHARON_ENV=test|e2e|development), the emergency rate limiter is set to 50 attempts per minute to facilitate testing and debugging.
Production environments maintain strict rate limiting: 5 attempts per 5 minutes.
⚠️ Security Warning: Always set CHARON_ENV=production (or omit the variable) in production deployments to enforce proper rate limiting.
Testing Both Tiers
E2E tests validate both break glass tiers to ensure defense in depth:
Tier 1 (Main Endpoint):
curl -X POST http://localhost:8080/api/v1/emergency/security-reset \
-H "X-Emergency-Token: $TOKEN"
Tier 2 (Emergency Server):
curl -X POST http://localhost:2020/emergency/security-reset \
-H "X-Emergency-Token: $TOKEN" \
-u admin:password
Environment Variable Reference:
| Environment | Max Attempts | Window | Use Case |
|---|---|---|---|
| production (default) | 5 | 5 minutes | Production deployments |
| test | 50 | 1 minute | Unit/integration tests |
| e2e | 50 | 1 minute | E2E test suites |
| development | 50 | 1 minute | Local development |
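The mapping from CHARON_ENV to a limit can be sketched as a small lookup. This is an illustration of the table only: `emergency_rate_limit` is a hypothetical name, and treating any unrecognized value as production is an assumption based on production being the default.

```shell
# Hypothetical helper mirroring the table above.
# Prints "<max attempts> <window in seconds>" for a CHARON_ENV value.
emergency_rate_limit() {
  case "${1:-production}" in
    test|e2e|development) echo "50 60" ;;
    # Assumption: anything else, including an unset value, gets the strict limit.
    *) echo "5 300" ;;
  esac
}
```

For example, `emergency_rate_limit "$CHARON_ENV"` prints `5 300` when the variable is unset.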
Recovery Tiers
Charon provides a 3-Tier Break Glass Protocol. Start with Tier 1 and escalate if needed.
| Tier | Method | Use When | Prerequisites |
|---|---|---|---|
| Tier 1 | Emergency Token (Digital Key) | Application accessible | Emergency token, management network access |
| Tier 2 | Emergency Server (Sidecar Door) | Caddy/CrowdSec blocking | SSH access, emergency server enabled |
| Tier 3 | Direct System Access (Physical Key) | Complete failure | SSH/console access to host |
Tier 1: Digital Key (Emergency Token)
Use when: The Charon application is reachable, but security middleware is blocking you.
Prerequisites
- ✅ Emergency token value (64-char hex string from CHARON_EMERGENCY_TOKEN)
- ✅ HTTPS connection to Charon (HTTP also works for local development)
- ✅ Source IP in management network (default: RFC1918 private IPs)
Step-by-Step Procedure
Step 1: Retrieve Emergency Token
The emergency token is configured via the CHARON_EMERGENCY_TOKEN environment variable:
# If using docker-compose.yml
grep CHARON_EMERGENCY_TOKEN docker-compose.yml
# If using .env file
grep CHARON_EMERGENCY_TOKEN .env
# From running container
docker exec charon env | grep CHARON_EMERGENCY_TOKEN
# From secrets manager (example: AWS)
aws secretsmanager get-secret-value --secret-id charon/emergency-token
Security Note: Store this token in a password manager or secrets management system.
Step 2: Send Emergency Reset Request
# Basic usage
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
-H "X-Emergency-Token: your-64-char-hex-token-here" \
-H "Content-Type: application/json"
Expected Response (Success):
{
"success": true,
"message": "All security modules have been disabled",
"disabled_modules": [
"feature.cerberus.enabled",
"security.acl.enabled",
"security.waf.enabled",
"security.rate_limit.enabled",
"security.crowdsec.enabled"
],
"timestamp": "2026-01-26T10:30:45Z"
}
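When the reset is scripted, it helps to check the `success` field rather than rely on reading the output by eye. A minimal sketch follows, assuming the response has the shape shown above; `reset_succeeded` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the response body reports "success": true.
reset_succeeded() {
  # Remove whitespace so both pretty-printed and compact JSON match.
  printf '%s' "$1" | tr -d ' \n\t' | grep -q '"success":true'
}
```

A usage sketch: store the curl output in a variable and run `reset_succeeded "$response" || echo "reset failed"`. If `jq` is installed, `jq -e '.success == true'` is a more robust check than text matching.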
Step 3: Wait for Settings Propagation
Security settings update immediately, but allow 5 seconds for full propagation:
sleep 5
Step 4: Verify Access Restored
# Test health endpoint
curl https://charon.example.com/api/v1/health
# Expected response
{"status": "ok", "version": "1.0.0"}
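As an alternative to a fixed wait, the health check can be retried until it succeeds. This is a sketch only: `wait_for_health` is a hypothetical helper, and it assumes the health response contains `"status": "ok"` as shown above.

```shell
# Hypothetical helper: poll a health URL until it reports "status": "ok".
# Arguments: URL, max attempts (default 10), delay in seconds (default 1).
wait_for_health() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s hides progress output; -f makes curl fail on HTTP errors such as 403.
    if curl -sf "$url" | tr -d ' ' | grep -q '"status":"ok"'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_health https://charon.example.com/api/v1/health 10 1` returns non-zero if the endpoint is still blocked after ten tries.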
Step 5: Access Web Interface
Open your browser and navigate to:
https://charon.example.com:8080
You should now have full access to the Charon management interface.
Troubleshooting Tier 1
Error: 403 Forbidden (before reset)
Symptom: Emergency reset endpoint returns 403 before you can submit the token.
Cause: Tier 1 is blocked at the Caddy/CrowdSec layer (Layer 7 reverse proxy).
Solution: Proceed to Tier 2: Emergency Server.
Error: 401 Unauthorized
Symptom: Emergency reset returns 401 with message "Invalid emergency token".
Cause: Token mismatch - the token you provided doesn't match CHARON_EMERGENCY_TOKEN.
Solution:
- Verify token value from configuration
- Check for extra whitespace or line breaks
- Ensure token is at least 32 characters long
- Regenerate token if necessary (see Token Rotation Guide)
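The whitespace and length checks above can be automated before retrying, which avoids burning rate-limited attempts on a malformed token. A minimal sketch: `normalize_token` is a hypothetical helper, and the hex-only check is an assumption taken from the 64-char hex format listed in the prerequisites.

```shell
# Hypothetical helper: strip whitespace from a token and check its shape.
# Prints the cleaned token on success; returns non-zero otherwise.
normalize_token() {
  token=$(printf '%s' "$1" | tr -d ' \t\r\n')
  # Reject anything shorter than 32 characters.
  [ "${#token}" -ge 32 ] || return 1
  # Assumption: tokens are hex, so reject any character outside 0-9, a-f, A-F.
  case "$token" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  printf '%s\n' "$token"
}
```

A usage sketch: `TOKEN=$(normalize_token "$CHARON_EMERGENCY_TOKEN") || echo "token looks malformed"`.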
Error: 429 Too Many Requests
Symptom: Emergency reset returns 429 with message "Rate limit exceeded".
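Cause: Too many failed emergency token attempts from your IP. Production allows 5 attempts per 5 minutes; test, e2e, and development environments allow 50 per minute.
Solution:
- Wait for the rate-limit window to reset (up to 5 minutes in production, 60 seconds in test environments)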
- Verify token value before retrying
- Use Tier 2 if you cannot wait
Error: 501 Not Implemented
Symptom: Emergency reset returns 501 with message "Emergency token not configured".
Cause: CHARON_EMERGENCY_TOKEN environment variable is not set.
Solution:
- Use Tier 2: Emergency Server
- Or use Tier 3: Direct System Access to set the token
Error: Source IP Not in Management Network
Symptom: 403 with message "Emergency access denied: IP not in management network".
Cause: Your IP is not in the allowed management CIDRs (default: RFC1918 private IPs).
Solution:
- Connect via VPN to access management network
- Use SSH tunnel from allowed IP (see Tier 2)
- Update CHARON_MANAGEMENT_CIDRS to include your IP (requires Tier 3 access)
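To check quickly whether a source address falls inside the default management network, the three RFC 1918 ranges can be tested by prefix. This is a sketch under stated assumptions: `is_rfc1918` is a hypothetical helper, it handles IPv4 only, and it does not validate that the input is a well-formed address.

```shell
# Hypothetical helper: succeed if an IPv4 address is in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_rfc1918() {
  case "$1" in
    10.*) return 0 ;;
    192.168.*) return 0 ;;
    172.*)
      # For 172.16.0.0/12 the second octet must be between 16 and 31.
      second=${1#172.}
      second=${second%%.*}
      case "$second" in
        ''|*[!0-9]*) return 1 ;;
      esac
      [ "$second" -ge 16 ] && [ "$second" -le 31 ]
      return $?
      ;;
  esac
  return 1
}
```

If `is_rfc1918 "$MY_IP"` fails and the default CIDRs are in use, the 403 above is expected and a VPN or SSH tunnel is needed.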
Tier 2: Sidecar Door (Emergency Server)
Use when: Tier 1 is blocked at the Caddy/CrowdSec layer, or you need a separate entry point.
Prerequisites
- ✅ VPN or SSH access to Docker host
- ✅ Emergency server enabled (CHARON_EMERGENCY_SERVER_ENABLED=true)
- ✅ Knowledge of emergency server port (default: 2019)
- ✅ Basic Auth credentials (if configured)
Architecture Diagram
[Public Traffic:443]            [SSH Tunnel:2019]
        ↓                              ↓
[Caddy Reverse Proxy]           [Emergency Server]
        ↓ (WAF, ACL, CrowdSec)         ↓ (Minimal Security)
[Main Application:8080]         [Emergency Handlers]
        ↓                              ↓
    [BLOCKED]                   [DIRECT ACCESS ✅]
Step-by-Step Procedure
Step 1: SSH to Docker Host
# SSH to server
ssh admin@docker-host.example.com
Step 2: Verify Emergency Server is Running
# Check container environment
docker exec charon env | grep EMERGENCY
# Expected output
CHARON_EMERGENCY_SERVER_ENABLED=true
CHARON_EMERGENCY_BIND=127.0.0.1:2019
CHARON_EMERGENCY_USERNAME=admin
CHARON_EMERGENCY_PASSWORD=<password>
Step 3: Create SSH Tunnel
From your local machine, create a tunnel to the emergency port:
# Open tunnel (port 2019 on localhost → port 2019 on server)
ssh -L 2019:localhost:2019 admin@docker-host.example.com
# Keep this terminal open - tunnel stays active
Step 4: Test Emergency Server Health
From your local machine (in a new terminal):
# Health check
curl http://localhost:2019/health
# Expected response
{"status":"ok","server":"emergency"}
Step 5: Send Emergency Reset Request
# With Basic Auth
curl -X POST http://localhost:2019/emergency/security-reset \
-H "X-Emergency-Token: your-64-char-hex-token-here" \
-u admin:your-emergency-password
# Without Basic Auth (if not configured)
curl -X POST http://localhost:2019/emergency/security-reset \
-H "X-Emergency-Token: your-64-char-hex-token-here"
Expected Response:
{
"success": true,
"message": "All security modules have been disabled",
"disabled_modules": [...]
}
Step 6: Verify Access Restored
# Test main application
curl https://charon.example.com/api/v1/health
Step 7: Close SSH Tunnel
# In the terminal with the open tunnel, press Ctrl+C
# Or, if you started the tunnel in the background and saved its PID
# in SSH_TUNNEL_PID, kill that process
kill "$SSH_TUNNEL_PID"
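The kill command in Step 7 only works if the tunnel's process ID was recorded when the tunnel was opened. One way to do that is sketched below; `open_tunnel` and `close_tunnel` are hypothetical helper names, and the sketch assumes key-based SSH login, since a backgrounded ssh cannot prompt for a password.

```shell
# Hypothetical helpers: run the tunnel in the background and remember its PID.
open_tunnel() {
  # -N tells ssh not to run a remote command, so it only forwards the port.
  ssh -N -L 2019:localhost:2019 "$1" &
  SSH_TUNNEL_PID=$!
}

close_tunnel() {
  kill "$SSH_TUNNEL_PID" 2>/dev/null || true
  # Reap the background process so it does not linger as a zombie.
  wait "$SSH_TUNNEL_PID" 2>/dev/null || true
  unset SSH_TUNNEL_PID
}
```

A usage sketch: `open_tunnel admin@docker-host.example.com`, run the reset from Step 5, then `close_tunnel`.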
Troubleshooting Tier 2
Error: Connection Refused (Port 2019)
Cause: Emergency server is not enabled or not running.
Verification:
# Check if emergency server is enabled
docker exec charon env | grep CHARON_EMERGENCY_SERVER_ENABLED
# Check if port is listening
docker exec charon netstat -tlnp | grep 2019
Solution:
- Enable emergency server in docker-compose.yml:
  environment:
    - CHARON_EMERGENCY_SERVER_ENABLED=true
    - CHARON_EMERGENCY_BIND=127.0.0.1:2019
- Recreate the container so the new environment takes effect (a plain restart does not re-read docker-compose.yml):
  docker-compose up -d charon
Error: 401 Unauthorized (Basic Auth)
Cause: Basic Auth credentials are incorrect.
Solution:
- Verify credentials from configuration:
docker exec charon env | grep CHARON_EMERGENCY_
- Reset password in docker-compose.yml if needed
Error: SSH Tunnel Fails
Cause: Firewall blocking SSH port 22, or SSH service not running.
Solution:
- Verify SSH service is running:
systemctl status sshd
- Check firewall rules allow SSH:
sudo ufw status | grep 22
- Use alternative port if 22 is blocked:
ssh -p 2222 -L 2019:localhost:2019 admin@server
Tier 3: Physical Key (Direct System Access)
Use when: All application-level recovery methods have failed, or you need to perform system-level repairs.
Prerequisites
- ✅ Root or sudo access to Docker host
- ✅ Knowledge of container name (default: charon or charon-e2e)
- ✅ Backup access credentials (in case database needs restoration)
Recovery Methods
Method 1: Clear CrowdSec Bans
If you're blocked by CrowdSec:
# SSH to host
ssh admin@docker-host.example.com
# List all bans
docker exec charon cscli decisions list
# Delete specific ban
docker exec charon cscli decisions delete --ip YOUR_IP
# Delete ALL bans (use with caution)
docker exec charon cscli decisions delete --all
# Verify decisions are cleared
docker exec charon cscli decisions list
# Should show: No decisions found
Method 2: Direct Database Access
Disable security modules directly in the database:
# Access SQLite database
docker exec -it charon sqlite3 /app/data/charon.db
# Disable all security modules
sqlite> UPDATE settings SET value = 'false' WHERE key = 'feature.cerberus.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.acl.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.waf.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.rate_limit.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.crowdsec.enabled';
# Update SecurityConfig table
sqlite> UPDATE security_configs SET enabled = 0;
# Verify changes
sqlite> SELECT key, value FROM settings WHERE key LIKE 'security.%';
# Exit SQLite
sqlite> .quit
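The five settings updates above follow one pattern, so they can be generated rather than typed at the prompt. A minimal sketch, assuming the same `settings` table and keys shown above; `disable_security_sql` is a hypothetical helper name.

```shell
# Hypothetical helper: print one UPDATE statement per settings key used above.
disable_security_sql() {
  for key in \
    feature.cerberus.enabled \
    security.acl.enabled \
    security.waf.enabled \
    security.rate_limit.enabled \
    security.crowdsec.enabled
  do
    printf "UPDATE settings SET value = 'false' WHERE key = '%s';\n" "$key"
  done
}
```

The output can be reviewed first and then piped into the container, for example `disable_security_sql | docker exec -i charon sqlite3 /app/data/charon.db`. The `security_configs` update is left out of this sketch and still has to be run separately.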
Method 3: Restart with Security Disabled
Temporarily disable all security features:
# Stop container
docker stop charon
# Note: `docker start` cannot change environment variables on an existing
# container, so set the override in docker-compose.yml instead
vim docker-compose.yml
# Add under environment: - CERBERUS_DISABLED=true
# Recreate the container with the new environment
docker-compose up -d charon
Method 4: Kill Caddy to Bypass Reverse Proxy
If CrowdSec is blocking at Caddy layer:
# Stop Caddy process (temporary)
docker exec charon pkill caddy
# Warning: This breaks TLS termination
# Only use for emergency access, then restart:
docker restart charon
Method 5: Docker Volume Inspection
Inspect and modify data without running the container:
# Find Charon data volume
docker volume ls | grep charon
# Mount volume to temporary container
docker run --rm -it -v charon_data:/data alpine sh
# Navigate to database
cd /data
# Use SQLite (if installed in Alpine)
apk add sqlite
sqlite3 charon.db
# Or copy database out for external editing
exit
docker cp charon:/app/data/charon.db ~/charon-backup.db
Catastrophic Recovery: Destroy and Recreate
⚠️ WARNING: Last resort only - you will lose all configuration data
Step 1: Backup Everything
# Backup database
docker exec charon tar czf /tmp/backup.tar.gz /app/data
docker cp charon:/tmp/backup.tar.gz ~/charon-backup-$(date +%Y%m%d-%H%M%S).tar.gz
# Record current configuration
docker inspect charon > ~/charon-inspect-$(date +%Y%m%d-%H%M%S).json
Step 2: Destroy Container and Volume
# Stop and remove container
docker stop charon
docker rm charon
# DANGER: Remove data volume (all configuration will be lost)
docker volume rm charon_data
Step 3: Recreate with Fresh Configuration
# Recreate container
docker-compose up -d charon
# Wait for initialization
sleep 10
# Access with default credentials (if auth is implemented)
curl http://localhost:8080/api/v1/health
Step 4: Restore from Backup (Optional)
# Stop container
docker stop charon
# Extract backup
tar xzf ~/charon-backup-YYYYMMDD-HHMMSS.tar.gz -C /tmp
# Copy database back
docker cp /tmp/app/data/charon.db charon:/app/data/charon.db
# Start container
docker start charon
Troubleshooting Tier 3
Error: Permission Denied (SQLite)
Cause: Database file is owned by the container user, not root.
Solution:
# Use docker exec instead of direct file access
docker exec charon sqlite3 /app/data/charon.db \
"UPDATE settings SET value='false' WHERE key='security.acl.enabled';"
Error: Container Won't Start After Database Changes
Cause: Database corruption or invalid schema.
Solution:
- Check container logs:
docker logs charon --tail 50
- Restore from automated backup:
# List backups
docker exec charon ls -la /app/data/backups/
# Restore latest backup
docker exec charon cp /app/data/backups/charon_backup_YYYYMMDD_030000.db /app/data/charon.db
# Restart container
docker restart charon
Error: Volume Not Found
Cause: Volume was deleted or never created.
Solution:
# Recreate volume
docker volume create charon_data
# Restart container with new volume
docker-compose up -d charon
Post-Recovery Tasks
After regaining access, perform these tasks to prevent future lockouts:
Task 1: Review Audit Logs
Analyze what caused the lockout:
# View recent security events
curl http://localhost:8080/api/v1/audit-logs | jq
# Filter for security events
docker exec charon grep -i "acl_deny\|waf_block\|crowdsec" /var/log/charon.log
Look for:
- Repeated blocks of your IP
- Triggered WAF rules
- CrowdSec ban reasons
Task 2: Adjust ACL Rules
If ACL caused the lockout:
- Navigate to Cerberus → Access Lists
- Review ACL rules that blocked you
- Add your IP to whitelist:
  - Create new ACL: "Admin Whitelist"
  - Type: IP Whitelist
  - IP Ranges: YOUR_IP/32
  - Assign to all critical hosts
- Save configuration
Task 3: Rotate Emergency Token (If Compromised)
If you suspect the emergency token was exposed:
- Generate new token:
openssl rand -hex 32
- Update configuration:
# Edit docker-compose.yml
vim docker-compose.yml
# Change CHARON_EMERGENCY_TOKEN value
# Restart container
docker-compose up -d charon
- See Emergency Token Rotation Guide for detailed steps
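Token generation can be wrapped so that it also works on hosts without OpenSSL. This is a sketch only: `generate_emergency_token` is a hypothetical helper, and the `/dev/urandom` fallback is an assumption that has not been checked against the Token Rotation Guide.

```shell
# Hypothetical helper: print a 64-character hex token (32 random bytes).
generate_emergency_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback: read 32 random bytes from the kernel and print them as hex.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}
```

The result can be pasted into docker-compose.yml as the new CHARON_EMERGENCY_TOKEN value and should also be stored in your secrets manager.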
Task 4: Document the Incident
Create incident report:
# Security Lockout Incident Report
**Date:** YYYY-MM-DD HH:MM
**Severity:** Critical / High / Medium / Low
**Duration:** X minutes/hours
## Incident Summary
Brief description of what happened
## Root Cause
Why the lockout occurred
## Recovery Method Used
Which tier was used to recover
## Lessons Learned
What we learned from this incident
## Action Items
- [ ] Adjust ACL rules
- [ ] Update documentation
- [ ] Train team on recovery procedures
- [ ] Implement additional monitoring
Task 5: Update Monitoring/Alerting
Set up alerts to prevent future lockouts:
- Navigate to Cerberus → Notification Settings
- Configure webhook or email notifications
- Enable alerts for:
- High rate of ACL denials
- Admin IP blocks
- Emergency token usage
- Test notification delivery
Task 6: Review Management Network Configuration
Ensure your management networks are properly configured:
# Check current CIDRs
docker exec charon env | grep CHARON_MANAGEMENT_CIDRS
# Update in docker-compose.yml
vim docker-compose.yml
Add your office/VPN subnets:
environment:
- CHARON_MANAGEMENT_CIDRS=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,YOUR_OFFICE_SUBNET
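When editing the list by hand it is easy to add the same subnet twice. The sketch below appends a CIDR only if it is missing; `add_cidr` is a hypothetical helper, and it assumes the variable is a plain comma-separated list with no spaces, as in the example above.

```shell
# Hypothetical helper: append a CIDR to a comma-separated list
# unless it is already present. Prints the resulting list.
add_cidr() {
  list="$1"
  cidr="$2"
  case ",$list," in
    *",$cidr,"*) printf '%s\n' "$list" ;;
    *) printf '%s\n' "${list:+$list,}$cidr" ;;
  esac
}
```

For example, `add_cidr "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16" "203.0.113.0/24"` prints the list with the new subnet appended (203.0.113.0/24 is a documentation range used here as a placeholder).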
Task 7: Test Recovery Procedures
Schedule quarterly drills to practice recovery:
# Test Tier 1
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
-H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN"
# Test Tier 2 (if enabled)
ssh -L 2019:localhost:2019 admin@server
curl http://localhost:2019/health
# Test Tier 3 (in staging environment)
docker exec charon cscli decisions list
Quick Reference Card
One-Page Emergency Cheat Sheet
# ---------- TIER 1: EMERGENCY TOKEN ----------
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
-H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN"
# ---------- TIER 2: EMERGENCY SERVER ----------
# 1. SSH tunnel
ssh -L 2019:localhost:2019 admin@server.example.com
# 2. Reset via emergency port
curl -X POST http://localhost:2019/emergency/security-reset \
-H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN" \
-u admin:password
# ---------- TIER 3: DIRECT ACCESS ----------
# SSH to host
ssh admin@docker-host.example.com
# Clear CrowdSec bans
docker exec charon cscli decisions delete --all
# Disable security via database
docker exec charon sqlite3 /app/data/charon.db \
"UPDATE settings SET value='false' WHERE key LIKE 'security.%.enabled' OR key = 'feature.cerberus.enabled';"
# Restart container
docker restart charon
# ---------- VERIFICATION ----------
# Test health endpoint
curl http://localhost:8080/api/v1/health
# Check logs
docker logs charon --tail 50
# Verify security is disabled
curl http://localhost:8080/api/v1/settings | grep security
Emergency Contacts
| Role | Contact | Purpose |
|---|---|---|
| Platform Team | platform@example.com | Infrastructure issues |
| Security Team | security@example.com | Security policy questions |
| On-Call Engineer | oncall@example.com | 24/7 emergency support |
Critical Environment Variables
# Emergency access
CHARON_EMERGENCY_TOKEN=<64-char-hex>
CHARON_MANAGEMENT_CIDRS=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
# Emergency server (Tier 2)
CHARON_EMERGENCY_SERVER_ENABLED=true
CHARON_EMERGENCY_BIND=127.0.0.1:2019
CHARON_EMERGENCY_USERNAME=admin
CHARON_EMERGENCY_PASSWORD=<password>
Appendix A: Recovery Decision Tree
START: Cannot access Charon web interface
  ↓
Can you reach https://charon.example.com?
  ├─ YES → Try Tier 1 (Emergency Token)
  │        ↓
  │        Success?
  │        ├─ YES → [END] Access restored
  │        └─ NO → Try Tier 2 (Emergency Server)
  │                ↓
  │                Success?
  │                ├─ YES → [END] Access restored
  │                └─ NO → Proceed to Tier 3
  │
  └─ NO → Network issue or container down
           ↓
           Check container status
           ├─ Container running → Proceed to Tier 3
           └─ Container down → Start container, then Tier 1
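The tree above can also be encoded as a small function, which is handy in drill scripts. A minimal sketch: `next_step` is a hypothetical helper, and its argument names and output labels are illustrative, not part of Charon.

```shell
# Hypothetical helper encoding the decision tree above.
# Arguments: reachable (yes/no), container running (yes/no),
#            Tier 1 result (yes/no/untried), Tier 2 result (yes/no/untried).
next_step() {
  reachable="$1"; running="$2"; tier1="$3"; tier2="$4"
  if [ "$reachable" = "no" ]; then
    # Network issue or container down.
    if [ "$running" = "yes" ]; then echo "tier3"; else echo "start-container"; fi
    return 0
  fi
  case "$tier1" in
    untried) echo "tier1" ;;
    yes) echo "done" ;;
    *)
      case "$tier2" in
        untried) echo "tier2" ;;
        yes) echo "done" ;;
        *) echo "tier3" ;;
      esac
      ;;
  esac
}
```

For example, `next_step yes yes no untried` prints `tier2`, matching the escalation from a failed Tier 1 attempt.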
Appendix B: Common Error Codes
| Code | Message | Cause | Solution |
|---|---|---|---|
| 403 | Blocked by access control list | ACL blocking IP | Use Tier 1 or adjust ACL |
| 403 | Request blocked by Web Application Firewall | WAF rule triggered | Use Tier 1 or disable WAF |
| 403 | Your IP has been banned | CrowdSec ban | Use Tier 3 to clear bans |
| 401 | Invalid emergency token | Token mismatch | Verify token value |
| 429 | Rate limit exceeded | Too many attempts | Wait for the rate-limit window to reset (5 minutes in production) |
| 501 | Emergency token not configured | Token not set | Use Tier 3 to set token |
| 500 | Internal server error | Application error | Check logs, use Tier 3 |
Appendix C: Testing Checklist
Use this checklist to validate recovery procedures:
Tier 1 Testing:
- Emergency token retrieved from secure storage
- Token works from allowed IP (RFC1918)
- Token blocked from public IP
- Rate limiting works (5 attempts per 5 minutes in production)
- Audit logs capture emergency access
- Settings disabled successfully
Tier 2 Testing:
- SSH tunnel established successfully
- Emergency server health endpoint responds
- Basic Auth works (if configured)
- Emergency reset works via tunnel
- Tunnel closes cleanly
Tier 3 Testing:
- CrowdSec decisions cleared
- Database modifications persist
- Container restarts successfully
- Backup and restore works
- Logs show expected behavior
Related Documentation:
Version History:
- v1.0 (2026-01-26): Initial release
- Author: Charon Project Team
- Maintained by: Security & Operations Team