chore: clean .gitignore cache
@@ -1,946 +0,0 @@
# Emergency Lockout Recovery Runbook

**Version:** 1.0
**Last Updated:** January 26, 2026
**Status:** Production Ready
**Severity:** 🔴 CRITICAL

---

## Purpose

This runbook provides step-by-step procedures to regain access to Charon when security modules
(ACL, WAF, CrowdSec, Rate Limiting) have blocked legitimate administrative access.

**When to use this:** You see "403 Forbidden", "Blocked by access control list", or cannot access
the Charon web interface.

---

## Symptoms: How to Recognize a Lockout

### Symptom 1: ACL Lockout

```text
HTTP 403 Forbidden
{"error": "Blocked by access control list"}
```

**Cause:** Your IP address is not in the ACL whitelist, or is in a blacklist.

### Symptom 2: WAF Block

```text
HTTP 403 Forbidden
{"error": "Request blocked by Web Application Firewall"}
```

**Cause:** Your request triggered a WAF rule (e.g., suspicious pattern in URL or headers).

### Symptom 3: CrowdSec Ban

```text
HTTP 403 Forbidden
{"error": "Your IP has been banned"}
```

**Cause:** CrowdSec flagged your IP as malicious (brute force, scanning, etc.).

### Symptom 4: Rate Limiting

```text
HTTP 429 Too Many Requests
{"error": "Rate limit exceeded"}
```

**Cause:** Too many requests from your IP in a short time period.

---

## Test Environment Configuration

### Rate Limiting in Test Environments

For test and development environments (`CHARON_ENV=test|e2e|development`), the emergency rate limiter is set to **50 attempts per minute** to facilitate testing and debugging.

**Production environments** maintain strict rate limiting: **5 attempts per 5 minutes**.

⚠️ **Security Warning:** Always set `CHARON_ENV=production` (or omit the variable) in production deployments to enforce proper rate limiting.

### Testing Both Tiers

E2E tests validate both break-glass tiers to ensure defense in depth:

**Tier 1 (Main Endpoint):**
```bash
curl -X POST http://localhost:8080/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: $TOKEN"
```

**Tier 2 (Emergency Server):**
```bash
curl -X POST http://localhost:2020/emergency/security-reset \
  -H "X-Emergency-Token: $TOKEN" \
  -u admin:password
```

**Environment Variable Reference:**

| Environment | Max Attempts | Window | Use Case |
|-------------|--------------|--------|----------|
| `production` (default) | 5 | 5 minutes | Production deployments |
| `test` | 50 | 1 minute | Unit/integration tests |
| `e2e` | 50 | 1 minute | E2E test suites |
| `development` | 50 | 1 minute | Local development |
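
The limits in the table can be expressed as a small lookup for test scripts that need to pace their retries. This is only a sketch: the function name `emergency_rate_limit` is hypothetical, and treating any unrecognized or empty value as `production` is an assumption based on the note that omitting `CHARON_ENV` enforces production limits.

```bash
# emergency_rate_limit [CHARON_ENV value]
# Prints "<max_attempts> <window_seconds>" using the values from the table.
# Assumption: unknown or empty values fall back to the production limits.
emergency_rate_limit() {
  case "${1:-production}" in
    test|e2e|development) echo "50 60" ;;
    *)                    echo "5 300" ;;
  esac
}
```

For example, `emergency_rate_limit "$CHARON_ENV"` prints `50 60` in an E2E run, which a script can split into an attempt count and a window in seconds.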

---

## Recovery Tiers

Charon provides a **3-Tier Break Glass Protocol**. Start with Tier 1 and escalate if needed.

| Tier | Method | Use When | Prerequisites |
| ---- | ------ | -------- | ------------- |
| **Tier 1** | Emergency Token (Digital Key) | Application accessible | Emergency token, management network access |
| **Tier 2** | Emergency Server (Sidecar Door) | Caddy/CrowdSec blocking | SSH access, emergency server enabled |
| **Tier 3** | Direct System Access (Physical Key) | Complete failure | SSH/console access to host |

---

## Tier 1: Digital Key (Emergency Token)

**Use when:** The Charon application is reachable, but security middleware is blocking you.

### Prerequisites

- ✅ Emergency token value (64-char hex string from `CHARON_EMERGENCY_TOKEN`)
- ✅ HTTPS connection to Charon (HTTP also works for local development)
- ✅ Source IP in management network (default: RFC1918 private IPs)

### Step-by-Step Procedure

#### Step 1: Retrieve Emergency Token

The emergency token is configured via the `CHARON_EMERGENCY_TOKEN` environment variable:

```bash
# If using docker-compose.yml
grep CHARON_EMERGENCY_TOKEN docker-compose.yml

# If using .env file
grep CHARON_EMERGENCY_TOKEN .env

# From running container
docker exec charon env | grep CHARON_EMERGENCY_TOKEN

# From secrets manager (example: AWS)
aws secretsmanager get-secret-value --secret-id charon/emergency-token
```

**Security Note:** Store this token in a password manager or secrets management system.

#### Step 2: Send Emergency Reset Request

```bash
# Basic usage
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: your-64-char-hex-token-here" \
  -H "Content-Type: application/json"
```

**Expected Response (Success):**

```json
{
  "success": true,
  "message": "All security modules have been disabled",
  "disabled_modules": [
    "feature.cerberus.enabled",
    "security.acl.enabled",
    "security.waf.enabled",
    "security.rate_limit.enabled",
    "security.crowdsec.enabled"
  ],
  "timestamp": "2026-01-26T10:30:45Z"
}
```

#### Step 3: Wait for Settings Propagation

Security settings update immediately, but allow 5 seconds for full propagation:

```bash
sleep 5
```
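
Instead of a fixed pause, a script can poll until the service answers. The helper below is a hypothetical sketch and not part of Charon; `retry_until` and its arguments are names chosen for illustration.

```bash
# retry_until <attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
# The body runs in a subshell, so its variables do not leak into the caller.
retry_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)
```

For example, `retry_until 5 1 curl -fsS https://charon.example.com/api/v1/health` tries the health endpoint up to five times, one second apart, and returns a non-zero status if every attempt fails.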

#### Step 4: Verify Access Restored

```bash
# Test health endpoint
curl https://charon.example.com/api/v1/health

# Expected response
# {"status": "ok", "version": "1.0.0"}
```

#### Step 5: Access Web Interface

Open your browser and navigate to:

```text
https://charon.example.com:8080
```

You should now have full access to the Charon management interface.

### Troubleshooting Tier 1

#### Error: 403 Forbidden (before reset)

**Symptom:** Emergency reset endpoint returns 403 before you can submit the token.

**Cause:** Tier 1 is blocked at the Caddy/CrowdSec layer (Layer 7 reverse proxy).

**Solution:** Proceed to [Tier 2: Emergency Server](#tier-2-sidecar-door-emergency-server).

#### Error: 401 Unauthorized

**Symptom:** Emergency reset returns 401 with message "Invalid emergency token".

**Cause:** Token mismatch: the token you provided doesn't match `CHARON_EMERGENCY_TOKEN`.

**Solution:**

1. Verify token value from configuration
2. Check for extra whitespace or line breaks
3. Ensure token is at least 32 characters long
4. Regenerate token if necessary (see [Token Rotation Guide](./emergency-token-rotation.md))
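
The whitespace and length checks in the list above can be done locally before spending a rate-limited attempt. This is a minimal sketch assuming the 64-character hex format named in the Tier 1 prerequisites; `is_valid_token` is a hypothetical helper, and the server's own validation may differ (for example, it may accept other lengths).

```bash
# is_valid_token <token>
# Succeeds only if the token is exactly 64 hexadecimal characters,
# which also rejects stray whitespace and line breaks.
is_valid_token() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hex character
  esac
  [ "${#1}" -eq 64 ]
}
```

A typical use is `is_valid_token "$CHARON_EMERGENCY_TOKEN" || echo "token looks malformed"` before sending the reset request.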

#### Error: 429 Too Many Requests

**Symptom:** Emergency reset returns 429 with message "Rate limit exceeded".

**Cause:** Too many failed emergency token attempts from your IP (production limit: 5 attempts per 5 minutes; test environments: 50 per minute).

**Solution:**

1. Wait for the rate-limit window to reset (up to 5 minutes in production, 1 minute in test environments)
2. Verify token value before retrying
3. Use Tier 2 if you cannot wait

#### Error: 501 Not Implemented

**Symptom:** Emergency reset returns 501 with message "Emergency token not configured".

**Cause:** `CHARON_EMERGENCY_TOKEN` environment variable is not set.

**Solution:**

1. Use [Tier 2: Emergency Server](#tier-2-sidecar-door-emergency-server)
2. Or use [Tier 3: Direct System Access](#tier-3-physical-key-direct-system-access) to set the token

#### Error: Source IP Not in Management Network

**Symptom:** 403 with message "Emergency access denied: IP not in management network".

**Cause:** Your IP is not in the allowed management CIDRs (default: RFC1918 private IPs).

**Solution:**

1. Connect via VPN to access management network
2. Use SSH tunnel from allowed IP (see Tier 2)
3. Update `CHARON_MANAGEMENT_CIDRS` to include your IP (requires Tier 3 access)
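
To check quickly whether a client address falls inside the default management ranges, the RFC1918 test can be sketched in plain shell. `is_rfc1918` is a hypothetical helper that covers only the three default CIDRs; it does not read `CHARON_MANAGEMENT_CIDRS`, and the address Charon actually sees may differ from your local one when a proxy or NAT sits in between.

```bash
# is_rfc1918 <ipv4-address>
# Succeeds if the address is in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
# The body runs in a subshell, so the IFS and globbing changes stay local.
is_rfc1918() (
  set -f          # disable globbing while splitting
  IFS=.
  set -- $1       # split the dotted quad into positional parameters
  [ "$#" -eq 4 ] || exit 1
  for octet in "$@"; do
    case "$octet" in
      ''|*[!0-9]*) exit 1 ;;   # reject empty or non-numeric octets
    esac
    [ "$octet" -le 255 ] || exit 1
  done
  if [ "$1" -eq 10 ]; then exit 0; fi
  if [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ]; then exit 0; fi
  if [ "$1" -eq 192 ] && [ "$2" -eq 168 ]; then exit 0; fi
  exit 1
)
```

If `is_rfc1918 "$MY_IP"` fails (with `MY_IP` standing in for your client address), expect the "IP not in management network" error unless the CIDR list has been extended.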

---

## Tier 2: Sidecar Door (Emergency Server)

**Use when:** Tier 1 is blocked at the Caddy/CrowdSec layer, or you need a separate entry point.

### Prerequisites

- ✅ VPN or SSH access to Docker host
- ✅ Emergency server enabled (`CHARON_EMERGENCY_SERVER_ENABLED=true`)
- ✅ Knowledge of emergency server port (default: 2019)
- ✅ Basic Auth credentials (if configured)

### Architecture Diagram

```text
[Public Traffic:443]           [SSH Tunnel:2019]
        ↓                              ↓
[Caddy Reverse Proxy]          [Emergency Server]
        ↓ (WAF, ACL, CrowdSec)         ↓ (Minimal Security)
[Main Application:8080]        [Emergency Handlers]
        ↓                              ↓
    [BLOCKED]                  [DIRECT ACCESS ✅]
```

### Step-by-Step Procedure

#### Step 1: SSH to Docker Host

```bash
# SSH to server
ssh admin@docker-host.example.com
```

#### Step 2: Verify Emergency Server is Running

```bash
# Check container environment
docker exec charon env | grep EMERGENCY

# Expected output
# CHARON_EMERGENCY_SERVER_ENABLED=true
# CHARON_EMERGENCY_BIND=127.0.0.1:2019
# CHARON_EMERGENCY_USERNAME=admin
# CHARON_EMERGENCY_PASSWORD=<password>
```

#### Step 3: Create SSH Tunnel

**From your local machine**, create a tunnel to the emergency port:

```bash
# Open tunnel (port 2019 on localhost → port 2019 on server)
ssh -L 2019:localhost:2019 admin@docker-host.example.com

# Keep this terminal open - tunnel stays active
```

#### Step 4: Test Emergency Server Health

**From your local machine** (in a new terminal):

```bash
# Health check
curl http://localhost:2019/health

# Expected response
# {"status":"ok","server":"emergency"}
```

#### Step 5: Send Emergency Reset Request

```bash
# With Basic Auth
curl -X POST http://localhost:2019/emergency/security-reset \
  -H "X-Emergency-Token: your-64-char-hex-token-here" \
  -u admin:your-emergency-password

# Without Basic Auth (if not configured)
curl -X POST http://localhost:2019/emergency/security-reset \
  -H "X-Emergency-Token: your-64-char-hex-token-here"
```

**Expected Response:**

```json
{
  "success": true,
  "message": "All security modules have been disabled",
  "disabled_modules": [...]
}
```

#### Step 6: Verify Access Restored

```bash
# Test main application
curl https://charon.example.com/api/v1/health
```

#### Step 7: Close SSH Tunnel

```bash
# In the terminal with the open tunnel, end the SSH session with `exit` or Ctrl+D
# (Ctrl+C closes it only if the tunnel was started without a remote shell, e.g. ssh -N)

# Or, if the tunnel runs in the background, kill it by PID
# (SSH_TUNNEL_PID is a placeholder for the PID you recorded when starting it)
kill "$SSH_TUNNEL_PID"
```

### Troubleshooting Tier 2

#### Error: Connection Refused (Port 2019)

**Cause:** Emergency server is not enabled or not running.

**Verification:**

```bash
# Check if emergency server is enabled
docker exec charon env | grep CHARON_EMERGENCY_SERVER_ENABLED

# Check if port is listening
docker exec charon netstat -tlnp | grep 2019
```

**Solution:**

1. Enable emergency server in `docker-compose.yml`:

   ```yaml
   environment:
     - CHARON_EMERGENCY_SERVER_ENABLED=true
     - CHARON_EMERGENCY_BIND=127.0.0.1:2019
   ```

1. Recreate the container so the new environment is applied (a plain `restart` does not pick up changes to `docker-compose.yml`):

   ```bash
   docker-compose up -d charon
   ```

#### Error: 401 Unauthorized (Basic Auth)

**Cause:** Basic Auth credentials are incorrect.

**Solution:**

1. Verify credentials from configuration:

   ```bash
   docker exec charon env | grep CHARON_EMERGENCY_
   ```

1. Reset password in `docker-compose.yml` if needed

#### Error: SSH Tunnel Fails

**Cause:** Firewall blocking SSH port 22, or SSH service not running.

**Solution:**

1. Verify SSH service is running:

   ```bash
   systemctl status sshd
   ```

1. Check firewall rules allow SSH:

   ```bash
   sudo ufw status | grep 22
   ```

1. Use alternative port if 22 is blocked:

   ```bash
   ssh -p 2222 -L 2019:localhost:2019 admin@server
   ```

---

## Tier 3: Physical Key (Direct System Access)

**Use when:** All application-level recovery methods have failed, or you need to perform system-level repairs.

### Prerequisites

- ✅ Root or sudo access to Docker host
- ✅ Knowledge of container name (default: `charon` or `charon-e2e`)
- ✅ Backup access credentials (in case database needs restoration)

### Recovery Methods

#### Method 1: Clear CrowdSec Bans

If you're blocked by CrowdSec:

```bash
# SSH to host
ssh admin@docker-host.example.com

# List all bans
docker exec charon cscli decisions list

# Delete specific ban
docker exec charon cscli decisions delete --ip YOUR_IP

# Delete ALL bans (use with caution)
docker exec charon cscli decisions delete --all

# Verify decisions are cleared
docker exec charon cscli decisions list
# Should show: No decisions found
```

#### Method 2: Direct Database Access

Disable security modules directly in the database:

```bash
# Access SQLite database
docker exec -it charon sqlite3 /app/data/charon.db

# Disable all security modules
sqlite> UPDATE settings SET value = 'false' WHERE key = 'feature.cerberus.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.acl.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.waf.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.rate_limit.enabled';
sqlite> UPDATE settings SET value = 'false' WHERE key = 'security.crowdsec.enabled';

# Update SecurityConfig table
sqlite> UPDATE security_configs SET enabled = 0;

# Verify changes (include the Cerberus feature flag, which does not match 'security.%')
sqlite> SELECT key, value FROM settings WHERE key LIKE 'security.%' OR key = 'feature.cerberus.enabled';

# Exit SQLite
sqlite> .quit
```
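
When no interactive terminal is available, the same statements can be generated and piped into `sqlite3` in one transaction, so either all flags change or none do. This is a sketch under assumptions: `disable_security_sql` is a hypothetical helper, and the table, column, and key names are copied from the session above rather than verified against the schema.

```bash
# disable_security_sql
# Prints a single transaction that sets the five flags from Method 2 to 'false'
# and disables every row in security_configs.
# The body runs in a subshell, so the loop variable stays local.
disable_security_sql() (
  echo "BEGIN;"
  for key in \
    feature.cerberus.enabled \
    security.acl.enabled \
    security.waf.enabled \
    security.rate_limit.enabled \
    security.crowdsec.enabled
  do
    echo "UPDATE settings SET value = 'false' WHERE key = '$key';"
  done
  echo "UPDATE security_configs SET enabled = 0;"
  echo "COMMIT;"
)
```

One way to apply it is `disable_security_sql | docker exec -i charon sqlite3 /app/data/charon.db`, which feeds the statements to the database on standard input.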

#### Method 3: Restart with Security Disabled

Temporarily disable all security features:

```bash
# Stop container
docker stop charon

# Environment variables cannot be changed on an existing container
# (`docker start` has no -e option), so add the override to docker-compose.yml
vim docker-compose.yml
# Add: - CERBERUS_DISABLED=true

# Recreate the container so the new variable takes effect
docker-compose up -d charon
```

#### Method 4: Kill Caddy to Bypass Reverse Proxy

If CrowdSec is blocking at the Caddy layer:

```bash
# Stop Caddy process (temporary)
docker exec charon pkill caddy

# Warning: This breaks TLS termination
# Only use for emergency access, then restart:
docker restart charon
```

#### Method 5: Docker Volume Inspection

Inspect and modify data without running the container:

```bash
# Find Charon data volume
docker volume ls | grep charon

# Mount volume to temporary container
docker run --rm -it -v charon_data:/data alpine sh

# Navigate to database
cd /data

# Install SQLite (not part of the Alpine base image) and open the database
apk add sqlite
sqlite3 charon.db

# Or leave the temporary container and copy the database out for external editing
exit
docker cp charon:/app/data/charon.db ~/charon-backup.db
```

### Catastrophic Recovery: Destroy and Recreate

> ⚠️ **WARNING**: Last resort only. You will lose all configuration data.

#### Step 1: Back Up Everything

```bash
# Backup database
docker exec charon tar czf /tmp/backup.tar.gz /app/data
docker cp charon:/tmp/backup.tar.gz ~/charon-backup-$(date +%Y%m%d-%H%M%S).tar.gz

# Record current configuration
docker inspect charon > ~/charon-inspect-$(date +%Y%m%d-%H%M%S).json
```

#### Step 2: Destroy Container and Volume

```bash
# Stop and remove container
docker stop charon
docker rm charon

# DANGER: Remove data volume (all configuration will be lost)
docker volume rm charon_data
```

#### Step 3: Recreate with Fresh Configuration

```bash
# Recreate container
docker-compose up -d charon

# Wait for initialization
sleep 10

# Confirm the fresh instance responds before logging in
# (with default credentials, if auth is implemented)
curl http://localhost:8080/api/v1/health
```

#### Step 4: Restore from Backup (Optional)

```bash
# Stop container
docker stop charon

# Extract backup
tar xzf ~/charon-backup-YYYYMMDD-HHMMSS.tar.gz -C /tmp

# Copy database back
docker cp /tmp/app/data/charon.db charon:/app/data/charon.db

# Start container
docker start charon
```

### Troubleshooting Tier 3

#### Error: Permission Denied (SQLite)

**Cause:** Database file is owned by the container user, not root.

**Solution:**

```bash
# Use docker exec instead of direct file access
# (single quotes mark SQL string literals; double quotes would be parsed as identifiers)
docker exec charon sqlite3 /app/data/charon.db \
  "UPDATE settings SET value='false' WHERE key='security.acl.enabled';"
```

#### Error: Container Won't Start After Database Changes

**Cause:** Database corruption or invalid schema.

**Solution:**

1. Check container logs:

   ```bash
   docker logs charon --tail 50
   ```

1. Restore from automated backup:

   ```bash
   # List backups
   docker exec charon ls -la /app/data/backups/

   # Restore latest backup
   docker exec charon cp /app/data/backups/charon_backup_YYYYMMDD_030000.db /app/data/charon.db

   # Restart container
   docker restart charon
   ```

#### Error: Volume Not Found

**Cause:** Volume was deleted or never created.

**Solution:**

```bash
# Recreate volume
docker volume create charon_data

# Restart container with new volume
docker-compose up -d charon
```

---

## Post-Recovery Tasks

After regaining access, perform these tasks to prevent future lockouts:

### Task 1: Review Audit Logs

Analyze what caused the lockout:

```bash
# View recent security events
curl http://localhost:8080/api/v1/audit-logs | jq

# Filter for security events
docker exec charon grep -i "acl_deny\|waf_block\|crowdsec" /var/log/charon.log
```

**Look for:**

- Repeated blocks of your IP
- Triggered WAF rules
- CrowdSec ban reasons

### Task 2: Adjust ACL Rules

If ACL caused the lockout:

1. Navigate to **Cerberus → Access Lists**
2. Review ACL rules that blocked you
3. Add your IP to whitelist:
   - Create new ACL: "Admin Whitelist"
   - Type: IP Whitelist
   - IP Ranges: `YOUR_IP/32`
   - Assign to all critical hosts
4. Save configuration

### Task 3: Rotate Emergency Token (If Compromised)

If you suspect the emergency token was exposed:

1. Generate new token:

   ```bash
   openssl rand -hex 32
   ```

1. Update configuration:

   ```bash
   # Edit docker-compose.yml
   vim docker-compose.yml
   # Change CHARON_EMERGENCY_TOKEN value

   # Restart container
   docker-compose up -d charon
   ```

1. See [Emergency Token Rotation Guide](./emergency-token-rotation.md) for detailed steps

### Task 4: Document the Incident

Create incident report:

```markdown
# Security Lockout Incident Report

**Date:** YYYY-MM-DD HH:MM
**Severity:** Critical / High / Medium / Low
**Duration:** X minutes/hours

## Incident Summary
Brief description of what happened

## Root Cause
Why the lockout occurred

## Recovery Method Used
Which tier was used to recover

## Lessons Learned
What we learned from this incident

## Action Items
- [ ] Adjust ACL rules
- [ ] Update documentation
- [ ] Train team on recovery procedures
- [ ] Implement additional monitoring
```

### Task 5: Update Monitoring/Alerting

Set up alerts to prevent future lockouts:

1. Navigate to **Cerberus → Notification Settings**
2. Configure webhook or email notifications
3. Enable alerts for:
   - High rate of ACL denials
   - Admin IP blocks
   - Emergency token usage
4. Test notification delivery

### Task 6: Review Management Network Configuration

Ensure your management networks are properly configured:

```bash
# Check current CIDRs
docker exec charon env | grep CHARON_MANAGEMENT_CIDRS

# Update in docker-compose.yml
vim docker-compose.yml
```

Add your office/VPN subnets:

```yaml
environment:
  - CHARON_MANAGEMENT_CIDRS=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,YOUR_OFFICE_SUBNET
```

### Task 7: Test Recovery Procedures

Schedule quarterly drills to practice recovery:

```bash
# Test Tier 1
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN"

# Test Tier 2 (if enabled)
ssh -L 2019:localhost:2019 admin@server
curl http://localhost:2019/health

# Test Tier 3 (in staging environment)
docker exec charon cscli decisions list
```

---

## Quick Reference Card

### One-Page Emergency Cheat Sheet

```bash
# ---------- TIER 1: EMERGENCY TOKEN ----------
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN"

# ---------- TIER 2: EMERGENCY SERVER ----------
# 1. SSH tunnel
ssh -L 2019:localhost:2019 admin@server.example.com

# 2. Reset via emergency port
curl -X POST http://localhost:2019/emergency/security-reset \
  -H "X-Emergency-Token: $CHARON_EMERGENCY_TOKEN" \
  -u admin:password

# ---------- TIER 3: DIRECT ACCESS ----------
# SSH to host
ssh admin@docker-host.example.com

# Clear CrowdSec bans
docker exec charon cscli decisions delete --all

# Disable security via database (the Cerberus flag does not match 'security.%.enabled')
docker exec charon sqlite3 /app/data/charon.db \
  "UPDATE settings SET value='false' WHERE key LIKE 'security.%.enabled' OR key = 'feature.cerberus.enabled';"

# Restart container
docker restart charon

# ---------- VERIFICATION ----------
# Test health endpoint
curl http://localhost:8080/api/v1/health

# Check logs
docker logs charon --tail 50

# Verify security is disabled
curl http://localhost:8080/api/v1/settings | grep security
```

### Emergency Contacts

| Role | Contact | Purpose |
| ---- | ------- | ------- |
| Platform Team | `platform@example.com` | Infrastructure issues |
| Security Team | `security@example.com` | Security policy questions |
| On-Call Engineer | `oncall@example.com` | 24/7 emergency support |

### Critical Environment Variables

```bash
# Emergency access
CHARON_EMERGENCY_TOKEN=<64-char-hex>
CHARON_MANAGEMENT_CIDRS=10.0.0.0/8,172.16.0.0/12,192.168.0.0/16

# Emergency server (Tier 2)
CHARON_EMERGENCY_SERVER_ENABLED=true
CHARON_EMERGENCY_BIND=127.0.0.1:2019
CHARON_EMERGENCY_USERNAME=admin
CHARON_EMERGENCY_PASSWORD=<password>
```

---

## Appendix A: Recovery Decision Tree

```text
START: Cannot access Charon web interface
    ↓
Can you reach https://charon.example.com?
    ├─ YES → Try Tier 1 (Emergency Token)
    │         ↓
    │     Success?
    │         ├─ YES → [END] Access restored
    │         └─ NO → Try Tier 2 (Emergency Server)
    │                   ↓
    │               Success?
    │                   ├─ YES → [END] Access restored
    │                   └─ NO → Proceed to Tier 3
    │
    └─ NO → Network issue or container down
              ↓
          Check container status
              ├─ Container running → Proceed to Tier 3
              └─ Container down → Start container, then Tier 1
```
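
The tree can also be encoded as a small function for drill scripts or on-call tooling. This is an illustrative sketch only: `next_recovery_step`, its argument convention, and the labels it prints are all hypothetical.

```bash
# next_recovery_step <reachable: yes|no> <container_running: yes|no> <tiers_already_failed: 0|1|2>
# Prints the next action according to the decision tree.
next_recovery_step() {
  if [ "$1" = yes ]; then
    # Charon answers: escalate one tier for each failed attempt.
    case "$3" in
      0) echo "tier1" ;;
      1) echo "tier2" ;;
      *) echo "tier3" ;;
    esac
  elif [ "$2" = yes ]; then
    # Unreachable although the container is up: go straight to the host.
    echo "tier3"
  else
    echo "start-container-then-tier1"
  fi
}
```

For example, `next_recovery_step yes yes 1` prints `tier2`: the site is reachable and Tier 1 has already failed once.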

---

## Appendix B: Common Error Codes

| Code | Message | Cause | Solution |
| ---- | ------- | ----- | -------- |
| 403 | Blocked by access control list | ACL blocking IP | Use Tier 1 or adjust ACL |
| 403 | Request blocked by WAF | WAF rule triggered | Use Tier 1 or disable WAF |
| 403 | Your IP has been banned | CrowdSec ban | Use Tier 3 to clear bans |
| 401 | Invalid emergency token | Token mismatch | Verify token value |
| 429 | Rate limit exceeded | Too many attempts | Wait for the rate-limit window to reset (5 minutes in production) |
| 501 | Emergency token not configured | Token not set | Use Tier 3 to set token |
| 500 | Internal server error | Application error | Check logs, use Tier 3 |

---

## Appendix C: Testing Checklist

Use this checklist to validate recovery procedures:

**Tier 1 Testing:**

- [ ] Emergency token retrieved from secure storage
- [ ] Token works from allowed IP (RFC1918)
- [ ] Token blocked from public IP
- [ ] Rate limiting works (5 attempts per 5 minutes in production)
- [ ] Audit logs capture emergency access
- [ ] Settings disabled successfully

**Tier 2 Testing:**

- [ ] SSH tunnel established successfully
- [ ] Emergency server health endpoint responds
- [ ] Basic Auth works (if configured)
- [ ] Emergency reset works via tunnel
- [ ] Tunnel closes cleanly

**Tier 3 Testing:**

- [ ] CrowdSec decisions cleared
- [ ] Database modifications persist
- [ ] Container restarts successfully
- [ ] Backup and restore works
- [ ] Logs show expected behavior

---

**Related Documentation:**

- [Emergency Token Rotation](./emergency-token-rotation.md)
- [Break Glass Protocol Design](../plans/break_glass_protocol_redesign.md)
- [Security Documentation](../security.md)
- [Configuration Guide](../configuration/emergency-setup.md)

---

**Version History:**

- v1.0 (2026-01-26): Initial release
- Author: Charon Project Team
- Maintained by: Security & Operations Team

@@ -1,502 +0,0 @@
# Emergency Token Rotation Runbook

**Version:** 1.0
**Last Updated:** January 26, 2026
**Purpose:** Secure procedure for rotating the emergency break glass token

---

## When to Rotate

Rotate the emergency token in these situations:

- **Scheduled rotation**: Every 90 days (recommended)
- **After use**: Token was used during an incident
- **Suspected compromise**: Token may have been exposed in logs, screenshots, or shared insecurely
- **Personnel changes**: Team member with token access has left
- **Security audit**: Compliance requirement or security policy mandate

---

## Prerequisites

- ✅ Access to Charon configuration (`docker-compose.yml` or secrets manager)
- ✅ Ability to restart Charon container
- ✅ Write access to secrets management system (if used)
- ✅ Documentation of where token is stored

---

## Step-by-Step Rotation Procedure

### Step 1: Generate New Token

Generate a cryptographically secure 64-character hex token:

```bash
# Using OpenSSL (recommended)
openssl rand -hex 32

# Example output (64 hex characters, digits 0-9 and letters a-f only; yours will differ)
# 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef

# Using /dev/urandom
head -c 32 /dev/urandom | xxd -p -c 64

# Using Python
python3 -c "import secrets; print(secrets.token_hex(32))"
```

**Requirements:**

- Minimum 32 random bytes (produces a 64-character hex string)
- Use cryptographically secure random generator
- Never reuse old tokens

### Step 2: Document Token Securely

Store the new token in your secrets management system:

**HashiCorp Vault:**

```bash
vault kv put secret/charon/emergency-token \
  token='NEW_TOKEN_HERE'
```

**AWS Secrets Manager:**

```bash
aws secretsmanager update-secret \
  --secret-id charon/emergency-token \
  --secret-string 'NEW_TOKEN_HERE'
```

**Azure Key Vault:**

```bash
az keyvault secret set \
  --vault-name charon-vault \
  --name emergency-token \
  --value 'NEW_TOKEN_HERE'
```

**Password Manager:**

- Store in "Charon Emergency Access" entry
- Add expiration reminder (90 days)
- Share with authorized personnel only

### Step 3: Update Docker Compose Configuration

#### Option A: Environment Variable (Less Secure)

Edit `docker-compose.yml`:

```yaml
services:
  charon:
    environment:
      - CHARON_EMERGENCY_TOKEN=NEW_TOKEN_HERE  # <-- Update this
```

#### Option B: Docker Secrets (More Secure)

```yaml
services:
  charon:
    secrets:
      - charon_emergency_token
    environment:
      - CHARON_EMERGENCY_TOKEN_FILE=/run/secrets/charon_emergency_token

secrets:
  charon_emergency_token:
    external: true
```

Create Docker secret:

```bash
# `docker secret` requires Swarm mode.
# printf avoids the trailing newline that echo would store as part of the token.
printf '%s' "NEW_TOKEN_HERE" | docker secret create charon_emergency_token -
```

#### Option C: Environment File (Recommended)

Create `.env` file (add to `.gitignore`):

```bash
# .env
CHARON_EMERGENCY_TOKEN=NEW_TOKEN_HERE
```

Update `docker-compose.yml`:

```yaml
services:
  charon:
    env_file:
      - .env
```
|
||||
|
||||
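Because the environment file holds the token in plain text, it is worth restricting its permissions when it is created. The following is a minimal sketch, assuming the file contains only the emergency token (it overwrites the file); `write_env_file` is a hypothetical helper name.

```bash
# Hypothetical helper: write the token to an env file readable only by its owner.
# Usage: write_env_file <path> <token>
# Assumption: the file holds only this one variable and may be overwritten.
write_env_file() {
  env_path="$1"
  token="$2"
  (
    umask 077   # a newly created file gets mode 600
    printf 'CHARON_EMERGENCY_TOKEN=%s\n' "$token" > "$env_path"
  )
  chmod 600 "$env_path"   # also tighten a file that already existed
}
```

A call such as `write_env_file .env "$NEW_TOKEN"` would then replace the manual edit above.
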
### Step 4: Restart Charon Container

```bash
# Using docker-compose
docker-compose down
docker-compose up -d

# Or recreate only the Charon container
# (`docker-compose restart` does not apply changed environment variables)
docker-compose up -d --force-recreate charon

# Verify container started successfully
docker logs charon --tail 20
```

**Expected log output:**

```text
[INFO] Emergency token configured (64 characters)
[INFO] Emergency bypass middleware enabled
[INFO] Management CIDRs: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
```

### Step 5: Verify New Token Works

Test the new token from an allowed IP. Note that a successful call disables all security modules:

```bash
# Test emergency reset endpoint
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: NEW_TOKEN_HERE" \
  -H "Content-Type: application/json"

# Expected response
# {
#   "success": true,
#   "message": "All security modules have been disabled",
#   "disabled_modules": [...]
# }
```

**If testing in production:** Re-enable security immediately:

```bash
# Navigate to Cerberus Dashboard
# Toggle security modules back ON
```

### Step 6: Verify Old Token is Revoked

Test that the old token no longer works:

```bash
# Test with old token (should fail)
curl -X POST https://charon.example.com/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: OLD_TOKEN_HERE" \
  -H "Content-Type: application/json"

# Expected response (401 Unauthorized)
# {
#   "error": "Invalid emergency token",
#   "code": 401
# }
```

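When this check is scripted, it is easier to compare the HTTP status code than to parse the response body. The sketch below assumes that a rejected token returns 401 (as shown above) and that an accepted token returns 200, which this runbook does not state explicitly. `interpret_status`, `token_status`, and the `CHARON_URL` variable are illustrative names. Keep in mind that if the old token is still valid, the request disables the security modules.

```bash
# Map an HTTP status code from the emergency endpoint to a rotation verdict.
interpret_status() {
  case "$1" in
    401) echo "revoked" ;;       # old token rejected, as expected
    200) echo "still-valid" ;;   # assumed success code: rotation did not take effect
    *)   echo "unknown" ;;       # e.g. 403/429 from another layer, or a network error
  esac
}

# Hypothetical wrapper: print only the status code for a given token.
# -s silences progress output, -o discards the body, -w prints the status code.
token_status() {
  curl -s -o /dev/null -w '%{http_code}' -X POST \
    "${CHARON_URL:-https://charon.example.com}/api/v1/emergency/security-reset" \
    -H "X-Emergency-Token: $1" \
    -H "Content-Type: application/json"
}
```

Usage would look like `interpret_status "$(token_status "$OLD_TOKEN")"`, where anything other than `revoked` warrants a closer look.
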
### Step 7: Update Documentation

Update all locations where the token is documented:

- [ ] Password manager entry
- [ ] Secrets management system
- [ ] Runbooks (if token is referenced)
- [ ] Team wiki or internal docs
- [ ] Incident response procedures
- [ ] Backup/recovery documentation

### Step 8: Notify Team

Inform authorized personnel:

```markdown
Subject: [ACTION REQUIRED] Charon Emergency Token Rotated

The Charon emergency break glass token has been rotated as part of our regular security maintenance.

**Action Required:**
- Update your local password manager with the new token
- Retrieve new token from: [secrets management location]
- Old token is no longer valid as of: [timestamp]

**Next Rotation:** [90 days from now]

If you need access to the new token, contact: [security team contact]
```

---

## Emergency Rotation (Compromise Suspected)

If the token has been compromised, follow this expedited procedure:

### Immediate Actions (within 1 hour)

1. **Rotate token immediately** (Steps 1-5 above)
2. **Review audit logs** for unauthorized emergency access:

   ```bash
   # Check for emergency token usage
   docker logs charon | grep -i "emergency"

   # Check audit logs
   curl http://localhost:8080/api/v1/audit-logs | jq '.[] | select(.action | contains("emergency"))'
   ```

3. **Alert security team** if unauthorized access is detected
4. **Disable compromised accounts** that may have used the token

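The plain `grep -i "emergency"` above also matches the routine startup lines listed in Step 4, which can bury real usage. A minimal sketch of a filter that drops those two lines is shown below; it assumes the startup messages read exactly as in the expected log output, and `emergency_usage_lines` is a hypothetical name.

```bash
# Hypothetical filter: read log lines on stdin and print only those that
# mention "emergency" and are not the routine startup messages.
emergency_usage_lines() {
  grep -i 'emergency' \
    | grep -v -e 'Emergency token configured' \
              -e 'Emergency bypass middleware enabled' \
    || true   # no matches is not an error
}
```

It could be fed with `docker logs charon --since 24h 2>&1 | emergency_usage_lines`; any remaining output deserves review.
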
### Investigation (within 24 hours)

1. **Determine exposure scope:**
   - Was token in logs or screenshots?
   - Was token shared via insecure channel (email, Slack)?
   - Who had access to the token?
   - Was token committed to version control?

2. **Check for signs of abuse:**
   - Review recent configuration changes
   - Check for new proxy hosts or certificates
   - Verify ACL rules haven't been modified
   - Review CrowdSec decision history

3. **Document incident:**
   - Create incident report
   - Timeline of exposure
   - Impact assessment
   - Remediation actions taken

### Remediation

1. **Revoke access** for compromised accounts
2. **Rotate all related secrets** (database passwords, API keys)
3. **Implement additional controls:**
   - Require 2FA for emergency access (future enhancement)
   - Implement emergency token session limits
   - Add approval workflow for emergency access
4. **Update policies** to prevent future exposure

---

## Automation Script

Save this script as `rotate-emergency-token.sh`. It assumes the token is set directly in `docker-compose.yml` (Option A) and that GNU `date` is available:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

echo -e "${GREEN}Charon Emergency Token Rotation Script${NC}"
echo "========================================"
echo ""

# Step 1: Generate new token
echo -e "${YELLOW}Step 1: Generating new token...${NC}"
NEW_TOKEN=$(openssl rand -hex 32)
echo "New token generated: ${NEW_TOKEN:0:16}...${NEW_TOKEN: -16}"
echo ""

# Step 2: Backup old configuration
echo -e "${YELLOW}Step 2: Backing up current configuration...${NC}"
BACKUP_FILE="docker-compose.yml.backup-$(date +%Y%m%d-%H%M%S)"
cp docker-compose.yml "$BACKUP_FILE"
echo "Backup saved to: $BACKUP_FILE"
echo ""

# Step 3: Update docker-compose.yml
echo -e "${YELLOW}Step 3: Updating docker-compose.yml...${NC}"
sed -i.bak "s/CHARON_EMERGENCY_TOKEN=.*/CHARON_EMERGENCY_TOKEN=${NEW_TOKEN}/" docker-compose.yml
echo "Configuration updated"
echo ""

# Step 4: Recreate container (a plain `restart` would keep the old environment)
echo -e "${YELLOW}Step 4: Restarting Charon container...${NC}"
docker-compose up -d --force-recreate charon
sleep 5
echo "Container restarted"
echo ""

# Step 5: Verify new token
# NOTE: a successful call disables all security modules
echo -e "${YELLOW}Step 5: Verifying new token...${NC}"
RESPONSE=$(curl -s -X POST http://localhost:8080/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: ${NEW_TOKEN}" \
  -H "Content-Type: application/json")

# Accept both compact and pretty-printed JSON ("success":true / "success": true)
if echo "$RESPONSE" | grep -Eq '"success":[[:space:]]*true'; then
  echo -e "${GREEN}✓ New token verified successfully${NC}"
  echo -e "${YELLOW}Security modules are now disabled; re-enable them in the Cerberus Dashboard${NC}"
else
  echo -e "${RED}✗ Token verification failed${NC}"
  echo "Response: $RESPONSE"
  exit 1
fi
echo ""

# Step 6: Save token securely
echo -e "${YELLOW}Step 6: Token rotation complete${NC}"
echo ""
echo "========================================"
echo -e "${GREEN}NEXT STEPS:${NC}"
echo "1. Save new token to password manager:"
echo "   ${NEW_TOKEN}"
echo ""
echo "2. Update secrets manager (Vault, AWS, Azure)"
echo "3. Notify team of token rotation"
echo "4. Test old token is revoked"
echo "5. Schedule next rotation: $(date -d '+90 days' +%Y-%m-%d)"
echo "========================================"
```

Make executable:

```bash
chmod +x rotate-emergency-token.sh
```

Run:

```bash
./rotate-emergency-token.sh
```

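One weakness of the `sed` step is that it succeeds silently when no `CHARON_EMERGENCY_TOKEN=` line exists, for example when the token lives in `.env` or a Docker secret. The sketch below shows one way to guard against that; `update_token_in_file` is a hypothetical helper, not part of the script above.

```bash
# Hypothetical helper: replace the token in a file and fail if no line matched.
# Usage: update_token_in_file <file> <new-token>
update_token_in_file() {
  file="$1"
  new_token="$2"
  if ! grep -q 'CHARON_EMERGENCY_TOKEN=' "$file"; then
    echo "No CHARON_EMERGENCY_TOKEN entry found in $file" >&2
    return 1
  fi
  # -i.bak works with both GNU and BSD sed and keeps a copy of the original
  sed -i.bak "s/CHARON_EMERGENCY_TOKEN=.*/CHARON_EMERGENCY_TOKEN=${new_token}/" "$file"
}
```

With `set -e` active, a failed call such as `update_token_in_file docker-compose.yml "$NEW_TOKEN"` would stop the rotation before the container is recreated.
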
---

## Compliance Checklist

For organizations with compliance requirements:

- [ ] Token rotation documented in change log
- [ ] Rotation approved by security team
- [ ] Old token marked as revoked in secrets manager
- [ ] Access to new token limited to authorized personnel
- [ ] Token rotation logged in audit trail
- [ ] Backup configuration saved securely
- [ ] Team notification sent and acknowledged
- [ ] Next rotation scheduled (90 days)

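To fill in the "next rotation" date consistently, the 90-day offset can be computed rather than counted by hand. This is a sketch that tries GNU `date` first and falls back to the BSD/macOS syntax; `next_rotation_date` is an illustrative name, and other `date` implementations may support neither form.

```bash
# Hypothetical helper: print the date N days from today as YYYY-MM-DD.
# Usage: next_rotation_date [days]   (defaults to 90)
next_rotation_date() {
  days="${1:-90}"
  if date -d "+${days} days" +%Y-%m-%d 2>/dev/null; then
    return 0                        # GNU date
  fi
  date -v "+${days}d" +%Y-%m-%d     # BSD/macOS date
}
```
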
---

## Troubleshooting

### Error: New Token Not Working After Rotation

**Symptom:** New token returns 401 Unauthorized.

**Causes:**

1. Token not saved correctly in configuration
2. Container not recreated after update
3. Token contains whitespace or line breaks
4. Environment variable not exported

**Solution:**

```bash
# Verify environment variable
docker exec charon env | grep CHARON_EMERGENCY_TOKEN

# Check logs for token loading
docker logs charon | grep -i "emergency token"

# Recreate container so it picks up the new value
docker-compose up -d --force-recreate charon
```

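Stray whitespace (cause 3) is hard to spot by eye, especially a trailing newline or carriage return picked up from a clipboard. The helpers below are a small sketch for detecting and removing it; both function names are hypothetical.

```bash
# Hypothetical helper: read a token on stdin and print it with all
# whitespace (spaces, tabs, CR, LF) removed.
strip_token_whitespace() {
  tr -d '[:space:]'
}

# Hypothetical helper: return 0 if the argument contains any whitespace.
token_has_whitespace() {
  stripped=$(printf '%s' "$1" | strip_token_whitespace)
  [ "$stripped" != "$1" ]
}
```

For instance, `token_has_whitespace "$TOKEN" && echo "token contains whitespace"` flags a value that should be cleaned before it is written to the configuration.
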
### Error: Container Won't Start After Update

**Symptom:** Container exits immediately after restart.

**Cause:** Malformed `docker-compose.yml` or invalid token format.

**Solution:**

```bash
# Validate docker-compose.yml syntax
docker-compose config

# Restore the most recent timestamped backup (docker-compose.yml.backup-YYYYMMDD-HHMMSS)
cp "$(ls -t docker-compose.yml.backup-* | head -n 1)" docker-compose.yml

# Fix syntax error
vim docker-compose.yml

# Restart
docker-compose up -d
```

### Error: Lost Access to Old Token

**Symptom:** Need to verify old token is revoked, but don't have it.

**Solution:**

```bash
# Check backup configuration
grep CHARON_EMERGENCY_TOKEN docker-compose.yml.backup-*

# Or check container environment (if not restarted)
docker exec charon env | grep CHARON_EMERGENCY_TOKEN
```

---

## Security Best Practices

1. **Never commit tokens to version control**
   - Add to `.gitignore`: `.env`, `docker-compose.override.yml`
   - Use pre-commit hooks to scan for secrets
   - Use `git-secrets` or `trufflehog`

2. **Use secrets management systems**
   - HashiCorp Vault
   - AWS Secrets Manager
   - Azure Key Vault
   - Kubernetes Secrets (with encryption at rest)

3. **Limit token access**
   - Only senior engineers and ops team
   - Require 2FA for secrets manager access
   - Audit who accesses the token

4. **Rotate regularly**
   - Every 90 days (at minimum)
   - After any security incident
   - When team members leave

5. **Monitor emergency token usage**
   - Set up alerts for emergency access
   - Review audit logs weekly
   - Investigate any unexpected usage

6. **Test recovery procedures**
   - Quarterly disaster recovery drills
   - Verify backup token storage works
   - Ensure team knows how to retrieve token

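As a lightweight complement to `git-secrets` or `trufflehog`, a pre-commit hook can refuse commits that contain a hard-coded emergency token. The sketch below only looks for the `CHARON_EMERGENCY_TOKEN=` variable followed by 64 hex characters; `contains_emergency_token` is a hypothetical name, and the hook wiring in the comments is an assumption about how it would be installed.

```bash
# Hypothetical scanner: read text on stdin and return 0 if it contains what
# looks like a hard-coded emergency token (variable name + 64 hex characters).
contains_emergency_token() {
  grep -Eq 'CHARON_EMERGENCY_TOKEN=[0-9a-fA-F]{64}'
}

# Possible use inside .git/hooks/pre-commit (not executed here):
#   if git diff --cached -U0 | contains_emergency_token; then
#     echo "Refusing to commit: emergency token found in staged changes" >&2
#     exit 1
#   fi
```

Placeholders such as `NEW_TOKEN_HERE` do not match, so documentation examples like the ones in this runbook pass the check.
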
---

## Related Documentation

- [Emergency Lockout Recovery Runbook](./emergency-lockout-recovery.md)
- [Security Documentation](../security.md)
- [Configuration Guide](../configuration/emergency-setup.md)

---

**Version History:**

- v1.0 (2026-01-26): Initial release