feat: Add emergency token rotation runbook and automation script
- Created a comprehensive runbook for emergency token rotation, detailing when to rotate, prerequisites, and step-by-step procedures.
- Included methods for generating secure tokens, updating configurations, and verifying new tokens.
- Added an automation script for token rotation to streamline the process.
- Implemented compliance checklist and troubleshooting sections for better guidance.

test: Implement E2E tests for emergency server and token functionality
- Added tests for the emergency server to ensure it operates independently of the main application.
- Verified that the emergency server can bypass security controls and reset security settings.
- Implemented tests for emergency token validation, rate limiting, and audit logging.
- Documented expected behaviors for emergency access and security enforcement.

refactor: Introduce security test fixtures for better test management
- Created a fixtures file to manage security-related test data and functions.
- Included helper functions for enabling/disabling security modules and testing emergency access.
- Improved test readability and maintainability by centralizing common logic.

test: Enhance emergency token tests for robustness and coverage
- Expanded tests to cover various scenarios including token validation, rate limiting, and idempotency.
- Ensured that emergency token functionality adheres to security best practices.
- Documented expected behaviors and outcomes for clarity in test results.
366
CONTRIBUTING.md
@@ -361,6 +361,372 @@ See [QA Coverage Report](docs/reports/qa_crowdsec_frontend_coverage_report.md) f
- Bug fixes should include regression tests
- CrowdSec modules maintain 100% frontend coverage

---

## Testing Emergency Break Glass Protocol

When contributing changes to security modules (ACL, WAF, Cerberus, Rate Limiting, CrowdSec), you **MUST** test that the emergency break glass protocol still functions correctly. A broken emergency recovery system can lock administrators out of their own systems during production incidents.

### Why This Matters

The emergency break glass protocol is a critical safety mechanism. If your changes break emergency access:

- ❌ Administrators locked out by security modules cannot recover
- ❌ Production incidents become catastrophic (no way to regain access)
- ❌ System may require physical access or complete rebuild

**Always test emergency recovery before merging security-related PRs.**

### Quick Test Procedure

#### Prerequisites

```bash
# Ensure container is running
docker-compose up -d

# Set emergency token
export CHARON_EMERGENCY_TOKEN=test-emergency-token-for-e2e-32chars
```

#### Test 1: Verify Lockout Scenario

Enable security modules with restrictive settings to simulate a lockout:

```bash
# Enable ACL with restrictive whitelist (via API or database)
curl -X POST http://localhost:8080/api/v1/settings \
  -H "Content-Type: application/json" \
  -d '{"key": "security.acl.enabled", "value": "true"}'

# Enable WAF in block mode
curl -X POST http://localhost:8080/api/v1/settings \
  -H "Content-Type: application/json" \
  -d '{"key": "security.waf.enabled", "value": "true"}'

# Enable Cerberus
curl -X POST http://localhost:8080/api/v1/settings \
  -H "Content-Type: application/json" \
  -d '{"key": "feature.cerberus.enabled", "value": "true"}'
```

#### Test 2: Verify You're Locked Out

Attempt to access a protected endpoint (should fail):

```bash
# Attempt normal access
curl http://localhost:8080/api/v1/proxy-hosts

# Expected response: 403 Forbidden
# {
#   "error": "Blocked by access control list"
# }
```

If you're **NOT** blocked, investigate why security isn't working before proceeding.

#### Test 3: Test Emergency Token Works (Tier 1)

Use the emergency token to regain access:

```bash
# Send emergency reset request
curl -X POST http://localhost:8080/api/v1/emergency/security-reset \
  -H "X-Emergency-Token: test-emergency-token-for-e2e-32chars" \
  -H "Content-Type: application/json"

# Expected response: 200 OK
# {
#   "success": true,
#   "message": "All security modules have been disabled",
#   "disabled_modules": [
#     "feature.cerberus.enabled",
#     "security.acl.enabled",
#     "security.waf.enabled",
#     "security.rate_limit.enabled",
#     "security.crowdsec.enabled"
#   ]
# }
```

**If this fails:** Your changes broke Tier 1 emergency access. Fix before merging.
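
Beyond checking for `"success": true`, the reset response can be checked for the specific setting keys listed in the expected output above. The following is a minimal sketch, not part of Charon: `modules_disabled` is a hypothetical helper name, and it only does a plain text match, so it confirms that each key appears somewhere in the body rather than parsing the JSON.

```bash
# Hypothetical helper: succeed only if every expected setting key appears
# (as a quoted JSON string) in the response text passed as the first argument.
modules_disabled() (
    response=$1
    shift
    for key in "$@"; do
        case $response in
            *"\"$key\""*) ;;                          # key found, keep checking
            *) echo "missing: $key" >&2; exit 1 ;;    # key absent, report and fail
        esac
    done
)

# Example with a response shaped like the expected output above
RESPONSE='{"success":true,"disabled_modules":["feature.cerberus.enabled","security.acl.enabled"]}'
modules_disabled "$RESPONSE" feature.cerberus.enabled security.acl.enabled \
    && echo "expected modules reported as disabled"
```

A JSON-aware tool would be stricter, since this text match cannot tell whether a key sits inside `disabled_modules` or elsewhere in the response.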

#### Test 4: Verify Lockout is Cleared

Confirm you can now access protected endpoints:

```bash
# Wait for settings to propagate
sleep 5

# Test normal access (should work now)
curl http://localhost:8080/api/v1/proxy-hosts

# Expected response: 200 OK
# [... list of proxy hosts ...]
```
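
A fixed `sleep 5` can be too short on a slow machine and needlessly long on a fast one. As an alternative, the check could poll until access returns. This is a sketch under assumptions: `retry_until` and `proxy_hosts_ok` are hypothetical helper names, and the URL is the same local endpoint used in the steps above.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() (
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            exit 0                                  # command succeeded
        fi
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"  # pause only between attempts
    done
    exit 1                                          # every attempt failed
)

# Succeeds when the protected endpoint answers 200 (defined here, not called).
proxy_hosts_ok() {
    [ "$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/api/v1/proxy-hosts)" = "200" ]
}
```

With the container running, `retry_until 10 1 proxy_hosts_ok` would try up to ten times, one second apart, and return non-zero if access never comes back.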

#### Test 5: Test Emergency Server (Tier 2 - Optional)

If the emergency server is enabled (`CHARON_EMERGENCY_SERVER_ENABLED=true`):

```bash
# Test emergency server health
curl http://localhost:2019/health

# Expected: {"status":"ok","server":"emergency"}

# Test emergency reset via emergency server
curl -X POST http://localhost:2019/emergency/security-reset \
  -H "X-Emergency-Token: test-emergency-token-for-e2e-32chars" \
  -u admin:changeme

# Expected: {"success":true, ...}
```

### Complete Test Script

Save this as `scripts/test-emergency-access.sh`:

```bash
#!/usr/bin/env bash
set -euo pipefail

GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo -e "${YELLOW}Testing Emergency Break Glass Protocol${NC}"
echo "========================================"
echo ""

# Configuration
BASE_URL="http://localhost:8080"
EMERGENCY_TOKEN="${CHARON_EMERGENCY_TOKEN:-test-emergency-token-for-e2e-32chars}"
WARNINGS=0  # Non-fatal problems (Tests 2 and 5) are counted and reported at the end

# Test 1: Enable security (create lockout scenario)
echo -e "${YELLOW}Test 1: Creating lockout scenario...${NC}"
curl -s -X POST "$BASE_URL/api/v1/settings" \
  -H "Content-Type: application/json" \
  -d '{"key": "security.acl.enabled", "value": "true"}' > /dev/null

curl -s -X POST "$BASE_URL/api/v1/settings" \
  -H "Content-Type: application/json" \
  -d '{"key": "feature.cerberus.enabled", "value": "true"}' > /dev/null

sleep 2
echo -e "${GREEN}✓ Security enabled${NC}"
echo ""

# Test 2: Verify lockout
echo -e "${YELLOW}Test 2: Verifying lockout...${NC}"
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" "$BASE_URL/api/v1/proxy-hosts")

if [ "$RESPONSE" = "403" ]; then
    echo -e "${GREEN}✓ Lockout confirmed (403 Forbidden)${NC}"
else
    echo -e "${RED}✗ Expected 403, got $RESPONSE${NC}"
    echo -e "${YELLOW}Warning: Security may not be blocking correctly${NC}"
    WARNINGS=$((WARNINGS + 1))
fi
echo ""

# Test 3: Emergency token recovery
echo -e "${YELLOW}Test 3: Testing emergency token...${NC}"
RESPONSE=$(curl -s -X POST "$BASE_URL/api/v1/emergency/security-reset" \
  -H "X-Emergency-Token: $EMERGENCY_TOKEN" \
  -H "Content-Type: application/json")

# Tolerate optional whitespace around the colon (compact or pretty-printed JSON)
if echo "$RESPONSE" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'; then
    echo -e "${GREEN}✓ Emergency token works${NC}"
else
    echo -e "${RED}✗ Emergency token failed${NC}"
    echo "Response: $RESPONSE"
    exit 1
fi
echo ""

# Test 4: Verify access restored
echo -e "${YELLOW}Test 4: Verifying access restored...${NC}"
sleep 5

RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" "$BASE_URL/api/v1/proxy-hosts")

if [ "$RESPONSE" = "200" ]; then
    echo -e "${GREEN}✓ Access restored (200 OK)${NC}"
else
    echo -e "${RED}✗ Access not restored, got $RESPONSE${NC}"
    exit 1
fi
echo ""

# Test 5: Emergency server (if enabled)
if curl -s http://localhost:2019/health > /dev/null 2>&1; then
    echo -e "${YELLOW}Test 5: Testing emergency server...${NC}"

    RESPONSE=$(curl -s http://localhost:2019/health)
    if echo "$RESPONSE" | grep -Eq '"server"[[:space:]]*:[[:space:]]*"emergency"'; then
        echo -e "${GREEN}✓ Emergency server responding${NC}"
    else
        echo -e "${RED}✗ Emergency server not responding correctly${NC}"
        WARNINGS=$((WARNINGS + 1))
    fi
else
    echo -e "${YELLOW}Test 5: Skipped (emergency server not enabled)${NC}"
fi
echo ""

echo "========================================"
if [ "$WARNINGS" -eq 0 ]; then
    echo -e "${GREEN}All tests passed! Emergency access is functional.${NC}"
else
    echo -e "${YELLOW}Emergency token recovery works, but $WARNINGS check(s) reported problems. Review the output above.${NC}"
fi
```

Make executable and run:

```bash
chmod +x scripts/test-emergency-access.sh
./scripts/test-emergency-access.sh
```

### Integration Test (Go)

Add to your backend test suite:

```go
// Imports used by this test; assert and require are assumed to come from testify.
import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// setupTestDB, setupTestRouter, and enableSecurity are helpers assumed to
// exist in your test suite.
func TestEmergencyAccessIntegration(t *testing.T) {
	// Setup test database and router
	db := setupTestDB(t)
	router := setupTestRouter(db)

	// Enable security (create lockout scenario)
	enableSecurity(t, db)

	// Test 1: Regular endpoint should be blocked
	req := httptest.NewRequest(http.MethodGet, "/api/v1/proxy-hosts", nil)
	req.RemoteAddr = "127.0.0.1:12345"
	w := httptest.NewRecorder()
	router.ServeHTTP(w, req)

	assert.Equal(t, http.StatusForbidden, w.Code, "Regular access should be blocked")

	// Test 2: Emergency endpoint should work with valid token
	req = httptest.NewRequest(http.MethodPost, "/api/v1/emergency/security-reset", nil)
	req.Header.Set("X-Emergency-Token", "test-emergency-token-for-e2e-32chars")
	req.RemoteAddr = "127.0.0.1:12345"
	w = httptest.NewRecorder()
	router.ServeHTTP(w, req)

	assert.Equal(t, http.StatusOK, w.Code, "Emergency endpoint should work")

	var response map[string]interface{}
	err := json.Unmarshal(w.Body.Bytes(), &response)
	require.NoError(t, err)
	// Compare as interface{} so a missing or non-boolean field fails the
	// test instead of panicking on a type assertion.
	assert.Equal(t, true, response["success"])

	// Test 3: Regular endpoint should work after emergency reset
	time.Sleep(2 * time.Second)
	req = httptest.NewRequest(http.MethodGet, "/api/v1/proxy-hosts", nil)
	req.RemoteAddr = "127.0.0.1:12345"
	w = httptest.NewRecorder()
	router.ServeHTTP(w, req)

	assert.Equal(t, http.StatusOK, w.Code, "Access should be restored after emergency reset")
}
```

### E2E Test (Playwright)

Add to your Playwright test suite:

```typescript
import { test, expect } from '@playwright/test'

test.describe('Emergency Break Glass Protocol', () => {
  test('should recover from complete security lockout', async ({ request }) => {
    const baseURL = 'http://localhost:8080'
    const emergencyToken = 'test-emergency-token-for-e2e-32chars'

    // Step 1: Enable all security modules
    await request.post(`${baseURL}/api/v1/settings`, {
      data: { key: 'feature.cerberus.enabled', value: 'true' }
    })
    await request.post(`${baseURL}/api/v1/settings`, {
      data: { key: 'security.acl.enabled', value: 'true' }
    })

    // Wait for settings to propagate
    await new Promise(resolve => setTimeout(resolve, 2000))

    // Step 2: Verify lockout (expect 403)
    const lockedResponse = await request.get(`${baseURL}/api/v1/proxy-hosts`)
    expect(lockedResponse.status()).toBe(403)

    // Step 3: Use emergency token to recover
    const emergencyResponse = await request.post(
      `${baseURL}/api/v1/emergency/security-reset`,
      {
        headers: { 'X-Emergency-Token': emergencyToken }
      }
    )

    expect(emergencyResponse.status()).toBe(200)
    const body = await emergencyResponse.json()
    expect(body.success).toBe(true)
    expect(body.disabled_modules).toContain('security.acl.enabled')

    // Wait for settings to propagate
    await new Promise(resolve => setTimeout(resolve, 2000))

    // Step 4: Verify access restored
    const restoredResponse = await request.get(`${baseURL}/api/v1/proxy-hosts`)
    expect(restoredResponse.ok()).toBeTruthy()
  })
})
```

### When to Run These Tests

Run emergency access tests:

- ✅ **Before every PR** that touches security-related code
- ✅ **After modifying** ACL, WAF, Cerberus, Rate Limiting, or CrowdSec modules
- ✅ **After changing** middleware order or request pipeline
- ✅ **After updating** authentication or authorization logic
- ✅ **Before releases** to ensure emergency access works in production
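
One way to make the checklist above harder to forget is a small pre-push or CI step that looks at the changed file names. The sketch below is illustrative only: `touches_security_code` is a hypothetical helper, and the name patterns are guesses that need adjusting to wherever the security modules actually live in your checkout.

```bash
# Hypothetical helper: read changed file names on stdin and succeed if any
# of them looks security-related. The patterns are assumptions and are
# deliberately coarse, so occasional false positives are expected.
touches_security_code() {
    grep -Eiq '(acl|waf|cerberus|rate[_-]?limit|crowdsec|middleware|auth)'
}

# Example with a literal file list
printf '%s\n' 'docs/index.md' 'internal/middleware/order.go' | touches_security_code \
    && echo "security-related change: run the emergency access tests"
```

In a real hook the file list could come from `git diff --name-only` against your target branch, piped into the helper and followed by `./scripts/test-emergency-access.sh` when it succeeds.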

### Troubleshooting Test Failures

**Emergency token returns 401 Unauthorized:**

- Verify `CHARON_EMERGENCY_TOKEN` is set correctly
- Check token is at least 32 characters
- Ensure token matches exactly (no whitespace or line breaks)
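
The three checks above can be run locally before sending any request. This is a minimal sketch, assuming the rules are exactly as listed (non-empty, at least 32 characters, no whitespace); `check_emergency_token` is a hypothetical helper name, and the server's own validation may be stricter.

```bash
# Hypothetical pre-flight check mirroring the checklist above.
check_emergency_token() (
    token=$1
    if [ -z "$token" ]; then
        echo "token is empty or unset" >&2
        exit 1
    fi
    if [ "${#token}" -lt 32 ]; then
        echo "token is ${#token} characters; need at least 32" >&2
        exit 1
    fi
    case $token in
        *[[:space:]]*)
            echo "token contains whitespace or a line break" >&2
            exit 1
            ;;
    esac
    exit 0
)

# Example with the test token used throughout this section
check_emergency_token "test-emergency-token-for-e2e-32chars" && echo "token looks valid"
```

To check the value in your shell, call it as `check_emergency_token "${CHARON_EMERGENCY_TOKEN:-}"`.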

**Emergency token returns 403 Forbidden:**

- Tier 1 bypass may be blocked at Caddy/CrowdSec layer
- Test Tier 2 (emergency server) instead
- Check `CHARON_MANAGEMENT_CIDRS` includes your test IP
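
To see whether a test IP falls inside a configured range, the membership check can be done with shell arithmetic. The sketch below makes several assumptions: that `CHARON_MANAGEMENT_CIDRS` is a comma-separated list, that every entry carries a `/prefix`, and that only IPv4 is involved. The helper names are hypothetical.

```bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1      # split the address on dots into $1..$4
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if IPv4 address $1 falls inside CIDR block $2 (e.g. 10.0.0.0/8).
ip_in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    fi
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Succeed if $1 is inside any block of the comma-separated list $2.
ip_in_cidr_list() (
    ip=$1
    IFS=,
    for cidr in $2; do
        ip_in_cidr "$ip" "$cidr" && exit 0
    done
    exit 1
)

# Example: loopback against a sample list
ip_in_cidr_list 127.0.0.1 "10.0.0.0/8,127.0.0.0/8" && echo "127.0.0.1 is covered"
```

If the real variable uses a different separator or allows bare addresses without a prefix, the splitting in `ip_in_cidr_list` would need to change accordingly.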

**Access not restored after emergency reset:**

- Check response includes `"success":true`
- Verify settings were actually disabled in database
- Increase wait time between reset and verification (may need > 5 seconds)
- Check logs: `docker logs charon | grep emergency`

**Emergency server not responding:**

- Verify `CHARON_EMERGENCY_SERVER_ENABLED=true` in environment
- Check port 2019 is exposed in docker-compose.yml
- Test with Basic Auth if configured: `curl -u admin:password`
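
The port check above can be approximated with a text search of the compose file. This is a rough sketch, not a YAML parser: `compose_exposes_port` is a hypothetical helper that looks for a list entry ending in the given port, such as `- "2019:2019"` or `- 127.0.0.1:2019:2019`, and it will not notice mappings written in long-form syntax.

```bash
# Hypothetical helper: succeed if file $1 contains a short-form list entry
# whose last component is port $2 (optionally quoted, optionally prefixed
# by host IP and/or host port).
compose_exposes_port() {
    grep -Eq "^[[:space:]]*-[[:space:]]*[\"']?([0-9.]+:)*$2[\"']?[[:space:]]*\$" "$1"
}

# Example against a throwaway compose fragment
sample=$(mktemp)
printf 'services:\n  charon:\n    ports:\n      - "8080:8080"\n      - "2019:2019"\n' > "$sample"
compose_exposes_port "$sample" 2019 && echo "port 2019 is mapped"
rm -f "$sample"
```

Against the real file the call would be `compose_exposes_port docker-compose.yml 2019`.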

### Related Documentation

- [Emergency Lockout Recovery Runbook](docs/runbooks/emergency-lockout-recovery.md)
- [Emergency Token Rotation Guide](docs/runbooks/emergency-token-rotation.md)
- [Configuration Examples](docs/configuration/emergency-setup.md)
- [Break Glass Protocol Design](docs/plans/break_glass_protocol_redesign.md)

## Adding New Skills

Charon uses [Agent Skills](https://agentskills.io) for AI-discoverable development tasks. Skills are standardized, self-documenting task definitions that can be executed by humans and AI assistants.