---
title: Security Incident Response Plan
description: Industry-standard incident response procedures for Charon deployments, including detection, containment, recovery, and post-incident review.
---
## Security Incident Response Plan (SIRP)
This document provides a structured approach to handling security incidents in Charon deployments. Following these procedures keeps responses consistent and helps minimize damage and recovery time.

---
## Incident Classification
### Severity Levels
| Level | Name | Description | Response Time | Examples |
|-------|------|-------------|---------------|----------|
| **P1** | Critical | Active exploitation, data breach, or complete service compromise | Immediate (< 15 min) | Confirmed data exfiltration, ransomware, root access compromise |
| **P2** | High | Attempted exploitation, security control bypass, or significant vulnerability | < 1 hour | WAF bypass detected, brute-force attack in progress, credential stuffing |
| **P3** | Medium | Suspicious activity, minor vulnerability, or policy violation | < 4 hours | Unusual traffic patterns, failed authentication spike, misconfiguration |
| **P4** | Low | Informational security events, minor policy deviations | < 24 hours | Routine blocked requests, scanner traffic, expired certificates |
### Classification Criteria
**Escalate to P1 immediately if:**
- ❌ Confirmed unauthorized access to sensitive data
- ❌ Active malware or ransomware detected
- ❌ Complete loss of security controls
- ❌ Evidence of data exfiltration
- ❌ Critical infrastructure compromise

**Escalate to P2 if:**
- ⚠️ Multiple failed bypass attempts from same source
- ⚠️ Vulnerability actively being probed
- ⚠️ Partial security control failure
- ⚠️ Credential compromise suspected
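
The response-time targets in the Severity Levels table can be encoded once so that alerting or ticketing scripts share a single definition. The following is a minimal sketch; `response_target_minutes` is a hypothetical helper name and is not part of Charon.

```bash
# Hypothetical helper: map a severity level to its response-time target
# in minutes, mirroring the Severity Levels table.
response_target_minutes() {
  case "$1" in
    P1) echo 15 ;;
    P2) echo 60 ;;
    P3) echo 240 ;;
    P4) echo 1440 ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Example: response_target_minutes P2   # prints 60
```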
---
## Detection Methods
### Automated Detection
**Cerberus Security Dashboard:**
1. Navigate to **Cerberus → Dashboard**
2. Monitor the **Live Activity** section for real-time events
3. Review **Security → Decisions** for blocked requests
4. Check alert notifications (Discord, Slack, email)

**Key Indicators to Monitor:**
- Sudden spike in blocked requests
- Multiple blocks from same IP/network
- WAF rules triggering on unusual patterns
- CrowdSec decisions for known threat actors
- Rate limiting thresholds exceeded

**Log Analysis:**
```bash
# View recent security events (docker logs writes to both stdout and stderr)
docker logs charon 2>&1 | grep -E "(BLOCK|DENY|ERROR)" | tail -n 100
# Check CrowdSec decisions
docker exec charon cscli decisions list
# Review WAF activity
docker exec charon cat /var/log/coraza-waf.log | tail -50
```
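To spot the "multiple blocks from same IP/network" indicator, the filtered log output can be tallied per source address. The sketch below assumes that blocked events contain the keyword `BLOCK` or `DENY` (as in the `grep` above) and that the source IPv4 address appears literally on the same line; the actual Charon log format is not confirmed here, and `top_blocked_ips` is a hypothetical helper name.

```bash
# Hypothetical helper: read log lines on stdin and print the most frequently
# blocked IPv4 addresses as "<count> <ip>", highest count first.
# Assumption: every IPv4 address on a BLOCK/DENY line is counted, so lines
# that mention more than one address contribute more than once.
top_blocked_ips() {
  grep -E 'BLOCK|DENY' \
    | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | sort | uniq -c | sort -k1,1nr -k2,2 \
    | awk '{print $1, $2}' \
    | head -n "${1:-10}"
}
```

A possible invocation is `docker logs charon 2>&1 | top_blocked_ips 5`, which would list the five addresses that appear most often in blocked events.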
### Manual Detection
**Regular Security Reviews:**
- [ ] Weekly review of Cerberus Dashboard
- [ ] Monthly review of access patterns
- [ ] Quarterly penetration testing
- [ ] Annual security audit
---
## Containment Procedures
### Immediate Actions (All Severity Levels)
1. **Document the incident start time**
2. **Preserve evidence** — Do NOT restart containers until logs are captured
3. **Assess scope** — Determine affected systems and data
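
Recording the start time and collecting evidence in one place are easier when both happen in a single step. The following is a minimal sketch, assuming evidence is gathered on the Docker host; `start_incident_record` and the `START_TIME` file name are hypothetical.

```bash
# Hypothetical helper: create a timestamped evidence directory under "$1"
# (default: current directory), record the incident start time in UTC,
# and print the directory path.
start_incident_record() (
  base="${1:-.}"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  dir="$base/incident-$stamp"
  mkdir -p "$dir" || exit 1
  printf 'incident_start_utc=%s\n' "$stamp" > "$dir/START_TIME"
  printf '%s\n' "$dir"
)
```

The printed path can then be used as the destination for the log and data exports in the containment steps, instead of separate files under `/tmp`.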
### P1/P2 Containment
**Step 1: Isolate the Threat**
```bash
# Block attacking IP immediately
docker exec charon cscli decisions add --ip <ATTACKER_IP> --duration 720h --reason "Incident response"
# If compromise is confirmed, stop the container. Run the `docker exec`
# exports in Step 2 first: they require a running container.
docker stop charon
# Preserve container state for forensics
docker commit charon charon-incident-$(date +%Y%m%d%H%M%S)
```
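Under pressure, a mistyped address can block the wrong host. A quick sanity check before calling `cscli` may help; the sketch below handles plain IPv4 addresses only, and `is_ipv4` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if "$1" is a dotted-quad IPv4 address
# with every octet in the range 0-255. IPv6 and CIDR ranges are not handled.
is_ipv4() (
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || exit 1
  old_ifs="$IFS"; IFS=.
  set -- $1            # split the address into its four octets
  IFS="$old_ifs"
  for octet in "$@"; do
    [ "$octet" -le 255 ] || exit 1
  done
)

# Example (ATTACKER_IP is a placeholder variable):
#   is_ipv4 "$ATTACKER_IP" && docker exec charon cscli decisions add \
#     --ip "$ATTACKER_IP" --duration 720h --reason "Incident response"
```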
**Step 2: Preserve Evidence**
```bash
# Export all logs
docker logs charon > /tmp/incident-logs-$(date +%Y%m%d%H%M%S).txt 2>&1
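# Export CrowdSec decisions (requires a running container, so do this before `docker stop`)
docker exec charon cscli decisions list -o json > /tmp/crowdsec-decisions.json
# Copy data directory, preserving timestamps and permissions
cp -a ./charon-data /tmp/incident-backup-$(date +%Y%m%d%H%M%S)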
```
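Preserved evidence is more useful if later changes to it can be detected. One common approach is to write a checksum manifest immediately after collection. The sketch below assumes GNU coreutils `sha256sum` is available on the host; `hash_evidence` and the `SHA256SUMS` file name are hypothetical choices.

```bash
# Hypothetical helper: write a SHA-256 manifest covering every file in the
# evidence directory "$1" so later tampering or corruption can be detected.
hash_evidence() (
  cd "$1" || exit 1
  find . -type f ! -name SHA256SUMS -exec sha256sum {} + | sort -k2 > SHA256SUMS
)

# Later, verify the manifest from inside the evidence directory:
#   (cd /path/to/evidence && sha256sum -c SHA256SUMS)
```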
**Step 3: Notify Stakeholders**
- System administrators
- Security team (if applicable)
- Management (P1 only)
- Legal/compliance (if data breach)
### P3/P4 Containment
1. Block offending IPs via Cerberus Dashboard
2. Review and update access lists if needed
3. Document the event in incident log
4. Continue monitoring
---
## Recovery Steps
### Pre-Recovery Checklist
- [ ] Incident fully contained
- [ ] Evidence preserved
- [ ] Root cause identified (or investigation ongoing)
- [ ] Clean backups available
### Recovery Procedure
**Step 1: Verify Backup Integrity**
```bash
# List available backups
ls -la ./charon-data/backups/
# Verify the backup directory can be mounted and read (absolute path, read-only mount)
docker run --rm -v "$(pwd)/charon-data/backups:/backups:ro" debian:bookworm-slim ls -la /backups
```
**Step 2: Restore from Clean State**
```bash
# Stop compromised instance
docker stop charon
# Rename compromised data
mv ./charon-data ./charon-data-compromised-$(date +%Y%m%d)
# Restore from backup
cp -r ./charon-data-backup-YYYYMMDD ./charon-data
# Start fresh instance (Compose v2 syntax; recreate the container instead of reusing the stopped one)
docker compose up -d --force-recreate
```
**Step 3: Apply Security Hardening**
1. Review and update all access lists
2. Rotate any potentially compromised credentials
3. Update Charon to latest version
4. Enable additional security features if not already active

**Step 4: Verify Recovery**
```bash
# Check Charon is running
docker ps | grep charon
# Verify LAPI status
docker exec charon cscli lapi status
# Test proxy functionality
curl -I https://your-proxied-domain.com
```
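Services may need a few seconds after a restore before these checks pass, so running them once can produce a false failure. A small retry wrapper is one way to handle that; `wait_until` below is a hypothetical helper, shown as a sketch rather than a Charon feature.

```bash
# Hypothetical helper: run a command up to N times, sleeping between
# attempts, and succeed as soon as the command succeeds.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() (
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && exit 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)
```

For example, `wait_until 10 3 docker exec charon cscli lapi status` would poll the LAPI for roughly half a minute before reporting failure.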
### Communication During Recovery
- Update stakeholders every 30 minutes (P1) or hourly (P2)
- Document all recovery actions taken
- Prepare user communication if service was affected
---
## Post-Incident Review
### Review Meeting Agenda
Schedule within 48 hours of incident resolution.

**Attendees:** All involved responders, system owners, management (P1/P2)

**Agenda:**
1. Incident timeline reconstruction
2. What worked well?
3. What could be improved?
4. Action items and owners
5. Documentation updates needed
### Post-Incident Checklist
- [ ] Incident fully documented
- [ ] Timeline created with all actions taken
- [ ] Root cause analysis completed
- [ ] Lessons learned documented
- [ ] Security controls reviewed and updated
- [ ] Monitoring/alerting improved
- [ ] Team training needs identified
- [ ] Documentation updated
### Incident Report Template
```markdown
## Incident Report: [INCIDENT-YYYY-MM-DD-###]
**Severity:** P1/P2/P3/P4
**Status:** Resolved / Under Investigation
**Duration:** [Start Time] to [End Time]
### Summary
[Brief description of what happened]
### Timeline
- [HH:MM] - Event detected
- [HH:MM] - Containment initiated
- [HH:MM] - Root cause identified
- [HH:MM] - Recovery completed
### Impact
- Systems affected: [List]
- Data affected: [Yes/No, details]
- Users affected: [Count/scope]
- Service downtime: [Duration]
### Root Cause
[Technical explanation of what caused the incident]
### Actions Taken
1. [Action 1]
2. [Action 2]
3. [Action 3]
### Lessons Learned
- [Learning 1]
- [Learning 2]
### Follow-up Actions
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| [Action] | [Name] | [Date] | Open |
```
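Report identifiers in the `INCIDENT-YYYY-MM-DD-###` form can be generated rather than typed by hand. The sketch below assumes the date is taken in UTC and that the sequence number is tracked elsewhere; `incident_id` is a hypothetical helper name.

```bash
# Hypothetical helper: build an ID in the INCIDENT-YYYY-MM-DD-### form used
# by the report template. "$1" is the day's sequence number, given as a
# plain decimal integer without leading zeros (defaults to 1).
incident_id() {
  printf 'INCIDENT-%s-%03d\n' "$(date -u +%Y-%m-%d)" "${1:-1}"
}
```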
---
## Communication Templates
### Internal Notification (P1/P2)
```
SECURITY INCIDENT ALERT
Severity: [P1/P2]
Time Detected: [YYYY-MM-DD HH:MM UTC]
Status: [Active / Contained / Resolved]
Summary:
[Brief description]
Current Actions:
- [Action being taken]
Next Update: [Time]
Contact: [Incident Commander Name/Channel]
```
### User Communication (If Service Affected)
```
Service Notification
We are currently experiencing [brief issue description].
Status: [Investigating / Identified / Monitoring / Resolved]
Started: [Time]
Expected Resolution: [Time or "Under investigation"]
We apologize for any inconvenience and will provide updates as available.
Last Updated: [Time]
```
### Post-Incident Summary (External)
```
Security Incident Summary
On [Date], we identified and responded to a security incident affecting [scope].
What happened:
[Non-technical summary]
What we did:
[Response actions taken]
What we're doing to prevent this:
[Improvements being made]
Was my data affected?
[Clear statement about data impact]
Questions?
[Contact information]
```
---
## Emergency Contacts
Maintain an up-to-date contact list:

| Role | Contact Method | Escalation Time |
|------|----------------|-----------------|
| Primary On-Call | [Phone/Pager] | Immediate |
| Security Team | [Email/Slack] | < 15 min (P1/P2) |
| System Administrator | [Phone] | < 1 hour |
| Management | [Phone] | P1 only |
---
## Quick Reference Card
### P1 Critical — Immediate Response
1. ⏱️ Start timer, document everything
2. 🔒 Isolate: `docker stop charon`
3. 📋 Preserve: `docker logs charon > incident.log`
4. 📞 Notify: Security team, management
5. 🔍 Investigate: Determine scope and root cause
6. 🔧 Recover: Restore from clean backup
7. 📝 Review: Post-incident meeting within 48h
### P2 High — Urgent Response
1. 🔒 Block attacker: `docker exec charon cscli decisions add --ip <IP>`
2. 📋 Capture logs before they rotate
3. 📞 Notify: Security team
4. 🔍 Investigate root cause
5. 🔧 Apply fixes
6. 📝 Document and review
### Key Commands
```bash
# Block IP immediately
docker exec charon cscli decisions add --ip <IP> --duration 720h
# List all active blocks
docker exec charon cscli decisions list
# Export logs
docker logs charon > incident-$(date +%s).log 2>&1
# Check security status
docker exec charon cscli lapi status
```
---
## Document Maintenance
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-21 | Security Team | Initial SIRP creation |

**Review Schedule:** Quarterly or after any P1/P2 incident

**Owner:** Security Team

**Last Reviewed:** 2025-12-21