---
title: Security Incident Response Plan
description: Industry-standard incident response procedures for Charon deployments, including detection, containment, recovery, and post-incident review.
---

## Security Incident Response Plan (SIRP)

This document provides a structured approach to handling security incidents in Charon deployments. Following these procedures ensures consistent, effective responses that minimize damage and recovery time.

---

## Incident Classification

### Severity Levels

| Level | Name | Description | Response Time | Examples |
|-------|------|-------------|---------------|----------|
| **P1** | Critical | Active exploitation, data breach, or complete service compromise | Immediate (< 15 min) | Confirmed data exfiltration, ransomware, root access compromise |
| **P2** | High | Attempted exploitation, security control bypass, or significant vulnerability | < 1 hour | WAF bypass detected, brute-force attack in progress, credential stuffing |
| **P3** | Medium | Suspicious activity, minor vulnerability, or policy violation | < 4 hours | Unusual traffic patterns, failed authentication spike, misconfiguration |
| **P4** | Low | Informational security events, minor policy deviations | < 24 hours | Routine blocked requests, scanner traffic, expired certificates |

### Classification Criteria

**Escalate to P1 immediately if:**

- ❌ Confirmed unauthorized access to sensitive data
- ❌ Active malware or ransomware detected
- ❌ Complete loss of security controls
- ❌ Evidence of data exfiltration
- ❌ Critical infrastructure compromise

**Escalate to P2 if:**

- ⚠️ Multiple failed bypass attempts from the same source
- ⚠️ Vulnerability actively being probed
- ⚠️ Partial security control failure
- ⚠️ Credential compromise suspected

---

## Detection Methods

### Automated Detection

**Cerberus Security Dashboard:**

1. Navigate to **Cerberus → Dashboard**
2. Monitor the **Live Activity** section for real-time events
3. Review **Security → Decisions** for blocked requests
4. Check alert notifications (Discord, Slack, email)

**Key Indicators to Monitor:**

- Sudden spike in blocked requests
- Multiple blocks from the same IP/network
- WAF rules triggering on unusual patterns
- CrowdSec decisions for known threat actors
- Rate limiting thresholds exceeded

**Log Analysis:**

```bash
# View recent security events
docker logs charon | grep -E "(BLOCK|DENY|ERROR)" | tail -100

# Check CrowdSec decisions
docker exec charon cscli decisions list

# Review WAF activity
docker exec charon tail -50 /var/log/coraza-waf.log
```

### Manual Detection

**Regular Security Reviews:**

- [ ] Weekly review of Cerberus Dashboard
- [ ] Monthly review of access patterns
- [ ] Quarterly penetration testing
- [ ] Annual security audit

---

## Containment Procedures

### Immediate Actions (All Severity Levels)

1. **Document the incident start time**
2. **Preserve evidence** — Do NOT restart containers until logs are captured
3. **Assess scope** — Determine affected systems and data

### P1/P2 Containment

**Step 1: Isolate the Threat**

```bash
# Block the attacking IP immediately
docker exec charon cscli decisions add --ip <ATTACKER_IP> --duration 720h --reason "Incident response"

# If compromise is confirmed, stop the container
docker stop charon

# Preserve container state for forensics
docker commit charon charon-incident-$(date +%Y%m%d%H%M%S)
```

**Step 2: Preserve Evidence**

```bash
# Export all logs
docker logs charon > /tmp/incident-logs-$(date +%Y%m%d%H%M%S).txt 2>&1

# Export CrowdSec decisions
docker exec charon cscli decisions list -o json > /tmp/crowdsec-decisions.json

# Copy the data directory
cp -r ./charon-data /tmp/incident-backup-$(date +%Y%m%d%H%M%S)
```

**Step 3: Notify Stakeholders**

- System administrators
- Security team (if applicable)
- Management (P1 only)
- Legal/compliance (if data breach)

### P3/P4 Containment

1. Block offending IPs via the Cerberus Dashboard
2. Review and update access lists if needed
3. Document the event in the incident log
4. Continue monitoring

---

## Recovery Steps

### Pre-Recovery Checklist

- [ ] Incident fully contained
- [ ] Evidence preserved
- [ ] Root cause identified (or investigation ongoing)
- [ ] Clean backups available

### Recovery Procedure

**Step 1: Verify Backup Integrity**

```bash
# List available backups
ls -la ./charon-data/backups/

# Verify the backup can be read
docker run --rm -v "$(pwd)/charon-data/backups":/backups debian:bookworm-slim ls -la /backups
```

**Step 2: Restore from Clean State**

```bash
# Stop the compromised instance
docker stop charon

# Rename the compromised data (note: this also moves ./charon-data/backups — copy any backup you need out first)
mv ./charon-data ./charon-data-compromised-$(date +%Y%m%d)

# Restore from a known-clean backup
cp -r ./charon-data-backup-YYYYMMDD ./charon-data

# Start a fresh instance
docker-compose up -d
```

**Step 3: Apply Security Hardening**

1. Review and update all access lists
2. Rotate any potentially compromised credentials
3. Update Charon to the latest version
4. Enable additional security features if not already active

**Step 4: Verify Recovery**

```bash
# Check Charon is running
docker ps | grep charon

# Verify LAPI status
docker exec charon cscli lapi status

# Test proxy functionality
curl -I https://your-proxied-domain.com
```

### Communication During Recovery

- Update stakeholders every 30 minutes (P1) or hourly (P2)
- Document all recovery actions taken
- Prepare user communication if service was affected

---

## Post-Incident Review

### Review Meeting Agenda

Schedule within 48 hours of incident resolution.

**Attendees:** All involved responders, system owners, management (P1/P2)

**Agenda:**

1. Incident timeline reconstruction
2. What worked well?
3. What could be improved?
4. Action items and owners
5. Documentation updates needed

### Post-Incident Checklist

- [ ] Incident fully documented
- [ ] Timeline created with all actions taken
- [ ] Root cause analysis completed
- [ ] Lessons learned documented
- [ ] Security controls reviewed and updated
- [ ] Monitoring/alerting improved
- [ ] Team training needs identified
- [ ] Documentation updated

### Incident Report Template

```markdown
## Incident Report: [INCIDENT-YYYY-MM-DD-###]

**Severity:** P1/P2/P3/P4
**Status:** Resolved / Under Investigation
**Duration:** [Start Time] to [End Time]

### Summary
[Brief description of what happened]

### Timeline
- [HH:MM] - Event detected
- [HH:MM] - Containment initiated
- [HH:MM] - Root cause identified
- [HH:MM] - Recovery completed

### Impact
- Systems affected: [List]
- Data affected: [Yes/No, details]
- Users affected: [Count/scope]
- Service downtime: [Duration]

### Root Cause
[Technical explanation of what caused the incident]

### Actions Taken
1. [Action 1]
2. [Action 2]
3. [Action 3]

### Lessons Learned
- [Learning 1]
- [Learning 2]

### Follow-up Actions
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| [Action] | [Name] | [Date] | Open |
```

---

## Communication Templates

### Internal Notification (P1/P2)

```
SECURITY INCIDENT ALERT

Severity: [P1/P2]
Time Detected: [YYYY-MM-DD HH:MM UTC]
Status: [Active / Contained / Resolved]

Summary: [Brief description]

Current Actions:
- [Action being taken]

Next Update: [Time]
Contact: [Incident Commander Name/Channel]
```

### User Communication (If Service Affected)

```
Service Notification

We are currently experiencing [brief issue description].

Status: [Investigating / Identified / Monitoring / Resolved]
Started: [Time]
Expected Resolution: [Time or "Under investigation"]

We apologize for any inconvenience and will provide updates as available.

Last Updated: [Time]
```

### Post-Incident Summary (External)

```
Security Incident Summary

On [Date], we identified and responded to a security incident affecting [scope].

What happened: [Non-technical summary]

What we did: [Response actions taken]

What we're doing to prevent this: [Improvements being made]

Was my data affected? [Clear statement about data impact]

Questions? [Contact information]
```

---

## Emergency Contacts

Maintain an up-to-date contact list:

| Role | Contact Method | Escalation Time |
|------|----------------|-----------------|
| Primary On-Call | [Phone/Pager] | Immediate |
| Security Team | [Email/Slack] | < 15 min (P1/P2) |
| System Administrator | [Phone] | < 1 hour |
| Management | [Phone] | P1 only |

---

## Quick Reference Card

### P1 Critical — Immediate Response

1. ⏱️ Start a timer, document everything
2. 🔒 Isolate: `docker stop charon`
3. 📋 Preserve: `docker logs charon > incident.log`
4. 📞 Notify: Security team, management
5. 🔍 Investigate: Determine scope and root cause
6. 🔧 Recover: Restore from a clean backup
7. 📝 Review: Post-incident meeting within 48h

### P2 High — Urgent Response

1. 🔒 Block the attacker: `cscli decisions add --ip <ATTACKER_IP>`
2. 📋 Capture logs before they rotate
3. 📞 Notify: Security team
4. 🔍 Investigate the root cause
5. 🔧 Apply fixes
6. 📝 Document and review

### Key Commands

```bash
# Block an IP immediately
docker exec charon cscli decisions add --ip <ATTACKER_IP> --duration 720h

# List all active blocks
docker exec charon cscli decisions list

# Export logs
docker logs charon > incident-$(date +%s).log 2>&1

# Check security status
docker exec charon cscli lapi status
```

---

## Document Maintenance

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-21 | Security Team | Initial SIRP creation |

**Review Schedule:** Quarterly or after any P1/P2 incident
**Owner:** Security Team
**Last Reviewed:** 2025-12-21
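The P1/P2 containment and evidence-preservation commands in this plan lend themselves to a single script, so responders are not typing them by hand under pressure. A minimal sketch, assuming the container is named `charon` as throughout this document; the `DOCKER` override, the `capture_evidence` function name, and the `/tmp/incident-*` output location are illustrative choices, not part of Charon:

```bash
#!/bin/sh
# capture-evidence.sh — sketch of the P1/P2 containment and evidence steps.
# DOCKER is overridable (e.g. DOCKER=echo for a dry run); defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

capture_evidence() {
    attacker_ip="$1"                      # IP to block (required)
    ts=$(date +%Y%m%d%H%M%S)              # one timestamp shared by all artifacts
    outdir="/tmp/incident-$ts"
    mkdir -p "$outdir"

    # 1. Block the attacking IP via CrowdSec
    $DOCKER exec charon cscli decisions add \
        --ip "$attacker_ip" --duration 720h --reason "Incident response"

    # 2. Preserve logs and decisions BEFORE any container restart
    $DOCKER logs charon > "$outdir/charon-logs.txt" 2>&1
    $DOCKER exec charon cscli decisions list -o json > "$outdir/crowdsec-decisions.json"

    # 3. Freeze container state for forensics
    $DOCKER commit charon "charon-incident-$ts"

    echo "$outdir"                        # tell the responder where the evidence is
}
```

Running with `DOCKER=echo` makes the script echo the docker commands instead of executing them, which doubles as a dry-run mode for testing the runbook.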
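The incident report template above can be stamped out mechanically at the start of a response. A small helper sketch; the `new_incident_report` function name and the `incident-<date>-<seq>.md` file naming are illustrative assumptions, only the `[INCIDENT-YYYY-MM-DD-###]` header convention comes from the template itself:

```bash
#!/bin/sh
# new-incident-report.sh — create a skeleton report from the template in this plan.

new_incident_report() {
    severity="$1"               # P1 / P2 / P3 / P4
    seq="${2:-001}"             # per-day sequence number (the ### in the template)
    day=$(date -u +%Y-%m-%d)    # UTC date, matching the template's ID format
    file="incident-$day-$seq.md"

    cat > "$file" <<EOF
## Incident Report: [INCIDENT-$day-$seq]

**Severity:** $severity
**Status:** Under Investigation
**Duration:** [Start Time] to [End Time]

### Summary
[Brief description of what happened]

### Timeline
- [HH:MM] - Event detected
- [HH:MM] - Containment initiated
- [HH:MM] - Root cause identified
- [HH:MM] - Recovery completed
EOF

    echo "$file"                # print the path so the responder can open it
}
```

For example, `new_incident_report P2 003` writes a pre-filled skeleton for today's third incident and prints the created filename.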