feat: add additional security enhancements (Issue #365)

- Add constant-time token comparison utility (crypto/subtle)
- Add SBOM generation and attestation to CI/CD pipeline
- Document TLS enforcement, DNS security (DoH/DoT), and container hardening
- Create Security Incident Response Plan (SIRP)
- Add security update notification documentation

Security enhancements:
- Mitigates timing attacks on invite token validation
- Provides supply chain transparency with CycloneDX SBOM
- Documents production container hardening (read_only, cap_drop)

Closes #365
This commit is contained in:
GitHub Actions
2025-12-21 19:00:29 +00:00
parent 84a8c1ff11
commit 2dfe7ee241
12 changed files with 1046 additions and 290 deletions

---
title: Security Incident Response Plan
description: Industry-standard incident response procedures for Charon deployments, including detection, containment, recovery, and post-incident review.
---
## Security Incident Response Plan (SIRP)
This document provides a structured approach to handling security incidents in Charon deployments. Following these procedures ensures consistent, effective responses that minimize damage and recovery time.
---
## Incident Classification
### Severity Levels
| Level | Name | Description | Response Time | Examples |
|-------|------|-------------|---------------|----------|
| **P1** | Critical | Active exploitation, data breach, or complete service compromise | Immediate (< 15 min) | Confirmed data exfiltration, ransomware, root access compromise |
| **P2** | High | Attempted exploitation, security control bypass, or significant vulnerability | < 1 hour | WAF bypass detected, brute-force attack in progress, credential stuffing |
| **P3** | Medium | Suspicious activity, minor vulnerability, or policy violation | < 4 hours | Unusual traffic patterns, failed authentication spike, misconfiguration |
| **P4** | Low | Informational security events, minor policy deviations | < 24 hours | Routine blocked requests, scanner traffic, expired certificates |
### Classification Criteria
**Escalate to P1 immediately if:**
- ❌ Confirmed unauthorized access to sensitive data
- ❌ Active malware or ransomware detected
- ❌ Complete loss of security controls
- ❌ Evidence of data exfiltration
- ❌ Critical infrastructure compromise
**Escalate to P2 if:**
- ⚠️ Multiple failed bypass attempts from the same source
- ⚠️ Vulnerability actively being probed
- ⚠️ Partial security control failure
- ⚠️ Credential compromise suspected
---
## Detection Methods
### Automated Detection
**Cerberus Security Dashboard:**
1. Navigate to **Cerberus → Dashboard**
2. Monitor the **Live Activity** section for real-time events
3. Review **Security → Decisions** for blocked requests
4. Check alert notifications (Discord, Slack, email)
**Key Indicators to Monitor:**
- Sudden spike in blocked requests
- Multiple blocks from the same IP/network
- WAF rules triggering on unusual patterns
- CrowdSec decisions for known threat actors
- Rate limiting thresholds exceeded
**Log Analysis:**
```bash
# View recent security events
docker logs charon 2>&1 | grep -E "(BLOCK|DENY|ERROR)" | tail -100
# Check CrowdSec decisions
docker exec charon cscli decisions list
# Review WAF activity
docker exec charon cat /var/log/coraza-waf.log | tail -50
```
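To spot the indicator patterns above (sudden spikes, repeat offenders), the raw log commands can be summarized into a per-source count. This is a sketch that assumes each BLOCK/DENY line contains the client's IPv4 address somewhere in the text; adjust the patterns to your actual log format.

```bash
# Summarize blocked-request sources from log lines on stdin.
# Assumes blocked events contain "BLOCK" or "DENY" and an IPv4 address.
top_blockers() {
  grep -E '(BLOCK|DENY)' \
    | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
    | sort | uniq -c | sort -rn | head -10
}

# Example usage against live logs:
#   docker logs charon 2>&1 | top_blockers
```

A source that suddenly dominates this list is a candidate for an immediate `cscli decisions add` block.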
### Manual Detection
**Regular Security Reviews:**
- [ ] Weekly review of Cerberus Dashboard
- [ ] Monthly review of access patterns
- [ ] Quarterly penetration testing
- [ ] Annual security audit
---
## Containment Procedures
### Immediate Actions (All Severity Levels)
1. **Document the incident start time**
2. **Preserve evidence** — Do NOT restart containers until logs are captured
3. **Assess scope** — Determine affected systems and data
### P1/P2 Containment
**Step 1: Isolate the Threat**
```bash
# Block attacking IP immediately
docker exec charon cscli decisions add --ip <ATTACKER_IP> --duration 720h --reason "Incident response"
# If compromise confirmed, stop the container
docker stop charon
# Preserve container state for forensics
docker commit charon charon-incident-$(date +%Y%m%d%H%M%S)
```
**Step 2: Preserve Evidence**
```bash
# Export all logs
docker logs charon > /tmp/incident-logs-$(date +%Y%m%d%H%M%S).txt 2>&1
# Export CrowdSec decisions
docker exec charon cscli decisions list -o json > /tmp/crowdsec-decisions.json
# Copy data directory
cp -r ./charon-data /tmp/incident-backup-$(date +%Y%m%d%H%M%S)
```
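Collected evidence is more defensible if its integrity can be proven later. A minimal sketch that records SHA-256 hashes of everything in an evidence directory (the `evidence_manifest` helper is illustrative, not part of Charon):

```bash
# Record SHA-256 hashes of every file in the evidence directory so
# later tampering can be detected. The manifest is written next to the
# directory, not inside it, so it does not hash itself.
evidence_manifest() {
  dir="$1"
  ( cd "$dir" && find . -type f -exec sha256sum {} + ) \
    > "$dir/../manifest-$(date +%Y%m%d%H%M%S).sha256"
}

# Example:
#   evidence_manifest /tmp/incident-backup-20251221190029
# Verify later with:
#   ( cd <evidence-dir> && sha256sum -c <manifest-file> )
```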
**Step 3: Notify Stakeholders**
- System administrators
- Security team (if applicable)
- Management (P1 only)
- Legal/compliance (if data breach)
### P3/P4 Containment
1. Block offending IPs via Cerberus Dashboard
2. Review and update access lists if needed
3. Document the event in incident log
4. Continue monitoring
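Step 3 can be as simple as an append-only flat file, which keeps lower-severity events reviewable at the weekly dashboard review. The `INCIDENT_LOG` path and `log_incident` helper below are hypothetical names, shown as one possible convention:

```bash
# Append-only incident log; entries are timestamped and pipe-delimited.
# INCIDENT_LOG is a hypothetical path — point it wherever you keep records.
INCIDENT_LOG="${INCIDENT_LOG:-./incident-log.txt}"

log_incident() {
  severity="$1"; shift
  # Append only — never overwrite history.
  printf '%s|%s|%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$severity" "$*" \
    >> "$INCIDENT_LOG"
}

# Example:
#   log_incident P4 "Blocked scanner traffic from 203.0.113.7"
```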
---
## Recovery Steps
### Pre-Recovery Checklist
- [ ] Incident fully contained
- [ ] Evidence preserved
- [ ] Root cause identified (or investigation ongoing)
- [ ] Clean backups available
### Recovery Procedure
**Step 1: Verify Backup Integrity**
```bash
# List available backups
ls -la ./charon-data/backups/
# Verify backup can be read
docker run --rm -v "$(pwd)/charon-data/backups:/backups" alpine ls -la /backups
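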
```
**Step 2: Restore from Clean State**
```bash
# Stop compromised instance
docker stop charon
# Rename compromised data
mv ./charon-data ./charon-data-compromised-$(date +%Y%m%d)
# Restore from backup
cp -r ./charon-data-backup-YYYYMMDD ./charon-data
# Start fresh instance
docker compose up -d   # "docker-compose" on older Compose V1 installs
```
**Step 3: Apply Security Hardening**
1. Review and update all access lists
2. Rotate any potentially compromised credentials
3. Update Charon to latest version
4. Enable additional security features if not already active
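For step 2, replacement secrets should come from a cryptographically strong source rather than being hand-picked or reused. A hedged example using `openssl` (the variable name is illustrative; rotate the value into your actual secret store):

```bash
# Generate a replacement secret with 256 bits of entropy.
new_token=$(openssl rand -hex 32)
echo "Rotate this into your secret store: $new_token"
```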
**Step 4: Verify Recovery**
```bash
# Check Charon is running
docker ps | grep charon
# Verify LAPI status
docker exec charon cscli lapi status
# Test proxy functionality
curl -I https://your-proxied-domain.com
```
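Services may fail the checks above transiently while containers warm up, so it is worth retrying before declaring recovery incomplete. `wait_until` is an illustrative helper and the URL is a placeholder for your own domain:

```bash
# Retry a command until it succeeds or the attempt limit is reached.
wait_until() {
  tries="$1"; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 2
  done
}

# Example: give the proxy up to ~60s to come back
#   wait_until 30 curl -fsI https://your-proxied-domain.com >/dev/null
```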
### Communication During Recovery
- Update stakeholders every 30 minutes (P1) or hourly (P2)
- Document all recovery actions taken
- Prepare user communication if service was affected
---
## Post-Incident Review
### Review Meeting Agenda
Schedule within 48 hours of incident resolution.
**Attendees:** All involved responders, system owners, management (P1/P2)
**Agenda:**
1. Incident timeline reconstruction
2. What worked well?
3. What could be improved?
4. Action items and owners
5. Documentation updates needed
### Post-Incident Checklist
- [ ] Incident fully documented
- [ ] Timeline created with all actions taken
- [ ] Root cause analysis completed
- [ ] Lessons learned documented
- [ ] Security controls reviewed and updated
- [ ] Monitoring/alerting improved
- [ ] Team training needs identified
- [ ] Documentation updated
### Incident Report Template
```markdown
## Incident Report: [INCIDENT-YYYY-MM-DD-###]
**Severity:** P1/P2/P3/P4
**Status:** Resolved / Under Investigation
**Duration:** [Start Time] to [End Time]
### Summary
[Brief description of what happened]
### Timeline
- [HH:MM] - Event detected
- [HH:MM] - Containment initiated
- [HH:MM] - Root cause identified
- [HH:MM] - Recovery completed
### Impact
- Systems affected: [List]
- Data affected: [Yes/No, details]
- Users affected: [Count/scope]
- Service downtime: [Duration]
### Root Cause
[Technical explanation of what caused the incident]
### Actions Taken
1. [Action 1]
2. [Action 2]
3. [Action 3]
### Lessons Learned
- [Learning 1]
- [Learning 2]
### Follow-up Actions
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| [Action] | [Name] | [Date] | Open |
```
---
## Communication Templates
### Internal Notification (P1/P2)
```
SECURITY INCIDENT ALERT
Severity: [P1/P2]
Time Detected: [YYYY-MM-DD HH:MM UTC]
Status: [Active / Contained / Resolved]
Summary:
[Brief description]
Current Actions:
- [Action being taken]
Next Update: [Time]
Contact: [Incident Commander Name/Channel]
```
### User Communication (If Service Affected)
```
Service Notification
We are currently experiencing [brief issue description].
Status: [Investigating / Identified / Monitoring / Resolved]
Started: [Time]
Expected Resolution: [Time or "Under investigation"]
We apologize for any inconvenience and will provide updates as available.
Last Updated: [Time]
```
### Post-Incident Summary (External)
```
Security Incident Summary
On [Date], we identified and responded to a security incident affecting [scope].
What happened:
[Non-technical summary]
What we did:
[Response actions taken]
What we're doing to prevent this:
[Improvements being made]
Was my data affected?
[Clear statement about data impact]
Questions?
[Contact information]
```
---
## Emergency Contacts
Maintain an up-to-date contact list:
| Role | Contact Method | Escalation Time |
|------|----------------|-----------------|
| Primary On-Call | [Phone/Pager] | Immediate |
| Security Team | [Email/Slack] | < 15 min (P1/P2) |
| System Administrator | [Phone] | < 1 hour |
| Management | [Phone] | P1 only |
---
## Quick Reference Card
### P1 Critical — Immediate Response
1. ⏱️ Start timer, document everything
2. 🔒 Isolate: `docker stop charon`
3. 📋 Preserve: `docker logs charon > incident.log`
4. 📞 Notify: Security team, management
5. 🔍 Investigate: Determine scope and root cause
6. 🔧 Recover: Restore from clean backup
7. 📝 Review: Post-incident meeting within 48h
### P2 High — Urgent Response
1. 🔒 Block attacker: `docker exec charon cscli decisions add --ip <IP>`
2. 📋 Capture logs before they rotate
3. 📞 Notify: Security team
4. 🔍 Investigate root cause
5. 🔧 Apply fixes
6. 📝 Document and review
### Key Commands
```bash
# Block IP immediately
docker exec charon cscli decisions add --ip <IP> --duration 720h
# List all active blocks
docker exec charon cscli decisions list
# Export logs
docker logs charon > incident-$(date +%s).log 2>&1
# Check security status
docker exec charon cscli lapi status
```
---
## Document Maintenance
| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-12-21 | Security Team | Initial SIRP creation |
**Review Schedule:** Quarterly or after any P1/P2 incident
**Owner:** Security Team
**Last Reviewed:** 2025-12-21