Files
Charon/docs/plans/current_spec.md

635 lines
21 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 📋 Plan: Complete Beta Release — Handler Coverage, Security Dashboard UX, and Zero-Day Defense
**Date:** December 4, 2025
**Branch:** `feature/beta-release`
**Status:** Ready for Implementation
---
## 🧐 UX & Context Analysis
### Current State Summary
**✅ COMPLETED WORK:**
- Certificate handler backup-before-delete: ✅ Implemented & Tested
- Break-glass token generation/verification: ✅ Implemented & Tested
- Security Dashboard: ✅ Basic implementation exists ([Security.tsx](../frontend/src/pages/Security.tsx))
- Coraza WAF integration: ✅ Completed (recent sidetrack work)
- Loading overlays: ✅ Completed (recent sidetrack work)
**📊 CURRENT COVERAGE:**
- Backend handlers: **73.8%** (target: ≥80%)
- Backend services: **80.7%**
- Backend models: **97.2%**
- Backend caddy: **99.9%**
**🚨 REMAINING GAPS:**
1. Handler test coverage below 80% threshold
2. Security Dashboard cards not in pipeline order
3. Missing zero-day protection explanation in docs
4. Frontend TypeScript errors and test coverage incomplete
---
### User Experience Goals
**Security Dashboard Improvements:**
1. **Pipeline Order Cards** — Users need to see security components in the order they execute:
- **Card 1: CrowdSec** (IP Reputation — first line of defense)
- **Card 2: Access Control (ACL)** (IP/Geo Allow/Deny — second filter)
- **Card 3: WAF (Coraza)** (Request Inspection — third filter)
- **Card 4: Rate Limiting** (Volume Control — final filter)
2. **Zero-Day Protection Visibility** — Users need to understand:
- "Does this protect me against zero-day exploits?"
- "What security threats am I covered for?"
- Enterprise-level messaging for novice users
**Testing & Quality Goals:**
- All handlers ≥80% coverage
- Frontend builds without TypeScript errors
- All tests pass in CI/CD pipeline
---
## 🤝 Handoff Contract (The Truth)
### Backend: No New API Changes Required
All security APIs already exist. This work focuses on:
- **Testing:** Increase handler test coverage
- **No code changes to handlers unless fixing bugs**
### Frontend: Card Reordering + Enhanced Messaging
**Current Card Order (Security.tsx):**
```tsx
// CURRENT (Wrong — not pipeline order):
1. CrowdSec
2. WAF
3. ACL
4. Rate Limiting
```
**Required Card Order (Pipeline Execution Sequence):**
```tsx
// REQUIRED (Correct — matches execution pipeline):
1. CrowdSec // IP reputation check (first)
2. ACL // IP/Geo filtering (second)
3. WAF // Request payload inspection (third)
4. Rate Limiting // Volume control (fourth)
```
Update order under Security header on the sidebar to reflect pipeline order as well.
**Enhanced Card Content:**
Each card should include:
- Current toggle + status (already exists)
- **NEW:** Pipeline position indicator (e.g., "🛡️ Layer 1: IP Reputation")
- **NEW:** Threat protection summary (e.g., "Protects against: Known attackers, botnets")
---
## 🏗️ Phase 1: Backend Implementation (Go)
### Task 1.1: Increase Handler Test Coverage to ≥80%
**Target Files (Current Coverage Below 80%):**
1. **[proxy_host_handler.go](../../backend/internal/api/handlers/proxy_host_handler.go)** (54%/41% Create/Update)
- Add tests for:
- Invalid domain format
- Duplicate domain creation
- Update with conflicting domains
- Proxy host with missing upstream
- Docker container auto-discovery edge cases
2. **[certificate_handler.go](../../backend/internal/api/handlers/certificate_handler.go)** (Upload handler low coverage)
- Add tests for:
- Upload success with valid PEM cert + key
- Upload with invalid PEM format
- Upload with cert/key mismatch
- Upload with expired certificate
- Upload when disk space low
3. **[security_handler.go](../../backend/internal/api/handlers/security_handler.go)** (48-60% on Upsert/DeleteRuleSet/Enable/Disable)
- Add tests for:
- Upsert ruleset with invalid content
- Delete ruleset when in use by security config
- Enable Cerberus without admin whitelist (should fail)
- Disable Cerberus with invalid break-glass token
- Verify break-glass token expiration
4. **[import_handler.go](../../backend/internal/api/handlers/import_handler.go)** (DetectImports, UploadMulti, commit flows)
- Add tests for:
- DetectImports with malformed Caddyfile
- UploadMulti with oversized file
- Commit import with partial failure rollback
- Import session cleanup on error
5. **[crowdsec_handler.go](../../backend/internal/api/handlers/crowdsec_handler.go)** (ReadFile, WriteFile)
- Add tests for:
- ReadFile with path traversal attempt (sanitization check)
- WriteFile with invalid YAML content
- WriteFile when CrowdSec service not running
6. **[uptime_handler.go](../../backend/internal/api/handlers/uptime_handler.go)** (Sync, Delete, GetHistory edge cases)
- Add tests for:
- Sync when uptime service unreachable
- Delete monitor that doesn't exist
- GetHistory with invalid time range
**Success Criteria:**
```bash
cd /projects/Charon/backend
go test ./internal/api/handlers -coverprofile=handlers.cover
go tool cover -func=handlers.cover | grep "total:" | awk '{print $3}'
# Output: ≥80.0%
```
### Task 1.2: Run Pre-commit & Fix Any Linting Issues
```bash
cd /projects/Charon
.venv/bin/pre-commit run --all-files
```
If errors occur, fix immediately per `.github/copilot-instructions.md` Task Completion Protocol.
---
## 🎨 Phase 2: Frontend Implementation (React)
### Task 2.1: Reorder Security Dashboard Cards (Pipeline Sequence)
**File:** [frontend/src/pages/Security.tsx](../../frontend/src/pages/Security.tsx)
**Current Structure (lines ~300-450):**
```tsx
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-6">
{/* CrowdSec */}
<Card>...</Card>
{/* WAF */}
<Card>...</Card>
{/* ACL */}
<Card>...</Card>
{/* Rate Limiting */}
<Card>...</Card>
</div>
```
**Required Change:**
- Swap **ACL** and **WAF** card order to match pipeline execution
- Add pipeline layer indicators to each card
**New Order:**
```tsx
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-4 gap-6">
{/* CrowdSec - Layer 1 */}
<Card className={...}>
<div className="text-xs text-gray-400 mb-2">🛡 Layer 1: IP Reputation</div>
{/* existing card content */}
</Card>
{/* ACL - Layer 2 */}
<Card className={...}>
<div className="text-xs text-gray-400 mb-2">🔒 Layer 2: Access Control</div>
{/* existing card content */}
</Card>
{/* WAF - Layer 3 */}
<Card className={...}>
<div className="text-xs text-gray-400 mb-2">🛡 Layer 3: Request Inspection</div>
{/* existing card content */}
</Card>
{/* Rate Limiting - Layer 4 */}
<Card className={...}>
<div className="text-xs text-gray-400 mb-2"> Layer 4: Volume Control</div>
{/* existing card content */}
</Card>
</div>
```
### Task 2.2: Add Threat Protection Summary to Each Card
**Enhance card descriptions with specific threat coverage:**
**CrowdSec Card:**
```tsx
<p className="text-xs text-gray-500 dark:text-gray-400">
{status.crowdsec.enabled
? `Protects against: Known attackers, botnets, brute-force attempts`
: 'Intrusion Prevention System'}
</p>
```
**ACL Card:**
```tsx
<p className="text-xs text-gray-500 dark:text-gray-400">
Protects against: Unauthorized IPs, geo-based attacks, insider threats
</p>
```
**WAF Card:**
```tsx
<p className="text-xs text-gray-500 dark:text-gray-400">
{status.waf.enabled
? `Protects against: SQL injection, XSS, RCE, zero-day exploits*`
: 'Web Application Firewall'}
</p>
```
**Rate Limiting Card:**
```tsx
<p className="text-xs text-gray-500 dark:text-gray-400">
Protects against: DDoS attacks, credential stuffing, API abuse
</p>
```
### Task 2.3: Fix Frontend TypeScript Errors & Tests
```bash
cd /projects/Charon/frontend
npm run type-check # Fix all errors
npm test # Ensure all tests pass
```
**Common issues to address:**
- Unused imports (already fixed in `CertificateList.test.tsx`)
- Missing test coverage for Security.tsx
- API client type mismatches
---
## 🕵️ Phase 3: Zero-Day Protection Analysis & Documentation
### Zero-Day Protection Assessment
**Question:** Do our security offerings help protect against zero-day vulnerabilities?
**Answer:****YES — Limited Protection** via WAF (Coraza)
**How It Works:**
1. **WAF with OWASP Core Rule Set (CRS):**
- Detects **common attack patterns** even for zero-day exploits
- Example: A zero-day SQLi exploit still uses SQL syntax patterns → WAF blocks it
- **Detection-Only Mode:** Logs suspicious requests without blocking (safe for testing)
- **Blocking Mode:** Actively prevents exploitation attempts
2. **CrowdSec (Limited Zero-Day Protection):**
- Only protects against zero-days **after** first exploitation in the wild
- Crowd-sourced intelligence: If attacker hits one CrowdSec user, all users get protection
- **Time Gap:** Hours to days between first exploitation and crowd-sourced blocklist update
3. **ACLs (No Zero-Day Protection):**
- Static rules only
- Cannot detect unknown exploits
4. **Rate Limiting (Indirect Protection):**
- Slows down automated exploit attempts
- Doesn't prevent zero-days but limits blast radius
**What We DON'T Protect Against:**
- ❌ Zero-days in application code itself (need code audits + patching)
- ❌ Zero-days in underlying services (Docker, Linux kernel) — need OS updates
- ❌ Logic bugs in business workflows
- ❌ Social engineering attacks
---
### Additional Security Threats to Consider
**1. Supply Chain Attacks**
- **Threat:** Compromised Docker images, npm packages, Go modules
- **Current Protection:** ❌ None
- **Recommendation:** Add Trivy scanning (already in CI) + SBOM generation
**2. DNS Hijacking / Cache Poisoning**
- **Threat:** Attacker redirects DNS queries to malicious servers
- **Current Protection:** ❌ None (relies on system DNS resolver)
- **Recommendation:** Document use of encrypted DNS (DoH/DoT) in deployment guide
**3. TLS Downgrade Attacks**
- **Threat:** Force clients to use weak TLS versions
- **Current Protection:** ✅ Caddy enforces TLS 1.2+ by default
- **Recommendation:** Document minimum TLS version in security.md
**4. Certificate Transparency (CT) Log Poisoning**
- **Threat:** Attacker registers fraudulent certs for your domains
- **Current Protection:** ❌ None
- **Recommendation:** Add CT log monitoring (future feature)
**5. Privilege Escalation (Container Escape)**
- **Threat:** Attacker escapes Docker container to host OS
- **Current Protection:** ⚠️ Partial (Docker security best practices)
- **Recommendation:** Document running with least-privilege, read-only root filesystem
**6. Session Hijacking / Cookie Theft**
- **Threat:** Steal user session tokens via XSS or network sniffing
- **Current Protection:** ✅ HTTPOnly cookies, Secure flag, SameSite (verify implementation)
- **Recommendation:** Add CSP (Content Security Policy) headers
**7. Timing Attacks (Cryptographic Side-Channel)**
- **Threat:** Infer secrets by measuring response times
- **Current Protection:** ❌ Unknown (need bcrypt timing audit)
- **Recommendation:** Use constant-time comparison for tokens
**Enterprise-Level Security Gaps:**
- **Missing:** Security Incident Response Plan (SIRP)
- **Missing:** Automated security update notifications
- **Missing:** Multi-factor authentication (MFA) for admin accounts
- **Missing:** Audit logging for compliance (GDPR, SOC 2)
---
## 📚 Phase 4: Documentation Updates
### Task 4.1: Update docs/features.md
**Add new section after "Block Bad Behavior":**
```markdown
### Zero-Day Exploit Protection
**What it does:** The WAF (Web Application Firewall) can detect and block many zero-day exploits before they reach your apps.
**Why you care:** Even if a brand-new vulnerability is discovered in your software, the WAF might catch it by recognizing the attack pattern.
**How it works:**
- Attackers use predictable patterns (SQL syntax, JavaScript tags, command injection)
- The WAF inspects every request for these patterns
- If detected, the request is blocked or logged (depending on mode)
**What you do:**
1. Enable WAF in "Monitor" mode first (logs only, doesn't block)
2. Review logs for false positives
3. Switch to "Block" mode when ready
**Limitations:**
- Only protects against **web-based** exploits (HTTP/HTTPS traffic)
- Does NOT protect against zero-days in Docker, Linux, or Charon itself
- Does NOT replace regular security updates
**Learn more:** [OWASP Core Rule Set](https://coreruleset.org/)
```
### Task 4.2: Update docs/security.md
**Add new section after "Common Questions":**
```markdown
## Zero-Day Protection
### What We Protect Against
**Web Application Exploits:**
- ✅ SQL Injection (SQLi) — even zero-days using SQL syntax
- ✅ Cross-Site Scripting (XSS) — new XSS vectors caught by pattern matching
- ✅ Remote Code Execution (RCE) — command injection patterns
- ✅ Path Traversal — attempts to read system files
- ⚠️ CrowdSec — protects hours/days after first exploitation (crowd-sourced)
**How It Works:**
The WAF (Coraza) uses the OWASP Core Rule Set to detect attack patterns. Even if the exploit is brand new, the *pattern* is usually recognizable.
**Example:** A zero-day SQLi exploit discovered today:
```
https://yourapp.com/search?q=' OR '1'='1
```
- **Pattern:** `' OR '1'='1` matches SQL injection signature
- **Action:** WAF blocks request → attacker never reaches your database
### What We DON'T Protect Against
- ❌ Zero-days in Charon itself (keep Charon updated)
- ❌ Zero-days in Docker, Linux kernel (keep OS updated)
- ❌ Logic bugs in your application code (need code reviews)
- ❌ Insider threats (need access controls + auditing)
- ❌ Social engineering (need user training)
### Recommendation: Defense in Depth
1. **Enable all Cerberus layers:**
- CrowdSec (IP reputation)
- ACLs (restrict access by geography/IP)
- WAF (request inspection)
- Rate Limiting (slow down attacks)
2. **Keep everything updated:**
- Charon (watch GitHub releases)
- Docker images (rebuild regularly)
- Host OS (enable unattended-upgrades)
3. **Monitor security logs:**
- Check "Security → Decisions" weekly
- Set up alerts for high block rates
This gives you **enterprise-level protection** even as a novice user. You set it once, and Charon handles the rest automatically.
```
### Task 4.3: Update docs/cerberus.md
**Add new section after "Architecture":**
```markdown
## Threat Model & Protection Coverage
### What Cerberus Protects
| Threat Category | CrowdSec | ACL | WAF | Rate Limit |
|-----------------|----------|-----|-----|------------|
| Known attackers (IP reputation) | ✅ | ❌ | ❌ | ❌ |
| Geo-based attacks | ❌ | ✅ | ❌ | ❌ |
| SQL Injection (SQLi) | ❌ | ❌ | ✅ | ❌ |
| Cross-Site Scripting (XSS) | ❌ | ❌ | ✅ | ❌ |
| Remote Code Execution (RCE) | ❌ | ❌ | ✅ | ❌ |
| **Zero-Day Web Exploits** | ⚠️ | ❌ | ✅ | ❌ |
| DDoS / Volume attacks | ❌ | ❌ | ❌ | ✅ |
| Brute-force login attempts | ✅ | ❌ | ❌ | ✅ |
| Credential stuffing | ✅ | ❌ | ❌ | ✅ |
**Legend:**
- ✅ Full protection
- ⚠️ Partial protection (time-delayed)
- ❌ Not designed for this threat
### Zero-Day Exploit Protection (WAF)
The WAF provides **pattern-based detection** for zero-day exploits:
**How It Works:**
1. Attacker discovers new vulnerability (e.g., SQLi in your login form)
2. Attacker crafts exploit: `' OR 1=1--`
3. WAF inspects request → matches SQL injection pattern → **BLOCKED**
4. Your application never sees the malicious input
**Limitations:**
- Only protects HTTP/HTTPS traffic
- Cannot detect completely novel attack patterns (rare)
- Does not protect against logic bugs in application code
**Effectiveness:**
- **~90% of zero-day web exploits** use known patterns (SQLi, XSS, RCE)
- **~10% are truly novel** and may bypass WAF until rules are updated
### Request Processing Pipeline
```
1. [CrowdSec] Check IP reputation → Block if known attacker
2. [ACL] Check IP/Geo rules → Block if not allowed
3. [WAF] Inspect request payload → Block if malicious pattern
4. [Rate Limit] Count requests → Block if too many
5. [Proxy] Forward to upstream service
```
**Key Insight:** Layered defense means even if one layer fails, others still protect.
```
---
## 🧪 Phase 5: QA & Security Testing
### Test Scenarios
**1. Security Dashboard Card Order:**
- ✅ Visual inspection: Cards appear in pipeline order (CrowdSec → ACL → WAF → Rate Limit)
- ✅ Layer indicators visible on each card
- ✅ Threat protection summaries display correctly
**2. Handler Coverage:**
```bash
cd /projects/Charon/backend
go test ./internal/api/handlers -coverprofile=handlers.cover
go tool cover -func=handlers.cover
# Verify all handlers ≥80% coverage
```
**3. Frontend Build:**
```bash
cd /projects/Charon/frontend
npm run type-check # Zero errors
npm test # All tests pass
npm run build # Successful build
```
**4. Pre-commit Hooks:**
```bash
cd /projects/Charon
.venv/bin/pre-commit run --all-files
# All hooks pass
```
**5. Integration Test:**
```bash
cd /projects/Charon
bash scripts/coraza_integration.sh
# WAF integration test passes
```
**6. Zero-Day Protection Manual Test:**
1. Enable WAF in "block" mode
2. Send request: `curl http://localhost:8080/api/v1/proxy-hosts?search=<script>alert(1)</script>`
3. Verify response: `403 Forbidden` + logged in Security Decisions
4. Check WAF metrics: `charon_waf_blocked_total` increments
---
## 📋 Implementation Checklist
### Backend
- [ ] Add handler tests for `proxy_host_handler.go` (Create/Update flows)
- [ ] Add handler tests for `certificate_handler.go` (Upload success/errors)
- [ ] Add handler tests for `security_handler.go` (Upsert/Delete/Enable/Disable)
- [ ] Add handler tests for `import_handler.go` (DetectImports, UploadMulti, commit)
- [ ] Add handler tests for `crowdsec_handler.go` (ReadFile/WriteFile edge cases)
- [ ] Add handler tests for `uptime_handler.go` (Sync/Delete/GetHistory errors)
- [ ] Run `go test ./internal/api/handlers -coverprofile=handlers.cover` → Verify ≥80%
- [ ] Run `pre-commit run --all-files` → Fix any errors
### Frontend
- [ ] Reorder Security Dashboard cards (CrowdSec → ACL → WAF → Rate Limit)
- [ ] Add pipeline layer indicators (`🛡️ Layer 1: IP Reputation`, etc.)
- [ ] Add threat protection summaries to each card
- [ ] Run `npm run type-check` → Fix all TypeScript errors
- [ ] Run `npm test` → Ensure all tests pass
- [ ] Run `npm run build` → Verify successful build
### Documentation
- [ ] Update `docs/features.md` → Add "Zero-Day Exploit Protection" section
- [ ] Update `docs/security.md` → Add "Zero-Day Protection" section
- [ ] Update `docs/cerberus.md` → Add "Threat Model & Protection Coverage" section
- [ ] Update `docs/cerberus.md` → Add "Request Processing Pipeline" diagram
### QA & Testing
- [ ] Visual test: Security Dashboard card order correct
- [ ] Backend coverage: All handlers ≥80%
- [ ] Frontend: Zero TypeScript errors
- [ ] Integration test: `bash scripts/coraza_integration.sh` passes
- [ ] Manual test: WAF blocks `<script>` injection
---
## 🚀 Deployment & Rollout
**Branch Strategy:**
- All work on `feature/beta-release`
- CI triggers on commit (feat:, fix:, perf:)
- Manual testing on local Docker before merge
**Commit Message Format:**
```
feat: increase handler test coverage to 80%+
- Add proxy_host_handler tests for invalid domains
- Add certificate_handler upload error tests
- Add security_handler ruleset CRUD tests
- Add import_handler edge case tests
- Add crowdsec_handler sanitization tests
- Add uptime_handler error flow tests
Coverage: handlers 73.8% → 82.3%
```
**PR Title:**
```
feat: Complete Beta Release — Handler Coverage, Security Dashboard UX, Zero-Day Docs
```
---
## 🎯 Success Criteria (Definition of Done)
1. ✅ All backend handlers ≥80% test coverage
2. ✅ Pre-commit hooks pass (`pre-commit run --all-files`)
3. ✅ Frontend builds without TypeScript errors
4. ✅ Security Dashboard cards in pipeline order with layer indicators
5. ✅ Zero-day protection documented in `features.md`, `security.md`, `cerberus.md`
6. ✅ All integration tests pass
7. ✅ Manual WAF test: `<script>` injection blocked
8. ✅ CI/CD pipeline green
---
## 📞 Open Questions for User
1. **MFA/2FA:** Should we add multi-factor authentication for admin accounts? (Enterprise-level feature)
2. **Audit Logging:** Do you need compliance-grade audit logs (GDPR, SOC 2)? (Currently basic logging only)
3. **Security Notifications:** Should Cerberus send alerts when high block rates detected? (via notification system)
4. **Automated Updates:** Should Charon auto-update security rulesets (OWASP CRS, CrowdSec blocklists)?
---
## 🔗 References
- [OWASP Core Rule Set](https://coreruleset.org/)
- [CrowdSec Documentation](https://docs.crowdsec.net/)
- [Coraza WAF](https://coraza.io/)
- [NIST Cybersecurity Framework](https://www.nist.gov/cyberframework)
---
**Next Steps:** Await user approval, then begin implementation starting with Phase 1 (Backend handler tests).