Some checks are pending
Go Benchmark / Performance Regression Check (push) Waiting to run
Cerberus Integration / Cerberus Security Stack Integration (push) Waiting to run
Upload Coverage to Codecov / Backend Codecov Upload (push) Waiting to run
Upload Coverage to Codecov / Frontend Codecov Upload (push) Waiting to run
CodeQL - Analyze / CodeQL analysis (go) (push) Waiting to run
CodeQL - Analyze / CodeQL analysis (javascript-typescript) (push) Waiting to run
CrowdSec Integration / CrowdSec Bouncer Integration (push) Waiting to run
Docker Build, Publish & Test / build-and-push (push) Waiting to run
Docker Build, Publish & Test / Security Scan PR Image (push) Blocked by required conditions
Quality Checks / Auth Route Protection Contract (push) Waiting to run
Quality Checks / Codecov Trigger/Comment Parity Guard (push) Waiting to run
Quality Checks / Backend (Go) (push) Waiting to run
Quality Checks / Frontend (React) (push) Waiting to run
Rate Limit integration / Rate Limiting Integration (push) Waiting to run
Security Scan (PR) / Trivy Binary Scan (push) Waiting to run
Supply Chain Verification (PR) / Verify Supply Chain (push) Waiting to run
WAF integration / Coraza WAF Integration (push) Waiting to run
690 lines
27 KiB
Markdown
Executable File
690 lines
27 KiB
Markdown
Executable File
# Duplicate Proxy Host Bug Fix - Simplified Validator (SYSTEMIC ISSUE)
|
|
|
|
**Status**: ACTIVE - MINIMAL FIX APPROACH
|
|
**Priority**: CRITICAL 🔴🔴🔴 - ALL 18 ENABLED PROXY HOSTS DOWN
|
|
**Created**: 2026-01-28
|
|
**Updated**: 2026-01-28 (EXPANDED SCOPE - Systemic issue confirmed)
|
|
**Bug**: Caddy validator rejects emergency+main route pattern for EVERY proxy host (duplicate host with different path constraints)
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
**CRITICAL SYSTEMIC BUG**: Caddy's pre-flight validator rejects the emergency+main route pattern for **EVERY enabled proxy host**. The emergency route (with path matchers) and main route (without path matchers) share the same domain, causing "duplicate host matcher" error on ALL hosts.
|
|
|
|
**Impact**:
|
|
- 🔴🔴🔴 **ZERO routes loaded in Caddy** - ALL proxy hosts are down
|
|
- 🔴 **18 enabled proxy hosts** cannot be activated (not just Host ID 24)
|
|
- 🔴 Entire reverse proxy functionality is non-functional
|
|
- 🟡 Emergency bypass routes blocked for all hosts
|
|
- 🟡 Sequential failures: Host 24 → Host 22 → (pattern repeats for every host)
|
|
- 🟢 Backend health endpoint returns 200 OK (separate container health issue)
|
|
|
|
**Root Cause**: Validator treats ALL duplicate hosts as errors without considering that routes with different path constraints are valid. The emergency+main route pattern is applied to EVERY proxy host by design, causing systematic rejection.
|
|
|
|
**Minimal Fix**: Simplify validator to allow duplicate hosts when ONE has path matchers and ONE doesn't. **This will unblock ALL 18 enabled proxy hosts simultaneously**, restoring full reverse proxy functionality. Full overlap detection is future work.
|
|
|
|
**Database**: NO issues - DNS is already case-insensitive. No migration needed.
|
|
|
|
**Secondary Issues** (tracked but deferred):
|
|
- 🟡 Slow SQL queries (>200ms) on uptime_heartbeats and security_configs tables
|
|
- 🟡 Container health check fails despite 200 OK from health endpoint (may be timeout issue)
|
|
|
|
---
|
|
|
|
## Technical Analysis
|
|
|
|
### Current Route Structure
|
|
|
|
For each proxy host, `GenerateConfig` creates TWO routes with the SAME domain list:
|
|
|
|
1. **Emergency Route** (lines 571-584 in config.go):
|
|
```go
|
|
emergencyRoute := &Route{
|
|
Match: []Match{{
|
|
Host: uniqueDomains, // immaculaterr.hatfieldhosted.com
|
|
Path: emergencyPaths, // /api/v1/emergency/*
|
|
}},
|
|
Handle: emergencyHandlers,
|
|
Terminal: true,
|
|
}
|
|
```
|
|
|
|
2. **Main Route** (lines 586-598 in config.go):
|
|
```go
|
|
route := &Route{
|
|
Match: []Match{{
|
|
Host: uniqueDomains, // immaculaterr.hatfieldhosted.com (DUPLICATE!)
|
|
}},
|
|
Handle: mainHandlers,
|
|
Terminal: true,
|
|
}
|
|
```
|
|
|
|
### Why Validator Fails
|
|
|
|
```go
|
|
// validator.go lines 89-93
|
|
for _, host := range match.Host {
|
|
if seenHosts[host] {
|
|
return fmt.Errorf("duplicate host matcher: %s", host)
|
|
}
|
|
seenHosts[host] = true
|
|
}
|
|
```
|
|
|
|
The validator:
|
|
1. Processes emergency route: adds "immaculaterr.hatfieldhosted.com" to `seenHosts`
|
|
2. Processes main route: sees "immaculaterr.hatfieldhosted.com" again → ERROR
|
|
|
|
The validator does NOT consider:
|
|
- Path matchers that make routes non-overlapping
|
|
- Route ordering/priority (emergency route is checked first)
|
|
- Caddy's native ability to handle this correctly
|
|
|
|
### Why Caddy Handles This Correctly
|
|
|
|
Caddy processes routes in order:
|
|
1. First matches emergency route (host + path): `/api/v1/emergency/*` → bypass security
|
|
2. Falls through to main route (host only): everything else → apply security
|
|
|
|
This is a **valid and intentional design pattern** - the validator is wrong to reject it.
|
|
|
|
---
|
|
|
|
## Solution: Simplified Validator Fix ⭐ CHOSEN APPROACH
|
|
|
|
**Approach**: Minimal fix to allow emergency+main route pattern specifically.
|
|
|
|
**Implementation**:
|
|
- Track hosts seen with path matchers vs without path matchers separately
|
|
- Allow duplicate host if ONE has paths and ONE doesn't (the emergency+main pattern)
|
|
- Reject if both routes have paths OR both have no paths
|
|
|
|
**Pros**:
|
|
- ✅ Minimal change - unblocks ALL 18 proxy hosts simultaneously
|
|
- ✅ Preserves current route structure
|
|
- ✅ Simple logic - easy to understand and maintain
|
|
- ✅ Fixes the systemic design pattern bug affecting entire reverse proxy
|
|
|
|
**Limitations** (Future Work):
|
|
- ⚠️ Does not detect complex path overlaps (e.g., `/api/*` vs `/api/v1/*`)
|
|
- ⚠️ Full path pattern analysis deferred to future enhancement
|
|
- ⚠️ Assumes emergency+main pattern is primary use case
|
|
|
|
**Changes Required**:
|
|
- `backend/internal/caddy/validator.go`: Simplified duplicate detection (two maps: withPaths/withoutPaths)
|
|
- Tests for emergency+main pattern, route ordering, rollback
|
|
|
|
**Deferred**:
|
|
- Database migration (DNS already case-insensitive)
|
|
- Complex path overlap detection (future enhancement)
|
|
|
|
---
|
|
|
|
## Phase 1: Root Cause Verification - SYSTEMIC SCOPE
|
|
|
|
**Objective**: Confirm bug affects ALL enabled proxy hosts and document the systemic failure pattern.
|
|
|
|
**Tasks**:
|
|
|
|
1. **Verify Systemic Impact** ⭐ NEW:
|
|
- [ ] Query database for ALL enabled proxy hosts (should be 18)
|
|
- [ ] Verify Caddy has ZERO routes loaded (admin API check)
|
|
- [ ] Document sequential failure pattern (Host 24 disabled → Host 22 fails next)
|
|
- [ ] Confirm EVERY enabled host triggers same validator error
|
|
- [ ] Test hypothesis: Disable all hosts except one → still fails
|
|
|
|
2. **Reproduce Error on Multiple Hosts**:
|
|
- [ ] Test Host ID 24 (immaculaterr.hatfieldhosted.com) - original failure
|
|
- [ ] Test Host ID 22 (dockhand.hatfieldhosted.com) - second failure after disabling 24
|
|
- [ ] Test at least 3 additional hosts to confirm pattern
|
|
- [ ] Capture full error message from validator for each
|
|
- [ ] Document that error is identical across all hosts
|
|
|
|
3. **Analyze Generated Config for ALL Hosts**:
|
|
- [ ] Add debug logging to `GenerateConfig` before validation
|
|
- [ ] Log `uniqueDomains` list after deduplication for each host
|
|
- [ ] Log complete route structure before sending to validator
|
|
- [ ] Count how many routes contain each domain (should be 2: emergency + main)
|
|
- [ ] Verify emergency+main pattern exists for EVERY proxy host
|
|
|
|
4. **Trace Validation Flow**:
|
|
- [ ] Add debug logging to `validateRoute` function
|
|
- [ ] Log each host as it's added to `seenHosts` map
|
|
- [ ] Log route index and match conditions when duplicate detected
|
|
- [ ] Confirm emergency route (index 0) succeeds for all hosts
|
|
- [ ] Confirm main route (index 1) triggers duplicate error for all hosts
|
|
|
|
**Success Criteria**:
|
|
- ✅ Confirmed: ALL 18 enabled proxy hosts trigger the same error
|
|
- ✅ Confirmed: Caddy has ZERO routes loaded (admin API returns empty)
|
|
- ✅ Confirmed: Sequential failure pattern documented (disable one → next fails)
|
|
- ✅ Confirmed: Emergency+main route pattern exists for EVERY host
|
|
- ✅ Confirmed: Validator rejects at main route (index 1) for all hosts
|
|
- ✅ Confirmed: This is a design pattern bug, not a data issue
|
|
|
|
**Files**:
|
|
- `backend/internal/caddy/config.go` - Add debug logging
|
|
- `backend/internal/caddy/validator.go` - Add debug logging
|
|
- `backend/internal/services/proxyhost_service.go` - Trigger config generation
|
|
- `docs/reports/duplicate_proxy_host_diagnosis.md` - Document systemic findings
|
|
|
|
**Estimated Time**: 30 minutes (increased for systemic verification)
|
|
|
|
---
|
|
|
|
## Phase 2: Fix Validator (Simplified Path Detection)
|
|
|
|
**Objective**: MINIMAL fix to allow emergency+main route pattern (duplicate host where ONE has paths, ONE doesn't).
|
|
|
|
**Implementation Strategy**:
|
|
|
|
Simplify validator to handle the specific emergency+main pattern:
|
|
- Track hosts seen with paths vs without paths
|
|
- Allow duplicate hosts if ONE has path matchers, ONE doesn't
|
|
- This handles emergency route (has paths) + main route (no paths)
|
|
|
|
**Algorithm**:
|
|
|
|
```go
|
|
// Track hosts by whether they have path constraints
|
|
type hostTracking struct {
|
|
withPaths map[string]bool // hosts that have path matchers
|
|
withoutPaths map[string]bool // hosts without path matchers
|
|
}
|
|
|
|
for each route:
|
|
for each match in route.Match:
|
|
for each host:
|
|
hasPaths := len(match.Path) > 0
|
|
|
|
if hasPaths:
|
|
// Check if we've seen this host WITHOUT paths
|
|
if tracking.withoutPaths[host]:
|
|
continue // ALLOWED: emergency (with) + main (without)
|
|
}
|
|
if tracking.withPaths[host]:
|
|
return error("duplicate host with paths")
|
|
}
|
|
tracking.withPaths[host] = true
|
|
} else {
|
|
// Check if we've seen this host WITH paths
|
|
if tracking.withPaths[host]:
|
|
continue // ALLOWED: emergency (with) + main (without)
|
|
}
|
|
if tracking.withoutPaths[host]:
|
|
return error("duplicate host without paths")
|
|
}
|
|
tracking.withoutPaths[host] = true
|
|
}
|
|
```
|
|
|
|
**Simplified Rules**:
|
|
1. Same host + both have paths = DUPLICATE ❌
|
|
2. Same host + both have NO paths = DUPLICATE ❌
|
|
3. Same host + one with paths, one without = ALLOWED ✅ (emergency+main pattern)
|
|
|
|
**Future Work**: Full overlap detection for complex path patterns is deferred.
|
|
|
|
**Tasks**:
|
|
|
|
1. **Create Simple Tracking Structure**:
|
|
- [ ] Add `withPaths` and `withoutPaths` maps to validator
|
|
- [ ] Track hosts separately based on path presence
|
|
|
|
2. **Update Validation Logic**:
|
|
- [ ] Check if match has path matchers (len(match.Path) > 0)
|
|
- [ ] For hosts with paths: allow if counterpart without paths exists
|
|
- [ ] For hosts without paths: allow if counterpart with paths exists
|
|
- [ ] Reject if both routes have same path configuration
|
|
|
|
3. **Update Error Messages**:
|
|
- [ ] Clear error: "duplicate host with paths" or "duplicate host without paths"
|
|
- [ ] Document that this is minimal fix for emergency+main pattern
|
|
|
|
**Success Criteria**:
|
|
- ✅ Emergency + main routes with same host pass validation (one has paths, one doesn't)
|
|
- ✅ True duplicates rejected (both with paths OR both without paths)
|
|
- ✅ Clear error messages when validation fails
|
|
- ✅ All existing tests continue to pass
|
|
|
|
**Files**:
|
|
- `backend/internal/caddy/validator.go` - Simplified duplicate detection
|
|
- `backend/internal/caddy/validator_test.go` - Add test cases
|
|
|
|
**Estimated Time**: 30 minutes (simplified approach)
|
|
|
|
---
|
|
|
|
## Phase 3: Database Migration (DEFERRED)
|
|
|
|
**Status**: ⏸️ DEFERRED - Not needed for this bug fix
|
|
|
|
**Rationale**:
|
|
- DNS is already case-insensitive by RFC spec
|
|
- Caddy handles domains case-insensitively
|
|
- No database duplicates found in current data
|
|
- This bug is purely a code-level validation issue
|
|
- Database constraints can be added in future enhancement if needed
|
|
|
|
**Future Consideration**:
|
|
If case-sensitive duplicates become an issue in production:
|
|
1. Add UNIQUE index on `LOWER(domain_names)`
|
|
2. Add `BeforeSave` hook to normalize domains
|
|
3. Update frontend validation
|
|
|
|
**Estimated Time**: 0 minutes (deferred)
|
|
|
|
---
|
|
|
|
## Phase 4: Testing & Verification
|
|
|
|
**Objective**: Comprehensive testing to ensure fix works and no regressions.
|
|
|
|
**Test Categories**:
|
|
|
|
### Unit Tests
|
|
|
|
1. **Validator Tests** (`validator_test.go`):
|
|
- [ ] Test: Single route with one host → PASS
|
|
- [ ] Test: Two routes with different hosts → PASS
|
|
- [ ] Test: Emergency + main route pattern (one with paths, one without) → PASS ✅ NEW
|
|
- [ ] Test: Two routes with same host, both with paths → FAIL
|
|
- [ ] Test: Two routes with same host, both without paths → FAIL
|
|
- [ ] Test: Route ordering (emergency before main) → PASS ✅ NEW
|
|
- [ ] Test: Multiple proxy hosts (5, 10, 18 hosts) → PASS ✅ NEW
|
|
- [ ] Test: All hosts enabled simultaneously (real-world scenario) → PASS ✅ NEW
|
|
|
|
2. **Config Generation Tests** (`config_test.go`):
|
|
- [ ] Test: Single host generates emergency + main routes
|
|
- [ ] Test: Both routes have same domain list
|
|
- [ ] Test: Emergency route has path matchers
|
|
- [ ] Test: Main route has no path matchers
|
|
- [ ] Test: Route ordering preserved (emergency before main)
|
|
- [ ] Test: Deduplication map prevents domain appearing twice in `uniqueDomains`
|
|
|
|
3. **Performance Tests** (NEW):
|
|
- [ ] Benchmark: Validation with 100 routes
|
|
- [ ] Benchmark: Validation with 1000 routes
|
|
- [ ] Verify: No more than 5% overhead vs old validator
|
|
- [ ] Profile: Memory usage with large configs
|
|
|
|
### Integration Tests
|
|
- Multi-Host Scenario** ⭐ UPDATED:
|
|
- [ ] Create proxy_host with domain "ImmaculateRR.HatfieldHosted.com"
|
|
- [ ] Trigger config generation via `ApplyConfig`
|
|
- [ ] Verify validator passes
|
|
- [ ] Verify Caddy accepts config
|
|
- [ ] **Enable 5 hosts simultaneously** - verify all routes created
|
|
- [ ] **Enable 10 hosts simultaneously** - verify all routes created
|
|
- [ ] **Enable all 18 hosts** - verify complete config loads successfully
|
|
|
|
2. **Emergency Bypass Test - Multiple Hosts**:
|
|
- [ ] Enable multiple proxy hosts with security features (WAF, rate limit)
|
|
- [ ] Verify emergency endpoint `/api/v1/emergency/security-reset` bypasses security on ALL hosts
|
|
- [ ] Verify main application routes have security checks on ALL hosts
|
|
- [ ] Confirm route ordering is correct for ALL hosts (emergency checked first)
|
|
|
|
3. **Rollback Test - Systemic Impact**:
|
|
- [ ] Apply validator fix
|
|
- [ ] Enable ALL 18 proxy hosts successfully
|
|
- [ ] Verify Caddy loads all routes (admin API check)
|
|
- [ ] Rollback to old validator code
|
|
- [ ] Verify sequential failures (Host 24 → Host 22 → ...)
|
|
- [ ] Re-apply fix and confirm all 18 hosts work
|
|
|
|
4. **Caddy AdmiALL Proxy Hosts** ⭐ UPDATED:
|
|
- [ ] Update database: `UPDATE proxy_hosts SET enabled = 1` (enable ALL hosts)
|
|
- [ ] Restart backend or trigger config reload
|
|
- [ ] Verify no "duplicate host matcher" errors for ANY host
|
|
- [ ] Verify Caddy logs show successful config load with all routes
|
|
- [ ] Query Caddy admin API: confirm 36+ routes loaded
|
|
- [ ] Test at least 5 different domains in browser
|
|
|
|
2. **Cross-Browser Test - Multiple Hosts**:
|
|
- [ ] Test at least 3 different proxy host domains from multiple browsers
|
|
- [ ] Verify HTTPS redirects work correctly on all tested hosts
|
|
- [ ] Confirm no certificate warnings on any host
|
|
- [ ] Test emergency endpoint accessibility on all hosts
|
|
|
|
3. **Load Test - All Hosts Enabled** ⭐ NEW:
|
|
- [ ] Enable all 18 proxy hosts
|
|
- [ ] Verify backend startup time is acceptable (<30s)
|
|
- [ ] Verify Caddy config reload time is acceptable (<5s)
|
|
- [ ] Monitor memory usage with full config loaded
|
|
- [ ] Verify no performance degradation vs single host
|
|
|
|
**Success Criteria**:
|
|
- ✅ All unit tests pass (including multi-host scenarios)
|
|
- ✅ All integration tests pass (including 5, 10, 18 host scenarios)
|
|
- ✅ ALL 18 proxy hosts can be enabled simultaneously without errors
|
|
- ✅ Caddy admin API shows 36+ routes loaded (2 per host minimum)
|
|
- ✅ Emergency routes bypass security correctly on ALL hosts
|
|
- ✅ Route ordering verified for ALL hosts (emergency before main)
|
|
- ✅ Rollback test proves fix was necessary (sequential failures return)
|
|
- [ ] Test emergency endpoint accessibility
|
|
|
|
**Success Criteria**:
|
|
- ✅ All unit tests p60 minutes (increased for multi-host testing)
|
|
- ✅ All integration tests pass
|
|
- ✅ Host ID 24 can be enabled without errors
|
|
- ✅ Emergency routes bypass security correctly
|
|
- ✅ Route ordering verified (emergency before main)
|
|
- ✅ Rollback test proves fix was necessary
|
|
- ✅ Performance benchmarks show <5% overhead
|
|
- ✅ No regressions in existing functionality
|
|
|
|
**Estimated Time**: 45 minutes
|
|
|
|
---
|
|
|
|
## Phase 5: Documentation & Deployment
|
|
|
|
**Objective**: Document the fix, update runbooks, and prepare for deployment.
|
|
|
|
**Tasks**:
|
|
|
|
1. **Code Documentation**:
|
|
- [ ] Add comprehensive comments to validator route signature logic
|
|
- [ ] Document why duplicate hosts with different paths are allowed
|
|
- [ ] Add examples of valid and invalid route patterns
|
|
- [ ] Document edge cases and how they're handled
|
|
|
|
2. **API Documentation**:
|
|
- [ ] Update `/docs/api.md` with validator behavior
|
|
- [ ] Document emergency+main route pattern
|
|
- [ ] Explain why duplicate hosts are allowed in this case
|
|
- [ ] Add note that DNS is case-insensitive by nature
|
|
|
|
3. **Runbook Updates**:
|
|
- [ ] Create "Duplicate Host Matcher Error" troubleshooting section
|
|
- [ ] Document root cause and fix
|
|
- [ ] Add steps to diagnose similar issues
|
|
- [ ] Add validation bypass procedure (if needed for emergency)
|
|
|
|
4. **Troubleshooting Guide**:
|
|
- [ ] Document "duplicate host matcher" error
|
|
- [ ] Explain emergency+main route pattern
|
|
- [ ] Provide steps to verify route ordering
|
|
- [ ] Add validation test procedure
|
|
|
|
5. **Changelog**:
|
|
- [ ] Add entry to `CHANGELOG.md` under "Fixed" section:
|
|
```markdown
|
|
### Fixed
|
|
- **CRITICAL**: Fixed systemic "duplicate host matcher" error affecting ALL 18 enabled proxy hosts
|
|
- Simplified Caddy config validator to allow emergency+main route pattern (one with paths, one without)
|
|
- Restored full reverse proxy functionality - Caddy now correctly loads routes for all enabled hosts
|
|
- Emergency bypass routes now function correctly for all proxy hosts
|
|
```
|
|
|
|
6. **Create Diagnostic Tool** (Optional Enhancement):
|
|
- [ ] Add admin API endpoint: `GET /api/v1/debug/caddy-routes`
|
|
- [ ] Returns current route structure with host/path matchers
|
|
- [ ] Highlights potential conflicts before validation
|
|
- [ ] Useful for troubleshooting future issues
|
|
|
|
**Success Criteria**:
|
|
- ✅ Code is well-documented with clear explanations
|
|
- ✅ API docs reflect new behavior
|
|
- ✅ Runbook provides clear troubleshooting steps
|
|
- ✅ Migration is documented and tested
|
|
- ✅ Changelog is updated
|
|
|
|
**Files**:
|
|
- `backend/internal/caddy/validator.go` - Inline comments
|
|
- `backend/internal/caddy/config.go` - Route generation comments
|
|
- `docs/api.md` - API documentation
|
|
- `docs/troubleshooting/duplicate-host-matcher.md` - NEW runbook
|
|
- `CHANGELOG.md` - Version entry
|
|
|
|
**Estimated Time**: 30 minutes
|
|
|
|
---Phase 6: Performance Investigation (DEFERRED - Optional)
|
|
|
|
**Status**: ⏸️ DEFERRED - Secondary issue, not blocking proxy functionality
|
|
ALL 18 enabled proxy hosts can be enabled simultaneously without errors
|
|
- ✅ Caddy loads all routes successfully (36+ routes via admin API)
|
|
- ✅ Emergency routes bypass security features as designed on ALL hosts
|
|
- ✅ Main routes apply security features correctly on ALL hosts
|
|
- ✅ No false positives from validator for valid configs
|
|
- ✅ True duplicate routes still rejected appropriately
|
|
- ✅ Full reverse proxy functionality restored
|
|
- Slow queries on `security_configs` table
|
|
- May impact monitoring responsiveness but does not block proxy functionality
|
|
|
|
**Tasks**:
|
|
|
|
1. **Query Profiling**:
|
|
- [ ] Enable query logging in production
|
|
- [ ] Identify slowest queries with EXPLAIN ANALYZE
|
|
- [ ] Profile table sizes and row counts
|
|
- [ ] Check existing indexes
|
|
|
|
2. **Index Analysis**:
|
|
- [ ] Analyze missing indexes on `uptime_heartbeats`
|
|
- [ ] Analyze missing indexes on `security_configs`
|
|
- [ ] Propose index additions if needed
|
|
- [ ] Test index performance impact
|
|
|
|
3. **Optimization**:
|
|
- [ ] Add indexes if justified by query patterns
|
|
- [ ] Consider query optimization (LIMIT, pagination)
|
|
- [ ] Monitor performance after changes
|
|
- [ ] Document index strategy
|
|
|
|
**Priority**: LOW - Does not block proxy functionality
|
|
**Estimated Time**: Deferred until Phase 2 is complete
|
|
|
|
---
|
|
|
|
## Phase 7: Container Health Check In- SYSTEMIC SCOPE (30 min)
|
|
- [ ] Verify ALL 18 enabled hosts trigger validator error
|
|
- [ ] Test sequential failure pattern (disable one → next fails)
|
|
- [ ] Confirm Caddy has ZERO routes loaded (admin API check)
|
|
- [ ] Verify emergency+main route pattern exists for EVERY host
|
|
- [ ] Add debug logging to config generation and validator
|
|
- [ ] Document systemic findings in diagnosis report
|
|
|
|
### Phase 2: Fix Validator - SIMPLIFIED (30 min)
|
|
- [ ] Create simple tracking structure (withPaths/withoutPaths maps)
|
|
- [ ] Update validation logic to allow one-with-paths + one-without-paths
|
|
- [ ] Update error messages
|
|
- [ ] Write unit tests for emergency+main pattern
|
|
- [ ] Add multi-host test scenarios (5, 10, 18 hosts)
|
|
- [ ] Verify route ordering preserved
|
|
|
|
### Phase 3: Database Migration (0 min)
|
|
- [x] DEFERRED - Not needed for this bug fix
|
|
|
|
### Phase 4: Testing - MULTI-HOST SCENARIOS (60 min)
|
|
- [ ] Write/update validator unit tests (emergency+main pattern)
|
|
- [ ] Add multi-host test scenarios (5, 10, 18 hosts)
|
|
- [ ] Write/update config generation tests (route ordering, all hosts)
|
|
- [ ] Add performance benchmarks (validate handling 18+ hosts)
|
|
- [ ] Run integration tests with all hosts enabled
|
|
- [ ] Perform rollback test (verify sequential failures return)
|
|
- [ ] Re-enable ALL 18 hosts and verify Caddy loads all routes
|
|
- [ ] Verify Caddy admin API shows 36+ routes
|
|
|
|
### Phase 5: Documentation (30 min)
|
|
- [ ] Add code comments explaining simplified approach
|
|
- [ ] Update API documentation
|
|
- [ ] Create troubleshooting guide emphasizing systemic nature
|
|
- [ ] Update changelog with CRITICAL scope
|
|
- [ ] Document that full overlap detection is future work
|
|
- [ ] Document multi-host verification steps
|
|
|
|
### Phase 6: Performance Investigation (DEFERRED)
|
|
- [ ] DEFERRED - Slow SQL queries (uptime_heartbeats, security_configs)
|
|
- [ ] Track as separate issue if proxy functionality is restored
|
|
|
|
### Phase 7: Health Check Investigation (DEFERRED)
|
|
- [ ] DEFERRED - Container health check fails despite 200 OK
|
|
- [ ] Track as separate issue if proxy functionality is restored
|
|
|
|
**Total Estimated Time**: 2 hours 30 minutes (updated for systemic scope
|
|
|
|
---
|
|
|
|
##
|
|
|
|
## Success Metrics
|
|
|
|
### Functionality
|
|
- ✅ Host ID 24 (immaculaterr.hatfieldhosted.com) can be enabled without errors
|
|
- ✅ Emergency routes bypass security features as designed
|
|
- ✅ Main routes apply security features correctly
|
|
- ✅ No false positives from validator for valid configs
|
|
- ✅ True duplicate routes still rejected appropriately
|
|
|
|
### Performance
|
|
- ✅ Validation performance not significantly impacted (< 5% overhead)
|
|
- ✅ Config generation time unchanged
|
|
- ✅ Database query performance not affected by new index
|
|
|
|
### Quality
|
|
- ✅ Zero regressions in existing tests
|
|
- ✅ New test coverage for path-aware validation
|
|
- ✅ Clear error messages for validation failures
|
|
- ✅ Code is maintainable and well-documented
|
|
|
|
---
|
|
|
|
## Risk Assessment
|
|
|
|
| Risk | Impact | Mitigation |
|
|
|------|--------|------------|
|
|
| **Validator Too Permissive** | High | Comprehensive test suite with negative test cases |
|
|
| **Route Ordering Issues** | Medium | Integration tests verify emergency routes checked first |
|
|
| **Migration Failure** | Low | Reversible migration + pre-flight data validation |
|
|
| **Case Normalization Breaks Existing Domains** | Low | Normalization is idempotent (lowercase → lowercase) |
|
|
| **Performance Degradation** | Low | Profile validator changes, ensure <5% overhead |
|
|
|
|
---
|
|
|
|
## Implementation Checklist
|
|
|
|
### Phase 1: Root Cause Verification (20 min)
|
|
- [ ] Reproduce error on demand
|
|
- [ ] Add debug logging to config generation
|
|
- [ ] Add debug logging to validator
|
|
- [ ] Confirm emergency + main route pattern
|
|
- [ ] Document findings
|
|
|
|
### Phase 2: Fix Validator - SIMPLIFIED (30 min)
|
|
- [ ] Create simple tracking structure (withPaths/withoutPaths maps)
|
|
- [ ] Update validation logic to allow one-with-paths + one-without-paths
|
|
- [ ] Update error messages
|
|
- [ ] Write unit tests for emergency+main pattern
|
|
- [ ] Verify route ordering preserved
|
|
|
|
### Phase 3: Database Migration (0 min)
|
|
- [x] DEFERRED - Not needed for this bug fix
|
|
|
|
### Phase 4: Testing (45 min)
|
|
- [ ] Write/update validator unit tests (emergency+main pattern)
|
|
- [ ] Write/update config generation tests (route ordering)
|
|
- [ ] Add performance benchmarks
|
|
- [ ] Run integration tests
|
|
- [ ] Perform rollback test
|
|
- [ ] Re-enable Host ID 24 verification
|
|
|
|
### Phase 5: Documentation (30 min)
|
|
- [ ] Add code comments explaining simplified approach
|
|
- [ ] Update API documentation
|
|
- [ ] Create troubleshooting guide
|
|
- [ ] Update changelog
|
|
- [ ] Document that full overlap detection is future work
|
|
|
|
**T**Re-enable ALL proxy hosts** (not just Host ID 24)
|
|
4. Verify Caddy loads all routes successfully (admin API check)
|
|
5. Verify emergency routes work correctly on all hosts
|
|
|
|
### Post-Deployment
|
|
1. Verify ALL 18 proxy hosts are accessible
|
|
2. Verify Caddy admin API shows 36+ routes loaded
|
|
3. Test emergency endpoint bypasses security on multiple hosts
|
|
4. Monitor for "duplicate host matcher" errors (should be zero)
|
|
5. Verify full reverse proxy functionality restored
|
|
6. Monitor performance with all hosts enabled
|
|
|
|
### Rollback Plan
|
|
If issues arise:
|
|
1. Rollback backend to previous version
|
|
2. Document which hosts fail (expect sequential pattern)
|
|
3. Review validator logs to identify cause
|
|
4. Disable problematic hosts temporarily if needed
|
|
5. Re-apply fix after investigation
|
|
3. Re-enable Host ID 24 if still disabled
|
|
4. Verify emergency routes work correctly
|
|
|
|
### Post-Deployment
|
|
1. Verify Host ID 24 is accessible
|
|
2. Test emergency endpoint bypasses security
|
|
3. Monitor for "duplicate host matcher" errors
|
|
4. Check database constraint is enforcing uniqueness
|
|
|
|
### Rollback Plan
|
|
If issues arise:
|
|
1. Rollback backend to previous version
|
|
2. Re-disable Host ID 24 if necessary
|
|
3. Review validator logs to identify cause
|
|
4. Investigate unexpected route patterns
|
|
|
|
---
|
|
|
|
## Future Enhancements
|
|
|
|
1. **Full Path Overlap Detection**:
|
|
- Current fix handles emergency+main pattern only (one-with-paths + one-without-paths)
|
|
- Future: Detect complex overlaps (e.g., `/api/*` vs `/api/v1/*`)
|
|
- Future: Validate path pattern specificity
|
|
- Future: Warn on ambiguous route priority
|
|
|
|
2. **Visual Route Debugger**:
|
|
- Admin UI component showing route tree
|
|
- Highlights potential conflicts
|
|
## Known Secondary Issues (Tracked Separately)
|
|
|
|
These issues were discovered during diagnosis but are NOT blocking proxy functionality:
|
|
|
|
1. **Slow SQL Queries (Phase 6 - DEFERRED)**:
|
|
- `uptime_heartbeats` table queries >200ms
|
|
- `security_configs` table queries >200ms
|
|
- Impacts monitoring responsiveness, not proxy functionality
|
|
- **Action**: Track as separate performance issue after Phase 2 complete
|
|
|
|
2. **Container Health Check Failure (Phase 7 - DEFERRED)**:
|
|
- Backend health endpoint returns 200 OK consistently
|
|
- Docker container marked as unhealthy
|
|
- May be timeout issue (3s too short?)
|
|
- Does not affect proxy functionality (backend is running)
|
|
- **Action**: Track as separate Docker configuration issue after Phase 2 complete
|
|
|
|
---
|
|
|
|
**Plan Status**: ✅ READY FOR IMPLEMENTATION (EXPANDED SCOPE)
|
|
**Next Action**: Begin Phase 1 - Root Cause Verification - SYSTEMIC SCOPE
|
|
**Assigned To**: Implementation Agent
|
|
**Priority**: CRITICAL 🔴🔴🔴 - ALL 18 PROXY HOSTS DOWN, ZERO CADDY ROUTES LOADED
|
|
**Scope**: Systemic bug affecting entire reverse proxy functionality (not single-host issue)
|
|
- Warn (don't error) on suspicious patterns
|
|
- Suggest route optimizations
|
|
- Show effective route priority
|
|
- Highlight overlapping matchers
|
|
|
|
4. **Database Domain Normalization** (if needed):
|
|
- Add case-insensitive uniqueness constraint
|
|
- BeforeSave hook for normalization
|
|
- Frontend validation hints
|
|
- Only if case duplicates become production issue
|
|
|
|
---
|
|
|
|
**Plan Status**: ✅ READY FOR IMPLEMENTATION
|
|
**Next Action**: Begin Phase 1 - Root Cause Verification
|
|
**Assigned To**: Implementation Agent
|
|
**Priority**: HIGH - Blocking Host ID 24 from being enabled
|