Files
Charon/docs/plans/current_spec.md

27 KiB

Duplicate Proxy Host Bug Fix - Simplified Validator (SYSTEMIC ISSUE)

Status: ACTIVE - MINIMAL FIX APPROACH Priority: CRITICAL 🔴🔴🔴 - ALL 18 ENABLED PROXY HOSTS DOWN Created: 2026-01-28 Updated: 2026-01-28 (EXPANDED SCOPE - Systemic issue confirmed) Bug: Caddy validator rejects emergency+main route pattern for EVERY proxy host (duplicate host with different path constraints)


Executive Summary

CRITICAL SYSTEMIC BUG: Caddy's pre-flight validator rejects the emergency+main route pattern for EVERY enabled proxy host. The emergency route (with path matchers) and main route (without path matchers) share the same domain, causing "duplicate host matcher" error on ALL hosts.

Impact:

  • 🔴🔴🔴 ZERO routes loaded in Caddy - ALL proxy hosts are down
  • 🔴 18 enabled proxy hosts cannot be activated (not just Host ID 24)
  • 🔴 Entire reverse proxy functionality is non-functional
  • 🟡 Emergency bypass routes blocked for all hosts
  • 🟡 Sequential failures: Host 24 → Host 22 → (pattern repeats for every host)
  • 🟢 Backend health endpoint returns 200 OK (separate container health issue)

Root Cause: Validator treats ALL duplicate hosts as errors without considering that routes with different path constraints are valid. The emergency+main route pattern is applied to EVERY proxy host by design, causing systematic rejection.

Minimal Fix: Simplify validator to allow duplicate hosts when ONE has path matchers and ONE doesn't. This will unblock ALL 18 enabled proxy hosts simultaneously, restoring full reverse proxy functionality. Full overlap detection is future work.

Database: NO issues - DNS is already case-insensitive. No migration needed.

Secondary Issues (tracked but deferred):

  • 🟡 Slow SQL queries (>200ms) on uptime_heartbeats and security_configs tables
  • 🟡 Container health check fails despite 200 OK from health endpoint (may be timeout issue)

Technical Analysis

Current Route Structure

For each proxy host, GenerateConfig creates TWO routes with the SAME domain list:

  1. Emergency Route (lines 571-584 in config.go):

    emergencyRoute := &Route{
        Match: []Match{{
            Host: uniqueDomains,          // immaculaterr.hatfieldhosted.com
            Path: emergencyPaths,         // /api/v1/emergency/*
        }},
        Handle: emergencyHandlers,
        Terminal: true,
    }
    
  2. Main Route (lines 586-598 in config.go):

    route := &Route{
        Match: []Match{{
            Host: uniqueDomains,          // immaculaterr.hatfieldhosted.com (DUPLICATE!)
        }},
        Handle: mainHandlers,
        Terminal: true,
    }
    

Why Validator Fails

// validator.go lines 89-93
for _, host := range match.Host {
    if seenHosts[host] {
        return fmt.Errorf("duplicate host matcher: %s", host)
    }
    seenHosts[host] = true
}

The validator:

  1. Processes emergency route: adds "immaculaterr.hatfieldhosted.com" to seenHosts
  2. Processes main route: sees "immaculaterr.hatfieldhosted.com" again → ERROR

The validator does NOT consider:

  • Path matchers that make routes non-overlapping
  • Route ordering/priority (emergency route is checked first)
  • Caddy's native ability to handle this correctly

Why Caddy Handles This Correctly

Caddy processes routes in order:

  1. First matches emergency route (host + path): /api/v1/emergency/* → bypass security
  2. Falls through to main route (host only): everything else → apply security

This is a valid and intentional design pattern - the validator is wrong to reject it.


Solution: Simplified Validator Fix CHOSEN APPROACH

Approach: Minimal fix to allow emergency+main route pattern specifically.

Implementation:

  • Track hosts seen with path matchers vs without path matchers separately
  • Allow duplicate host if ONE has paths and ONE doesn't (the emergency+main pattern)
  • Reject if both routes have paths OR both have no paths

Pros:

  • Minimal change - unblocks ALL 18 proxy hosts simultaneously
  • Preserves current route structure
  • Simple logic - easy to understand and maintain
  • Fixes the systemic design pattern bug affecting entire reverse proxy

Limitations (Future Work):

  • ⚠️ Does not detect complex path overlaps (e.g., /api/* vs /api/v1/*)
  • ⚠️ Full path pattern analysis deferred to future enhancement
  • ⚠️ Assumes emergency+main pattern is primary use case

Changes Required:

  • backend/internal/caddy/validator.go: Simplified duplicate detection (two maps: withPaths/withoutPaths)
  • Tests for emergency+main pattern, route ordering, rollback

Deferred:

  • Database migration (DNS already case-insensitive)
  • Complex path overlap detection (future enhancement)

Phase 1: Root Cause Verification - SYSTEMIC SCOPE

Objective: Confirm bug affects ALL enabled proxy hosts and document the systemic failure pattern.

Tasks:

  1. Verify Systemic Impact NEW:

    • Query database for ALL enabled proxy hosts (should be 18)
    • Verify Caddy has ZERO routes loaded (admin API check)
    • Document sequential failure pattern (Host 24 disabled → Host 22 fails next)
    • Confirm EVERY enabled host triggers same validator error
    • Test hypothesis: Disable all hosts except one → still fails
  2. Reproduce Error on Multiple Hosts:

    • Test Host ID 24 (immaculaterr.hatfieldhosted.com) - original failure
    • Test Host ID 22 (dockhand.hatfieldhosted.com) - second failure after disabling 24
    • Test at least 3 additional hosts to confirm pattern
    • Capture full error message from validator for each
    • Document that error is identical across all hosts
  3. Analyze Generated Config for ALL Hosts:

    • Add debug logging to GenerateConfig before validation
    • Log uniqueDomains list after deduplication for each host
    • Log complete route structure before sending to validator
    • Count how many routes contain each domain (should be 2: emergency + main)
    • Verify emergency+main pattern exists for EVERY proxy host
  4. Trace Validation Flow:

    • Add debug logging to validateRoute function
    • Log each host as it's added to seenHosts map
    • Log route index and match conditions when duplicate detected
    • Confirm emergency route (index 0) succeeds for all hosts
    • Confirm main route (index 1) triggers duplicate error for all hosts

Success Criteria:

  • Confirmed: ALL 18 enabled proxy hosts trigger the same error
  • Confirmed: Caddy has ZERO routes loaded (admin API returns empty)
  • Confirmed: Sequential failure pattern documented (disable one → next fails)
  • Confirmed: Emergency+main route pattern exists for EVERY host
  • Confirmed: Validator rejects at main route (index 1) for all hosts
  • Confirmed: This is a design pattern bug, not a data issue

Files:

  • backend/internal/caddy/config.go - Add debug logging
  • backend/internal/caddy/validator.go - Add debug logging
  • backend/internal/services/proxyhost_service.go - Trigger config generation
  • docs/reports/duplicate_proxy_host_diagnosis.md - Document systemic findings

Estimated Time: 30 minutes (increased for systemic verification)


Phase 2: Fix Validator (Simplified Path Detection)

Objective: MINIMAL fix to allow emergency+main route pattern (duplicate host where ONE has paths, ONE doesn't).

Implementation Strategy:

Simplify validator to handle the specific emergency+main pattern:

  • Track hosts seen with paths vs without paths
  • Allow duplicate hosts if ONE has path matchers, ONE doesn't
  • This handles emergency route (has paths) + main route (no paths)

Algorithm:

// Track hosts by whether they have path constraints
type hostTracking struct {
    withPaths    map[string]bool  // hosts that have path matchers
    withoutPaths map[string]bool  // hosts without path matchers
}

for each route:
    for each match in route.Match:
        for each host:
            hasPaths := len(match.Path) > 0

            if hasPaths:
                // Check if we've seen this host WITHOUT paths
                if tracking.withoutPaths[host]:
                    continue // ALLOWED: emergency (with) + main (without)
                }
                if tracking.withPaths[host]:
                    return error("duplicate host with paths")
                }
                tracking.withPaths[host] = true
            } else {
                // Check if we've seen this host WITH paths
                if tracking.withPaths[host]:
                    continue // ALLOWED: emergency (with) + main (without)
                }
                if tracking.withoutPaths[host]:
                    return error("duplicate host without paths")
                }
                tracking.withoutPaths[host] = true
            }

Simplified Rules:

  1. Same host + both have paths = DUPLICATE
  2. Same host + both have NO paths = DUPLICATE
  3. Same host + one with paths, one without = ALLOWED (emergency+main pattern)

Future Work: Full overlap detection for complex path patterns is deferred.

Tasks:

  1. Create Simple Tracking Structure:

    • Add withPaths and withoutPaths maps to validator
    • Track hosts separately based on path presence
  2. Update Validation Logic:

    • Check if match has path matchers (len(match.Path) > 0)
    • For hosts with paths: allow if counterpart without paths exists
    • For hosts without paths: allow if counterpart with paths exists
    • Reject if both routes have same path configuration
  3. Update Error Messages:

    • Clear error: "duplicate host with paths" or "duplicate host without paths"
    • Document that this is minimal fix for emergency+main pattern

Success Criteria:

  • Emergency + main routes with same host pass validation (one has paths, one doesn't)
  • True duplicates rejected (both with paths OR both without paths)
  • Clear error messages when validation fails
  • All existing tests continue to pass

Files:

  • backend/internal/caddy/validator.go - Simplified duplicate detection
  • backend/internal/caddy/validator_test.go - Add test cases

Estimated Time: 30 minutes (simplified approach)


Phase 3: Database Migration (DEFERRED)

Status: ⏸️ DEFERRED - Not needed for this bug fix

Rationale:

  • DNS is already case-insensitive by RFC spec
  • Caddy handles domains case-insensitively
  • No database duplicates found in current data
  • This bug is purely a code-level validation issue
  • Database constraints can be added in future enhancement if needed

Future Consideration: If case-sensitive duplicates become an issue in production:

  1. Add UNIQUE index on LOWER(domain_names)
  2. Add BeforeSave hook to normalize domains
  3. Update frontend validation

Estimated Time: 0 minutes (deferred)


Phase 4: Testing & Verification

Objective: Comprehensive testing to ensure fix works and no regressions.

Test Categories:

Unit Tests

  1. Validator Tests (validator_test.go):

    • Test: Single route with one host → PASS
    • Test: Two routes with different hosts → PASS
    • Test: Emergency + main route pattern (one with paths, one without) → PASS NEW
    • Test: Two routes with same host, both with paths → FAIL
    • Test: Two routes with same host, both without paths → FAIL
    • Test: Route ordering (emergency before main) → PASS NEW
    • Test: Multiple proxy hosts (5, 10, 18 hosts) → PASS NEW
    • Test: All hosts enabled simultaneously (real-world scenario) → PASS NEW
  2. Config Generation Tests (config_test.go):

    • Test: Single host generates emergency + main routes
    • Test: Both routes have same domain list
    • Test: Emergency route has path matchers
    • Test: Main route has no path matchers
    • Test: Route ordering preserved (emergency before main)
    • Test: Deduplication map prevents domain appearing twice in uniqueDomains
  3. Performance Tests (NEW):

    • Benchmark: Validation with 100 routes
    • Benchmark: Validation with 1000 routes
    • Verify: No more than 5% overhead vs old validator
    • Profile: Memory usage with large configs

Integration Tests

  • Multi-Host Scenario** UPDATED:
    • Create proxy_host with domain "ImmaculateRR.HatfieldHosted.com"
    • Trigger config generation via ApplyConfig
    • Verify validator passes
    • Verify Caddy accepts config
    • Enable 5 hosts simultaneously - verify all routes created
    • Enable 10 hosts simultaneously - verify all routes created
    • Enable all 18 hosts - verify complete config loads successfully
  1. Emergency Bypass Test - Multiple Hosts:

    • Enable multiple proxy hosts with security features (WAF, rate limit)
    • Verify emergency endpoint /api/v1/emergency/security-reset bypasses security on ALL hosts
    • Verify main application routes have security checks on ALL hosts
    • Confirm route ordering is correct for ALL hosts (emergency checked first)
  2. Rollback Test - Systemic Impact:

    • Apply validator fix
    • Enable ALL 18 proxy hosts successfully
    • Verify Caddy loads all routes (admin API check)
    • Rollback to old validator code
    • Verify sequential failures (Host 24 → Host 22 → ...)
    • Re-apply fix and confirm all 18 hosts work
  3. Caddy AdmiALL Proxy Hosts UPDATED:

    • Update database: UPDATE proxy_hosts SET enabled = 1 (enable ALL hosts)
    • Restart backend or trigger config reload
    • Verify no "duplicate host matcher" errors for ANY host
    • Verify Caddy logs show successful config load with all routes
    • Query Caddy admin API: confirm 36+ routes loaded
    • Test at least 5 different domains in browser
  4. Cross-Browser Test - Multiple Hosts:

    • Test at least 3 different proxy host domains from multiple browsers
    • Verify HTTPS redirects work correctly on all tested hosts
    • Confirm no certificate warnings on any host
    • Test emergency endpoint accessibility on all hosts
  5. Load Test - All Hosts Enabled NEW:

    • Enable all 18 proxy hosts
    • Verify backend startup time is acceptable (<30s)
    • Verify Caddy config reload time is acceptable (<5s)
    • Monitor memory usage with full config loaded
    • Verify no performance degradation vs single host

Success Criteria:

  • All unit tests pass (including multi-host scenarios)
  • All integration tests pass (including 5, 10, 18 host scenarios)
  • ALL 18 proxy hosts can be enabled simultaneously without errors
  • Caddy admin API shows 36+ routes loaded (2 per host minimum)
  • Emergency routes bypass security correctly on ALL hosts
  • Route ordering verified for ALL hosts (emergency before main)
  • Rollback test proves fix was necessary (sequential failures return)
    • Test emergency endpoint accessibility

Success Criteria:

  • All unit tests p60 minutes (increased for multi-host testing)
  • All integration tests pass
  • Host ID 24 can be enabled without errors
  • Emergency routes bypass security correctly
  • Route ordering verified (emergency before main)
  • Rollback test proves fix was necessary
  • Performance benchmarks show <5% overhead
  • No regressions in existing functionality

Estimated Time: 45 minutes


Phase 5: Documentation & Deployment

Objective: Document the fix, update runbooks, and prepare for deployment.

Tasks:

  1. Code Documentation:

    • Add comprehensive comments to validator route signature logic
    • Document why duplicate hosts with different paths are allowed
    • Add examples of valid and invalid route patterns
    • Document edge cases and how they're handled
  2. API Documentation:

    • Update /docs/api.md with validator behavior
    • Document emergency+main route pattern
    • Explain why duplicate hosts are allowed in this case
    • Add note that DNS is case-insensitive by nature
  3. Runbook Updates:

    • Create "Duplicate Host Matcher Error" troubleshooting section
    • Document root cause and fix
    • Add steps to diagnose similar issues
    • Add validation bypass procedure (if needed for emergency)
  4. Troubleshooting Guide:

    • Document "duplicate host matcher" error
    • Explain emergency+main route pattern
    • Provide steps to verify route ordering
    • Add validation test procedure
  5. Changelog:

    • Add entry to CHANGELOG.md under "Fixed" section:
      ### Fixed
      - **CRITICAL**: Fixed systemic "duplicate host matcher" error affecting ALL 18 enabled proxy hosts
      - Simplified Caddy config validator to allow emergency+main route pattern (one with paths, one without)
      - Restored full reverse proxy functionality - Caddy now correctly loads routes for all enabled hosts
      - Emergency bypass routes now function correctly for all proxy hosts
      
  6. Create Diagnostic Tool (Optional Enhancement):

    • Add admin API endpoint: GET /api/v1/debug/caddy-routes
    • Returns current route structure with host/path matchers
    • Highlights potential conflicts before validation
    • Useful for troubleshooting future issues

Success Criteria:

  • Code is well-documented with clear explanations
  • API docs reflect new behavior
  • Runbook provides clear troubleshooting steps
  • Migration is documented and tested
  • Changelog is updated

Files:

  • backend/internal/caddy/validator.go - Inline comments
  • backend/internal/caddy/config.go - Route generation comments
  • docs/api.md - API documentation
  • docs/troubleshooting/duplicate-host-matcher.md - NEW runbook
  • CHANGELOG.md - Version entry

Estimated Time: 30 minutes

---Phase 6: Performance Investigation (DEFERRED - Optional)

Status: ⏸️ DEFERRED - Secondary issue, not blocking proxy functionality ALL 18 enabled proxy hosts can be enabled simultaneously without errors

  • Caddy loads all routes successfully (36+ routes via admin API)
  • Emergency routes bypass security features as designed on ALL hosts
  • Main routes apply security features correctly on ALL hosts
  • No false positives from validator for valid configs
  • True duplicate routes still rejected appropriately
  • Full reverse proxy functionality restored
  • Slow queries on security_configs table
  • May impact monitoring responsiveness but does not block proxy functionality

Tasks:

  1. Query Profiling:

    • Enable query logging in production
    • Identify slowest queries with EXPLAIN ANALYZE
    • Profile table sizes and row counts
    • Check existing indexes
  2. Index Analysis:

    • Analyze missing indexes on uptime_heartbeats
    • Analyze missing indexes on security_configs
    • Propose index additions if needed
    • Test index performance impact
  3. Optimization:

    • Add indexes if justified by query patterns
    • Consider query optimization (LIMIT, pagination)
    • Monitor performance after changes
    • Document index strategy

Priority: LOW - Does not block proxy functionality Estimated Time: Deferred until Phase 2 is complete


Phase 7: Container Health Check In- SYSTEMIC SCOPE (30 min)

  • Verify ALL 18 enabled hosts trigger validator error
  • Test sequential failure pattern (disable one → next fails)
  • Confirm Caddy has ZERO routes loaded (admin API check)
  • Verify emergency+main route pattern exists for EVERY host
  • Add debug logging to config generation and validator
  • Document systemic findings in diagnosis report

Phase 2: Fix Validator - SIMPLIFIED (30 min)

  • Create simple tracking structure (withPaths/withoutPaths maps)
  • Update validation logic to allow one-with-paths + one-without-paths
  • Update error messages
  • Write unit tests for emergency+main pattern
  • Add multi-host test scenarios (5, 10, 18 hosts)
  • Verify route ordering preserved

Phase 3: Database Migration (0 min)

  • DEFERRED - Not needed for this bug fix

Phase 4: Testing - MULTI-HOST SCENARIOS (60 min)

  • Write/update validator unit tests (emergency+main pattern)
  • Add multi-host test scenarios (5, 10, 18 hosts)
  • Write/update config generation tests (route ordering, all hosts)
  • Add performance benchmarks (validate handling 18+ hosts)
  • Run integration tests with all hosts enabled
  • Perform rollback test (verify sequential failures return)
  • Re-enable ALL 18 hosts and verify Caddy loads all routes
  • Verify Caddy admin API shows 36+ routes

Phase 5: Documentation (30 min)

  • Add code comments explaining simplified approach
  • Update API documentation
  • Create troubleshooting guide emphasizing systemic nature
  • Update changelog with CRITICAL scope
  • Document that full overlap detection is future work
  • Document multi-host verification steps

Phase 6: Performance Investigation (DEFERRED)

  • DEFERRED - Slow SQL queries (uptime_heartbeats, security_configs)
  • Track as separate issue if proxy functionality is restored

Phase 7: Health Check Investigation (DEFERRED)

  • DEFERRED - Container health check fails despite 200 OK
  • Track as separate issue if proxy functionality is restored

Total Estimated Time: 2 hours 30 minutes (updated for systemic scope


Success Metrics

Functionality

  • Host ID 24 (immaculaterr.hatfieldhosted.com) can be enabled without errors
  • Emergency routes bypass security features as designed
  • Main routes apply security features correctly
  • No false positives from validator for valid configs
  • True duplicate routes still rejected appropriately

Performance

  • Validation performance not significantly impacted (< 5% overhead)
  • Config generation time unchanged
  • Database query performance not affected by new index

Quality

  • Zero regressions in existing tests
  • New test coverage for path-aware validation
  • Clear error messages for validation failures
  • Code is maintainable and well-documented

Risk Assessment

Risk Impact Mitigation
Validator Too Permissive High Comprehensive test suite with negative test cases
Route Ordering Issues Medium Integration tests verify emergency routes checked first
Migration Failure Low Reversible migration + pre-flight data validation
Case Normalization Breaks Existing Domains Low Normalization is idempotent (lowercase → lowercase)
Performance Degradation Low Profile validator changes, ensure <5% overhead

Implementation Checklist

Phase 1: Root Cause Verification (20 min)

  • Reproduce error on demand
  • Add debug logging to config generation
  • Add debug logging to validator
  • Confirm emergency + main route pattern
  • Document findings

Phase 2: Fix Validator - SIMPLIFIED (30 min)

  • Create simple tracking structure (withPaths/withoutPaths maps)
  • Update validation logic to allow one-with-paths + one-without-paths
  • Update error messages
  • Write unit tests for emergency+main pattern
  • Verify route ordering preserved

Phase 3: Database Migration (0 min)

  • DEFERRED - Not needed for this bug fix

Phase 4: Testing (45 min)

  • Write/update validator unit tests (emergency+main pattern)
  • Write/update config generation tests (route ordering)
  • Add performance benchmarks
  • Run integration tests
  • Perform rollback test
  • Re-enable Host ID 24 verification

Phase 5: Documentation (30 min)

  • Add code comments explaining simplified approach
  • Update API documentation
  • Create troubleshooting guide
  • Update changelog
  • Document that full overlap detection is future work

TRe-enable ALL proxy hosts** (not just Host ID 24) 4. Verify Caddy loads all routes successfully (admin API check) 5. Verify emergency routes work correctly on all hosts

Post-Deployment

  1. Verify ALL 18 proxy hosts are accessible
  2. Verify Caddy admin API shows 36+ routes loaded
  3. Test emergency endpoint bypasses security on multiple hosts
  4. Monitor for "duplicate host matcher" errors (should be zero)
  5. Verify full reverse proxy functionality restored
  6. Monitor performance with all hosts enabled

Rollback Plan

If issues arise:

  1. Rollback backend to previous version
  2. Document which hosts fail (expect sequential pattern)
  3. Review validator logs to identify cause
  4. Disable problematic hosts temporarily if needed
  5. Re-apply fix after investigation
  6. Re-enable Host ID 24 if still disabled
  7. Verify emergency routes work correctly

Post-Deployment

  1. Verify Host ID 24 is accessible
  2. Test emergency endpoint bypasses security
  3. Monitor for "duplicate host matcher" errors
  4. Check database constraint is enforcing uniqueness

Rollback Plan

If issues arise:

  1. Rollback backend to previous version
  2. Re-disable Host ID 24 if necessary
  3. Review validator logs to identify cause
  4. Investigate unexpected route patterns

Future Enhancements

  1. Full Path Overlap Detection:

    • Current fix handles emergency+main pattern only (one-with-paths + one-without-paths)
    • Future: Detect complex overlaps (e.g., /api/* vs /api/v1/*)
    • Future: Validate path pattern specificity
    • Future: Warn on ambiguous route priority
  2. Visual Route Debugger:

    • Admin UI component showing route tree
    • Highlights potential conflicts

Known Secondary Issues (Tracked Separately)

These issues were discovered during diagnosis but are NOT blocking proxy functionality:

  1. Slow SQL Queries (Phase 6 - DEFERRED):

    • uptime_heartbeats table queries >200ms
    • security_configs table queries >200ms
    • Impacts monitoring responsiveness, not proxy functionality
    • Action: Track as separate performance issue after Phase 2 complete
  2. Container Health Check Failure (Phase 7 - DEFERRED):

    • Backend health endpoint returns 200 OK consistently
    • Docker container marked as unhealthy
    • May be timeout issue (3s too short?)
    • Does not affect proxy functionality (backend is running)
    • Action: Track as separate Docker configuration issue after Phase 2 complete

Plan Status: READY FOR IMPLEMENTATION (EXPANDED SCOPE) Next Action: Begin Phase 1 - Root Cause Verification - SYSTEMIC SCOPE Assigned To: Implementation Agent Priority: CRITICAL 🔴🔴🔴 - ALL 18 PROXY HOSTS DOWN, ZERO CADDY ROUTES LOADED Scope: Systemic bug affecting entire reverse proxy functionality (not single-host issue)

  • Warn (don't error) on suspicious patterns
  • Suggest route optimizations
  • Show effective route priority
  • Highlight overlapping matchers
  1. Database Domain Normalization (if needed):
    • Add case-insensitive uniqueness constraint
    • BeforeSave hook for normalization
    • Frontend validation hints
    • Only if case duplicates become production issue

Plan Status: READY FOR IMPLEMENTATION Next Action: Begin Phase 1 - Root Cause Verification Assigned To: Implementation Agent Priority: HIGH - Blocking Host ID 24 from being enabled