fix(monitoring): resolve uptime port mismatch for non-standard ports

Fixes uptime monitoring incorrectly using public URL port instead of actual backend forward_port for TCP connectivity checks.

Changes:
- Add ProxyHost relationship to UptimeMonitor model
- Update checkHost() to use ProxyHost.ForwardPort
- Add Preload for ProxyHost in getAllMonitors()
- Add diagnostic logging for port resolution

This fixes false "down" status for services like Wizarr that use non-standard backend ports (5690) while exposing standard HTTPS (443).

Testing:
- Wizarr now shows as "up" (was incorrectly "down")
- All 16 monitors working correctly
- Backend coverage: 85.5%
- No regressions in other uptime checks

Resolves: Wizarr uptime monitoring false negative
2025-12-23 03:28:45 +00:00
parent 0543a15344
commit 209b2fc8e0
5 changed files with 943 additions and 14 deletions
@@ -758,6 +758,57 @@ The animations tell you what's happening so you don't think it's broken.
 **Optional:** You can disable this feature in System Settings → Optional Features if you don't need it.
 Your uptime history will be preserved.

+### How Uptime Checks Work
+
+Charon uses a **two-level check system** for efficient monitoring:
+
+#### Level 1: Host-Level Pre-Check (TCP)
+
+**What it does:** Quickly tests whether the backend host or container accepts a TCP connection.
+
+**How it works:**
+- Groups monitors by their backend IP address (e.g., `172.20.0.11`)
+- Attempts TCP connection to the actual backend port (e.g., port `5690` for Wizarr)
+- If successful → Proceeds to Level 2 checks
+- If failed → Marks all monitors on that host as "down" (skips Level 2)
+
+**Why it matters:** Avoids redundant HTTP checks when an entire backend container is stopped or unreachable.
+
+**Technical detail:** Uses the `forward_port` from your proxy host configuration, not the public URL port.
+This ensures correct connectivity checks for services on non-standard ports.
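
The Level 1 pre-check described above can be sketched roughly as follows. This is a minimal illustration, not Charon's actual code: the `Monitor` struct and the helpers `backendAddr`, `groupByHost`, and `hostReachable` are hypothetical names, and probing only the first monitor's port per host is an assumption made to keep the sketch short.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// Monitor is a hypothetical, simplified stand-in for an uptime monitor
// joined with its proxy host configuration.
type Monitor struct {
	Name        string
	PublicURL   string // what users visit, e.g. https on port 443
	ForwardHost string // backend IP address
	ForwardPort int    // port the backend actually listens on
}

// backendAddr builds the TCP dial target from the backend host and
// forward port, never from the port in the public URL.
func backendAddr(m Monitor) string {
	return net.JoinHostPort(m.ForwardHost, strconv.Itoa(m.ForwardPort))
}

// groupByHost groups monitors by backend IP so each host is probed once.
func groupByHost(monitors []Monitor) map[string][]Monitor {
	groups := make(map[string][]Monitor)
	for _, m := range monitors {
		groups[m.ForwardHost] = append(groups[m.ForwardHost], m)
	}
	return groups
}

// hostReachable reports whether a TCP connection to addr succeeds
// within the given timeout.
func hostReachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	monitors := []Monitor{
		{Name: "wizarr", PublicURL: "https://wizarr.hatfieldhosted.com",
			ForwardHost: "172.20.0.11", ForwardPort: 5690},
	}
	for host, group := range groupByHost(monitors) {
		// Assumption: the first monitor's forward port represents the host.
		addr := backendAddr(group[0])
		up := hostReachable(addr, 2*time.Second)
		fmt.Printf("host=%s dial=%s reachable=%t monitors=%d\n",
			host, addr, up, len(group))
	}
}
```

With the Wizarr example, the dial target is `172.20.0.11:5690` rather than port 443 taken from the public URL, which is the mismatch this commit addresses.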
+
+#### Level 2: Service-Level Check (HTTP/HTTPS)
+
+**What it does:** Verifies the specific service is responding correctly via HTTP request.
+
+**How it works:**
+- Only runs if Level 1 passes
+- Performs HTTP GET to the public URL (e.g., `https://wizarr.hatfieldhosted.com`)
+- Accepts these as "up": 2xx (success), 3xx (redirect), 401 (auth required), 403 (forbidden)
+- Measures response latency
+- Records heartbeat with status
+
+**Why it matters:** Detects service-specific issues like crashes, misconfigurations, or certificate problems.
+
+**Example:** A service might be running (Level 1 passes) but return 500 errors (Level 2 catches this).
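
The status rule listed above could be expressed as in the sketch below. The names `isUpStatus` and `checkService` are hypothetical, and the client settings are assumptions: Go's `http.Client` follows redirects by default, so this sketch disables that to observe a 3xx response directly. Whether Charon does the same is not stated here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// isUpStatus applies the documented rule: 2xx, 3xx, 401 and 403 count
// as "up"; every other status counts as "down".
func isUpStatus(code int) bool {
	if code >= 200 && code < 400 {
		return true
	}
	return code == http.StatusUnauthorized || code == http.StatusForbidden
}

// checkService performs one HTTP GET and returns whether the service
// counts as up, plus the measured latency.
func checkService(client *http.Client, url string) (bool, time.Duration, error) {
	start := time.Now()
	resp, err := client.Get(url)
	latency := time.Since(start)
	if err != nil {
		return false, latency, err
	}
	defer resp.Body.Close()
	return isUpStatus(resp.StatusCode), latency, nil
}

func main() {
	// Local test server standing in for a service that requires login.
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusUnauthorized)
		}))
	defer srv.Close()

	client := &http.Client{
		Timeout: 10 * time.Second, // assumed timeout
		// Return the 3xx response itself instead of following it.
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	up, latency, err := checkService(client, srv.URL)
	fmt.Printf("up=%t latency=%s err=%v\n", up, latency, err)
}
```

Treating 401 and 403 as "up" reflects that a login wall still proves the service is answering requests.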
+
+### When Things Go Wrong
+
+**Scenario 1: Backend container stopped**
+- Level 1: TCP connection fails ❌
+- Level 2: Skipped
+- Status: "down" with message "Host unreachable"
+
+**Scenario 2: Service crashed but container running**
+- Level 1: TCP connection succeeds ✅
+- Level 2: HTTP request fails or returns 500 ❌
+- Status: "down" with specific HTTP error
+
+**Scenario 3: Everything working**
+- Level 1: TCP connection succeeds ✅
+- Level 2: HTTP request succeeds ✅
+- Status: "up" with latency measurement
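
The three scenarios above amount to a small decision function. The sketch below is illustrative only: `Outcome`, `accepted`, and `resolve` are hypothetical names, and apart from "Host unreachable" the message strings are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Outcome is a hypothetical result record for one monitor check.
type Outcome struct {
	Status  string
	Message string
}

// accepted mirrors the Level 2 rule: 2xx, 3xx, 401 and 403 are "up".
func accepted(code int) bool {
	return (code >= 200 && code < 400) || code == 401 || code == 403
}

// resolve combines the Level 1 and Level 2 results into a final status.
func resolve(tcpOK bool, httpErr error, code int, latency time.Duration) Outcome {
	if !tcpOK {
		// Scenario 1: Level 2 is skipped entirely.
		return Outcome{Status: "down", Message: "Host unreachable"}
	}
	if httpErr != nil {
		// Scenario 2: the request itself failed.
		return Outcome{Status: "down", Message: "HTTP request failed: " + httpErr.Error()}
	}
	if !accepted(code) {
		// Scenario 2: the service answered with an error status.
		return Outcome{Status: "down", Message: fmt.Sprintf("HTTP %d", code)}
	}
	// Scenario 3: both levels passed.
	return Outcome{Status: "up", Message: fmt.Sprintf("latency %s", latency)}
}

func main() {
	fmt.Println(resolve(false, nil, 0, 0))
	fmt.Println(resolve(true, errors.New("connection reset"), 0, 0))
	fmt.Println(resolve(true, nil, 500, 0))
	fmt.Println(resolve(true, nil, 200, 42*time.Millisecond))
}
```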
+
 ---

 ## 📋 Logs & Monitoring