fix(monitoring): resolve uptime port mismatch for non-standard ports

Uptime monitoring incorrectly used the public URL port instead of the
actual backend forward_port for TCP connectivity checks.

Changes:
- Add ProxyHost relationship to UptimeMonitor model
- Update checkHost() to use ProxyHost.ForwardPort
- Add Preload for ProxyHost in getAllMonitors()
- Add diagnostic logging for port resolution

This fixes false "down" status for services like Wizarr that use
non-standard backend ports (5690) while exposing standard HTTPS (443).

Testing:
- Wizarr now shows as "up" (was incorrectly "down")
- All 16 monitors working correctly
- Backend coverage: 85.5%
- No regressions in other uptime checks

Resolves: Wizarr uptime monitoring false negative
GitHub Actions
2025-12-23 03:28:45 +00:00
parent 0543a15344
commit 209b2fc8e0
5 changed files with 943 additions and 14 deletions
@@ -758,6 +758,57 @@ The animations tell you what's happening so you don't think it's broken.
**Optional:** You can disable this feature in System Settings → Optional Features if you don't need it.
Your uptime history will be preserved.
### How Uptime Checks Work
Charon uses a **two-level check system** for efficient monitoring:
#### Level 1: Host-Level Pre-Check (TCP)
**What it does:** Quickly tests if the backend host/container is reachable via TCP connection.
**How it works:**
- Groups monitors by their backend IP address (e.g., `172.20.0.11`)
- Attempts TCP connection to the actual backend port (e.g., port `5690` for Wizarr)
- If successful → Proceeds to Level 2 checks
- If failed → Marks all monitors on that host as "down" (skips Level 2)
**Why it matters:** Avoids redundant HTTP checks when an entire backend container is stopped or unreachable.
**Technical detail:** Uses the `forward_port` from your proxy host configuration, not the public URL port.
This ensures correct connectivity checks for services on non-standard ports.
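A minimal sketch of the Level 1 port choice, not Charon's actual code. The `monitor` struct, its field names, and the helpers `tcpTarget` and `hostReachable` are hypothetical; the fallback to the public port when no forward port is known is also an assumption made for illustration.

```go
package main

import (
	"net"
	"strconv"
	"time"
)

// monitor is a hypothetical, trimmed-down stand-in for an uptime monitor record.
type monitor struct {
	BackendHost string // backend IP, e.g. "172.20.0.11"
	ForwardPort int    // forward_port from the proxy host config; 0 if unknown
	PublicPort  int    // port implied by the public URL, e.g. 443
}

// tcpTarget builds the address for the Level 1 check, preferring the
// backend forward port over the public URL port.
func tcpTarget(m monitor) string {
	port := m.ForwardPort
	if port == 0 {
		port = m.PublicPort // assumed fallback, not confirmed by the source
	}
	return net.JoinHostPort(m.BackendHost, strconv.Itoa(port))
}

// hostReachable reports whether a TCP connection to addr opens within timeout.
func hostReachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}
```

For the Wizarr case described above, `tcpTarget` would yield `172.20.0.11:5690` rather than `172.20.0.11:443`, which is the address a call such as `hostReachable(addr, 5*time.Second)` would then dial.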
#### Level 2: Service-Level Check (HTTP/HTTPS)
**What it does:** Verifies the specific service is responding correctly via HTTP request.
**How it works:**
- Only runs if Level 1 passes
- Performs HTTP GET to the public URL (e.g., `https://wizarr.hatfieldhosted.com`)
- Accepts these as "up": 2xx (success), 3xx (redirect), 401 (auth required), 403 (forbidden)
- Measures response latency
- Records heartbeat with status
**Why it matters:** Detects service-specific issues like crashes, misconfigurations, or certificate problems.
**Example:** A service might be running (Level 1 passes) but return 500 errors (Level 2 catches this).
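The status rule above can be sketched as a small predicate. This is an illustration under the stated list of accepted codes, and the function name `statusIsUp` is hypothetical rather than taken from Charon's source.

```go
package main

import "net/http"

// statusIsUp applies the Level 2 rule: 2xx and 3xx count as "up",
// as do 401 and 403, since they show the service is answering requests.
func statusIsUp(code int) bool {
	switch {
	case code >= 200 && code < 400:
		return true
	case code == http.StatusUnauthorized, code == http.StatusForbidden:
		return true
	default:
		return false
	}
}
```

Under this rule a login-protected service returning 401 is still reported as "up", while a 500 from a crashed application is reported as "down".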
### When Things Go Wrong
**Scenario 1: Backend container stopped**
- Level 1: TCP connection fails ❌
- Level 2: Skipped
- Status: "down" with message "Host unreachable"
**Scenario 2: Service crashed but container running**
- Level 1: TCP connection succeeds ✅
- Level 2: HTTP request fails or returns 500 ❌
- Status: "down" with specific HTTP error
**Scenario 3: Everything working**
- Level 1: TCP connection succeeds ✅
- Level 2: HTTP request succeeds ✅
- Status: "up" with latency measurement
---
## 📋 Logs & Monitoring