Uptime Monitoring Diagnosis: Wizarr Host False "Down" Status
Summary
Issue: Newly created Wizarr Proxy Host shows as "down" in uptime monitoring, despite the domain working correctly when accessed by users.
Root Cause: Port mismatch in host-level TCP connectivity check. The checkHost() function extracts the port from the public URL (443 for HTTPS) but should be checking the actual backend forward_port (5690 for Wizarr).
Status: Identified - Fix Required
Detailed Analysis
1. Code Location
Primary Issue: backend/internal/services/uptime_service.go
- Function: checkHost() (lines 359-402)
- Logic flow: checkAllHosts() → checkHost() → CheckAll() → checkMonitor()
2. How Uptime Monitoring Works
Two-Level Check System
- Host-Level Pre-Check (TCP connectivity)
  - Runs first via checkAllHosts() → checkHost()
  - Groups services by their backend forward_host (e.g., 172.20.0.11)
  - Attempts a TCP connection to determine whether the host is reachable
  - If the host is DOWN, marks all monitors on that host as down without checking individual services
- Service-Level Check (HTTP/HTTPS)
  - Runs only if the host-level check passes
  - Performs an actual HTTP GET to the public URL
  - Accepts 2xx, 3xx, 401, and 403 responses as "up"
  - Correctly follows redirects (302)
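The service-level acceptance rule can be sketched as a small predicate (an assumption about the actual implementation, inferred from the behavior described above; 401/403 count as up because an auth challenge still proves the service is alive):

```go
package main

import "fmt"

// isUp approximates the service-level acceptance rule: any 2xx or 3xx
// response counts as up, and so do 401 and 403.
func isUp(statusCode int) bool {
	if statusCode >= 200 && statusCode < 400 {
		return true
	}
	return statusCode == 401 || statusCode == 403
}

func main() {
	for _, code := range []int{200, 302, 401, 500} {
		fmt.Printf("HTTP %d → up=%v\n", code, isUp(code))
	}
}
```

This is why Wizarr's 302 → 302 → 200 redirect chain passes the service-level check even while the host-level check fails.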
3. The Bug
In checkHost() at line 375:
for _, monitor := range monitors {
    port := extractPort(monitor.URL) // Gets port from the public URL
    if port == "" {
        continue
    }
    // Tries to connect using the extracted port
    addr := net.JoinHostPort(host.Host, port) // 172.20.0.11:443
    conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
    // ... connection result decides host up/down
}
Problem:
- monitor.URL is the public URL: https://wizarr.hatfieldhosted.com
- extractPort() returns 443 (the HTTPS default)
- But the Wizarr backend actually runs on 172.20.0.11:5690
- The TCP connection to 172.20.0.11:443 fails (no service listening there)
- The host is marked "down"
- All monitors on that host are marked "down" without individual checks
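The port-extraction behavior described above can be sketched as follows (a minimal reconstruction for illustration; the real extractPort() in uptime_service.go may differ in details):

```go
package main

import (
	"fmt"
	"net/url"
)

// extractPortSketch approximates the described behavior: return the explicit
// port from the URL if present, otherwise the scheme default. The public URL
// carries no explicit port, so HTTPS always yields 443 here, regardless of
// which port the backend actually listens on.
func extractPortSketch(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	if p := u.Port(); p != "" {
		return p
	}
	switch u.Scheme {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

func main() {
	// The public URL yields 443, but the Wizarr backend listens on 5690.
	fmt.Println(extractPortSketch("https://wizarr.hatfieldhosted.com"))
}
```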
4. Evidence from Logs and Database
Heartbeat Records (Most Recent First)
down|Host unreachable|0|2025-12-22 21:29:05
up|HTTP 200|64|2025-12-22 21:29:04
down|Host unreachable|0|2025-12-22 21:01:26
up|HTTP 200|47|2025-12-22 21:00:19
Pattern: Alternating between successful HTTP checks and host-level failures.
Database State
-- uptime_monitors
name: Wizarr
url: https://wizarr.hatfieldhosted.com
status: down
failure_count: 3
max_retries: 3
-- uptime_hosts
id: 0c764438-35ff-451f-822a-7297f39f39d4
name: Wizarr
host: 172.20.0.11
status: down ← This is causing the problem
-- proxy_hosts
name: Wizarr
domain_names: wizarr.hatfieldhosted.com
forward_host: 172.20.0.11
forward_port: 5690 ← This is the actual port!
Caddy Access Logs
Uptime check succeeds at HTTP level:
172.20.0.1 → GET / → 302 → /admin
172.20.0.1 → GET /admin → 302 → /login
172.20.0.1 → GET /login → 200 OK (16905 bytes)
5. Why Other Hosts Don't Have This Issue
Checking working hosts (using Radarr as example):
-- Radarr (working)
forward_host: 100.99.23.57
forward_port: 7878
url: https://radarr.hatfieldhosted.com
-- 302 redirect logic works correctly:
GET / → 302 → /login
Why it works: For services that redirect on the root path, the HTTP check still succeeds with 200-399 status codes. The port mismatch exists for all hosts, but its effect depends on what the host happens to listen on:
- If the host accepts TCP connections on the port extractPort() returns (443 for HTTPS URLs, or an explicit port in the URL), the host-level check passes by coincidence
- If the host doesn't respond on that port, the TCP check fails
- Wizarr's backend listens only on 5690, a port extractPort() will never return from the public URL
6. Additional Context
The uptime monitoring feature was recently enhanced with host-level grouping to:
- Reduce check overhead for multiple services on same host
- Provide consolidated DOWN notifications
- Avoid individual checks when host is unreachable
This is a good architectural decision, but the port extraction logic has a bug.
Root Cause Summary
The checkHost() function extracts the port from the monitor's public URL instead of using the actual backend forward_port from the proxy host configuration.
Why This Happens
- UptimeMonitor stores the public URL (e.g., https://wizarr.hatfieldhosted.com)
- UptimeHost only stores the forward_host IP, not the port
- checkHost() tries to extract the port from monitor URLs
- For HTTPS URLs, it extracts 443
- Wizarr backend is on 172.20.0.11:5690, not :443
- TCP connection fails → host marked down → monitor marked down
Proposed Fixes
Option 1: Store Forward Port in UptimeHost (Recommended)
Changes Required:
- Add a Ports field to the UptimeHost model:

  type UptimeHost struct {
      // ... existing fields
      Ports []int `json:"ports" gorm:"-"` // Not stored; computed on the fly
  }
- Modify checkHost() to try all ports associated with monitors on that host:

  // Collect unique ports from all monitors for this host
  portSet := make(map[int]bool)
  for _, monitor := range monitors {
      if monitor.ProxyHostID != nil {
          var proxyHost models.ProxyHost
          if err := s.DB.First(&proxyHost, *monitor.ProxyHostID).Error; err == nil {
              portSet[proxyHost.ForwardPort] = true
          }
      }
  }

  // Try connecting to any of the ports
  success := false
  for port := range portSet {
      addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
      conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
      // ... rest of logic
  }
Pros:
- Checks actual backend ports
- More accurate for non-standard ports
- Minimal schema changes
Cons:
- Requires database queries in check loop
- More complex logic
Option 2: Store ForwardPort Reference in UptimeMonitor
Changes Required:
- Add a ForwardPort field to UptimeMonitor:

  type UptimeMonitor struct {
      // ... existing fields
      ForwardPort int `json:"forward_port"`
  }
- Update SyncMonitors() to populate it:

  monitor = models.UptimeMonitor{
      // ... existing fields
      ForwardPort: host.ForwardPort,
  }
- Update checkHost() to use the stored forward port:

  for _, monitor := range monitors {
      port := monitor.ForwardPort
      if port == 0 {
          continue
      }
      addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
      // ... rest of logic
  }
Pros:
- Simple, no extra DB queries
- Forward port readily available
Cons:
- Schema migration required
- Duplication of data (port stored in both ProxyHost and UptimeMonitor)
Option 3: Skip Host-Level Check for Non-Standard Ports
Temporary workaround - not recommended for production.
Only perform host-level checks for monitors on standard ports (80, 443, 8080).
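The workaround's guard could be as simple as the following sketch (standardPorts and shouldHostCheck are illustrative names, not existing code):

```go
package main

import "fmt"

// standardPorts is the workaround's allowlist: host-level TCP checks would
// run only for monitors whose extracted port is one of these.
var standardPorts = map[string]bool{"80": true, "443": true, "8080": true}

// shouldHostCheck reports whether a host-level pre-check should run at all.
func shouldHostCheck(port string) bool {
	return standardPorts[port]
}

func main() {
	fmt.Println(shouldHostCheck("443"))  // standard port: pre-check runs
	fmt.Println(shouldHostCheck("5690")) // Wizarr's port: pre-check skipped
}
```

The drawback is that hosts on non-standard ports lose the pre-check entirely, so they never benefit from the consolidated host-down short-circuit; that is why this option is a stopgap rather than a fix.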
Option 4: Use ProxyHost Forward Port Directly (Simplest)
Changes Required:
Modify checkHost() to query the proxy host for each monitor to get the actual forward port:
// In checkHost(), replace the port extraction:
for _, monitor := range monitors {
    var port int
    if monitor.ProxyHostID != nil {
        var proxyHost models.ProxyHost
        if err := s.DB.First(&proxyHost, *monitor.ProxyHostID).Error; err == nil {
            port = proxyHost.ForwardPort
        }
    } else {
        // Fall back to URL extraction for non-proxy monitors
        portStr := extractPort(monitor.URL)
        if portStr != "" {
            port, _ = strconv.Atoi(portStr)
        }
    }
    if port == 0 {
        continue
    }
    addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
    conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
    // ... rest of check
}
Pros:
- No schema changes
- Works immediately
- Handles both proxy hosts and standalone monitors
Cons:
- Database query in check loop (but monitors are already cached)
- Slight performance overhead
Recommended Solution
Option 4 (Use ProxyHost Forward Port Directly) is recommended because:
- No schema migration required
- Simple fix, easy to test
- Minimal performance impact (monitors already queried)
- Can be deployed immediately
- Handles edge cases (standalone monitors)
Testing Plan
- Unit Test: Add test case for non-standard port host check
- Integration Test:
- Create proxy host with non-standard forward port
- Verify host-level check uses correct port
- Verify monitor status updates correctly
- Manual Test:
- Apply fix
- Wait for next uptime check cycle (60 seconds)
- Verify Wizarr shows as "up"
- Verify no other monitors affected
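The port-selection rule that the unit test needs to cover can be factored into a small helper and exercised in isolation (resolveCheckPort is an illustrative name, not existing code; the real test would live in uptime_service_test.go):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// resolveCheckPort captures the Option 4 decision: prefer the proxy host's
// forward port, and fall back to the URL-derived port only for standalone
// monitors. It returns the port string and whether a check should run.
func resolveCheckPort(forwardPort int, urlPort string) (string, bool) {
	if forwardPort != 0 {
		return strconv.Itoa(forwardPort), true
	}
	if urlPort != "" {
		return urlPort, true
	}
	return "", false
}

func main() {
	// Wizarr case: forward_port 5690 must win over the URL's implied 443.
	port, ok := resolveCheckPort(5690, "443")
	fmt.Println(net.JoinHostPort("172.20.0.11", port), ok) // 172.20.0.11:5690 true
}
```

A table-driven test over (forwardPort, urlPort) pairs, including the zero-port case, would pin down the non-standard-port behavior before the fix is deployed.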
Debugging Commands
# Check Wizarr monitor status
docker compose -f docker-compose.test.yml exec charon sh -c \
"sqlite3 /app/data/charon.db \"SELECT name, status, failure_count, url FROM uptime_monitors WHERE name = 'Wizarr';\""
# Check Wizarr host status
docker compose -f docker-compose.test.yml exec charon sh -c \
"sqlite3 /app/data/charon.db \"SELECT name, host, status FROM uptime_hosts WHERE name = 'Wizarr';\""
# Check recent heartbeats
docker compose -f docker-compose.test.yml exec charon sh -c \
"sqlite3 /app/data/charon.db \"SELECT status, message, created_at FROM uptime_heartbeats WHERE monitor_id = 'eed56336-e646-4cf5-a3fc-ac4d2dd8760e' ORDER BY created_at DESC LIMIT 5;\""
# Check Wizarr proxy host config
docker compose -f docker-compose.test.yml exec charon sh -c \
"sqlite3 /app/data/charon.db \"SELECT name, forward_host, forward_port FROM proxy_hosts WHERE name = 'Wizarr';\""
# Monitor real-time uptime checks in logs
docker compose -f docker-compose.test.yml logs -f charon | grep -i "wizarr\|uptime"
Related Files
- backend/internal/services/uptime_service.go - Main uptime service
- backend/internal/models/uptime.go - UptimeMonitor model
- backend/internal/models/uptime_host.go - UptimeHost model
- backend/internal/services/uptime_service_test.go - Unit tests
References
- Issue created: 2025-12-23
- Related feature: Host-level uptime grouping
- Related PR: [Reference to ACL/permission changes if applicable]
Next Steps: Implement Option 4 fix and add test coverage for non-standard port scenarios.