Charon/docs/plans/archive/uptime_monitoring_diagnosis.md
2026-03-04 18:34:49 +00:00

Uptime Monitoring Diagnosis: Wizarr Host False "Down" Status

Summary

Issue: Newly created Wizarr Proxy Host shows as "down" in uptime monitoring, despite the domain working correctly when accessed by users.

Root Cause: Port mismatch in host-level TCP connectivity check. The checkHost() function extracts the port from the public URL (443 for HTTPS) but should be checking the actual backend forward_port (5690 for Wizarr).

Status: Identified - Fix Required

Detailed Analysis

1. Code Location

Primary Issue: backend/internal/services/uptime_service.go

  • Function: checkHost() (lines 359-402)
  • Logic Flow: checkAllHosts() → checkHost() (host pre-check), then CheckAll() → checkMonitor() (service check)

2. How Uptime Monitoring Works

Two-Level Check System

  1. Host-Level Pre-Check (TCP connectivity)

    • Runs first via checkAllHosts() → checkHost()
    • Groups services by their backend forward_host (e.g., 172.20.0.11)
    • Attempts TCP connection to determine if host is reachable
    • If host is DOWN, marks all monitors on that host as down without checking individual services
  2. Service-Level Check (HTTP/HTTPS)

    • Only runs if host-level check passes
    • Performs actual HTTP GET to public URL
    • Accepts 2xx, 3xx, 401, 403 as "up"
    • Correctly handles redirects (302)
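
The service-level acceptance rule above can be captured in a tiny predicate (a sketch; the helper name is hypothetical and the real code in uptime_service.go may structure this differently):

```go
package main

import "fmt"

// isUp mirrors the service-level rule: any 2xx/3xx response counts as
// "up", and so do 401 and 403, since an auth challenge still proves the
// service is answering. (Sketch; hypothetical helper name.)
func isUp(status int) bool {
	if status >= 200 && status < 400 {
		return true
	}
	return status == 401 || status == 403
}

func main() {
	for _, s := range []int{200, 302, 401, 500} {
		fmt.Printf("HTTP %d → up=%v\n", s, isUp(s))
	}
}
```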

3. The Bug

In checkHost() at line 375:

for _, monitor := range monitors {
    port := extractPort(monitor.URL)  // Gets port from the public URL
    if port == "" {
        continue
    }

    // Tries to connect using the extracted port
    addr := net.JoinHostPort(host.Host, port)  // e.g. 172.20.0.11:443
    conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
    // ... rest of the check
}

Problem:

  • monitor.URL is the public URL: https://wizarr.hatfieldhosted.com
  • extractPort() returns 443 (HTTPS default)
  • But Wizarr backend actually runs on 172.20.0.11:5690
  • TCP connection to 172.20.0.11:443 fails (no service listening)
  • Host marked as "down"
  • All monitors on that host marked "down" without individual checks
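
A minimal reconstruction of the mismatch, assuming extractPort() behaves as described above (explicit port from the public URL, else the scheme default; the function below is a hypothetical stand-in for the real one in uptime_service.go):

```go
package main

import (
	"fmt"
	"net/url"
)

// extractPortSketch reconstructs the behavior attributed to extractPort():
// use the explicit port in the public URL, else the scheme default.
// (Hypothetical reconstruction, not the actual implementation.)
func extractPortSketch(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	if p := u.Port(); p != "" {
		return p
	}
	switch u.Scheme {
	case "https":
		return "443"
	case "http":
		return "80"
	}
	return ""
}

func main() {
	// The public URL yields 443, but the Wizarr backend listens on 5690,
	// so the TCP pre-check dials 172.20.0.11:443 and finds nothing.
	fmt.Println(extractPortSketch("https://wizarr.hatfieldhosted.com"))
}
```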

4. Evidence from Logs and Database

Heartbeat Records (Most Recent First)

down|Host unreachable|0|2025-12-22 21:29:05
up|HTTP 200|64|2025-12-22 21:29:04
down|Host unreachable|0|2025-12-22 21:01:26
up|HTTP 200|47|2025-12-22 21:00:19

Pattern: Alternating between successful HTTP checks and host-level failures.

Database State

-- uptime_monitors
name: Wizarr
url: https://wizarr.hatfieldhosted.com
status: down
failure_count: 3
max_retries: 3

-- uptime_hosts
id: 0c764438-35ff-451f-822a-7297f39f39d4
name: Wizarr
host: 172.20.0.11
status: down   -- ← this is causing the problem

-- proxy_hosts
name: Wizarr
domain_names: wizarr.hatfieldhosted.com
forward_host: 172.20.0.11
forward_port: 5690   -- ← this is the actual port!

Caddy Access Logs

Uptime check succeeds at HTTP level:

172.20.0.1 → GET / → 302 → /admin
172.20.0.1 → GET /admin → 302 → /login
172.20.0.1 → GET /login → 200 OK (16905 bytes)

5. Why Other Hosts Don't Have This Issue

Checking working hosts (using Radarr as example):

-- Radarr (working)
forward_host: 100.99.23.57
forward_port: 7878
url: https://radarr.hatfieldhosted.com

-- 302 redirect logic works correctly:
GET /  302  /login

Why it works: for services that redirect on the root path, the HTTP check still succeeds (any 200-399 status counts as up), so the service-level check is never the problem. The port mismatch exists for every host, but its effect depends on the backend port:

  1. If the forward_port happens to be a standard port (80, 443, 8080) that extractPort() can return, the TCP pre-check succeeds by coincidence
  2. If nothing on the host IP listens on the extracted port, the TCP pre-check fails regardless of the real service's health
  3. Wizarr listens on 5690, a non-standard port that extractPort() will never return, so the pre-check always dials the wrong port

6. Additional Context

The uptime monitoring feature was recently enhanced with host-level grouping to:

  • Reduce check overhead for multiple services on same host
  • Provide consolidated DOWN notifications
  • Avoid individual checks when host is unreachable

This is a good architectural decision, but the port extraction logic has a bug.

Root Cause Summary

The checkHost() function extracts the port from the monitor's public URL instead of using the actual backend forward_port from the proxy host configuration.

Why This Happens

  1. UptimeMonitor stores the public URL (e.g., https://wizarr.hatfieldhosted.com)
  2. UptimeHost only stores the forward_host IP, not the port
  3. checkHost() tries to extract port from monitor URLs
  4. For HTTPS URLs, it extracts 443
  5. Wizarr backend is on 172.20.0.11:5690, not :443
  6. TCP connection fails → host marked down → monitor marked down

Proposed Fixes

Option 1: Add Ports Field to UptimeHost

Changes Required:

  1. Add Ports field to UptimeHost model:

    type UptimeHost struct {
        // ... existing fields
        Ports []int `json:"ports" gorm:"-"` // Not stored, computed on the fly
    }
    
  2. Modify checkHost() to try all ports associated with monitors on that host:

    // Collect unique ports from all monitors for this host
    portSet := make(map[int]bool)
    for _, monitor := range monitors {
        if monitor.ProxyHostID != nil {
            var proxyHost models.ProxyHost
            if err := s.DB.First(&proxyHost, "id = ?", *monitor.ProxyHostID).Error; err == nil {
                portSet[proxyHost.ForwardPort] = true
            }
        }
    }
    
    // Try connecting to any of the ports
    success := false
    for port := range portSet {
        addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
        conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
        // ... rest of logic
    }
    

Pros:

  • Checks actual backend ports
  • More accurate for non-standard ports
  • Minimal schema changes

Cons:

  • Requires database queries in check loop
  • More complex logic

Option 2: Store ForwardPort Reference in UptimeMonitor

Changes Required:

  1. Add ForwardPort field to UptimeMonitor:

    type UptimeMonitor struct {
        // ... existing fields
        ForwardPort int `json:"forward_port"`
    }
    
  2. Update SyncMonitors() to populate it:

    monitor = models.UptimeMonitor{
        // ... existing fields
        ForwardPort: host.ForwardPort,
    }
    
  3. Update checkHost() to use stored forward port:

    for _, monitor := range monitors {
        port := monitor.ForwardPort
        if port == 0 {
            continue
        }
        addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
        // ... rest of logic
    }
    

Pros:

  • Simple, no extra DB queries
  • Forward port readily available

Cons:

  • Schema migration required
  • Duplication of data (port stored in both ProxyHost and UptimeMonitor)

Option 3: Skip Host-Level Check for Non-Standard Ports

Temporary workaround - not recommended for production.

Only perform host-level checks for monitors on standard ports (80, 443, 8080).
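
If Option 3 were adopted as a stopgap, the filter could be as small as the following (sketch; isStandardPort is a hypothetical helper, not present in uptime_service.go):

```go
package main

import "fmt"

// isStandardPort reports whether the host-level TCP pre-check should run.
// Monitors on any other port would skip the pre-check and go straight to
// the HTTP check. (Hypothetical helper for the Option 3 workaround.)
func isStandardPort(port string) bool {
	switch port {
	case "80", "443", "8080":
		return true
	}
	return false
}

func main() {
	fmt.Println(isStandardPort("443"))  // pre-check runs
	fmt.Println(isStandardPort("5690")) // Wizarr would skip the pre-check
}
```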

Option 4: Use ProxyHost Forward Port Directly (Simplest)

Changes Required:

Modify checkHost() to query the proxy host for each monitor to get the actual forward port:

// In checkHost(), replace the port extraction:
for _, monitor := range monitors {
    var port int

    if monitor.ProxyHostID != nil {
        var proxyHost models.ProxyHost
        if err := s.DB.First(&proxyHost, "id = ?", *monitor.ProxyHostID).Error; err == nil {
            port = proxyHost.ForwardPort
        }
    } else {
        // Fallback to URL extraction for non-proxy monitors
        portStr := extractPort(monitor.URL)
        if portStr != "" {
            port, _ = strconv.Atoi(portStr)
        }
    }

    if port == 0 {
        continue
    }

    addr := net.JoinHostPort(host.Host, strconv.Itoa(port))
    conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
    // ... rest of check
}

Pros:

  • No schema changes
  • Works immediately
  • Handles both proxy hosts and standalone monitors

Cons:

  • Database query in check loop (but monitors are already cached)
  • Slight performance overhead

Recommendation

Option 4 (Use ProxyHost Forward Port Directly) is recommended because:

  1. No schema migration required
  2. Simple fix, easy to test
  3. Minimal performance impact (monitors already queried)
  4. Can be deployed immediately
  5. Handles edge cases (standalone monitors)
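
The core of the Option 4 decision can be isolated in one small helper, which would also make it easy to unit-test (sketch; resolvePort is a hypothetical name, and in the proposed fix the same logic sits inline in checkHost()):

```go
package main

import (
	"fmt"
	"strconv"
)

// resolvePort captures Option 4's rule: prefer the proxy host's
// forward_port when the monitor is backed by a proxy host; otherwise
// fall back to the port extracted from the public URL. Returns 0 when
// no usable port is available, which the caller treats as "skip".
func resolvePort(forwardPort int, urlPort string) int {
	if forwardPort > 0 {
		return forwardPort
	}
	p, err := strconv.Atoi(urlPort)
	if err != nil {
		return 0
	}
	return p
}

func main() {
	fmt.Println(resolvePort(5690, "443")) // Wizarr: backend port wins
	fmt.Println(resolvePort(0, "443"))    // standalone monitor: URL port
}
```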

Testing Plan

  1. Unit Test: Add test case for non-standard port host check
  2. Integration Test:
    • Create proxy host with non-standard forward port
    • Verify host-level check uses correct port
    • Verify monitor status updates correctly
  3. Manual Test:
    • Apply fix
    • Wait for next uptime check cycle (60 seconds)
    • Verify Wizarr shows as "up"
    • Verify no other monitors affected

Debugging Commands

# Check Wizarr monitor status
docker compose -f docker-compose.test.yml exec charon sh -c \
  "sqlite3 /app/data/charon.db \"SELECT name, status, failure_count, url FROM uptime_monitors WHERE name = 'Wizarr';\""

# Check Wizarr host status
docker compose -f docker-compose.test.yml exec charon sh -c \
  "sqlite3 /app/data/charon.db \"SELECT name, host, status FROM uptime_hosts WHERE name = 'Wizarr';\""

# Check recent heartbeats
docker compose -f docker-compose.test.yml exec charon sh -c \
  "sqlite3 /app/data/charon.db \"SELECT status, message, created_at FROM uptime_heartbeats WHERE monitor_id = 'eed56336-e646-4cf5-a3fc-ac4d2dd8760e' ORDER BY created_at DESC LIMIT 5;\""

# Check Wizarr proxy host config
docker compose -f docker-compose.test.yml exec charon sh -c \
  "sqlite3 /app/data/charon.db \"SELECT name, forward_host, forward_port FROM proxy_hosts WHERE name = 'Wizarr';\""

# Monitor real-time uptime checks in logs
docker compose -f docker-compose.test.yml logs -f charon | grep -i "wizarr\|uptime"

Related Files

  • backend/internal/services/uptime_service.go - Main uptime service
  • backend/internal/models/uptime.go - UptimeMonitor model
  • backend/internal/models/uptime_host.go - UptimeHost model
  • backend/internal/services/uptime_service_test.go - Unit tests

References

  • Issue created: 2025-12-23
  • Related feature: Host-level uptime grouping
  • Related PR: [Reference to ACL/permission changes if applicable]

Next Steps: Implement Option 4 fix and add test coverage for non-standard port scenarios.