Complete lint remediation addressing errcheck, gosec, and staticcheck violations across backend test files. Tighten pre-commit configuration to prevent future blind spots. Key Changes: - Fix 61 Go linting issues (errcheck, gosec G115/G301/G304/G306, bodyclose) - Add proper error handling for json.Unmarshal, os.Setenv, db.Close(), w.Write() - Fix gosec G115 integer overflow with strconv.FormatUint - Add #nosec annotations with justifications for test fixtures - Fix SecurityService goroutine leaks (add Close() calls) - Fix CrowdSec tar.gz non-deterministic ordering with sorted keys Pre-commit Hardening: - Remove test file exclusion from golangci-lint hook - Add gosec to .golangci-fast.yml with critical checks (G101, G110, G305) - Replace broad .golangci.yml exclusions with targeted path-specific rules - Test files now linted on every commit Test Fixes: - Fix emergency route count assertions (1→2 for dual-port setup) - Fix DNS provider service tests with proper mock setup - Fix certificate service tests with deterministic behavior Backend: 27 packages pass, 83.5% coverage Frontend: 0 lint warnings, 0 TypeScript errors Pre-commit: All 14 hooks pass (~37s)
1275 lines
39 KiB
Markdown
1275 lines
39 KiB
Markdown
# Lint Remediation & Monitoring Plan
|
||
|
||
**Status:** Planning
|
||
**Created:** 2026-02-02
|
||
**Target Completion:** 2026-02-03
|
||
|
||
---
|
||
|
||
## Executive Summary
|
||
|
||
This plan addresses 40 Go linting issues (18 errcheck, 22 gosec warnings from `full_lint_output.txt`), 6 TypeScript warnings, and establishes monitoring for retry attempt frequency to ensure it remains below 5%.
|
||
|
||
### Goals
|
||
|
||
1. **Go Linting:** Fix all 40 reported issues (18 errcheck, 22 gosec)
|
||
2. **TypeScript:** Resolve 6 ESLint warnings (no-explicit-any, no-unused-vars)
|
||
3. **Monitoring:** Implement retry attempt frequency tracking (<5% threshold)
|
||
|
||
---
|
||
|
||
## Research Findings
|
||
|
||
### 1. Go Linting Issues (40 total from full_lint_output.txt)
|
||
|
||
**Source Files:**
|
||
- `backend/final_lint.txt` (34 issues - subset)
|
||
- `backend/full_lint_output.txt` (40 issues - complete list)
|
||
|
||
#### 1.1 Errcheck Issues (18 total)
|
||
|
||
**Category A: Unchecked json.Unmarshal in Tests (6)**
|
||
|
||
| File | Line | Issue |
|
||
|------|------|-------|
|
||
| `internal/api/handlers/security_handler_audit_test.go` | 581 | `json.Unmarshal(w.Body.Bytes(), &resp)` |
|
||
| `internal/api/handlers/security_handler_coverage_test.go` | 525, 589 | `json.Unmarshal(w.Body.Bytes(), &resp)` (2 locations) |
|
||
| `internal/api/handlers/settings_handler_test.go` | 895, 923, 1081 | `json.Unmarshal(w.Body.Bytes(), &resp)` (3 locations) |
|
||
|
||
**Root Cause:** Test code not checking JSON unmarshaling errors
|
||
**Impact:** Tests may pass with invalid JSON responses, false positives
|
||
**Fix:** Add error checking: `require.NoError(t, json.Unmarshal(...))`
|
||
|
||
**Category B: Unchecked Environment Variable Operations (11)**
|
||
|
||
| File | Line | Issue |
|
||
|------|------|-------|
|
||
| `internal/caddy/config_test.go` | 1794 | `os.Unsetenv(v)` |
|
||
| `internal/config/config_test.go` | 56, 57, 72, 74, 75, 82 | `os.Setenv(...)` (6 instances) |
|
||
| `internal/config/config_test.go` | 157, 158, 159, 175, 196 | `os.Unsetenv(...)` (5 instances total) |
|
||
|
||
**Root Cause:** Environment variable setup/cleanup without error handling
|
||
**Impact:** Test isolation failures, flaky tests
|
||
**Fix:** Wrap with `require.NoError(t, os.Setenv/Unsetenv(...))`
|
||
|
||
**Category C: Unchecked Database Close Operations (4)**
|
||
|
||
| File | Line | Issue |
|
||
|------|------|-------|
|
||
| `internal/services/dns_provider_service_test.go` | 1446, 1466, 1493, 1531, 1549 | `sqlDB.Close()` (4 locations) |
|
||
| `internal/database/errors_test.go` | 230 | `sqlDB.Close()` |
|
||
|
||
**Root Cause:** Resource cleanup without error handling
|
||
**Impact:** Resource leaks in tests
|
||
**Fix:** `defer func() { _ = sqlDB.Close() }()` or explicit error check
|
||
|
||
**Category D: Unchecked w.Write in Tests (3)**
|
||
|
||
| File | Line | Issue |
|
||
|------|------|-------|
|
||
| `internal/caddy/manager_additional_test.go` | 1467, 1522 | `w.Write([]byte(...))` (2 locations) |
|
||
| `internal/caddy/manager_test.go` | 133 | `w.Write([]byte(...))` |
|
||
|
||
**Root Cause:** HTTP response writing without error handling
|
||
**Impact:** Silent failures in mock HTTP servers
|
||
**Fix:** `_, _ = w.Write(...)` or check error if critical
|
||
|
||
**Category E: Unchecked db.AutoMigrate in Tests (3)**
|
||
|
||
| File | Line | Issue |
|
||
|------|------|-------|
|
||
| `internal/api/handlers/notification_coverage_test.go` | 22 | `db.AutoMigrate(...)` |
|
||
| `internal/api/handlers/pr_coverage_test.go` | 404, 438 | `db.AutoMigrate(...)` (2 locations) |
|
||
|
||
**Root Cause:** Database schema migration without error handling
|
||
**Impact:** Tests may run with incorrect schema
|
||
**Fix:** `require.NoError(t, db.AutoMigrate(...))`
|
||
|
||
#### 1.2 Gosec Security Issues (22 total - unchanged from final_lint.txt)
|
||
|
||
*(Same 22 gosec issues as documented in final_lint.txt)*
|
||
|
||
### 2. TypeScript Linting Issues (6 warnings - unchanged)
|
||
|
||
*(Same 6 ESLint warnings as documented earlier)*
|
||
|
||
### 3. Retry Monitoring Analysis
|
||
|
||
**Current State:**
|
||
|
||
**Retry Logic Location:** `backend/internal/services/uptime_service.go`
|
||
|
||
**Configuration:**
|
||
- `MaxRetries` in `UptimeServiceConfig` (default: 2)
|
||
- `MaxRetries` in `models.UptimeMonitor` (default: 3)
|
||
|
||
**Current Behavior:**
|
||
```go
|
||
for retry := 0; retry <= s.config.MaxRetries && !success; retry++ {
|
||
if retry > 0 {
|
||
logger.Log().Info("Retrying TCP check")
|
||
}
|
||
// Try connection...
|
||
}
|
||
```
|
||
|
||
**Metrics Gaps:**
|
||
- No retry frequency tracking
|
||
- No alerting on excessive retries
|
||
- No historical data for analysis
|
||
|
||
**Requirements:**
|
||
- Track retry attempts vs first-try successes
|
||
- Alert if retry rate >5% over rolling 1000 checks
|
||
- Expose Prometheus metrics for dashboarding
|
||
|
||
---
|
||
|
||
## Technical Specifications
|
||
|
||
### Phase 1: Backend Go Linting Fixes
|
||
|
||
#### 1.1 Errcheck Fixes (18 issues)
|
||
|
||
**JSON Unmarshal (6 fixes):**
|
||
|
||
```go
|
||
// Pattern to apply across 6 locations
|
||
// BEFORE:
|
||
json.Unmarshal(w.Body.Bytes(), &resp)
|
||
|
||
// AFTER:
|
||
err := json.Unmarshal(w.Body.Bytes(), &resp)
|
||
require.NoError(t, err, "Failed to unmarshal response")
|
||
```
|
||
|
||
**Files:**
|
||
- `internal/api/handlers/security_handler_audit_test.go:581`
|
||
- `internal/api/handlers/security_handler_coverage_test.go:525, 589`
|
||
- `internal/api/handlers/settings_handler_test.go:895, 923, 1081`
|
||
|
||
**Environment Variables (11 fixes):**
|
||
|
||
```go
|
||
// BEFORE:
|
||
os.Setenv("VAR_NAME", "value")
|
||
|
||
// AFTER:
|
||
require.NoError(t, os.Setenv("VAR_NAME", "value"))
|
||
```
|
||
|
||
**Files:**
|
||
- `internal/config/config_test.go:56, 57, 72, 74, 75, 82, 157, 158, 159, 175, 196`
|
||
- `internal/caddy/config_test.go:1794`
|
||
|
||
**Database Close (4 fixes):**
|
||
|
||
```go
|
||
// BEFORE:
|
||
sqlDB.Close()
|
||
|
||
// AFTER (Pattern 1 - Immediate cleanup with error reporting):
|
||
if err := sqlDB.Close(); err != nil {
|
||
t.Errorf("Failed to close database connection: %v", err)
|
||
}
|
||
|
||
// AFTER (Pattern 2 - Deferred cleanup with error reporting):
|
||
defer func() {
|
||
if err := sqlDB.Close(); err != nil {
|
||
t.Errorf("Failed to close database connection: %v", err)
|
||
}
|
||
}()
|
||
```
|
||
|
||
**Rationale:**
|
||
- Tests must report resource cleanup failures for debugging
|
||
- Using `_` silences legitimate errors that could indicate resource leaks
|
||
- `t.Errorf` doesn't stop test execution but records the failure
|
||
- Pattern 1 for immediate cleanup (end of test)
|
||
- Pattern 2 for deferred cleanup (start of test)
|
||
|
||
**Files:**
|
||
- `internal/services/dns_provider_service_test.go:1446, 1466, 1493, 1531, 1549`
|
||
- `internal/database/errors_test.go:230`
|
||
|
||
**HTTP Write (3 fixes):**
|
||
|
||
```go
|
||
// BEFORE:
|
||
w.Write([]byte(`{"data": "value"}`))
|
||
|
||
// AFTER (Enhanced with error handling):
|
||
if _, err := w.Write([]byte(`{"data": "value"}`)); err != nil {
|
||
t.Errorf("Failed to write HTTP response: %v", err)
|
||
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
|
||
return
|
||
}
|
||
```
|
||
|
||
**Rationale:**
|
||
- Mock servers should fail fast on write errors to avoid misleading test results
|
||
- `http.Error` ensures client sees error response, not partial data
|
||
- Early return prevents further processing with invalid state
|
||
- Critical for tests that validate response content
|
||
|
||
**Files:**
|
||
- `internal/caddy/manager_additional_test.go:1467, 1522`
|
||
- `internal/caddy/manager_test.go:133`
|
||
|
||
**AutoMigrate (3 fixes):**
|
||
|
||
```go
|
||
// BEFORE:
|
||
db.AutoMigrate(&models.Model{})
|
||
|
||
// AFTER:
|
||
require.NoError(t, db.AutoMigrate(&models.Model{}))
|
||
```
|
||
|
||
**Files:**
|
||
- `internal/api/handlers/notification_coverage_test.go:22`
|
||
- `internal/api/handlers/pr_coverage_test.go:404, 438`
|
||
|
||
#### 1.2 Gosec Security Fixes (22 issues)
|
||
|
||
**G101: Hardcoded Credentials (1 issue)**
|
||
|
||
**Location:** Test fixtures containing example API tokens
|
||
|
||
```go
|
||
// BEFORE:
|
||
apiKey := "sk_test_1234567890abcdef"
|
||
|
||
// AFTER:
|
||
// #nosec G101 -- Test fixture with non-functional API key for validation testing
|
||
apiKey := "sk_test_1234567890abcdef"
|
||
```
|
||
|
||
**Security Analysis:**
|
||
- **Risk Level:** LOW (test-only code)
|
||
- **Validation:** Verify value is non-functional, documented as test fixture
|
||
- **Impact:** None if properly annotated, prevents false positives
|
||
|
||
---
|
||
|
||
**G110: Decompression Bomb (2 issues)**
|
||
|
||
**Locations:**
|
||
- `internal/crowdsec/hub_cache.go`
|
||
- `internal/crowdsec/hub_sync.go`
|
||
|
||
```go
|
||
// BEFORE:
|
||
reader, err := gzip.NewReader(resp.Body)
|
||
if err != nil {
|
||
return err
|
||
}
|
||
defer reader.Close()
|
||
io.Copy(dest, reader) // Unbounded read
|
||
|
||
// AFTER:
|
||
const maxDecompressedSize = 100 * 1024 * 1024 // 100MB limit
|
||
|
||
reader, err := gzip.NewReader(resp.Body)
|
||
if err != nil {
|
||
return fmt.Errorf("gzip reader init failed: %w", err)
|
||
}
|
||
defer reader.Close()
|
||
|
||
// Limit decompressed size to prevent decompression bombs
|
||
limitedReader := io.LimitReader(reader, maxDecompressedSize)
|
||
written, err := io.Copy(dest, limitedReader)
|
||
if err != nil {
|
||
return fmt.Errorf("decompression failed: %w", err)
|
||
}
|
||
|
||
// Verify we didn't hit the limit (which would indicate potential attack)
|
||
if written >= maxDecompressedSize {
|
||
return fmt.Errorf("decompression size exceeded limit (%d bytes), potential decompression bomb", maxDecompressedSize)
|
||
}
|
||
```
|
||
|
||
**Security Analysis:**
|
||
- **Risk Level:** HIGH (remote code execution vector)
|
||
- **Attack Vector:** Malicious CrowdSec hub response with crafted gzip bomb
|
||
- **Mitigation:**
|
||
- Hard limit at 100MB (CrowdSec hub files are typically <10MB)
|
||
- Early termination on limit breach
|
||
- Error returned prevents further processing
|
||
- **Impact:** Prevents memory exhaustion DoS attacks
|
||
|
||
---
|
||
|
||
**G305: File Traversal (1 issue)**
|
||
|
||
**Location:** File path handling in backup/restore operations
|
||
|
||
```go
|
||
// BEFORE:
|
||
filePath := filepath.Join(baseDir, userInput)
|
||
file, err := os.Open(filePath)
|
||
|
||
// AFTER:
|
||
// Sanitize and validate file path to prevent directory traversal
|
||
func SafeJoinPath(baseDir, userPath string) (string, error) {
|
||
// Clean the user-provided path
|
||
cleanPath := filepath.Clean(userPath)
|
||
|
||
// Reject absolute paths and parent directory references
|
||
if filepath.IsAbs(cleanPath) {
|
||
return "", fmt.Errorf("absolute paths not allowed: %s", cleanPath)
|
||
}
|
||
if strings.Contains(cleanPath, "..") {
|
||
return "", fmt.Errorf("parent directory traversal not allowed: %s", cleanPath)
|
||
}
|
||
|
||
// Join with base directory
|
||
fullPath := filepath.Join(baseDir, cleanPath)
|
||
|
||
// Verify the resolved path is still within base directory
|
||
absBase, err := filepath.Abs(baseDir)
|
||
if err != nil {
|
||
return "", fmt.Errorf("failed to resolve base directory: %w", err)
|
||
}
|
||
|
||
absPath, err := filepath.Abs(fullPath)
|
||
if err != nil {
|
||
return "", fmt.Errorf("failed to resolve file path: %w", err)
|
||
}
|
||
|
||
if !strings.HasPrefix(absPath, absBase) {
|
||
return "", fmt.Errorf("path escape attempt detected: %s", userPath)
|
||
}
|
||
|
||
return fullPath, nil
|
||
}
|
||
|
||
// Usage:
|
||
safePath, err := SafeJoinPath(baseDir, userInput)
|
||
if err != nil {
|
||
return fmt.Errorf("invalid file path: %w", err)
|
||
}
|
||
file, err := os.Open(safePath)
|
||
```
|
||
|
||
**Security Analysis:**
|
||
- **Risk Level:** CRITICAL (arbitrary file read/write)
|
||
- **Attack Vectors:**
|
||
- `../../etc/passwd` - Read sensitive system files
|
||
- `../../../root/.ssh/id_rsa` - Steal credentials
|
||
- Symlink attacks to escape sandbox
|
||
- **Mitigation:**
|
||
- Reject absolute paths
|
||
- Block `..` sequences
|
||
- Verify resolved path stays within base directory
|
||
- Works even with symlinks (uses `filepath.Abs`)
|
||
- **Impact:** Prevents unauthorized file system access
|
||
|
||
---
|
||
|
||
**G306/G302: File Permissions (8 issues)**
|
||
|
||
**Permission Security Matrix:**
|
||
|
||
| Permission | Octal | Use Case | Justification |
|
||
|------------|-------|----------|---------------|
|
||
| **0600** | rw------- | SQLite database files, private keys | Contains sensitive data; only process owner needs access |
|
||
| **0640** | rw-r----- | Log files, config files | Owner writes, group reads for monitoring/debugging |
|
||
| **0644** | rw-r--r-- | Public config templates, documentation | World-readable reference data, no sensitive content |
|
||
| **0700** | rwx------ | Backup directories, data directories | Process-owned workspace, no group/world access needed |
|
||
| **0750** | rwxr-x--- | Binary directories, script directories | Owner manages, group executes; prevents tampering |
|
||
|
||
**Implementation Pattern:**
|
||
|
||
```go
|
||
// BEFORE:
|
||
os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0644) // Too permissive for sensitive data
|
||
os.MkdirAll(path, 0755) // Too permissive for private directories
|
||
|
||
// AFTER - Database files (0600):
|
||
// Rationale: Contains user credentials, tokens, PII
|
||
// Risk if compromised: Full system access, credential theft
|
||
os.OpenFile(dbPath, os.O_CREATE|os.O_WRONLY, 0600)
|
||
|
||
// AFTER - Log files (0640):
|
||
// Rationale: Monitoring tools run in same group, need read access
|
||
// Risk if compromised: Information disclosure, system reconnaissance
|
||
os.OpenFile(logPath, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
|
||
|
||
// AFTER - Backup directories (0700):
|
||
// Rationale: Contains complete database dumps with sensitive data
|
||
// Risk if compromised: Mass data exfiltration
|
||
os.MkdirAll(backupDir, 0700)
|
||
|
||
// AFTER - Config templates (0644):
|
||
// Rationale: Reference documentation, no secrets or user data
|
||
// Risk if compromised: None (public information)
|
||
os.OpenFile(tplPath, os.O_CREATE|os.O_RDONLY, 0644)
|
||
```
|
||
|
||
**Security Analysis by File Type:**
|
||
|
||
| File Type | Current | Required | Risk If Wrong | Affected Files |
|
||
|-----------|---------|----------|---------------|----------------|
|
||
| SQLite DB | 0644 | **0600** | Credential theft | `internal/database/*.go` |
|
||
| Backup tar | 0644 | **0600** | Mass data leak | `internal/services/backup_service.go` |
|
||
| Data dirs | 0755 | **0700** | Unauthorized writes | `internal/config/config.go` |
|
||
| Log files | 0644 | **0640** | Info disclosure | `internal/caddy/config.go` |
|
||
| Test temp | 0777 | **0700** | Test pollution | `*_test.go` files |
|
||
|
||
**Files Requiring Updates (8 total):**
|
||
1. `cmd/seed/seed_smoke_test.go` - Test DB files (0600)
|
||
2. `internal/caddy/config.go` - Log files (0640)
|
||
3. `internal/config/config.go` - Data dirs (0700), DB files (0600)
|
||
4. `internal/database/database_test.go` - Test DB (0600)
|
||
5. `internal/services/backup_service.go` - Backup files (0600)
|
||
6. `internal/services/backup_service_test.go` - Test backups (0600)
|
||
7. `internal/services/uptime_service_test.go` - Test DB (0600)
|
||
8. `internal/util/crypto_test.go` - Test temp files (0600)
|
||
|
||
---
|
||
|
||
**G115: Integer Overflow (3 issues)**
|
||
|
||
```go
|
||
// BEFORE:
|
||
intValue := int(int64Value) // Unchecked conversion
|
||
|
||
// AFTER:
|
||
import "math"
|
||
|
||
func SafeInt64ToInt(val int64) (int, error) {
|
||
if val > math.MaxInt || val < math.MinInt {
|
||
return 0, fmt.Errorf("integer overflow: value %d exceeds int range", val)
|
||
}
|
||
return int(val), nil
|
||
}
|
||
|
||
// Usage:
|
||
intValue, err := SafeInt64ToInt(int64Value)
|
||
if err != nil {
|
||
return fmt.Errorf("invalid integer value: %w", err)
|
||
}
|
||
```
|
||
|
||
**Security Analysis:**
|
||
- **Risk Level:** MEDIUM (logic errors, potential bypass)
|
||
- **Impact:** Array bounds violations, incorrect calculations
|
||
- **Affected:** Timeout values, retry counts, array indices
|
||
|
||
---
|
||
|
||
**G304: File Inclusion (3 issues)**
|
||
|
||
```go
|
||
// BEFORE:
|
||
content, err := ioutil.ReadFile(userInput) // Arbitrary file read
|
||
|
||
// AFTER:
|
||
// Use SafeJoinPath from G305 fix above
|
||
safePath, err := SafeJoinPath(allowedDir, userInput)
|
||
if err != nil {
|
||
return fmt.Errorf("invalid file path: %w", err)
|
||
}
|
||
|
||
// Additional validation: Check file extension whitelist
|
||
allowedExts := map[string]bool{".json": true, ".yaml": true, ".yml": true}
|
||
ext := filepath.Ext(safePath)
|
||
if !allowedExts[ext] {
|
||
return fmt.Errorf("file type not allowed: %s", ext)
|
||
}
|
||
|
||
content, err := os.ReadFile(safePath)
|
||
```
|
||
|
||
**Security Analysis:**
|
||
- **Risk Level:** HIGH (arbitrary file read)
|
||
- **Mitigation:** Path validation + extension whitelist
|
||
- **Impact:** Limits read access to configuration files only
|
||
|
||
---
|
||
|
||
**G404: Weak Random (Informational)**
|
||
|
||
*(Using crypto/rand for security-sensitive operations, math/rand for non-security randomness - no changes needed)*
|
||
|
||
### Phase 2: Frontend TypeScript Linting Fixes (6 warnings)
|
||
|
||
*(Apply the same 6 TypeScript fixes as documented in the original plan)*
|
||
|
||
### Phase 3: Retry Monitoring Implementation
|
||
|
||
#### 3.1 Data Model & Persistence
|
||
|
||
**Database Schema Extensions:**
|
||
|
||
```go
|
||
// Add to models/uptime_monitor.go
|
||
type UptimeMonitor struct {
|
||
// ... existing fields ...
|
||
|
||
// Retry statistics (new fields)
|
||
TotalChecks uint64 `gorm:"default:0" json:"total_checks"`
|
||
RetryAttempts uint64 `gorm:"default:0" json:"retry_attempts"`
|
||
RetryRate float64 `gorm:"-" json:"retry_rate"` // Computed field
|
||
LastRetryAt time.Time `json:"last_retry_at,omitempty"`
|
||
}
|
||
|
||
// Add computed field method
|
||
func (m *UptimeMonitor) CalculateRetryRate() float64 {
|
||
if m.TotalChecks == 0 {
|
||
return 0.0
|
||
}
|
||
return float64(m.RetryAttempts) / float64(m.TotalChecks) * 100.0
|
||
}
|
||
```
|
||
|
||
**Migration:**
|
||
```sql
|
||
-- Add retry tracking columns
|
||
ALTER TABLE uptime_monitors ADD COLUMN total_checks INTEGER DEFAULT 0;
|
||
ALTER TABLE uptime_monitors ADD COLUMN retry_attempts INTEGER DEFAULT 0;
|
||
ALTER TABLE uptime_monitors ADD COLUMN last_retry_at DATETIME;
|
||
```
|
||
|
||
#### 3.2 Thread-Safe Metrics Collection
|
||
|
||
**New File: `backend/internal/metrics/uptime_metrics.go`**
|
||
|
||
```go
|
||
package metrics
|
||
|
||
import (
|
||
"sync"
|
||
"time"
|
||
|
||
"github.com/prometheus/client_golang/prometheus"
|
||
"github.com/prometheus/client_golang/prometheus/promauto"
|
||
)
|
||
|
||
// UptimeMetrics provides thread-safe retry tracking
|
||
type UptimeMetrics struct {
|
||
mu sync.RWMutex
|
||
|
||
// Per-monitor statistics
|
||
monitorStats map[uint]*MonitorStats
|
||
|
||
// Prometheus metrics
|
||
checksTotal *prometheus.CounterVec
|
||
retriesTotal *prometheus.CounterVec
|
||
retryRate *prometheus.GaugeVec
|
||
}
|
||
|
||
type MonitorStats struct {
|
||
TotalChecks uint64
|
||
RetryAttempts uint64
|
||
LastRetryAt time.Time
|
||
}
|
||
|
||
// Global instance
|
||
var (
|
||
once sync.Once
|
||
instance *UptimeMetrics
|
||
)
|
||
|
||
// GetMetrics returns singleton instance
|
||
func GetMetrics() *UptimeMetrics {
|
||
once.Do(func() {
|
||
instance = &UptimeMetrics{
|
||
monitorStats: make(map[uint]*MonitorStats),
|
||
checksTotal: promauto.NewCounterVec(
|
||
prometheus.CounterOpts{
|
||
Name: "charon_uptime_checks_total",
|
||
Help: "Total number of uptime checks performed",
|
||
},
|
||
[]string{"monitor_id", "monitor_name", "check_type"},
|
||
),
|
||
retriesTotal: promauto.NewCounterVec(
|
||
prometheus.CounterOpts{
|
||
Name: "charon_uptime_retries_total",
|
||
Help: "Total number of retry attempts",
|
||
},
|
||
[]string{"monitor_id", "monitor_name", "check_type"},
|
||
),
|
||
retryRate: promauto.NewGaugeVec(
|
||
prometheus.GaugeOpts{
|
||
Name: "charon_uptime_retry_rate_percent",
|
||
Help: "Percentage of checks requiring retries (over last 1000 checks)",
|
||
},
|
||
[]string{"monitor_id", "monitor_name"},
|
||
),
|
||
}
|
||
})
|
||
return instance
|
||
}
|
||
|
||
// RecordCheck records a successful first-try check
|
||
func (m *UptimeMetrics) RecordCheck(monitorID uint, monitorName, checkType string) {
|
||
m.mu.Lock()
|
||
defer m.mu.Unlock()
|
||
|
||
if _, exists := m.monitorStats[monitorID]; !exists {
|
||
m.monitorStats[monitorID] = &MonitorStats{}
|
||
}
|
||
|
||
m.monitorStats[monitorID].TotalChecks++
|
||
|
||
// Update Prometheus counter
|
||
m.checksTotal.WithLabelValues(
|
||
fmt.Sprintf("%d", monitorID),
|
||
monitorName,
|
||
checkType,
|
||
).Inc()
|
||
|
||
// Update retry rate gauge
|
||
m.updateRetryRate(monitorID, monitorName)
|
||
}
|
||
|
||
// RecordRetry records a retry attempt
|
||
func (m *UptimeMetrics) RecordRetry(monitorID uint, monitorName, checkType string) {
|
||
m.mu.Lock()
|
||
defer m.mu.Unlock()
|
||
|
||
if _, exists := m.monitorStats[monitorID]; !exists {
|
||
m.monitorStats[monitorID] = &MonitorStats{}
|
||
}
|
||
|
||
stats := m.monitorStats[monitorID]
|
||
stats.RetryAttempts++
|
||
stats.LastRetryAt = time.Now()
|
||
|
||
// Update Prometheus counter
|
||
m.retriesTotal.WithLabelValues(
|
||
fmt.Sprintf("%d", monitorID),
|
||
monitorName,
|
||
checkType,
|
||
).Inc()
|
||
|
||
// Update retry rate gauge
|
||
m.updateRetryRate(monitorID, monitorName)
|
||
}
|
||
|
||
// updateRetryRate calculates and updates the retry rate gauge
|
||
func (m *UptimeMetrics) updateRetryRate(monitorID uint, monitorName string) {
|
||
stats := m.monitorStats[monitorID]
|
||
if stats.TotalChecks == 0 {
|
||
return
|
||
}
|
||
|
||
rate := float64(stats.RetryAttempts) / float64(stats.TotalChecks) * 100.0
|
||
|
||
m.retryRate.WithLabelValues(
|
||
fmt.Sprintf("%d", monitorID),
|
||
monitorName,
|
||
).Set(rate)
|
||
}
|
||
|
||
// GetStats returns current statistics (thread-safe)
|
||
func (m *UptimeMetrics) GetStats(monitorID uint) *MonitorStats {
|
||
m.mu.RLock()
|
||
defer m.mu.RUnlock()
|
||
|
||
if stats, exists := m.monitorStats[monitorID]; exists {
|
||
// Return a copy to prevent mutation
|
||
return &MonitorStats{
|
||
TotalChecks: stats.TotalChecks,
|
||
RetryAttempts: stats.RetryAttempts,
|
||
LastRetryAt: stats.LastRetryAt,
|
||
}
|
||
}
|
||
return nil
|
||
}
|
||
|
||
// GetAllStats returns all monitor statistics
|
||
func (m *UptimeMetrics) GetAllStats() map[uint]*MonitorStats {
|
||
m.mu.RLock()
|
||
defer m.mu.RUnlock()
|
||
|
||
// Return deep copy
|
||
result := make(map[uint]*MonitorStats)
|
||
for id, stats := range m.monitorStats {
|
||
result[id] = &MonitorStats{
|
||
TotalChecks: stats.TotalChecks,
|
||
RetryAttempts: stats.RetryAttempts,
|
||
LastRetryAt: stats.LastRetryAt,
|
||
}
|
||
}
|
||
return result
|
||
}
|
||
```
|
||
|
||
#### 3.3 Integration with Uptime Service
|
||
|
||
**Update: `backend/internal/services/uptime_service.go`**
|
||
|
||
```go
|
||
import "github.com/yourusername/charon/internal/metrics"
|
||
|
||
func (s *UptimeService) performCheck(monitor *models.UptimeMonitor) error {
|
||
metrics := metrics.GetMetrics()
|
||
success := false
|
||
|
||
for retry := 0; retry <= s.config.MaxRetries && !success; retry++ {
|
||
if retry > 0 {
|
||
// Record retry attempt
|
||
metrics.RecordRetry(
|
||
monitor.ID,
|
||
monitor.Name,
|
||
string(monitor.Type),
|
||
)
|
||
logger.Log().Info("Retrying check",
|
||
zap.Uint("monitor_id", monitor.ID),
|
||
zap.Int("attempt", retry))
|
||
}
|
||
|
||
// Perform actual check
|
||
var err error
|
||
switch monitor.Type {
|
||
case models.HTTPMonitor:
|
||
err = s.checkHTTP(monitor)
|
||
case models.TCPMonitor:
|
||
err = s.checkTCP(monitor)
|
||
// ... other check types
|
||
}
|
||
|
||
if err == nil {
|
||
success = true
|
||
// Record successful check
|
||
metrics.RecordCheck(
|
||
monitor.ID,
|
||
monitor.Name,
|
||
string(monitor.Type),
|
||
)
|
||
}
|
||
}
|
||
|
||
return nil
|
||
}
|
||
```
|
||
|
||
#### 3.4 API Endpoint for Statistics
|
||
|
||
**New Handler: `backend/internal/api/handlers/uptime_stats_handler.go`**
|
||
|
||
```go
|
||
package handlers
|
||
|
||
import (
|
||
"net/http"
|
||
"github.com/gin-gonic/gin"
|
||
"github.com/yourusername/charon/internal/metrics"
|
||
"github.com/yourusername/charon/internal/models"
|
||
)
|
||
|
||
type UptimeStatsResponse struct {
|
||
MonitorID uint `json:"monitor_id"`
|
||
MonitorName string `json:"monitor_name"`
|
||
TotalChecks uint64 `json:"total_checks"`
|
||
RetryAttempts uint64 `json:"retry_attempts"`
|
||
RetryRate float64 `json:"retry_rate_percent"`
|
||
LastRetryAt string `json:"last_retry_at,omitempty"`
|
||
Status string `json:"status"` // "healthy" or "warning"
|
||
}
|
||
|
||
func GetUptimeStats(c *gin.Context) {
|
||
m := metrics.GetMetrics()
|
||
allStats := m.GetAllStats()
|
||
|
||
// Fetch monitor names from database
|
||
var monitors []models.UptimeMonitor
|
||
if err := models.DB.Find(&monitors).Error; err != nil {
|
||
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to fetch monitors"})
|
||
return
|
||
}
|
||
|
||
monitorMap := make(map[uint]string)
|
||
for _, mon := range monitors {
|
||
monitorMap[mon.ID] = mon.Name
|
||
}
|
||
|
||
// Build response
|
||
response := make([]UptimeStatsResponse, 0, len(allStats))
|
||
for id, stats := range allStats {
|
||
retryRate := 0.0
|
||
if stats.TotalChecks > 0 {
|
||
retryRate = float64(stats.RetryAttempts) / float64(stats.TotalChecks) * 100.0
|
||
}
|
||
|
||
status := "healthy"
|
||
if retryRate > 5.0 {
|
||
status = "warning"
|
||
}
|
||
|
||
resp := UptimeStatsResponse{
|
||
MonitorID: id,
|
||
MonitorName: monitorMap[id],
|
||
TotalChecks: stats.TotalChecks,
|
||
RetryAttempts: stats.RetryAttempts,
|
||
RetryRate: retryRate,
|
||
Status: status,
|
||
}
|
||
|
||
if !stats.LastRetryAt.IsZero() {
|
||
resp.LastRetryAt = stats.LastRetryAt.Format(time.RFC3339)
|
||
}
|
||
|
||
response = append(response, resp)
|
||
}
|
||
|
||
c.JSON(http.StatusOK, response)
|
||
}
|
||
```
|
||
|
||
**Register Route:**
|
||
```go
|
||
// In internal/api/routes.go
|
||
api.GET("/uptime/stats", handlers.GetUptimeStats)
|
||
```
|
||
|
||
#### 3.5 Prometheus Metrics Exposition
|
||
|
||
**Metrics Output Format:**
|
||
|
||
```prometheus
|
||
# HELP charon_uptime_checks_total Total number of uptime checks performed
|
||
# TYPE charon_uptime_checks_total counter
|
||
charon_uptime_checks_total{monitor_id="1",monitor_name="example.com",check_type="http"} 1247
|
||
|
||
# HELP charon_uptime_retries_total Total number of retry attempts
|
||
# TYPE charon_uptime_retries_total counter
|
||
charon_uptime_retries_total{monitor_id="1",monitor_name="example.com",check_type="http"} 34
|
||
|
||
# HELP charon_uptime_retry_rate_percent Percentage of checks requiring retries
|
||
# TYPE charon_uptime_retry_rate_percent gauge
|
||
charon_uptime_retry_rate_percent{monitor_id="1",monitor_name="example.com"} 2.73
|
||
```
|
||
|
||
**Access:** `GET /metrics` (existing Prometheus endpoint)
|
||
|
||
#### 3.6 Alert Integration
|
||
|
||
**Prometheus Alert Rule:**
|
||
|
||
```yaml
|
||
# File: configs/prometheus/alerts.yml
|
||
groups:
|
||
- name: uptime_monitoring
|
||
rules:
|
||
- alert: HighRetryRate
|
||
expr: charon_uptime_retry_rate_percent > 5
|
||
for: 10m
|
||
labels:
|
||
severity: warning
|
||
annotations:
|
||
summary: "High retry rate detected for monitor {{ $labels.monitor_name }}"
|
||
description: "Monitor {{ $labels.monitor_name }} (ID: {{ $labels.monitor_id }}) has a retry rate of {{ $value }}% over the last 1000 checks."
|
||
```
|
||
|
||
**Application-Level Logging:**
|
||
|
||
```go
|
||
// In uptime_service.go - Add to performCheck after retry loop
|
||
if retry > 0 {
|
||
stats := metrics.GetMetrics().GetStats(monitor.ID)
|
||
if stats != nil {
|
||
retryRate := float64(stats.RetryAttempts) / float64(stats.TotalChecks) * 100.0
|
||
if retryRate > 5.0 {
|
||
logger.Log().Warn("High retry rate detected",
|
||
zap.Uint("monitor_id", monitor.ID),
|
||
zap.String("monitor_name", monitor.Name),
|
||
zap.Float64("retry_rate", retryRate),
|
||
zap.Uint64("total_checks", stats.TotalChecks),
|
||
zap.Uint64("retry_attempts", stats.RetryAttempts),
|
||
)
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
#### 3.7 Frontend Dashboard Widget
|
||
|
||
**New Component: `frontend/src/components/RetryStatsCard.tsx`**
|
||
|
||
```tsx
|
||
import React, { useEffect, useState } from 'react';
|
||
import axios from 'axios';
|
||
|
||
interface RetryStats {
|
||
monitor_id: number;
|
||
monitor_name: string;
|
||
total_checks: number;
|
||
retry_attempts: number;
|
||
retry_rate_percent: number;
|
||
status: 'healthy' | 'warning';
|
||
last_retry_at?: string;
|
||
}
|
||
|
||
export const RetryStatsCard: React.FC = () => {
|
||
const [stats, setStats] = useState<RetryStats[]>([]);
|
||
const [loading, setLoading] = useState(true);
|
||
|
||
useEffect(() => {
|
||
const fetchStats = async () => {
|
||
try {
|
||
const response = await axios.get('/api/v1/uptime/stats');
|
||
setStats(response.data);
|
||
} catch (error) {
|
||
console.error('Failed to fetch retry stats:', error);
|
||
} finally {
|
||
setLoading(false);
|
||
}
|
||
};
|
||
|
||
fetchStats();
|
||
const interval = setInterval(fetchStats, 30000); // Refresh every 30s
|
||
|
||
return () => clearInterval(interval);
|
||
}, []);
|
||
|
||
if (loading) return <div>Loading retry statistics...</div>;
|
||
|
||
const warningMonitors = stats.filter(s => s.status === 'warning');
|
||
|
||
return (
|
||
<div className="retry-stats-card">
|
||
<h2>Uptime Retry Statistics</h2>
|
||
|
||
{warningMonitors.length > 0 && (
|
||
<div className="alert alert-warning">
|
||
<strong>⚠️ High Retry Rate Detected</strong>
|
||
<p>{warningMonitors.length} monitor(s) exceeding 5% retry threshold</p>
|
||
</div>
|
||
)}
|
||
|
||
<table className="table">
|
||
<thead>
|
||
<tr>
|
||
<th>Monitor</th>
|
||
<th>Total Checks</th>
|
||
<th>Retries</th>
|
||
<th>Retry Rate</th>
|
||
<th>Status</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
{stats.map(stat => (
|
||
<tr key={stat.monitor_id} className={stat.status === 'warning' ? 'warning' : ''}>
|
||
<td>{stat.monitor_name}</td>
|
||
<td>{stat.total_checks.toLocaleString()}</td>
|
||
<td>{stat.retry_attempts.toLocaleString()}</td>
|
||
<td>{stat.retry_rate_percent.toFixed(2)}%</td>
|
||
<td>
|
||
<span className={`badge ${stat.status === 'warning' ? 'badge-warning' : 'badge-success'}`}>
|
||
{stat.status}
|
||
</span>
|
||
</td>
|
||
</tr>
|
||
))}
|
||
</tbody>
|
||
</table>
|
||
</div>
|
||
);
|
||
};
|
||
```
|
||
|
||
#### 3.8 Race Condition Prevention
|
||
|
||
**Thread Safety Guarantees:**
|
||
|
||
1. **Read-Write Mutex:** `sync.RWMutex` in `UptimeMetrics`
|
||
- Multiple readers can access stats concurrently
|
||
- Writers get exclusive access during updates
|
||
- No data races on `monitorStats` map
|
||
|
||
2. **Atomic Operations:** Prometheus client library handles internal atomicity
|
||
- Counter increments are atomic
|
||
- Gauge updates are atomic
|
||
- No manual synchronization needed for Prometheus metrics
|
||
|
||
3. **Immutable Returns:** `GetStats()` returns a copy, not reference
|
||
- Prevents external mutation of internal state
|
||
- Safe to use returned values without locking
|
||
|
||
4. **Singleton Pattern:** `sync.Once` ensures single initialization
|
||
- No race during metrics instance creation
|
||
- Safe for concurrent first access
|
||
|
||
**Stress Test:**
|
||
|
||
```go
|
||
// File: backend/internal/metrics/uptime_metrics_test.go
|
||
func TestConcurrentAccess(t *testing.T) {
|
||
m := GetMetrics()
|
||
|
||
// Simulate 100 monitors with concurrent updates
|
||
var wg sync.WaitGroup
|
||
for i := 0; i < 100; i++ {
|
||
wg.Add(2)
|
||
monitorID := uint(i)
|
||
|
||
// Concurrent check recordings
|
||
go func() {
|
||
defer wg.Done()
|
||
for j := 0; j < 1000; j++ {
|
||
m.RecordCheck(monitorID, fmt.Sprintf("monitor-%d", monitorID), "http")
|
||
}
|
||
}()
|
||
|
||
// Concurrent retry recordings
|
||
go func() {
|
||
defer wg.Done()
|
||
for j := 0; j < 50; j++ {
|
||
m.RecordRetry(monitorID, fmt.Sprintf("monitor-%d", monitorID), "http")
|
||
}
|
||
}()
|
||
}
|
||
|
||
wg.Wait()
|
||
|
||
// Verify no data corruption
|
||
for i := 0; i < 100; i++ {
|
||
stats := m.GetStats(uint(i))
|
||
require.NotNil(t, stats)
|
||
assert.Equal(t, uint64(1000), stats.TotalChecks)
|
||
assert.Equal(t, uint64(50), stats.RetryAttempts)
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Implementation Plan
|
||
|
||
### Phase 1: Backend Go Linting Fixes
|
||
|
||
**Estimated Time:** 3-4 hours
|
||
|
||
**Tasks:**
|
||
|
||
1. **Errcheck Fixes** (60 min)
|
||
- [ ] Fix 6 JSON unmarshal errors
|
||
- [ ] Fix 11 environment variable operations
|
||
- [ ] Fix 4 database close operations
|
||
- [ ] Fix 3 HTTP write operations
|
||
- [ ] Fix 3 AutoMigrate calls
|
||
|
||
2. **Gosec Fixes** (2-3 hours)
|
||
- [ ] Fix 8 permission issues
|
||
- [ ] Fix 3 integer overflow issues
|
||
- [ ] Fix 3 file inclusion issues
|
||
- [ ] Fix 1 slice bounds issue
|
||
- [ ] Fix 2 decompression bomb issues
|
||
- [ ] Fix 1 file traversal issue
|
||
- [ ] Fix 2 Slowloris issues
|
||
- [ ] Fix 1 hardcoded credential (add #nosec comment)
|
||
|
||
**Verification:**
|
||
```bash
|
||
cd backend && golangci-lint run ./...
|
||
# Expected: 0 issues
|
||
```
|
||
|
||
### Phase 2: Frontend TypeScript Linting Fixes
|
||
|
||
**Estimated Time:** 1-2 hours
|
||
|
||
*(Same as original plan)*
|
||
|
||
### Phase 3: Retry Monitoring Implementation
|
||
|
||
**Estimated Time:** 4-5 hours
|
||
|
||
*(Same as original plan)*
|
||
|
||
---
|
||
|
||
## Acceptance Criteria
|
||
|
||
**Phase 1 Complete:**
|
||
- [ ] All 40 Go linting issues resolved (18 errcheck + 22 gosec)
|
||
- [ ] `golangci-lint run ./...` exits with code 0
|
||
- [ ] All unit tests pass
|
||
- [ ] Code coverage ≥85%
|
||
- [ ] **Security validation:**
|
||
- [ ] G110 (decompression bomb): Verify 100MB limit enforced
|
||
- [ ] G305 (path traversal): Test with `../../etc/passwd` attack input
|
||
- [ ] G306 (file permissions): Verify database files are 0600
|
||
- [ ] G304 (file inclusion): Verify extension whitelist blocks `.exe` files
|
||
- [ ] Database close errors: Verify `t.Errorf` is called on close failure
|
||
- [ ] HTTP write errors: Verify mock server returns 500 on write failure
|
||
|
||
**Phase 2 Complete:**
|
||
- [ ] All 6 TypeScript warnings resolved
|
||
- [ ] `npm run lint` shows 0 warnings
|
||
- [ ] All unit tests pass
|
||
- [ ] Code coverage ≥85%
|
||
|
||
**Phase 3 Complete:**
|
||
- [ ] Retry rate metric exposed at `/metrics`
|
||
- [ ] API endpoint `/api/v1/uptime/stats` returns correct data
|
||
- [ ] Dashboard displays retry rate widget
|
||
- [ ] Alert logged when retry rate >5%
|
||
- [ ] E2E test validates monitoring flow
|
||
- [ ] **Thread safety validation:**
|
||
- [ ] Concurrent access test passes (100 monitors, 1000 ops each)
|
||
- [ ] Race detector (`go test -race`) shows no data races
|
||
- [ ] Prometheus metrics increment correctly under load
|
||
- [ ] `GetStats()` returns consistent data during concurrent updates
|
||
- [ ] **Monitoring validation:**
|
||
- [ ] Prometheus `/metrics` endpoint exposes all 3 metric types
|
||
- [ ] Retry rate gauge updates within 1 second of retry event
|
||
- [ ] Dashboard widget refreshes every 30 seconds
|
||
- [ ] Alert triggers when retry rate >5% for 10 minutes
|
||
- [ ] Database persistence: Stats survive application restart
|
||
|
||
---
|
||
|
||
## File Changes Summary
|
||
|
||
### Backend Files (21 total)
|
||
|
||
#### Errcheck (14 files):
|
||
1. `internal/api/handlers/security_handler_audit_test.go` (1)
|
||
2. `internal/api/handlers/security_handler_coverage_test.go` (2)
|
||
3. `internal/api/handlers/settings_handler_test.go` (3)
|
||
4. `internal/config/config_test.go` (13)
|
||
5. `internal/caddy/config_test.go` (1)
|
||
6. `internal/services/dns_provider_service_test.go` (5)
|
||
7. `internal/database/errors_test.go` (1)
|
||
8. `internal/caddy/manager_additional_test.go` (2)
|
||
9. `internal/caddy/manager_test.go` (1)
|
||
10. `internal/api/handlers/notification_coverage_test.go` (1)
|
||
11. `internal/api/handlers/pr_coverage_test.go` (2)
|
||
|
||
#### Gosec (18 files):
|
||
12. `cmd/seed/seed_smoke_test.go`
|
||
13. `internal/api/handlers/manual_challenge_handler.go`
|
||
14. `internal/api/handlers/security_handler_rules_decisions_test.go`
|
||
15. `internal/caddy/config.go`
|
||
16. `internal/config/config.go`
|
||
17. `internal/crowdsec/hub_cache.go`
|
||
18. `internal/crowdsec/hub_sync.go`
|
||
19. `internal/database/database_test.go`
|
||
20. `internal/services/backup_service.go`
|
||
21. `internal/services/backup_service_test.go`
|
||
22. `internal/services/uptime_service_test.go`
|
||
23. `internal/util/crypto_test.go`
|
||
|
||
### Frontend Files (5 total):
|
||
1. `src/components/ImportSitesModal.test.tsx`
|
||
2. `src/components/ImportSitesModal.tsx`
|
||
3. `src/components/__tests__/DNSProviderForm.test.tsx`
|
||
4. `src/context/AuthContext.tsx`
|
||
5. `src/hooks/__tests__/useImport.test.ts`
|
||
|
||
## Security Impact Analysis Summary
|
||
|
||
### Critical Fixes
|
||
|
||
| Issue | Pre-Fix Risk | Post-Fix Risk | Mitigation Effectiveness |
|
||
|-------|-------------|---------------|-------------------------|
|
||
| **G110 - Decompression Bomb** | HIGH (Memory exhaustion DoS) | LOW | 100MB hard limit prevents attacks |
|
||
| **G305 - Path Traversal** | CRITICAL (Arbitrary file access) | LOW | Multi-layer validation blocks escapes |
|
||
| **G306 - File Permissions** | HIGH (Data exfiltration) | LOW | Restrictive permissions (0600/0700) |
|
||
| **G304 - File Inclusion** | HIGH (Config poisoning) | MEDIUM | Extension whitelist limits exposure |
|
||
| **Database Close** | LOW (Resource leak) | MINIMAL | Error logging aids debugging |
|
||
| **HTTP Write** | MEDIUM (Silent test failure) | LOW | Fast-fail prevents false positives |
|
||
|
||
### Attack Vector Coverage
|
||
|
||
**Blocked Attacks:**
|
||
- ✅ Gzip bomb (G110) - 100MB limit
|
||
- ✅ Directory traversal (G305) - Path validation
|
||
- ✅ Credential theft (G306) - Database files secured
|
||
- ✅ Config injection (G304) - Extension filtering
|
||
|
||
**Remaining Considerations:**
|
||
- Symlink attacks mitigated by `filepath.Abs()` resolution
|
||
- Integer overflow (G115) caught before array access
|
||
- Test fixtures (G101) properly annotated as non-functional
|
||
|
||
---
|
||
|
||
## Monitoring Technical Specification
|
||
|
||
### Architecture
|
||
|
||
```
|
||
┌─────────────────┐
|
||
│ Uptime Service │
|
||
│ (Goroutines) │──┐
|
||
└─────────────────┘ │
|
||
│ Record metrics
|
||
┌─────────────────┐ │ (thread-safe)
|
||
│ HTTP Checks │──┤
|
||
└─────────────────┘ │
|
||
│
|
||
┌─────────────────┐ │
|
||
│ TCP Checks │──┤
|
||
└─────────────────┘ │
|
||
▼
|
||
┌──────────────────┐
|
||
│ UptimeMetrics │
|
||
│ (Singleton) │
|
||
│ sync.RWMutex │
|
||
└──────────────────┘
|
||
│
|
||
┌────────────┼────────────┐
|
||
│ │ │
|
||
▼ ▼ ▼
|
||
Prometheus Database REST API
|
||
/metrics Persistence /api/v1/uptime/stats
|
||
│ │ │
|
||
▼ ▼ ▼
|
||
Grafana Auto-backup React Dashboard
|
||
Dashboard (SQLite) (Real-time)
|
||
```
|
||
|
||
### Data Flow
|
||
|
||
1. **Collection:** `RecordCheck()` / `RecordRetry()` called after each uptime check
|
||
2. **Storage:** In-memory map + Prometheus counters/gauges updated atomically
|
||
3. **Persistence:** Database updated every 5 minutes via background goroutine
|
||
4. **Exposition:**
|
||
- Prometheus: Scraped every 15s by external monitoring
|
||
- REST API: Polled every 30s by frontend dashboard
|
||
5. **Alerting:** Prometheus evaluates rules every 1m, triggers webhook on breach
|
||
|
||
### Performance Characteristics
|
||
|
||
- **Memory:** ~50 bytes per monitor (100 monitors = 5KB)
|
||
- **CPU:** < 0.1% overhead (mutex contention minimal)
|
||
- **Disk:** 1 write/5min (negligible I/O impact)
|
||
- **Network:** 3 Prometheus metrics per monitor (300 bytes/scrape for 100 monitors)
|
||
|
||
---
|
||
|
||
---
|
||
|
||
## References
|
||
|
||
- **Go Lint Output:** `backend/final_lint.txt` (34 issues), `backend/full_lint_output.txt` (40 issues)
|
||
- **TypeScript Lint Output:** `npm run lint` output (6 warnings)
|
||
- **Gosec:** https://github.com/securego/gosec
|
||
- **golangci-lint:** https://golangci-lint.run/
|
||
- **Prometheus Best Practices:** https://prometheus.io/docs/practices/naming/
|
||
- **OWASP Secure Coding:** https://owasp.org/www-project-secure-coding-practices-quick-reference-guide/
|
||
- **CWE-409 Decompression Bomb:** https://cwe.mitre.org/data/definitions/409.html
|
||
- **CWE-22 Path Traversal:** https://cwe.mitre.org/data/definitions/22.html
|
||
|
||
---
|
||
|
||
**Plan Status:** ✅ Ready for Implementation (Post-Supervisor Review)
|
||
**Changes Made:**
|
||
- ✅ Database close pattern updated (use `t.Errorf`)
|
||
- ✅ HTTP write errors with proper handling
|
||
- ✅ Gosec G101 annotation added
|
||
- ✅ Decompression bomb mitigation (100MB limit)
|
||
- ✅ Path traversal validation logic
|
||
- ✅ File permission security matrix documented
|
||
- ✅ Complete monitoring technical specification
|
||
- ✅ Thread safety guarantees documented
|
||
- ✅ Security acceptance criteria added
|
||
|
||
**Next Step:** Begin Phase 1 - Backend Go Linting Fixes (Errcheck first, then Gosec)
|