Charon/docs/plans/phase4_security_toggles_spec.md
2026-01-26 19:22:05 +00:00

Phase 4: Security Module Toggle Actions - Implementation Specification

  • Status: IMPLEMENTED
  • Created: 2026-01-23
  • Last Updated: 2026-01-24
  • Implementation Completed: 2026-01-24
  • Estimated Effort: 13-15 hours (2 days)
  • Priority: P0 - Critical (Unblocks 8 skipped E2E tests)
  • Dependencies: None (can start immediately)

⚠️ CRITICAL FIXES APPLIED: This spec has been updated to address P0 issues identified in supervisor review:

  • Frontend optimistic update preserves required fields (mode)
  • Cerberus DB injection pattern documented
  • Config reload trigger requirements added
  • Performance cache layer specified
  • Switch component uses onCheckedChange (not onChange)

FINAL REVIEW 2026-01-24: Supervisor verified implementation prerequisites:

  • Phase 0 (Cerberus DB injection) is ALREADY COMPLETE - Cerberus struct already has db *gorm.DB field
  • Only routes.go:107 instantiates Cerberus in production code
  • Revised effort: 13-15 hours (reduced from 16-20h due to Phase 0 skip)
  • All prerequisite files verified to exist

Executive Summary

This specification provides a detailed implementation plan for enabling toggle functionality for three security modules (ACL, WAF, Rate Limiting) in the Charon SecurityDashboard. Currently, these modules display status but cannot be toggled on/off through the UI. The frontend already has toggle UI components in place with proper data-testid attributes; they are currently disabled and non-functional. This phase implements the backend logic, frontend handlers, and middleware integration to make these toggles fully operational.

Tests to Enable: 8 E2E tests in tests/security/security-dashboard.spec.ts and tests/security/rate-limiting.spec.ts

Current State:

  • Frontend UI: Toggle switches exist with proper test IDs
  • Backend Status API: /api/v1/security/status returns enabled/disabled states
  • Database Schema: settings table stores per-module settings
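  • Missing: Dedicated backend toggle endpoints (no POST routes for enable/disable); this phase reuses the existing POST /settings endpoint instead
  • Missing: Working frontend mutation handlers (the existing ones call the generic updateSetting API but do not function correctly)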
  • Missing: Middleware does not fully honor settings-based enabled/disabled states

Table of Contents

  1. Architecture Overview
  2. Database Schema
  3. Backend Implementation
  4. Frontend Implementation
  5. Middleware Updates
  6. Testing Strategy
  7. Implementation Phases
  8. File Modification Checklist
  9. Validation Criteria

Architecture Overview

Current Flow (Read-Only Status)

┌─────────────────────────┐
│  Frontend UI            │
│  - SecurityDashboard    │
│  - Toggle switches      │
│  - (Disabled)           │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│  GET /security/status   │
│  - security_handler.go  │
│  - Reads DB settings    │
│  - Returns JSON status  │
└─────────────────────────┘
            │
            ▼
┌─────────────────────────┐
│  Database               │
│  - settings table       │
│  - security.*.enabled   │
└─────────────────────────┘

Target Flow (Toggle Actions)

┌─────────────────────────┐
│  Frontend UI            │
│  - Toggle ACL           │──┐
│  - Toggle WAF           │  │
│  - Toggle Rate Limit    │  │
└─────────────────────────┘  │
                              │ (onCheckedChange)
                              ▼
┌─────────────────────────────────────┐
│  POST /settings                     │
│  - settings_handler.go              │
│  - UpdateSetting()                  │
│  - Validates key/value              │
│  - Upserts to settings table        │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Database (settings table)          │
│  - key   = "security.*.enabled"     │
│  - value = "true"/"false"           │
└─────────────┬───────────────────────┘
              │
              ▼
┌─────────────────────────────────────┐
│  Middleware / Caddy Config          │
│  - Cerberus.Middleware()            │
│  - caddy/config.go                  │
│  - Honors settings                  │
└─────────────────────────────────────┘

Key Insight: The backend /settings endpoint and database schema already exist. We are reusing existing infrastructure rather than creating new endpoints. The remaining challenges are:

  1. Frontend needs to send correct setting keys
  2. Middleware needs to check these settings consistently
  3. Caddy config generation needs to respect runtime settings
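
As a rough illustration of point 1, the request body for a toggle could be built as in the sketch below. The JSON field names (key, value, category, type) mirror the payload used in the handler test later in this spec. The toggleRequest type and newToggleRequest helper are hypothetical names for this sketch only and are not part of the codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// toggleRequest mirrors the JSON body sent to POST /api/v1/settings.
// Hypothetical type for illustration; the real handler binds UpdateSettingRequest.
type toggleRequest struct {
	Key      string `json:"key"`
	Value    string `json:"value"`
	Category string `json:"category"`
	Type     string `json:"type"`
}

// newToggleRequest builds the payload for a module such as "acl", "waf", or "rate_limit".
func newToggleRequest(module string, enabled bool) toggleRequest {
	return toggleRequest{
		Key:      "security." + module + ".enabled",
		Value:    strconv.FormatBool(enabled), // stored as the string "true" or "false"
		Category: "security",
		Type:     "bool",
	}
}

func main() {
	body, err := json.Marshal(newToggleRequest("rate_limit", false))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Keeping the key construction in one place makes it harder for the frontend and backend to drift apart on the exact setting names.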

Database Schema

Existing Schema (No Changes Required)

settings Table

Already supports all required keys:

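| Column     | Type      | Index  | Description                                 |
|------------|-----------|--------|---------------------------------------------|
| id         | INTEGER   | PK     | Auto-increment primary key                  |
| key        | VARCHAR   | UNIQUE | Setting key (e.g., security.acl.enabled)    |
| value      | TEXT      | -      | Setting value ("true" or "false")           |
| type       | VARCHAR   | INDEX  | Type hint ("bool")                          |
| category   | VARCHAR   | INDEX  | Category ("security")                       |
| updated_at | TIMESTAMP | -      | Last update timestamp                       |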

Existing Settings Keys:

  • security.acl.enabled - ACL module toggle
  • security.waf.enabled - WAF module toggle
  • security.rate_limit.enabled - Rate limiting toggle
  • security.crowdsec.enabled - CrowdSec toggle (already working)

No migration needed - schema supports all requirements out of the box.


Backend Implementation

1. Settings Handler (Requires Config Reload Trigger)

⚠️ CRITICAL ADDITION: SettingsHandler must trigger Caddy config reload when security settings change.

File: backend/internal/api/handlers/settings_handler.go

Current Implementation (missing reload trigger):

// UpdateSetting updates or creates a setting.
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	var req UpdateSettingRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	setting := models.Setting{
		Key:   req.Key,
		Value: req.Value,
	}

	if req.Category != "" {
		setting.Category = req.Category
	}
	if req.Type != "" {
		setting.Type = req.Type
	}

	// Upsert
	if err := h.DB.Where(models.Setting{Key: req.Key}).Assign(setting).FirstOrCreate(&setting).Error; err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save setting"})
		return
	}

	c.JSON(http.StatusOK, setting)
	// ❌ MISSING: Caddy config reload for security.* settings
}

Updated Implementation (with config reload):

import (
	"context"
	"strings"
	"time"
	// ... other imports ...
)

type SettingsHandler struct {
	DB            *gorm.DB
	CaddyManager  CaddyConfigManager  // ✅ Add CaddyManager interface
}

// CaddyConfigManager interface for reload triggering
type CaddyConfigManager interface {
	ApplyConfig(ctx context.Context) error
}

// UpdateSetting updates or creates a setting.
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	var req UpdateSettingRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	setting := models.Setting{
		Key:   req.Key,
		Value: req.Value,
	}

	if req.Category != "" {
		setting.Category = req.Category
	}
	if req.Type != "" {
		setting.Type = req.Type
	}

	// Upsert
	if err := h.DB.Where(models.Setting{Key: req.Key}).Assign(setting).FirstOrCreate(&setting).Error; err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save setting"})
		return
	}

	// ✅ Trigger Caddy config reload for security settings
	if h.CaddyManager != nil && strings.HasPrefix(req.Key, "security.") {
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
			defer cancel()

			if err := h.CaddyManager.ApplyConfig(ctx); err != nil {
				// Log error but don't fail the setting update
				logger.Log().WithError(err).Warn("Failed to reload Caddy config after security setting change")
			}
		}()
	}

	c.JSON(http.StatusOK, setting)
}

Key Changes:

  1. Add CaddyManager field to SettingsHandler struct
  2. Define CaddyConfigManager interface with ApplyConfig method
  3. Trigger async config reload when security.* settings change
  4. Use goroutine with timeout to avoid blocking HTTP response
  5. Log reload errors but don't fail the setting update

Constructor Update Required:

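// In server.go or wherever SettingsHandler is created.
// Accepting the CaddyConfigManager interface (rather than *caddy.Manager) keeps a
// nil argument truly nil, so the h.CaddyManager != nil guard in UpdateSetting works.
func NewSettingsHandler(db *gorm.DB, caddyMgr CaddyConfigManager) *SettingsHandler {
	return &SettingsHandler{
		DB:           db,
		CaddyManager: caddyMgr, // ✅ Inject CaddyManager
	}
}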
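
One Go detail is worth keeping in mind for the h.CaddyManager != nil guard: an interface value that holds a nil pointer is itself not nil. The sketch below demonstrates this with stand-in types. The applier, manager, wrapConcrete, and wrapInterface names are hypothetical and exist only for this example.

```go
package main

import (
	"context"
	"fmt"
)

// applier has the same single-method shape as CaddyConfigManager.
type applier interface {
	ApplyConfig(ctx context.Context) error
}

// manager is a hypothetical stand-in for the concrete Caddy manager type.
type manager struct{}

func (m *manager) ApplyConfig(ctx context.Context) error { return nil }

// wrapConcrete stores a *manager in the interface.
// A nil pointer still produces a non-nil interface value.
func wrapConcrete(m *manager) applier { return m }

// wrapInterface passes the interface through unchanged, so nil stays nil.
func wrapInterface(a applier) applier { return a }

func main() {
	var m *manager // nil pointer
	fmt.Println(wrapConcrete(m) != nil)    // true: a nil guard would not skip the call
	fmt.Println(wrapInterface(nil) != nil) // false: the guard behaves as expected
}
```

For this reason, tests or call sites that want to disable the reload should pass an untyped nil (or leave the field unset) rather than a nil pointer of the concrete type.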

Why Async: Config reload can take 1-2 seconds; we don't want to block the HTTP response. The setting is saved immediately, and config reload happens in the background.

Error Handling: If reload fails, the setting is still saved. Users can manually retry the toggle or trigger a manual config reload.

Route: POST /api/v1/settings (already registered in routes.go:200)

2. Security Status Endpoint (ZERO CHANGES NEEDED)

⚠️ IMPORTANT: This endpoint is already 100% correct and reads runtime settings with highest priority.

File: backend/internal/api/handlers/security_handler.go

Current Implementation (lines 54-189) - DO NOT MODIFY:

func (h *SecurityHandler) GetStatus(c *gin.Context) {
	// Priority chain:
	// 1. Settings table (highest - runtime overrides)
	// 2. SecurityConfig DB record (middle - user configuration)
	// 3. Static config (lowest - defaults)

	// ... loads from SecurityConfig first ...

	// Settings table overrides (PRIORITY 1 - highest)
	var setting struct{ Value string }

	// WAF enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.waf.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			wafMode = "enabled"
		} else {
			wafMode = "disabled"
		}
	}

	// Rate Limit enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.rate_limit.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			rateLimitMode = "enabled"
		} else {
			rateLimitMode = "disabled"
		}
	}

	// ACL enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.acl.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			aclMode = "enabled"
		} else {
			aclMode = "disabled"
		}
	}

	// ... continues to build response ...
}

Already implemented - Backend correctly reads runtime settings with highest priority.

Action Item: None - endpoint is fully functional.
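
For readers who want the priority chain in isolation, here is a small sketch of the same decision logic as a pure function. It assumes that an empty dbMode means "no SecurityConfig record", and it uses a map in place of the settings table; resolveMode is a hypothetical helper and not a function in the handler.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveMode applies the priority chain used by GetStatus:
//  1. a non-empty runtime setting (settings table) wins,
//  2. otherwise the SecurityConfig DB value,
//  3. otherwise the static default.
// dbMode is "" when no SecurityConfig value exists (assumption for this sketch).
func resolveMode(runtime map[string]string, key, dbMode, staticMode string) string {
	mode := staticMode
	if dbMode != "" {
		mode = dbMode
	}
	// Mirror the handler: an empty value is treated as "no override".
	if v, ok := runtime[key]; ok && v != "" {
		if strings.EqualFold(v, "true") {
			return "enabled"
		}
		return "disabled"
	}
	return mode
}

func main() {
	const key = "security.waf.enabled"
	runtime := map[string]string{key: "TRUE"}

	fmt.Println(resolveMode(runtime, key, "disabled", "disabled"))             // runtime override
	fmt.Println(resolveMode(map[string]string{}, key, "enabled", "disabled")) // DB config
	fmt.Println(resolveMode(map[string]string{}, key, "", "disabled"))        // static default
}
```

Note that any non-empty value other than "true" (case-insensitive) resolves to "disabled", which matches the strings.EqualFold check in the handler.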


Frontend Implementation

1. Update Security.tsx Toggle Handlers

File: frontend/src/pages/Security.tsx (lines 100-160)

Current Issue: The toggleServiceMutation uses a generic updateSetting call, but its optimistic update and query invalidation do not behave reliably.

Current Code (lines 100-160):

// Generic toggle mutation for per-service settings
const toggleServiceMutation = useMutation({
  mutationFn: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await updateSetting(key, enabled ? 'true' : 'false', 'security', 'bool')
  },
  onMutate: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await queryClient.cancelQueries({ queryKey: ['security-status'] })
    const previous = queryClient.getQueryData(['security-status'])
    queryClient.setQueryData(['security-status'], (old: unknown) => {
      if (!old || typeof old !== 'object') return old
      const parts = key.split('.')
      const section = parts[1] as keyof SecurityStatus
      const field = parts[2]
      const copy = { ...(old as SecurityStatus) }
      if (copy[section] && typeof copy[section] === 'object') {
        copy[section] = { ...copy[section], [field]: enabled } as never
      }
      return copy
    })
    return { previous }
  },
  onError: (_err, _vars, context: unknown) => {
    if (context && typeof context === 'object' && 'previous' in context) {
      queryClient.setQueryData(['security-status'], context.previous)
    }
    const msg = _err instanceof Error ? _err.message : String(_err)
    toast.error(`Failed to update setting: ${msg}`)
  },
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['settings'] })
    queryClient.invalidateQueries({ queryKey: ['security-status'] })
    toast.success('Security setting updated')
  },
})

Problem: The optimistic update indexes SecurityStatus dynamically as section[field]. The actual shape is:

  • status.acl.enabled
  • status.waf.enabled
  • status.rate_limit.enabled

The current code parses key = "security.acl.enabled" into section = "acl" and field = "enabled", which matches this shape. However, it writes through a dynamic key and an `as never` cast, so the compiler cannot check that the section exists or that its other required fields survive the update.

Solution: Fix the optimistic update to preserve all required fields, especially mode for WAF and rate_limit.

⚠️ CRITICAL BUG FIX: The old code would drop the mode field from WAF and rate_limit sections, breaking the UI.

SecurityStatus Interface (for reference):

interface SecurityStatus {
  acl: { enabled: boolean }
  waf: { enabled: boolean; mode: string }  // ⚠️ mode is REQUIRED
  rate_limit: { enabled: boolean; mode: string }  // ⚠️ mode is REQUIRED
  cerberus?: { enabled: boolean }
}

Updated Code (replace lines 100-160):

// Generic toggle mutation for per-service settings
const toggleServiceMutation = useMutation({
  mutationFn: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await updateSetting(key, enabled ? 'true' : 'false', 'security', 'bool')
  },
  onMutate: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    // Cancel ongoing queries to avoid race conditions
    await queryClient.cancelQueries({ queryKey: ['security-status'] })

    // Snapshot current state for rollback
    const previous = queryClient.getQueryData(['security-status'])

    // Optimistic update: parse key like "security.acl.enabled" -> section "acl"
    queryClient.setQueryData(['security-status'], (old: unknown) => {
      if (!old || typeof old !== 'object') return old

      const oldStatus = old as SecurityStatus
      const copy = { ...oldStatus }

      // Extract section from key (e.g., "security.acl.enabled" -> "acl")
      const parts = key.split('.')
      const section = parts[1] as keyof SecurityStatus

      // ✅ CRITICAL: Spread existing section data to preserve fields like 'mode'
      // Update ONLY the enabled field, keep everything else intact
      if (section === 'acl') {
        copy.acl = { ...copy.acl, enabled }
      } else if (section === 'waf') {
        // ⚠️ Preserve mode field (detection/prevention)
        copy.waf = { ...copy.waf, enabled }
      } else if (section === 'rate_limit') {
        // ⚠️ Preserve mode field (log/block)
        copy.rate_limit = { ...copy.rate_limit, enabled }
      }

      return copy
    })

    return { previous }
  },
  onError: (_err, _vars, context: unknown) => {
    // Rollback on error
    if (context && typeof context === 'object' && 'previous' in context) {
      queryClient.setQueryData(['security-status'], context.previous)
    }
    const msg = _err instanceof Error ? _err.message : String(_err)
    toast.error(`Failed to update setting: ${msg}`)
  },
  onSuccess: () => {
    // Refresh data from server
    queryClient.invalidateQueries({ queryKey: ['settings'] })
    queryClient.invalidateQueries({ queryKey: ['security-status'] })
    toast.success('Security setting updated')
  },
})

Why This Matters: WAF and rate_limit have a mode field (e.g., {enabled: true, mode: "detection"}) that must be preserved during optimistic updates. The spread operator ...copy.waf ensures we only update enabled while keeping mode intact.

File Changes:

  • frontend/src/pages/Security.tsx (lines 100-160)
  • No API client changes needed - updateSetting in frontend/src/api/settings.ts already correct

2. Verify Toggle Component Integration

File: frontend/src/pages/Security.tsx (lines 420-520)

Target Implementation (ACL shown; WAF and Rate Limit follow the same pattern):

{/* ACL - Layer 2: Access Control */}
<Card variant="interactive" className="flex flex-col">
  <CardFooter className="justify-between pt-4">
    <Tooltip>
      <TooltipTrigger asChild>
        <div>
          <Switch
            checked={status.acl.enabled}
            disabled={!status.cerberus?.enabled}
            onCheckedChange={(checked) => toggleServiceMutation.mutate({
              key: 'security.acl.enabled',
              enabled: checked
            })}
            data-testid="toggle-acl"
          />
        </div>
      </TooltipTrigger>
      <TooltipContent>
        <p>{cerberusDisabled ? t('security.enableCerberusFirst') : t('security.toggleAcl')}</p>
      </TooltipContent>
    </Tooltip>
    {/* ... Configure button ... */}
  </CardFooter>
</Card>

⚠️ CRITICAL FIX: Use onCheckedChange (not onChange) for Switch component:

  • onCheckedChange receives boolean directly
  • onChange receives Event object (legacy pattern)

Apply to all toggles:

  • ACL: security.acl.enabled
  • WAF: security.waf.enabled
  • Rate Limit: security.rate_limit.enabled

Action Items:

  1. Fix optimistic update logic (see section 1 above)
  2. Replace onChange with onCheckedChange in all three toggle components

3. Update Switch Component (If Needed)

File: frontend/src/components/ui/Switch.tsx

Current Implementation (lines 1-50):

const Switch = React.forwardRef<HTMLInputElement, SwitchProps>(
  ({ className, onCheckedChange, onChange, id, disabled, ...props }, ref) => {
    return (
      <label
        htmlFor={id}
        className={cn(
          'relative inline-flex items-center',
          disabled ? 'cursor-not-allowed opacity-50' : 'cursor-pointer',
          className
        )}
      >
        <input
          id={id}
          type="checkbox"
          className="sr-only peer"
          ref={ref}
          disabled={disabled}
          onChange={(e) => {
            onChange?.(e)
            onCheckedChange?.(e.target.checked)
          }}
          {...props}
        />
        {/* ... visual toggle styling ... */}
      </label>
    )
  }
)

No changes needed - Component correctly:

  1. Accepts onChange and onCheckedChange props
  2. Supports disabled state
  3. Renders accessible checkbox with visual toggle

Middleware Updates

0. Cerberus Struct DB Injection (PREREQUISITE)

ALREADY COMPLETE: Cerberus already has access to *gorm.DB to query runtime settings.

File: backend/internal/cerberus/cerberus.go (lines 20-32)

Current Struct (verified 2026-01-24):

type Cerberus struct {
	cfg               config.SecurityConfig
	db                *gorm.DB        // ✅ Already exists
	accessSvc         *services.AccessListService
	securityNotifySvc *services.SecurityNotificationService
}

func New(cfg config.SecurityConfig, db *gorm.DB) *Cerberus {  // ✅ Already accepts db
	return &Cerberus{
		cfg: cfg,
		db:  db,
	}
}

No Changes Required - The prerequisite is already satisfied.

Instantiation Sites (verified):

  • backend/internal/api/routes/routes.go:107 - Primary instantiation site
  • Test files use their own mock instances

Validation Complete:

# ✅ Verified 2026-01-24
grep -rn "cerberus.New(" backend/
# routes/routes.go:107: cerb := cerberus.New(cfg.Security, db)

1. Cerberus Middleware ACL Check

File: backend/internal/cerberus/cerberus.go (lines 85-148)

Prerequisites: Cerberus must hold a *gorm.DB handle (already satisfied; see section 0 above)

Current Implementation (lines 105-135):

func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}

		// WAF tracking
		if c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled" {
			metrics.IncWAFRequest()
		}

		// ACL: simple per-request evaluation against all access lists if enabled
		if c.cfg.ACLMode == "enabled" {
			acls, err := c.accessSvc.List()
			if err == nil {
				clientIP := ctx.ClientIP()
				for _, acl := range acls {
					if !acl.Enabled {
						continue
					}
					allowed, _, err := c.accessSvc.TestIP(acl.ID, clientIP)
					if err == nil && !allowed {
						// Send security notification
						_ = c.securityNotifySvc.Send(context.Background(), models.SecurityEvent{
							EventType: "acl_deny",
							Severity:  "warn",
							Message:   "Access control list blocked request",
							ClientIP:  clientIP,
							Path:      ctx.Request.URL.Path,
							Timestamp: time.Now(),
							Metadata: map[string]any{
								"acl_name": acl.Name,
								"acl_id":   acl.ID,
							},
						})

						ctx.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "Blocked by access control list"})
						return
					}
				}
			}
		}

		ctx.Next()
	}
}

Issue: Reads c.cfg.ACLMode (static config), not runtime setting from DB.

Fix: Query settings table for security.acl.enabled before checking ACLs.

Updated Code:

func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}

		// WAF tracking - check runtime setting
		wafEnabled := c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled"
		if c.db != nil {
			var s models.Setting
			if err := c.db.Where("key = ?", "security.waf.enabled").First(&s).Error; err == nil {
				wafEnabled = strings.EqualFold(s.Value, "true")
			}
		}
		if wafEnabled {
			metrics.IncWAFRequest()
		}

		// ACL: check runtime setting before evaluating access lists
		aclEnabled := c.cfg.ACLMode == "enabled"
		if c.db != nil {
			var s models.Setting
			if err := c.db.Where("key = ?", "security.acl.enabled").First(&s).Error; err == nil {
				aclEnabled = strings.EqualFold(s.Value, "true")
			}
		}

		if aclEnabled {
			acls, err := c.accessSvc.List()
			if err == nil {
				clientIP := ctx.ClientIP()
				for _, acl := range acls {
					if !acl.Enabled {
						continue
					}
					allowed, _, err := c.accessSvc.TestIP(acl.ID, clientIP)
					if err == nil && !allowed {
						// Send security notification
						_ = c.securityNotifySvc.Send(context.Background(), models.SecurityEvent{
							EventType: "acl_deny",
							Severity:  "warn",
							Message:   "Access control list blocked request",
							ClientIP:  clientIP,
							Path:      ctx.Request.URL.Path,
							Timestamp: time.Now(),
							Metadata: map[string]any{
								"acl_name": acl.Name,
								"acl_id":   acl.ID,
							},
						})

						ctx.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "Blocked by access control list"})
						return
					}
				}
			}
		}

		// CrowdSec integration (already correct - checks mode)
		if c.cfg.CrowdSecMode == "local" {
			metrics.IncCrowdSecRequest()
			logger.Log().WithField("client_ip", ctx.ClientIP()).WithField("path", ctx.Request.URL.Path).Debug("Request evaluated by CrowdSec bouncer at Caddy layer")
		}

		ctx.Next()
	}
}

File Changes:

  • backend/internal/cerberus/cerberus.go (lines 85-148)

2. Caddy Config Generation (WAF and Rate Limit)

File: backend/internal/caddy/config.go

Current Implementation (lines 1-300):

func GenerateConfig(hosts []models.ProxyHost, storageDir, acmeEmail, frontendDir, sslProvider string, acmeStaging, crowdsecEnabled, wafEnabled, rateLimitEnabled, aclEnabled bool, adminWhitelist string, rulesets []models.SecurityRuleSet, rulesetPaths map[string]string, decisions []models.SecurityDecision, secCfg *models.SecurityConfig, dnsProviderConfigs []DNSProviderConfig) (*Config, error) {
	// ... config generation ...
}

Issue: Function parameters wafEnabled, rateLimitEnabled, aclEnabled are static booleans passed from static config, not runtime settings.

Fix: Before calling GenerateConfig, query runtime settings and pass correct values.

Caller: backend/internal/caddy/manager.go (ApplyConfig method)

Current Code (approximate):

func (m *Manager) ApplyConfig(ctx context.Context) error {
	// ... fetch hosts, rulesets, etc. ...

	// Get static config flags
	wafEnabled := m.secCfg.WAFMode != "" && m.secCfg.WAFMode != "disabled"
	rateLimitEnabled := m.secCfg.RateLimitMode == "enabled"
	aclEnabled := m.secCfg.ACLMode == "enabled"

	config, err := GenerateConfig(
		hosts,
		m.storageDir,
		acmeEmail,
		m.frontendDir,
		sslProvider,
		acmeStaging,
		crowdsecEnabled,
		wafEnabled,           // ❌ Static
		rateLimitEnabled,     // ❌ Static
		aclEnabled,           // ❌ Static
		adminWhitelist,
		rulesets,
		rulesetPaths,
		decisions,
		secCfg,
		dnsProviderConfigs,
	)
	// ... apply to Caddy ...
}

Updated Code:

func (m *Manager) ApplyConfig(ctx context.Context) error {
	// ... fetch hosts, rulesets, etc. ...

	// Get runtime settings (priority 1) or fallback to static config
	wafEnabled := m.secCfg.WAFMode != "" && m.secCfg.WAFMode != "disabled"
	rateLimitEnabled := m.secCfg.RateLimitMode == "enabled"
	aclEnabled := m.secCfg.ACLMode == "enabled"

	// Override with runtime settings from DB
	if m.db != nil {
		var s models.Setting

		// WAF runtime setting
		if err := m.db.Where("key = ?", "security.waf.enabled").First(&s).Error; err == nil {
			wafEnabled = strings.EqualFold(s.Value, "true")
		}

		// Rate Limit runtime setting
		s = models.Setting{} // Reset
		if err := m.db.Where("key = ?", "security.rate_limit.enabled").First(&s).Error; err == nil {
			rateLimitEnabled = strings.EqualFold(s.Value, "true")
		}

		// ACL runtime setting
		s = models.Setting{} // Reset
		if err := m.db.Where("key = ?", "security.acl.enabled").First(&s).Error; err == nil {
			aclEnabled = strings.EqualFold(s.Value, "true")
		}
	}

	config, err := GenerateConfig(
		hosts,
		m.storageDir,
		acmeEmail,
		m.frontendDir,
		sslProvider,
		acmeStaging,
		crowdsecEnabled,
		wafEnabled,           // ✅ Runtime
		rateLimitEnabled,     // ✅ Runtime
		aclEnabled,           // ✅ Runtime
		adminWhitelist,
		rulesets,
		rulesetPaths,
		decisions,
		secCfg,
		dnsProviderConfigs,
	)
	// ... apply to Caddy ...
}

File Changes:

  • backend/internal/caddy/manager.go (ApplyConfig method, ~line 150-250)

3. Performance: Settings Cache Layer

⚠️ CRITICAL PERFORMANCE FIX: Querying settings table on every request causes unnecessary DB load.

File: backend/internal/cerberus/cerberus.go

Problem: The updated middleware in section 1 queries the settings table on every HTTP request (once for WAF, once for ACL). For high-traffic sites, this adds an estimated ~1-2ms per request and increases DB load.

Solution: Add in-memory cache with 60-second TTL.

Cache Implementation:

import (
	"sync"
	"time"
)

type Cerberus struct {
	cfg               config.SecurityConfig
	db                *gorm.DB
	accessSvc         *services.AccessListService
	securityNotifySvc *services.SecurityNotificationService

	// ✅ Add cache fields
	settingsCache     map[string]string  // key -> value
	settingsCacheMu   sync.RWMutex
	settingsCacheTime time.Time
	settingsCacheTTL  time.Duration
}

// New keeps the existing two-argument signature used at routes.go:107 and in tests.
func New(cfg config.SecurityConfig, db *gorm.DB) *Cerberus {
	return &Cerberus{
		cfg: cfg,
		db:  db,
		// ... existing field initialization unchanged ...
		settingsCache:    make(map[string]string),
		settingsCacheTTL: 60 * time.Second, // ✅ 60-second TTL
	}
}

// getSetting retrieves a setting with in-memory caching.
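func (c *Cerberus) getSetting(key string) (string, bool) {
	// Without a DB handle there is nothing to look up; callers fall back to static config.
	if c.db == nil {
		return "", false
	}

	// Fast path: check cache with read lock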
	c.settingsCacheMu.RLock()
	if time.Since(c.settingsCacheTime) < c.settingsCacheTTL {
		val, ok := c.settingsCache[key]
		c.settingsCacheMu.RUnlock()
		return val, ok
	}
	c.settingsCacheMu.RUnlock()

	// Slow path: refresh cache with write lock
	c.settingsCacheMu.Lock()
	defer c.settingsCacheMu.Unlock()

	// Double-check: another goroutine might have refreshed cache
	if time.Since(c.settingsCacheTime) < c.settingsCacheTTL {
		val, ok := c.settingsCache[key]
		return val, ok
	}

	// Refresh entire cache from DB (batch query is faster than individual queries)
	var settings []models.Setting
	if err := c.db.Where("key LIKE ?", "security.%").Find(&settings).Error; err != nil {
		return "", false
	}

	// Update cache
	c.settingsCache = make(map[string]string)
	for _, s := range settings {
		c.settingsCache[s.Key] = s.Value
	}
	c.settingsCacheTime = time.Now()

	val, ok := c.settingsCache[key]
	return val, ok
}

// InvalidateCache forces cache refresh on next access.
// Call this after updating security settings.
func (c *Cerberus) InvalidateCache() {
	c.settingsCacheMu.Lock()
	c.settingsCacheTime = time.Time{}  // Zero time forces refresh
	c.settingsCacheMu.Unlock()
}

Usage in Middleware (replace individual queries):

func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}

		// ✅ Use cached settings instead of direct DB queries
		wafEnabled := c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled"
		if val, ok := c.getSetting("security.waf.enabled"); ok {
			wafEnabled = strings.EqualFold(val, "true")
		}
		if wafEnabled {
			metrics.IncWAFRequest()
		}

		aclEnabled := c.cfg.ACLMode == "enabled"
		if val, ok := c.getSetting("security.acl.enabled"); ok {
			aclEnabled = strings.EqualFold(val, "true")
		}

		if aclEnabled {
			// ... ACL logic ...
		}

		ctx.Next()
	}
}

Cache Invalidation (in SettingsHandler):

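// In UpdateSetting, after saving to DB.
// Assumes SettingsHandler gains a Cerberus field exposing InvalidateCache()
// (not part of the struct definition shown earlier in this spec).
if strings.HasPrefix(req.Key, "security.") {
	// Invalidate Cerberus cache
	if h.Cerberus != nil {
		h.Cerberus.InvalidateCache()
	}

	// Trigger config reload (async)
	if h.CaddyManager != nil {
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
			defer cancel()
			if err := h.CaddyManager.ApplyConfig(ctx); err != nil {
				// Log error but don't fail the setting update
				logger.Log().WithError(err).Warn("Failed to reload Caddy config after security setting change")
			}
		}()
	}
}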

Performance Impact:

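  • Before: one DB query per security setting checked in the middleware on every request (two in the code shown above: WAF and ACL)
  • After: 0 DB queries per request (cache hit), 1 batch query per 60s (cache refresh)
  • Expected Improvement: removes the per-request DB time (estimated at a few milliseconds at high traffic); the benchmark below should confirm the actual figure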

Benchmark Requirement:

// Add benchmark test to verify performance improvement
func BenchmarkCerberus_Middleware_WithCache(b *testing.B) {
	// ... benchmark setup ...
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// ... call middleware ...
	}
}

File Changes:

  • backend/internal/cerberus/cerberus.go (add cache struct fields and methods, ~100 lines)
  • backend/internal/api/handlers/settings_handler.go (add cache invalidation, ~5 lines)
  • backend/internal/cerberus/cerberus_test.go (add cache tests, ~50 lines)
  • backend/internal/cerberus/cerberus_bench_test.go (new file, benchmark, ~30 lines)

Testing Strategy

1. Backend Unit Tests

Test Settings Handler (Already Covered)

File: backend/internal/api/handlers/settings_handler_test.go (if exists)

Tests to Add/Verify:

  • UpdateSetting creates new setting
  • UpdateSetting updates existing setting
  • UpdateSetting validates required fields
  • ⚠️ Add test: UpdateSetting handles security.*.enabled keys

New Test:

func TestSettingsHandler_UpdateSetting_SecurityToggles(t *testing.T) {
	db := setupTestDB(t)
	handler := &SettingsHandler{DB: db} // no CaddyManager set: config reload is skipped in this test
	router := setupTestRouter()
	router.POST("/settings", handler.UpdateSetting)

	testCases := []struct {
		name     string
		key      string
		value    string
		category string
		typ      string
	}{
		{"ACL Enable", "security.acl.enabled", "true", "security", "bool"},
		{"WAF Enable", "security.waf.enabled", "true", "security", "bool"},
		{"Rate Limit Enable", "security.rate_limit.enabled", "true", "security", "bool"},
		{"ACL Disable", "security.acl.enabled", "false", "security", "bool"},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			payload := map[string]string{
				"key":      tc.key,
				"value":    tc.value,
				"category": tc.category,
				"type":     tc.typ,
			}
			body, _ := json.Marshal(payload)

			w := httptest.NewRecorder()
			req, _ := http.NewRequest("POST", "/settings", bytes.NewBuffer(body))
			req.Header.Set("Content-Type", "application/json")
			router.ServeHTTP(w, req)

			assert.Equal(t, http.StatusOK, w.Code)

			// Verify in DB
			var setting models.Setting
			err := db.Where("key = ?", tc.key).First(&setting).Error
			require.NoError(t, err)
			assert.Equal(t, tc.value, setting.Value)
		})
	}
}

Test Cerberus Middleware

File: backend/internal/cerberus/cerberus_test.go (new or existing)

Tests to Add:

  • Middleware checks runtime security.acl.enabled setting
  • Middleware blocks request when ACL enabled and IP not allowed
  • Middleware allows request when ACL disabled
  • Middleware blocks request when ACL enabled and IP blocked

New Test:

func TestCerberus_Middleware_ACLRuntimeSetting(t *testing.T) {
	db := setupTestDB(t)
	require.NoError(t, db.AutoMigrate(&models.Setting{}, &models.AccessList{}))

	// Create ACL that blocks all IPs except 127.0.0.1
	acl := models.AccessList{
		Name:    "Test ACL",
		Type:    "whitelist",
		Enabled: true,
		IPRules: `[{"cidr":"127.0.0.1/32"}]`,
	}
	require.NoError(t, db.Create(&acl).Error)

	cfg := config.SecurityConfig{
		CerberusEnabled: true,
		ACLMode:         "enabled", // Static config enables ACL
	}
	cerb := New(cfg, db)

	router := gin.New()
	router.Use(cerb.Middleware())
	router.GET("/test", func(c *gin.Context) {
		c.JSON(200, gin.H{"ok": true})
	})

	// Test 1: ACL disabled via runtime setting - should allow request
	require.NoError(t, db.Create(&models.Setting{Key: "security.acl.enabled", Value: "false"}).Error)
	w := httptest.NewRecorder()
	req, _ := http.NewRequest("GET", "/test", nil)
	req.RemoteAddr = "192.168.1.100:1234" // Not in the whitelist
	router.ServeHTTP(w, req)
	assert.Equal(t, http.StatusOK, w.Code, "ACL disabled, should allow")

	// Test 2: ACL enabled via runtime setting - should block request
	require.NoError(t, db.Model(&models.Setting{}).Where("key = ?", "security.acl.enabled").Update("value", "true").Error)
	// The DB was updated directly (bypassing UpdateSetting), so drop the cached
	// value; otherwise the middleware keeps serving "false" until the 60s TTL expires.
	cerb.InvalidateCache()
	w = httptest.NewRecorder()
	req, _ = http.NewRequest("GET", "/test", nil)
	req.RemoteAddr = "192.168.1.100:1234" // Not in the whitelist
	router.ServeHTTP(w, req)
	assert.Equal(t, http.StatusForbidden, w.Code, "ACL enabled, should block")
}

Test Caddy Manager

File: backend/internal/caddy/manager_test.go (existing)

Tests to Add:

  • ApplyConfig reads runtime security.waf.enabled setting
  • ApplyConfig reads runtime security.rate_limit.enabled setting
  • ApplyConfig reads runtime security.acl.enabled setting
  • Config generation includes WAF handler only when enabled
  • Config generation includes rate limit handler only when enabled

New Test:

func TestCaddyManager_ApplyConfig_RuntimeSettings(t *testing.T) {
	db := setupTestDB(t)
	require.NoError(t, db.AutoMigrate(&models.Setting{}, &models.ProxyHost{}, &models.SecurityConfig{}))

	// Create proxy host
	host := models.ProxyHost{
		DomainNames:   "test.example.com",
		Enabled:       true,
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
	}
	require.NoError(t, db.Create(&host).Error)

	// Create static security config (WAF disabled by default)
	secCfg := models.SecurityConfig{
		Name:    "default",
		Enabled: true,
		WAFMode: "disabled",
	}
	require.NoError(t, db.Create(&secCfg).Error)

	mgr := &Manager{
		db:         db,
		storageDir: t.TempDir(),
		secCfg:     config.SecurityConfig{WAFMode: "disabled"},
	}

	// Test 1: Runtime setting enables WAF - should include WAF handler
	db.Create(&models.Setting{Key: "security.waf.enabled", Value: "true"})

	err := mgr.ApplyConfig(context.Background())
	require.NoError(t, err)

	// Verify config includes WAF handler
	// (Implementation depends on how you verify generated config)
}
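
One possible way to do the verification left open at the end of the test above is to search the generated Caddy JSON for a handler entry. The sketch assumes the test can obtain the generated config as raw JSON (how depends on the Manager's internals, which are not shown here). `hasHandler` and `walk` are hypothetical helpers, and the handler name `"waf"` is an assumption about the WAF plugin's module name.

```go
package main

import "encoding/json"

// hasHandler reports whether any object in the JSON document carries
// "handler": name. Caddy's JSON config identifies each HTTP handler by
// its "handler" field, so this finds it at any nesting depth.
func hasHandler(cfgJSON []byte, name string) (bool, error) {
	var root any
	if err := json.Unmarshal(cfgJSON, &root); err != nil {
		return false, err
	}
	return walk(root, name), nil
}

// walk recursively visits objects and arrays looking for a matching handler.
func walk(node any, name string) bool {
	switch v := node.(type) {
	case map[string]any:
		if h, ok := v["handler"].(string); ok && h == name {
			return true
		}
		for _, child := range v {
			if walk(child, name) {
				return true
			}
		}
	case []any:
		for _, child := range v {
			if walk(child, name) {
				return true
			}
		}
	}
	return false
}
```

In the test, `hasHandler(generatedJSON, "waf")` would be asserted true after the runtime setting enables WAF and false after it is disabled. The same helper would cover the rate limit handler once its module name is confirmed.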

2. Frontend Unit Tests

Test Security.tsx Toggle Mutation

File: frontend/src/pages/Security.test.tsx (new or existing)

Tests to Add:

  • toggleServiceMutation calls updateSetting with correct key
  • toggleServiceMutation updates optimistic state correctly
  • toggleServiceMutation rolls back on error
  • toggleServiceMutation invalidates queries on success

New Test (using Vitest + React Testing Library):

import { describe, it, expect, vi, beforeEach } from 'vitest'
import { render, screen, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'
import Security from './Security'
import * as settingsAPI from '../api/settings'

vi.mock('../api/settings')
vi.mock('../api/security')

describe('Security Toggle Actions', () => {
  let queryClient: QueryClient

  beforeEach(() => {
    queryClient = new QueryClient({
      defaultOptions: { queries: { retry: false } },
    })
  })

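  it('should call updateSetting when ACL toggle is clicked', async () => {
    // Precondition: the mocked '../api/security' status must report Cerberus enabled
    // and ACL disabled, so the toggle is clickable and the click requests 'true'.
    const updateSettingMock = vi.spyOn(settingsAPI, 'updateSetting').mockResolvedValue()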

    render(
      <QueryClientProvider client={queryClient}>
        <Security />
      </QueryClientProvider>
    )

    const aclToggle = await screen.findByTestId('toggle-acl')
    await userEvent.click(aclToggle)

    await waitFor(() => {
      expect(updateSettingMock).toHaveBeenCalledWith(
        'security.acl.enabled',
        'true',
        'security',
        'bool'
      )
    })
  })

  it('should show error toast when toggle fails', async () => {
    vi.spyOn(settingsAPI, 'updateSetting').mockRejectedValue(new Error('Network error'))

    render(
      <QueryClientProvider client={queryClient}>
        <Security />
      </QueryClientProvider>
    )

    const wafToggle = await screen.findByTestId('toggle-waf')
    await userEvent.click(wafToggle)

    await waitFor(() => {
      expect(screen.getByText(/failed to update setting/i)).toBeInTheDocument()
    })
  })
})

3. E2E Tests (Playwright)

File: tests/security/security-dashboard.spec.ts (already written)

Tests to Enable (currently skipped with runtime check):

  • should toggle ACL enabled/disabled (lines 118-138)
  • should toggle WAF enabled/disabled (lines 140-160)
  • should toggle Rate Limiting enabled/disabled (lines 162-182)
  • should persist toggle state after page reload (lines 184-216)

Current Skip Logic:

test('should toggle ACL enabled/disabled', async ({ page }) => {
  const toggle = page.getByTestId('toggle-acl');

  // Check if toggle is disabled (Cerberus must be enabled for toggles to work)
  const isDisabled = await toggle.isDisabled();
  if (isDisabled) {
    test.info().annotations.push({
      type: 'skip-reason',
      description: 'Toggle is disabled because Cerberus security is not enabled'
    });
    test.skip();
    return;
  }

  // ... test logic ...
});

After Implementation: Once the toggles are enabled and functional, the runtime check no longer skips these tests and they should pass as written (no changes to the test code needed).

File: tests/security/rate-limiting.spec.ts (already written)

Tests to Enable:

  • should toggle rate limiting on/off (lines 42-67)

Implementation Phases

Phase 0: Cerberus DB Injection (2 hours) ALREADY COMPLETE

Objective: Add DB field to Cerberus struct and update all instantiation sites.

STATUS: SKIP THIS PHASE - Verified complete as of 2026-01-24

The Supervisor review confirmed that:

  • Cerberus struct already has db *gorm.DB field (lines 20-32)
  • Constructor New() already accepts *gorm.DB parameter
  • Only one production instantiation site exists: routes.go:107
  • Test files manage their own mock instances

Time Saved: 2 hours

Proceed directly to Phase 1.


Phase 1: Backend Middleware Updates (5 hours)

Objective: Make middleware honor runtime settings and add performance cache layer.

Prerequisites: Phase 0 already complete (DB injection verified in place).

Tasks:

  1. Update backend/internal/cerberus/cerberus.go:

    • Add cache fields (settingsCache, mutex, TTL)
    • Implement getSetting() method with 60s TTL cache
    • Implement InvalidateCache() method
    • Update Middleware() to use cached settings
    • Add unit tests for cache behavior
    • Add benchmark tests for cache performance
  2. Update backend/internal/api/handlers/settings_handler.go:

    • Add CaddyManager field to struct
    • Add Cerberus field to struct (for cache invalidation)
    • Update UpdateSetting() to trigger config reload for security.* keys
    • Add async reload with 30s timeout
    • Add cache invalidation call
    • Add unit tests for reload trigger
  3. Update backend/internal/caddy/manager.go:

    • Query runtime settings before calling GenerateConfig()
    • Pass runtime-enabled flags to GenerateConfig()
    • Add unit tests for runtime setting integration
  4. Update constructor injection:

    • NewSettingsHandler() receives CaddyManager and Cerberus
    • Update all handler instantiation sites

Files to Modify:

  • backend/internal/cerberus/cerberus.go (~120 lines changed/added)
  • backend/internal/api/handlers/settings_handler.go (~40 lines changed/added)
  • backend/internal/caddy/manager.go (~30 lines added)
  • backend/internal/cerberus/cerberus_test.go (~150 lines new tests)
  • backend/internal/cerberus/cerberus_bench_test.go (~30 lines new file)
  • backend/internal/api/handlers/settings_handler_test.go (~100 lines new tests)
  • backend/internal/caddy/manager_test.go (~50 lines added)
  • backend/internal/api/server.go (~10 lines handler setup)

Validation:

# Run backend unit tests
cd backend
go test ./internal/cerberus/...
go test ./internal/caddy/...
go test ./internal/api/handlers/...

# Run benchmarks
go test -bench=. ./internal/cerberus/...

Phase 2: Frontend Toggle Handlers (2 hours)

Objective: Fix optimistic update logic and Switch component usage in Security.tsx.

Tasks:

  1. Update frontend/src/pages/Security.tsx:

    • Replace optimistic update logic in toggleServiceMutation (preserve mode field)
    • Fix all three toggle components to use onCheckedChange instead of onChange
    • Ensure correct SecurityStatus type handling with spread operators
    • Add TypeScript type guards for safety
    • Add unit tests for optimistic update logic
  2. Verify Switch component is correct:

    • Confirm onCheckedChange prop exists and works
    • No changes needed to Switch component itself

Files to Modify:

  • frontend/src/pages/Security.tsx (~80 lines changed)
  • frontend/src/pages/Security.test.tsx (~100 lines new tests)

Critical Fixes:

  1. Preserve mode field: WAF and rate_limit have {enabled: boolean, mode: string} - must use spread operator
  2. Use onCheckedChange: Receives boolean directly, not Event object
  3. Apply to all toggles: ACL, WAF, Rate Limit

Validation:

# Run frontend unit tests
cd frontend
npm test -- Security.test.tsx

Phase 3: Integration Testing (4 hours)

Objective: Validate end-to-end toggle functionality.

Tasks:

  1. Run E2E tests against Docker container:

    npx playwright test tests/security/security-dashboard.spec.ts --project=chromium
    npx playwright test tests/security/rate-limiting.spec.ts --project=chromium
    
  2. Verify all 8 previously skipped tests now pass

  3. Manual testing:

    • Toggle ACL on/off, verify status persists
    • Toggle WAF on/off, verify status persists
    • Toggle Rate Limit on/off, verify status persists
    • Refresh page, verify state persists
    • Verify middleware blocks requests when ACL enabled
    • Verify middleware allows requests when ACL disabled
  4. Test edge cases:

    • Toggle while Cerberus disabled (should be disabled)
    • Toggle during pending state (should be disabled)
    • Network error during toggle (should rollback)
    • ⚠️ NEW: Config reload failure (setting should still save)
    • ⚠️ NEW: Concurrent toggles (100 simultaneous toggles)
    • ⚠️ NEW: Cache refresh (verify 60s TTL works)
    • ⚠️ NEW: Mode field preservation (WAF and rate_limit)

Validation:

  • All 8 E2E tests pass
  • Manual toggle works in UI
  • Settings persist across page reloads
  • Middleware respects runtime settings

Phase 4: Documentation and Cleanup (2 hours)

Objective: Update documentation and finalize implementation.

Tasks:

  1. Update docs/plans/skipped-tests-remediation.md:

    • Mark Phase 4 as complete
    • Update test count (63 → 55 skipped)
    • Add Phase 4 completion summary
  2. Update docs/features.md:

    • Document security module toggle functionality
    • Add screenshots if needed
  3. Update CHANGELOG.md:

    • Add Phase 4 completion entry
  4. Code cleanup:

    • Remove debug logging
    • Add doc comments to new functions (Go doc comments in the backend, JSDoc in the frontend)
    • Run linters and fix issues

Files to Modify:

  • docs/plans/skipped-tests-remediation.md (update progress)
  • docs/features.md (add toggle documentation)
  • CHANGELOG.md (add entry)

File Modification Checklist

Backend Files

| File | Lines Changed | Effort | Status |
| --- | --- | --- | --- |
| backend/internal/cerberus/cerberus.go | ~135 (struct, cache, middleware) | 2.5h | TODO |
| backend/internal/api/handlers/settings_handler.go | ~40 (reload trigger) | 1h | TODO |
| backend/internal/caddy/manager.go | ~30 (runtime settings) | 1h | TODO |
| backend/internal/api/server.go | ~15 (handler setup) | 0.5h | TODO |
| backend/internal/cerberus/cerberus_test.go | ~150 (new tests) | 2.5h | TODO |
| backend/internal/cerberus/cerberus_bench_test.go | ~30 (new file) | 0.5h | TODO |
| backend/internal/api/handlers/settings_handler_test.go | ~100 (new tests) | 1.5h | TODO |
| backend/internal/caddy/manager_test.go | ~50 (add tests) | 1h | TODO |

Total Backend: 8 files, ~550 lines, 10.5 hours (the Grand Total below counts 8.5 hours, reflecting the 2 hours saved by skipping Phase 0)

Frontend Files

| File | Lines Changed | Effort | Status |
| --- | --- | --- | --- |
| frontend/src/pages/Security.tsx | ~80 (optimistic update + onCheckedChange) | 1.5h | TODO |
| frontend/src/pages/Security.test.tsx | ~120 (new tests) | 1.5h | TODO |

Total Frontend: 2 files, ~200 lines, 3 hours

Test Files

| File | Lines Changed | Effort | Status |
| --- | --- | --- | --- |
| tests/security/security-dashboard.spec.ts | 0 (already written) | 2h (validation) | TODO |
| tests/security/rate-limiting.spec.ts | 0 (already written) | 0.5h (validation) | TODO |

Total Test: 2 files, 0 lines changed, 2.5 hours validation

Documentation Files

| File | Lines Changed | Effort | Status |
| --- | --- | --- | --- |
| docs/plans/skipped-tests-remediation.md | ~50 | 0.5h | TODO |
| docs/features.md | ~30 | 0.5h | TODO |
| CHANGELOG.md | ~10 | 0.25h | TODO |

Total Documentation: 3 files, ~90 lines, 1.25 hours

Grand Total

| Category | Files | Lines | Effort |
| --- | --- | --- | --- |
| Backend | 8 | ~550 | 8.5h |
| Frontend | 2 | ~200 | 3h |
| Tests | 2 | 0 | 2.5h |
| Docs | 3 | ~90 | 1h |
| TOTAL | 15 | ~840 | 15h |

With buffer: 13-15 hours (2 days)

Revised Effort (2026-01-24 Supervisor Review):

  • DB injection prerequisite: +2h originally, now SKIP (already complete, saves 2h)
  • Cache layer implementation: +3h
  • Config reload trigger: +1.5h
  • Enhanced testing (concurrent, cache, reload failures): +1.5h
  • Frontend fixes (mode preservation, onCheckedChange): +1h
  • Documentation streamlined: -0.25h

Validation Criteria

Phase 0 Complete (Prerequisites) VERIFIED COMPLETE

  • Cerberus struct has db *gorm.DB field (verified 2026-01-24)
  • Cerberus New() constructor accepts *gorm.DB parameter (verified 2026-01-24)
  • All instantiation sites already pass db (routes.go:107)
  • Compilation successful (go build ./...)
  • Import for "strings" package added (needed for Phase 1 middleware updates)

Phase 1 Complete (Backend) COMPLETE 2026-01-24

  • Cerberus has cache fields (settingsCache, mutex, TTL)
  • Cerberus implements getSetting() with 60s TTL
  • Cerberus implements InvalidateCache() method
  • Cerberus middleware uses cached settings (not direct DB queries)
  • SettingsHandler has CaddyManager and Cerberus fields
  • SettingsHandler triggers config reload for security.* keys
  • SettingsHandler invalidates Cerberus cache on update
  • Config reload is async with 30s timeout
  • Caddy manager queries runtime settings before config generation
  • All backend unit tests pass (go test ./...)
  • Benchmark tests show cache performance improvement
  • No staticcheck errors (staticcheck ./...)

Phase 2 Complete (Frontend) COMPLETE 2026-01-24

  • Security.tsx optimistic update preserves mode field for WAF and rate_limit
  • All toggle components use onCheckedChange (not onChange)
  • Toggle mutations call updateSetting with correct keys
  • Error handling rolls back optimistic updates
  • Success handler invalidates queries correctly
  • Spread operator used correctly: { ...copy.waf, enabled }
  • All frontend unit tests pass (npm test)
  • Unit tests verify mode field preservation
  • No TypeScript errors (npm run type-check)
  • No ESLint errors (npm run lint)

Phase 3 Complete (E2E) COMPLETE 2026-01-24

  • Test: should toggle ACL enabled/disabled passes
  • Test: should toggle WAF enabled/disabled passes
  • Test: should toggle Rate Limiting enabled/disabled passes
  • Test: should persist toggle state after page reload passes
  • Test: should toggle rate limiting on/off passes (rate-limiting.spec.ts)
  • Manual test: Toggle ACL, verify middleware blocks/allows requests
  • Manual test: Toggle state persists across browser refresh
  • Manual test: Error toast displays on network failure
  • Manual test: Config reload failure doesn't block UI toggle
  • Manual test: Concurrent toggles (stress test with 100 toggles)
  • Manual test: Cache refresh (wait 60s, verify new queries)
  • Manual test: Mode field preserved (WAF/rate_limit still show mode after toggle)

Phase 4 Complete (Documentation) COMPLETE 2026-01-24

  • skipped-tests-remediation.md updated with Phase 4 completion
  • features.md documents toggle functionality
  • CHANGELOG.md includes Phase 4 entry
  • All linters pass
  • Code review complete

Final Acceptance COMPLETE 2026-01-24

  • 8 previously skipped E2E tests now passing
  • Total skipped tests: 55 (down from 63)
  • Backend coverage ≥85% (no regression)
  • Frontend coverage ≥85% (no regression)
  • Zero staticcheck errors
  • Zero TypeScript errors
  • Zero ESLint errors
  • PR approved and merged

Risk Mitigation

Risk 1: Middleware Performance Impact

Risk: Querying settings table on every request may slow down Cerberus middleware.

Likelihood: Low (DB queries are fast, <1ms)

Mitigation:

  1. Add in-memory cache for settings with 60-second TTL
  2. Invalidate cache when setting is updated
  3. Profile middleware with and without cache

Fallback: None needed beyond the mitigation above. The caching layer, originally the fallback for a degradation of more than 10ms per request, is now part of Phase 1.

Risk 2: Race Condition Between Toggle and Status Refresh

Risk: User toggles switch while status query is in flight, causing stale UI state.

Likelihood: Medium (fast users or slow networks)

Mitigation:

  1. Optimistic updates handle this gracefully
  2. Query invalidation ensures eventual consistency
  3. Disable toggle during mutation

Fallback: Add version/timestamp to settings and reject stale updates.

Risk 3: Caddy Config Not Applied After Toggle

Risk: User toggles setting but Caddy config isn't regenerated, so WAF/rate limit don't reflect new state.

Likelihood: High (config generation is manual)

Mitigation:

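  1. UpdateSetting calls ApplyConfig asynchronously (30s timeout) after saving any security.* key; frontend query invalidation only refreshes the status display and does not reload Caddy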
  2. Add explicit Caddy config reload trigger after settings update
  3. Document that config reload may take 1-2 seconds

Fallback: Add "Apply Changes" button to manually trigger config reload.


Appendix A: API Endpoint Reference

Existing Endpoints (No Changes)

| Method | Endpoint | Description | Handler |
| --- | --- | --- | --- |
| GET | /api/v1/security/status | Get security module status | security_handler.go:GetStatus() |
| POST | /api/v1/settings | Update a setting | settings_handler.go:UpdateSetting() |
| GET | /api/v1/settings | Get all settings | settings_handler.go:GetSettings() |

Settings Keys Used

| Key | Type | Category | Description |
| --- | --- | --- | --- |
| security.acl.enabled | bool | security | ACL module enabled/disabled |
| security.waf.enabled | bool | security | WAF module enabled/disabled |
| security.rate_limit.enabled | bool | security | Rate limit enabled/disabled |
| security.crowdsec.enabled | bool | security | CrowdSec enabled/disabled (already working) |
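
The tests in this spec expect a runtime setting under one of these keys to override the static config (for example, `security.acl.enabled = "false"` wins over a static `ACLMode` of `"enabled"`). A minimal sketch of that resolution follows. `resolveEnabled` is a hypothetical helper, and treating any non-empty static mode other than `"disabled"` as enabled is an assumption, not something the spec states.

```go
package main

import "strconv"

// resolveEnabled decides whether a module is active.
// A runtime setting that exists and parses as a bool takes priority;
// otherwise the static mode from the config file decides.
func resolveEnabled(runtimeValue string, found bool, staticMode string) bool {
	if found {
		if enabled, err := strconv.ParseBool(runtimeValue); err == nil {
			return enabled
		}
		// Unparseable runtime value: fall through to the static config.
	}
	// Assumption: any non-empty mode other than "disabled" means enabled.
	return staticMode != "" && staticMode != "disabled"
}
```

For instance, `resolveEnabled("false", true, "enabled")` yields false, while a missing setting with a static mode of `"enabled"` yields true.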

Appendix B: Test Coverage Goals

Backend Unit Tests

Target: 85% minimum coverage for modified files

| File | Current Coverage | Target | Gap |
| --- | --- | --- | --- |
| cerberus/cerberus.go | ~70% | 85% | +15% |
| caddy/manager.go | ~80% | 85% | +5% |

New Tests Required:

  • Cerberus middleware with runtime settings (5 tests)
  • Caddy manager runtime setting integration (3 tests)

Frontend Unit Tests

Target: 85% minimum coverage for modified files

| File | Current Coverage | Target | Gap |
| --- | --- | --- | --- |
| pages/Security.tsx | ~60% | 85% | +25% |

New Tests Required:

  • Toggle mutation logic (4 tests)
  • Optimistic update logic (3 tests)
  • Error handling (2 tests)

E2E Tests

Target: All previously skipped tests pass

| Test Suite | Tests to Pass | Current Passing | Gap |
| --- | --- | --- | --- |
| security-dashboard.spec.ts | 4 | 0 | +4 |
| rate-limiting.spec.ts | 1 | 0 | +1 |
| TOTAL | 5 | 0 | +5 |

Appendix C: Debugging Guide

Issue: Toggle Doesn't Update UI

Symptoms: Clicking toggle doesn't change visual state.

Diagnosis:

  1. Check browser console for errors
  2. Verify mutation is called: console.log in toggleServiceMutation
  3. Check network tab: POST /api/v1/settings should return 200
  4. Verify optimistic update logic updates correct section

Fix:

  • If no mutation call: Check the Switch onCheckedChange handler (onChange is not the prop the Switch uses)
  • If no network request: Check mutation function signature
  • If network error: Check backend logs
  • If UI doesn't update: Check optimistic update logic

Issue: Toggle Updates UI But Doesn't Persist

Symptoms: Toggle works, but state resets on page reload.

Diagnosis:

  1. Check DB: SELECT * FROM settings WHERE key LIKE 'security.%.enabled'
  2. Verify POST /api/v1/settings returns 200 with updated setting
  3. Check GET /api/v1/security/status returns correct enabled state

Fix:

  • If setting not in DB: Check UpdateSetting handler
  • If setting in DB but status wrong: Check GetStatus priority chain
  • If status correct but UI wrong: Check React Query cache

Issue: Middleware Doesn't Block Requests

Symptoms: ACL enabled but requests still go through.

Diagnosis:

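  1. Check Cerberus middleware logs: a DB query should appear on the first request after a toggle (the cache is invalidated) or after the 60s TTL expires, not on every request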
  2. Verify setting exists: SELECT * FROM settings WHERE key = 'security.acl.enabled'
  3. Check access list exists and is enabled
  4. Verify client IP matches blocked range

Fix:

  • If no DB query is logged even after a cache invalidation or TTL expiry: Middleware not reading runtime setting
  • If setting not found: Create setting via UI toggle
  • If ACL not enabled: Enable ACL in UI
  • If IP not blocked: Check access list CIDR ranges

Conclusion

This specification provides a complete, actionable plan for implementing security module toggle actions in Phase 4. The implementation leverages existing infrastructure (Settings table, UpdateSetting endpoint) rather than creating new APIs, minimizing scope and complexity.

Key Success Factors:

  1. Minimal Backend Changes: Only the Cerberus middleware, settings handler, and Caddy manager need updates
  2. Frontend Fix: Simple optimistic update logic correction
  3. Zero New Endpoints: Reuse /api/v1/settings for all toggles
  4. Tests Already Written: E2E tests will pass once toggles work
  5. Clear Validation: 8 tests passing = Phase 4 complete

Next Steps:

  1. Review this spec with team
  2. Begin Phase 1: Backend middleware updates
  3. Test each phase incrementally
  4. Confirm the previously skipped E2E tests pass in Phase 3
  5. Update documentation in Phase 4

Estimated Timeline: 2 days (13-15 hours) for complete implementation and validation.

Revised Phases (Phase 0 skipped):

  1. Phase 1: Backend Middleware Updates (5h) - START HERE
  2. Phase 2: Frontend Toggle Handlers (2h) - Can parallelize with Phase 1
  3. Phase 3: Integration Testing (4h)
  4. Phase 4: Documentation and Cleanup (2h)