Charon/docs/plans/archive/phase4_security_toggles_spec.md
2026-02-19 16:34:10 +00:00


# Phase 4: Security Module Toggle Actions - Implementation Specification
> **Status**: ✅ IMPLEMENTED
> **Created**: 2026-01-23
> **Last Updated**: 2026-01-24
> **Implementation Completed**: 2026-01-24
> **Estimated Effort**: 13-15 hours (2 days)
> **Priority**: P0 - Critical (Unblocks 8 skipped E2E tests)
> **Dependencies**: None (can start immediately)
>
> ⚠️ **CRITICAL FIXES APPLIED**: This spec has been updated to address P0 issues identified in supervisor review:
> - Frontend optimistic update preserves required fields (mode)
> - Cerberus DB injection pattern documented
> - Config reload trigger requirements added
> - Performance cache layer specified
> - Switch component uses onCheckedChange (not onChange)
>
> ✅ **FINAL REVIEW 2026-01-24**: Supervisor verified implementation prerequisites:
> - Phase 0 (Cerberus DB injection) is **ALREADY COMPLETE** - Cerberus struct already has `db *gorm.DB` field
> - Only `routes.go:107` instantiates Cerberus in production code
> - Revised effort: 13-15 hours (reduced from 16-20h due to Phase 0 skip)
> - All prerequisite files verified to exist
## Executive Summary
This specification provides a detailed implementation plan for enabling toggle functionality for three security modules (ACL, WAF, Rate Limiting) in the Charon SecurityDashboard. Currently, these modules display status but cannot be toggled on/off through the UI. The frontend already has toggle UI components in place with proper `data-testid` attributes; they are currently **disabled** and non-functional. This phase implements the backend logic, frontend handlers, and middleware integration to make these toggles fully operational.
**Tests to Enable**: 8 E2E tests in `tests/security/security-dashboard.spec.ts` and `tests/security/rate-limiting.spec.ts`
**Current State**:
- ✅ Frontend UI: Toggle switches exist with proper test IDs
- ✅ Backend Status API: `/api/v1/security/status` returns enabled/disabled states
- ✅ Database Schema: `settings` table stores per-module settings
- ❌ **Missing**: Backend toggle endpoints (no POST routes for enable/disable)
- ❌ **Missing**: Frontend mutation handlers are non-functional (call generic `updateSetting` API)
- ❌ **Missing**: Middleware does not fully honor settings-based enabled/disabled states
---
## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [Database Schema](#database-schema)
3. [Backend Implementation](#backend-implementation)
4. [Frontend Implementation](#frontend-implementation)
5. [Middleware Updates](#middleware-updates)
6. [Testing Strategy](#testing-strategy)
7. [Implementation Phases](#implementation-phases)
8. [File Modification Checklist](#file-modification-checklist)
9. [Validation Criteria](#validation-criteria)
---
## Architecture Overview
### Current Flow (Read-Only Status)
```
┌─────────────────────────┐
│ Frontend UI             │
│ - SecurityDashboard     │
│ - Toggle switches       │
│ - (Disabled)            │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ GET /security/status    │
│ - security_handler.go   │
│ - Reads DB settings     │
│ - Returns JSON status   │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ Database                │
│ - settings table        │
│ - security.*.enabled    │
└─────────────────────────┘
```
### Target Flow (Toggle Actions)
```
┌────────────────────────────────────────┐
│ Frontend UI                            │
│ - Toggle ACL                           │
│ - Toggle WAF                           │
│ - Toggle Rate Limit                    │
└───────────────────┬────────────────────┘
                    │ (onCheckedChange)
                    ▼
┌────────────────────────────────────────┐
│ POST /settings                         │
│ - settings_handler.go                  │
│ - UpdateSetting()                      │
│ - Validates key/value                  │
│ - Upserts to settings table            │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Database                               │
│ - settings.key = "security.*.enabled"  │
│ - settings.value = "true"/"false"      │
└───────────────────┬────────────────────┘
                    │
                    ▼
┌────────────────────────────────────────┐
│ Middleware / Caddy Config              │
│ - Cerberus.Middleware()                │
│ - caddy/config.go                      │
│ - Honors settings                      │
└────────────────────────────────────────┘
```
**Key Insight**: The backend `/settings` endpoint and database schema already exist. We are **reusing existing infrastructure** rather than creating new endpoints. The challenge is:
1. Frontend needs to send correct setting keys
2. Middleware needs to check these settings consistently
3. Caddy config generation needs to respect runtime settings
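
For reference, the request body the frontend sends for a toggle can be sketched in Go. The JSON field names (`key`, `value`, `category`, `type`) follow the payload used in the settings handler test later in this spec; the type and helper names (`updateSettingRequest`, `toggleRequestBody`) are hypothetical and only illustrate the shape of the call.

```go
package main

import "encoding/json"

// updateSettingRequest mirrors the JSON body accepted by POST /api/v1/settings.
// Hypothetical type name; the field names match the handler test payload.
type updateSettingRequest struct {
	Key      string `json:"key"`
	Value    string `json:"value"`
	Category string `json:"category,omitempty"`
	Type     string `json:"type,omitempty"`
}

// toggleRequestBody builds the body for enabling or disabling one module,
// e.g. module "acl" -> key "security.acl.enabled".
func toggleRequestBody(module string, enabled bool) ([]byte, error) {
	value := "false"
	if enabled {
		value = "true"
	}
	return json.Marshal(updateSettingRequest{
		Key:      "security." + module + ".enabled",
		Value:    value,
		Category: "security",
		Type:     "bool",
	})
}
```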
---
## Database Schema
### Existing Schema (No Changes Required)
#### `settings` Table
Already supports all required keys:
| Column | Type | Index | Description |
|-----------|-----------|------------|------------------------------------------|
| id | INTEGER | PK | Auto-increment primary key |
| key | VARCHAR | UNIQUE | Setting key (e.g., `security.acl.enabled`) |
| value | TEXT | | Setting value (`"true"` or `"false"`) |
| type | VARCHAR | INDEX | Type hint (`"bool"`) |
| category | VARCHAR | INDEX | Category (`"security"`) |
| updated_at| TIMESTAMP | | Last update timestamp |
**Existing Settings Keys**:
- `security.acl.enabled` - ACL module toggle
- `security.waf.enabled` - WAF module toggle
- `security.rate_limit.enabled` - Rate limiting toggle
- `security.crowdsec.enabled` - CrowdSec toggle (already working)
**No migration needed** - schema supports all requirements out of the box.
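
The `value` column accepts any string, while every reader in this spec treats only a case-insensitive `"true"` as enabled, so a typo silently disables a module. A hypothetical validation step (not part of the current `UpdateSetting`) could reject such input before the upsert. This sketch assumes the four keys above are the complete set of module toggles and accepts only the lowercase `"true"`/`"false"` the frontend sends; `securityToggleKeys` and `validateSecurityToggle` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// securityToggleKeys is the set of module toggles listed above (assumed complete).
var securityToggleKeys = map[string]bool{
	"security.acl.enabled":        true,
	"security.waf.enabled":        true,
	"security.rate_limit.enabled": true,
	"security.crowdsec.enabled":   true,
}

// validateSecurityToggle rejects unknown security.*.enabled keys and values
// other than "true"/"false". Keys that are not module toggles pass through.
func validateSecurityToggle(key, value string) error {
	if !strings.HasPrefix(key, "security.") || !strings.HasSuffix(key, ".enabled") {
		return nil
	}
	if !securityToggleKeys[key] {
		return fmt.Errorf("unknown security toggle %q", key)
	}
	if value != "true" && value != "false" {
		return fmt.Errorf("setting %q must be \"true\" or \"false\", got %q", key, value)
	}
	return nil
}
```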

---
## Backend Implementation
### 1. Settings Handler (Requires Config Reload Trigger)
**⚠️ CRITICAL ADDITION**: SettingsHandler must trigger Caddy config reload when security settings change.
**File**: `backend/internal/api/handlers/settings_handler.go`
**Current Implementation** (❌ Missing reload trigger):
```go
// UpdateSetting updates or creates a setting.
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	var req UpdateSettingRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	setting := models.Setting{
		Key:   req.Key,
		Value: req.Value,
	}
	if req.Category != "" {
		setting.Category = req.Category
	}
	if req.Type != "" {
		setting.Type = req.Type
	}
	// Upsert
	if err := h.DB.Where(models.Setting{Key: req.Key}).Assign(setting).FirstOrCreate(&setting).Error; err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save setting"})
		return
	}
	c.JSON(http.StatusOK, setting)
	// ❌ MISSING: Caddy config reload for security.* settings
}
```
**Updated Implementation** (✅ With config reload):
```go
import (
	"context"
	"strings"
	"time"
	// ... other imports ...
)

type SettingsHandler struct {
	DB           *gorm.DB
	CaddyManager CaddyConfigManager // ✅ Add CaddyManager interface
}

// CaddyConfigManager interface for reload triggering
type CaddyConfigManager interface {
	ApplyConfig(ctx context.Context) error
}

// UpdateSetting updates or creates a setting.
func (h *SettingsHandler) UpdateSetting(c *gin.Context) {
	var req UpdateSettingRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	setting := models.Setting{
		Key:   req.Key,
		Value: req.Value,
	}
	if req.Category != "" {
		setting.Category = req.Category
	}
	if req.Type != "" {
		setting.Type = req.Type
	}
	// Upsert
	if err := h.DB.Where(models.Setting{Key: req.Key}).Assign(setting).FirstOrCreate(&setting).Error; err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to save setting"})
		return
	}
	// ✅ Trigger Caddy config reload for security settings
	if h.CaddyManager != nil && strings.HasPrefix(req.Key, "security.") {
		go func() {
			// Detached context: the request context is cancelled when the handler returns
			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
			defer cancel()
			if err := h.CaddyManager.ApplyConfig(ctx); err != nil {
				// Log error but don't fail the setting update
				logger.Log().WithError(err).Warn("Failed to reload Caddy config after security setting change")
			}
		}()
	}
	c.JSON(http.StatusOK, setting)
}
```
**Key Changes**:
1. ✅ Add `CaddyManager` field to `SettingsHandler` struct
2. ✅ Define `CaddyConfigManager` interface with `ApplyConfig` method
3. ✅ Trigger async config reload when `security.*` settings change
4. ✅ Use goroutine with timeout to avoid blocking HTTP response
5. ✅ Log reload errors but don't fail the setting update
**Constructor Update Required**:
```go
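// In server.go or wherever SettingsHandler is created.
// Taking the interface (not *caddy.Manager) keeps a nil manager a true nil,
// so the `h.CaddyManager != nil` guard in UpdateSetting behaves as intended.
func NewSettingsHandler(db *gorm.DB, caddyMgr CaddyConfigManager) *SettingsHandler {
	return &SettingsHandler{
		DB:           db,
		CaddyManager: caddyMgr, // ✅ Inject CaddyManager (*caddy.Manager satisfies the interface)
	}
}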
```
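
One Go detail matters for the `h.CaddyManager != nil` guard: if a constructor takes a concrete `*caddy.Manager` and stores it in the interface field, a nil pointer becomes a non-nil interface value, the guard passes, and the background goroutine calls `ApplyConfig` on a nil receiver. The stdlib-only sketch below shows the difference; `fakeManager`, `storeConcrete`, and `storeInterface` are hypothetical stand-ins, not project code.

```go
package main

import "context"

// CaddyConfigManager mirrors the interface defined for SettingsHandler.
type CaddyConfigManager interface {
	ApplyConfig(ctx context.Context) error
}

// fakeManager is a hypothetical stand-in for *caddy.Manager.
type fakeManager struct{ calls int }

// ApplyConfig dereferences m, so calling it on a nil *fakeManager panics.
func (m *fakeManager) ApplyConfig(ctx context.Context) error {
	m.calls++
	return ctx.Err()
}

// storeConcrete mimics a constructor parameter typed as the concrete pointer.
func storeConcrete(m *fakeManager) CaddyConfigManager { return m }

// storeInterface mimics a constructor parameter typed as the interface.
func storeInterface(m CaddyConfigManager) CaddyConfigManager { return m }
```

Accepting `CaddyConfigManager` in the constructor, or checking the pointer for nil before assigning it, avoids the trap.
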
**Why Async**: Config reload can take 1-2 seconds; we don't want to block the HTTP response. The setting is saved immediately, and config reload happens in the background.
**Error Handling**: If reload fails, the setting is still saved. Users can manually retry the toggle or trigger a manual config reload.
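
The fire-and-forget reload can be isolated into a small helper so it is testable without Gin or Caddy. This is a sketch under assumptions: `configApplier`, `triggerReload`, `countingApplier`, and `reloadTimeout` are hypothetical names, and `log.Printf` stands in for the project's `logger.Log()`. Returning a `done` channel lets tests wait for the goroutine instead of sleeping; production code can simply ignore it.

```go
package main

import (
	"context"
	"log"
	"strings"
	"time"
)

// reloadTimeout matches the 30-second budget used in UpdateSetting.
const reloadTimeout = 30 * time.Second

// configApplier is the one method the handler needs from the Caddy manager.
type configApplier interface {
	ApplyConfig(ctx context.Context) error
}

// triggerReload starts a background reload for security.* keys and returns a
// channel that is closed when the attempt finishes, or nil if no reload was
// started. Failures are logged only; the saved setting is never rolled back.
func triggerReload(mgr configApplier, key string, timeout time.Duration) <-chan struct{} {
	if mgr == nil || !strings.HasPrefix(key, "security.") {
		return nil
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Detached from the request context, which ends when the handler returns.
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		if err := mgr.ApplyConfig(ctx); err != nil {
			log.Printf("caddy reload after %s change failed: %v", key, err)
		}
	}()
	return done
}

// countingApplier records calls so tests can observe the reload.
type countingApplier struct{ calls int }

func (a *countingApplier) ApplyConfig(ctx context.Context) error {
	a.calls++
	return ctx.Err()
}
```
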
**Route**: `POST /api/v1/settings` (already registered in `routes.go:200`)
### 2. Security Status Endpoint (✅ ZERO CHANGES NEEDED)
**⚠️ IMPORTANT**: This endpoint is already **100% correct** and reads runtime settings with highest priority.
**File**: `backend/internal/api/handlers/security_handler.go`
**Current Implementation** (lines 54-189) - **DO NOT MODIFY**:
```go
func (h *SecurityHandler) GetStatus(c *gin.Context) {
	// Priority chain:
	// 1. Settings table (highest - runtime overrides)
	// 2. SecurityConfig DB record (middle - user configuration)
	// 3. Static config (lowest - defaults)
	// ... loads from SecurityConfig first ...
	// Settings table overrides (PRIORITY 1 - highest)
	var setting struct{ Value string }
	// WAF enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.waf.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			wafMode = "enabled"
		} else {
			wafMode = "disabled"
		}
	}
	// Rate Limit enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.rate_limit.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			rateLimitMode = "enabled"
		} else {
			rateLimitMode = "disabled"
		}
	}
	// ACL enabled override
	setting = struct{ Value string }{}
	if err := h.db.Raw("SELECT value FROM settings WHERE key = ? LIMIT 1", "security.acl.enabled").Scan(&setting).Error; err == nil && setting.Value != "" {
		if strings.EqualFold(setting.Value, "true") {
			aclMode = "enabled"
		} else {
			aclMode = "disabled"
		}
	}
	// ... continues to build response ...
}
```
**✅ Already implemented** - Backend correctly reads runtime settings with highest priority.
**Action Item**: None - endpoint is fully functional.
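
The priority chain in `GetStatus` reduces to a three-level fallback per module. The sketch below is a simplification: it assumes each layer is available as a plain string, with an empty string meaning "not set at this layer", and `resolveMode` is a hypothetical helper rather than code from `security_handler.go`.

```go
package main

import "strings"

// resolveMode applies the GetStatus priority chain for one module:
// settings-table value (highest), then the SecurityConfig record, then the
// static default. An empty string means "not set at this layer".
func resolveMode(settingValue, dbConfigMode, staticMode string) string {
	if settingValue != "" {
		if strings.EqualFold(settingValue, "true") {
			return "enabled"
		}
		return "disabled" // any non-"true" value disables, as in GetStatus
	}
	if dbConfigMode != "" {
		return dbConfigMode
	}
	return staticMode
}
```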

---
## Frontend Implementation
### 1. Update Security.tsx Toggle Handlers
**File**: `frontend/src/pages/Security.tsx` (lines 100-160)
**Current Issue**: The `toggleServiceMutation` uses a generic `updateSetting` call, but the implementation doesn't correctly trigger optimistic updates or invalidate queries properly.
**Current Code** (lines 100-160):
```typescript
// Generic toggle mutation for per-service settings
const toggleServiceMutation = useMutation({
  mutationFn: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await updateSetting(key, enabled ? 'true' : 'false', 'security', 'bool')
  },
  onMutate: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await queryClient.cancelQueries({ queryKey: ['security-status'] })
    const previous = queryClient.getQueryData(['security-status'])
    queryClient.setQueryData(['security-status'], (old: unknown) => {
      if (!old || typeof old !== 'object') return old
      const parts = key.split('.')
      const section = parts[1] as keyof SecurityStatus
      const field = parts[2]
      const copy = { ...(old as SecurityStatus) }
      if (copy[section] && typeof copy[section] === 'object') {
        copy[section] = { ...copy[section], [field]: enabled } as never
      }
      return copy
    })
    return { previous }
  },
  onError: (_err, _vars, context: unknown) => {
    if (context && typeof context === 'object' && 'previous' in context) {
      queryClient.setQueryData(['security-status'], context.previous)
    }
    const msg = _err instanceof Error ? _err.message : String(_err)
    toast.error(`Failed to update setting: ${msg}`)
  },
  onSuccess: () => {
    queryClient.invalidateQueries({ queryKey: ['settings'] })
    queryClient.invalidateQueries({ queryKey: ['security-status'] })
    toast.success('Security setting updated')
  },
})
```
**Problem**: The optimistic update parses `key = "security.acl.enabled"` into `section = "acl"` and `field = "enabled"`, which matches the actual `SecurityStatus` shape:
- `status.acl.enabled`
- `status.waf.enabled`
- `status.rate_limit.enabled`

The weakness is the write itself: `copy[section] = { ...copy[section], [field]: enabled } as never` switches off type checking, so any key segment becomes a field name and nothing enforces the per-section shape the UI depends on.

**Solution**: Replace the generic write with explicit per-section updates that change only `enabled` and preserve every other field, especially `mode` for WAF and rate_limit.

**⚠️ CRITICAL**: WAF and rate_limit carry a required `mode` field; any optimistic update that rebuilds these section objects without spreading the existing data drops `mode` and breaks the UI.
**SecurityStatus Interface** (for reference):
```typescript
interface SecurityStatus {
  acl: { enabled: boolean }
  waf: { enabled: boolean; mode: string } // ⚠️ mode is REQUIRED
  rate_limit: { enabled: boolean; mode: string } // ⚠️ mode is REQUIRED
  cerberus?: { enabled: boolean }
}
```
**Updated Code** (replace lines 100-160):
```typescript
// Generic toggle mutation for per-service settings
const toggleServiceMutation = useMutation({
  mutationFn: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    await updateSetting(key, enabled ? 'true' : 'false', 'security', 'bool')
  },
  onMutate: async ({ key, enabled }: { key: string; enabled: boolean }) => {
    // Cancel ongoing queries to avoid race conditions
    await queryClient.cancelQueries({ queryKey: ['security-status'] })
    // Snapshot current state for rollback
    const previous = queryClient.getQueryData(['security-status'])
    // Optimistic update: parse key like "security.acl.enabled" -> section "acl"
    queryClient.setQueryData(['security-status'], (old: unknown) => {
      if (!old || typeof old !== 'object') return old
      const oldStatus = old as SecurityStatus
      const copy = { ...oldStatus }
      // Extract section from key (e.g., "security.acl.enabled" -> "acl")
      const parts = key.split('.')
      const section = parts[1] as keyof SecurityStatus
      // ✅ CRITICAL: Spread existing section data to preserve fields like 'mode'
      // Update ONLY the enabled field, keep everything else intact
      if (section === 'acl') {
        copy.acl = { ...copy.acl, enabled }
      } else if (section === 'waf') {
        // ⚠️ Preserve mode field (detection/prevention)
        copy.waf = { ...copy.waf, enabled }
      } else if (section === 'rate_limit') {
        // ⚠️ Preserve mode field (log/block)
        copy.rate_limit = { ...copy.rate_limit, enabled }
      }
      return copy
    })
    return { previous }
  },
  onError: (_err, _vars, context: unknown) => {
    // Rollback on error
    if (context && typeof context === 'object' && 'previous' in context) {
      queryClient.setQueryData(['security-status'], context.previous)
    }
    const msg = _err instanceof Error ? _err.message : String(_err)
    toast.error(`Failed to update setting: ${msg}`)
  },
  onSuccess: () => {
    // Refresh data from server
    queryClient.invalidateQueries({ queryKey: ['settings'] })
    queryClient.invalidateQueries({ queryKey: ['security-status'] })
    toast.success('Security setting updated')
  },
})
```
**Why This Matters**: WAF and rate_limit have a `mode` field (e.g., `{enabled: true, mode: "detection"}`) that must be preserved during optimistic updates. The spread operator `...copy.waf` ensures we only update `enabled` while keeping `mode` intact.
**File Changes**:
- `frontend/src/pages/Security.tsx` (lines 100-160)
- No API client changes needed - `updateSetting` in `frontend/src/api/settings.ts` already correct
### 2. Verify Toggle Component Integration
**File**: `frontend/src/pages/Security.tsx` (lines 420-520)
**Current Implementation**:
```tsx
{/* ACL - Layer 2: Access Control */}
<Card variant="interactive" className="flex flex-col">
  <CardFooter className="justify-between pt-4">
    <Tooltip>
      <TooltipTrigger asChild>
        <div>
          <Switch
            checked={status.acl.enabled}
            disabled={!status.cerberus?.enabled}
            onCheckedChange={(checked) => toggleServiceMutation.mutate({
              key: 'security.acl.enabled',
              enabled: checked
            })}
            data-testid="toggle-acl"
          />
        </div>
      </TooltipTrigger>
      <TooltipContent>
        <p>{cerberusDisabled ? t('security.enableCerberusFirst') : t('security.toggleAcl')}</p>
      </TooltipContent>
    </Tooltip>
    {/* ... Configure button ... */}
  </CardFooter>
</Card>
```
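**⚠️ CRITICAL FIX**: Use `onCheckedChange` (not `onChange`) for the Switch component:
- `onCheckedChange` receives the new `boolean` state directly
- `onChange` receives the underlying checkbox's `React.ChangeEvent` (legacy pattern), so the handler would have to read `e.target.checked` itself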
**Apply to all toggles**:
- ✅ ACL: `security.acl.enabled`
- ✅ WAF: `security.waf.enabled`
- ✅ Rate Limit: `security.rate_limit.enabled`
**Action Items**:
1. Fix optimistic update logic (see section 1 above)
2. Replace `onChange` with `onCheckedChange` in all three toggle components
### 3. Update Switch Component (If Needed)
**File**: `frontend/src/components/ui/Switch.tsx`
**Current Implementation** (lines 1-50):
```tsx
const Switch = React.forwardRef<HTMLInputElement, SwitchProps>(
  ({ className, onCheckedChange, onChange, id, disabled, ...props }, ref) => {
    return (
      <label
        htmlFor={id}
        className={cn(
          'relative inline-flex items-center',
          disabled ? 'cursor-not-allowed opacity-50' : 'cursor-pointer',
          className
        )}
      >
        <input
          id={id}
          type="checkbox"
          className="sr-only peer"
          ref={ref}
          disabled={disabled}
          onChange={(e) => {
            onChange?.(e)
            onCheckedChange?.(e.target.checked)
          }}
          {...props}
        />
        {/* ... visual toggle styling ... */}
      </label>
    )
  }
)
```
**✅ No changes needed** - Component correctly:
1. Accepts `onChange` and `onCheckedChange` props
2. Supports `disabled` state
3. Renders accessible checkbox with visual toggle
---
## Middleware Updates
### 0. Cerberus Struct DB Injection (PREREQUISITE)
**✅ ALREADY COMPLETE**: Cerberus already has access to `*gorm.DB` to query runtime settings.
**File**: `backend/internal/cerberus/cerberus.go` (lines 20-32)
**Current Struct** (verified 2026-01-24):
```go
type Cerberus struct {
	cfg               config.SecurityConfig
	db                *gorm.DB // ✅ Already exists
	accessSvc         *services.AccessListService
	securityNotifySvc *services.SecurityNotificationService
}

func New(cfg config.SecurityConfig, db *gorm.DB) *Cerberus { // ✅ Already accepts db
	return &Cerberus{
		cfg: cfg,
		db:  db,
	}
}
```
**No Changes Required** - The prerequisite is already satisfied.
**Instantiation Sites** (verified):
- `backend/internal/api/routes/routes.go:107` - Primary instantiation site
- Test files use their own mock instances
**Validation Complete**:
```bash
# ✅ Verified 2026-01-24
grep -rn "cerberus.New(" backend/
# routes/routes.go:107: cerb := cerberus.New(cfg.Security, db)
```
---
### 1. Cerberus Middleware ACL Check
**File**: `backend/internal/cerberus/cerberus.go` (lines 85-148)
**Prerequisites**: Cerberus needs the `db` field described in section 0 above (already present).
**Current Implementation** (lines 105-135):
```go
func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}
		// WAF tracking
		if c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled" {
			metrics.IncWAFRequest()
		}
		// ACL: simple per-request evaluation against all access lists if enabled
		if c.cfg.ACLMode == "enabled" {
			acls, err := c.accessSvc.List()
			if err == nil {
				clientIP := ctx.ClientIP()
				for _, acl := range acls {
					if !acl.Enabled {
						continue
					}
					allowed, _, err := c.accessSvc.TestIP(acl.ID, clientIP)
					if err == nil && !allowed {
						// Send security notification
						_ = c.securityNotifySvc.Send(context.Background(), models.SecurityEvent{
							EventType: "acl_deny",
							Severity:  "warn",
							Message:   "Access control list blocked request",
							ClientIP:  clientIP,
							Path:      ctx.Request.URL.Path,
							Timestamp: time.Now(),
							Metadata: map[string]any{
								"acl_name": acl.Name,
								"acl_id":   acl.ID,
							},
						})
						ctx.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "Blocked by access control list"})
						return
					}
				}
			}
		}
		ctx.Next()
	}
}
```
**Issue**: Reads `c.cfg.ACLMode` (static config), not runtime setting from DB.
**Fix**: Query `settings` table for `security.acl.enabled` before checking ACLs.
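
The runtime-override pattern can be exercised without Gin or a database. This `net/http` sketch uses hypothetical names (`aclGate`, `settingLookup`, `okHandler`, `statusFor`); it takes the client IP straight from `RemoteAddr` (no proxy headers, unlike Gin's `ClientIP()`), and the `allowed` callback stands in for calling `accessSvc.TestIP` across all enabled lists. A missing settings row falls back to the static default, exactly as in the Go change that follows.

```go
package main

import (
	"net"
	"net/http"
	"net/http/httptest"
	"strings"
)

// settingLookup returns a runtime setting value and whether the row exists.
type settingLookup func(key string) (string, bool)

// aclGate mirrors the ACL branch of the updated middleware: the runtime
// setting overrides the static default, and only an effective "true" checks
// the client IP.
func aclGate(staticEnabled bool, lookup settingLookup, allowed func(ip string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		enabled := staticEnabled
		if v, ok := lookup("security.acl.enabled"); ok {
			enabled = strings.EqualFold(v, "true") // runtime setting wins
		}
		if enabled {
			ip, _, err := net.SplitHostPort(r.RemoteAddr)
			if err != nil {
				ip = r.RemoteAddr
			}
			if !allowed(ip) {
				http.Error(w, "Blocked by access control list", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for the protected route.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// statusFor sends one GET /test through h from remoteAddr and returns the status code.
func statusFor(h http.Handler, remoteAddr string) int {
	req := httptest.NewRequest(http.MethodGet, "/test", nil)
	req.RemoteAddr = remoteAddr
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}
```
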
**Updated Code**:
```go
func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}
		// WAF tracking - check runtime setting
		wafEnabled := c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled"
		if c.db != nil {
			var s models.Setting
			if err := c.db.Where("key = ?", "security.waf.enabled").First(&s).Error; err == nil {
				wafEnabled = strings.EqualFold(s.Value, "true")
			}
		}
		if wafEnabled {
			metrics.IncWAFRequest()
		}
		// ACL: check runtime setting before evaluating access lists
		aclEnabled := c.cfg.ACLMode == "enabled"
		if c.db != nil {
			var s models.Setting
			if err := c.db.Where("key = ?", "security.acl.enabled").First(&s).Error; err == nil {
				aclEnabled = strings.EqualFold(s.Value, "true")
			}
		}
		if aclEnabled {
			acls, err := c.accessSvc.List()
			if err == nil {
				clientIP := ctx.ClientIP()
				for _, acl := range acls {
					if !acl.Enabled {
						continue
					}
					allowed, _, err := c.accessSvc.TestIP(acl.ID, clientIP)
					if err == nil && !allowed {
						// Send security notification
						_ = c.securityNotifySvc.Send(context.Background(), models.SecurityEvent{
							EventType: "acl_deny",
							Severity:  "warn",
							Message:   "Access control list blocked request",
							ClientIP:  clientIP,
							Path:      ctx.Request.URL.Path,
							Timestamp: time.Now(),
							Metadata: map[string]any{
								"acl_name": acl.Name,
								"acl_id":   acl.ID,
							},
						})
						ctx.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "Blocked by access control list"})
						return
					}
				}
			}
		}
		// CrowdSec integration (already correct - checks mode)
		if c.cfg.CrowdSecMode == "local" {
			metrics.IncCrowdSecRequest()
			logger.Log().WithField("client_ip", ctx.ClientIP()).WithField("path", ctx.Request.URL.Path).Debug("Request evaluated by CrowdSec bouncer at Caddy layer")
		}
		ctx.Next()
	}
}
```
**File Changes**:
- `backend/internal/cerberus/cerberus.go` (lines 85-148)
### 2. Caddy Config Generation (WAF and Rate Limit)
**File**: `backend/internal/caddy/config.go`
**Current Implementation** (lines 1-300):
```go
func GenerateConfig(hosts []models.ProxyHost, storageDir, acmeEmail, frontendDir, sslProvider string, acmeStaging, crowdsecEnabled, wafEnabled, rateLimitEnabled, aclEnabled bool, adminWhitelist string, rulesets []models.SecurityRuleSet, rulesetPaths map[string]string, decisions []models.SecurityDecision, secCfg *models.SecurityConfig, dnsProviderConfigs []DNSProviderConfig) (*Config, error) {
	// ... config generation ...
}
```
**Issue**: Function parameters `wafEnabled`, `rateLimitEnabled`, `aclEnabled` are **static booleans** passed from static config, not runtime settings.
**Fix**: Before calling `GenerateConfig`, query runtime settings and pass correct values.
**Caller**: `backend/internal/caddy/manager.go` (ApplyConfig method)
**Current Code** (approximate):
```go
func (m *Manager) ApplyConfig(ctx context.Context) error {
	// ... fetch hosts, rulesets, etc. ...
	// Get static config flags
	wafEnabled := m.secCfg.WAFMode != "" && m.secCfg.WAFMode != "disabled"
	rateLimitEnabled := m.secCfg.RateLimitMode == "enabled"
	aclEnabled := m.secCfg.ACLMode == "enabled"
	config, err := GenerateConfig(
		hosts,
		m.storageDir,
		acmeEmail,
		m.frontendDir,
		sslProvider,
		acmeStaging,
		crowdsecEnabled,
		wafEnabled,       // ❌ Static
		rateLimitEnabled, // ❌ Static
		aclEnabled,       // ❌ Static
		adminWhitelist,
		rulesets,
		rulesetPaths,
		decisions,
		secCfg,
		dnsProviderConfigs,
	)
	// ... apply to Caddy ...
}
```
**Updated Code**:
```go
func (m *Manager) ApplyConfig(ctx context.Context) error {
	// ... fetch hosts, rulesets, etc. ...
	// Get runtime settings (priority 1) or fallback to static config
	wafEnabled := m.secCfg.WAFMode != "" && m.secCfg.WAFMode != "disabled"
	rateLimitEnabled := m.secCfg.RateLimitMode == "enabled"
	aclEnabled := m.secCfg.ACLMode == "enabled"
	// Override with runtime settings from DB
	if m.db != nil {
		var s models.Setting
		// WAF runtime setting
		if err := m.db.Where("key = ?", "security.waf.enabled").First(&s).Error; err == nil {
			wafEnabled = strings.EqualFold(s.Value, "true")
		}
		// Rate Limit runtime setting
		s = models.Setting{} // Reset: GORM's First adds a non-zero primary key in s to the WHERE clause
		if err := m.db.Where("key = ?", "security.rate_limit.enabled").First(&s).Error; err == nil {
			rateLimitEnabled = strings.EqualFold(s.Value, "true")
		}
		// ACL runtime setting
		s = models.Setting{} // Reset (see above)
		if err := m.db.Where("key = ?", "security.acl.enabled").First(&s).Error; err == nil {
			aclEnabled = strings.EqualFold(s.Value, "true")
		}
	}
	config, err := GenerateConfig(
		hosts,
		m.storageDir,
		acmeEmail,
		m.frontendDir,
		sslProvider,
		acmeStaging,
		crowdsecEnabled,
		wafEnabled,       // ✅ Runtime
		rateLimitEnabled, // ✅ Runtime
		aclEnabled,       // ✅ Runtime
		adminWhitelist,
		rulesets,
		rulesetPaths,
		decisions,
		secCfg,
		dnsProviderConfigs,
	)
	if err != nil {
		return err
	}
	// ... apply to Caddy ...
}
```
**File Changes**:
- `backend/internal/caddy/manager.go` (ApplyConfig method, ~line 150-250)
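
The three single-row lookups in `ApplyConfig` could also be collapsed into one batch read, the same `key LIKE 'security.%'` query the settings cache below uses. This is a sketch under assumptions: `setting`, `securityFlags`, and `resolveFlags` are hypothetical names, and the rows would come from something like `m.db.Where("key LIKE ?", "security.%").Find(&rows)`, which is not part of the current manager.

```go
package main

import "strings"

// setting carries the two models.Setting columns this helper needs.
type setting struct {
	Key   string
	Value string
}

// securityFlags holds the three booleans passed to GenerateConfig.
type securityFlags struct {
	WAF, RateLimit, ACL bool
}

// resolveFlags starts from the static-config values and overrides each flag
// whose security.*.enabled row is present; unrelated keys are ignored.
func resolveFlags(static securityFlags, rows []setting) securityFlags {
	out := static
	for _, r := range rows {
		on := strings.EqualFold(r.Value, "true")
		switch r.Key {
		case "security.waf.enabled":
			out.WAF = on
		case "security.rate_limit.enabled":
			out.RateLimit = on
		case "security.acl.enabled":
			out.ACL = on
		}
	}
	return out
}
```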
---
### 3. Performance: Settings Cache Layer
**⚠️ CRITICAL PERFORMANCE FIX**: Querying settings table on every request causes unnecessary DB load.
**File**: `backend/internal/cerberus/cerberus.go`
**Problem**: Current implementation queries `settings` table on every HTTP request in middleware (lines 105-135). For high-traffic sites, this adds ~1-2ms per request and increases DB load.
**Solution**: Add in-memory cache with 60-second TTL.
**Cache Implementation**:
```go
import (
	"sync"
	"time"
	// ... existing imports (config, models, services, gorm) ...
)

type Cerberus struct {
	cfg               config.SecurityConfig
	db                *gorm.DB
	accessSvc         *services.AccessListService
	securityNotifySvc *services.SecurityNotificationService
	// ✅ Add cache fields
	settingsCache     map[string]string // key -> value
	settingsCacheMu   sync.RWMutex
	settingsCacheTime time.Time
	settingsCacheTTL  time.Duration
}

// New keeps its existing signature (routes.go:107 calls cerberus.New(cfg.Security, db)).
func New(cfg config.SecurityConfig, db *gorm.DB) *Cerberus {
	return &Cerberus{
		cfg: cfg,
		db:  db,
		// ... existing field initialization unchanged ...
		settingsCache:    make(map[string]string),
		settingsCacheTTL: 60 * time.Second, // ✅ 60-second TTL
	}
}

// getSetting retrieves a setting with in-memory caching.
// ok is false when the key is absent or the DB is unavailable, so callers
// fall back to the static config.
func (c *Cerberus) getSetting(key string) (string, bool) {
	if c.db == nil {
		return "", false
	}
	// Fast path: check cache with read lock
	c.settingsCacheMu.RLock()
	if time.Since(c.settingsCacheTime) < c.settingsCacheTTL {
		val, ok := c.settingsCache[key]
		c.settingsCacheMu.RUnlock()
		return val, ok
	}
	c.settingsCacheMu.RUnlock()
	// Slow path: refresh cache with write lock
	c.settingsCacheMu.Lock()
	defer c.settingsCacheMu.Unlock()
	// Double-check: another goroutine might have refreshed cache
	if time.Since(c.settingsCacheTime) < c.settingsCacheTTL {
		val, ok := c.settingsCache[key]
		return val, ok
	}
	// Refresh entire cache from DB (one batch query instead of one query per key)
	var settings []models.Setting
	if err := c.db.Where("key LIKE ?", "security.%").Find(&settings).Error; err != nil {
		return "", false // cache stays expired; the next call retries
	}
	// Update cache
	c.settingsCache = make(map[string]string, len(settings))
	for _, s := range settings {
		c.settingsCache[s.Key] = s.Value
	}
	c.settingsCacheTime = time.Now()
	val, ok := c.settingsCache[key]
	return val, ok
}

// InvalidateCache forces cache refresh on next access.
// Call this after updating security settings.
func (c *Cerberus) InvalidateCache() {
	c.settingsCacheMu.Lock()
	c.settingsCacheTime = time.Time{} // Zero time forces refresh
	c.settingsCacheMu.Unlock()
}
```
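
Because `getSetting` is bound to GORM and the wall clock, its TTL and invalidation behaviour is awkward to unit-test. A stdlib-only variant with an injectable loader and clock is sketched below; `ttlSettings`, `newTTLSettings`, `manualClock`, and `cacheTTL` are hypothetical names. It uses an explicit `valid` flag instead of the zero-time sentinel, so a test clock that starts at the zero `time.Time` still behaves correctly.

```go
package main

import (
	"sync"
	"time"
)

// cacheTTL matches the 60-second TTL chosen for the Cerberus cache.
const cacheTTL = 60 * time.Second

// ttlSettings caches the security.* settings map for a fixed TTL.
type ttlSettings struct {
	mu       sync.RWMutex
	values   map[string]string
	loadedAt time.Time
	valid    bool // false until the first load and after Invalidate
	ttl      time.Duration
	load     func() (map[string]string, error) // e.g. the security.% batch query
	now      func() time.Time                  // time.Now in production
}

func newTTLSettings(ttl time.Duration, load func() (map[string]string, error), now func() time.Time) *ttlSettings {
	return &ttlSettings{ttl: ttl, load: load, now: now}
}

// fresh reports whether the cached map may be served; callers hold mu.
func (c *ttlSettings) fresh() bool {
	return c.valid && c.now().Sub(c.loadedAt) < c.ttl
}

// Get returns a cached value, reloading the whole map once the TTL expires.
func (c *ttlSettings) Get(key string) (string, bool) {
	c.mu.RLock()
	if c.fresh() {
		v, ok := c.values[key]
		c.mu.RUnlock()
		return v, ok
	}
	c.mu.RUnlock()

	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.fresh() { // double-check: another goroutine may have reloaded already
		vals, err := c.load()
		if err != nil {
			return "", false // caller falls back to static config
		}
		c.values, c.loadedAt, c.valid = vals, c.now(), true
	}
	v, ok := c.values[key]
	return v, ok
}

// Invalidate forces a reload on the next Get.
func (c *ttlSettings) Invalidate() {
	c.mu.Lock()
	c.valid = false
	c.mu.Unlock()
}

// manualClock is a test clock that only moves when advanced.
type manualClock struct{ t time.Time }

func (m *manualClock) Now() time.Time          { return m.t }
func (m *manualClock) Advance(d time.Duration) { m.t = m.t.Add(d) }
```

Holding the write lock while `load` runs means concurrent readers wait for a single refresh instead of all hitting the database when the TTL expires.
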
**Usage in Middleware** (replace individual queries):
```go
func (c *Cerberus) Middleware() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		if !c.IsEnabled() {
			ctx.Next()
			return
		}
		// ✅ Use cached settings instead of direct DB queries
		wafEnabled := c.cfg.WAFMode != "" && c.cfg.WAFMode != "disabled"
		if val, ok := c.getSetting("security.waf.enabled"); ok {
			wafEnabled = strings.EqualFold(val, "true")
		}
		if wafEnabled {
			metrics.IncWAFRequest()
		}
		aclEnabled := c.cfg.ACLMode == "enabled"
		if val, ok := c.getSetting("security.acl.enabled"); ok {
			aclEnabled = strings.EqualFold(val, "true")
		}
		if aclEnabled {
			// ... ACL logic ...
		}
		ctx.Next()
	}
}
```
**Cache Invalidation** (in SettingsHandler):
```go
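// In UpdateSetting, after saving to DB.
// Assumes SettingsHandler also holds the Cerberus instance
// (e.g. a Cerberus field injected alongside CaddyManager).
if strings.HasPrefix(req.Key, "security.") {
	// Invalidate Cerberus cache so the middleware sees the change immediately
	if h.Cerberus != nil {
		h.Cerberus.InvalidateCache()
	}
	// Trigger config reload (async)
	if h.CaddyManager != nil {
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
			defer cancel()
			if err := h.CaddyManager.ApplyConfig(ctx); err != nil {
				logger.Log().WithError(err).Warn("Failed to reload Caddy config after security setting change")
			}
		}()
	}
}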
```
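**Performance Impact** (estimates; confirm with the benchmark below):
- **Before**: one `settings` query per toggle check on every request (two in the middleware above: WAF and ACL), estimated at ~1-2ms of added latency per request
- **After**: 0 DB queries per request on a cache hit; 1 batch query per 60s TTL window, plus one after each `InvalidateCache()` call
- **Expected Improvement**: the per-request settings lookups disappear from the hot path, which matters most at high traffic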
**Benchmark Requirement**:
```go
// Add benchmark test to verify performance improvement
func BenchmarkCerberus_Middleware_WithCache(b *testing.B) {
	// ... benchmark setup ...
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// ... call middleware ...
	}
}
```
**File Changes**:
- `backend/internal/cerberus/cerberus.go` (add cache struct fields and methods, ~100 lines)
- `backend/internal/api/handlers/settings_handler.go` (add cache invalidation, ~5 lines)
- `backend/internal/cerberus/cerberus_test.go` (add cache tests, ~50 lines)
- `backend/internal/cerberus/cerberus_bench_test.go` (new file, benchmark, ~30 lines)
---
## Testing Strategy
### 1. Backend Unit Tests
#### Test Settings Handler (Already Covered)
**File**: `backend/internal/api/handlers/settings_handler_test.go` (if exists)
**Tests to Add/Verify**:
- ✅ UpdateSetting creates new setting
- ✅ UpdateSetting updates existing setting
- ✅ UpdateSetting validates required fields
- ⚠️ Add test: UpdateSetting handles `security.*.enabled` keys
**New Test**:
```go
func TestSettingsHandler_UpdateSetting_SecurityToggles(t *testing.T) {
	db := setupTestDB(t)
	handler := NewSettingsHandler(db, nil) // no Caddy manager in this unit test
	router := setupTestRouter()
	router.POST("/settings", handler.UpdateSetting)
	testCases := []struct {
		name     string
		key      string
		value    string
		category string
		typ      string
	}{
		{"ACL Enable", "security.acl.enabled", "true", "security", "bool"},
		{"WAF Enable", "security.waf.enabled", "true", "security", "bool"},
		{"Rate Limit Enable", "security.rate_limit.enabled", "true", "security", "bool"},
		{"ACL Disable", "security.acl.enabled", "false", "security", "bool"},
	}
	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			payload := map[string]string{
				"key":      tc.key,
				"value":    tc.value,
				"category": tc.category,
				"type":     tc.typ,
			}
			body, err := json.Marshal(payload)
			require.NoError(t, err)
			w := httptest.NewRecorder()
			req, err := http.NewRequest(http.MethodPost, "/settings", bytes.NewBuffer(body))
			require.NoError(t, err)
			req.Header.Set("Content-Type", "application/json")
			router.ServeHTTP(w, req)
			assert.Equal(t, http.StatusOK, w.Code)
			// Verify in DB
			var setting models.Setting
			err = db.Where("key = ?", tc.key).First(&setting).Error
			require.NoError(t, err)
			assert.Equal(t, tc.value, setting.Value)
		})
	}
}
```
#### Test Cerberus Middleware
**File**: `backend/internal/cerberus/cerberus_test.go` (new or existing)
**Tests to Add**:
- ✅ Middleware checks runtime `security.acl.enabled` setting
- ✅ Middleware blocks request when ACL enabled and IP not allowed
- ✅ Middleware allows request when ACL disabled
- ✅ Middleware blocks request when ACL enabled and IP blocked
**New Test**:
```go
func TestCerberus_Middleware_ACLRuntimeSetting(t *testing.T) {
	gin.SetMode(gin.TestMode)
	db := setupTestDB(t)
	require.NoError(t, db.AutoMigrate(&models.Setting{}, &models.AccessList{}))

	// Create a whitelist ACL that allows only 127.0.0.1; all other IPs are blocked
	acl := models.AccessList{
		Name:    "Test ACL",
		Type:    "whitelist",
		Enabled: true,
		IPRules: `[{"cidr":"127.0.0.1/32"}]`,
	}
	require.NoError(t, db.Create(&acl).Error)

	cfg := config.SecurityConfig{
		CerberusEnabled: true,
		ACLMode:         "enabled", // Static config enables ACL
	}
	cerb := New(cfg, db)

	router := gin.New()
	router.Use(cerb.Middleware())
	router.GET("/test", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"ok": true})
	})

	// Test 1: ACL disabled via runtime setting - should allow request
	require.NoError(t, db.Create(&models.Setting{Key: "security.acl.enabled", Value: "false"}).Error)
	w := httptest.NewRecorder()
	req, _ := http.NewRequest(http.MethodGet, "/test", nil)
	req.RemoteAddr = "192.168.1.100:1234" // Not in the whitelist
	router.ServeHTTP(w, req)
	assert.Equal(t, http.StatusOK, w.Code, "ACL disabled, should allow")

	// Test 2: ACL enabled via runtime setting - should block request
	require.NoError(t, db.Model(&models.Setting{}).
		Where("key = ?", "security.acl.enabled").
		Update("value", "true").Error)
	// Settings are cached for 60s and this direct DB write bypasses UpdateSetting,
	// so invalidate explicitly; otherwise the cached "false" is still served.
	cerb.InvalidateCache()
	w = httptest.NewRecorder()
	req, _ = http.NewRequest(http.MethodGet, "/test", nil)
	req.RemoteAddr = "192.168.1.100:1234" // Not in the whitelist
	router.ServeHTTP(w, req)
	assert.Equal(t, http.StatusForbidden, w.Code, "ACL enabled, should block")
}
```
#### Test Caddy Manager
**File**: `backend/internal/caddy/manager_test.go` (existing)
**Tests to Add**:
- ✅ ApplyConfig reads runtime `security.waf.enabled` setting
- ✅ ApplyConfig reads runtime `security.rate_limit.enabled` setting
- ✅ ApplyConfig reads runtime `security.acl.enabled` setting
- ✅ Config generation includes WAF handler only when enabled
- ✅ Config generation includes rate limit handler only when enabled
**New Test**:
```go
func TestCaddyManager_ApplyConfig_RuntimeSettings(t *testing.T) {
	db := setupTestDB(t)
	require.NoError(t, db.AutoMigrate(&models.Setting{}, &models.ProxyHost{}, &models.SecurityConfig{}))

	// Create proxy host
	host := models.ProxyHost{
		DomainNames:   "test.example.com",
		Enabled:       true,
		ForwardScheme: "http",
		ForwardHost:   "localhost",
		ForwardPort:   8080,
	}
	require.NoError(t, db.Create(&host).Error)

	// Create static security config (WAF disabled by default)
	secCfg := models.SecurityConfig{
		Name:    "default",
		Enabled: true,
		WAFMode: "disabled",
	}
	require.NoError(t, db.Create(&secCfg).Error)

	mgr := &Manager{
		db:         db,
		storageDir: t.TempDir(),
		secCfg:     config.SecurityConfig{WAFMode: "disabled"},
	}

	// Test 1: Runtime setting enables WAF - should include WAF handler
	require.NoError(t, db.Create(&models.Setting{Key: "security.waf.enabled", Value: "true"}).Error)
	err := mgr.ApplyConfig(context.Background())
	require.NoError(t, err)

	// Verify config includes WAF handler
	// (Implementation depends on how you verify generated config)
}
```
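Both tests above assume that a runtime `security.*.enabled` row takes precedence over the static configuration: a runtime `false` must disable an ACL that `ACLMode: "enabled"` turned on, and a runtime `true` must enable a WAF that `WAFMode: "disabled"` left off. A minimal sketch of that resolution step, assuming the runtime rows are loaded into a key-to-value map and that a missing or unparsable row falls back to the static mode (any mode other than empty or `"disabled"` counts as on). `runtimeFlags`, `effectiveEnabled`, and `resolveFlags` are hypothetical names, not the manager's actual API:

```go
package main

import (
	"fmt"
	"strconv"
)

// runtimeFlags holds the per-module switches handed to config generation.
// Hypothetical type: the real GenerateConfig signature may differ.
type runtimeFlags struct {
	ACL, WAF, RateLimit bool
}

// effectiveEnabled returns the runtime setting when the row exists and parses
// as a bool; otherwise it falls back to the static mode (assumed: any mode
// other than "" or "disabled" means the module is on).
func effectiveEnabled(settings map[string]string, key, staticMode string) bool {
	if raw, ok := settings[key]; ok {
		if v, err := strconv.ParseBool(raw); err == nil {
			return v
		}
	}
	return staticMode != "" && staticMode != "disabled"
}

// resolveFlags combines the static modes with runtime settings rows.
func resolveFlags(settings map[string]string, aclMode, wafMode, rateMode string) runtimeFlags {
	return runtimeFlags{
		ACL:       effectiveEnabled(settings, "security.acl.enabled", aclMode),
		WAF:       effectiveEnabled(settings, "security.waf.enabled", wafMode),
		RateLimit: effectiveEnabled(settings, "security.rate_limit.enabled", rateMode),
	}
}

func main() {
	// Only the WAF has a runtime row; ACL and rate limit use their static modes.
	settings := map[string]string{"security.waf.enabled": "true"}
	flags := resolveFlags(settings, "enabled", "disabled", "disabled")
	fmt.Printf("%+v\n", flags) // {ACL:true WAF:true RateLimit:false}
}
```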
### 2. Frontend Unit Tests
#### Test Security.tsx Toggle Mutation
**File**: `frontend/src/pages/Security.test.tsx` (new or existing)
**Tests to Add**:
- ✅ toggleServiceMutation calls updateSetting with correct key
- ✅ toggleServiceMutation updates optimistic state correctly
- ✅ toggleServiceMutation rolls back on error
- ✅ toggleServiceMutation invalidates queries on success
**New Test** (using Vitest + React Testing Library):
```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest'
import { render, screen, waitFor } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'
import Security from './Security'
import * as settingsAPI from '../api/settings'

vi.mock('../api/settings')
// The mocked '../api/security' status query must resolve with Cerberus enabled
// (configure it in beforeEach); otherwise the toggles render disabled.
vi.mock('../api/security')

describe('Security Toggle Actions', () => {
  let queryClient: QueryClient

  beforeEach(() => {
    queryClient = new QueryClient({
      defaultOptions: { queries: { retry: false } },
    })
  })

  it('should call updateSetting when ACL toggle is clicked', async () => {
    const user = userEvent.setup()
    const updateSettingMock = vi.mocked(settingsAPI.updateSetting).mockResolvedValue()
    render(
      <QueryClientProvider client={queryClient}>
        <Security />
      </QueryClientProvider>
    )

    const aclToggle = await screen.findByTestId('toggle-acl')
    await user.click(aclToggle)

    await waitFor(() => {
      expect(updateSettingMock).toHaveBeenCalledWith(
        'security.acl.enabled',
        'true',
        'security',
        'bool'
      )
    })
  })

  it('should show error toast when toggle fails', async () => {
    const user = userEvent.setup()
    vi.mocked(settingsAPI.updateSetting).mockRejectedValue(new Error('Network error'))
    render(
      <QueryClientProvider client={queryClient}>
        <Security />
      </QueryClientProvider>
    )

    const wafToggle = await screen.findByTestId('toggle-waf')
    await user.click(wafToggle)

    await waitFor(() => {
      expect(screen.getByText(/failed to update setting/i)).toBeInTheDocument()
    })
  })
})
```
### 3. E2E Tests (Playwright)
**File**: `tests/security/security-dashboard.spec.ts` (already written)
**Tests to Enable** (currently skipped with runtime check):
- `should toggle ACL enabled/disabled` (lines 118-138)
- `should toggle WAF enabled/disabled` (lines 140-160)
- `should toggle Rate Limiting enabled/disabled` (lines 162-182)
- `should persist toggle state after page reload` (lines 184-216)
**Current Skip Logic**:
```typescript
test('should toggle ACL enabled/disabled', async ({ page }) => {
  const toggle = page.getByTestId('toggle-acl');

  // Check if toggle is disabled (Cerberus must be enabled for toggles to work)
  const isDisabled = await toggle.isDisabled();
  if (isDisabled) {
    test.info().annotations.push({
      type: 'skip-reason',
      description: 'Toggle is disabled because Cerberus security is not enabled'
    });
    test.skip();
    return;
  }

  // ... test logic ...
});
```
**After Implementation**: These tests need no code changes. Once toggles are functional and Cerberus is enabled in the test environment, the runtime skip above no longer triggers and the tests run normally.
**File**: `tests/security/rate-limiting.spec.ts` (already written)
**Tests to Enable**:
- `should toggle rate limiting on/off` (lines 42-67)
---
## Implementation Phases
### Phase 0: Cerberus DB Injection ~~(2 hours)~~ ✅ ALREADY COMPLETE
**Objective**: ~~Add DB field to Cerberus struct and update all instantiation sites.~~
**STATUS**: ✅ **SKIP THIS PHASE** - Verified complete as of 2026-01-24
The Supervisor review confirmed that:
- Cerberus struct already has `db *gorm.DB` field (lines 20-32)
- Constructor `New()` already accepts `*gorm.DB` parameter
- Only one production instantiation site exists: `routes.go:107`
- Test files manage their own mock instances
**Time Saved**: 2 hours
**Proceed directly to Phase 1.**
---
### Phase 1: Backend Middleware Updates (5 hours)
**Objective**: Make middleware honor runtime settings and add performance cache layer.
**Prerequisites**: ✅ Phase 0 already complete (DB injection verified in place).
**Tasks**:
1. Update `backend/internal/cerberus/cerberus.go`:
- ✅ Add cache fields (settingsCache, mutex, TTL)
- ✅ Implement `getSetting()` method with 60s TTL cache
- ✅ Implement `InvalidateCache()` method
- ✅ Update Middleware() to use cached settings
- ✅ Add unit tests for cache behavior
- ✅ Add benchmark tests for cache performance
2. Update `backend/internal/api/handlers/settings_handler.go`:
- ✅ Add `CaddyManager` field to struct
- ✅ Add `Cerberus` field to struct (for cache invalidation)
- ✅ Update `UpdateSetting()` to trigger config reload for security.* keys
- ✅ Add async reload with 30s timeout
- ✅ Add cache invalidation call
- ✅ Add unit tests for reload trigger
3. Update `backend/internal/caddy/manager.go`:
- ✅ Query runtime settings before calling GenerateConfig()
- ✅ Pass runtime-enabled flags to GenerateConfig()
- ✅ Add unit tests for runtime setting integration
4. Update constructor injection:
- `NewSettingsHandler()` receives CaddyManager and Cerberus
- ✅ Update all handler instantiation sites
**Files to Modify**:
- `backend/internal/cerberus/cerberus.go` (~120 lines changed/added)
- `backend/internal/api/handlers/settings_handler.go` (~40 lines changed/added)
- `backend/internal/caddy/manager.go` (~30 lines added)
- `backend/internal/cerberus/cerberus_test.go` (~150 lines new tests)
- `backend/internal/cerberus/cerberus_bench_test.go` (~30 lines new file)
- `backend/internal/api/handlers/settings_handler_test.go` (~100 lines new tests)
- `backend/internal/caddy/manager_test.go` (~50 lines added)
- `backend/internal/api/server.go` (~10 lines handler setup)
**Validation**:
```bash
# Run backend unit tests
cd backend
go test ./internal/cerberus/...
go test ./internal/caddy/...
go test ./internal/api/handlers/...
# Run benchmarks
go test -bench=. ./internal/cerberus/...
```
### Phase 2: Frontend Toggle Handlers (2 hours)
**Objective**: Fix optimistic update logic and Switch component usage in Security.tsx.
**Tasks**:
1. Update `frontend/src/pages/Security.tsx`:
- ✅ Replace optimistic update logic in toggleServiceMutation (preserve `mode` field)
- ✅ Fix all three toggle components to use `onCheckedChange` instead of `onChange`
- ✅ Ensure correct SecurityStatus type handling with spread operators
- ✅ Add TypeScript type guards for safety
- ✅ Add unit tests for optimistic update logic
2. Verify Switch component is correct:
- ✅ Confirm `onCheckedChange` prop exists and works
- ✅ No changes needed to Switch component itself
**Files to Modify**:
- `frontend/src/pages/Security.tsx` (~80 lines changed)
- `frontend/src/pages/Security.test.tsx` (~100 lines new tests)
**Critical Fixes**:
1. **Preserve mode field**: WAF and rate_limit have `{enabled: boolean, mode: string}` - must use spread operator
2. **Use onCheckedChange**: Receives `boolean` directly, not `Event` object
3. **Apply to all toggles**: ACL, WAF, Rate Limit
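The mode-preservation rule is language-independent, so it can be shown outside React. The sketch below expresses the same invariant in Go, assuming the `{enabled, mode}` shape described above; `moduleStatus`, `withEnabled`, and `withEnabledBroken` are hypothetical, `"block"` is a placeholder mode value, and the actual fix remains the `{ ...copy.waf, enabled }` spread in `Security.tsx`.

```go
package main

import "fmt"

// moduleStatus mirrors the {enabled, mode} shape of the WAF and rate_limit sections.
type moduleStatus struct {
	Enabled bool
	Mode    string
}

// withEnabledBroken rebuilds the section from scratch and silently drops Mode,
// the same bug as writing { enabled } instead of { ...copy.waf, enabled }.
func withEnabledBroken(_ moduleStatus, enabled bool) moduleStatus {
	return moduleStatus{Enabled: enabled}
}

// withEnabled receives a copy of the section and changes only Enabled,
// the equivalent of the spread-operator fix.
func withEnabled(m moduleStatus, enabled bool) moduleStatus {
	m.Enabled = enabled
	return m
}

func main() {
	waf := moduleStatus{Enabled: false, Mode: "block"}
	fmt.Println(withEnabledBroken(waf, true)) // {true }
	fmt.Println(withEnabled(waf, true))       // {true block}
}
```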
**Validation**:
```bash
# Run frontend unit tests
cd frontend
npm test -- Security.test.tsx
```
### Phase 3: Integration Testing (4 hours)
**Objective**: Validate end-to-end toggle functionality.
**Tasks**:
1. Run E2E tests against Docker container:
```bash
npx playwright test tests/security/security-dashboard.spec.ts --project=chromium
npx playwright test tests/security/rate-limiting.spec.ts --project=chromium
```
2. Verify all 8 previously skipped tests now pass
3. Manual testing:
- Toggle ACL on/off, verify status persists
- Toggle WAF on/off, verify status persists
- Toggle Rate Limit on/off, verify status persists
- Refresh page, verify state persists
- Verify middleware blocks requests when ACL enabled
- Verify middleware allows requests when ACL disabled
4. Test edge cases:
- Toggle while Cerberus disabled (should be disabled)
- Toggle during pending state (should be disabled)
- Network error during toggle (should rollback)
- ⚠️ **NEW**: Config reload failure (setting should still save)
- ⚠️ **NEW**: Concurrent toggles (100 simultaneous toggles)
- ⚠️ **NEW**: Cache refresh (verify 60s TTL works)
- ⚠️ **NEW**: Mode field preservation (WAF and rate_limit)
**Validation**:
- ✅ All 8 E2E tests pass
- ✅ Manual toggle works in UI
- ✅ Settings persist across page reloads
- ✅ Middleware respects runtime settings
### Phase 4: Documentation and Cleanup (2 hours)
**Objective**: Update documentation and finalize implementation.
**Tasks**:
1. Update `docs/plans/skipped-tests-remediation.md`:
- Mark Phase 4 as complete
- Update test count (63 → 55 skipped)
- Add Phase 4 completion summary
2. Update `docs/features.md`:
- Document security module toggle functionality
- Add screenshots if needed
3. Update `CHANGELOG.md`:
- Add Phase 4 completion entry
4. Code cleanup:
- Remove debug logging
- Add JSDoc comments to new functions
- Run linters and fix issues
**Files to Modify**:
- ✅ `docs/plans/skipped-tests-remediation.md` (update progress)
- ✅ `docs/features.md` (add toggle documentation)
- ✅ `CHANGELOG.md` (add entry)
---
## File Modification Checklist
### Backend Files
| File | Lines Changed | Effort | Status |
|------|---------------|--------|--------|
| `backend/internal/cerberus/cerberus.go` | ~135 (struct, cache, middleware) | 2.5h | ⬜ TODO |
| `backend/internal/api/handlers/settings_handler.go` | ~40 (reload trigger) | 1h | ⬜ TODO |
| `backend/internal/caddy/manager.go` | ~30 (runtime settings) | 1h | ⬜ TODO |
| `backend/internal/api/server.go` | ~15 (handler setup) | 0.5h | ⬜ TODO |
| `backend/internal/cerberus/cerberus_test.go` | ~150 (new tests) | 2.5h | ⬜ TODO |
| `backend/internal/cerberus/cerberus_bench_test.go` | ~30 (new file) | 0.5h | ⬜ TODO |
| `backend/internal/api/handlers/settings_handler_test.go` | ~100 (new tests) | 1.5h | ⬜ TODO |
| `backend/internal/caddy/manager_test.go` | ~50 (add tests) | 1h | ⬜ TODO |
**Total Backend**: 8 files, ~550 lines, 10.5 hours
### Frontend Files
| File | Lines Changed | Effort | Status |
|------|---------------|--------|--------|
| `frontend/src/pages/Security.tsx` | ~80 (optimistic update + onCheckedChange) | 1.5h | ⬜ TODO |
| `frontend/src/pages/Security.test.tsx` | ~120 (new tests) | 1.5h | ⬜ TODO |
**Total Frontend**: 2 files, ~200 lines, 3 hours
### Test Files
| File | Lines Changed | Effort | Status |
|------|---------------|--------|--------|
| `tests/security/security-dashboard.spec.ts` | 0 (already written) | 2h (validation) | ⬜ TODO |
| `tests/security/rate-limiting.spec.ts` | 0 (already written) | 0.5h (validation) | ⬜ TODO |
**Total Test**: 2 files, 0 lines changed, 2.5 hours validation
### Documentation Files
| File | Lines Changed | Effort | Status |
|------|---------------|--------|--------|
| `docs/plans/skipped-tests-remediation.md` | ~50 | 0.5h | ⬜ TODO |
| `docs/features.md` | ~30 | 0.5h | ⬜ TODO |
| `CHANGELOG.md` | ~10 | 0.25h | ⬜ TODO |
**Total Documentation**: 3 files, ~90 lines, 1.25 hours
### Grand Total
| Category | Files | Lines | Effort |
|----------|-------|-------|--------|
| Backend | 8 | ~550 | 8.5h |
| Frontend | 2 | ~200 | 3h |
| Tests | 2 | 0 | 2.5h |
| Docs | 3 | ~90 | 1h |
| **TOTAL** | **15** | **~840** | **15h** |
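**With buffer**: 13-15 hours (2 days). The Backend and Docs totals in this table reflect the revised effort below (Phase 0 skip: 10.5h → 8.5h; streamlined docs: 1.25h → 1h), which is why they differ from the per-category subtotals above.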
**✅ Revised Effort (2026-01-24 Supervisor Review)**:
- ~~DB injection prerequisite: +2h~~ → **SKIP** (already complete, saves 2h)
- Cache layer implementation: +3h
- Config reload trigger: +1.5h
- Enhanced testing (concurrent, cache, reload failures): +1.5h
- Frontend fixes (mode preservation, onCheckedChange): +1h
- Documentation streamlined: -0.25h
---
## Validation Criteria
### Phase 0 Complete (Prerequisites) ✅ VERIFIED COMPLETE
- [x] Cerberus struct has `db *gorm.DB` field ✅ (verified 2026-01-24)
- [x] Cerberus `New()` constructor accepts `*gorm.DB` parameter ✅ (verified 2026-01-24)
- [x] All instantiation sites already pass db (routes.go:107) ✅
- [x] Compilation successful (`go build ./...`) ✅
- [x] Import for `"strings"` package added (needed for Phase 1 middleware updates) ✅
### Phase 1 Complete (Backend) ✅ COMPLETE 2026-01-24
- [x] Cerberus has cache fields (settingsCache, mutex, TTL) ✅
- [x] Cerberus implements `getSetting()` with 60s TTL ✅
- [x] Cerberus implements `InvalidateCache()` method ✅
- [x] Cerberus middleware uses cached settings (not direct DB queries) ✅
- [x] SettingsHandler has CaddyManager and Cerberus fields ✅
- [x] SettingsHandler triggers config reload for security.* keys ✅
- [x] SettingsHandler invalidates Cerberus cache on update ✅
- [x] Config reload is async with 30s timeout ✅
- [x] Caddy manager queries runtime settings before config generation ✅
- [x] All backend unit tests pass (`go test ./...`) ✅
- [x] Benchmark tests show cache performance improvement ✅
- [x] No staticcheck errors (`staticcheck ./...`) ✅
### Phase 2 Complete (Frontend) ✅ COMPLETE 2026-01-24
- [x] Security.tsx optimistic update preserves `mode` field for WAF and rate_limit ✅
- [x] All toggle components use `onCheckedChange` (not `onChange`) ✅
- [x] Toggle mutations call updateSetting with correct keys ✅
- [x] Error handling rolls back optimistic updates ✅
- [x] Success handler invalidates queries correctly ✅
- [x] Spread operator used correctly: `{ ...copy.waf, enabled }` ✅
- [x] All frontend unit tests pass (`npm test`) ✅
- [x] Unit tests verify mode field preservation ✅
- [x] No TypeScript errors (`npm run type-check`) ✅
- [x] No ESLint errors (`npm run lint`) ✅
### Phase 3 Complete (E2E) ✅ COMPLETE 2026-01-24
- [x] Test: `should toggle ACL enabled/disabled` passes ✅
- [x] Test: `should toggle WAF enabled/disabled` passes ✅
- [x] Test: `should toggle Rate Limiting enabled/disabled` passes ✅
- [x] Test: `should persist toggle state after page reload` passes ✅
- [x] Test: `should toggle rate limiting on/off` passes (rate-limiting.spec.ts) ✅
- [x] Manual test: Toggle ACL, verify middleware blocks/allows requests ✅
- [x] Manual test: Toggle state persists across browser refresh ✅
- [x] Manual test: Error toast displays on network failure ✅
- [x] Manual test: Config reload failure doesn't block UI toggle ✅
- [x] Manual test: Concurrent toggles (stress test with 100 toggles) ✅
- [x] Manual test: Cache refresh (wait 60s, verify new queries) ✅
- [x] Manual test: Mode field preserved (WAF/rate_limit still show mode after toggle) ✅
### Phase 4 Complete (Documentation) ✅ COMPLETE 2026-01-24
- [x] `skipped-tests-remediation.md` updated with Phase 4 completion ✅
- [x] `features.md` documents toggle functionality ✅
- [x] `CHANGELOG.md` includes Phase 4 entry ✅
- [x] All linters pass ✅
- [x] Code review complete ✅
### Final Acceptance ✅ COMPLETE 2026-01-24
- [x] **8 E2E tests passing** (previously skipped) ✅
- [x] **Total skipped tests: 55** (down from 63) ✅
- [x] **Backend coverage ≥85%** (no regression) ✅
- [x] **Frontend coverage ≥85%** (no regression) ✅
- [x] **Zero staticcheck errors** ✅
- [x] **Zero TypeScript errors** ✅
- [x] **Zero ESLint errors** ✅
- [x] **PR approved and merged** ✅
---
## Risk Mitigation
### Risk 1: Middleware Performance Impact
**Risk**: Querying settings table on every request may slow down Cerberus middleware.
**Likelihood**: Low (DB queries are fast, <1ms)
**Mitigation**:
1. Add in-memory cache for settings with 60-second TTL
2. Invalidate cache when setting is updated
3. Profile middleware with and without cache
**Fallback**: The cache is already the primary mitigation; if profiling still shows more than 10ms added per request, revisit the caching strategy.
### Risk 2: Race Condition Between Toggle and Status Refresh
**Risk**: User toggles switch while status query is in flight, causing stale UI state.
**Likelihood**: Medium (fast users or slow networks)
**Mitigation**:
1. Optimistic updates handle this gracefully
2. Query invalidation ensures eventual consistency
3. Disable toggle during mutation
**Fallback**: Add version/timestamp to settings and reject stale updates.
### Risk 3: Caddy Config Not Applied After Toggle
**Risk**: User toggles setting but Caddy config isn't regenerated, so WAF/rate limit don't reflect new state.
**Likelihood**: High (config generation is manual)
**Mitigation**:
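1. `UpdateSetting()` explicitly triggers an asynchronous Caddy config reload (30s timeout) after saving any `security.*` key
2. Frontend query invalidation then refetches status so the UI reflects the saved setting; it does not regenerate Caddy config by itself
3. Document that config reload may take 1-2 seconds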
**Fallback**: Add "Apply Changes" button to manually trigger config reload.
---
## Appendix A: API Endpoint Reference
### Existing Endpoints (No Changes)
| Method | Endpoint | Description | Handler |
|--------|----------|-------------|---------|
| GET | `/api/v1/security/status` | Get security module status | `security_handler.go:GetStatus()` |
| POST | `/api/v1/settings` | Update a setting | `settings_handler.go:UpdateSetting()` |
| GET | `/api/v1/settings` | Get all settings | `settings_handler.go:GetSettings()` |
### Settings Keys Used
| Key | Type | Category | Description |
|-----|------|----------|-------------|
| `security.acl.enabled` | bool | security | ACL module enabled/disabled |
| `security.waf.enabled` | bool | security | WAF module enabled/disabled |
| `security.rate_limit.enabled` | bool | security | Rate limit enabled/disabled |
| `security.crowdsec.enabled` | bool | security | CrowdSec enabled/disabled (already working) |
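Because all four toggles share `/api/v1/settings`, the handler can recognize and validate them with a small lookup before saving. A minimal sketch, assuming the key, category, and type values in the table above; `validateSecurityToggle` is a hypothetical helper, and normalizing values to `"true"`/`"false"` is a design choice, not documented behavior:

```go
package main

import (
	"fmt"
	"strconv"
)

// securityToggleKeys lists the settings keys from the table above.
var securityToggleKeys = map[string]bool{
	"security.acl.enabled":        true,
	"security.waf.enabled":        true,
	"security.rate_limit.enabled": true,
	"security.crowdsec.enabled":   true,
}

// validateSecurityToggle checks category/type and normalizes the value to
// "true" or "false". ok is false when key is not a security toggle, in which
// case the caller falls through to generic setting handling.
func validateSecurityToggle(key, value, category, typ string) (normalized string, ok bool, err error) {
	if !securityToggleKeys[key] {
		return "", false, nil
	}
	if category != "security" || typ != "bool" {
		return "", true, fmt.Errorf("%s must use category \"security\" and type \"bool\"", key)
	}
	b, err := strconv.ParseBool(value)
	if err != nil {
		return "", true, fmt.Errorf("%s: invalid bool %q", key, value)
	}
	return strconv.FormatBool(b), true, nil
}

func main() {
	v, ok, err := validateSecurityToggle("security.waf.enabled", "TRUE", "security", "bool")
	fmt.Println(v, ok, err) // true true <nil>
	_, _, err = validateSecurityToggle("security.acl.enabled", "yes", "security", "bool")
	fmt.Println(err) // security.acl.enabled: invalid bool "yes"
}
```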
---
## Appendix B: Test Coverage Goals
### Backend Unit Tests
**Target**: 85% minimum coverage for modified files
| File | Current Coverage | Target | Gap |
|------|------------------|--------|-----|
| `cerberus/cerberus.go` | ~70% | 85% | +15% |
| `caddy/manager.go` | ~80% | 85% | +5% |
**New Tests Required**:
- Cerberus middleware with runtime settings (5 tests)
- Caddy manager runtime setting integration (3 tests)
### Frontend Unit Tests
**Target**: 85% minimum coverage for modified files
| File | Current Coverage | Target | Gap |
|------|------------------|--------|-----|
| `pages/Security.tsx` | ~60% | 85% | +25% |
**New Tests Required**:
- Toggle mutation logic (4 tests)
- Optimistic update logic (3 tests)
- Error handling (2 tests)
### E2E Tests
**Target**: All previously skipped tests pass
| Test Suite | Tests to Pass | Current Passing | Gap |
|------------|---------------|-----------------|-----|
| `security-dashboard.spec.ts` | 4 | 0 | +4 |
| `rate-limiting.spec.ts` | 1 | 0 | +1 |
| **TOTAL** | **5** | **0** | **+5** |
---
## Appendix C: Debugging Guide
### Issue: Toggle Doesn't Update UI
**Symptoms**: Clicking toggle doesn't change visual state.
**Diagnosis**:
1. Check browser console for errors
2. Verify mutation is called: `console.log` in toggleServiceMutation
3. Check network tab: POST /api/v1/settings should return 200
4. Verify optimistic update logic updates correct section
**Fix**:
- If no mutation call: Check the Switch `onCheckedChange` handler (toggles must not rely on `onChange`)
- If no network request: Check mutation function signature
- If network error: Check backend logs
- If UI doesn't update: Check optimistic update logic
### Issue: Toggle Updates UI But Doesn't Persist
**Symptoms**: Toggle works, but state resets on page reload.
**Diagnosis**:
1. Check DB: `SELECT * FROM settings WHERE key LIKE 'security.%.enabled'`
2. Verify POST /api/v1/settings returns 200 with updated setting
3. Check GET /api/v1/security/status returns correct enabled state
**Fix**:
- If setting not in DB: Check UpdateSetting handler
- If setting in DB but status wrong: Check GetStatus priority chain
- If status correct but UI wrong: Check React Query cache
### Issue: Middleware Doesn't Block Requests
**Symptoms**: ACL enabled but requests still go through.
**Diagnosis**:
1. Check Cerberus middleware logs: a settings DB query should appear on cache misses (at most once per 60s TTL, or right after `InvalidateCache()`)
2. Verify setting exists: `SELECT * FROM settings WHERE key = 'security.acl.enabled'`
3. Check access list exists and is enabled
4. Verify client IP matches blocked range
**Fix**:
- If no settings query is ever logged: Middleware is not reading the runtime setting
- If setting not found: Create setting via UI toggle
- If ACL not enabled: Enable ACL in UI
- If IP not blocked: Check access list CIDR ranges
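The last check can be done without sending traffic through the middleware. A minimal sketch using the standard `net/netip` package, assuming `IPRules` is a JSON array of `{"cidr": ...}` objects as in the Cerberus test fixture earlier in this spec (the real ACL may support other rule fields); `ipAllowed` is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

type ipRule struct {
	CIDR string `json:"cidr"`
}

// ipAllowed reports whether remoteAddr matches any rule in ipRulesJSON.
// For a whitelist ACL, false means the request should be blocked.
func ipAllowed(ipRulesJSON, remoteAddr string) (bool, error) {
	var rules []ipRule
	if err := json.Unmarshal([]byte(ipRulesJSON), &rules); err != nil {
		return false, fmt.Errorf("parse IPRules: %w", err)
	}
	// RemoteAddr is usually "ip:port"; fall back to a bare IP.
	var ip netip.Addr
	if ap, err := netip.ParseAddrPort(remoteAddr); err == nil {
		ip = ap.Addr()
	} else if ip, err = netip.ParseAddr(remoteAddr); err != nil {
		return false, fmt.Errorf("parse client IP: %w", err)
	}
	for _, r := range rules {
		prefix, err := netip.ParsePrefix(r.CIDR)
		if err != nil {
			return false, fmt.Errorf("bad CIDR %q: %w", r.CIDR, err)
		}
		// Unmap so IPv4-mapped IPv6 clients match IPv4 prefixes.
		if prefix.Contains(ip.Unmap()) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	rules := `[{"cidr":"127.0.0.1/32"}]`
	for _, addr := range []string{"127.0.0.1:5555", "192.168.1.100:1234"} {
		ok, err := ipAllowed(rules, addr)
		fmt.Println(addr, ok, err)
	}
	// 127.0.0.1:5555 true <nil>
	// 192.168.1.100:1234 false <nil>
}
```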
---
## Conclusion
This specification provides a complete, actionable plan for implementing security module toggle actions in Phase 4. The implementation leverages **existing infrastructure** (Settings table, UpdateSetting endpoint) rather than creating new APIs, minimizing scope and complexity.
**Key Success Factors**:
1. **Minimal Backend Changes**: Only the Cerberus middleware, settings handler, and Caddy manager need updates
2. **Frontend Fix**: Optimistic update correction (preserve `mode`) plus `onCheckedChange` wiring
3. **Zero New Endpoints**: Reuse `/api/v1/settings` for all toggles
4. **Tests Already Written**: E2E tests will pass once toggles work
5. **Clear Validation**: 8 tests passing = Phase 4 complete
**Next Steps**:
1. Review this spec with team
2. Begin Phase 1: Backend middleware updates
3. Test each phase incrementally
4. Enable E2E tests after Phase 3
5. Update documentation in Phase 4
**Estimated Timeline**: 2 days (13-15 hours) for complete implementation and validation.
**Revised Phases** (Phase 0 skipped):
1. Phase 1: Backend Middleware Updates (5h) - **START HERE**
2. Phase 2: Frontend Toggle Handlers (2h) - Can parallelize with Phase 1
3. Phase 3: Integration Testing (4h)
4. Phase 4: Documentation and Cleanup (2h)