chore: git cache cleanup
docs/testing/DEBUGGING_IMPLEMENTATION.md
# Playwright E2E Test Debugging Implementation Summary

**Date**: January 27, 2026
**Status**: ✅ Complete

This document summarizes the comprehensive debugging enhancements implemented for the Playwright E2E test suite.

## Overview

A complete debugging ecosystem has been implemented to provide maximum observability into test execution, including structured logging, network monitoring, trace capture, and CI integration for parsing and analysis.

## Deliverables Completed

### 1. Debug Logger Utility ✅

**File**: `tests/utils/debug-logger.ts` (291 lines)

**Features**:
- Class-based logger with methods: `step()`, `network()`, `pageState()`, `locator()`, `assertion()`, `error()`
- Automatic duration tracking for operations
- Color-coded console output for local runs (ANSI colors)
- Structured JSON output for CI parsing
- Sensitive data sanitization (auth headers, tokens)
- Network log export (CSV/JSON)
- Slow operation detection and reporting
- Integration with Playwright's `test.step()` system

**Key Methods**:
```typescript
step(name: string, duration?: number)             // Log test steps
network(entry: NetworkLogEntry)                   // Log HTTP activity
locator(selector, action, found, elapsedMs)       // Log element interactions
assertion(condition, passed, actual?, expected?)  // Log assertions
error(context, error, recoveryAttempts?)          // Log errors with context
getNetworkCSV()                                   // Export network logs as CSV
getSlowOperations(threshold?)                     // Get operations above threshold
printSummary()                                    // Print colored summary to console
```
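The duration tracking behind `step()` and `measureStep()` boils down to timestamping around an awaited callback. The sketch below is illustrative only; the function name and record shape are assumptions, not the actual code in `tests/utils/debug-logger.ts`:

```typescript
// Hypothetical sketch of duration tracking; the real implementation in
// tests/utils/debug-logger.ts also feeds the result into the structured log.
async function measure<T>(
  name: string,
  fn: () => Promise<T>,
): Promise<{ name: string; result: T; durationMs: number }> {
  const start = Date.now();
  const result = await fn();
  // Duration is attached to the step record, so slow steps can be flagged later.
  return { name, result, durationMs: Date.now() - start };
}

// The logger would render this record as something like "├─ Fetch data (Nms)".
measure("Fetch data", async () => 42).then((r) =>
  console.log(`├─ ${r.name} (${r.durationMs}ms) ->`, r.result),
);
```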
**Output Example**:
```
├─ Navigate to home page
├─ Fill login form (234ms)
✅ POST https://api.example.com/login [200] 342ms
✓ click "[role='button']" 45ms
✓ Assert: Button is visible
```
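The sensitive-data sanitization mentioned in the feature list can be approximated as a header filter. The header names and the `[REDACTED]` placeholder below are assumptions for illustration, not the exact behavior of `debug-logger.ts`:

```typescript
// Hypothetical sketch of auth-header redaction; the header list and the
// "[REDACTED]" placeholder are assumptions, not the actual implementation.
const SENSITIVE_HEADERS = ["authorization", "cookie", "set-cookie", "x-api-key"];

function sanitizeHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    // Compare case-insensitively; HTTP header names are not case-sensitive.
    out[name] = SENSITIVE_HEADERS.includes(name.toLowerCase())
      ? "[REDACTED]"
      : value;
  }
  return out;
}

console.log(sanitizeHeaders({ Authorization: "Bearer abc123", Accept: "application/json" }));
```

Redacting at log time (rather than at export time) keeps tokens out of every downstream artifact: console output, JSON, and CSV alike.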
### 2. Enhanced Global Setup Logging ✅

**File**: `tests/global-setup.ts` (Updated with timing logs)

**Enhancements**:
- Timing information for health checks (all operations timed)
- Port connectivity checks with timing (Caddy admin, emergency server)
- IPv4 vs IPv6 detection in URL parsing
- Enhanced emergency security reset with elapsed time
- Security module disabling verification
- Structured logging of all steps in sequential order
- Error context on failures with next steps

**Sample Output**:
```
🔍 Checking Caddy admin API health at http://localhost:2019...
✅ Caddy admin API (port 2019) is healthy [45ms]

🔍 Checking emergency tier-2 server health at http://localhost:2020...
⏭️ Emergency tier-2 server unavailable (tests will skip tier-2 features) [3002ms]

📊 Port Connectivity Checks:
✅ Connectivity Summary: Caddy=✓ Emergency=✗
```

### 3. Enhanced Playwright Config ✅

**File**: `playwright.config.js` (Updated)

**Enhancements**:
- `trace: 'on-first-retry'` - Captures a trace on the first retry of a failing test
- `video: 'retain-on-failure'` - Records videos only for failed tests
- `screenshot: 'only-on-failure'` - Screenshots on failure only
- Custom debug reporter integration
- Comprehensive comments explaining each option

**Configuration Added**:
```javascript
use: {
  trace: 'on-first-retry',
  video: 'retain-on-failure',
  screenshot: 'only-on-failure',
}
```
### 4. Custom Debug Reporter ✅

**File**: `tests/reporters/debug-reporter.ts` (130 lines)

**Features**:
- Parses test step execution and identifies slow operations (>5s)
- Aggregates failures by type (timeout, assertion, network, locator)
- Generates structured summary output to stdout
- Calculates pass rate and test statistics
- Shows the slowest 10 tests ranked by duration
- Creates visual bar charts for failure distribution

**Sample Output**:
```
╔════════════════════════════════════════════════════════════╗
║                 E2E Test Execution Summary                 ║
╠════════════════════════════════════════════════════════════╣
║ Total Tests:  150                                          ║
║ ✅ Passed:    145 (96%)                                    ║
║ ❌ Failed:    5                                            ║
║ ⏭️ Skipped:   0                                            ║
╚════════════════════════════════════════════════════════════╝

⏱️ Slow Tests (>5s):
   1. Create DNS provider with dynamic parameters   8.92s
   2. Browse to security dashboard                  7.34s
   3. Configure rate limiting rules                 6.15s

🔍 Failure Analysis by Type:
   timeout   │ ████████░░░░░░░░░░░░ 2/5 (40%)
   assertion │ ████████░░░░░░░░░░░░ 2/5 (40%)
   network   │ ████░░░░░░░░░░░░░░░░ 1/5 (20%)
```
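The failure aggregation above amounts to classifying each failure's error message into a bucket. The keyword heuristics below are assumptions about how a reporter like `debug-reporter.ts` might categorize; the real rules may differ:

```typescript
type FailureType = "timeout" | "assertion" | "network" | "locator" | "other";

// Hypothetical keyword-based classifier, checked in priority order.
function categorizeFailure(message: string): FailureType {
  const m = message.toLowerCase();
  if (m.includes("timeout") || m.includes("timed out")) return "timeout";
  if (m.includes("expect(") || m.includes("assertion")) return "assertion";
  if (m.includes("net::") || /\b(4\d\d|5\d\d)\b/.test(m)) return "network";
  if (m.includes("locator") || m.includes("strict mode")) return "locator";
  return "other";
}

console.log(categorizeFailure("Timeout 30s exceeded, waiting for locator"));
```

Counting results per bucket then gives the distribution the bar chart renders.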
### 5. Network Interceptor Fixture ✅

**File**: `tests/fixtures/network.ts` (286 lines)

**Features**:
- Intercepts all HTTP requests and responses
- Tracks metrics per request:
  - URL, method, status code, elapsed time
  - Request/response headers (auth tokens redacted)
  - Request/response sizes in bytes
  - Response content-type
  - Redirect chains
  - Network errors with context
- Export functions:
  - CSV format for spreadsheet analysis
  - JSON format for programmatic access
- Analysis methods:
  - Get slow requests (above threshold)
  - Get failed requests (4xx/5xx)
  - Status code distribution
  - Average response time by URL pattern
- Automatic header sanitization (removes auth headers)
- Per-test request logging to debug logger
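The analysis methods listed above reduce to simple filters and groupings over the collected entries. A minimal sketch follows; the `NetworkEntry` shape is an assumption, and the real fixture tracks more fields:

```typescript
// Assumed entry shape; tests/fixtures/network.ts tracks additional fields
// (headers, sizes, content-type, redirect chains).
interface NetworkEntry {
  url: string;
  method: string;
  status: number;
  durationMs: number;
}

// "Get slow requests": a filter over the collected log.
function slowRequests(entries: NetworkEntry[], thresholdMs = 1000): NetworkEntry[] {
  return entries.filter((e) => e.durationMs > thresholdMs);
}

// "Status code distribution": a count grouped by status.
function statusDistribution(entries: NetworkEntry[]): Record<number, number> {
  const dist: Record<number, number> = {};
  for (const e of entries) dist[e.status] = (dist[e.status] ?? 0) + 1;
  return dist;
}

const log: NetworkEntry[] = [
  { url: "/api/health", method: "GET", status: 200, durationMs: 45 },
  { url: "/api/login", method: "POST", status: 200, durationMs: 1342 },
  { url: "/api/account/email", method: "POST", status: 422, durationMs: 234 },
];
console.log(slowRequests(log).length, statusDistribution(log));
```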
**Export Example**:
```csv
"Timestamp","Method","URL","Status","Duration (ms)","Content-Type","Body Size","Error"
"2024-01-27T10:30:45.123Z","GET","https://api.example.com/health","200","45","application/json","234",""
"2024-01-27T10:30:46.234Z","POST","https://api.example.com/login","200","342","application/json","1024",""
```
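Producing the CSV above requires quoting each field so embedded commas and quotes don't break columns. A sketch of that export logic, using standard CSV quote-doubling (the function names are illustrative, not the fixture's actual API):

```typescript
// Standard CSV escaping: wrap every field in quotes, double embedded quotes.
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

function csvRow(fields: string[]): string {
  return fields.map(csvField).join(",");
}

// Field order matches the sample export above.
const header = csvRow(["Timestamp", "Method", "URL", "Status", "Duration (ms)", "Content-Type", "Body Size", "Error"]);
const row = csvRow(["2024-01-27T10:30:45.123Z", "GET", "https://api.example.com/health", "200", "45", "application/json", "234", ""]);
console.log(header + "\n" + row);
```

Quoting every field unconditionally is slightly verbose but keeps the writer trivially correct for URLs with query strings and error messages containing commas.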
### 6. Test Step Logging Helpers ✅

**File**: `tests/utils/test-steps.ts` (148 lines)

**Features**:
- `testStep()` - Wrapper around `test.step()` with automatic logging
- `LoggedPage` - Page wrapper that logs all interactions
- `testAssert()` - Assertion helper with logging
- `testStepWithRetry()` - Retry logic with exponential backoff
- `measureStep()` - Duration measurement for operations
- Automatic error logging on step failure
- Soft assertion support (log but don't throw)
- Performance tracking per test

**Usage Example**:
```typescript
await testStep('Login', async () => {
  await page.click('[role="button"]');
}, { logger });

const result = await measureStep('API call', async () => {
  return fetch('/api/data');
}, logger);
console.log(`Completed in ${result.duration}ms`);
```
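The retry behavior of `testStepWithRetry()` can be sketched as a generic backoff loop. The attempt count and delay schedule below are illustrative assumptions; the real helper also logs each attempt to the debug logger:

```typescript
// Hypothetical sketch of retry-with-exponential-backoff; the actual helper in
// tests/utils/test-steps.ts also records each attempt via the debug logger.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100, // delay doubles after each failure: 100ms, 200ms, 400ms...
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  // All attempts exhausted: surface the last failure to the caller.
  throw lastError;
}

// Usage: a step that succeeds on its second attempt.
let calls = 0;
withRetry(async () => {
  calls++;
  if (calls < 2) throw new Error("flaky");
  return "ok";
}).then((v) => console.log(v, "after", calls, "attempts"));
```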
### 7. CI Workflow Enhancements ✅

**File**: `.github/workflows/e2e-tests.yml` (Updated)

**Environment Variables Added**:
```yaml
env:
  DEBUG: 'charon:*,charon-test:*'
  PLAYWRIGHT_DEBUG: '1'
  CI_LOG_LEVEL: 'verbose'
```

**Shard Step Enhancements**:
- Per-shard start/end logging with timestamps
- Shard duration tracking
- Sequential output format for easy parsing
- Status banner for each shard completion

**Sample Shard Output**:
```
════════════════════════════════════════════════════════════
E2E Test Shard 1/4
Browser: chromium
Start Time: 2024-01-27T10:30:45Z
════════════════════════════════════════════════════════════
[test output]
════════════════════════════════════════════════════════════
Shard 1 Complete | Duration: 125s
════════════════════════════════════════════════════════════
```

**Job Summary Enhancements**:
- Per-shard status table with timestamps
- Test artifact locations (HTML report, videos, traces, logs)
- Debugging tips for common scenarios
- Links to view reports and logs

### 8. VS Code Debug Tasks ✅

**File**: `.vscode/tasks.json` (4 new tasks added)

**New Tasks**:

1. **Test: E2E Playwright (Debug Mode - Full Traces)**
   - Command: `DEBUG=charon:*,charon-test:* npx playwright test --debug --trace=on`
   - Opens the interactive Playwright Inspector
   - Captures full traces during execution
   - **Use when**: You need to step through tests interactively

2. **Test: E2E Playwright (Debug with Logging)**
   - Command: `DEBUG=charon:*,charon-test:* PLAYWRIGHT_DEBUG=1 npx playwright test --project=chromium`
   - Displays enhanced console logging
   - Shows all network activity and page state
   - **Use when**: You want detailed logs without interactive mode

3. **Test: E2E Playwright (Trace Inspector)**
   - Command: `npx playwright show-trace test-results/traces/trace.zip`
   - Opens the Playwright Trace Viewer
   - Inspects captured traces with full details
   - **Use when**: Analyzing recorded traces from previous runs

4. **Test: E2E Playwright - View Coverage Report**
   - Command: `open coverage/e2e/index.html` (or `xdg-open` on Linux)
   - Opens the E2E coverage report in a browser
   - Shows which code paths were exercised
   - **Use when**: Analyzing code coverage from E2E tests

### 9. Documentation ✅

**File**: `docs/testing/debugging-guide.md` (600+ lines)

**Sections**:
- Quick start for local testing
- VS Code debug task usage guide
- Debug logger method reference
- Local and CI trace capture instructions
- Network debugging and export
- Common debugging scenarios with solutions
- Performance analysis techniques
- Environment variable reference
- Troubleshooting tips

**Features**:
- Code examples for all utilities
- Sample output for each feature
- Commands for common debugging tasks
- Links to official Playwright docs
- Step-by-step guides for CI failures

---
## File Inventory

### Created Files (5)

| File | Lines | Purpose |
|------|-------|---------|
| `tests/utils/debug-logger.ts` | 291 | Core debug logging utility |
| `tests/fixtures/network.ts` | 286 | Network request/response interception |
| `tests/utils/test-steps.ts` | 148 | Test step and assertion logging helpers |
| `tests/reporters/debug-reporter.ts` | 130 | Custom Playwright reporter for analysis |
| `docs/testing/debugging-guide.md` | 600+ | Comprehensive debugging documentation |

**Total New Code**: 1,455+ lines

### Modified Files (4)

| File | Changes |
|------|---------|
| `tests/global-setup.ts` | Enhanced timing logs, error context, detailed output |
| `playwright.config.js` | Added trace/video/screenshot config, debug reporter integration |
| `.github/workflows/e2e-tests.yml` | Added env vars, per-shard logging, improved summaries |
| `.vscode/tasks.json` | 4 new debug tasks with descriptions |

---
## Environment Variables

### For Local Testing

```bash
# Enable debug logging with colors
DEBUG=charon:*,charon-test:*

# Enable Playwright debug mode
PLAYWRIGHT_DEBUG=1

# Specify base URL (if not localhost:8080)
PLAYWRIGHT_BASE_URL=http://localhost:8080
```

### In CI (GitHub Actions)

Set automatically in the workflow:
```yaml
env:
  DEBUG: 'charon:*,charon-test:*'
  PLAYWRIGHT_DEBUG: '1'
  CI_LOG_LEVEL: 'verbose'
```

---

## VS Code Tasks Available

All new tasks are in the "test" group in VS Code:

1. ✅ `Test: E2E Playwright (Debug Mode - Full Traces)`
2. ✅ `Test: E2E Playwright (Debug with Logging)`
3. ✅ `Test: E2E Playwright (Trace Inspector)`
4. ✅ `Test: E2E Playwright - View Coverage Report`

Plus existing tasks:
- `Test: E2E Playwright (Chromium)`
- `Test: E2E Playwright (All Browsers)`
- `Test: E2E Playwright (Headed)`
- `Test: E2E Playwright (Skill)`
- `Test: E2E Playwright with Coverage`
- `Test: E2E Playwright - View Report`
- `Test: E2E Playwright (Debug Mode)` (existing)
- `Test: E2E Playwright (Debug with Inspector)` (existing)

---
## Output Examples

### Local Console Output (with ANSI colors)

```
🧹 Running global test setup...

📍 Base URL: http://localhost:8080
   ├─ Hostname: localhost
   ├─ Port: 8080
   ├─ Protocol: http:
   ├─ IPv6: No
   └─ Localhost: Yes

📊 Port Connectivity Checks:
🔍 Checking Caddy admin API health at http://localhost:2019...
✅ Caddy admin API (port 2019) is healthy [45ms]
```

### Test Execution Output

```
├─ Navigate to home
├─ Click login button (234ms)
✅ POST https://api.example.com/login [200] 342ms
✓ click "[role='button']" 45ms
✓ Assert: Button is visible
```

### CI Job Summary

```
## 📊 E2E Test Results

### Shard Status

| Shard | Status | Results |
|-------|--------|---------|
| Shard 1 | ✅ Complete | [Logs](action-url) |
| Shard 2 | ✅ Complete | [Logs](action-url) |
...

### Debugging Tips

1. Check **Videos** in artifacts for visual debugging of failures
2. Open **Traces** with the Playwright Trace Viewer: `npx playwright show-trace <trace.zip>`
3. Review **Docker Logs** for backend errors
4. Run failed tests locally with: `npm run e2e -- --grep="test name"`
```

---
## Integration Points

### With Playwright Config

- Debug reporter automatically invoked
- Trace capture configured at project level
- Video/screenshot retention for failures
- Global setup enhanced with timing

### With Test Utilities

- Debug logger can be instantiated in any test
- Network interceptor can be attached to any page
- Test step helpers integrate with `test.step()`
- Helpers tie directly into the debug logger

### With CI/CD

- Environment variables set for automated debugging
- Per-shard summaries for parallel execution tracking
- Artifact collection for all trace data
- Job summary with actionable debugging tips

---
## Capabilities Unlocked

### Before Implementation

- Basic Playwright HTML report
- Limited error messages
- Manual trace inspection after test completion
- No network-level visibility
- Opaque CI failures

### After Implementation

✅ **Local Debugging**
- Interactive step-by-step debugging
- Full trace capture, viewable in the Playwright Trace Viewer
- Color-coded console output with timing
- Network requests logged and exportable
- Automatic slow operation detection

✅ **CI Diagnostics**
- Per-shard status tracking with timing
- Failure categorization by type (timeout, assertion, network)
- Aggregated statistics across all shards
- Slowest tests highlighted automatically
- Artifact collection for detailed analysis

✅ **Performance Analysis**
- Per-operation duration tracking
- Network request metrics (status, size, timing)
- Automatic identification of slow operations (>5s)
- Average response time by endpoint
- Request/response size analysis

✅ **Network Visibility**
- All HTTP requests logged
- Status codes and response times tracked
- Request/response headers (sanitized)
- Redirect chains captured
- Error context with messages

✅ **Data Export**
- Network logs as CSV for spreadsheet analysis
- Structured JSON for programmatic access
- Test metrics for trend analysis
- Trace files for interactive inspection

---
## Validation Checklist

- ✅ Debug logger utility created and documented
- ✅ Global setup enhanced with timing logs
- ✅ Playwright config updated with trace/video/screenshot
- ✅ Custom reporter implemented
- ✅ Network interceptor fixture created
- ✅ Test step helpers implemented
- ✅ VS Code tasks added (4 new tasks)
- ✅ CI workflow enhanced with logging
- ✅ Documentation complete with examples
- ✅ All files compile without TypeScript errors

---

## Next Steps for Users

1. **Try Local Debugging**:
   ```bash
   npm run e2e -- --grep="test-name"
   ```

2. **Use Debug Tasks in VS Code**:
   - Open the Command Palette (Ctrl+Shift+P)
   - Type "Run Task"
   - Select a debug task

3. **View Test Reports**:
   ```bash
   npx playwright show-report
   ```

4. **Inspect Traces**:
   ```bash
   npx playwright show-trace test-results/[test-name]/trace.zip
   ```

5. **Export Network Data**:
   - Tests that use the network interceptor export CSV to artifacts
   - Available in CI artifacts for further analysis

---
## Troubleshooting

| Issue | Solution |
|-------|----------|
| No colored output locally | Check that the `CI` env var is not set |
| Traces not captured | Traces are saved on the first retry only; ensure retries are enabled and the test fails |
| Reporter not running | Verify `tests/reporters/debug-reporter.ts` exists |
| Slow to start | First run downloads Playwright browsers; subsequent runs use the cache |
| Network logs empty | Ensure the network interceptor is attached to the page |

---

## Summary

A comprehensive debugging ecosystem has been successfully implemented for the Playwright E2E test suite. The system provides:

- **1,455+ lines** of new instrumentation code
- **4 new VS Code tasks** for local debugging
- **Custom reporter** for automated failure analysis
- **Structured logging** with timing and context
- **Network visibility** with export capabilities
- **CI integration** for automated diagnostics
- **Complete documentation** with examples

This enables developers and QA engineers to debug test failures efficiently, understand performance characteristics, and diagnose integration issues with visibility into every layer (browser, network, application).
docs/testing/DEBUG_OUTPUT_EXAMPLES.md
# Debug Logging in Action: How to Diagnose Test Failures

This document explains how the new comprehensive debugging infrastructure helps diagnose E2E test failures, with concrete examples.

## What Changed: Before vs. After

### BEFORE: Generic Failure Output
```
✗ [chromium] › tests/settings/account-settings.spec.ts › should validate certificate email format
  Timeout 30s exceeded, waiting for expect(locator).toBeDisabled()
  at account-settings.spec.ts:290
```

**Problem**: No information about:
- What page was displayed when it failed
- What network requests were in flight
- What the actual button state was
- How long the test ran before timing out

---
### AFTER: Rich Debug Logging Output

#### 1. **Test Step Logging** (from the enhanced global-setup.ts)
```
✅ Global setup complete

🔍 Health Checks:
✅ Caddy admin API health (port 2019) [45ms]
✅ Emergency tier-2 server health (port 2020) [123ms]
✅ Security modules status verified [89ms]

🔓 Security Reset:
✅ Emergency reset via tier-2 server [134ms]
✅ Modules disabled (ACL, WAF, rate-limit, CrowdSec)
⏳ Waiting for propagation... [510ms]
```

#### 2. **Network Activity Logging** (from the network.ts interceptor)
```
📡 Network Log (automatic)
────────────────────────────────────────────────────────────
Timestamp    │ Method │ URL                 │ Status │ Duration
────────────────────────────────────────────────────────────
03:48:12.456 │ GET    │ /api/auth/profile   │ 200    │ 234ms
03:48:12.690 │ GET    │ /api/settings       │ 200    │ 45ms
03:48:13.001 │ POST   │ /api/certificates   │ 200    │ 567ms
03:48:13.568 │ GET    │ /api/acl/lists      │ 200    │ 89ms
03:48:13.912 │ POST   │ /api/account/email  │ 422    │ 234ms ⚠️ ERROR
```

**Key Insight**: The 422 error on the email update shows the API is rejecting the input, which explains why the button didn't disable: the form never validated successfully.
#### 3. **Locator Matching Logs** (from debug-logger.ts)
```
🎯 Locator Actions:
────────────────────────────────────────────────────────────
[03:48:14.123] ✅ getByRole('button', {name: /save certificate/i}) matched [8ms]
   -> Elements found: 1
   -> Action: click()

[03:48:14.234] ❌ getByRole('button', {name: /save certificate/i}) NOT matched [5000ms+]
   -> Elements found: 0
   -> Reason: Test timeout while waiting for element
   -> DOM Analysis:
      - Dialog present? YES
      - Form visible? NO (display: none)
      - Button HTML: <button disabled aria-label="Save...">
```

**Key Insight**: The form wasn't visible in the DOM when the test tried to click the button.
#### 4. **Assertion Logging** (from debug-logger.ts)
```
✓ Assert: "button is enabled" PASS [15ms]
  └─ Expected: enabled=true
  └─ Actual: enabled=true
  └─ Element state: aria-disabled=false

❌ Assert: "button is disabled" FAIL [5000ms+]
  └─ Expected: disabled=true
  └─ Actual: disabled=false
  └─ Element state: aria-disabled=false, type=submit, form=cert-form
  └─ Form status: pristine (no changes detected)
  └─ Validation errors found:
     - email: "Invalid email format" (hidden error div)
```

**Key Insight**: The validation error exists but is hidden, so the button remains enabled. The test expected it to disable.
#### 5. **Timing Analysis** (from the debug reporter)
```
📊 Test Timeline:
────────────────────────────────────────────────────────────
0ms    │ ✅ Navigate to /account
150ms  │ ✅ Fill email field with "invalid@"
250ms  │ ✅ Trigger validation (blur event)
500ms  │ ✅ Wait for API response
700ms  │ ❌ FAIL: Button should be disabled (but it's not)
       │    └─ Form validation failed on API side (422)
       │    └─ Error message not visible in DOM
       │    └─ Button has disabled=false
       │    └─ Test timeout after 5000ms of waiting
```

**Key Insight**: The timing shows validation happened (the API returned 422), but the form didn't update the UI properly.
## How to Read the Debug Output in the Playwright Report

### Step 1: Open the Report
```bash
npx playwright show-report
```

### Step 2: Click the Failed Test
The test details page shows:

**Console Logs Section**:
```
[debug] 03:48:12.456: Step "Navigate to account settings"
[debug]   └─ URL transitioned from / to /account
[debug]   └─ Page loaded in 1234ms
[debug]
[debug] 03:48:12.690: Step "Fill certificate email with invalid value"
[debug]   └─ Input focused [12ms]
[debug]   └─ Value set: "invalid@" [23ms]
[debug]
[debug] 03:48:13.001: Step "Trigger validation"
[debug]   └─ Blur event fired [8ms]
[debug]   └─ API request sent: POST /api/account/email [timeout: 5000ms]
[debug]
[debug] 03:48:13.234: Network Response
[debug]   └─ Status: 422 (Unprocessable Entity)
[debug]   └─ Body: {"errors": {"email": "Invalid email format"}}
[debug]   └─ Duration: 234ms
[debug]
[debug] 03:48:13.500: Error context
[debug]   └─ Expected button to be disabled
[debug]   └─ Actual state: enabled
[debug]   └─ Form validation state: pristine
```

### Step 3: Check the Trace
Click the "Trace" tab:
- **Timeline**: See each action with exact timing
- **Network**: View all HTTP requests and responses
- **DOM Snapshots**: Inspect page state at each step
- **Console**: See browser console messages

### Step 4: Watch the Video
The video shows:
- What the user would have seen
- Where the UI hung or stalled
- Whether spinners/loading states appeared
- The exact moment of failure
## Failure Category Examples

### Category 1: Timeout Failures
**Indicator**: `Timeout 30s exceeded, waiting for...`

**Debug Output**:
```
⏱️ Operation Timeline:
[03:48:14.000] ← Start waiting for locator
[03:48:14.100] Network request pending: GET /api/data [+2400ms]
[03:48:16.500] API response received (slow network)
[03:48:16.600] DOM updated with data
[03:48:17.000] ✅ Locator finally matched
[03:48:17.005] → Success after 3000ms wait
```

**Diagnosis**: The network was slow (2.4s for a 50KB response). The test didn't wait long enough.

**Fix**:
```javascript
await page.waitForLoadState('networkidle');          // Wait for network before asserting
await expect(locator).toBeVisible({timeout: 10000}); // Increase the timeout
```

---
### Category 2: Assertion Failures
**Indicator**: `expect(locator).toBeDisabled() failed`

**Debug Output**:
```
✋ Assertion failed: toBeDisabled()
   Expected: disabled=true
   Actual: disabled=false

Button State:
  - type: submit
  - aria-disabled: false
  - form-attached: true
  - form-valid: false ← ISSUE!

Form Validation:
  - Field 1: ✅ valid
  - Field 2: ✅ valid
  - Field 3: ❌ invalid (email format)

DOM Inspection:
  - Error message exists: YES (display: none)
  - Form has error attribute: NO
  - Submit button has disabled attr: NO

Likely Cause:
  Form validation logic doesn't disable the button when form.valid=false,
  OR displaying the error message doesn't trigger the button disable
```

**Diagnosis**: The component's disable logic isn't working correctly.

**Fix**:
```jsx
// In the React component:
const isFormValid = !hasValidationErrors;

<button
  disabled={!isFormValid} // ← Double-check this logic
  type="submit"
>
  Save
</button>
```

---
### Category 3: Locator Failures
**Indicator**: `getByRole('button', {name: /save/i}): multiple elements found`

**Debug Output**:
```
🚨 Strict Mode Violation: Multiple elements matched
   Selector: getByRole('button', {name: /save/i})

Elements found: 2

[1] ✓ <button type="submit">Save Certificate</button>
    └─ Located in: Modal dialog
    └─ Visible: YES
    └─ Class: btn-primary

[2] ✗ <button type="button">Resave Settings</button>
    └─ Located in: Table row
    └─ Visible: YES
    └─ Class: btn-ghost

Problem: The selector matches both buttons - the test can't decide which to click

Solution: Scope the selector to the dialog context
  page.getByRole('dialog').getByRole('button', {name: /save certificate/i})
```

**Diagnosis**: The locator is too broad and matches multiple elements.

**Fix**:
```javascript
// ✅ Good: Scoped to the dialog
await page.getByRole('dialog').getByRole('button', {name: /save certificate/i}).click();

// ✅ Also good: Use .first() if scoping isn't possible
await page.getByRole('button', {name: /save certificate/i}).first().click();

// ❌ Bad: Too broad
await page.getByRole('button', {name: /save/i}).click();
```

---
### Category 4: Network/API Failures
**Indicator**: `API returned 422` or `POST /api/endpoint failed with 500`

**Debug Output**:
```
❌ Network Error
   Request: POST /api/account/email
   Status: 422 Unprocessable Entity
   Duration: 234ms

Request Body:
{
  "email": "invalid@",  ← Invalid format
  "format": "personal"
}

Response Body:
{
  "code": "INVALID_EMAIL",
  "message": "Email must contain domain",
  "field": "email",
  "errors": [
    "Invalid email format: missing @domain"
  ]
}

What Went Wrong:
1. Form submitted with invalid data
2. Backend rejected it (expected behavior)
3. Frontend didn't show an error message
4. Test expected the button to disable, but it didn't

Root Cause:
  The error-handling code in the frontend isn't updating the form state
```

**Diagnosis**: The API is working correctly, but the frontend error handling isn't.

**Fix**:
```javascript
// In the frontend error handler:
try {
  const response = await fetch('/api/account/email', {method: 'POST', body});
  if (!response.ok) {
    const error = await response.json();
    setFormErrors(error.errors);   // ← Update form state with the errors
    setFormErrorVisible(true);     // ← Show the error message
  }
} catch (error) {
  setFormError(error.message);
}
```

---
|
||||
|
||||
## Real-World Example: The Certificate Email Test
|
||||
|
||||
**Test Code** (simplified):
|
||||
```javascript
|
||||
test('should validate certificate email format', async ({page}) => {
|
||||
await page.goto('/account');
|
||||
|
||||
// Fill with invalid email
|
||||
await page.getByLabel('Certificate Email').fill('invalid@');
|
||||
|
||||
// Trigger validation
|
||||
await page.getByLabel('Certificate Email').blur();
|
||||
|
||||
// Expect button to disable
|
||||
await expect(
|
||||
page.getByRole('button', {name: /save certificate/i})
|
||||
).toBeDisabled(); // ← THIS FAILED
|
||||
});
|
||||
```

**Debug Output Sequence**:

```
1️⃣ Navigate to /account
   ✅ Page loaded [1234ms]

2️⃣ Fill certificate email field
   ✅ Input found and focused [45ms]
   ✅ Value set to "invalid@" [23ms]

3️⃣ Trigger validation (blur)
   ✅ Blur event fired [8ms]
   📡 API request: POST /api/account/email [payload: {email: "invalid@"}]

4️⃣ Wait for API response
   ✋ Network activity: Waiting...
   ✅ Response received: 422 Unprocessable Entity [234ms]
   └─ Error: "Email must contain domain"

5️⃣ Check form error state
   ✅ Form has errors: email = "Email must contain domain"
   ✅ Error message DOM element exists
   ❌ But error message has display: none (CSS)

6️⃣ Wait for button to disable
   ⏰ [03:48:14.000] Start waiting for button[disabled]
   ⏰ [03:48:14.500] Still waiting...
   ⏰ [03:48:15.000] Still waiting...
   ⏰ [03:48:19.000] Still waiting...
   ❌ [03:48:24.000] TIMEOUT after 10000ms

Button Found:
- HTML: <button type="submit" class="btn-primary">Save</button>
- Attribute disabled: MISSING (not disabled!)
- Aria-disabled: false
- Computed CSS: pointer-events: auto (not disabled)

Form State:
- Validation errors: YES (email invalid)
- Button should disable: YES (by test logic)
- Button actually disabled: NO (bug!)

🔍 ROOT CAUSE:
The backend validated the form (the API returned 422), but the error
response never updated React state, so the error message stays hidden
and the button never re-renders as disabled. This suggests:

1. Form validation ran on backend (API returned 422)
2. Error wasn't set in React state
3. Button didn't re-render as disabled

LIKELY CODE BUG:
- Error response not processed in catch/error handler
- setFormErrors() not called
- Button disable logic checks form.state.errors but it's empty
```

**How to Fix**:

1. Check the `Account.tsx` form submission error handler
2. Ensure API errors update form state: `setFormErrors(response.errors)`
3. Ensure the button disable logic: `disabled={Object.keys(formErrors).length > 0}`
4. Verify the error message shows: `{formErrors.email && <p>{formErrors.email}</p>}`
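
The disable rule from these steps can be sketched as a pure helper. This is illustrative only: `FormErrors` and `isSaveDisabled` are hypothetical names, not the real `Account.tsx` code.

```typescript
// Hypothetical sketch of the button-disable rule from the fix steps above.
type FormErrors = Record<string, string>;

function isSaveDisabled(formErrors: FormErrors): boolean {
  // Disable the save button whenever any field currently has a validation error.
  return Object.keys(formErrors).length > 0;
}

// No validation errors: the button stays enabled.
console.log(isSaveDisabled({})); // false

// The failing test's state: an email error is present, so the button must disable.
console.log(isSaveDisabled({ email: 'Email must contain domain' })); // true
```

In the component, a predicate like this would feed the button's `disabled` prop, so fixing `setFormErrors()` automatically fixes the disabled state.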

---

## Interpreting the Report Summary

After tests complete, you'll see:

```
⏱️ Slow Tests (>5s):
────────────────────────────────────────────────────────────
1. test name [16.25s]  ← Takes 16+ seconds to run/timeout
2. test name [12.61s]  ← Long test setup or many operations
...

🔍 Failure Analysis by Type:
────────────────────────────────────────────────────────────
timeout   │ ████░░░░░░░░░░░░░░░░ 4/11 (36%)
          │ Action: Add waits, increase timeouts
          │
assertion │ ███░░░░░░░░░░░░░░░░░ 3/11 (27%)
          │ Action: Check component state logic
          │
locator   │ ██░░░░░░░░░░░░░░░░░░ 2/11 (18%)
          │ Action: Make selectors more specific
          │
other     │ ██░░░░░░░░░░░░░░░░░░ 2/11 (18%)
          │ Action: Review trace for error details
```

**What this tells you**:

- **36% Timeout**: Network is slow or test expectations are unrealistic
- **27% Assertion**: Component behavior is wrong (disable logic, form state, etc.)
- **18% Locator**: Selector strategy needs improvement
- **18% Other**: Exceptions or edge cases (investigate individually)

---

## Next Steps When Tests Complete

1. **Run the tests**: Already in progress ✅
2. **Open the report**: `npx playwright show-report`
3. **For each failure**:
   - Click the test name
   - Read the assertion error
   - Check the console logs (our debug output)
   - Inspect the trace timeline
   - Watch the video
4. **Categorize the failure**: Timeout? Assertion? Locator? Network?
5. **Apply the appropriate fix** based on the category
6. **Re-run just that test**: `npx playwright test --grep "test name"`
7. **Validate**: Confirm the test now passes

The debugging infrastructure gives you everything you need to understand exactly why each test failed and what to fix.

---

**File**: `docs/testing/FAILURE_DIAGNOSIS_GUIDE.md` (new file, 315 lines)

# E2E Test Failure Diagnosis Guide

This guide explains how to use the comprehensive debugging infrastructure to diagnose the 11 failed tests from the latest E2E run.

## Quick Access Tools

### 1. **Playwright HTML Report** (Visual Analysis)

```bash
# When tests complete, open the report
npx playwright show-report

# Or start the server on a custom port
npx playwright show-report --port 9323
```

**What to look for:**
- Click on each failed test
- View the trace timeline (shows each action, network request, assertion)
- Check the video recording to see exactly what went wrong
- Read the assertion error message
- Check browser console logs

### 2. **Debug Logger CSV Export** (Network Analysis)

```bash
# After tests complete, check for network logs in test-results
find test-results -name "*.csv" -type f
```

**What to look for:**
- HTTP requests that failed or timed out
- Slow network operations (>1000ms)
- Authentication failures (401/403)
- API response errors

### 3. **Trace Files** (Step-by-Step Replay)

```bash
# View detailed trace for a failed test
npx playwright show-trace test-results/[test-name]/trace.zip
```

**Features:**
- Pause and step through each action
- Inspect the DOM at any point
- Review network timing
- Check locator matching

### 4. **Video Recordings** (Visual Feedback Loop)
- Located in: `test-results/.playwright-artifacts-1/`
- Map filenames to test names in the Playwright report
- Watch to understand timing and UI state when the failure occurred

## The 11 Failures: What to Investigate

Based on the summary showing "other" category failures, these issues likely fall into:

### Category A: Timing/Flakiness Issues
- Tests intermittently fail due to timeouts
- Elements not appearing in the expected timeframe
- **Diagnosis**: Check videos for loading spinners, network delays
- **Fix**: Increase the timeout or add a wait for a specific condition

### Category B: Locator Issues
- Selectors matching wrong elements or multiple elements
- Elements appearing in different UI states
- **Diagnosis**: Check traces to see selector matching logic
- **Fix**: Make selectors more specific or use role-based locators

### Category C: State/Data Issues
- Form data not persisting
- Navigation not working correctly
- **Diagnosis**: Check network logs for API failures
- **Fix**: Add a wait for API completion, verify mock data

### Category D: Accessibility/Keyboard Navigation
- Keyboard events not triggering actions
- Focus not moving as expected
- **Diagnosis**: Review traces for keyboard action handling
- **Fix**: Verify component keyboard event handlers

## Step-by-Step Failure Analysis Process

### For Each Failed Test:

1. **Get the Test Name**
   - Open the Playwright report
   - Find the test in the "Failed" section
   - Note the test file + test name

2. **View the Trace**
   ```bash
   npx playwright show-trace test-results/[test-name-hash]/trace.zip
   ```
   - Go through each step
   - Note which step failed and why
   - Check the actual error message

3. **Check Network Activity**
   - In the trace, click the "Network" tab
   - Look for failed requests (red entries)
   - Check response status and timing

4. **Review the Video**
   - Watch the video recording
   - Observe what the user would see
   - Note the UI state when the failure occurred
   - Check for loading states, spinners, dialogs

5. **Analyze Debug Logs**
   - Check the console output in the trace
   - Look for our custom debug logger messages
   - Note timing information
   - Check for error context

### Debug Logger Output Format

Our debug logger outputs structured messages like:

```
✅ Step "Navigate to certificates page" completed [234ms]
  ├─ POST /api/certificates/list [200] 45ms
  ├─ Locator matched "getByRole('table')" [12ms]
  └─ Assert: Table visible passed [8ms]

❌ Step "Fill form with valid data" FAILED [5000ms+]
  ├─ Input focused but value not set
  └─ Error: Assertion timeout after 5000ms
```
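
The top-level step line can be produced by a small formatter. A hedged sketch: `formatStep` is an illustrative helper, not the real implementation in `tests/utils/debug-logger.ts`.

```typescript
// Hypothetical formatter for the structured step lines shown above.
function formatStep(name: string, durationMs: number, passed: boolean): string {
  const icon = passed ? '✅' : '❌';
  const status = passed ? 'completed' : 'FAILED';
  return `${icon} Step "${name}" ${status} [${durationMs}ms]`;
}

console.log(formatStep('Navigate to certificates page', 234, true));
// ✅ Step "Navigate to certificates page" completed [234ms]
```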

## Common Failure Patterns & Solutions

### Pattern 1: "Timeout waiting for locator"
**Cause**: Element not appearing within timeout

**Diagnosis**:
- Check video - is the page still loading?
- Check network tab - any pending requests?
- Check DOM snapshot - does the element exist but hidden?

**Solution**:
- Add `await page.waitForLoadState('networkidle')`
- Use more robust locators (role-based instead of ID)
- Increase the timeout if it's a legitimate slow operation

### Pattern 2: "Assertion failed: expect(locator).toBeDisabled()"
**Cause**: Button not in expected state

**Diagnosis**:
- Check trace - what's the button's actual state?
- Check console - any JS errors?
- Check network - is a form submission in progress?

**Solution**:
- Add explicit wait: `await expect(button).toBeDisabled({timeout: 10000})`
- Wait for the preceding action: `await page.getByRole('button').click(); await page.waitForLoadState()`
- Check form library state

### Pattern 3: "Strict mode violation: multiple elements found"
**Cause**: Selector matches 2+ elements

**Diagnosis**:
- Check trace DOM snapshots - count matching elements
- Check test file - is the selector too broad?

**Solution**:
- Scope to a container: `page.getByRole('dialog').getByRole('button', {name: 'Save'})`
- Use `.first()` or `.nth(0)`: `getByRole('button').first()`
- Make the selector more specific

### Pattern 4: "Element not found by getByRole(...)"
**Cause**: Accessibility attributes missing

**Diagnosis**:
- Check DOM in trace - what tags/attributes exist?
- Is it missing a role attribute?
- Is aria-label/aria-labelledby correct?

**Solution**:
- Add a role attribute to the element
- Add an accessible name (aria-label, aria-labelledby, or text content)
- Use more forgiving selectors temporarily to confirm

### Pattern 5: "Test timed out after 30000ms"
**Cause**: Test execution exceeded timeout

**Diagnosis**:
- Check videos - where did it hang?
- Check traces - last action before timeout?
- Check network - any concurrent long-running requests?

**Solution**:
- Break the test into smaller steps
- Add explicit waits between actions
- Check for infinite loops or blocking operations
- Increase the test timeout if the operation is legitimately slow

## Using the Debug Report for Triage

After tests complete, the custom debug reporter provides:

```
⏱️ Slow Tests (>5s):
────────────────────────────────────────────────────────────
1. should show user status badges           16.25s
2. should resend invite for pending user    12.61s
...

🔍 Failure Analysis by Type:
────────────────────────────────────────────────────────────
timeout   │ ████░░░░░░░░░░░░░░░░ 4/11 (36%)
assertion │ ███░░░░░░░░░░░░░░░░░ 3/11 (27%)
locator   │ ██░░░░░░░░░░░░░░░░░░ 2/11 (18%)
other     │ ██░░░░░░░░░░░░░░░░░░ 2/11 (18%)
```

**Key insights:**
- **Timeout**: Look for network delays or missing waits
- **Assertion**: Check state management and form validation
- **Locator**: Focus on selector robustness
- **Other**: Check for exceptions or edge cases

## Advanced Debugging Techniques

### 1. Run Single Failed Test Locally

```bash
# Get the exact test name from the report, then:
npx playwright test --grep "should show user status badges"

# With full debug output:
DEBUG=charon:* npx playwright test --grep "should show user status badges" --debug
```

### 2. Inspect Network Logs CSV

```bash
# Convert CSV to readable format
column -t -s',' tests/network-logs.csv | less

# Or analyze in Excel/Google Sheets
```

### 3. Compare Videos Side-by-Side
- Download videos from `test-results/.playwright-artifacts-1/`
- Open in VLC with a playlist
- Play at 2x speed to spot behavior differences

### 4. Check Browser Console
- In the trace player, click the "Console" tab
- Look for JS errors or warnings
- Check for 404/500 API responses in the network tab

### 5. Reproduce Locally with the Same Conditions

```bash
# Use the exact same seed (if randomization is involved)
SEED=12345 npx playwright test --grep "failing-test"

# With extended timeout for investigation
npx playwright test --grep "failing-test" --project=chromium --debug
```
## Docker-Specific Debugging

If tests pass locally but fail in the CI Docker container:

### Check Container Logs

```bash
# View Docker container output
docker compose -f .docker/compose/docker-compose.test.yml logs charon

# Check for errors during startup
docker compose logs --tail=50
```

### Compare Environments
- Docker: Running on 0.0.0.0:8080
- Local: Running on localhost:8080 (or http://127.0.0.1:8080)
- **Check**: Are there IPv4/IPv6 differences?
- **Check**: Are there DNS resolution issues?

### Port Accessibility

```bash
# From inside Docker, check if ports are accessible
docker exec charon curl -v http://localhost:8080
docker exec charon curl -v http://localhost:2019
docker exec charon curl -v http://localhost:2020
```

## Escalation Path

### When to Investigate Code
- Same tests fail consistently (not flaky)
- Error message points to a specific feature
- Video shows incorrect behavior
- Network logs show API failures

**Action**: Fix the code/feature being tested

### When to Improve the Test
- Tests are flaky (fail about 1 in 5 runs)
- Timeout errors on slow operations
- Intermittent locator matching issues

**Action**: Add waits, use more robust selectors, increase timeouts

### When to Update Test Infrastructure
- Port/networking issues
- Authentication failures
- Global setup incomplete

**Action**: Check docker-compose, test fixtures, environment variables

## Next Steps

1. **Wait for Test Completion** (~6 minutes)
2. **Open the Playwright Report**: `npx playwright show-report`
3. **Identify Failure Categories** (timeout, assertion, locator, other)
4. **Run a Single Test Locally** with debug output
5. **Review Traces & Videos** to understand the exact failure point
6. **Apply the Appropriate Fix** (code, test, or infrastructure)
7. **Re-run Tests** to validate the fix

---

**Remember**: With the new debugging infrastructure, you have complete visibility into every action the browser took, every network request made, and every assertion evaluated. Use the traces to understand not just WHAT failed, but WHY it failed.

---

**File**: `docs/testing/README.md` (new file, 8 lines)

# E2E Testing & Debugging Guide

> **Recent Updates**: See [Sprint 1 Improvements](sprint1-improvements.md) for information about recent E2E test reliability and performance enhancements (February 2026).

## Getting Started with E2E Tests
- **Running Tests**: `npm run e2e`
- **All Browsers**: `npm run e2e:all`
- **Headed UI on headless Linux**: `npm run e2e:ui:headless-server` — see `docs/development/running-e2e.md` for details

---

**File**: `docs/testing/crowdsec_auth_manual_verification.md` (new file, 326 lines)

# CrowdSec Authentication Fix - Manual Verification Guide

This document provides step-by-step procedures for manually verifying the Bug #1 fix (CrowdSec LAPI authentication regression).

## Prerequisites

- Docker and docker-compose installed
- Charon container running (either `charon-e2e` for testing or the production container)
- Access to container logs
- Basic understanding of CrowdSec bouncer authentication

## Test Scenarios

### Scenario 1: Invalid Environment Variable Auto-Recovery

**Objective**: Verify that when `CHARON_SECURITY_CROWDSEC_API_KEY` or `CROWDSEC_API_KEY` is set to an invalid key, Charon detects the failure and auto-generates a new valid key.

**Steps**:

1. **Set Invalid Environment Variable**

   Edit your `docker-compose.yml` or `.env` file:

   ```yaml
   environment:
     CHARON_SECURITY_CROWDSEC_API_KEY: fakeinvalidkey12345
   ```

2. **Start/Restart Container**

   ```bash
   docker compose up -d charon
   # OR
   docker restart charon
   ```

3. **Enable CrowdSec via API**

   ```bash
   # Login first (adjust credentials as needed)
   curl -c cookies.txt -X POST http://localhost:8080/api/v1/auth/login \
     -H "Content-Type: application/json" \
     -d '{"email":"admin@example.com","password":"yourpassword"}'

   # Enable CrowdSec
   curl -b cookies.txt -X POST http://localhost:8080/api/v1/admin/crowdsec/start
   ```

4. **Verify Logs Show Validation Failure**

   ```bash
   docker logs charon --tail 100 | grep -i "invalid"
   ```

   **Expected Output**:
   ```
   time="..." level=warning msg="Environment variable CHARON_SECURITY_CROWDSEC_API_KEY is set but invalid. Either remove it from docker-compose.yml or update it to match the auto-generated key. A new valid key will be generated and saved." masked_key=fake...345
   ```

5. **Verify New Key Auto-Generated**

   ```bash
   docker exec charon cat /app/data/crowdsec/bouncer_key
   ```

   **Expected**: A valid CrowdSec API key (NOT `fakeinvalidkey12345`)

6. **Verify Caddy Bouncer Connects Successfully**

   ```bash
   # Test authentication with the new key
   NEW_KEY=$(docker exec charon cat /app/data/crowdsec/bouncer_key)
   curl -H "X-Api-Key: $NEW_KEY" http://localhost:8080/v1/decisions/stream
   ```

   **Expected**: HTTP 200 OK (may return empty `{"new":null,"deleted":null}`)

7. **Verify Logs Show Success**

   ```bash
   docker logs charon --tail 50 | grep -i "authentication successful"
   ```

   **Expected Output**:
   ```
   time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
   ```

**Success Criteria**:
- ✅ Warning logged about the invalid env var
- ✅ New key auto-generated and saved to `/app/data/crowdsec/bouncer_key`
- ✅ Bouncer authenticates successfully with the new key
- ✅ No "access forbidden" errors in logs

---

### Scenario 2: LAPI Startup Delay Handling

**Objective**: Verify that when LAPI starts 5+ seconds after Charon, the retry logic succeeds instead of immediately failing.

**Steps**:

1. **Stop Any Running CrowdSec Instance**

   ```bash
   docker exec charon pkill -9 crowdsec || true
   ```

2. **Enable CrowdSec via API** (while LAPI is down)

   ```bash
   curl -b cookies.txt -X POST http://localhost:8080/api/v1/admin/crowdsec/start
   ```

3. **Monitor Logs for Retry Messages**

   ```bash
   docker logs -f charon 2>&1 | grep -i "lapi not ready"
   ```

   **Expected Output**:
   ```
   time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=1 error="connection refused" next_attempt_ms=500
   time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=2 error="connection refused" next_attempt_ms=750
   time="..." level=info msg="LAPI not ready, retrying with backoff" attempt=3 error="connection refused" next_attempt_ms=1125
   ```

4. **Wait for LAPI to Start** (up to 30 seconds)

   Look for the success message:
   ```
   time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
   ```

5. **Verify Bouncer Connection**

   ```bash
   KEY=$(docker exec charon cat /app/data/crowdsec/bouncer_key)
   curl -H "X-Api-Key: $KEY" http://localhost:8080/v1/decisions/stream
   ```

   **Expected**: HTTP 200 OK

**Success Criteria**:
- ✅ Logs show retry attempts with exponential backoff (500ms → 750ms → 1125ms → ...)
- ✅ Connection succeeds after LAPI starts (within 30s max)
- ✅ No immediate failure on the first connection refused error

---

### Scenario 3: No More "Access Forbidden" Errors in Production

**Objective**: Verify that setting an invalid environment variable no longer causes persistent "access forbidden" errors after the fix.

**Steps**:

1. **Reproduce Pre-Fix Behavior** (for comparison - requires reverting to the old code)

   With the old code, setting an invalid env var would cause:
   ```
   time="..." level=error msg="LAPI authentication failed" error="access forbidden (403)" key="[REDACTED]"
   ```

2. **Apply the Fix and Repeat Scenario 1**

   With the new code, the same invalid env var should produce:
   ```
   time="..." level=warning msg="Environment variable CHARON_SECURITY_CROWDSEC_API_KEY is set but invalid..."
   time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
   ```

**Success Criteria**:
- ✅ No "access forbidden" errors after auto-recovery
- ✅ Bouncer connects successfully with the auto-generated key

---

### Scenario 4: Key Source Visibility in Logs

**Objective**: Verify that logs clearly indicate which key source is used (environment variable vs file vs auto-generated).

**Test Cases**:

#### 4a. Valid Environment Variable

```bash
# Set a valid key in the env
export CHARON_SECURITY_CROWDSEC_API_KEY=<valid_key_from_cscli>
docker restart charon
```

**Expected Log**:
```
time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="vali...test" source=environment_variable
```

#### 4b. File-Based Key

```bash
# Clear the env var, restart with the existing file
unset CHARON_SECURITY_CROWDSEC_API_KEY
docker restart charon
```

**Expected Log**:
```
time="..." level=info msg="CrowdSec bouncer authentication successful" masked_key="abcd...wxyz" source=file
```

#### 4c. Auto-Generated Key

```bash
# Clear the env var and file, start fresh
docker exec charon rm -f /app/data/crowdsec/bouncer_key
docker restart charon
```

**Expected Log**:
```
time="..." level=info msg="Registering new CrowdSec bouncer: caddy-bouncer"
time="..." level=info msg="CrowdSec bouncer registration successful" masked_key="new-...123" source=auto_generated
```

**Success Criteria**:
- ✅ Logs clearly show `source=environment_variable`, `source=file`, or `source=auto_generated`
- ✅ User can determine which key is active without reading code
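
The `masked_key` values in these logs follow the "first 4 + last 4 chars" rule described in the QA notes below. A hedged sketch of that rule (the real implementation lives in the Go backend; `maskKey` here is an illustrative helper):

```typescript
// Hypothetical sketch of the "first 4 + last 4 chars" key masking seen in the logs.
function maskKey(key: string): string {
  // Keys too short to mask meaningfully are fully redacted.
  if (key.length <= 8) return '****';
  return `${key.slice(0, 4)}...${key.slice(-4)}`;
}

console.log(maskKey('abcd0123456789wxyz')); // abcd...wxyz
console.log(maskKey('short'));              // ****
```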

---

## Troubleshooting

### Issue: "failed to execute cscli" Errors

**Cause**: CrowdSec binary not installed in the container

**Resolution**: Ensure CrowdSec is installed via the Dockerfile, or skip the test if the binary is intentionally excluded.

### Issue: LAPI Timeout After 30 Seconds

**Cause**: CrowdSec process failed to start or crashed

**Debug Steps**:
1. Check the LAPI process: `docker exec charon ps aux | grep crowdsec`
2. Check LAPI logs: `docker exec charon cat /var/log/crowdsec/crowdsec.log`
3. Verify config: `docker exec charon cat /etc/crowdsec/config.yaml`

### Issue: "access forbidden" Despite New Key

**Cause**: Key not properly registered with LAPI

**Resolution**:
```bash
# List registered bouncers
docker exec charon cscli bouncers list

# If caddy-bouncer is missing, re-register
docker exec charon cscli bouncers delete caddy-bouncer || true
docker restart charon
```

---

## Verification Checklist

Before considering the fix complete, verify all scenarios pass:

- [ ] **Scenario 1**: Invalid env var triggers auto-recovery
- [ ] **Scenario 2**: LAPI startup delay handled with retry logic
- [ ] **Scenario 3**: No "access forbidden" errors in production logs
- [ ] **Scenario 4a**: Env var source logged correctly
- [ ] **Scenario 4b**: File source logged correctly
- [ ] **Scenario 4c**: Auto-generated source logged correctly
- [ ] **Integration Tests**: All 3 tests in `backend/integration/crowdsec_lapi_integration_test.go` pass
- [ ] **Unit Tests**: All 10 tests in `backend/internal/api/handlers/crowdsec_handler_test.go` pass

---

## Additional Validation

### Docker Logs Monitoring (Real-Time)

```bash
# Watch logs in real-time for auth-related messages
docker logs -f charon 2>&1 | grep -iE "crowdsec|bouncer|lapi|authentication"
```

### LAPI Health Check

```bash
# Check if LAPI is responding
curl http://localhost:8080/v1/health
```

**Expected**: HTTP 200 OK

### Bouncer Registration Status

```bash
# Verify the bouncer is registered via cscli
docker exec charon cscli bouncers list

# Expected output should include:
# Name             │ IP Address │ Valid │ Last API Key │ Last API Pull
# ─────────────────┼────────────┼───────┼──────────────┼───────────────
# caddy-bouncer    │            │ ✔️    │ <timestamp>  │ <timestamp>
```

---

## Notes for QA and Code Review

- **Backward Compatibility**: The old behavior (name-based validation) is preserved in `validateBouncerKey()`. The new authentication logic is in `testKeyAgainstLAPI()`.
- **Security**: API keys are masked in logs (first 4 + last 4 chars only) to prevent cleartext exposure of sensitive information (CWE-312).
- **File Permissions**: The bouncer key file is created with 0600 permissions (read/write owner only), the directory with 0700.
- **Atomic Writes**: `saveKeyToFile()` uses a temp file + rename pattern to prevent corruption.
- **Retry Logic**: Connection refused errors trigger exponential backoff (500ms → 750ms → 1125ms → ..., capped at 5s per attempt, 30s total).
- **Fast Fail**: 403 Forbidden errors fail immediately without retries (indicates an invalid key, not an LAPI startup issue).
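
The retry schedule described above can be sketched as a pure function. This is illustrative only (the real logic is in the Go backend): start at 500ms, multiply by 1.5 per attempt, cap each delay at 5s, and stop once roughly 30s of delay has been scheduled.

```typescript
// Hypothetical sketch of the exponential backoff schedule described above.
function backoffSchedule(startMs = 500, factor = 1.5, capMs = 5000, totalBudgetMs = 30000): number[] {
  const delays: number[] = [];
  let delay = startMs;
  let total = 0;
  // Keep scheduling capped delays until the total budget would be exceeded.
  while (total + Math.min(delay, capMs) <= totalBudgetMs) {
    const next = Math.min(delay, capMs);
    delays.push(next);
    total += next;
    delay = delay * factor;
  }
  return delays;
}

// First few delays match the log lines in Scenario 2: 500, 750, 1125, ...
console.log(backoffSchedule().slice(0, 3)); // [ 500, 750, 1125 ]
```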

---

## Related Documentation

- **Investigation Report**: `docs/issues/crowdsec_auth_regression.md`
- **Unit Tests**: `backend/internal/api/handlers/crowdsec_handler_test.go` (lines 3970-4294)
- **Integration Tests**: `backend/integration/crowdsec_lapi_integration_test.go`
- **Implementation**: `backend/internal/api/handlers/crowdsec_handler.go` (lines 1548-1720)

---

**File**: `docs/testing/debugging-guide.md` (new file, 533 lines)

# Playwright E2E Test Debugging Guide

This guide explains how to use the enhanced debugging features in the Playwright E2E test suite.

## Quick Start

### Local Testing with Debug Logging

To run tests with enhanced debug output locally:

```bash
# Run tests with full debug logging and colors
npm run e2e

# Or with more detailed logging
DEBUG=charon:*,charon-test:* npm run e2e
```

### VS Code Debug Tasks

Several new tasks are available in VS Code for debugging:

1. **Test: E2E Playwright (Debug Mode - Full Traces)**
   - Runs tests in debug mode with full trace capture
   - Opens the Playwright Inspector for step-by-step execution
   - Command: `DEBUG=charon:*,charon-test:* npx playwright test --debug --trace=on`
   - **Use when**: You need to step through test execution interactively

2. **Test: E2E Playwright (Debug with Logging)**
   - Runs tests with enhanced logging output
   - Shows network activity and page state
   - Command: `DEBUG=charon:*,charon-test:* PLAYWRIGHT_DEBUG=1 npx playwright test --project=chromium`
   - **Use when**: You want to see detailed logs without interactive debugging

3. **Test: E2E Playwright (Trace Inspector)**
   - Opens the Playwright Trace Viewer
   - Inspect recorded traces with full DOM/network/console logs
   - Command: `npx playwright show-trace <trace.zip>`
   - **Use when**: You've captured traces and want to inspect them

4. **Test: E2E Playwright - View Coverage Report**
   - Opens the E2E coverage report in the browser
   - Shows which code paths were exercised during tests
   - **Use when**: Analyzing code coverage from E2E tests
## Understanding the Debug Logger
|
||||
|
||||
The debug logger provides structured logging with multiple methods:
|
||||
|
||||
### Logger Methods
|
||||
|
||||
#### `step(name: string, duration?: number)`
|
||||
|
||||
Logs a test step with automatic duration tracking.
|
||||
|
||||
```typescript
|
||||
const logger = new DebugLogger('my-test');
|
||||
logger.step('Navigate to home page');
|
||||
logger.step('Click login button', 245); // with duration in ms
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
├─ Navigate to home page
|
||||
├─ Click login button (245ms)
|
||||
```
|
||||
|
||||
#### `network(entry: NetworkLogEntry)`
|
||||
|
||||
Logs HTTP requests and responses with timing and status.
|
||||
|
||||
```typescript
|
||||
logger.network({
|
||||
method: 'POST',
|
||||
url: 'https://api.example.com/login',
|
||||
status: 200,
|
||||
elapsedMs: 342,
|
||||
responseContentType: 'application/json',
|
||||
responseBodySize: 1024
|
||||
});
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
✅ POST https://api.example.com/login [200] 342ms
|
||||
```
|
||||
|
||||
#### `locator(selector, action, found, elapsedMs)`
|
||||
|
||||
Logs element interactions and locator resolution.
|
||||
|
||||
```typescript
|
||||
logger.locator('[role="button"]', 'click', true, 45);
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
✓ click "[role="button"]" 45ms
|
||||
```
|
||||
|
||||
#### `assertion(condition, passed, actual?, expected?)`
|
||||
|
||||
Logs test assertions with pass/fail status.
|
||||
|
||||
```typescript
|
||||
logger.assertion('Button is visible', true);
|
||||
logger.assertion('URL is correct', false, 'http://old.com', 'http://new.com');
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
✓ Assert: Button is visible
|
||||
✗ Assert: URL is correct | expected: "http://new.com", actual: "http://old.com"
|
||||
```
|
||||
|
||||
#### `error(context, error, recoveryAttempts?)`
|
||||
|
||||
Logs errors with context and recovery information.
|
||||
|
||||
```typescript
|
||||
logger.error('Network request failed', new Error('TIMEOUT'), 1);
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
❌ ERROR: Network request failed - TIMEOUT
|
||||
🔄 Recovery: 1 attempts remaining
|
||||
```

## Local Trace Capture

Traces capture all interactions, network activity, and DOM snapshots. They're invaluable for debugging.

### Automatic Trace Capture

Traces are automatically captured:

- On the first retry of a failed test
- On failure when running locally (if configured)

### Manual Trace Capture

To capture traces for all tests locally:

```bash
npx playwright test --trace=on
```

Or in code:

```typescript
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    trace: 'on', // always capture
  },
});
```

### Viewing Traces

After tests run, view traces with:

```bash
npx playwright show-trace test-results/path/to/trace.zip
```

The Trace Viewer shows:

- **Timeline**: Chronological list of all actions
- **Network**: HTTP requests/responses with full details
- **Console**: Page JS console output
- **DOM**: DOM snapshot at each step
- **Sources**: Source code view

## CI Debugging

### Viewing CI Test Results

When tests fail in CI/CD:

1. Go to the workflow run in GitHub Actions
2. Check the **E2E Tests** job summary for per-shard status
3. Download artifacts:
   - `merged-playwright-report/` - HTML test report
   - `traces-*-shard-*/` - Trace files for failures
   - `docker-logs-shard-*/` - Application logs
   - `test-results-*-shard-*/` - Raw test data

### Interpreting CI Logs

Each shard logs its execution with timing:

```
════════════════════════════════════════════════════════════
E2E Test Shard 1/4
Browser: chromium
Start Time: 2024-01-27T10:30:45Z
════════════════════════════════════════════════════════════
...
════════════════════════════════════════════════════════════
Shard 1 Complete | Duration: 125s
════════════════════════════════════════════════════════════
```

The merged report summary shows:

```
╔════════════════════════════════════════════════════════════╗
║ E2E Test Execution Summary                                 ║
╠════════════════════════════════════════════════════════════╣
║ Total Tests: 150                                           ║
║ ✅ Passed: 145 (96%)                                        ║
║ ❌ Failed: 5                                                ║
║ ⏭️ Skipped: 0                                               ║
╚════════════════════════════════════════════════════════════╝
```

### Failure Analysis

CI logs include failure categorization:

```
🔍 Failure Analysis by Type:
────────────────────────────────────────────────────────────
timeout   │ ████████░░░░░░░░░░░░ 2/5 (40%)
assertion │ ████████░░░░░░░░░░░░ 2/5 (40%)
network   │ ████░░░░░░░░░░░░░░░░ 1/5 (20%)
```
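
Bars like these are simple to generate from category counts. A minimal sketch of the rendering logic (the function name `renderBar` is illustrative, not the reporter's actual API):

```typescript
// Render a fixed-width bar-chart row for one failure category:
// `count` out of `total` failures, drawn across `width` cells.
function renderBar(label: string, count: number, total: number, width = 20): string {
  const filled = total === 0 ? 0 : Math.round((count / total) * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  const pct = total === 0 ? 0 : Math.round((count / total) * 100);
  return `${label.padEnd(10)}│ ${bar} ${count}/${total} (${pct}%)`;
}

console.log(renderBar('timeout', 2, 5));
// timeout   │ ████████░░░░░░░░░░░░ 2/5 (40%)
```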

And the slowest tests:

```
⏱️ Slow Tests (>5s):
────────────────────────────────────────────────────────────
1. Long-running test name        12.43s
2. Another slow test              8.92s
3. Network-heavy test             6.15s
```

## Network Debugging

The network interceptor captures all HTTP traffic.

### Viewing Network Logs

Network logs appear in console output:

```
✅ GET https://api.example.com/health [200] 156ms
⚠️ POST https://api.example.com/user [429] 1234ms
❌ GET https://cdn.example.com/asset [timeout] 5000ms
```

### Exporting Network Data

To export network logs for analysis:

```typescript
import { createNetworkInterceptor } from './fixtures/network';

test('example', async ({ page }) => {
  const interceptor = createNetworkInterceptor(page, logger);

  // ... run test ...

  // Export as CSV
  const csv = interceptor.exportCSV();
  await fs.writeFile('network.csv', csv);

  // Or JSON
  const json = interceptor.exportJSON();
  await fs.writeFile('network.json', JSON.stringify(json));
});
```
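
If you want the CSV shape without a live page in hand, the serialization itself is a few lines. This standalone sketch shows what an `exportCSV()`-style function could look like; the column set here is an assumption based on the metrics this document describes, not the interceptor's exact schema:

```typescript
interface NetworkLogEntry {
  method: string;
  url: string;
  status: number;
  elapsedMs: number;
}

// Serialize network log entries to CSV, quoting the URL field
// since URLs can contain commas.
function toCSV(entries: NetworkLogEntry[]): string {
  const header = 'method,url,status,elapsedMs';
  const rows = entries.map(e =>
    [e.method, `"${e.url}"`, e.status, e.elapsedMs].join(',')
  );
  return [header, ...rows].join('\n');
}

const csv = toCSV([
  { method: 'GET', url: 'https://api.example.com/health', status: 200, elapsedMs: 156 },
]);
console.log(csv);
// method,url,status,elapsedMs
// GET,"https://api.example.com/health",200,156
```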

### Network Metrics Available

- **Request Headers**: Sanitized (auth tokens redacted)
- **Response Headers**: Sanitized
- **Status Code**: HTTP response code
- **Duration**: Total request time
- **Request Size**: Bytes sent
- **Response Size**: Bytes received
- **Content Type**: Response MIME type
- **Redirect Chain**: Followed redirects
- **Errors**: Network error messages
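
The header sanitization mentioned above can be reproduced in isolation. A minimal sketch, assuming a small fixed list of sensitive header names (the real interceptor may cover more):

```typescript
const SENSITIVE = new Set(['authorization', 'cookie', 'x-api-key']);

// Return a copy of the headers with sensitive values replaced by [REDACTED].
function sanitizeHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SENSITIVE.has(name.toLowerCase()) ? '[REDACTED]' : value;
  }
  return out;
}

console.log(sanitizeHeaders({ Authorization: 'Bearer abc123', Accept: 'application/json' }));
// { Authorization: '[REDACTED]', Accept: 'application/json' }
```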

## Debug Output Formats

### Local Console Output (Colors)

When running locally, output uses ANSI colors for readability:

- 🔵 Blue: Steps
- 🟢 Green: Successful assertions/locators
- 🟡 Yellow: Warnings (missing locators, slow operations)
- 🔴 Red: Errors
- 🔵 Cyan: Network activity

### CI JSON Output

In CI, the same information is formatted as JSON for parsing:

```json
{
  "type": "step",
  "message": "├─ Navigate to home page",
  "timestamp": "2024-01-27T10:30:45.123Z"
}
```
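
Because each CI log line is a self-contained JSON object, post-processing is straightforward. A sketch that pulls just the step messages out of a newline-delimited log (field names follow the sample above):

```typescript
interface LogLine {
  type: string;
  message: string;
  timestamp: string;
}

// Parse newline-delimited JSON log output and keep only step entries.
function stepMessages(ndjson: string): string[] {
  return ndjson
    .split('\n')
    .filter(line => line.trim().length > 0)
    .map(line => JSON.parse(line) as LogLine)
    .filter(entry => entry.type === 'step')
    .map(entry => entry.message);
}

const log = [
  '{"type":"step","message":"├─ Navigate to home page","timestamp":"2024-01-27T10:30:45.123Z"}',
  '{"type":"network","message":"GET /health [200]","timestamp":"2024-01-27T10:30:46.001Z"}',
].join('\n');
console.log(stepMessages(log)); // [ '├─ Navigate to home page' ]
```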

## Common Debugging Scenarios

### Test is Timing Out

1. **Check traces**: Download and inspect with `npx playwright show-trace`
2. **Check logs**: Look for "⏳" (waiting) or "⏭️" (skipped) markers
3. **Check network**: Look for slow network requests in the network CSV
4. **Increase timeout**: Run with `--timeout=60000` locally to get more data

### Test is Flaky (Sometimes Fails)

1. **Check timing**: Look for operations near the 5000ms assertion timeout
2. **Check network**: Look for variable response times
3. **Check logs**: Search for race conditions ("expected X but got Y sometimes")
4. **Re-run locally**: Use `npm run e2e -- --grep="flaky test"` multiple times

### Test Fails on CI but Passes Locally

1. **Compare environments**: Check whether URLs or tokens differ (inspect `$PLAYWRIGHT_BASE_URL`)
2. **Check Docker logs**: Look for backend errors in `docker-logs-*.txt`
3. **Check timing**: CI machines are often slower; increase timeouts
4. **Check parallelization**: Try running shards sequentially locally

### Network Errors in Tests

1. **Check the network CSV**: Export and analyze request times
2. **Check status codes**: Look for 429 (rate limit), 503 (unavailable), etc.
3. **Check headers**: Verify auth tokens are being sent correctly (watch for `[REDACTED]`)
4. **Check logs**: Look for error messages in response bodies

## Performance Analysis

### Identifying Slow Tests

Tests slower than 5 seconds are automatically highlighted:

```bash
npm run e2e # Shows "Slow Tests (>5s)" in summary
```

And in CI:

```
⏱️ Slow Tests (>5s):
────────────────────────────────────────────────────────────
1. test name 12.43s
```

### Analyzing Step Duration

The debug logger tracks step duration:

```typescript
const logger = new DebugLogger('test-name');
logger.step('Load page', 456);
logger.step('Submit form', 234);

// Slowest operations automatically reported
logger.printSummary(); // Shows per-step breakdown
```
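
The `getSlowOperations(threshold?)` behavior can be sketched as a plain filter over recorded steps; this is a simplified stand-in for the logger's internal state, not its actual implementation:

```typescript
interface StepRecord {
  name: string;
  durationMs: number;
}

// Return steps whose duration exceeds the threshold, slowest first.
function slowOperations(steps: StepRecord[], thresholdMs = 5000): StepRecord[] {
  return steps
    .filter(s => s.durationMs > thresholdMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}

const slow = slowOperations([
  { name: 'Load page', durationMs: 456 },
  { name: 'Long-running step', durationMs: 12430 },
  { name: 'Submit form', durationMs: 6150 },
]);
console.log(slow.map(s => s.name)); // [ 'Long-running step', 'Submit form' ]
```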

### Network Performance

Check average response times by endpoint:

```typescript
const interceptor = createNetworkInterceptor(page, logger);
// ... run test ...
const avgTimes = interceptor.getAverageResponseTimeByPattern();
// {
//   'https://api.example.com/login': 234,
//   'https://api.example.com/health': 45,
// }
```
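
An aggregation like `getAverageResponseTimeByPattern()` boils down to group-and-average over the captured entries. A standalone sketch of that computation (grouping by exact URL rather than by pattern, for simplicity):

```typescript
interface Timed {
  url: string;
  elapsedMs: number;
}

// Average elapsed time per URL, rounded to whole milliseconds.
function averageByUrl(entries: Timed[]): Record<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const { url, elapsedMs } of entries) {
    const agg = sums.get(url) ?? { total: 0, count: 0 };
    agg.total += elapsedMs;
    agg.count += 1;
    sums.set(url, agg);
  }
  const out: Record<string, number> = {};
  for (const [url, { total, count }] of sums) out[url] = Math.round(total / count);
  return out;
}

const avg = averageByUrl([
  { url: 'https://api.example.com/login', elapsedMs: 200 },
  { url: 'https://api.example.com/login', elapsedMs: 268 },
  { url: 'https://api.example.com/health', elapsedMs: 45 },
]);
console.log(avg);
// { 'https://api.example.com/login': 234, 'https://api.example.com/health': 45 }
```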

## Environment Variables

### Debugging Environment Variables

These can be set to control logging:

```bash
# Enable debug namespace logging
DEBUG=charon:*,charon-test:*

# Enable Playwright debugging
PLAYWRIGHT_DEBUG=1

# Set a custom base URL
PLAYWRIGHT_BASE_URL=http://localhost:8080

# Set the CI log level
CI_LOG_LEVEL=verbose
```

### In GitHub Actions

Environment variables are set automatically for CI runs:

```yaml
env:
  DEBUG: 'charon:*,charon-test:*'
  PLAYWRIGHT_DEBUG: '1'
  CI_LOG_LEVEL: 'verbose'
```

## Testing Test Utilities Locally

### Test the Debug Logger

```typescript
import { DebugLogger } from '../utils/debug-logger';

const logger = new DebugLogger({
  testName: 'my-test',
  browser: 'chromium',
  file: 'test.spec.ts'
});

logger.step('Step 1', 100);
logger.network({
  method: 'GET',
  url: 'https://example.com',
  status: 200,
  elapsedMs: 156
});
logger.assertion('Check result', true);
logger.printSummary();
```

### Test the Network Interceptor

```typescript
import { createNetworkInterceptor } from '../fixtures/network';

test('network test', async ({ page }) => {
  const interceptor = createNetworkInterceptor(page);

  await page.goto('https://example.com');

  const csv = interceptor.exportCSV();
  console.log(csv);

  const slowRequests = interceptor.getSlowRequests(1000);
  console.log(`Requests >1s: ${slowRequests.length}`);
});
```

## UI Interaction Helpers

### Switch/Toggle Helpers

The `tests/utils/ui-helpers.ts` file provides helpers for reliable Switch/Toggle interactions.

**Problem**: Switch components use a hidden `<input>` with styled siblings, causing Playwright's `click()` to fail with "pointer events intercepted" errors.

**Solution**: Use the switch helper functions:

```typescript
import { clickSwitch, expectSwitchState, toggleSwitch } from '../utils/ui-helpers';

test('should toggle security features', async ({ page }) => {
  await page.goto('/settings');

  // ✅ GOOD: Click switch reliably
  const aclSwitch = page.getByRole('switch', { name: /acl/i });
  await clickSwitch(aclSwitch);

  // ✅ GOOD: Assert switch state
  await expectSwitchState(aclSwitch, true);

  // ✅ GOOD: Toggle and get new state
  const isEnabled = await toggleSwitch(aclSwitch);
  console.log(`ACL is now ${isEnabled ? 'enabled' : 'disabled'}`);

  // ❌ BAD: Direct click (fails in WebKit/Firefox)
  await aclSwitch.click({ force: true }); // Don't use force!
});
```

**Key Features**:

- Automatically finds the parent `<label>` element
- Scrolls the element into view (sticky-header aware)
- Cross-browser compatible (Chromium, Firefox, WebKit)
- No `force: true` or hard-coded waits needed

**When to Use**:

- Any test that clicks Switch/Toggle components
- Settings pages with enable/disable toggles
- Security dashboard module toggles (CrowdSec, ACL, WAF, Rate Limiting)
- Access lists and configuration toggles

**References**:

- [Implementation](../../tests/utils/ui-helpers.ts) - Full helper code
- [QA Report](../reports/qa_report.md) - Test results and validation

## Troubleshooting Debug Features

### Traces Not Captured

- Ensure `trace: 'on-first-retry'` or `trace: 'on'` is set in the config
- Check that the `test-results/` directory exists and is writable
- Verify the test actually fails (by default, traces are only captured on retry/failure)

### Logs Not Appearing

- Check whether you're running in CI (JSON format instead of colored output)
- Set the `DEBUG=charon:*` environment variable
- Ensure the `CI` environment variable is not set for local runs

### Reporter Errors

- Verify `tests/reporters/debug-reporter.ts` exists
- Check for TypeScript compilation errors: `npx tsc --noEmit`
- Run with `--reporter=list` as a fallback

## Further Reading

- [Playwright Debugging Docs](https://playwright.dev/docs/debug)
- [Playwright Trace Viewer](https://playwright.dev/docs/trace-viewer)
- [Test Reporters](https://playwright.dev/docs/test-reporters)
- [Debugging in VS Code](https://playwright.dev/docs/debug#vs-code-debugger)

---

`docs/testing/e2e-best-practices.md` (new file)

# E2E Testing Best Practices

**Purpose**: Document patterns and anti-patterns discovered during E2E test optimization to prevent future performance regressions and cross-browser failures.

**Target Audience**: Developers writing Playwright E2E tests for Charon.

## Table of Contents

- [Feature Flag Testing](#feature-flag-testing)
- [Cross-Browser Locators](#cross-browser-locators)
- [API Call Optimization](#api-call-optimization)
- [Performance Budget](#performance-budget)
- [Test Isolation](#test-isolation)

---

## Feature Flag Testing

### ❌ AVOID: Polling in beforeEach Hooks

**Anti-Pattern**:
```typescript
test.beforeEach(async ({ page, adminUser }) => {
  await loginUser(page, adminUser);
  await page.goto('/settings/system');

  // ⚠️ PROBLEM: Runs before EVERY test
  await waitForFeatureFlagPropagation(
    page,
    {
      'cerberus.enabled': true,
      'crowdsec.console_enrollment': false,
    },
    { timeout: 10000 } // 10s timeout per test
  );
});
```

**Why This Is Bad**:
- Polls the `/api/v1/feature-flags` endpoint **31 times** per test file (once per test)
- With 12 parallel processes (4 shards × 3 browsers), causes an API server bottleneck
- Adds 310s minimum execution time per shard (31 tests × 10s timeout)
- Most tests don't modify feature flags, so the polling is unnecessary

**Real Impact**: Test shards exceeded the 30-minute GitHub Actions timeout limit, blocking the CI/CD pipeline.

---

### ✅ PREFER: Per-Test Verification Only When Toggled

**Correct Pattern**:
```typescript
test('should toggle Cerberus feature', async ({ page }) => {
  await test.step('Navigate to system settings', async () => {
    await page.goto('/settings/system');
    await waitForLoadingComplete(page);
  });

  await test.step('Toggle Cerberus feature', async () => {
    const toggle = page.getByRole('switch', { name: /cerberus/i });
    const initialState = await toggle.isChecked();

    await retryAction(async () => {
      const response = await clickSwitchAndWaitForResponse(page, toggle, /\/feature-flags/);
      expect(response.ok()).toBeTruthy();

      // ✅ ONLY verify propagation AFTER toggling
      await waitForFeatureFlagPropagation(page, {
        'cerberus.enabled': !initialState,
      });
    });
  });
});
```

**Why This Is Better**:
- API calls reduced by **90%** (from 31 per shard to 3-5 per shard)
- Only tests that actually toggle flags incur the polling cost
- Faster test execution (shards complete in <15 minutes vs >30 minutes)
- Clearer test intent: verification is tied to the action that requires it

**Rule of Thumb**:
- **No toggle, no propagation check**: If a test reads flag state without changing it, don't poll.
- **Toggle = verify**: Always verify propagation after toggling to ensure the state change persisted.

---
## Cross-Browser Locators

### ❌ AVOID: Label-Only Locators

**Anti-Pattern**:
```typescript
await test.step('Verify Script path/command field appears', async () => {
  // ⚠️ PROBLEM: Fails in Firefox/WebKit
  const scriptField = page.getByLabel(/script.*path/i);
  await expect(scriptField).toBeVisible({ timeout: 10000 });
});
```

**Why This Fails**:
- Label locators depend on browser-specific DOM rendering
- Firefox/WebKit may render Label components differently than Chromium
- Regex patterns may not match if the label has extra whitespace or is split across nodes
- Results in a **70% pass rate** on Firefox/WebKit vs 100% on Chromium

---

### ✅ PREFER: Multi-Strategy Locators with Fallbacks

**Correct Pattern**:
```typescript
import { getFormFieldByLabel } from './utils/ui-helpers';

await test.step('Verify Script path/command field appears', async () => {
  // ✅ Tries multiple strategies until one succeeds
  const scriptField = getFormFieldByLabel(
    page,
    /script.*path/i,
    {
      placeholder: /dns-challenge\.sh/i,
      fieldId: 'field-script_path'
    }
  );
  await expect(scriptField.first()).toBeVisible();
});
```

**Helper Implementation** (`tests/utils/ui-helpers.ts`):
```typescript
/**
 * Get a form field with cross-browser label matching.
 * Tries multiple strategies: label, placeholder, id, aria-label.
 *
 * @param page - Playwright Page object
 * @param labelPattern - Regex or string to match label text
 * @param options - Fallback strategies (placeholder, fieldId)
 * @returns Locator that works across Chromium, Firefox, and WebKit
 */
export function getFormFieldByLabel(
  page: Page,
  labelPattern: string | RegExp,
  options: { placeholder?: string | RegExp; fieldId?: string } = {}
): Locator {
  const baseLocator = page.getByLabel(labelPattern);

  // Build fallback chain
  let locator = baseLocator;

  if (options.placeholder) {
    locator = locator.or(page.getByPlaceholder(options.placeholder));
  }

  if (options.fieldId) {
    locator = locator.or(page.locator(`#${options.fieldId}`));
  }

  // Fallback: role + label text nearby
  if (typeof labelPattern === 'string') {
    locator = locator.or(
      page.getByRole('textbox').filter({
        has: page.locator(`label:has-text("${labelPattern}")`),
      })
    );
  }

  return locator;
}
```

**Why This Is Better**:
- **95%+ pass rate** on Firefox/WebKit (up from 70%)
- Gracefully degrades through fallback strategies
- No browser-specific workarounds needed in test code
- A single helper enforces a consistent pattern across all tests

**When to Use**:
- Any test that interacts with form fields
- Tests that must pass on all three browsers (Chromium, Firefox, WebKit)
- Accessibility-critical tests (label locators are user-facing)

---

## API Call Optimization

### ❌ AVOID: Duplicate API Requests

**Anti-Pattern**:
```typescript
// Multiple tests in parallel all polling the same endpoint
test('test 1', async ({ page }) => {
  await waitForFeatureFlagPropagation(page, { flag: true }); // API call
});

test('test 2', async ({ page }) => {
  await waitForFeatureFlagPropagation(page, { flag: true }); // Duplicate API call
});
```

**Why This Is Bad**:
- 12 parallel workers all hit `/api/v1/feature-flags` simultaneously
- No request coalescing or caching
- The API server degrades under concurrent load
- Tests time out due to slow responses

---

### ✅ PREFER: Request Coalescing with Worker Isolation

**Correct Pattern** (`tests/utils/wait-helpers.ts`):
```typescript
// Cache in-flight requests per worker
const inflightRequests = new Map<string, Promise<Record<string, boolean>>>();

function generateCacheKey(
  expectedFlags: Record<string, boolean>,
  workerIndex: number
): string {
  // Sort keys to ensure {a:true, b:false} === {b:false, a:true}
  const sortedFlags = Object.keys(expectedFlags)
    .sort()
    .reduce((acc, key) => {
      acc[key] = expectedFlags[key];
      return acc;
    }, {} as Record<string, boolean>);

  // Include worker index to isolate parallel processes
  return `${workerIndex}:${JSON.stringify(sortedFlags)}`;
}

export async function waitForFeatureFlagPropagation(
  page: Page,
  expectedFlags: Record<string, boolean>,
  options: FeatureFlagPropagationOptions = {}
): Promise<Record<string, boolean>> {
  const workerIndex = test.info().parallelIndex;
  const cacheKey = generateCacheKey(expectedFlags, workerIndex);

  // Return existing promise if already in flight
  if (inflightRequests.has(cacheKey)) {
    console.log(`[CACHE HIT] Worker ${workerIndex}: ${cacheKey}`);
    return inflightRequests.get(cacheKey)!;
  }

  console.log(`[CACHE MISS] Worker ${workerIndex}: ${cacheKey}`);

  // Poll API endpoint (existing logic)...
}
```
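
The sorted-key property is easy to verify in isolation. A standalone copy of the key generator demonstrates that flag order does not change the cache key while the worker index does:

```typescript
// Standalone copy of the cache-key logic for demonstration.
function generateCacheKey(expectedFlags: Record<string, boolean>, workerIndex: number): string {
  // Sort keys so {a:true, b:false} and {b:false, a:true} produce the same key.
  const sortedFlags = Object.keys(expectedFlags)
    .sort()
    .reduce((acc, key) => {
      acc[key] = expectedFlags[key];
      return acc;
    }, {} as Record<string, boolean>);
  return `${workerIndex}:${JSON.stringify(sortedFlags)}`;
}

const a = generateCacheKey({ 'cerberus.enabled': true, 'uptime.enabled': false }, 2);
const b = generateCacheKey({ 'uptime.enabled': false, 'cerberus.enabled': true }, 2);
const c = generateCacheKey({ 'cerberus.enabled': true, 'uptime.enabled': false }, 3);
console.log(a === b); // true  (order-insensitive)
console.log(a === c); // false (worker-isolated)
```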

**Why This Is Better**:
- **30-40% reduction** in duplicate API calls
- Multiple tests requesting the same state share one API call
- Worker isolation prevents cache collisions between parallel processes
- Sorted keys ensure semantic equivalence (`{a:true, b:false}` === `{b:false, a:true}`)

**Cache Behavior**:
- **Hit**: Another test in the same worker is already polling for the same state
- **Miss**: First test in the worker to request this state, or a different state requested
- **Clear**: Cache cleared after all tests in the worker complete (`test.afterAll()`)

---

## Performance Budget

### ❌ PROBLEM: Shards Exceeding Timeout

**Symptom**:
```bash
# GitHub Actions logs
Error: The operation was canceled.
Job duration: 31m 45s (exceeds 30m limit)
```

**Root Causes**:
1. Feature flag polling in beforeEach (31 tests × 10s = 310s minimum)
2. API bottleneck under parallel load
3. Slow browser startup in the CI environment
4. Network latency for external resources

---

### ✅ SOLUTION: Enforce 15-Minute Budget Per Shard

**CI Configuration** (`.github/workflows/e2e-tests.yml`):
```yaml
- name: Verify shard performance budget
  if: always()
  run: |
    SHARD_DURATION=$((SHARD_END - SHARD_START))
    MAX_DURATION=900  # 15 minutes = 900 seconds

    if [[ $SHARD_DURATION -gt $MAX_DURATION ]]; then
      echo "::error::Shard exceeded performance budget: ${SHARD_DURATION}s > ${MAX_DURATION}s"
      echo "::error::Investigate slow tests or API bottlenecks"
      exit 1
    fi

    echo "✅ Shard completed within budget: ${SHARD_DURATION}s"
```
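
The budget arithmetic can be tried outside CI. A standalone shell sketch with hard-coded timestamps; in the workflow, `SHARD_START` and `SHARD_END` would be captured with `date +%s` around the test run:

```shell
# Simulated timestamps; in CI these come from `date +%s` calls.
SHARD_START=1000
SHARD_END=1600
MAX_DURATION=900   # 15-minute budget in seconds

SHARD_DURATION=$((SHARD_END - SHARD_START))

if [ "$SHARD_DURATION" -gt "$MAX_DURATION" ]; then
  echo "over budget: ${SHARD_DURATION}s > ${MAX_DURATION}s"
else
  echo "within budget: ${SHARD_DURATION}s"
fi
```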

**Why This Is Better**:
- **Early detection** of performance regressions in CI
- Forces developers to optimize slow tests before merge
- Prevents accumulation of "death by a thousand cuts" slowdowns
- A clear failure message directs investigation to the bottleneck

**How to Debug Timeouts**:
1. **Check metrics**: Review API call counts in the test output
   ```bash
   grep "CACHE HIT\|CACHE MISS" test-output.log
   ```
2. **Profile locally**: Instrument slow helpers
   ```typescript
   const startTime = Date.now();
   await waitForLoadingComplete(page);
   console.log(`Loading took ${Date.now() - startTime}ms`);
   ```
3. **Isolate shard**: Run the failing shard locally to reproduce
   ```bash
   npx playwright test --shard=2/4 --project=firefox
   ```

---

## Test Isolation

### ❌ AVOID: State Leakage Between Tests

**Anti-Pattern**:
```typescript
test('enable Cerberus', async ({ page }) => {
  await toggleCerberus(page, true);
  // ⚠️ PROBLEM: Doesn't restore state
});

test('ACL settings require Cerberus', async ({ page }) => {
  // Assumes Cerberus is enabled from the previous test
  await page.goto('/settings/acl');
  // ❌ FLAKY: Fails if the first test didn't run or failed
});
```

**Why This Is Bad**:
- Tests depend on execution order (serial execution works, parallel fails)
- Flakiness when running with `--workers=4` or `--repeat-each=5`
- Hard to debug failures (the root cause is in a different test file)

---

### ✅ PREFER: Explicit State Restoration

**Correct Pattern**:
```typescript
test.afterEach(async ({ page }) => {
  await test.step('Restore default feature flag state', async () => {
    const defaultFlags = {
      'cerberus.enabled': true,
      'crowdsec.console_enrollment': false,
      'uptime.enabled': false,
    };

    // Direct API call to reset flags (no polling needed)
    for (const [flag, value] of Object.entries(defaultFlags)) {
      await page.evaluate(async ({ flag, value }) => {
        await fetch(`/api/v1/feature-flags/${flag}`, {
          method: 'PUT',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ enabled: value }),
        });
      }, { flag, value });
    }
  });
});
```

**Why This Is Better**:
- **Zero inter-test dependencies**: Tests can run in any order
- Passes randomization testing: `--repeat-each=5 --workers=4`
- Explicit cleanup makes state management visible in the code
- Fast restoration (no polling required, direct API call)
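
The restore loop above resets every default unconditionally. If you wanted to skip no-op requests, the set of flags that actually needs restoring is a simple diff; this is a sketch, not part of the existing helpers:

```typescript
// Return only the flags whose current value differs from the desired default.
// A flag missing from `current` is treated as needing restoration.
function flagsToRestore(
  current: Record<string, boolean>,
  defaults: Record<string, boolean>
): Array<[string, boolean]> {
  return Object.entries(defaults).filter(([flag, value]) => current[flag] !== value);
}

const pending = flagsToRestore(
  { 'cerberus.enabled': false, 'uptime.enabled': false },
  { 'cerberus.enabled': true, 'crowdsec.console_enrollment': false, 'uptime.enabled': false }
);
console.log(pending);
// [ [ 'cerberus.enabled', true ], [ 'crowdsec.console_enrollment', false ] ]
```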

**Validation Command**:
```bash
# Verify test isolation with randomization
npx playwright test tests/settings/system-settings.spec.ts \
  --repeat-each=5 \
  --workers=4 \
  --project=chromium

# Should pass consistently regardless of execution order
```

---

## Robust Assertions for Dynamic Content

### ❌ AVOID: Boolean Logic on Transient States

**Anti-Pattern**:
```typescript
const hasEmptyMessage = await emptyCellMessage.isVisible().catch(() => false);
const hasTable = await table.isVisible().catch(() => false);
expect(hasEmptyMessage || hasTable).toBeTruthy();
```

**Why This Is Bad**:
- Fails during the split second where neither element is fully visible (loading transitions).
- Playwright's auto-retrying logic is bypassed by the `catch()` block.
- Leads to flaky "false negatives" where both checks return false before content loads.

### ✅ PREFER: Locator Composition with `.or()`

**Correct Pattern**:
```typescript
await expect(
  page.getByRole('table').or(page.getByText(/no.*certificates.*found/i))
).toBeVisible({ timeout: 10000 });
```

**Why This Is Better**:
- Leverages Playwright's built-in **auto-retry** mechanism.
- Waits for *either* condition to become true.
- Handles loading spinners and layout shifts gracefully.
- Reduces boilerplate code.

---

## Resilient Actions

### ❌ AVOID: Fixed Timeouts or Custom Loops

**Anti-Pattern**:
```typescript
// Flaky custom retry loop
for (let i = 0; i < 3; i++) {
  try {
    await action();
    break;
  } catch (e) {
    await page.waitForTimeout(1000);
  }
}
```

### ✅ PREFER: `.toPass()` for Verification Loops

**Correct Pattern**:
```typescript
await expect(async () => {
  const response = await request.post('/endpoint');
  expect(response.ok()).toBeTruthy();
}).toPass({
  intervals: [1000, 2000, 5000],
  timeout: 15_000
});
```
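
Stripped of timing, the retry semantics of `.toPass()` reduce to "keep calling the callback until it stops throwing". A synchronous sketch of that shape; the real `.toPass()` also sleeps between attempts and enforces a wall-clock deadline rather than an attempt count:

```typescript
// Retry a callback until it stops throwing or attempts are exhausted.
// Returns the number of attempts used; rethrows the last error on failure.
function retryUntilPass(fn: () => void, maxAttempts: number): number {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      fn();
      return attempt;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

let calls = 0;
const attempts = retryUntilPass(() => {
  calls++;
  if (calls < 3) throw new Error('not ready yet');
}, 5);
console.log(attempts); // 3
```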

**Why This Is Better**:
- Built-in assertion retry logic.
- Configurable backoff intervals.
- Cleaner syntax for verifying eventual success (e.g. a valid API response after background processing).

---

## Summary Checklist

Before writing E2E tests, verify:

- [ ] **Feature flags**: Only poll after toggling, not in beforeEach
- [ ] **Locators**: Use `getFormFieldByLabel()` for form fields
- [ ] **API calls**: Check for cache hit/miss logs, expect >30% hit rate
- [ ] **Performance**: Local execution <5 minutes, CI shard <15 minutes
- [ ] **Isolation**: Add `afterEach` cleanup if a test modifies state
- [ ] **Cross-browser**: Test passes on all three browsers (Chromium, Firefox, WebKit)

---

## References

- **Implementation Details**: See `docs/plans/current_spec.md` (Fix 3.3)
- **Helper Library**: `tests/utils/ui-helpers.ts`
- **Playwright Config**: `playwright.config.js`
- **CI Workflow**: `.github/workflows/e2e-tests.yml`

---

**Last Updated**: 2026-02-02

---

`docs/testing/e2e-dns-provider-triage-report.md` (new file)
# DNS Provider E2E Test Triage Report

**Date**: 2026-01-15
**Agent**: QA_Security
**Phase**: Phase 4 — E2E Coverage + Regression Safety

## Executive Summary

Successfully triaged and fixed the Playwright E2E tests for the DNS Provider feature. All tests now pass, with 52 passing and 3 conditionally skipped (expected behavior).

## Test Results

### Before Fixes
| Status | Count |
|--------|-------|
| ❌ Failed | 7 |
| ✅ Passed | 45 |
| ⏭️ Skipped | 3 |

### After Fixes
| Status | Count |
|--------|-------|
| ❌ Failed | 0 |
| ✅ Passed | 52 |
| ⏭️ Skipped | 3 |

## Test Files Summary

### 1. `tests/auth.setup.ts`
| Test | Status |
|------|--------|
| authenticate | ✅ Pass |

### 2. `tests/dns-provider-types.spec.ts`
**API Tests:**
| Test | Status |
|------|--------|
| GET /dns-providers/types returns all built-in and custom providers | ✅ Pass |
| Each provider type has required fields | ✅ Pass |
| Manual provider type has correct configuration | ✅ Pass |
| Webhook provider type has URL field | ✅ Pass |
| RFC2136 provider type has server and key fields | ✅ Pass |
| Script provider type has command/path field | ✅ Pass |

**UI Tests:**
| Test | Status |
|------|--------|
| Provider selector shows all provider types in dropdown | ✅ Pass |
| Provider selector displays provider description | ✅ Pass |
| Provider types keyboard navigation | ✅ Pass (Fixed) |
| Manual type selection shows correct fields | ✅ Pass |
| Webhook type selection shows URL field | ✅ Pass (Fixed) |
| RFC2136 type selection shows server field | ✅ Pass (Fixed) |
| Script type selection shows script path field | ✅ Pass |

### 3. `tests/dns-provider-crud.spec.ts`
**Create Provider:**
| Test | Status |
|------|--------|
| Create Manual DNS provider | ✅ Pass |
| Create Webhook DNS provider | ✅ Pass |
| Validation errors for missing required fields | ✅ Pass |
| Validate webhook URL format | ✅ Pass |

**Provider List:**
| Test | Status |
|------|--------|
| Display provider list or empty state | ✅ Pass |
| Show Add Provider button | ✅ Pass |
| Show provider details in list | ✅ Pass |

**Edit Provider:**
| Test | Status |
|------|--------|
| Open edit dialog for existing provider | ⏭️ Skipped (conditional) |
| Update provider name | ⏭️ Skipped (conditional) |

**Delete Provider:**
| Test | Status |
|------|--------|
| Show delete confirmation dialog | ⏭️ Skipped (conditional) |

**API Operations:**
| Test | Status |
|------|--------|
| List providers via API | ✅ Pass |
| Create provider via API | ✅ Pass |
| Reject invalid provider type via API | ✅ Pass |
| Get single provider via API | ✅ Pass |

**Form Accessibility:**
| Test | Status |
|------|--------|
| Form has accessible labels | ✅ Pass |
| Keyboard navigation in form | ✅ Pass |
| Errors announced to screen readers | ✅ Pass |

### 4. `tests/manual-dns-provider.spec.ts`
**Provider Selection Flow:**
| Test | Status |
|------|--------|
| Navigate to DNS Providers page | ✅ Pass |
| Show Add Provider button on DNS Providers page | ✅ Pass (Fixed) |
| Display Manual option in provider selection | ✅ Pass (Fixed) |

**Manual Challenge UI Display:**
| Test | Status |
|------|--------|
| Display challenge panel with required elements | ✅ Pass |
| Show record name and value fields | ✅ Pass |
| Display progress bar with time remaining | ✅ Pass |
| Display status indicator | ✅ Pass (Fixed) |

**Copy to Clipboard:**
| Test | Status |
|------|--------|
| Have accessible copy buttons | ✅ Pass |
| Show copied feedback on click | ✅ Pass |

**Verify Button Interactions:**
| Test | Status |
|------|--------|
| Have Check DNS Now button | ✅ Pass |
| Show loading state when checking DNS | ✅ Pass |
| Have Verify button with description | ✅ Pass |

**Accessibility Checks:**
| Test | Status |
|------|--------|
| Keyboard accessible interactive elements | ✅ Pass |
| Proper ARIA labels on copy buttons | ✅ Pass |
| Announce status changes to screen readers | ✅ Pass |
| Accessible form labels | ✅ Pass (Fixed) |
| Validate accessibility tree structure | ✅ Pass (Fixed) |

**Component Tests:**
| Test | Status |
|------|--------|
| Render all required challenge information | ✅ Pass |
| Handle expired challenge state | ✅ Pass |
| Handle verified challenge state | ✅ Pass |

**Error Handling:**
| Test | Status |
|------|--------|
| Display error message on verification failure | ✅ Pass |
| Handle network errors gracefully | ✅ Pass |

## Issues Fixed

### 1. URL Path Mismatch
**Issue**: `manual-dns-provider.spec.ts` used the `/dns-providers` URL while the frontend uses `/dns/providers`.

**Fix**: Updated all occurrences to use `/dns/providers`.

**Files Changed**: `tests/manual-dns-provider.spec.ts`

### 2. Button Selector Too Strict
**Issue**: Tests used `getByRole('button', { name: /add provider/i })` without `.first()`, which failed when multiple buttons matched.

**Fix**: Added `.first()` to handle both the header button and the empty-state button.

### 3. Dropdown Search Filter Test
**Issue**: A test tried to fill text into a combobox that doesn't support text input.

**Fix**: Changed the test to verify keyboard navigation instead.

**File**: `tests/dns-provider-types.spec.ts`

### 4. Dynamic Field Locators
**Issue**: Tests used `getByLabel(/url/i)`, but credential fields are rendered dynamically without proper labels.

**Fix**: Changed to locate fields by label text followed by input structure.

**Files Changed**: `tests/dns-provider-types.spec.ts`

### 5. Conditional Status Icon Test
**Issue**: A test expected an SVG icon in the status indicator, but the icon may not always be present.

**Fix**: Made the icon check conditional.

**File**: `tests/manual-dns-provider.spec.ts`

## Skipped Tests (Expected)

The following tests are conditionally skipped when no providers with edit/delete capabilities exist:

1. `should open edit dialog for existing provider`
2. `should update provider name`
3. `should show delete confirmation dialog`

This is expected behavior — these tests only run when provider cards with edit/delete buttons are present.

## Test Fixtures Created

Created `tests/fixtures/dns-providers.ts` with:
- Mock provider types (built-in and custom)
- Mock provider data for different types
- Mock API responses
- Mock manual challenge data
- Helper functions for test provider creation/cleanup

## Recommendations

### Next Steps

1. **Enable Edit/Delete Tests**: Add test data setup to ensure providers with edit buttons exist before running edit/delete tests.

2. **Add Plugin Provider Tests**: When external plugins are loaded, add tests for plugin-specific provider types (e.g., PowerDNS).

3. **Expand Accessibility Tests**: Add more accessibility tests for:
   - Focus trap in dialogs
   - Screen reader announcements for success/error states
   - High contrast mode support

4. **Add Visual Regression Tests**: Consider adding visual regression tests for provider cards and forms.

### Known Limitations

1. **Dynamic Fields**: Credential fields are rendered dynamically from the API response. Tests rely on label text patterns rather than stable IDs.

2. **Mock Challenge Panel**: The manual challenge panel tests use conditional checks, since the challenge UI requires an active certificate issuance.

3. **No Real Plugin Tests**: Tests for external plugin providers require actual `.so` files to be loaded.

## Verification Command

```bash
# Run all DNS Provider E2E tests
PLAYWRIGHT_BASE_URL=http://localhost:8080 npm run e2e

# Or with the E2E Docker environment
docker compose -f .docker/compose/docker-compose.e2e.yml up -d
PLAYWRIGHT_BASE_URL=http://localhost:8080 npm run e2e
```

## Test Coverage Summary

| Category | Tests | Passing |
|----------|-------|---------|
| API Endpoints | 10 | 10 |
| UI Navigation | 6 | 6 |
| Provider CRUD | 8 | 5 (+3 conditional) |
| Manual Challenge | 11 | 11 |
| Accessibility | 9 | 9 |
| Error Handling | 2 | 2 |
| **Total** | **55** | **52 (+3 conditional)** |

---

**Status**: ✅ Phase 4 E2E Test Triage Complete
504
docs/testing/e2e-test-writing-guide.md
Normal file
# E2E Test Writing Guide

**Last Updated**: February 2, 2026

This guide provides best practices for writing maintainable, performant, and cross-browser compatible Playwright E2E tests for Charon.

---

## Table of Contents

- [Cross-Browser Compatibility](#cross-browser-compatibility)
- [Performance Best Practices](#performance-best-practices)
- [Feature Flag Testing](#feature-flag-testing)
- [Test Isolation](#test-isolation)
- [Common Patterns](#common-patterns)
- [Troubleshooting](#troubleshooting)

---

## Cross-Browser Compatibility

### Why It Matters

Charon E2E tests run across **Chromium**, **Firefox**, and **WebKit** (the Safari engine). Differences in how browsers handle label association, form controls, and DOM queries can cause tests to pass in one browser but fail in others.

**Phase 2 Fix**: The `getFormFieldByLabel()` helper was added to address cross-browser label-matching inconsistencies.

### Problem: Browser-Specific Label Handling

Different browsers handle `getByLabel()` differently:

- **Chromium**: Lenient label matching; searches visible text aggressively
- **Firefox**: Stricter matching; requires an explicit `for` attribute or nesting
- **WebKit**: Strictest; often fails on complex label structures

**Example Failure**:

```typescript
// ❌ FRAGILE: Fails in Firefox/WebKit when the label structure is complex
const scriptPath = page.getByLabel(/script.*path/i);
await scriptPath.fill('/path/to/script.sh');
```

**Error (Firefox/WebKit)**:
```
TimeoutError: locator.fill: Timeout 5000ms exceeded.
=========================== logs ===========================
waiting for getByLabel(/script.*path/i)
============================================================
```

### Solution: Multi-Tier Fallback Strategy

Use the `getFormFieldByLabel()` helper for robust cross-browser field location:

```typescript
import { getFormFieldByLabel } from '../utils/ui-helpers';

// ✅ ROBUST: 4-tier fallback strategy
const scriptPath = getFormFieldByLabel(
  page,
  /script.*path/i,
  {
    placeholder: /dns-challenge\.sh/i,
    fieldId: 'field-script_path'
  }
);
await scriptPath.fill('/path/to/script.sh');
```

**Fallback Chain**:

1. **Primary**: `getByLabel(labelPattern)` — Standard label association
2. **Fallback 1**: `getByPlaceholder(options.placeholder)` — Placeholder text match
3. **Fallback 2**: `locator('#' + options.fieldId)` — Direct ID selector
4. **Fallback 3**: Role-based with label proximity — `getByRole('textbox')` near the label text
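The fallback chain boils down to "try each strategy in order, take the first hit." A minimal, self-contained sketch of that selection logic is shown below; the `Strategy` type and `resolveField()` are illustrative stand-ins, not the real helper, which lives in `tests/utils/ui-helpers.ts` and operates on Playwright locators rather than raw selectors.

```typescript
// Illustrative sketch of the multi-tier fallback selection (hypothetical API).
type Strategy = { name: string; locate: () => string | null };

function resolveField(strategies: Strategy[]): string {
  for (const s of strategies) {
    const selector = s.locate(); // null means "not applicable / not found"
    if (selector !== null) return selector;
  }
  throw new Error('No strategy could locate the field');
}

// Example: the label lookup fails, so the placeholder tier wins.
const selector = resolveField([
  { name: 'label', locate: () => null },
  { name: 'placeholder', locate: () => '[placeholder*="dns-challenge.sh"]' },
  { name: 'id', locate: () => '#field-script_path' },
]);
console.log(selector); // [placeholder*="dns-challenge.sh"]
```

The key property is that later tiers only run when earlier tiers return nothing, so a field that resolves by label behaves identically in all browsers.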
### When to Use `getFormFieldByLabel()`

✅ **Use when**:
- Form fields have complex label structures (nested elements, icons, tooltips)
- Tests fail in Firefox/WebKit but pass in Chromium
- Label text is dynamic or internationalized
- Multiple fields have similar labels

❌ **Don't use when**:
- Standard `getByLabel()` works reliably across all browsers
- The field has a unique `data-testid` or `name` attribute
- The field is the only one of its type on the page

---

## Performance Best Practices

### Avoid Unnecessary API Polling

**Problem**: Excessive API polling adds latency and increases flakiness.

**Before Phase 2 (❌ Inefficient)**:

```typescript
test.beforeEach(async ({ page }) => {
  await page.goto('/settings/system');

  // ❌ BAD: Polls the API even when flags are already correct
  await waitForFeatureFlagPropagation(page, {
    'cerberus.enabled': false,
    'crowdsec.enabled': false
  });
});

test('Enable Cerberus', async ({ page }) => {
  const toggle = page.getByRole('switch', { name: /cerberus/i });
  await clickSwitch(toggle);

  // ❌ BAD: Another full polling cycle
  await waitForFeatureFlagPropagation(page, {
    'cerberus.enabled': true
  });
});
```

**After Phase 2 (✅ Optimized)**:

```typescript
test.afterEach(async ({ page, request }) => {
  // ✅ GOOD: Clean up once at the end
  await request.post('/api/v1/settings/restore', {
    data: { module: 'system', defaults: true }
  });
});

test('Enable Cerberus', async ({ page }) => {
  const toggle = page.getByRole('switch', { name: /cerberus/i });

  await test.step('Toggle Cerberus on', async () => {
    await clickSwitch(toggle);

    // ✅ GOOD: Only poll when state changes
    await waitForFeatureFlagPropagation(page, {
      'cerberus.enabled': true
    });
  });

  await test.step('Verify toggle reflects new state', async () => {
    await expectSwitchState(toggle, true);
  });
});
```

### How Conditional Polling Works

The `waitForFeatureFlagPropagation()` helper includes an **early-exit optimization** (Phase 2 Fix 2.3):

```typescript
// Before polling, check whether flags are already in the expected state
const currentState = await page.evaluate(async () => {
  const res = await fetch('/api/v1/feature-flags');
  return res.json();
});

if (alreadyMatches(currentState, expectedFlags)) {
  console.log('[POLL] Already in expected state - skipping poll');
  return currentState; // Exit immediately
}

// Otherwise, start polling...
```

**Performance Impact**: ~50% reduction in polling iterations for tests that restore defaults in `afterEach`.
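The early-exit snippet above calls an `alreadyMatches()` check without showing it. A minimal sketch of what such a comparison could look like is below — every expected flag must exist in the current state with the same value; extra flags in the current state are ignored. This is an assumed shape, not the helper's actual source.

```typescript
// Hypothetical sketch of the alreadyMatches() comparison used by the early exit.
function alreadyMatches(
  current: Record<string, boolean>,
  expected: Record<string, boolean>
): boolean {
  // Only the flags the caller asked about are compared.
  return Object.entries(expected).every(([key, value]) => current[key] === value);
}

const state = { 'cerberus.enabled': true, 'crowdsec.enabled': false };
console.log(alreadyMatches(state, { 'cerberus.enabled': true }));  // true
console.log(alreadyMatches(state, { 'crowdsec.enabled': true }));  // false
```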
### Request Coalescing (Worker Isolation)

**Problem**: Parallel Playwright workers polling the same flag state cause redundant API calls.

**Solution**: The helper caches in-flight requests per worker:

```typescript
// Worker 1: Waits for {cerberus: false, crowdsec: false}
// Worker 2: Waits for {cerberus: false, crowdsec: false}

// Without coalescing: 2 separate polling loops (30+ API calls)
// With coalescing: 1 shared promise (15 API calls, cached per worker)
```

**Cache Key Format**:
```
[worker_index]:[sorted_flags_json]
```

**Example**:
```
Worker 0: "0:{\"feature.cerberus.enabled\":false,\"feature.crowdsec.enabled\":false}"
Worker 1: "1:{\"feature.cerberus.enabled\":false,\"feature.crowdsec.enabled\":false}"
```
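Sorting the flag keys before serializing is what makes the cache key order-independent, so `{a, b}` and `{b, a}` coalesce to the same in-flight promise. A sketch of building that key (the `buildCacheKey()` name is illustrative; the real logic lives in `tests/utils/wait-helpers.ts`):

```typescript
// Sketch of the per-worker cache key described above (hypothetical function name).
function buildCacheKey(workerIndex: number, flags: Record<string, boolean>): string {
  // Sort keys so that the same flag set always serializes identically
  const sorted = Object.fromEntries(
    Object.entries(flags).sort(([a], [b]) => a.localeCompare(b))
  );
  return `${workerIndex}:${JSON.stringify(sorted)}`;
}

const key = buildCacheKey(0, {
  'feature.crowdsec.enabled': false,
  'feature.cerberus.enabled': false,
});
console.log(key);
// 0:{"feature.cerberus.enabled":false,"feature.crowdsec.enabled":false}
```

Because the worker index is part of the key, two workers never share a promise — coalescing only merges duplicate waits within one worker.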
---

## Feature Flag Testing

### When to Use `waitForFeatureFlagPropagation()`

✅ **Use when**:
- A test **toggles** a feature flag via the UI
- Backend state changes and you need to verify propagation
- The test depends on a specific flag state being active

❌ **Don't use when**:
- Setting up initial state in `beforeEach` (use the API directly instead)
- Flags haven't changed since the last verification
- The test doesn't modify flags

### Pattern: Cleanup in `afterEach`

**Best Practice**: Restore defaults at the end, not the beginning.

```typescript
test.describe('System Settings', () => {
  test.afterEach(async ({ request }) => {
    // Restore all defaults once
    await request.post('/api/v1/settings/restore', {
      data: { module: 'system', defaults: true }
    });
  });

  test('Enable and disable Cerberus', async ({ page }) => {
    await page.goto('/settings/system');

    const toggle = page.getByRole('switch', { name: /cerberus/i });

    // Test starts from whatever state exists (defaults expected)
    await clickSwitch(toggle);
    await waitForFeatureFlagPropagation(page, { 'cerberus.enabled': true });

    await clickSwitch(toggle);
    await waitForFeatureFlagPropagation(page, { 'cerberus.enabled': false });
  });
});
```

**Why This Works**:
- Each test starts from known defaults (restored by the previous test's `afterEach`)
- No unnecessary polling in `beforeEach`
- Cleanup happens once, not N times per describe block

### Handling the Config Reload Overlay

When toggling security features (Cerberus, ACL, WAF), Caddy reloads its configuration. A blocking overlay prevents interactions during the reload.

**The Helper Handles This Automatically**:

```typescript
export async function waitForFeatureFlagPropagation(...) {
  // ✅ Wait for the overlay to disappear before polling
  const overlay = page.locator('[data-testid="config-reload-overlay"]');
  await overlay.waitFor({ state: 'hidden', timeout: 10000 })
    .catch(() => {});

  // Now safe to poll the API...
}
```

**You don't need to manually wait for the overlay** — it's handled by:
- `clickSwitch()`
- `clickAndWaitForResponse()`
- `waitForFeatureFlagPropagation()`

---

## Test Isolation

### Why Isolation Matters

Tests running in parallel can interfere with each other if they:
- Share mutable state (database, config files, feature flags)
- Don't clean up resources
- Rely on global defaults

**Phase 2 Fix**: Added explicit `afterEach` cleanup to restore defaults.

### Pattern: Isolated Flag Toggles

**Before (❌ Not Isolated)**:

```typescript
test('Test A', async ({ page }) => {
  // Enable Cerberus
  // ...
  // ❌ Leaves the flag enabled for the next test
});

test('Test B', async ({ page }) => {
  // Assumes Cerberus is disabled
  // ❌ May fail if Test A ran first
});
```

**After (✅ Isolated)**:

```typescript
test.afterEach(async ({ request }) => {
  await request.post('/api/v1/settings/restore', {
    data: { module: 'system', defaults: true }
  });
});

test('Test A', async ({ page }) => {
  // Enable Cerberus
  // ...
  // ✅ Cleanup restores defaults after the test
});

test('Test B', async ({ page }) => {
  // ✅ Starts from known defaults
});
```

### Cleanup Order of Operations

```
1. Test A runs → modifies state
2. Test A finishes → afterEach runs → restores defaults
3. Test B runs → starts from defaults
4. Test B finishes → afterEach runs → restores defaults
```

---

## Common Patterns

### Toggle a Feature Flag

```typescript
test('Enable and verify feature', async ({ page }) => {
  await page.goto('/settings/system');

  const toggle = page.getByRole('switch', { name: /feature name/i });

  await test.step('Enable feature', async () => {
    await clickSwitch(toggle);
    await waitForFeatureFlagPropagation(page, { 'feature.enabled': true });
  });

  await test.step('Verify UI reflects state', async () => {
    await expectSwitchState(toggle, true);
    await expect(page.getByText(/feature active/i)).toBeVisible();
  });
});
```

### Form Field with a Cross-Browser Locator

```typescript
test('Fill DNS provider config', async ({ page }) => {
  await page.goto('/dns-providers/new');

  await test.step('Select provider type', async () => {
    await page.getByRole('combobox', { name: /type/i }).click();
    await page.getByRole('option', { name: /manual/i }).click();
  });

  await test.step('Fill script path', async () => {
    const scriptPath = getFormFieldByLabel(
      page,
      /script.*path/i,
      {
        placeholder: /dns-challenge\.sh/i,
        fieldId: 'field-script_path'
      }
    );
    await scriptPath.fill('/usr/local/bin/dns-challenge.sh');
  });
});
```

### Wait for an API Response After an Action

```typescript
test('Create resource and verify', async ({ page }) => {
  await page.goto('/resources');

  const createBtn = page.getByRole('button', { name: /create/i });

  const response = await clickAndWaitForResponse(
    page,
    createBtn,
    /\/api\/v1\/resources/,
    { status: 201 }
  );

  expect(response.ok()).toBeTruthy();

  const json = await response.json();
  await expect(page.getByText(json.name)).toBeVisible();
});
```

---

## Troubleshooting

### Test Fails in Firefox/WebKit, Passes in Chromium

**Symptom**: `TimeoutError: locator.fill: Timeout 5000ms exceeded`

**Cause**: The label-matching strategy differs between browsers.

**Fix**: Use `getFormFieldByLabel()` with fallbacks:

```typescript
// ❌ BEFORE
await page.getByLabel(/field name/i).fill('value');

// ✅ AFTER
const field = getFormFieldByLabel(page, /field name/i, {
  placeholder: /enter value/i
});
await field.fill('value');
```

### Feature Flag Polling Times Out

**Symptom**: `Feature flag propagation timeout after 120 attempts (60000ms)`

**Causes**:
1. Backend not updating flags
2. Config reload overlay blocking the UI
3. Database transaction not committed

**Fix Steps**:
1. Check backend logs: does `PUT /api/v1/feature-flags` succeed?
2. Check the overlay state: is `[data-testid="config-reload-overlay"]` stuck visible?
3. Increase the timeout temporarily: `waitForFeatureFlagPropagation(page, flags, { timeout: 120000 })`
4. Add a retry wrapper: use `retryAction()` for transient failures

```typescript
await retryAction(async () => {
  await clickSwitch(toggle);
  await waitForFeatureFlagPropagation(page, { 'flag': true });
}, { maxAttempts: 3, baseDelay: 2000 });
```
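For orientation, a retry helper of this shape is essentially a loop with exponential backoff between attempts. The sketch below is a plausible self-contained implementation under that assumption — the real `retryAction()` in `tests/utils/wait-helpers.ts` may differ in signature and details.

```typescript
// Illustrative sketch of retryAction() with exponential backoff (not the real helper).
async function retryAction<T>(
  action: () => Promise<T>,
  opts: { maxAttempts: number; baseDelay: number }
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts) {
        // Exponential backoff: baseDelay, 2x, 4x, ...
        const delay = opts.baseDelay * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage: an action that succeeds on the third attempt
let calls = 0;
const result = await retryAction(async () => {
  calls++;
  if (calls < 3) throw new Error('transient failure');
  return 'ok';
}, { maxAttempts: 3, baseDelay: 10 });
console.log(result, calls); // ok 3
```

Note that the last error is re-thrown once all attempts are exhausted, so a genuinely broken flow still fails the test with the original cause.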
### Switch Click Intercepted

**Symptom**: `Error: Element is not visible` or `click intercepted by overlay`

**Cause**: The config reload overlay or a sticky header is blocking the interaction.

**Fix**: Use the `clickSwitch()` helper (it handles the overlay automatically):

```typescript
// ❌ BEFORE
await page.getByRole('switch').click({ force: true }); // Bad!

// ✅ AFTER
await clickSwitch(page.getByRole('switch', { name: /feature/i }));
```

### Test Pollution (Fails When Run in Suite, Passes Alone)

**Symptom**: Test passes when run solo (`--grep`), fails in the full suite.

**Cause**: A previous test left state modified (flags enabled, resources created).

**Fix**: Add cleanup in `afterEach`:

```typescript
test.afterEach(async ({ request }) => {
  // Restore defaults
  await request.post('/api/v1/settings/restore', {
    data: { module: 'system', defaults: true }
  });
});
```

---

## Reference

### Helper Functions

| Helper | Purpose | File |
|--------|---------|------|
| `getFormFieldByLabel()` | Cross-browser form field locator | `tests/utils/ui-helpers.ts` |
| `clickSwitch()` | Reliable switch/toggle interaction | `tests/utils/ui-helpers.ts` |
| `expectSwitchState()` | Assert switch checked state | `tests/utils/ui-helpers.ts` |
| `waitForFeatureFlagPropagation()` | Poll for flag state | `tests/utils/wait-helpers.ts` |
| `clickAndWaitForResponse()` | Atomic click + wait | `tests/utils/wait-helpers.ts` |
| `retryAction()` | Retry with exponential backoff | `tests/utils/wait-helpers.ts` |

### Best Practices Summary

1. ✅ **Cross-Browser**: Use `getFormFieldByLabel()` for complex label structures
2. ✅ **Performance**: Only poll when flags change, not in `beforeEach`
3. ✅ **Isolation**: Restore defaults in `afterEach`, not `beforeEach`
4. ✅ **Reliability**: Prefer semantic locators (`getByRole`, `getByLabel`) over CSS selectors
5. ✅ **Debugging**: Use `test.step()` for clear failure context

---

**See Also**:
- [Testing README](./README.md) — Quick reference and debugging guide
- [Switch Component Testing](./README.md#-switchtoggle-component-testing) — Detailed switch patterns
- [Debugging Guide](./debugging-guide.md) — Troubleshooting slow/flaky tests
143
docs/testing/security-helpers.md
Normal file
# Security Test Helpers

Helper utilities for managing security module state during E2E tests.

## Overview

The security helpers module (`tests/utils/security-helpers.ts`) provides utilities for:

- Capturing and restoring security module state
- Toggling individual security modules (ACL, WAF, Rate Limiting, CrowdSec)
- Ensuring test isolation without ACL deadlock

## Problem Solved

During E2E testing, if ACL is left enabled from a previous test run (e.g., due to a test failure), it creates a **deadlock**:

1. ACL blocks API requests → returns 403 Forbidden
2. Global cleanup can't run → API blocked
3. Auth setup fails → tests skip
4. Manual intervention required to reset volumes

The security helpers solve this by using Playwright's `test.afterAll()` fixture to guarantee cleanup even when tests fail.

## Usage

### Capture and Restore Pattern

```typescript
import { captureSecurityState, restoreSecurityState } from '../utils/security-helpers';
import { request } from '@playwright/test';

let originalState;

test.beforeAll(async ({ request: reqFixture }) => {
  originalState = await captureSecurityState(reqFixture);
});

test.afterAll(async () => {
  const cleanup = await request.newContext({ baseURL: '...' });
  try {
    await restoreSecurityState(cleanup, originalState);
  } finally {
    await cleanup.dispose();
  }
});
```

### Toggle a Security Module

```typescript
import { setSecurityModuleEnabled } from '../utils/security-helpers';

await setSecurityModuleEnabled(request, 'acl', true);
await setSecurityModuleEnabled(request, 'waf', false);
```

### With Guaranteed Cleanup

```typescript
import { withSecurityEnabled } from '../utils/security-helpers';

test.describe('ACL Tests', () => {
  let cleanup: () => Promise<void>;

  test.beforeAll(async ({ request }) => {
    cleanup = await withSecurityEnabled(request, { acl: true, cerberus: true });
  });

  test.afterAll(async () => {
    await cleanup();
  });

  test('should enforce ACL', async ({ page }) => {
    // ACL is now enabled; test enforcement
  });
});
```

## Functions

| Function | Purpose |
|----------|---------|
| `getSecurityStatus` | Fetch current security module states |
| `setSecurityModuleEnabled` | Toggle a specific module on/off |
| `captureSecurityState` | Snapshot all module states |
| `restoreSecurityState` | Restore to a captured snapshot |
| `withSecurityEnabled` | Enable modules with guaranteed cleanup |
| `disableAllSecurityModules` | Emergency reset |

## API Endpoints Used

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/v1/security/status` | GET | Returns the current state of all security modules |
| `/api/v1/settings` | POST | Toggle settings with `{ key: "...", value: "true/false" }` |

## Settings Keys

| Key | Values | Description |
|-----|--------|-------------|
| `security.acl.enabled` | `"true"` / `"false"` | Toggle ACL enforcement |
| `security.waf.enabled` | `"true"` / `"false"` | Toggle WAF enforcement |
| `security.rate_limit.enabled` | `"true"` / `"false"` | Toggle Rate Limiting |
| `security.crowdsec.enabled` | `"true"` / `"false"` | Toggle CrowdSec |
| `feature.cerberus.enabled` | `"true"` / `"false"` | Master toggle for all security |
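The capture/restore pattern amounts to diffing a snapshot against the current state and posting only the settings that drifted. The sketch below illustrates that diff step; `diffSecurityState()` is a hypothetical name for illustration (the real `restoreSecurityState()` may restore unconditionally), but the keys and the stringified `"true"`/`"false"` values match the tables above.

```typescript
// Illustrative sketch: compute which settings need resetting after a test run.
type SecurityState = Record<string, boolean>;

function diffSecurityState(captured: SecurityState, current: SecurityState) {
  return Object.entries(captured)
    .filter(([key, value]) => current[key] !== value)
    // Shape each change as a POST /api/v1/settings payload (string values)
    .map(([key, value]) => ({ key, value: String(value) }));
}

const captured = { 'security.acl.enabled': false, 'security.waf.enabled': true };
const current = { 'security.acl.enabled': true, 'security.waf.enabled': true };
console.log(diffSecurityState(captured, current));
// [ { key: 'security.acl.enabled', value: 'false' } ]
```

Restoring only the drifted keys keeps the number of settings writes (and resulting Caddy config reloads) to a minimum.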
## Best Practices
|
||||
|
||||
1. **Always use `test.afterAll`** for cleanup - it runs even when tests fail
|
||||
2. **Capture state before modifying** - enables precise restoration
|
||||
3. **Enable Cerberus first** - it's the master toggle for all security modules
|
||||
4. **Don't toggle back in individual tests** - let `afterAll` handle cleanup
|
||||
5. **Use `withSecurityEnabled`** for the cleanest pattern
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ACL Deadlock Recovery
|
||||
|
||||
If the test suite is stuck due to ACL deadlock:
|
||||
|
||||
```bash
|
||||
# Check current security status
|
||||
curl http://localhost:8080/api/v1/security/status
|
||||
|
||||
# Manually disable ACL (requires auth)
|
||||
curl -X POST http://localhost:8080/api/v1/settings \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"key": "security.acl.enabled", "value": "false"}'
|
||||
```
|
||||
|
||||
### Complete Reset

Use `disableAllSecurityModules` in global setup to ensure a clean slate:

```typescript
import { request } from '@playwright/test';
import { disableAllSecurityModules } from './utils/security-helpers';

async function globalSetup() {
  const context = await request.newContext({ baseURL: '...' });
  await disableAllSecurityModules(context);
  await context.dispose();
}

export default globalSetup;
```

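For the setup above to run, the file must be registered in the Playwright config; a sketch (the `./global-setup` path is an assumption about where the file lives):

```typescript
// playwright.config.ts (excerpt); './global-setup' is an assumed path.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  globalSetup: require.resolve('./global-setup'),
});
```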
50 docs/testing/sprint1-improvements.md Normal file
@@ -0,0 +1,50 @@
# Sprint 1: E2E Test Improvements

*Last Updated: February 2, 2026*

## What We Fixed

During Sprint 1, we resolved critical issues affecting E2E test reliability and performance.

### Problem: Tests Were Timing Out

**What was happening**: Some tests would hang indefinitely or time out after 30 seconds, especially in CI/CD pipelines.

**Root cause**:
- Config reload overlay was blocking test interactions
- Feature flag propagation was too slow during high load
- API polling happened unnecessarily for every test

**What we did**:
1. Added smart detection to wait for config reloads to complete
2. Increased timeouts to accommodate slower environments
3. Implemented request caching to reduce redundant API calls

**Result**: Test pass rate increased from 96% to 100% ✅

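The request caching in step 3 can be sketched as a small TTL cache keyed by URL, so repeated status lookups within the TTL reuse one response instead of re-hitting the API. This is an illustrative sketch, not the repo's implementation:

```typescript
// Minimal TTL cache: repeated get() calls for the same key within ttlMs
// reuse the first fetched value instead of invoking the fetcher again.
class ResponseCache<T> {
  private entries = new Map<string, { value: T; expires: number }>();
  constructor(private ttlMs: number) {}

  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = await fetcher();
    this.entries.set(key, { value, expires: Date.now() + this.ttlMs });
    return value;
  }
}
```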
### Performance Improvements

- **Before**: System settings tests took 23 minutes
- **After**: Same tests now complete in 16 minutes
- **Improvement**: ~30% faster execution

### What You'll Notice

- Tests are more reliable and less likely to fail randomly
- CI/CD pipelines complete faster
- Fewer "Test timeout" errors in GitHub Actions logs

### For Developers

If you're writing new E2E tests, the helpers in `tests/utils/wait-helpers.ts` and `tests/utils/ui-helpers.ts` now automatically handle:

- Config reload overlays
- Feature flag propagation
- Switch component interactions

Follow the examples in `tests/settings/system-settings.spec.ts` for best practices.

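Helpers like these typically reduce to a generic polling loop: resolve as soon as a condition holds, throw when the deadline passes. A minimal sketch in that style (illustrative, not the actual code in `wait-helpers.ts`):

```typescript
// Generic polling helper: resolves once `check` returns true, throws on timeout.
async function waitFor(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

Waiting for a config reload overlay to clear or a feature flag to propagate then becomes a one-line `await waitFor(...)` call with an appropriate check.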
## Need Help?

- See [E2E Testing Troubleshooting Guide](../troubleshooting/e2e-tests.md)
- Review [Testing Best Practices](../testing/README.md)