Charon/docs/issues/created/20260202-manual-test-e2e-feature-flags.md

# Manual Test Plan: E2E Feature Flags Timeout Fix

**Created:** 2026-02-02
**Priority:** P1 - High
**Type:** Manual Testing
**Component:** E2E Tests, Feature Flags API
**Related PR:** #583

---

## Objective

Manually verify that the E2E test timeout fix works correctly in a real CI environment once the Playwright infrastructure issue has been resolved.

## Prerequisites

- [ ] Playwright deduplication issue resolved: `rm -rf node_modules && npm install && npm dedupe`
- [ ] E2E container rebuilt: `.github/skills/scripts/skill-runner.sh docker-rebuild-e2e`
- [ ] Container health check passing: `docker ps` shows `charon-e2e` as healthy
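
The health prerequisite can also be checked from a script instead of reading `docker ps` by eye. The following is a minimal sketch, assuming the `charon-e2e` image defines a Docker `HEALTHCHECK`; the function name `wait_for_healthy` and the default timeout values are illustrative, not part of the project.

```bash
# Poll Docker's health status for a container until it reports "healthy".
# Usage: wait_for_healthy <container> [timeout_seconds] [poll_interval_seconds]
# (hypothetical helper; the interval must be at least 1 second)
wait_for_healthy() {
  local container="$1" timeout="${2:-120}" interval="${3:-2}"
  local elapsed=0 status=""
  while [ "$elapsed" -lt "$timeout" ]; do
    # .State.Health.Status is only populated when the image defines a HEALTHCHECK.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)"
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$container not healthy after ${timeout}s (last status: ${status:-unknown})" >&2
  return 1
}
```

For example, `wait_for_healthy charon-e2e 180` would block for up to three minutes before the test scenarios are started.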

## Test Scenarios

### 1. Feature Flag Toggle Tests (Chromium)

**File:** `tests/settings/system-settings.spec.ts`

**Execute:**
```bash
npx playwright test tests/settings/system-settings.spec.ts --project=chromium --workers=1 --retries=0
```

**Expected Results:**
- [ ] All 7 tests pass (4 refactored + 3 new)
- [ ] Zero timeout errors
- [ ] Test execution time: ≤5s per test
- [ ] Console shows retry attempts (if transient failures occur)

**Tests to Validate:**
1. [ ] `should toggle Cerberus security feature`
2. [ ] `should toggle CrowdSec console enrollment`
3. [ ] `should toggle uptime monitoring`
4. [ ] `should persist feature toggle changes`
5. [ ] `should handle concurrent toggle operations`
6. [ ] `should retry on 500 Internal Server Error`
7. [ ] `should fail gracefully after max retries exceeded`

### 2. Cross-Browser Validation

**Execute:**
```bash
npx playwright test tests/settings/system-settings.spec.ts --project=chromium --project=firefox --project=webkit
```

**Expected Results:**
- [ ] All browsers pass: Chromium, Firefox, WebKit
- [ ] No browser-specific timeout issues
- [ ] Consistent behavior across browsers

### 3. Performance Metrics Extraction

**Execute:**
```bash
docker logs charon-e2e 2>&1 | grep "\[METRICS\]"
```

**Expected Results:**
- [ ] Metrics logged for GET operations: `[METRICS] GET /feature-flags: {latency}ms`
- [ ] Metrics logged for PUT operations: `[METRICS] PUT /feature-flags: {latency}ms`
- [ ] P99 latency below 200ms (CI environment)
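
To compare the logged values against the P99 target, the latencies can be pulled out of the log lines and ranked. This is a rough sketch, assuming each line contains the latency as a plain number immediately followed by `ms`, as in the format shown above; `metrics_p99` is a hypothetical helper and uses the nearest-rank percentile definition.

```bash
# Print the nearest-rank P99 latency (in ms) for one HTTP method.
# Reads log lines on stdin. Usage: docker logs charon-e2e 2>&1 | metrics_p99 GET
metrics_p99() {
  local method="$1"
  # Keep only the matching [METRICS] lines, strip everything but the number, sort ascending.
  grep -F "[METRICS] ${method} /feature-flags:" \
    | sed -E 's#.*/feature-flags: *([0-9]+(\.[0-9]+)?)ms.*#\1#' \
    | sort -n \
    | awk '{ v[NR] = $1 }
           END {
             if (NR == 0) { print "no samples"; exit 1 }
             # Nearest rank: ceil(0.99 * N), computed with integer arithmetic.
             r = int((NR * 99 + 99) / 100)
             print v[r]
           }'
}
```

With few samples the P99 is simply the slowest request, so the result is more meaningful after the 10-run reliability test than after a single run.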

### 4. Reliability Test (10 Consecutive Runs)

**Execute:**
```bash
for i in {1..10}; do
  echo "Run $i of 10"
  # Test the exit status directly instead of inspecting $? afterwards
  if ! npx playwright test tests/settings/system-settings.spec.ts --project=chromium --workers=1 --retries=0; then
    echo "FAILED on run $i"
    break
  fi
done
```
```

**Expected Results:**
- [ ] 10/10 runs pass (100% pass rate)
- [ ] Zero timeout errors across all runs
- [ ] Retry attempts: <5% of operations
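
The loop above stops at the first failure, which is enough for a pass/fail verdict but does not show how often a flaky test fails. A possible variant that completes every run and reports the pass rate is sketched below; `repeat_and_tally` is a hypothetical helper name.

```bash
# Run a command N times, report the pass rate, and return non-zero if any run failed.
# Usage: repeat_and_tally <runs> <command> [args...]
repeat_and_tally() {
  local runs="$1"
  shift
  local passed=0 i=1
  while [ "$i" -le "$runs" ]; do
    echo "Run $i of $runs"
    if "$@"; then
      passed=$((passed + 1))
    else
      echo "FAILED on run $i"
    fi
    i=$((i + 1))
  done
  echo "Pass rate: $passed/$runs"
  [ "$passed" -eq "$runs" ]
}
```

It would be invoked as `repeat_and_tally 10 npx playwright test tests/settings/system-settings.spec.ts --project=chromium --workers=1 --retries=0`.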

### 5. UI Verification

**Manual Steps:**
1. [ ] Navigate to `/settings/system` in browser
2. [ ] Toggle Cerberus security feature switch
3. [ ] Verify toggle animation completes
4. [ ] Verify "Saved" notification appears
5. [ ] Refresh page
6. [ ] Verify toggle state persists

**Expected Results:**
- [ ] UI responsive (<1s toggle feedback)
- [ ] State changes reflect immediately
- [ ] No console errors

## Bug Discovery Focus

**Look for potential issues in:**

### Backend Performance
- [ ] Feature flags endpoint latency spikes (>500ms)
- [ ] Database lock timeouts
- [ ] Transaction rollback failures
- [ ] Memory leaks after repeated toggles

### Test Resilience
- [ ] Retry logic not triggering on transient failures
- [ ] Polling timeouts on slow CI runners
- [ ] Race conditions in concurrent toggle test
- [ ] Hard-coded wait remnants causing flakiness
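
One quick way to look for hard-coded wait remnants is to search the specs for Playwright's `page.waitForTimeout`. This is only a sketch: it assumes the specs live under `tests/` as `.ts` files, it will not catch other fixed delays such as a bare `setTimeout`, and `find_hardcoded_waits` is a hypothetical helper name.

```bash
# List every waitForTimeout( call in TypeScript files under a directory.
# Usage: find_hardcoded_waits [dir]   (defaults to "tests")
# Exits non-zero when nothing is found, following grep's convention.
find_hardcoded_waits() {
  local dir="${1:-tests}"
  # -r recurse, -n print line numbers, -E extended regex; --include limits the search to *.ts
  grep -rnE --include='*.ts' 'waitForTimeout\(' "$dir"
}
```

Each hit is printed as `file:line:content`, so an empty result for `tests/settings` suggests that no fixed waits remain in the refactored spec.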

### Edge Cases
- [ ] Concurrent toggles causing data corruption
- [ ] Network failures not handled gracefully
- [ ] Max retries not throwing expected error
- [ ] Initial state mismatch in `beforeEach`

## Success Criteria

- [ ] All checks in the sections above pass without issues
- [ ] Zero timeout errors in 10 consecutive runs
- [ ] Performance metrics confirm <200ms P99 latency
- [ ] Cross-browser compatibility verified
- [ ] No new bugs discovered during manual testing

## Failure Handling

**If any test fails:**

1. **Capture Evidence:**
   - Screenshot of failure
   - Full test output (no truncation)
   - `docker logs charon-e2e` output
   - Network/console logs from browser DevTools

2. **Analyze Root Cause:**
   - Is it a code defect or infrastructure issue?
   - Is it reproducible locally?
   - Does it happen in all browsers?

3. **Take Action:**
   - **Code Defect:** Reopen issue, describe failure, assign to developer
   - **Infrastructure:** Document in known issues, create follow-up ticket
   - **Flaky Test:** Investigate retry logic, increase timeouts if justified
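
The scriptable parts of the evidence capture step can be bundled so that nothing is lost between a failure and the bug report. The sketch below is one way to do it, assuming the failing spec is `tests/settings/system-settings.spec.ts` on Chromium; `capture_evidence` and the output file names are hypothetical. Screenshots and browser DevTools logs still have to be collected by hand.

```bash
# Save container logs and a traced re-run of the spec into one directory.
# Usage: capture_evidence [output_dir]
capture_evidence() {
  local out_dir="${1:-evidence-$(date +%Y%m%d-%H%M%S)}"
  mkdir -p "$out_dir"
  # Container logs (stdout and stderr) as they are at the time of the failure.
  docker logs charon-e2e > "$out_dir/charon-e2e.log" 2>&1 \
    || echo "docker logs failed" >&2
  # Re-run the spec with tracing enabled and keep the complete, untruncated output.
  npx playwright test tests/settings/system-settings.spec.ts \
    --project=chromium --workers=1 --retries=0 --trace on \
    > "$out_dir/playwright-output.txt" 2>&1 \
    || echo "Re-run exited with status $?" >> "$out_dir/playwright-output.txt"
  echo "Evidence written to $out_dir"
}
```

Attach the resulting directory to the reopened issue or follow-up ticket together with the manually captured screenshots.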

## Notes

- Run tests during low CI load times for accurate performance measurement
- Use `--headed` flag for UI verification: `npx playwright test --headed`
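- If tests fail, open the HTML report with `npx playwright show-report`; traces recorded during the run can be opened from the report or directly with `npx playwright show-trace <path-to-trace.zip>`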

---

**Assigned To:** QA Team
**Estimated Time:** 2-3 hours
**Due Date:** Within 24 hours of Playwright infrastructure fix