260 lines
8.2 KiB
Markdown
260 lines
8.2 KiB
Markdown
# E2E Test Triage - Quick Start Guide
|
|
|
|
## Status: ROOT CAUSE IDENTIFIED ✅
|
|
|
|
**Date:** February 3, 2026
|
|
**Test Suite:** Cross-browser Playwright (Chromium, Firefox, WebKit)
|
|
**Total Tests:** 2,737
|
|
|
|
---
|
|
|
|
## Critical Finding
|
|
|
|
### Design Intent (CONFIRMED)
|
|
Cerberus should be **ENABLED** during E2E tests to test the break glass feature:
|
|
- Cerberus framework stays **ON** throughout test suite
|
|
- All Cerberus tests run first (toggles, navigation, etc.)
|
|
- **Break glass test runs LAST** to validate emergency override
|
|
|
|
### Problem
|
|
13 E2E tests are **conditionally skipping** at runtime because:
|
|
- Toggle buttons are **disabled** when Cerberus framework is off
|
|
- Emergency security reset is disabling **Cerberus itself** (bug)
|
|
- Tests check `toggle.isDisabled()` and skip when true
|
|
|
|
### Root Cause
|
|
The `/emergency/security-reset` endpoint (used in `tests/global-setup.ts`) is incorrectly disabling:
|
|
- ✓ `security.acl.enabled` = false ← CORRECT (module disabled)
|
|
- ✓ `security.waf.enabled` = false ← CORRECT (module disabled)
|
|
- ✓ `security.rate_limit.enabled` = false ← CORRECT (module disabled)
|
|
- ✓ `security.crowdsec.enabled` = false ← CORRECT (module disabled)
|
|
- ❌ **`feature.cerberus.enabled` = false** ← BUG (framework should stay enabled)
|
|
|
|
### Expected Behavior (CONFIRMED)
|
|
For E2E tests, Cerberus should be:
|
|
- **Framework Enabled:** `feature.cerberus.enabled` = true (allows testing)
|
|
- **Modules Disabled:** Individual security modules off for clean state
|
|
- **Test Order:** All Cerberus tests → Break glass test (LAST)
|
|
|
|
---
|
|
|
|
## Affected Tests (13 Total)
|
|
|
|
### Category 1: Security Dashboard - Toggle Actions (5 tests)
|
|
- Test 77: Toggle ACL enabled/disabled
|
|
- Test 78: Toggle WAF enabled/disabled
|
|
- Test 79: Toggle Rate Limiting enabled/disabled
|
|
- Test 80/214: Persist toggle state after page reload
|
|
|
|
### Category 2: Security Dashboard - Navigation (4 tests)
|
|
- Test 81/250: Navigate to CrowdSec config
|
|
- Test 83/309: Navigate to WAF config
|
|
- Test 84/335: Navigate to Rate Limiting config
|
|
|
|
### Category 3: Rate Limiting Config (1 test)
|
|
- Test 57/70: Toggle rate limiting on/off
|
|
|
|
### Category 4: CrowdSec Decisions (13 tests - SKIP OK)
|
|
- Tests 42-53: Explicitly skipped with `test.describe.skip()`
|
|
- **No action needed** - these require CrowdSec running (integration tests)
|
|
|
|
---
|
|
|
|
## Immediate Action Plan
|
|
|
|
### Step 1: Verify Current State ✅ CONFIRMED
|
|
**Design Intent:** Cerberus should be enabled for break glass testing
|
|
**Test Flow:** Global setup → All Cerberus tests → Break glass test (LAST)
|
|
**Problem:** Emergency reset incorrectly disables Cerberus framework
|
|
|
|
Run diagnostic script:
|
|
```bash
|
|
./scripts/diagnose-test-env.sh
|
|
```
|
|
|
|
Expected output shows:
|
|
- ✓ Container running
|
|
- ✗ Cerberus state unknown (no settings endpoint on emergency server)
|
|
|
|
### Step 2: Check Cerberus State via Main API
|
|
```bash
|
|
# Requires authentication - use your test user credentials
|
|
curl -H "Authorization: Bearer <token>" http://localhost:8080/api/v1/security/config | jq '.cerberus // .feature.cerberus'
|
|
```
|
|
|
|
### Step 3: Review Emergency Handler Code (INVESTIGATE)
|
|
File: `backend/internal/api/handlers/emergency_handler.go`
|
|
|
|
Find the `SecurityReset` function and check what it's disabling:
|
|
```bash
|
|
grep -A 20 "func.*SecurityReset" backend/internal/api/handlers/emergency_handler.go
|
|
```
|
|
|
|
### Step 4: Fix Emergency Reset Bug
|
|
|
|
**Goal:** Keep Cerberus enabled while disabling security modules
|
|
|
|
**Option A: Backend Fix (Recommended)**
|
|
Modify `emergency_handler.go` SecurityReset to:
|
|
- ❌ **REMOVE:** `feature.cerberus.enabled` = false (this is the bug)
|
|
- ✓ **KEEP:** Disable individual security modules
|
|
- ✓ **KEEP:** `security.{acl,waf,rate_limit,crowdsec}.enabled` = false
|
|
|
|
Expected behavior:
|
|
- Framework stays enabled for testing
|
|
- Modules disabled for clean slate
|
|
- Break glass test can run last to validate emergency override
|
|
|
|
**Option B: Frontend State Reset (Workaround)**
|
|
Add post-reset call in `tests/global-setup.ts`:
|
|
```typescript
|
|
// After emergency reset, re-enable Cerberus framework
|
|
// (Workaround for backend bug where reset disables Cerberus)
|
|
const enableResponse = await requestContext.patch('/api/v1/settings', {
|
|
data: { 'feature.cerberus.enabled': true }
|
|
});
|
|
```
|
|
|
|
### Step 5: Validate Fix
|
|
```bash
|
|
# Rebuild E2E environment
|
|
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
|
|
|
|
# Run affected tests
|
|
npm run test:e2e -- tests/security/security-dashboard.spec.ts --project=chromium
|
|
|
|
# Verify toggles are enabled (not disabled)
|
|
# Tests should now executed, not skip
|
|
```
|
|
|
|
---
|
|
|
|
## Files to Review/Modify
|
|
|
|
### Backend
|
|
- [ ] `backend/internal/api/handlers/emergency_handler.go` - SecurityReset function
|
|
- [ ] `backend/internal/services/settings_service.go` - Settings update logic
|
|
|
|
### Tests
|
|
- [ ] `tests/global-setup.ts` - Emergency reset call
|
|
- [ ] `tests/security/security-dashboard.spec.ts` - Toggle tests
|
|
- [ ] `tests/security/rate-limiting.spec.ts` - Toggle test
|
|
|
|
### Documentation
|
|
- [x] `docs/plans/e2e-test-triage-plan.md` - Full triage plan (COMPLETE)
|
|
- [x] `scripts/diagnose-test-env.sh` - Diagnostic script (CREATED)
|
|
- [ ] Update after fix is implemented
|
|
|
|
---
|
|
|
|
## Success Criteria
|
|
|
|
### Before Fix
|
|
```
|
|
Running 2737 tests using 2 workers
|
|
✓ pass - Tests that run successfully
|
|
- skip - Tests that conditionally skip (13 affected)
|
|
```
|
|
|
|
### After Fix
|
|
```
|
|
Running 2737 tests using 2 workers
|
|
✓ pass - All 13 previously-skipped tests now execute
|
|
- skip - Only explicitly skipped tests (test.describe.skip)
|
|
```
|
|
|
|
### Validation Checklist
|
|
- [ ] Emergency reset keeps Cerberus enabled
|
|
- [ ] Emergency reset disables all security modules
|
|
- [ ] Toggle buttons are enabled (not disabled)
|
|
- [ ] Configure buttons are enabled (not disabled)
|
|
- [ ] Tests execute instead of skip
|
|
- [ ] Tests pass (or have actionable failures)
|
|
- [ ] CI/CD pipeline updated if needed
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
1. **Investigate Backend** (30 min)
|
|
- Read `emergency_handler.go` SecurityReset implementation
|
|
- Determine what settings are being modified
|
|
- Document current behavior
|
|
|
|
2. **Design Fix** (30 min)
|
|
- Choose Option A (backend) or Option B (frontend)
|
|
- Create implementation plan
|
|
- Review with team if needed
|
|
|
|
3. **Implement Fix** (1-2 hours)
|
|
- Make code changes
|
|
- Add comments explaining the behavior
|
|
- Test locally
|
|
|
|
4. **Validate** (30 min)
|
|
- Run full E2E test suite
|
|
- Check that skip count decreases
|
|
- Verify tests pass
|
|
|
|
5. **Document** (15 min)
|
|
- Update triage plan with resolution
|
|
- Add decision record
|
|
- Update any affected documentation
|
|
|
|
---
|
|
|
|
## Risk Assessment
|
|
|
|
### Low Risk Fix (Recommended)
|
|
- Modify emergency reset to keep Cerberus enabled
|
|
- Only affects test environment behavior
|
|
- No production impact
|
|
- Easy to rollback
|
|
|
|
### Rollback Plan
|
|
```bash
|
|
git checkout HEAD^ -- backend/internal/api/handlers/emergency_handler.go
|
|
git checkout HEAD^ -- tests/global-setup.ts
|
|
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
|
|
```
|
|
|
|
---
|
|
|
|
## Questions for Investigation
|
|
|
|
1. **Why does emergency reset disable Cerberus?** ✅ ANSWERED
|
|
- **CONFIRMED BUG:** This is incorrect behavior
|
|
- **Design Intent:** Cerberus should stay enabled for break glass testing
|
|
- **Fix Required:** Remove line that disables `feature.cerberus.enabled`
|
|
|
|
2. **What should the test environment look like?** ✅ ANSWERED
|
|
- **Cerberus Framework:** ENABLED (`feature.cerberus.enabled` = true)
|
|
- **Security Modules:** DISABLED (clean slate for testing)
|
|
- **Test Order:** All Cerberus tests → Break glass test (LAST)
|
|
|
|
3. **Are there other tests affected?**
|
|
- Run full suite after fix
|
|
- Check for cascading test failures
|
|
- Validate assumptions
|
|
|
|
---
|
|
|
|
## Resources
|
|
|
|
- **Full Triage Plan:** [docs/plans/e2e-test-triage-plan.md](../plans/e2e-test-triage-plan.md)
|
|
- **Diagnostic Script:** [scripts/diagnose-test-env.sh](../../scripts/diagnose-test-env.sh)
|
|
- **Global Setup:** [tests/global-setup.ts](../../tests/global-setup.ts)
|
|
- **Emergency Handler:** [backend/internal/api/handlers/emergency_handler.go](../../backend/internal/api/handlers/emergency_handler.go)
|
|
- **Testing Instructions:** [.github/instructions/testing.instructions.md](../../.github/instructions/testing.instructions.md)
|
|
|
|
---
|
|
|
|
## Contact
|
|
|
|
For questions or clarification, see:
|
|
- Triage Plan: Full analysis and categorization
|
|
- Testing protocols: E2E test execution guidelines
|
|
- Architecture docs: Cerberus security framework
|
|
|
|
**Status:** Ready for implementation - Root cause identified
|