chore: git cache cleanup
This commit is contained in:
259
docs/plans/archive/e2e-test-triage-quick-start.md
Normal file
259
docs/plans/archive/e2e-test-triage-quick-start.md
Normal file
@@ -0,0 +1,259 @@
|
||||
# E2E Test Triage - Quick Start Guide
|
||||
|
||||
## Status: ROOT CAUSE IDENTIFIED ✅
|
||||
|
||||
**Date:** February 3, 2026
|
||||
**Test Suite:** Cross-browser Playwright (Chromium, Firefox, WebKit)
|
||||
**Total Tests:** 2,737
|
||||
|
||||
---
|
||||
|
||||
## Critical Finding
|
||||
|
||||
### Design Intent (CONFIRMED)
|
||||
Cerberus should be **ENABLED** during E2E tests to test the break glass feature:
|
||||
- Cerberus framework stays **ON** throughout test suite
|
||||
- All Cerberus tests run first (toggles, navigation, etc.)
|
||||
- **Break glass test runs LAST** to validate emergency override
|
||||
|
||||
### Problem
|
||||
13 E2E tests are **conditionally skipping** at runtime because:
|
||||
- Toggle buttons are **disabled** when Cerberus framework is off
|
||||
- Emergency security reset is disabling **Cerberus itself** (bug)
|
||||
- Tests check `toggle.isDisabled()` and skip when true
|
||||
|
||||
### Root Cause
|
||||
The `/emergency/security-reset` endpoint (used in `tests/global-setup.ts`) is incorrectly disabling:
|
||||
- ✓ `security.acl.enabled` = false ← CORRECT (module disabled)
|
||||
- ✓ `security.waf.enabled` = false ← CORRECT (module disabled)
|
||||
- ✓ `security.rate_limit.enabled` = false ← CORRECT (module disabled)
|
||||
- ✓ `security.crowdsec.enabled` = false ← CORRECT (module disabled)
|
||||
- ❌ **`feature.cerberus.enabled` = false** ← BUG (framework should stay enabled)
|
||||
|
||||
### Expected Behavior (CONFIRMED)
|
||||
For E2E tests, Cerberus should be:
|
||||
- **Framework Enabled:** `feature.cerberus.enabled` = true (allows testing)
|
||||
- **Modules Disabled:** Individual security modules off for clean state
|
||||
- **Test Order:** All Cerberus tests → Break glass test (LAST)
|
||||
|
||||
---
|
||||
|
||||
## Affected Tests (13 Total)
|
||||
|
||||
### Category 1: Security Dashboard - Toggle Actions (5 tests)
|
||||
- Test 77: Toggle ACL enabled/disabled
|
||||
- Test 78: Toggle WAF enabled/disabled
|
||||
- Test 79: Toggle Rate Limiting enabled/disabled
|
||||
- Test 80/214: Persist toggle state after page reload
|
||||
|
||||
### Category 2: Security Dashboard - Navigation (4 tests)
|
||||
- Test 81/250: Navigate to CrowdSec config
|
||||
- Test 83/309: Navigate to WAF config
|
||||
- Test 84/335: Navigate to Rate Limiting config
|
||||
|
||||
### Category 3: Rate Limiting Config (1 test)
|
||||
- Test 57/70: Toggle rate limiting on/off
|
||||
|
||||
### Category 4: CrowdSec Decisions (13 tests - SKIP OK)
|
||||
- Tests 42-53: Explicitly skipped with `test.describe.skip()`
|
||||
- **No action needed** - these require CrowdSec running (integration tests)
|
||||
|
||||
---
|
||||
|
||||
## Immediate Action Plan
|
||||
|
||||
### Step 1: Verify Current State ✅ CONFIRMED
|
||||
**Design Intent:** Cerberus should be enabled for break glass testing
|
||||
**Test Flow:** Global setup → All Cerberus tests → Break glass test (LAST)
|
||||
**Problem:** Emergency reset incorrectly disables Cerberus framework
|
||||
|
||||
Run diagnostic script:
|
||||
```bash
|
||||
./scripts/diagnose-test-env.sh
|
||||
```
|
||||
|
||||
Expected output shows:
|
||||
- ✓ Container running
|
||||
- ✗ Cerberus state unknown (no settings endpoint on emergency server)
|
||||
|
||||
### Step 2: Check Cerberus State via Main API
|
||||
```bash
|
||||
# Requires authentication - use your test user credentials
|
||||
curl -H "Authorization: Bearer <token>" http://localhost:8080/api/v1/security/config | jq '.cerberus // .feature.cerberus'
|
||||
```
|
||||
|
||||
### Step 3: Review Emergency Handler Code (INVESTIGATE)
|
||||
File: `backend/internal/api/handlers/emergency_handler.go`
|
||||
|
||||
Find the `SecurityReset` function and check what it's disabling:
|
||||
```bash
|
||||
grep -A 20 "func.*SecurityReset" backend/internal/api/handlers/emergency_handler.go
|
||||
```
|
||||
|
||||
### Step 4: Fix Emergency Reset Bug
|
||||
|
||||
**Goal:** Keep Cerberus enabled while disabling security modules
|
||||
|
||||
**Option A: Backend Fix (Recommended)**
|
||||
Modify `emergency_handler.go` SecurityReset to:
|
||||
- ❌ **REMOVE:** `feature.cerberus.enabled` = false (this is the bug)
|
||||
- ✓ **KEEP:** Disable individual security modules
|
||||
- ✓ **KEEP:** `security.{acl,waf,rate_limit,crowdsec}.enabled` = false
|
||||
|
||||
Expected behavior:
|
||||
- Framework stays enabled for testing
|
||||
- Modules disabled for clean slate
|
||||
- Break glass test can run last to validate emergency override
|
||||
|
||||
**Option B: Frontend State Reset (Workaround)**
|
||||
Add post-reset call in `tests/global-setup.ts`:
|
||||
```typescript
|
||||
// After emergency reset, re-enable Cerberus framework
|
||||
// (Workaround for backend bug where reset disables Cerberus)
|
||||
const enableResponse = await requestContext.patch('/api/v1/settings', {
|
||||
data: { 'feature.cerberus.enabled': true }
|
||||
});
|
||||
```
|
||||
|
||||
### Step 5: Validate Fix
|
||||
```bash
|
||||
# Rebuild E2E environment
|
||||
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
|
||||
|
||||
# Run affected tests
|
||||
npm run test:e2e -- tests/security/security-dashboard.spec.ts --project=chromium
|
||||
|
||||
# Verify toggles are enabled (not disabled)
|
||||
# Tests should now executed, not skip
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Files to Review/Modify
|
||||
|
||||
### Backend
|
||||
- [ ] `backend/internal/api/handlers/emergency_handler.go` - SecurityReset function
|
||||
- [ ] `backend/internal/services/settings_service.go` - Settings update logic
|
||||
|
||||
### Tests
|
||||
- [ ] `tests/global-setup.ts` - Emergency reset call
|
||||
- [ ] `tests/security/security-dashboard.spec.ts` - Toggle tests
|
||||
- [ ] `tests/security/rate-limiting.spec.ts` - Toggle test
|
||||
|
||||
### Documentation
|
||||
- [x] `docs/plans/e2e-test-triage-plan.md` - Full triage plan (COMPLETE)
|
||||
- [x] `scripts/diagnose-test-env.sh` - Diagnostic script (CREATED)
|
||||
- [ ] Update after fix is implemented
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
### Before Fix
|
||||
```
|
||||
Running 2737 tests using 2 workers
|
||||
✓ pass - Tests that run successfully
|
||||
- skip - Tests that conditionally skip (13 affected)
|
||||
```
|
||||
|
||||
### After Fix
|
||||
```
|
||||
Running 2737 tests using 2 workers
|
||||
✓ pass - All 13 previously-skipped tests now execute
|
||||
- skip - Only explicitly skipped tests (test.describe.skip)
|
||||
```
|
||||
|
||||
### Validation Checklist
|
||||
- [ ] Emergency reset keeps Cerberus enabled
|
||||
- [ ] Emergency reset disables all security modules
|
||||
- [ ] Toggle buttons are enabled (not disabled)
|
||||
- [ ] Configure buttons are enabled (not disabled)
|
||||
- [ ] Tests execute instead of skip
|
||||
- [ ] Tests pass (or have actionable failures)
|
||||
- [ ] CI/CD pipeline updated if needed
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Investigate Backend** (30 min)
|
||||
- Read `emergency_handler.go` SecurityReset implementation
|
||||
- Determine what settings are being modified
|
||||
- Document current behavior
|
||||
|
||||
2. **Design Fix** (30 min)
|
||||
- Choose Option A (backend) or Option B (frontend)
|
||||
- Create implementation plan
|
||||
- Review with team if needed
|
||||
|
||||
3. **Implement Fix** (1-2 hours)
|
||||
- Make code changes
|
||||
- Add comments explaining the behavior
|
||||
- Test locally
|
||||
|
||||
4. **Validate** (30 min)
|
||||
- Run full E2E test suite
|
||||
- Check that skip count decreases
|
||||
- Verify tests pass
|
||||
|
||||
5. **Document** (15 min)
|
||||
- Update triage plan with resolution
|
||||
- Add decision record
|
||||
- Update any affected documentation
|
||||
|
||||
---
|
||||
|
||||
## Risk Assessment
|
||||
|
||||
### Low Risk Fix (Recommended)
|
||||
- Modify emergency reset to keep Cerberus enabled
|
||||
- Only affects test environment behavior
|
||||
- No production impact
|
||||
- Easy to rollback
|
||||
|
||||
### Rollback Plan
|
||||
```bash
|
||||
git checkout HEAD^ -- backend/internal/api/handlers/emergency_handler.go
|
||||
git checkout HEAD^ -- tests/global-setup.ts
|
||||
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Questions for Investigation
|
||||
|
||||
1. **Why does emergency reset disable Cerberus?** ✅ ANSWERED
|
||||
- **CONFIRMED BUG:** This is incorrect behavior
|
||||
- **Design Intent:** Cerberus should stay enabled for break glass testing
|
||||
- **Fix Required:** Remove line that disables `feature.cerberus.enabled`
|
||||
|
||||
2. **What should the test environment look like?** ✅ ANSWERED
|
||||
- **Cerberus Framework:** ENABLED (`feature.cerberus.enabled` = true)
|
||||
- **Security Modules:** DISABLED (clean slate for testing)
|
||||
- **Test Order:** All Cerberus tests → Break glass test (LAST)
|
||||
|
||||
3. **Are there other tests affected?**
|
||||
- Run full suite after fix
|
||||
- Check for cascading test failures
|
||||
- Validate assumptions
|
||||
|
||||
---
|
||||
|
||||
## Resources
|
||||
|
||||
- **Full Triage Plan:** [docs/plans/e2e-test-triage-plan.md](../plans/e2e-test-triage-plan.md)
|
||||
- **Diagnostic Script:** [scripts/diagnose-test-env.sh](../../scripts/diagnose-test-env.sh)
|
||||
- **Global Setup:** [tests/global-setup.ts](../../tests/global-setup.ts)
|
||||
- **Emergency Handler:** [backend/internal/api/handlers/emergency_handler.go](../../backend/internal/api/handlers/emergency_handler.go)
|
||||
- **Testing Instructions:** [.github/instructions/testing.instructions.md](../../.github/instructions/testing.instructions.md)
|
||||
|
||||
---
|
||||
|
||||
## Contact
|
||||
|
||||
For questions or clarification, see:
|
||||
- Triage Plan: Full analysis and categorization
|
||||
- Testing protocols: E2E test execution guidelines
|
||||
- Architecture docs: Cerberus security framework
|
||||
|
||||
**Status:** Ready for implementation - Root cause identified
|
||||
Reference in New Issue
Block a user