Current Active Work
🆕 NEW: CrowdSec Bouncer Auto-Registration & Key Persistence (2026-02-03)
- Status: ✅ Plan Complete - Ready for Implementation
- Priority: P1 (User Experience Enhancement)
- Plan: crowdsec_bouncer_auto_registration.md
- Estimated Effort: 8-12 hours
Summary
Comprehensive plan to implement automatic bouncer registration, persistent key storage, and UI display. This supersedes the earlier crowdsec_lapi_auth_fix.md plan with a complete solution.
Key Features:
- Auto-register bouncer on CrowdSec enable (no manual `cscli` commands)
- Persist key to `/app/data/crowdsec/bouncer_key` (survives rebuilds)
- Log full key to container logs for user reference
- Fallback to file if no env var set
- Env var always takes precedence
- Display masked key in Security UI with copy button
- Auto-heal invalid keys by re-registering
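The precedence and masking rules above can be sketched as small pure functions. This is illustrative only: the real backend is Go, and every name here is hypothetical.

```typescript
// Hypothetical sketch of the key-resolution precedence described above:
// env var > persisted file > freshly registered key.
function resolveBouncerKey(
  envKey: string | undefined,
  fileKey: string | undefined,
  registerNew: () => string
): { key: string; source: "env" | "file" | "registered" } {
  if (envKey && envKey.trim() !== "") return { key: envKey, source: "env" };
  if (fileKey && fileKey.trim() !== "") return { key: fileKey, source: "file" };
  // No key anywhere: auto-register and use the new key.
  return { key: registerNew(), source: "registered" };
}

// Mask a key for UI display, keeping only the last 4 characters visible.
function maskKey(key: string, visible = 4): string {
  if (key.length <= visible) return "*".repeat(key.length);
  return "*".repeat(key.length - visible) + key.slice(-visible);
}
```

The env-over-file ordering matches the "env var always takes precedence" feature bullet; the fallback-to-registration branch is where auto-heal would re-enter.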
See full plan: crowdsec_bouncer_auto_registration.md
📋 Prior Work: CrowdSec LAPI Authentication Failure
Superseded by: crowdsec_bouncer_auto_registration.md
The original quick-fix plan at crowdsec_lapi_auth_fix.md addressed the immediate workaround. The new comprehensive plan above provides the complete solution.
📋 BUG FIX: Config API Endpoint in Break Glass Recovery Test (2026-02-03)
- Status: ✅ Research Complete - Ready for Implementation
- Priority: P1 (Test Failure)
- Estimated Fix Time: 10 minutes
- File: tests/security-enforcement/zzzz-break-glass-recovery.spec.ts
Problem
GET /api/v1/config does not exist - the test fails with non-OK status when trying to verify that admin_whitelist was persisted.
Root Cause
Looking at routes.go#L237:
protected.PATCH("/config", settingsHandler.PatchConfig) // Only PATCH exists, no GET
There is NO GET /api/v1/config route defined. Only PATCH was implemented for bulk config updates.
Available GET Endpoints
| Endpoint | Response Format | Use For |
|---|---|---|
| `GET /api/v1/settings` | `{ "key": "value", ... }` (flat map) | All settings |
| `GET /api/v1/security/config` | `{ "config": { ...SecurityConfig } }` | Security-specific config |
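The two shapes differ in where `admin_whitelist` lives, which is exactly what the broken assertion tripped over. A minimal sketch (sample payloads are illustrative, not captured from the API):

```typescript
// Illustrative payloads mirroring the two response shapes in the table above.
const settingsResponse = JSON.parse(
  '{"admin_whitelist":"0.0.0.0/0","uptime.enabled":"false"}'
);
const securityConfigResponse = JSON.parse(
  '{"config":{"admin_whitelist":"0.0.0.0/0"}}'
);

// Flat map: values sit at the top level.
const fromSettings: string = settingsResponse["admin_whitelist"];
// Nested: values sit under the "config" key.
const fromSecurity: string = securityConfigResponse.config["admin_whitelist"];
```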
Fix Required
File: tests/security-enforcement/zzzz-break-glass-recovery.spec.ts (lines 66-72)
Current (Broken):
const response = await request.get(`${BASE_URL}/api/v1/config`); // ❌ Doesn't exist
expect(body.security?.admin_whitelist).toBe('0.0.0.0/0'); // ❌ Wrong format
Fixed:
const response = await request.get(`${BASE_URL}/api/v1/security/config`); // ✅ Correct endpoint
expect(body.config?.admin_whitelist).toBe('0.0.0.0/0'); // ✅ Correct path
Acceptance Criteria
- Step 1 uses `GET /api/v1/security/config` instead of `GET /api/v1/config`
- Assertion accesses `body.config.admin_whitelist` (not `body.security?.admin_whitelist`)
- All 4 steps in `zzzz-break-glass-recovery.spec.ts` pass
Route Reference
From settings_handler.go:
- `PatchConfig()` (line 176) syncs `admin_whitelist` to both the Settings table AND the SecurityConfig model

From security_handler.go:
- `GetConfig()` (line 205) returns SecurityConfig with the `admin_whitelist` field
🚨 URGENT: Shard 1 CI Failure Investigation (2026-02-03)
- Status: ✅ Root Cause Identified - Fix Ready
- Priority: P0 (Blocking CI)
- Estimated Fix Time: 1 hour
- CI Run: https://github.com/Wikid82/Charon/actions/runs/21613888904
Problem
After completing Phase 1-3 timeout remediation work:
- Shard 1 failed on ALL 3 browsers (Chromium, Firefox, WebKit)
- Shards 2 & 3 passed
- Success Rate: 50% (6/12 jobs)
Root Cause
Dynamic imports in wait-helpers.ts cause module resolution delays in CI:
- 2 `await import('./ui-helpers')` statements in hot paths
- CI single worker (`workers: 1`) exposes cold-cache timing issues
- Shard 1 contains 4 refactored files using these helpers extensively
Solution
Remove dynamic imports, use static imports:
// Add at top of wait-helpers.ts:
import { clickSwitch } from './ui-helpers';
// Remove 2 dynamic import statements (lines 69-70, 108-109)
Impact: Eliminates async module resolution overhead, fixes all Shard 1 failures
Documentation
- Investigation Summary: shard1_investigation_summary.md
- Fix Plan: shard1_fix_plan.md
Phase 3: Coverage Improvement ✅ COMPLETE
- Status: ✅ Complete (with documented constraints)
- Completed: 2026-02-03
- Priority: P1 (Quality Improvement)
- Actual Effort: 7.5 hours (within 6-8 hour budget)
Summary: Improved backend coverage to 84.2% (+0.7%), identified frontend WebSocket testing infrastructure limitation. Both within 1% of 85% target.
Deliverables:
- ✅ Phase 3.4 Validation Report
- ✅ Phase 3.3 Completion Report
- ✅ Phase 3.3 Technical Findings
- ✅ Phase 3.1 Coverage Gap Analysis
Recommendation: Accept current coverage (84.2% backend, 84.25% frontend). Schedule test infrastructure upgrade (8-12 hours) for next sprint to unlock WebSocket component testing.
E2E Test Timeout Remediation Plan
- Status: Active
- Created: 2026-02-02
- Priority: P0 (Blocking CI/CD pipeline)
- Estimated Effort: 5-7 business days
Executive Summary
E2E tests are timing out due to cascading API bottleneck caused by feature flag polling in beforeEach hooks, combined with browser-specific label locator failures. This blocks PR merges and slows development velocity.
Impact:
- 31 tests × 10s timeout = ~310s of added wait time per shard, amplified by 12 parallel processes hitting the same API
- 4 shards × 3 browsers = 12 jobs, many exceeding 30min GitHub Actions limit
- Firefox/WebKit tests fail on DNS provider form due to label locator mismatches
Root Cause Analysis
Primary Issue: Feature Flag Polling API Bottleneck
Location: tests/settings/system-settings.spec.ts (lines 27-48)
test.beforeEach(async ({ page, adminUser }) => {
await loginUser(page, adminUser);
await waitForLoadingComplete(page);
await page.goto('/settings/system');
await waitForLoadingComplete(page);
// ⚠️ PROBLEM: Runs before EVERY test
await waitForFeatureFlagPropagation(
page,
{
'cerberus.enabled': true,
'crowdsec.console_enrollment': false,
'uptime.enabled': false,
},
{ timeout: 10000 } // 10s timeout per test
).catch(() => {
console.log('[WARN] Initial state verification skipped');
});
});
Why This Causes Timeouts:
- `waitForFeatureFlagPropagation()` polls `/api/v1/feature-flags` every 500ms for up to 10s
- Runs in a `beforeEach` hook = executes 31 times per test file
- 12 parallel processes (4 shards × 3 browsers) all hitting the same endpoint
- API server degrades under concurrent load → tests time out → shards exceed the job limit
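The load described above can be put in rough numbers. This is a back-of-envelope model assuming the worst case where every test polls until its full timeout:

```typescript
// Worst-case request count: every test exhausts its polling budget.
function worstCasePollCalls(
  tests: number,
  timeoutMs: number,
  intervalMs: number,
  parallelJobs: number
): number {
  const pollsPerTest = Math.ceil(timeoutMs / intervalMs); // 10s / 500ms = 20 polls
  return tests * pollsPerTest * parallelJobs;
}

const perShard = worstCasePollCalls(31, 10_000, 500, 1); // 620 polls per shard
const total = worstCasePollCalls(31, 10_000, 500, 12); // 7,440 across all 12 jobs
```

Even if most tests match on the first poll, the concurrent worst case is what the API has to survive, which is why the ~1000-calls-per-shard figure later in this plan is plausible.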
Evidence:
- `tests/utils/wait-helpers.ts` (lines 411-470): polling interval 500ms, default timeout 30s
- Workflow config: 4 shards × 3 browsers = 12 concurrent jobs
- Observed: Multiple shards exceed 30min job timeout
Secondary Issue: Browser-Specific Label Locator Failures
Location: tests/dns-provider-types.spec.ts (line 260)
await test.step('Verify Script path/command field appears', async () => {
const scriptField = page.getByLabel(/script.*path/i);
await expect(scriptField).toBeVisible({ timeout: 10000 });
});
Why Firefox/WebKit Fail:
- Backend returns `script_path` field with label "Script Path"
- Frontend applies `aria-label="Script Path"` to the input (line 276 in DNSProviderForm.tsx)
- Firefox/WebKit may render the Label component differently than Chromium
- Regex `/script.*path/i` may not match if the label has extra whitespace or is split across nodes

Evidence:
- `frontend/src/components/DNSProviderForm.tsx` (lines 273-279): hardcoded `aria-label="Script Path"`
- `backend/pkg/dnsprovider/custom/script_provider.go` (line 85): backend returns "Script Path"
- Test passes in Chromium, fails in Firefox/WebKit = browser-specific rendering difference
Requirements (EARS Notation)
REQ-1: Feature Flag Polling Optimization
WHEN E2E tests execute, THE SYSTEM SHALL minimize API calls to feature flag endpoint to reduce load and execution time.
Acceptance Criteria:
- Feature flag polling occurs once per test file, not per test
- API calls reduced by 90% (from 31 per shard to <3 per shard)
- Test execution time reduced by 20-30%
REQ-2: Browser-Agnostic Label Locators
WHEN E2E tests query form fields, THE SYSTEM SHALL use locators that work consistently across Chromium, Firefox, and WebKit.
Acceptance Criteria:
- All DNS provider form tests pass on Firefox and WebKit
- Locators use multiple fallback strategies (`getByLabel`, `getByPlaceholder`, id-based `locator`)
- No browser-specific workarounds needed
REQ-3: API Stress Reduction
WHEN parallel test processes execute, THE SYSTEM SHALL implement throttling or debouncing to prevent API bottlenecks.
Acceptance Criteria:
- Concurrent API calls limited via request coalescing
- Tests use cached responses where appropriate
- API server remains responsive under test load
REQ-4: Test Isolation
WHEN a test modifies feature flags, THE SYSTEM SHALL restore original state without requiring global polling.
Acceptance Criteria:
- Feature flag state restored per-test using direct API calls
- No inter-test dependencies on feature flag state
- Tests can run in any order without failures
Technical Design
Phase 1: Quick Fixes (Deploy within 24h)
Fix 1.1: Remove Unnecessary Feature Flag Polling from beforeEach
File: tests/settings/system-settings.spec.ts
Change: Remove waitForFeatureFlagPropagation() from beforeEach hook entirely.
Rationale:
- Tests already verify feature flag state in test steps
- Initial state verification is redundant if tests toggle and verify in each step
- Polling is only needed AFTER toggling, not before every test
Implementation:
test.beforeEach(async ({ page, adminUser }) => {
await loginUser(page, adminUser);
await waitForLoadingComplete(page);
await page.goto('/settings/system');
await waitForLoadingComplete(page);
// ✅ REMOVED: Feature flag polling - tests verify state individually
});
Expected Impact: 10s × 31 tests = 310s saved per shard
Fix 1.1b: Add Test Isolation Strategy
File: tests/settings/system-settings.spec.ts
Change: Add test.afterEach() hook to restore default feature flag state after each test.
Rationale:
- Not all 31 tests explicitly verify feature flag state in their steps
- Some tests may modify flags without restoring them
- State leakage between tests can cause flakiness
- Explicit cleanup ensures test isolation
Implementation:
test.afterEach(async ({ page }) => {
await test.step('Restore default feature flag state', async () => {
// Reset to known good state after each test
const defaultFlags = {
'cerberus.enabled': true,
'crowdsec.console_enrollment': false,
'uptime.enabled': false,
};
// Direct API call to reset flags (no polling needed)
for (const [flag, value] of Object.entries(defaultFlags)) {
await page.evaluate(async ({ flag, value }) => {
await fetch(`/api/v1/feature-flags/${flag}`, {
method: 'PUT',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ enabled: value }),
});
}, { flag, value });
}
});
});
Validation Command:
# Test isolation: Run tests in random order with multiple workers
npx playwright test tests/settings/system-settings.spec.ts \
--repeat-each=5 \
--workers=4 \
--project=chromium
# Should pass consistently regardless of execution order
Expected Impact: Eliminates inter-test dependencies, prevents state leakage
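As a sanity check on the cleanup hook, the flag→request mapping it performs can be factored into a pure helper. This is a hypothetical refactor for illustration, not existing code; it lets the mapping be unit-tested without a browser:

```typescript
// Hypothetical pure helper: describe the PUT requests the afterEach hook issues.
interface FlagReset {
  url: string;
  method: "PUT";
  body: string;
}

function buildFlagResets(flags: Record<string, boolean>): FlagReset[] {
  return Object.entries(flags).map(([flag, enabled]) => ({
    url: `/api/v1/feature-flags/${flag}`,
    method: "PUT",
    body: JSON.stringify({ enabled }),
  }));
}

const resets = buildFlagResets({
  "cerberus.enabled": true,
  "crowdsec.console_enrollment": false,
  "uptime.enabled": false,
});
```

The hook would then iterate over `resets` and issue one `fetch` per entry, keeping the request-shape logic testable in isolation.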
Fix 1.2: Investigate and Fix Root Cause of Label Locator Failures
File: tests/dns-provider-types.spec.ts
Change: Investigate why label locator fails in Firefox/WebKit before applying workarounds.
Current Symptom:
const scriptField = page.getByLabel(/script.*path/i);
await expect(scriptField).toBeVisible({ timeout: 10000 });
// ❌ Passes in Chromium, fails in Firefox/WebKit
Investigation Steps (REQUIRED before implementing fix):
1. Use Playwright Inspector to examine the actual DOM structure:
   PWDEBUG=1 npx playwright test tests/dns-provider-types.spec.ts --project=firefox --headed
   - Pause at the failure point
   - Inspect the form field in dev tools
   - Check the actual `aria-label`, `label` association, and `for` attribute
   - Document differences between Chromium and Firefox/WebKit
2. Check Component Implementation:
   - Review `frontend/src/components/DNSProviderForm.tsx` (lines 273-279)
   - Verify the Label component from shadcn/ui renders correctly
   - Check whether the `htmlFor` attribute matches the input `id`
   - Test manual form interaction in Firefox/WebKit locally
3. Verify Backend Response:
   - Inspect the `/api/v1/dns-providers/custom/script` response
   - Confirm `script_path` field metadata includes the correct label
   - Check whether the label is being transformed or sanitized
Potential Root Causes (investigate in order):
1. Label component not associating with the input (missing `htmlFor`/`id` match)
2. Browser-specific text normalization (e.g., whitespace, case sensitivity)
3. ARIA label override conflicting with the visible label
4. React hydration issue in Firefox/WebKit
Fix Strategy (only after investigation):
IF root cause is fixable in the component:
- Fix the actual bug in `DNSProviderForm.tsx`
- No workaround needed in tests
- Document Decision in Decision Record (required)

IF root cause is a browser-specific rendering quirk:
- Use `.or()` chaining as a documented fallback:
  await test.step('Verify Script path/command field appears', async () => {
    // Primary strategy: label locator (works in Chromium)
    const scriptField = page
      .getByLabel(/script.*path/i)
      .or(page.getByPlaceholder(/dns-challenge\.sh/i)) // Fallback 1
      .or(page.locator('input[id^="field-script"]')); // Fallback 2
    await expect(scriptField.first()).toBeVisible({ timeout: 10000 });
  });
- Document Decision in Decision Record (required)
- Add a comment explaining why `.or()` is needed
Decision Record Template (create if workaround is needed):
### Decision - 2026-02-02 - DNS Provider Label Locator Workaround
**Decision**: Use `.or()` chaining for Script Path field locator
**Context**:
- `page.getByLabel(/script.*path/i)` fails in Firefox/WebKit
- Root cause: [document findings from investigation]
- Component: `DNSProviderForm.tsx` line 276
**Options**:
1. Fix component (preferred) - [reason why not chosen]
2. Use `.or()` chaining (chosen) - [reason]
3. Skip Firefox/WebKit tests - [reason why not chosen]
**Rationale**: [Explain trade-offs and why workaround is acceptable]
**Impact**:
- Test reliability: [describe]
- Maintenance burden: [describe]
- Future component changes: [describe]
**Review**: Re-evaluate when Playwright or shadcn/ui updates are applied
Expected Impact: Tests pass consistently on all browsers with understood root cause
Fix 1.3: Add Request Coalescing with Worker Isolation
File: tests/utils/wait-helpers.ts
Change: Cache in-flight requests with proper worker isolation and sorted keys.
Implementation:
// Add at module level
const inflightRequests = new Map<string, Promise<Record<string, boolean>>>();
/**
* Generate stable cache key with worker isolation
* Prevents cache collisions between parallel workers
*/
function generateCacheKey(
expectedFlags: Record<string, boolean>,
workerIndex: number
): string {
// Sort keys to ensure {a:true, b:false} === {b:false, a:true}
const sortedFlags = Object.keys(expectedFlags)
.sort()
.reduce((acc, key) => {
acc[key] = expectedFlags[key];
return acc;
}, {} as Record<string, boolean>);
// Include worker index to isolate parallel processes
return `${workerIndex}:${JSON.stringify(sortedFlags)}`;
}
export async function waitForFeatureFlagPropagation(
page: Page,
expectedFlags: Record<string, boolean>,
options: FeatureFlagPropagationOptions = {}
): Promise<Record<string, boolean>> {
// Get worker index from test info
const workerIndex = test.info().parallelIndex;
const cacheKey = generateCacheKey(expectedFlags, workerIndex);
// Return existing promise if already in flight for this worker
if (inflightRequests.has(cacheKey)) {
console.log(`[CACHE HIT] Worker ${workerIndex}: ${cacheKey}`);
return inflightRequests.get(cacheKey)!;
}
console.log(`[CACHE MISS] Worker ${workerIndex}: ${cacheKey}`);
const promise = (async () => {
// Existing polling logic...
const interval = options.interval ?? 500;
const timeout = options.timeout ?? 30000;
const maxAttempts = options.maxAttempts ?? Math.ceil(timeout / interval);
let lastResponse: Record<string, boolean> | null = null;
let attemptCount = 0;
while (attemptCount < maxAttempts) {
attemptCount++;
const response = await page.evaluate(async () => {
const res = await fetch('/api/v1/feature-flags', {
method: 'GET',
headers: { 'Content-Type': 'application/json' },
});
return { ok: res.ok, status: res.status, data: await res.json() };
});
lastResponse = response.data as Record<string, boolean>;
const allMatch = Object.entries(expectedFlags).every(
([key, expectedValue]) => response.data[key] === expectedValue
);
if (allMatch) {
inflightRequests.delete(cacheKey);
return lastResponse;
}
await page.waitForTimeout(interval);
}
inflightRequests.delete(cacheKey);
throw new Error(
`Feature flag propagation timeout after ${attemptCount} attempts (${timeout}ms).\n` +
`Expected: ${JSON.stringify(expectedFlags)}\n` +
`Actual: ${JSON.stringify(lastResponse)}`
);
})();
inflightRequests.set(cacheKey, promise);
return promise;
}
// Clear cache after all tests in a worker complete
test.afterAll(() => {
const workerIndex = test.info().parallelIndex;
const keysToDelete = Array.from(inflightRequests.keys())
.filter(key => key.startsWith(`${workerIndex}:`));
keysToDelete.forEach(key => inflightRequests.delete(key));
console.log(`[CLEANUP] Worker ${workerIndex}: Cleared ${keysToDelete.length} cache entries`);
});
Why Sorted Keys?
- `{a:true, b:false}` vs `{b:false, a:true}` are semantically identical
- Without sorting, they generate different cache keys → cache misses
- Sorting ensures a consistent key regardless of property order

Why Worker Isolation?
- Playwright workers run in parallel across different browser contexts
- Each worker needs its own cache to avoid state conflicts
- The worker index provides a unique namespace per parallel process
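The sorted-key property can be verified in isolation. The snippet below is a standalone copy of the `generateCacheKey` logic from the fix (without the Playwright dependency), so both claims are checkable:

```typescript
// Standalone re-implementation of generateCacheKey for verification.
function generateCacheKey(
  expectedFlags: Record<string, boolean>,
  workerIndex: number
): string {
  // Sort keys so property order does not change the key.
  const sortedFlags = Object.keys(expectedFlags)
    .sort()
    .reduce((acc, key) => {
      acc[key] = expectedFlags[key];
      return acc;
    }, {} as Record<string, boolean>);
  // Worker index namespaces the key per parallel process.
  return `${workerIndex}:${JSON.stringify(sortedFlags)}`;
}

const keyA = generateCacheKey({ a: true, b: false }, 0);
const keyB = generateCacheKey({ b: false, a: true }, 0); // same flags, different order
const keyOtherWorker = generateCacheKey({ a: true, b: false }, 1);
```

`keyA === keyB` (order independence) while `keyA !== keyOtherWorker` (worker isolation).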
Expected Impact: Reduce duplicate API calls by 30-40% (revised from 70-80%)
Phase 2: Root Cause Fixes (Deploy within 72h)
Fix 2.1: Convert Feature Flag Verification to Per-Test Pattern
Files: All test files using waitForFeatureFlagPropagation()
Change: Move feature flag verification into individual test steps where state changes occur.
Pattern:
// ❌ OLD: Global beforeEach polling
test.beforeEach(async ({ page }) => {
await waitForFeatureFlagPropagation(page, { 'cerberus.enabled': true });
});
// ✅ NEW: Per-test verification only when toggled
test('should toggle Cerberus feature', async ({ page }) => {
await test.step('Toggle Cerberus feature', async () => {
const toggle = page.getByRole('switch', { name: /cerberus/i });
const initialState = await toggle.isChecked();
await retryAction(async () => {
const response = await clickSwitchAndWaitForResponse(page, toggle, /\/feature-flags/);
expect(response.ok()).toBeTruthy();
// Only verify propagation after toggle action
await waitForFeatureFlagPropagation(page, {
'cerberus.enabled': !initialState,
});
});
});
});
CRITICAL AUDIT REQUIREMENT:
Before implementing, audit all 31 tests in system-settings.spec.ts to identify:
- Which tests explicitly toggle feature flags (require propagation check)
- Which tests only read feature flag state (no propagation check needed)
- Which tests assume Cerberus is enabled (document dependency)
Audit Template:
| Test Name | Toggles Flags? | Requires Cerberus? | Action |
|-----------|----------------|-------------------|--------|
| "should display security settings" | No | Yes | Add dependency comment |
| "should toggle ACL" | Yes | Yes | Add propagation check |
| "should display CrowdSec status" | No | Yes | Add dependency comment |
Files to Update:
- `tests/settings/system-settings.spec.ts` (31 tests)
- `tests/cerberus/security-dashboard.spec.ts` (if applicable)
Expected Impact: 90% reduction in API calls (from 31 per shard to 3-5 per shard)
Fix 2.2: Implement Label Helper for Cross-Browser Compatibility
File: tests/utils/ui-helpers.ts
Implementation:
/**
* Get form field with cross-browser label matching
* Tries multiple strategies: label, placeholder, id, aria-label
*/
export function getFormFieldByLabel(
page: Page,
labelPattern: string | RegExp,
options: { placeholder?: string | RegExp; fieldId?: string } = {}
): Locator {
const baseLocator = page.getByLabel(labelPattern);
// Build fallback chain
let locator = baseLocator;
if (options.placeholder) {
locator = locator.or(page.getByPlaceholder(options.placeholder));
}
if (options.fieldId) {
locator = locator.or(page.locator(`#${options.fieldId}`));
}
// Fallback: role + label text nearby
if (typeof labelPattern === 'string') {
locator = locator.or(
page.getByRole('textbox').filter({
has: page.locator(`label:has-text("${labelPattern}")`),
})
);
}
return locator;
}
Usage in Tests:
await test.step('Verify Script path/command field appears', async () => {
const scriptField = getFormFieldByLabel(
page,
/script.*path/i,
{
placeholder: /dns-challenge\.sh/i,
fieldId: 'field-script_path'
}
);
await expect(scriptField.first()).toBeVisible();
});
Files to Update:
- `tests/dns-provider-types.spec.ts` (3 tests)
- `tests/dns-provider-crud.spec.ts` (accessibility tests)
Expected Impact: 100% pass rate on Firefox/WebKit
Fix 2.3: Add Conditional Feature Flag Verification
File: tests/utils/wait-helpers.ts
Change: Skip polling if already in expected state.
Implementation:
export async function waitForFeatureFlagPropagation(
page: Page,
expectedFlags: Record<string, boolean>,
options: FeatureFlagPropagationOptions = {}
): Promise<Record<string, boolean>> {
// Quick check: are we already in expected state?
const currentState = await page.evaluate(async () => {
const res = await fetch('/api/v1/feature-flags');
return res.json();
});
const alreadyMatches = Object.entries(expectedFlags).every(
([key, expectedValue]) => currentState[key] === expectedValue
);
if (alreadyMatches) {
console.log('[POLL] Feature flags already in expected state - skipping poll');
return currentState;
}
// Existing polling logic...
}
Expected Impact: 50% fewer iterations when state is already correct
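The early-exit check above reduces to a pure predicate, shown here standalone for clarity (extracted for illustration, not a proposed new file):

```typescript
// True when every expected flag already has its expected value.
function flagsAlreadyMatch(
  current: Record<string, boolean>,
  expected: Record<string, boolean>
): boolean {
  return Object.entries(expected).every(([key, value]) => current[key] === value);
}

const skipPoll = flagsAlreadyMatch(
  { "cerberus.enabled": true, "uptime.enabled": false },
  { "cerberus.enabled": true }
);
```

Note the predicate only inspects keys present in `expected`; extra flags in `current` are ignored, matching the polling loop's behavior.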
Phase 3: Prevention & Monitoring (Deploy within 1 week)
Fix 3.1: Add E2E Performance Budget
File: .github/workflows/e2e-tests.yml
Change: Add step to enforce execution time limits per shard.
Implementation:
- name: Verify shard performance budget
if: always()
run: |
SHARD_DURATION=$((SHARD_END - SHARD_START))
MAX_DURATION=900 # 15 minutes
if [[ $SHARD_DURATION -gt $MAX_DURATION ]]; then
echo "::error::Shard exceeded performance budget: ${SHARD_DURATION}s > ${MAX_DURATION}s"
echo "::error::Investigate slow tests or API bottlenecks"
exit 1
fi
echo "✅ Shard completed within budget: ${SHARD_DURATION}s"
Expected Impact: Early detection of performance regressions
Fix 3.2: Add API Call Metrics to Test Reports
File: tests/utils/wait-helpers.ts
Change: Track and report API call counts.
Implementation:
// Track metrics at module level
const apiMetrics = {
featureFlagCalls: 0,
cacheHits: 0,
cacheMisses: 0,
};
export function getAPIMetrics() {
return { ...apiMetrics };
}
export function resetAPIMetrics() {
apiMetrics.featureFlagCalls = 0;
apiMetrics.cacheHits = 0;
apiMetrics.cacheMisses = 0;
}
// Update waitForFeatureFlagPropagation to increment counters
export async function waitForFeatureFlagPropagation(...) {
apiMetrics.featureFlagCalls++;
if (inflightRequests.has(cacheKey)) {
apiMetrics.cacheHits++;
return inflightRequests.get(cacheKey)!;
}
apiMetrics.cacheMisses++;
// ...
}
Add to test teardown:
test.afterAll(async () => {
const metrics = getAPIMetrics();
console.log('API Call Metrics:', metrics);
if (metrics.featureFlagCalls > 50) {
console.warn(`⚠️ High API call count: ${metrics.featureFlagCalls}`);
}
});
Expected Impact: Visibility into API usage patterns
Fix 3.3: Document Best Practices for E2E Tests
File: docs/testing/e2e-best-practices.md (to be created)
Content:
# E2E Testing Best Practices
## Feature Flag Testing
### ❌ AVOID: Polling in beforeEach
```typescript
test.beforeEach(async ({ page }) => {
  // This runs before EVERY test - expensive!
  await waitForFeatureFlagPropagation(page, { flag: true });
});
```

### ✅ PREFER: Per-test verification

```typescript
test('feature toggle', async ({ page }) => {
  // Only verify after we change the flag
  await clickToggle(page);
  await waitForFeatureFlagPropagation(page, { flag: false });
});
```

## Cross-Browser Locators

### ❌ AVOID: Label-only locators

```typescript
page.getByLabel(/script.*path/i) // May fail in Firefox/WebKit
```

### ✅ PREFER: Multi-strategy locators

```typescript
getFormFieldByLabel(page, /script.*path/i, {
  placeholder: /dns-challenge/i,
  fieldId: 'field-script_path'
})
```

**Expected Impact**: Prevent future performance regressions
## Implementation Plan
### Sprint 1: Quick Wins (Days 1-2)
- [ ] **Task 1.1**: Remove feature flag polling from `system-settings.spec.ts` beforeEach
- **Assignee**: TBD
- **Files**: `tests/settings/system-settings.spec.ts`
- **Validation**: Run test file locally, verify <5min execution time
- [ ] **Task 1.1b**: Add test isolation with afterEach cleanup
- **Assignee**: TBD
- **Files**: `tests/settings/system-settings.spec.ts`
- **Validation**:
```bash
npx playwright test tests/settings/system-settings.spec.ts \
--repeat-each=5 --workers=4 --project=chromium
```
- [ ] **Task 1.2**: Investigate label locator failures (BEFORE implementing workaround)
- **Assignee**: TBD
- **Files**: `tests/dns-provider-types.spec.ts`, `frontend/src/components/DNSProviderForm.tsx`
- **Validation**: Document investigation findings, create Decision Record if workaround needed
- [ ] **Task 1.3**: Add request coalescing with worker isolation
- **Assignee**: TBD
- **Files**: `tests/utils/wait-helpers.ts`
- **Validation**: Check console logs for cache hits/misses, verify cache clears in afterAll
**Sprint 1 Go/No-Go Checkpoint**:
✅ **PASS Criteria** (all must be green):
1. **Execution Time**: Test file runs in <5min locally
   ```bash
   time npx playwright test tests/settings/system-settings.spec.ts --project=chromium
   ```
   Expected: <300s
2. **Test Isolation**: Tests pass with randomization
   ```bash
   npx playwright test tests/settings/system-settings.spec.ts \
     --repeat-each=5 --workers=4 --shard=1/1
   ```
   Expected: 0 failures
3. **Cache Performance**: Cache hit rate >30%
   ```bash
   grep -o "CACHE HIT" test-output.log | wc -l
   grep -o "CACHE MISS" test-output.log | wc -l
   # Calculate: hits / (hits + misses) > 0.30
   ```
❌ STOP and Investigate If:
- Execution time >5min (insufficient improvement)
- Test isolation fails (indicates missing cleanup)
- Cache hit rate <20% (worker isolation not working)
- Any new test failures introduced
Action on Failure: Revert changes, root cause analysis, re-plan before proceeding to Sprint 2
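The cache-performance criterion's arithmetic is worth pinning down, since it gates the checkpoint. A tiny helper (hypothetical, for illustration):

```typescript
// Hit rate = hits / (hits + misses); 0 when no requests were observed.
function cacheHitRate(hits: number, misses: number): number {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

// e.g. 12 hits and 20 misses from the grep counts above:
const rate = cacheHitRate(12, 20); // 0.375
const meetsBudget = rate > 0.3;
```

So 12 hits against 20 misses (37.5%) passes the >30% bar, while anything under a 3:7 hit/miss split fails it.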
Sprint 2: Root Fixes (Days 3-5)
- [ ] **Task 2.0**: Audit all 31 tests for Cerberus dependencies
  - **Assignee**: TBD
  - **Files**: `tests/settings/system-settings.spec.ts`
  - **Validation**: Complete audit table identifying which tests require propagation checks
- [ ] **Task 2.1**: Refactor feature flag verification to per-test pattern
  - **Assignee**: TBD
  - **Files**: `tests/settings/system-settings.spec.ts` (only tests that toggle flags)
  - **Validation**: All tests pass with <50 API calls total (check metrics)
  - **Note**: Add `test.step()` wrapper to all refactored examples
- [ ] **Task 2.2**: Create `getFormFieldByLabel` helper (only if workaround confirmed needed)
  - **Assignee**: TBD
  - **Files**: `tests/utils/ui-helpers.ts`
  - **Validation**: Use in 3 test files, verify Firefox/WebKit pass
  - **Prerequisite**: Decision Record from Task 1.2 investigation
- [ ] **Task 2.3**: Add conditional skip to feature flag polling
  - **Assignee**: TBD
  - **Files**: `tests/utils/wait-helpers.ts`
  - **Validation**: Verify early-exit logs appear when state already matches
Sprint 2 Go/No-Go Checkpoint:
✅ PASS Criteria (all must be green):
1. **API Call Reduction**: <50 calls per shard
   ```bash
   # Add instrumentation to count API calls
   grep "GET /api/v1/feature-flags" charon.log | wc -l
   ```
   Expected: <50 calls
2. **Cross-Browser Stability**: Firefox/WebKit pass rate >95%
   ```bash
   npx playwright test --project=firefox --project=webkit
   ```
   Expected: <5% failure rate
3. **Test Coverage**: No coverage regression
   ```bash
   .github/skills/scripts/skill-runner.sh test-e2e-playwright-coverage
   ```
   Expected: Coverage ≥ baseline
4. **Audit Completeness**: All 31 tests categorized
   - Verify the audit table is 100% complete
   - All tests have appropriate propagation checks or dependency comments
❌ STOP and Investigate If:
- API calls still >100 per shard (insufficient improvement)
- Firefox/WebKit pass rate <90% (locator fixes inadequate)
- Coverage drops >2% (tests not properly refactored)
- Missing audit entries (incomplete understanding of dependencies)
Action on Failure: Do NOT proceed to Sprint 3. Re-analyze bottlenecks and revise approach.
Sprint 3: Prevention (Days 6-7)
- [ ] **Task 3.1**: Add performance budget check to CI
  - **Assignee**: TBD
  - **Files**: `.github/workflows/e2e-tests.yml`
  - **Validation**: Trigger workflow, verify budget check runs
- [ ] **Task 3.2**: Implement API call metrics tracking
  - **Assignee**: TBD
  - **Files**: `tests/utils/wait-helpers.ts`, test files
  - **Validation**: Run test suite, verify metrics in console output
- [ ] **Task 3.3**: Document E2E best practices
  - **Assignee**: TBD
  - **Files**: `docs/testing/e2e-best-practices.md` (create)
  - **Validation**: Technical review by team
Coverage Impact Analysis
Baseline Coverage Requirements
MANDATORY: Before making ANY changes, establish baseline coverage:
# Create baseline coverage report
.github/skills/scripts/skill-runner.sh test-e2e-playwright-coverage
# Save baseline metrics
cp coverage/e2e/lcov.info coverage/e2e/baseline-lcov.info
cp coverage/e2e/coverage-summary.json coverage/e2e/baseline-summary.json
# Document baseline
echo "Baseline Coverage: $(grep -A 1 'lines' coverage/e2e/coverage-summary.json)" >> docs/plans/coverage-baseline.txt
Baseline Thresholds (from playwright.config.js):
- Lines: ≥80%
- Functions: ≥80%
- Branches: ≥80%
- Statements: ≥80%
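A small checker over a `coverage-summary.json`-shaped object could enforce these thresholds programmatically. The summary structure is assumed here (nyc/istanbul-style `{ pct }` totals), not verified against this repo:

```typescript
// Assumed shape of the "total" section of coverage-summary.json.
interface CoverageTotals {
  lines: { pct: number };
  functions: { pct: number };
  branches: { pct: number };
  statements: { pct: number };
}

// Return the names of any metrics below the minimum percentage.
function metricsBelowThreshold(totals: CoverageTotals, min = 80): string[] {
  return (Object.keys(totals) as (keyof CoverageTotals)[]).filter(
    (metric) => totals[metric].pct < min
  );
}

const failures = metricsBelowThreshold({
  lines: { pct: 84.2 },
  functions: { pct: 81 },
  branches: { pct: 79.5 },
  statements: { pct: 85 },
});
```

With the sample numbers above, only `branches` (79.5%) falls below the 80% bar.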
Codecov Requirements
100% Patch Coverage (from codecov.yml):
- Every line of production code modified MUST be covered by tests
- Applies to frontend changes in:
  - `tests/settings/system-settings.spec.ts`
  - `tests/dns-provider-types.spec.ts`
  - `tests/utils/wait-helpers.ts`
  - `tests/utils/ui-helpers.ts`
Verification Commands:
# After each sprint, verify coverage
.github/skills/scripts/skill-runner.sh test-e2e-playwright-coverage
# Compare to baseline
diff coverage/e2e/baseline-summary.json coverage/e2e/coverage-summary.json
# Check for regressions
# Check for regressions (use bc for the float comparison; [[ < ]] compares strings)
current=$(jq '.total.lines.pct' coverage/e2e/coverage-summary.json)
baseline=$(jq '.total.lines.pct' coverage/e2e/baseline-summary.json)
if (( $(echo "$current < $baseline" | bc -l) )); then
  echo "❌ Coverage regression detected"
  exit 1
fi
# Upload to Codecov (CI will enforce patch coverage)
git diff --name-only main...HEAD > changed-files.txt
curl -s https://codecov.io/bash | bash -s -- -f coverage/e2e/lcov.info
Impact Analysis by Sprint
Sprint 1 Changes:
- Files: `system-settings.spec.ts`, `wait-helpers.ts`
- Risk: Removing polling might reduce coverage of error paths
- Mitigation: Ensure error handling in `afterEach` is tested
Sprint 2 Changes:
- Files: `system-settings.spec.ts` (31 tests refactored), `ui-helpers.ts`
- Risk: Per-test refactoring might miss edge cases
- Mitigation: Run coverage diff after each test refactored
Sprint 3 Changes:
- Files: E2E workflow, test documentation
- Risk: No production code changes (no coverage impact)
Coverage Failure Protocol
IF coverage drops below baseline:
- Identify uncovered lines: `npx nyc report --reporter=html`
- Add targeted tests for missed paths
- Re-run coverage verification
- DO NOT merge until coverage is restored
IF Codecov reports <100% patch coverage:
- Review Codecov PR comment for specific lines
- Add test cases covering modified lines
- Push fixup commit
- Re-check Codecov status
Validation Strategy
Local Testing (Before push)
```bash
# Quick validation: run the affected test file
npx playwright test tests/settings/system-settings.spec.ts --project=chromium

# Cross-browser validation
npx playwright test tests/dns-provider-types.spec.ts --project=firefox --project=webkit

# Full suite (should complete in <20 min per shard)
npx playwright test --shard=1/4
```
CI Validation (After push)
- Green CI: All 12 jobs (4 shards × 3 browsers) pass
- Performance: Each shard completes in <15min (down from 30min)
- API Calls: Feature flag endpoint receives <100 requests per shard (down from ~1000)
Rollback Plan
If fixes introduce failures:
- Revert commits atomically (Fix 1.1, 1.2, 1.3 are independent)
- Re-enable `test.skip()` for failing tests temporarily
- Document known issues in PR comments
Success Metrics
| Metric | Before | Target | How to Measure |
|---|---|---|---|
| Shard Execution Time | 30+ min | <15 min | GitHub Actions logs |
| Feature Flag API Calls | ~1000/shard | <100/shard | Add metrics to wait-helpers.ts |
| Firefox/WebKit Pass Rate | 70% | 95%+ | CI test results |
| Job Timeout Rate | 30% | <5% | GitHub Actions workflow analytics |
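The "Feature Flag API Calls" row assumes per-shard request counting in `wait-helpers.ts`. A minimal sketch of such a counter follows; the class name and endpoint pattern are assumptions, and it would be wired up via Playwright's `page.on('request', ...)` event.

```typescript
// Hypothetical request counter for wait-helpers.ts metrics.
export class FeatureFlagCallCounter {
  private counts = new Map<string, number>();

  // Wire up with: page.on('request', (req) => counter.record(req.url()))
  record(url: string): void {
    const path = new URL(url).pathname;
    if (path.includes('/api/feature-flags')) {
      this.counts.set(path, (this.counts.get(path) ?? 0) + 1);
    }
  }

  // Total feature-flag calls seen; compare against the <100/shard target
  total(): number {
    let sum = 0;
    for (const n of this.counts.values()) sum += n;
    return sum;
  }
}
```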
Performance Profiling (Optional Enhancement)
Profiling waitForLoadingComplete()
During Sprint 1, if time permits, profile waitForLoadingComplete() to identify additional bottlenecks:
```typescript
// Add instrumentation to wait-helpers.ts
import type { Page } from '@playwright/test';

export async function waitForLoadingComplete(page: Page, timeout = 30000): Promise<number> {
  const startTime = Date.now();
  // domcontentloaded fires before networkidle in the page lifecycle, so wait
  // for it first; waiting for networkidle first would make the second call a no-op
  await page.waitForLoadState('domcontentloaded', { timeout });
  await page.waitForLoadState('networkidle', { timeout });
  const duration = Date.now() - startTime;
  if (duration > 5000) {
    console.warn(`[SLOW] waitForLoadingComplete took ${duration}ms`);
  }
  return duration;
}
```
Analysis:

```bash
# Run tests with profiling enabled
npx playwright test tests/settings/system-settings.spec.ts --project=chromium > profile.log

# Extract slow calls, sorted by duration (the duration is the last
# whitespace-separated field, e.g. "... took 6234ms", so sort on that;
# there is no "=" in the log line for sort -t= to split on)
grep "\[SLOW\]" profile.log | awk '{print $NF, $0}' | sort -n | cut -d' ' -f2-

# Questions to answer from the data:
# - Is networkidle too strict?
# - Are certain pages slower than others?
# - Can we use the 'load' state instead of 'networkidle'?
```
Potential Optimization:
If networkidle is consistently slow, consider using load state for non-critical pages:
```typescript
// For pages without dynamic content
await page.waitForLoadState('load', { timeout });

// For pages with feature flag updates
await page.waitForLoadState('networkidle', { timeout });
```
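One way to apply this split without scattering conditionals through every test is a small route-to-state lookup. The sketch below is illustrative only: the route patterns, map, and helper name are assumptions, not existing config.

```typescript
// Hypothetical helper: pick the cheapest sufficient load state per route.
type WaitState = 'load' | 'domcontentloaded' | 'networkidle';

const WAIT_STRATEGY: Array<[RegExp, WaitState]> = [
  [/\/settings\b/, 'networkidle'], // pages that poll feature flags
  [/\/docs\b/, 'load'],            // static content
];

export function pickLoadState(path: string): WaitState {
  for (const [pattern, state] of WAIT_STRATEGY) {
    if (pattern.test(path)) return state;
  }
  return 'domcontentloaded'; // conservative default
}
```

Tests would then call `page.waitForLoadState(pickLoadState(path), { timeout })`, keeping the strategy in one reviewable place.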
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Breaking existing tests | Medium | High | Run full test suite locally before push |
| Firefox/WebKit still fail | Low | Medium | Add .or() chaining for more fallbacks |
| API server still bottlenecks | Low | Medium | Add rate limiting to test container |
| Regressions in future PRs | Medium | Medium | Add performance budget check to CI |
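The "performance budget check" mitigation could consume the `[SLOW] ... took Nms` lines that the profiling instrumentation emits. A sketch, with an illustrative 10s budget and function name:

```typescript
// Hypothetical CI budget check: scan a profile log for [SLOW] lines whose
// recorded duration exceeds the budget, and return the offending lines.
export function findBudgetViolations(logText: string, budgetMs = 10000): string[] {
  const violations: string[] = [];
  for (const line of logText.split('\n')) {
    const match = line.match(/\[SLOW\].*took (\d+)ms/);
    if (match && Number(match[1]) > budgetMs) {
      violations.push(line.trim());
    }
  }
  return violations;
}
```

A CI step would fail the job when the returned array is non-empty.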
Infrastructure Considerations
Current Setup
- Workflow: `.github/workflows/e2e-tests.yml`
- Sharding: 4 shards × 3 browsers = 12 jobs
- Timeout: 30 minutes per job
- Concurrency: All jobs run in parallel
Recommended Changes
- Add caching: Cache Playwright browsers between runs (already implemented)
- Optimize container startup: Health check timeout reduced from 60s to 30s
- Consider: Reduce shards from 4→3 if execution time improves sufficiently
Monitoring
- Track job duration trends in GitHub Actions analytics
- Alert if shard duration exceeds 20min
- Weekly review of flaky test reports
Decision Record Template for Workarounds
Whenever a workaround is implemented instead of fixing root cause (e.g., .or() chaining for label locators), document the decision:
```markdown
### Decision - [DATE] - [BRIEF TITLE]
**Decision**: [What was decided]
**Context**:
- Original issue: [Describe problem]
- Root cause investigation findings: [Summarize]
- Component/file affected: [Specific paths]
**Options Evaluated**:
1. **Fix root cause** (preferred)
   - Pros: [List]
   - Cons: [List]
   - Why not chosen: [Specific reason]
2. **Workaround with `.or()` chaining** (chosen)
   - Pros: [List]
   - Cons: [List]
   - Why chosen: [Specific reason]
3. **Skip affected browsers** (rejected)
   - Pros: [List]
   - Cons: [List]
   - Why not chosen: [Specific reason]
**Rationale**:
[Detailed explanation of trade-offs and why workaround is acceptable]
**Impact**:
- **Test Reliability**: [Describe expected improvement]
- **Maintenance Burden**: [Describe ongoing cost]
- **Future Considerations**: [What needs to be revisited]
**Review Schedule**:
[When to re-evaluate - e.g., "After Playwright 1.50 release" or "Q2 2026"]
**References**:
- Investigation notes: [Link to investigation findings]
- Related issues: [GitHub issues, if any]
- Component documentation: [Relevant docs]
```
Where to Store:
- Simple decisions: Inline comment in test file
- Complex decisions: `docs/decisions/workaround-[feature]-[date].md`
- Reference in PR description when merging
Additional Files to Review
Before implementation, review these files for context:
- `playwright.config.js` - Test configuration, timeout settings
- `.docker/compose/docker-compose.playwright-ci.yml` - Container environment
- `tests/fixtures/auth-fixtures.ts` - Login helper usage
- `tests/cerberus/security-dashboard.spec.ts` - Other files using feature flag polling
- `codecov.yml` - Coverage requirements (patch coverage must remain 100%)
References
- Original Issue: GitHub Actions job timeouts in E2E workflow
- Related Docs:
  - `docs/testing/playwright-typescript.instructions.md` - Test writing guidelines
  - `docs/testing/testing.instructions.md` - Testing protocols
  - `.github/instructions/testing.instructions.md` - CI testing protocols
- Prior Plans:
  - `docs/plans/phase4-settings-plan.md` - System settings feature implementation
  - `docs/implementation/dns_providers_IMPLEMENTATION.md` - DNS provider architecture
Next Steps
- Triage: Assign tasks to team members
- Sprint 1 Kickoff: Implement quick fixes (1-2 days)
- PR Review: All changes require approval before merge
- Monitor: Track metrics for 1 week post-deployment
- Iterate: Adjust thresholds based on real-world performance
Last Updated: 2026-02-02
Owner: TBD
Reviewers: TBD