chore: Enhance documentation for E2E testing:

- Added clarity and structure to README files, including recent updates and getting started sections.
- Improved manual verification documentation for CrowdSec authentication, emphasizing expected outputs and success criteria.
- Updated debugging guide with detailed output examples and automatic trace capture information.
- Refined best practices for E2E tests, focusing on efficient polling, locator strategies, and state management.
- Documented triage report for DNS Provider feature tests, highlighting issues fixed and test results before and after improvements.
- Revised E2E test writing guide to include when to use specific helper functions and patterns for better test reliability.
- Enhanced troubleshooting documentation with clear resolutions for common issues, including timeout and token configuration problems.
- Updated tests README to provide quick links and best practices for writing robust tests.
2026-03-24 01:47:22 +00:00
parent 7d986f2821
commit ca477c48d4
52 changed files with 983 additions and 198 deletions
--- a/docs/decisions/sprint1-timeout-remediation-findings.md
+++ b/docs/decisions/sprint1-timeout-remediation-findings.md
@@ -11,11 +11,13 @@
 **File**: `tests/settings/system-settings.spec.ts`

 **Changes Made**:
+
 1. **Removed** `waitForFeatureFlagPropagation()` call from `beforeEach` hook (lines 35-46)
   - This was causing 10s × 31 tests = 310s of polling overhead per shard
   - Commented out with clear explanation linking to remediation plan

 2. **Added** `test.afterEach()` hook with direct API state restoration:
+
   ```typescript
   test.afterEach(async ({ page }) => {
     await test.step('Restore default feature flag state', async () => {
@@ -34,12 +36,14 @@
   ```

 **Rationale**:
+
 - Tests already verify feature flag state individually after toggle actions
 - Initial state verification in beforeEach was redundant
 - Explicit cleanup in afterEach ensures test isolation without polling overhead
 - Direct API mutation for state restoration is faster than polling

 **Expected Impact**:
+
 - 310s saved per shard (10s × 31 tests)
 - Elimination of inter-test dependencies
 - No state leakage between tests
@@ -51,12 +55,14 @@
 **Changes Made**:

 1. **Added module-level cache** for in-flight requests:
+
   ```typescript
   // Cache for in-flight requests (per-worker isolation)
   const inflightRequests = new Map<string, Promise<Record<string, boolean>>>();
   ```

 2. **Implemented cache key generation** with sorted keys and worker isolation:
+
   ```typescript
   function generateCacheKey(
     expectedFlags: Record<string, boolean>,
@@ -81,6 +87,7 @@
   - Removes promise from cache after completion (success or failure)

 4. **Added cleanup function**:
+
   ```typescript
   export function clearFeatureFlagCache(): void {
     inflightRequests.clear();
@@ -89,16 +96,19 @@
   ```
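A minimal sketch of the in-flight deduplication described in the numbered items above, assuming the wrapped poll can be expressed as a zero-argument async function. `dedupeInflight` and `fetchFlags` are hypothetical names used only for illustration; the real helper's signature is elided in this diff.

```typescript
type FlagState = Record<string, boolean>;

// Module-level cache of in-flight requests, keyed by a normalized cache key
const inflightRequests = new Map<string, Promise<FlagState>>();

// Hypothetical helper: callers asking for the same key while a request is
// still running share that request's promise instead of starting a new poll.
function dedupeInflight(
  key: string,
  fetchFlags: () => Promise<FlagState>,
): Promise<FlagState> {
  const existing = inflightRequests.get(key);
  if (existing !== undefined) {
    return existing; // cache hit: join the request already in flight
  }

  const request: Promise<FlagState> = fetchFlags().finally(() => {
    // Evict once settled (success or failure) so later calls poll fresh state.
    // The identity check avoids deleting a newer entry added after a clear().
    if (inflightRequests.get(key) === request) {
      inflightRequests.delete(key);
    }
  });

  inflightRequests.set(key, request);
  return request;
}

function clearFeatureFlagCache(): void {
  inflightRequests.clear();
}
```

Evicting in `finally` rather than only on success keeps a failed poll from being replayed to every later caller; only requests that are still pending are shared.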

 **Why Sorted Keys?**
+
 - `{a:true, b:false}` vs `{b:false, a:true}` are semantically identical
 - Without sorting, they generate different cache keys → cache misses
 - Sorting ensures consistent key regardless of property order

 **Why Worker Isolation?**
+
 - Playwright workers run in parallel as separate processes, each with its own browser
 - Each worker needs its own cache to avoid state conflicts
 - Worker index provides unique namespace per parallel process
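The sorting and worker-isolation rules above can be sketched as follows. This is a hypothetical version: the real `generateCacheKey` takes further parameters that are elided in this diff, so here the worker index is passed in explicitly (inside a Playwright test it could come from `test.info().workerIndex`).

```typescript
// Hypothetical sketch of generateCacheKey; not the helper's actual signature.
function generateCacheKey(
  expectedFlags: Record<string, boolean>,
  workerIndex: number,
): string {
  // Sort by flag name so {a, b} and {b, a} produce the same key
  const entries = Object.keys(expectedFlags)
    .sort()
    .map((name) => [name, expectedFlags[name]]);
  // Prefix with the worker index so each parallel worker gets its own namespace
  return `${workerIndex}:${JSON.stringify(entries)}`;
}

// Same state, different property order: identical key
const k1 = generateCacheKey({ a: true, b: false }, 0);
const k2 = generateCacheKey({ b: false, a: true }, 0);
console.log(k1 === k2); // true
console.log(k1); // 0:[["a",true],["b",false]]
```

Because Playwright runs each worker in its own process, a module-level `Map` is already private to that worker; the index prefix acts as an extra, defensive namespace.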

 **Expected Impact**:
+
 - 30-40% reduction in duplicate API calls (revised from original 70-80% estimate)
 - Cache hit rate expected to exceed 30%, since many tests check the same flag states
 - Reduced API server load during parallel test execution
@@ -108,21 +118,26 @@
 **Status**: Partially Investigated

 **Issue**:
+
 - Test: `tests/dns-provider-types.spec.ts` (line 260)
 - Symptom: Label locator `/script.*path/i` passes in Chromium, fails in Firefox/WebKit
 - Test code:
+
   ```typescript
   const scriptField = page.getByLabel(/script.*path/i);
   await expect(scriptField).toBeVisible({ timeout: 10000 });
   ```

 **Investigation Steps Completed**:
+
 1. ✅ Confirmed E2E environment is running and healthy
 2. ✅ Attempted to run DNS provider type tests in Chromium
 3. ⏸️ Further investigation deferred due to test execution issues

 **Investigation Steps Remaining** (per spec):
+
 1. Run with Playwright Inspector to compare accessibility trees:
+
   ```bash
   npx playwright test tests/dns-provider-types.spec.ts --project=chromium --headed --debug
   npx playwright test tests/dns-provider-types.spec.ts --project=firefox --headed --debug
@@ -137,6 +152,7 @@
 5. If not fixable: Use the helper function approach from Phase 2

 **Recommendation**:
+
 - Complete investigation in separate session with headed browser mode
 - DO NOT add `.or()` chains unless the investigation proves they are necessary
 - Create a formal Decision Record once the root cause is identified
@@ -144,31 +160,37 @@
 ## Validation Checkpoints

 ### Checkpoint 1: Execution Time
+
 **Status**: ⏸️ In Progress

 **Target**: <15 minutes (900s) for full test suite

 **Command**:
+
 ```bash
 time npx playwright test tests/settings/system-settings.spec.ts --project=chromium
 ```

 **Results**:
+
 - Test execution interrupted during validation
 - Observed: Tests were picking up multiple spec files from security/ folder
 - Need to investigate test file patterns or run with more specific filtering

 **Action Required**:
+
 - Re-run with corrected test file path or filtering
 - Ensure only system-settings tests are executed
 - Measure execution time and compare to baseline

 ### Checkpoint 2: Test Isolation
+
 **Status**: ⏳ Pending

 **Target**: All tests pass with `--repeat-each=5 --workers=4`

 **Command**:
+
 ```bash
 npx playwright test tests/settings/system-settings.spec.ts --project=chromium --repeat-each=5 --workers=4
 ```
@@ -176,11 +198,13 @@ npx playwright test tests/settings/system-settings.spec.ts --project=chromium --
 **Status**: Not executed yet

 ### Checkpoint 3: Cross-browser
+
 **Status**: ⏳ Pending

 **Target**: Firefox/WebKit pass rate >85%

 **Command**:
+
 ```bash
 npx playwright test tests/settings/system-settings.spec.ts --project=firefox --project=webkit
 ```
@@ -188,11 +212,13 @@ npx playwright test tests/settings/system-settings.spec.ts --project=firefox --p
 **Status**: Not executed yet

 ### Checkpoint 4: DNS provider tests (secondary issue)
+
 **Status**: ⏳ Pending

 **Target**: Firefox tests pass or investigation complete

 **Command**:
+
 ```bash
 npx playwright test tests/dns-provider-types.spec.ts --project=firefox
 ```
@@ -204,11 +230,13 @@ npx playwright test tests/dns-provider-types.spec.ts --project=firefox
 ### Decision: Use Direct API Mutation for State Restoration

 **Context**:
+
 - Tests need to restore default feature flag state after modifications
 - Original approach used polling-based verification in beforeEach
 - Alternative approaches: polling in afterEach vs direct API mutation

 **Options Evaluated**:
+
 1. **Polling in afterEach** - Verify state propagated after mutation
   - Pros: Confirms state is actually restored
   - Cons: Adds 500ms-2s per test (polling overhead)
@@ -219,12 +247,14 @@ npx playwright test tests/dns-provider-types.spec.ts --project=firefox
   - Why chosen: Feature flag updates are synchronous in the backend

 **Rationale**:
+
 - Feature flag updates via PUT /api/v1/feature-flags are processed synchronously
 - Database write is immediate (SQLite WAL mode)
 - No async propagation delay in single-process test environment
 - Subsequent tests will verify state on first read, catching any issues

 **Impact**:
+
 - Test runtime reduced by ~15.5-62s per test file (31 tests × 500ms-2s of polling avoided)
 - Risk: If state restoration fails, the next test will fail loudly (detectable)
 - Acceptable trade-off for a 10-20% execution time improvement
@@ -234,15 +264,18 @@ npx playwright test tests/dns-provider-types.spec.ts --project=firefox
 ### Decision: Cache Key Sorting for Semantic Equality

 **Context**:
+
 - Multiple tests may check the same feature flag state but with different property order
 - Without normalization, `{a:true, b:false}` and `{b:false, a:true}` generate different keys

 **Rationale**:
+
 - JavaScript objects preserve insertion order, so these two literals serialize differently (e.g. with `JSON.stringify`) even though they describe the same state
 - Sorting keys ensures cache hits for semantically identical flag states
 - Negligible performance cost (sorting 3-5 keys takes well under 1ms)

 **Impact**:
+
 - Estimated 10-15% cache hit rate improvement
 - No downside - pure optimization