chore: git cache cleanup

2026-03-04 18:34:49 +00:00
parent c32cce2a88
commit 27c252600a
2001 changed files with 683185 additions and 0 deletions
--- a/docs/testing/e2e-best-practices.md
+++ b/docs/testing/e2e-best-practices.md
@@ -0,0 +1,488 @@
+# E2E Testing Best Practices
+
+**Purpose**: Document patterns and anti-patterns discovered during E2E test optimization to prevent future performance regressions and cross-browser failures.
+
+**Target Audience**: Developers writing Playwright E2E tests for Charon.
+
+## Table of Contents
+
+- [Feature Flag Testing](#feature-flag-testing)
+- [Cross-Browser Locators](#cross-browser-locators)
+- [API Call Optimization](#api-call-optimization)
+- [Performance Budget](#performance-budget)
+- [Test Isolation](#test-isolation)
+
+---
+
+## Feature Flag Testing
+
+### ❌ AVOID: Polling in beforeEach Hooks
+
+**Anti-Pattern**:
+```typescript
+test.beforeEach(async ({ page, adminUser }) => {
+  await loginUser(page, adminUser);
+  await page.goto('/settings/system');
+
+  // ⚠️ PROBLEM: Runs before EVERY test
+  await waitForFeatureFlagPropagation(
+    page,
+    {
+      'cerberus.enabled': true,
+      'crowdsec.console_enrollment': false,
+    },
+    { timeout: 10000 } // 10s timeout per test
+  );
+});
+```
+
+**Why This Is Bad**:
+- Polls `/api/v1/feature-flags` endpoint **31 times** per test file (once per test)
+- With 12 parallel processes (4 shards × 3 browsers), causes API server bottleneck
+- Adds 310s minimum execution time per shard (31 tests × 10s timeout)
+- Most tests don't modify feature flags, so polling is unnecessary
+
+**Real Impact**: Test shards exceeded 30-minute GitHub Actions timeout limit, blocking CI/CD pipeline.
+
+---
+
+### ✅ PREFER: Per-Test Verification Only When Toggled
+
+**Correct Pattern**:
+```typescript
+test('should toggle Cerberus feature', async ({ page }) => {
+  await test.step('Navigate to system settings', async () => {
+    await page.goto('/settings/system');
+    await waitForLoadingComplete(page);
+  });
+
+  await test.step('Toggle Cerberus feature', async () => {
+    const toggle = page.getByRole('switch', { name: /cerberus/i });
+    const initialState = await toggle.isChecked();
+
+    await retryAction(async () => {
+      const response = await clickSwitchAndWaitForResponse(page, toggle, /\/feature-flags/);
+      expect(response.ok()).toBeTruthy();
+
+      // ✅ ONLY verify propagation AFTER toggling
+      await waitForFeatureFlagPropagation(page, {
+        'cerberus.enabled': !initialState,
+      });
+    });
+  });
+});
+```
+
+**Why This Is Better**:
+- API calls reduced by **90%** (from 31 per shard to 3-5 per shard)
+- Only tests that actually toggle flags incur the polling cost
+- Faster test execution (shards complete in <15 minutes vs >30 minutes)
+- Clearer test intent—verification is tied to the action that requires it
+
+**Rule of Thumb**:
+- **No toggle, no propagation check**: If a test reads flag state without changing it, don't poll.
+- **Toggle = verify**: Always verify propagation after toggling to ensure state change persisted.
+
+---
+
+## Cross-Browser Locators
+
+### ❌ AVOID: Label-Only Locators
+
+**Anti-Pattern**:
+```typescript
+await test.step('Verify Script path/command field appears', async () => {
+  // ⚠️ PROBLEM: Fails in Firefox/WebKit
+  const scriptField = page.getByLabel(/script.*path/i);
+  await expect(scriptField).toBeVisible({ timeout: 10000 });
+});
+```
+
+**Why This Fails**:
+- Label locators depend on browser-specific DOM rendering
+- Firefox/WebKit may render Label components differently than Chromium
+- Regex patterns may not match if label has extra whitespace or is split across nodes
+- Results in **70% pass rate** on Firefox/WebKit vs 100% on Chromium
+
+---
+
+### ✅ PREFER: Multi-Strategy Locators with Fallbacks
+
+**Correct Pattern**:
+```typescript
+import { getFormFieldByLabel } from './utils/ui-helpers';
+
+await test.step('Verify Script path/command field appears', async () => {
+  // ✅ Tries multiple strategies until one succeeds
+  const scriptField = getFormFieldByLabel(
+    page,
+    /script.*path/i,
+    {
+      placeholder: /dns-challenge\.sh/i,
+      fieldId: 'field-script_path'
+    }
+  );
+  await expect(scriptField.first()).toBeVisible();
+});
+```
+
+**Helper Implementation** (`tests/utils/ui-helpers.ts`):
+```typescript
+/**
+ * Get form field with cross-browser label matching
+ * Tries multiple strategies: label, placeholder, id, aria-label
+ *
+ * @param page - Playwright Page object
+ * @param labelPattern - Regex or string to match label text
+ * @param options - Fallback strategies (placeholder, fieldId)
+ * @returns Locator that works across Chromium, Firefox, and WebKit
+ */
+export function getFormFieldByLabel(
+  page: Page,
+  labelPattern: string | RegExp,
+  options: { placeholder?: string | RegExp; fieldId?: string } = {}
+): Locator {
+  const baseLocator = page.getByLabel(labelPattern);
+
+  // Build fallback chain
+  let locator = baseLocator;
+
+  if (options.placeholder) {
+    locator = locator.or(page.getByPlaceholder(options.placeholder));
+  }
+
+  if (options.fieldId) {
+    locator = locator.or(page.locator(`#${options.fieldId}`));
+  }
+
+  // Fallback: role + label text nearby
+  if (typeof labelPattern === 'string') {
+    locator = locator.or(
+      page.getByRole('textbox').filter({
+        has: page.locator(`label:has-text("${labelPattern}")`),
+      })
+    );
+  }
+
+  return locator;
+}
+```
+
+**Why This Is Better**:
+- **95%+ pass rate** on Firefox/WebKit (up from 70%)
+- Gracefully degrades through fallback strategies
+- No browser-specific workarounds needed in test code
+- Single helper enforces consistent pattern across all tests
+
+**When to Use**:
+- Any test that interacts with form fields
+- Tests that must pass on all three browsers (Chromium, Firefox, WebKit)
+- Accessibility-critical tests (label locators are user-facing)
+
+---
+
+## API Call Optimization
+
+### ❌ AVOID: Duplicate API Requests
+
+**Anti-Pattern**:
+```typescript
+// Multiple tests in parallel all polling the same endpoint
+test('test 1', async ({ page }) => {
+  await waitForFeatureFlagPropagation(page, { flag: true }); // API call
+});
+
+test('test 2', async ({ page }) => {
+  await waitForFeatureFlagPropagation(page, { flag: true }); // Duplicate API call
+});
+```
+
+**Why This Is Bad**:
+- 12 parallel workers all hit `/api/v1/feature-flags` simultaneously
+- No request coalescing or caching
+- API server degrades under concurrent load
+- Tests timeout due to slow responses
+
+---
+
+### ✅ PREFER: Request Coalescing with Worker Isolation
+
+**Correct Pattern** (`tests/utils/wait-helpers.ts`):
+```typescript
+// Cache in-flight requests per worker
+const inflightRequests = new Map<string, Promise<Record<string, boolean>>>();
+
+function generateCacheKey(
+  expectedFlags: Record<string, boolean>,
+  workerIndex: number
+): string {
+  // Sort keys to ensure {a:true, b:false} === {b:false, a:true}
+  const sortedFlags = Object.keys(expectedFlags)
+    .sort()
+    .reduce((acc, key) => {
+      acc[key] = expectedFlags[key];
+      return acc;
+    }, {} as Record<string, boolean>);
+
+  // Include worker index to isolate parallel processes
+  return `${workerIndex}:${JSON.stringify(sortedFlags)}`;
+}
+
+export async function waitForFeatureFlagPropagation(
+  page: Page,
+  expectedFlags: Record<string, boolean>,
+  options: FeatureFlagPropagationOptions = {}
+): Promise<Record<string, boolean>> {
+  const workerIndex = test.info().parallelIndex;
+  const cacheKey = generateCacheKey(expectedFlags, workerIndex);
+
+  // Return existing promise if already in flight
+  if (inflightRequests.has(cacheKey)) {
+    console.log(`[CACHE HIT] Worker ${workerIndex}: ${cacheKey}`);
+    return inflightRequests.get(cacheKey)!;
+  }
+
+  console.log(`[CACHE MISS] Worker ${workerIndex}: ${cacheKey}`);
+
+  // Poll API endpoint (existing logic)...
+}
+```
+
+**Why This Is Better**:
+- **30-40% reduction** in duplicate API calls
+- Multiple tests requesting same state share one API call
+- Worker isolation prevents cache collisions between parallel processes
+- Sorted keys ensure semantic equivalence (`{a:true, b:false}` === `{b:false, a:true}`)
+
+**Cache Behavior**:
+- **Hit**: Another test in same worker already polling for same state
+- **Miss**: First test in worker to request this state OR different state requested
+- **Clear**: Cache cleared after all tests in worker complete (`test.afterAll()`)
+
+---
+
+## Performance Budget
+
+### ❌ PROBLEM: Shards Exceeding Timeout
+
+**Symptom**:
+```bash
+# GitHub Actions logs
+Error: The operation was canceled.
+Job duration: 31m 45s (exceeds 30m limit)
+```
+
+**Root Causes**:
+1. Feature flag polling in beforeEach (31 tests × 10s = 310s minimum)
+2. API bottleneck under parallel load
+3. Slow browser startup in CI environment
+4. Network latency for external resources
+
+---
+
+### ✅ SOLUTION: Enforce 15-Minute Budget Per Shard
+
+**CI Configuration** (`.github/workflows/e2e-tests.yml`):
+```yaml
+- name: Verify shard performance budget
+  if: always()
+  run: |
+    SHARD_DURATION=$((SHARD_END - SHARD_START))
+    MAX_DURATION=900  # 15 minutes = 900 seconds
+
+    if [[ $SHARD_DURATION -gt $MAX_DURATION ]]; then
+      echo "::error::Shard exceeded performance budget: ${SHARD_DURATION}s > ${MAX_DURATION}s"
+      echo "::error::Investigate slow tests or API bottlenecks"
+      exit 1
+    fi
+
+    echo "✅ Shard completed within budget: ${SHARD_DURATION}s"
+```
+
+**Why This Is Better**:
+- **Early detection** of performance regressions in CI
+- Forces developers to optimize slow tests before merge
+- Prevents accumulation of "death by a thousand cuts" slowdowns
+- Clear failure message directs investigation to bottleneck
+
+**How to Debug Timeouts**:
+1. **Check metrics**: Review API call counts in test output
+   ```bash
+   grep "CACHE HIT\|CACHE MISS" test-output.log
+   ```
+2. **Profile locally**: Instrument slow helpers
+   ```typescript
+   const startTime = Date.now();
+   await waitForLoadingComplete(page);
+   console.log(`Loading took ${Date.now() - startTime}ms`);
+   ```
+3. **Isolate shard**: Run failing shard locally to reproduce
+   ```bash
+   npx playwright test --shard=2/4 --project=firefox
+   ```
+
+---
+
+## Test Isolation
+
+### ❌ AVOID: State Leakage Between Tests
+
+**Anti-Pattern**:
+```typescript
+test('enable Cerberus', async ({ page }) => {
+  await toggleCerberus(page, true);
+  // ⚠️ PROBLEM: Doesn't restore state
+});
+
+test('ACL settings require Cerberus', async ({ page }) => {
+  // Assumes Cerberus is enabled from previous test
+  await page.goto('/settings/acl');
+  // ❌ FLAKY: Fails if first test didn't run or failed
+});
+```
+
+**Why This Is Bad**:
+- Tests depend on execution order (serial execution works, parallel fails)
+- Flakiness when running with `--workers=4` or `--repeat-each=5`
+- Hard to debug failures (root cause is in different test file)
+
+---
+
+### ✅ PREFER: Explicit State Restoration
+
+**Correct Pattern**:
+```typescript
+test.afterEach(async ({ page }) => {
+  await test.step('Restore default feature flag state', async () => {
+    const defaultFlags = {
+      'cerberus.enabled': true,
+      'crowdsec.console_enrollment': false,
+      'uptime.enabled': false,
+    };
+
+    // Direct API call to reset flags (no polling needed)
+    for (const [flag, value] of Object.entries(defaultFlags)) {
+      await page.evaluate(async ({ flag, value }) => {
+        await fetch(`/api/v1/feature-flags/${flag}`, {
+          method: 'PUT',
+          headers: { 'Content-Type': 'application/json' },
+          body: JSON.stringify({ enabled: value }),
+        });
+      }, { flag, value });
+    }
+  });
+});
+```
+
+**Why This Is Better**:
+- **Zero inter-test dependencies**: Tests can run in any order
+- Passes randomization testing: `--repeat-each=5 --workers=4`
+- Explicit cleanup makes state management visible in code
+- Fast restoration (no polling required, direct API call)
+
+**Validation Command**:
+```bash
+# Verify test isolation with randomization
+npx playwright test tests/settings/system-settings.spec.ts \
+  --repeat-each=5 \
+  --workers=4 \
+  --project=chromium
+
+# Should pass consistently regardless of execution order
+```
+
+---
+
+## Robust Assertions for Dynamic Content
+
+### ❌ AVOID: Boolean Logic on Transient States
+
+**Anti-Pattern**:
+```typescript
+const hasEmptyMessage = await emptyCellMessage.isVisible().catch(() => false);
+const hasTable = await table.isVisible().catch(() => false);
+expect(hasEmptyMessage || hasTable).toBeTruthy();
+```
+
+**Why This Is Bad**:
+- Fails during the split second where neither element is fully visible (loading transitions).
+- Playwright's auto-retrying logic is bypassed by the `catch()` block.
+- Leads to flaky "false negatives" where both checks return false before content loads.
+
+### ✅ PREFER: Locator Composition with `.or()`
+
+**Correct Pattern**:
+```typescript
+await expect(
+  page.getByRole('table').or(page.getByText(/no.*certificates.*found/i))
+).toBeVisible({ timeout: 10000 });
+```
+
+**Why This Is Better**:
+- Leverages Playwright's built-in **auto-retry** mechanism.
+- Waits for *either* condition to become true.
+- Handles loading spinners and layout shifts gracefully.
+- Reduces boilerplate code.
+
+---
+
+## Resilient Actions
+
+### ❌ AVOID: Fixed Timeouts or Custom Loops
+
+**Anti-Pattern**:
+```typescript
+// Flaky custom retry loop
+for (let i = 0; i < 3; i++) {
+  try {
+    await action();
+    break;
+  } catch (e) {
+    await page.waitForTimeout(1000);
+  }
+}
+```
+
+### ✅ PREFER: `.toPass()` for Verification Loops
+
+**Correct Pattern**:
+```typescript
+await expect(async () => {
+  const response = await request.post('/endpoint');
+  expect(response.ok()).toBeTruthy();
+}).toPass({
+  intervals: [1000, 2000, 5000],
+  timeout: 15_000
+});
+```
+
+**Why This Is Better**:
+- Built-in assertion retry logic.
+- Configurable backoff intervals.
+- Cleaner syntax for verifying eventual success (e.g. valid API response after background processing).
+
+---
+
+## Summary Checklist
+
+Before writing E2E tests, verify:
+
+- [ ] **Feature flags**: Only poll after toggling, not in beforeEach
+- [ ] **Locators**: Use `getFormFieldByLabel()` for form fields
+- [ ] **API calls**: Check for cache hit/miss logs, expect >30% hit rate
+- [ ] **Performance**: Local execution <5 minutes, CI shard <15 minutes
+- [ ] **Isolation**: Add `afterEach` cleanup if test modifies state
+- [ ] **Cross-browser**: Test passes on all three browsers (Chromium, Firefox, WebKit)
+
+---
+
+## References
+
+- **Implementation Details**: See `docs/plans/current_spec.md` (Fix 3.3)
+- **Helper Library**: `tests/utils/ui-helpers.ts`
+- **Playwright Config**: `playwright.config.js`
+- **CI Workflow**: `.github/workflows/e2e-tests.yml`
+
+---
+
+**Last Updated**: 2026-02-02