chore: Enhance documentation for E2E testing:

- Added clarity and structure to README files, including the recent updates and getting started sections.
- Improved manual verification documentation for CrowdSec authentication, emphasizing expected outputs and success criteria.
- Updated the debugging guide with detailed output examples and information on automatic trace capture.
- Refined best practices for E2E tests, focusing on efficient polling, locator strategies, and state management.
- Documented the triage report for DNS Provider feature tests, highlighting the issues fixed and test results before and after the improvements.
- Revised the E2E test writing guide to explain when to use specific helper functions and which patterns improve test reliability.
- Enhanced troubleshooting documentation with clear resolutions for common issues, including timeout and token configuration problems.
- Updated the tests README to provide quick links and best practices for writing robust tests.
2026-03-24 01:47:22 +00:00
parent 7d986f2821
commit ca477c48d4
52 changed files with 983 additions and 198 deletions
--- a/docs/testing/FAILURE_DIAGNOSIS_GUIDE.md
+++ b/docs/testing/FAILURE_DIAGNOSIS_GUIDE.md
@@ -5,6 +5,7 @@ This guide explains how to use the comprehensive debugging infrastructure to dia
 ## Quick Access Tools

 ### 1. **Playwright HTML Report** (Visual Analysis)
+
 ```bash
 # When tests complete, open the report
 npx playwright show-report
@@ -14,6 +15,7 @@ npx playwright show-report --port 9323
 ```

 **What to look for:**
+
 - Click on each failed test
 - View the trace timeline (shows each action, network request, assertion)
 - Check the video recording to see exactly what went wrong
@@ -21,30 +23,35 @@ npx playwright show-report --port 9323
 - Check browser console logs

 ### 2. **Debug Logger CSV Export** (Network Analysis)
+
 ```bash
 # After tests complete, check for network logs in test-results
 find test-results -name "*.csv" -type f
 ```

 **What to look for:**
+
 - HTTP requests that failed or timed out
 - Slow network operations (>1000ms)
 - Authentication failures (401/403)
 - API response errors

 ### 3. **Trace Files** (Step-by-Step Replay)
+
 ```bash
 # View detailed trace for a failed test
 npx playwright show-trace test-results/[test-name]/trace.zip
 ```

 **Features:**
+
 - Pause and step through each action
 - Inspect DOM at any point
 - Review network timing
 - Check locator matching

 ### 4. **Video Recordings** (Visual Feedback Loop)
+
 - Located in: `test-results/.playwright-artifacts-1/`
 - Map filenames to test names in Playwright report
 - Watch to understand timing and UI state when failure occurred
@@ -54,24 +61,28 @@ npx playwright show-trace test-results/[test-name]/trace.zip
 Based on the summary showing "other" category failures, these issues likely fall into:

 ### Category A: Timing/Flakiness Issues
+
 - Tests intermittently fail due to timeouts
 - Elements not appearing in expected timeframe
 - **Diagnosis**: Check videos for loading spinners, network delays
 - **Fix**: Increase timeout or add wait for specific condition

 ### Category B: Locator Issues
+
 - Selectors matching wrong elements or multiple elements
 - Elements appearing in different UI states
 - **Diagnosis**: Check traces to see selector matching logic
 - **Fix**: Make selectors more specific or use role-based locators

 ### Category C: State/Data Issues
+
 - Form data not persisting
 - Navigation not working correctly
 - **Diagnosis**: Check network logs for API failures
 - **Fix**: Add wait for API completion, verify mock data

 ### Category D: Accessibility/Keyboard Navigation
+
 - Keyboard events not triggering actions
 - Focus not moving as expected
 - **Diagnosis**: Review traces for keyboard action handling
@@ -79,7 +90,7 @@ Based on the summary showing "other" category failures, these issues likely fall

 ## Step-by-Step Failure Analysis Process

-### For Each Failed Test:
+### For Each Failed Test

 1. **Get Test Name**
   - Open Playwright report
@@ -87,9 +98,11 @@ Based on the summary showing "other" category failures, these issues likely fall
   - Note the test file + test name

 2. **View the Trace**
+
   ```bash
   npx playwright show-trace test-results/[test-name-hash]/trace.zip
   ```
+
   - Go through each step
   - Note which step failed and why
   - Check the actual error message
@@ -129,60 +142,75 @@ Our debug logger outputs structured messages like:
 ## Common Failure Patterns & Solutions

 ### Pattern 1: "Timeout waiting for locator"
+
 **Cause**: Element not appearing within timeout
 **Diagnosis**:
+
 - Check video - is the page still loading?
 - Check network tab - any pending requests?
 - Check DOM snapshot - does element exist but hidden?

 **Solution**:
+
 - Add `await page.waitForLoadState('networkidle')`
 - Use more robust locators (role-based instead of ID)
 - Increase timeout if it's a legitimate slow operation

 ### Pattern 2: "Assertion failed: expect(locator).toBeDisabled()"
+
 **Cause**: Button not in expected state
 **Diagnosis**:
+
 - Check trace - what's the button's actual state?
 - Check console - any JS errors?
 - Check network - is a form submission in progress?

 **Solution**:
+
 - Add explicit wait: `await expect(button).toBeDisabled({timeout: 10000})`
 - Wait for preceding action: `await page.getByRole('button').click(); await page.waitForLoadState()`
 - Check form library state

 ### Pattern 3: "Strict mode violation: multiple elements found"
+
 **Cause**: Selector matches 2+ elements
 **Diagnosis**:
+
 - Check trace DOM snapshots - count matching elements
 - Check test file - is selector too broad?

 **Solution**:
+
 - Scope to container: `page.getByRole('dialog').getByRole('button', {name: 'Save'})`
 - Use .first() or .nth(0): `getByRole('button').first()`
 - Make selector more specific

 ### Pattern 4: "Element not found by getByRole(...)"
+
 **Cause**: Accessibility attributes missing
 **Diagnosis**:
+
 - Check DOM in trace - what tags/attributes exist?
 - Is it missing role attribute?
 - Is aria-label/aria-labelledby correct?

 **Solution**:
+
 - Add role attribute to element
 - Add accessible name (aria-label, aria-labelledby, or text content)
 - Use more forgiving selectors temporarily to confirm

 ### Pattern 5: "Test timed out after 30000ms"
+
 **Cause**: Test execution exceeded timeout
 **Diagnosis**:
+
 - Check videos - where did it hang?
 - Check traces - last action before timeout?
 - Check network - any concurrent long-running requests?

 **Solution**:
+
 - Break test into smaller steps
 - Add explicit waits between actions
 - Check for infinite loops or blocking operations
@@ -208,6 +236,7 @@ other        │ ██░░░░░░░░░░░░░░░░░░ 2/
 ```

 **Key insights:**
+
 - **Timeout**: Look for network delays or missing waits
 - **Assertion**: Check state management and form validation
 - **Locator**: Focus on selector robustness
@@ -216,6 +245,7 @@ other        │ ██░░░░░░░░░░░░░░░░░░ 2/
 ## Advanced Debugging Techniques

 ### 1. Run Single Failed Test Locally
+
 ```bash
 # Get exact test name from report, then:
 npx playwright test --grep "should show user status badges"
@@ -225,6 +255,7 @@ DEBUG=charon:* npx playwright test --grep "should show user status badges" --deb
 ```

 ### 2. Inspect Network Logs CSV
+
 ```bash
 # Convert CSV to readable format
 column -t -s',' tests/network-logs.csv | less
@@ -233,16 +264,19 @@ column -t -s',' tests/network-logs.csv | less
 ```

 ### 3. Compare Videos Side-by-Side
+
 - Download videos from test-results/.playwright-artifacts-1/
 - Open in VLC with playlist
 - Play at 2x speed to spot behavior differences

 ### 4. Check Browser Console
+
 - In trace player, click "Console" tab
 - Look for JS errors or warnings
 - Check for 404/500 API responses in network tab

 ### 5. Reproduce Locally with Same Conditions
+
 ```bash
 # Use the exact same seed (if randomization is involved)
 SEED=12345 npx playwright test --grep "failing-test"
@@ -256,6 +290,7 @@ npx playwright test --grep "failing-test" --project=chromium --debug
 If tests pass locally but fail in CI Docker container:

 ### Check Container Logs
+
 ```bash
 # View Docker container output
 docker compose -f .docker/compose/docker-compose.test.yml logs charon
@@ -265,12 +300,14 @@ docker compose logs --tail=50
 ```

 ### Compare Environments
+
 - Docker: Running on 0.0.0.0:8080
- Local: Running on localhost:8080/http://127.0.0.1:8080
+- Local: Running on localhost:8080/<http://127.0.0.1:8080>
 - **Check**: Are there IPv4/IPv6 differences?
 - **Check**: Are there DNS resolution issues?

 ### Port Accessibility
+
 ```bash
 # From inside Docker, check if ports are accessible
 docker exec charon curl -v http://localhost:8080
@@ -281,6 +318,7 @@ docker exec charon curl -v http://localhost:2020
 ## Escalation Path

 ### When to Investigate Code
+
 - Same tests fail consistently (not flaky)
 - Error message points to specific feature
 - Video shows incorrect behavior
@@ -289,12 +327,14 @@ docker exec charon curl -v http://localhost:2020
 **Action**: Fix the code/feature being tested

 ### When to Improve Test
+
 - Tests flaky (fail 1 in 5 times)
 - Timeout errors on slow operations
 - Intermittent locator matching issues
 - **Action**: Add waits, use more robust selectors, increase timeouts

 ### When to Update Test Infrastructure
+
 - Port/networking issues
 - Authentication failures
 - Global setup incomplete