GitHub Actions 641588367b chore(diagnostics): Add comprehensive diagnostic tools for E2E testing

- Create phase1_diagnostics.md to document findings from test interruptions
- Introduce phase1_validation_checklist.md for pre-deployment validation
- Implement diagnostic-helpers.ts for enhanced logging and state capture
- Enable browser console logging, error tracking, and dialog lifecycle monitoring
- Establish performance monitoring for test execution times
- Document actionable recommendations for Phase 2 remediation

2026-02-03 00:02:45 +00:00

Phase 1.4: Deep Diagnostic Investigation

Date: February 2, 2026
Phase: Deep Diagnostic Investigation
Duration: 2-3 hours
Status: In Progress

Executive Summary

Investigation of Chromium test interruption at certificates.spec.ts:788 reveals multiple anti-patterns and potential root causes for browser context closure. This report documents findings and provides actionable recommendations for Phase 2 remediation.

Interrupted Tests Analysis

Test 1: Keyboard Navigation (Line 788)

File: tests/core/certificates.spec.ts:788-806
Test Name: should be keyboard navigable

test('should be keyboard navigable', async ({ page }) => {
  await test.step('Navigate form with keyboard', async () => {
    await getAddCertButton(page).click();
    await page.waitForTimeout(500);  // ❌ Anti-pattern #1

    // Tab through form fields
    await page.keyboard.press('Tab');
    await page.keyboard.press('Tab');
    await page.keyboard.press('Tab');

    // Some element should be focused
    const focusedElement = page.locator(':focus');
    const hasFocus = await focusedElement.isVisible().catch(() => false);
    expect(hasFocus || true).toBeTruthy();  // ❌ Anti-pattern #2 - Always passes

    await getCancelButton(page).click();  // ❌ Anti-pattern #3 - May fail if dialog closing
  });
});

Identified Anti-Patterns:

1. Arbitrary Timeout (Line 791): await page.waitForTimeout(500)
   - Issue: Creates a race condition: in CI the dialog may not be fully rendered within 500 ms
   - Impact: The test may try to interact with the dialog before it is ready
   - Proper Solution: await waitForDialog(page) with a visibility check
2. Weak Assertion (Line 799): expect(hasFocus || true).toBeTruthy()
   - Issue: Always passes regardless of the actual focus state
   - Impact: The test provides false confidence and cannot detect focus issues
   - Proper Solution: await expect(nameInput).toBeFocused() for specific elements
3. Missing Cleanup Verification (Line 801): await getCancelButton(page).click()
   - Issue: No verification that the dialog actually closed
   - Impact: If the close fails, the page state is inconsistent for the next test
   - Proper Solution: await expect(dialog).not.toBeVisible() after the click

Test 2: Escape Key Handling (Line 807)

File: tests/core/certificates.spec.ts:807-821
Test Name: should close dialog on Escape key

test('should close dialog on Escape key', async ({ page }) => {
  await test.step('Close with Escape key', async () => {
    await getAddCertButton(page).click();
    await page.waitForTimeout(500);  // ❌ Anti-pattern #1

    const dialog = page.getByRole('dialog');
    await expect(dialog).toBeVisible();

    await page.keyboard.press('Escape');

    // Dialog may or may not close on Escape depending on implementation
    await page.waitForTimeout(500);  // ❌ Anti-pattern #2 - No verification
  });
});

Identified Anti-Patterns:

1. Arbitrary Timeout (Line 810): await page.waitForTimeout(500)
   - Issue: Same as above: race condition on dialog render
   - Impact: Inconsistent test behavior between local and CI runs
2. No Verification (Line 818): await page.waitForTimeout(500) after Escape
   - Issue: The test doesn't verify that the dialog actually closed
   - Impact: Cannot detect Escape key handler failures
   - The code comment admits uncertainty: "Dialog may or may not close"
   - Proper Solution: await expect(dialog).not.toBeVisible() with a timeout

Root Cause Hypothesis

Primary Hypothesis: Resource Leak in Dialog Lifecycle

Theory: The dialog component does not fully release its resources when closed, and the orphaned resources eventually cause the browser context to close.

Evidence:

- Interruption occurs during accessibility tests that open/close dialogs multiple times
- Error message: "Target page, context or browser has been closed"
  - This is NOT a normal test failure
  - Indicates the browser context was terminated unexpectedly
- Timing sensitive: Works locally (fast), fails in CI (slower, more load)
- Weak cleanup: Tests don't verify the dialog is actually closed before continuing

Mechanism:

1. Test opens dialog → getAddCertButton(page).click()
2. Test waits an arbitrary 500 ms → page.waitForTimeout(500)
3. In CI, the dialog takes longer than 500 ms to render, e.g. 600 ms (race condition)
4. Test interacts with the partially rendered dialog
5. Test closes dialog → getCancelButton(page).click()
6. Dialog close is initiated but not completed
7. Next test runs while dialog cleanup is still in progress
8. Resource contention causes the browser context to close
9. Playwright detects context closure → Interruption
10. Worker terminates → Firefox/WebKit never start
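
The fixed-wait steps in the mechanism above can be illustrated without a browser. The sketch below is a toy model, not Playwright code: time is virtual (plain numbers), and the names `fixedWait` and `pollUntilReady` as well as the 200 ms and 600 ms render times are illustrative assumptions.

```typescript
// Toy model of a dialog that becomes interactive `renderMs` after it is opened.
// Time is virtual, so the example is deterministic and needs no timers.
type Outcome = { ready: boolean; elapsedMs: number };

// Fixed sleep: the test proceeds after `waitMs` whether or not the dialog is ready.
function fixedWait(renderMs: number, waitMs: number): Outcome {
  return { ready: waitMs >= renderMs, elapsedMs: waitMs };
}

// Condition-based wait: re-check every `intervalMs` until ready or until `timeoutMs` passes.
function pollUntilReady(renderMs: number, timeoutMs: number, intervalMs = 50): Outcome {
  for (let now = 0; now <= timeoutMs; now += intervalMs) {
    if (now >= renderMs) return { ready: true, elapsedMs: now };
  }
  return { ready: false, elapsedMs: timeoutMs };
}

const localFixed = fixedWait(200, 500);     // fast machine: ready, but the full 500 ms is spent
const ciFixed = fixedWait(600, 500);        // slow CI: the test proceeds before the dialog is ready
const ciPolled = pollUntilReady(600, 5000); // slow CI: waits only as long as the render takes

console.log({ localFixed, ciFixed, ciPolled });
```

The same fixed wait is both too long locally and too short in CI, which is why the recommendations below replace it with a condition-based wait.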

Secondary Hypothesis: Memory Leak in Form Interactions

Theory: Each dialog open/close cycle leaks memory, eventually exhausting resources at test #263.

Evidence:

- Interruption at specific test number (263) suggests accumulation over time
- Accessibility tests run many dialog interactions before interruption
- CI environment has limited resources compared to local development

Mechanism:

1. Each test leaks a small amount of memory (unclosed event listeners, DOM nodes)
2. After 262 tests, accumulated memory usage reaches threshold
3. Browser triggers garbage collection during test #263
4. GC encounters orphaned dialog resources
5. Cleanup fails, triggers context termination
6. Test interruption occurs
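
One way to check the accumulation claim is to reduce per-test heap samples to a growth rate. This is a minimal sketch, assuming one heap size in bytes has been collected per test by some separate means. The function name `heapGrowthPerTest` and all sample values are hypothetical, not measurements from this investigation.

```typescript
// Least-squares slope of heap size versus test index (bytes per test).
// A clearly positive slope over many tests is consistent with a leak;
// a slope near zero suggests memory is reclaimed between tests.
function heapGrowthPerTest(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  samples.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) * (x - meanX);
  });
  return numerator / denominator;
}

const MiB = 1024 * 1024;
// Hypothetical samples: one grows by 10 MiB per test, the other only fluctuates.
const leakingSamples = [100, 110, 120, 130, 140].map((v) => v * MiB);
const stableSamples = [100, 104, 99, 103, 100].map((v) => v * MiB);

console.log('leaking slope (MiB/test):', heapGrowthPerTest(leakingSamples) / MiB);
console.log('stable slope (MiB/test):', heapGrowthPerTest(stableSamples) / MiB);
```

A slope alone does not prove a leak, but a near-zero slope across the 262 preceding tests would count against this hypothesis.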

Tertiary Hypothesis: Dialog Event Handler Race Condition

Theory: Cancel button click and Escape key press trigger competing event handlers, causing state corruption.

Evidence:

- Both interrupted tests involve dialog closure (click Cancel vs press Escape)
- No verification of closure completion before test ends
- React state updates may be async and incomplete

Mechanism:

1. Test closes dialog via Cancel button or Escape key
2. React state update is initiated (async)
3. Test ends before state update completes
4. Next test starts, tries to open new dialog
5. React detects inconsistent state (old dialog still mounted in virtual DOM)
6. Error in React reconciliation crashes the app
7. Browser context terminates
8. Test interruption occurs

Diagnostic Actions Taken

1. Browser Console Logging Enhancement

File Created: tests/utils/diagnostic-helpers.ts

import { Page, ConsoleMessage, Request } from '@playwright/test';

/**
 * Enable comprehensive browser console logging for diagnostic purposes
 * Captures console logs, page errors, request failures, and unhandled rejections
 */
export function enableDiagnosticLogging(page: Page): void {
  // Console messages (all levels)
  page.on('console', (msg: ConsoleMessage) => {
    const type = msg.type().toUpperCase();
    const text = msg.text();
    const location = msg.location();

    console.log(`[BROWSER ${type}] ${text}`);
    if (location.url) {
      console.log(`  Location: ${location.url}:${location.lineNumber}:${location.columnNumber}`);
    }
  });

  // Page errors (JavaScript exceptions)
  page.on('pageerror', (error: Error) => {
    console.error('═══════════════════════════════════════════');
    console.error('PAGE ERROR DETECTED');
    console.error('═══════════════════════════════════════════');
    console.error('Message:', error.message);
    console.error('Stack:', error.stack);
    console.error('═══════════════════════════════════════════');
  });

  // Request failures (network errors)
  page.on('requestfailed', (request: Request) => {
    const failure = request.failure();
    console.error('─────────────────────────────────────────');
    console.error('REQUEST FAILED');
    console.error('─────────────────────────────────────────');
    console.error('URL:', request.url());
    console.error('Method:', request.method());
    console.error('Error:', failure?.errorText || 'Unknown');
    console.error('─────────────────────────────────────────');
  });

  // Unhandled promise rejections (heuristic: console errors whose text contains "Unhandled")
  page.on('console', (msg: ConsoleMessage) => {
    if (msg.type() === 'error' && msg.text().includes('Unhandled')) {
      console.error('╔═══════════════════════════════════════════╗');
      console.error('║   UNHANDLED PROMISE REJECTION DETECTED    ║');
      console.error('╚═══════════════════════════════════════════╝');
      console.error(msg.text());
    }
  });

  // Native browser dialogs (alert, confirm, prompt, beforeunload); the app's modal dialog does not emit this event
  page.on('dialog', async (dialog) => {
    console.log(`[DIALOG] Type: ${dialog.type()}, Message: ${dialog.message()}`);
    await dialog.dismiss();
  });
}

/**
 * Capture page state snapshot for debugging
 */
export async function capturePageState(page: Page, label: string): Promise<void> {
  const url = page.url();
  const title = await page.title();
  const html = await page.content();

  console.log(`\n========== PAGE STATE: ${label} ==========`);
  console.log(`URL: ${url}`);
  console.log(`Title: ${title}`);
  console.log(`HTML Length: ${html.length} characters`);
  console.log(`===========================================\n`);
}

Integration Example:

// Add to tests/core/certificates.spec.ts
import { enableDiagnosticLogging } from '../utils/diagnostic-helpers';

test.describe('Form Accessibility', () => {
  test.beforeEach(async ({ page }) => {
    enableDiagnosticLogging(page);
    await navigateToCertificates(page);
  });

  // ... existing tests
});
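
The helper above only logs. For the console error detection recommended later in this report, the errors also need to be collected so that a test can assert on them. The sketch below shows one possible shape. `ConsoleMessageLike` and `ConsoleSource` are hypothetical stand-ins for the small part of Playwright's `ConsoleMessage` and `Page` used here, so the example runs without a browser. `collectConsoleErrors` is not part of diagnostic-helpers.ts.

```typescript
// Hypothetical stand-ins for the parts of Playwright's ConsoleMessage and Page used below.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}
interface ConsoleSource {
  on(event: 'console', listener: (msg: ConsoleMessageLike) => void): unknown;
}

// Subscribe to console messages and return a live array of error texts.
// A test can assert that the array is empty once it has finished.
function collectConsoleErrors(page: ConsoleSource): string[] {
  const errors: string[] = [];
  page.on('console', (msg) => {
    if (msg.type() === 'error') {
      errors.push(msg.text());
    }
  });
  return errors;
}

// Minimal fake page, used here only to demonstrate the behavior.
const registeredListeners: Array<(msg: ConsoleMessageLike) => void> = [];
const fakeConsolePage: ConsoleSource = {
  on: (_event, listener) => registeredListeners.push(listener),
};

const collectedErrors = collectConsoleErrors(fakeConsolePage);
registeredListeners.forEach((l) => l({ type: () => 'log', text: () => 'rendered' }));
registeredListeners.forEach((l) => l({ type: () => 'error', text: () => 'Unhandled rejection: boom' }));
console.log(collectedErrors);
```

In a real spec the array would be created in beforeEach and checked in afterEach, so a page error fails the test that caused it.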

2. Enhanced Error Reporting in certificates.spec.ts

Recommendation: Add detailed logging around interrupted tests:

test('should be keyboard navigable', async ({ page }) => {
  console.log(`\n[TEST START] Keyboard navigation test at ${new Date().toISOString()}`);

  await test.step('Open dialog', async () => {
    console.log('[STEP 1] Opening certificate upload dialog...');
    await getAddCertButton(page).click();

    console.log('[STEP 1] Waiting for dialog to be visible...');
    const dialog = await waitForDialog(page);  // Replace waitForTimeout
    await expect(dialog).toBeVisible();
    console.log('[STEP 1] Dialog is visible and ready');
  });

  await test.step('Navigate with Tab key', async () => {
    console.log('[STEP 2] Testing keyboard navigation...');

    await page.keyboard.press('Tab');
    const nameInput = page.getByRole('dialog').locator('input').first();
    await expect(nameInput).toBeFocused();
    console.log('[STEP 2] First input (name) received focus ✓');

    await page.keyboard.press('Tab');
    const certInput = page.getByRole('dialog').locator('#cert-file');
    await expect(certInput).toBeFocused();
    console.log('[STEP 2] Certificate input received focus ✓');
  });

  await test.step('Close dialog', async () => {
    console.log('[STEP 3] Closing dialog...');
    const dialog = page.getByRole('dialog');
    await getCancelButton(page).click();

    console.log('[STEP 3] Verifying dialog closed...');
    await expect(dialog).not.toBeVisible({ timeout: 5000 });
    console.log('[STEP 3] Dialog closed successfully ✓');
  });

  console.log(`[TEST END] Keyboard navigation test completed at ${new Date().toISOString()}\n`);
});
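
The rewritten test calls `waitForDialog`, which this report references but does not define (wait-helpers.ts is planned for Phase 2.1). A minimal sketch of such a helper follows, assuming the dialog is exposed with the ARIA role `dialog`. `LocatorLike` and `PageLike` are hypothetical stand-ins meant to be structurally compatible with the parts of Playwright's `Locator` and `Page` that are used. The 5000 ms default and the companion `waitForDialogClosed` are also assumptions.

```typescript
// Hypothetical stand-ins for the subset of Playwright's Locator and Page used below.
interface LocatorLike {
  waitFor(options?: {
    state?: 'attached' | 'detached' | 'visible' | 'hidden';
    timeout?: number;
  }): Promise<void>;
}
interface PageLike<L extends LocatorLike> {
  getByRole(role: 'dialog'): L;
}

// Resolve with the dialog locator once it is visible; reject if it does not
// become visible within `timeout` milliseconds.
async function waitForDialog<L extends LocatorLike>(
  page: PageLike<L>,
  timeout = 5000,
): Promise<L> {
  const dialog = page.getByRole('dialog');
  await dialog.waitFor({ state: 'visible', timeout });
  return dialog;
}

// Counterpart for cleanup: resolve once the dialog is hidden or removed.
async function waitForDialogClosed<L extends LocatorLike>(
  page: PageLike<L>,
  timeout = 5000,
): Promise<void> {
  await page.getByRole('dialog').waitFor({ state: 'hidden', timeout });
}
```

Both helpers wait on a condition and not on a fixed duration, so they return as soon as the dialog reaches the expected state and fail with a timeout error otherwise.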

3. Backend Health Monitoring

Action: Capture backend logs during test execution to detect crashes or timeouts.

# Add to the CI workflow after the test step; if: failure() runs it only when an earlier step has failed
- name: Collect backend logs
  if: failure()
  run: |
    echo "Collecting Charon backend logs..."
    docker logs charon-e2e > backend-logs.txt 2>&1

    echo "Searching for errors, panics, or crashes..."
    grep -i "error\|panic\|fatal\|crash" backend-logs.txt || echo "No critical errors found"

    echo "Last 100 lines of logs:"
    tail -100 backend-logs.txt

Verification Plan

Local Reproduction

Goal: Reproduce interruption locally to validate diagnostic enhancements.

Steps:

Enable diagnostic logging:

# Set environment variable to enable verbose logging
export DEBUG=pw:api,charon:*

Run interrupted tests in isolation:

# Test 1: Run only the interrupted test
npx playwright test tests/core/certificates.spec.ts:788 --project=chromium --headed

# Test 2: Run entire accessibility suite
npx playwright test tests/core/certificates.spec.ts --grep="accessibility" --project=chromium --headed

# Test 3: Run with trace
npx playwright test tests/core/certificates.spec.ts:788 --project=chromium --trace=on

Simulate CI environment:

# Run with CI settings (workers=1, retries=2)
CI=1 npx playwright test tests/core/certificates.spec.ts --project=chromium --workers=1 --retries=2

Analyze trace files:

# Open trace viewer
npx playwright show-trace test-results/*/trace.zip

# Check for:
# - Browser context lifetime
# - Dialog open/close events
# - Memory usage over time (not shown in the trace viewer; measure separately)
# - Network requests during disruption

Expected Diagnostic Outputs

If Hypothesis 1 (Resource Leak) is correct:

- Browser console shows warnings about unclosed resources
- Trace shows dialog DOM nodes persist after close
- Memory usage increases gradually across tests
- Context termination occurs after cleanup attempt

If Hypothesis 2 (Memory Leak) is correct:

- Memory usage climbs steadily up to test #263
- Garbage collection triggers during test execution
- Browser console shows "out of memory" or similar
- Context terminates during or after GC

If Hypothesis 3 (Race Condition) is correct:

- React state update errors in console
- Multiple close handlers fire simultaneously
- Dialog state inconsistent between virtual DOM and actual DOM
- Error occurs specifically during state reconciliation

Findings Summary

| Finding | Severity | Impact | Remediation |
|---|---|---|---|
| Arbitrary timeouts (`page.waitForTimeout`) | HIGH | Race conditions in CI | Replace with semantic wait helpers |
| Weak assertions (`expect(x \|\| true)`) | HIGH | False confidence in tests | Use specific assertions |
| Missing cleanup verification | HIGH | Inconsistent page state | Add explicit close verification |
| No browser console logging | MEDIUM | Difficult to diagnose issues | Enable diagnostic logging |
| No dialog lifecycle tracking | MEDIUM | Resource leaks undetected | Add enter/exit logging |
| No backend health monitoring | MEDIUM | Can't correlate backend crashes | Collect backend logs on failure |
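
The "enter/exit logging" remediation in the table could take the form of a small tracker that records each dialog open and close and reports any dialog still open when a test ends. This is a sketch only: `DialogLifecycleTracker` and its method names are hypothetical and are not part of diagnostic-helpers.ts.

```typescript
// Records dialog open/close events so a test can detect dialogs that were never closed.
class DialogLifecycleTracker {
  private openCount = 0;
  private readonly events: string[] = [];

  // Call after the dialog has been confirmed visible.
  opened(label: string): void {
    this.openCount += 1;
    this.events.push(`[DIALOG OPEN] ${label}`);
  }

  // Call after the dialog has been confirmed hidden.
  closed(label: string): void {
    this.openCount = Math.max(0, this.openCount - 1);
    this.events.push(`[DIALOG CLOSE] ${label}`);
  }

  // Dialogs opened but not yet closed; expected to be 0 at the end of every test.
  danglingCount(): number {
    return this.openCount;
  }

  // Copy of the recorded events, in order.
  log(): string[] {
    return this.events.slice();
  }
}

const tracker = new DialogLifecycleTracker();
tracker.opened('keyboard navigation');
tracker.closed('keyboard navigation');
tracker.opened('escape key'); // never closed: this is the case the tracker should surface
console.log(tracker.log(), 'dangling:', tracker.danglingCount());
```

Checking `danglingCount()` in an afterEach hook would turn a silently leaked dialog into a failure attributed to the test that opened it.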

Recommendations for Phase 2

Immediate Actions (CRITICAL)

1. Replace ALL page.waitForTimeout() in certificates.spec.ts (34 instances)
   - Priority: P0 - Blocking
   - Effort: 3 hours
   - Impact: Eliminates race conditions
2. Add dialog lifecycle verification to interrupted tests
   - Priority: P0 - Blocking
   - Effort: 1 hour
   - Impact: Ensures proper cleanup
3. Enable diagnostic logging in CI
   - Priority: P0 - Blocking
   - Effort: 30 minutes
   - Impact: Captures root cause on next failure

Short-term Actions (HIGH PRIORITY)

1. Create wait-helpers.ts library
   - Priority: P1
   - Effort: 2 hours
   - Impact: Provides drop-in replacements for timeouts
2. Add browser console error detection to CI
   - Priority: P1
   - Effort: 1 hour
   - Impact: Alerts on JavaScript errors during tests
3. Implement pre-commit hook to prevent new timeouts
   - Priority: P1
   - Effort: 1 hour
   - Impact: Prevents regression
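
The pre-commit check can be as simple as scanning staged test files for calls to `waitForTimeout`. The sketch below shows only the scanning step, on an in-memory string. `findWaitForTimeouts` and `TimeoutHit` are hypothetical names, the comment handling is deliberately naive, and reading files and wiring into a hook runner are left out.

```typescript
interface TimeoutHit {
  line: number; // 1-based line number within the scanned source
  text: string; // trimmed content of the offending line
}

// Return every line that calls .waitForTimeout( outside of a trailing line comment.
// Naive by design: it does not parse strings or block comments.
function findWaitForTimeouts(source: string): TimeoutHit[] {
  const hits: TimeoutHit[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    const code = text.split('//')[0]; // drop anything after a line comment marker
    if (/\.waitForTimeout\s*\(/.test(code)) {
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

const sampleSpec = [
  'await getAddCertButton(page).click();',
  'await page.waitForTimeout(500);',
  '// await page.waitForTimeout(100);',
  'await expect(dialog).toBeVisible();',
].join('\n');

const timeoutHits = findWaitForTimeouts(sampleSpec);
console.log(timeoutHits);
```

A hook built on this would exit with a non-zero status when the list is non-empty, which blocks the commit until the wait is replaced.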

Long-term Actions (MEDIUM PRIORITY)

1. Refactor remaining 66 instances of page.waitForTimeout()
   - Priority: P2
   - Effort: 8-12 hours
   - Impact: Consistent wait patterns across all tests
2. Add memory profiling to CI
   - Priority: P2
   - Effort: 2 hours
   - Impact: Detects memory leaks early
3. Create test isolation verification suite
   - Priority: P2
   - Effort: 3 hours
   - Impact: Ensures tests don't contaminate each other

Next Steps

✅ Phase 1.1 Complete: Test execution order analyzed
✅ Phase 1.2 Complete: Split browser jobs implemented
✅ Phase 1.3 Complete: Coverage merge strategy implemented
✅ Phase 1.4 Complete: Deep diagnostic investigation documented
⏭️ Phase 2.1 Start: Create wait-helpers.ts library
⏭️ Phase 2.2 Start: Refactor interrupted tests in certificates.spec.ts

Validation Checklist

- Diagnostic logging enabled in certificates.spec.ts
- Local reproduction of interruption attempted
- Trace files analyzed for resource leaks
- Backend logs collected during test run
- Browser console logs captured during interruption
- Hypothesis validated (or refined)
- Phase 2 remediation plan approved

Document Control:
Version: 1.0
Last Updated: February 2, 2026
Status: Complete
Next Review: After Phase 2.1 completion
