Playwright E2E Green Plan (Single Objective)
Date: 2026-02-15
Owner: Planning Agent
Scope: Achieve 100% green Playwright E2E quickly through root-cause fixes and performance/stability optimization.
Objective
Deliver a fully green Playwright E2E suite with deterministic results and controlled runtime by fixing root causes first (not symptom retries).
This file contains one objective only for today’s request.
Requirements (EARS)
- WHEN the E2E environment is invalid or stale, THE SYSTEM SHALL stop execution and rebuild before test reproduction.
- WHEN a failure is reproduced, THE SYSTEM SHALL identify root cause in frontend/backend/helpers before mitigation.
- WHEN synchronization is required, THE SYSTEM SHALL use condition-based waits and deterministic locators rather than fixed sleeps.
- WHEN tests/core/data-consistency.spec.ts is relevant to the failing path, THE SYSTEM SHALL include it consistently in targeted reproduction, impacted rerun, browser fan-out, and final confirmation.
- WHEN all impacted fixes are complete, THE SYSTEM SHALL pass security matrix and full split confirmation without regressions.
- WHEN production code is modified, THE SYSTEM SHALL complete coverage runs before final sign-off.
- WHEN Codecov Patch view reports missing or partial coverage, THE SYSTEM SHALL require 100% patch coverage for modified lines before go-live.
- WHEN patch coverage is below 100%, THE SYSTEM SHALL capture exact missing/partial line ranges and map each range to targeted tests.
- WHEN stability is being validated, THE SYSTEM SHALL use Playwright-native --repeat-each consecutive-pass gates rather than ad hoc loops.
- WHEN classifying possible state contamination, THE SYSTEM SHALL run an early single-worker diagnostic branch (--workers=1) before parallel reruns.
- WHEN retries are enabled for instrumentation, THE SYSTEM SHALL enforce --fail-on-flaky-tests so flaky-pass results do not qualify as green.
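The condition-based wait requirement can be illustrated with a small polling helper. This is a minimal sketch of the idea only: `waitForCondition` and its option names are hypothetical and not taken from tests/utils/wait-helpers.ts. In real specs, Playwright's web-first assertions and `expect.poll` already provide this behavior and would normally be preferred.

```typescript
// Poll a predicate until it holds or a deadline passes, instead of sleeping
// for a fixed duration. All names here are illustrative assumptions.
interface WaitOptions {
  timeoutMs: number;  // overall deadline for the condition
  intervalMs: number; // delay between polls
}

async function waitForCondition(
  predicate: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs }: WaitOptions,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // Return as soon as the awaited state is observed.
    if (await predicate()) return;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The helper returns as soon as the state is reached, so a fast environment is not slowed down, and a slow one fails with an explicit message rather than a silent race.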
Hard-Gated Execution Order (Stop/Go)
Gate 1: Environment Validity and Rebuild Decision
Stop:
- charon-e2e unhealthy, missing required env, or stale runtime after app/runtime changes.
Go:
- Health checks pass and runtime mode matches test scope.
Order:
- Validate health and required env.
- Prefer runner-managed setup where feasible:
  - Use Playwright project-dependency setup flow (for example, setup project + dependent browser projects) instead of external-only setup scripts when equivalent.
  - Use the webServer readiness gating option when applicable so the runner controls startup/ready checks.
- Decide rebuild using testing protocol:
  - Rebuild required for runtime changes (backend/**, frontend/**, runtime/Docker inputs).
  - Reuse container for test-only changes if healthy.
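The rebuild decision can be sketched as a pure function over the changed paths. This is an illustration under stated assumptions: `decideRebuild` is a hypothetical helper, and the file names in `RUNTIME_FILES` are guesses at what "runtime/Docker inputs" covers, since the plan does not enumerate them.

```typescript
type RebuildDecision = "rebuild" | "reuse";

// Path prefixes the plan names as runtime changes.
const RUNTIME_PREFIXES = ["backend/", "frontend/"];

// Assumed examples of runtime/Docker inputs; adjust to the repository.
const RUNTIME_FILES = ["Dockerfile", ".dockerignore", "docker-compose.yml"];

function decideRebuild(
  changedPaths: string[],
  containerHealthy: boolean,
): RebuildDecision {
  const runtimeChanged = changedPaths.some(
    (p) =>
      RUNTIME_PREFIXES.some((prefix) => p.startsWith(prefix)) ||
      RUNTIME_FILES.indexOf(p) !== -1,
  );
  // Reuse is allowed only for test-only changes on a healthy container.
  return runtimeChanged || !containerHealthy ? "rebuild" : "reuse";
}
```

For example, a change limited to tests/core/navigation.spec.ts on a healthy container yields "reuse", while any change under backend/ yields "rebuild".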
Gate 2: Targeted Shard Reproduction
Stop:
- Failure not reproducible in targeted shard/spec.
Go:
- Failure reproduced with clear signal and trace.
Order:
- Reproduce in smallest shard/spec containing failure.
- Capture trace/artifacts once failure is reproduced.
Gate 3: Root-Cause Fix Loop
Stop:
- Change does not map to verified cause.
- Proposed fix is only timeout/retry inflation.
Go:
- Cause mapped to specific file/function/component.
Loop:
- Classify cause (SYNC_WAIT, LOCATOR_AMBIGUITY, STATE_LEAK, ENV_ORCHESTRATION, PERF_TIMEOUT).
- Run taxonomy-first fastest diagnostic move:
  - SYNC_WAIT: run failing test with trace + UI mode, timeouts unchanged; verify missing awaited state transition first.
  - LOCATOR_AMBIGUITY: run strict locator count/assertion first (toHaveCount(1) / role+name narrowing) before selector rewrites.
  - STATE_LEAK: run early deterministic branch with --workers=1 and same shard/spec to confirm isolation issue.
  - ENV_ORCHESTRATION: verify setup project execution order and webServer ready signal before app-code changes.
  - PERF_TIMEOUT: run smallest repro with unchanged expectations and capture slow step timings before raising timeout.
- Apply root-cause fix.
- Re-run smallest failing scope.
- Repeat until deterministic pass.
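The taxonomy and its first diagnostic moves can be kept as a typed lookup so that every classified failure starts from the same step. The sketch below is an assumption-laden illustration: `FIRST_MOVE` and `diagnosticCommand` are hypothetical names, and using `--trace on` to capture slow step timings for PERF_TIMEOUT is one possible choice, not something the plan prescribes.

```typescript
type FailureClass =
  | "SYNC_WAIT"
  | "LOCATOR_AMBIGUITY"
  | "STATE_LEAK"
  | "ENV_ORCHESTRATION"
  | "PERF_TIMEOUT";

interface DiagnosticMove {
  firstMove: string;   // what to check before changing any code
  extraArgs: string[]; // extra Playwright CLI flags for the repro run
}

const FIRST_MOVE: Record<FailureClass, DiagnosticMove> = {
  SYNC_WAIT: {
    firstMove: "Find the missing awaited state transition; leave timeouts unchanged.",
    extraArgs: ["--trace", "on"],
  },
  LOCATOR_AMBIGUITY: {
    firstMove: "Assert toHaveCount(1) or narrow by role + name before rewriting selectors.",
    extraArgs: [],
  },
  STATE_LEAK: {
    firstMove: "Rerun the same shard/spec with one worker to confirm an isolation issue.",
    extraArgs: ["--workers=1"],
  },
  ENV_ORCHESTRATION: {
    firstMove: "Verify setup project order and the webServer ready signal.",
    extraArgs: [],
  },
  PERF_TIMEOUT: {
    firstMove: "Capture slow step timings with unchanged expectations before raising any timeout.",
    extraArgs: ["--trace", "on"], // assumption: trace as the timing source
  },
};

// Build the argv for the smallest repro of a classified failure.
function diagnosticCommand(cls: FailureClass, spec: string): string[] {
  return ["npx", "playwright", "test", spec].concat(FIRST_MOVE[cls].extraArgs);
}
```

A STATE_LEAK suspicion in tests/core/data-consistency.spec.ts would then be reproduced with `npx playwright test tests/core/data-consistency.spec.ts --workers=1`.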
Gate 4: Impacted Shard Rerun
Stop:
- Any failure in impacted shard after fix.
Go:
- Impacted shard fully green.
Gate 5: Browser Fan-Out
Stop:
- Any browser fails on impacted scope.
Go:
- Chromium + Firefox + WebKit pass impacted scope.
Order:
- Single-browser deterministic validation first (primary browser baseline).
- Cross-browser fan-out second (Chromium + Firefox + WebKit).
Gate 6: Security Matrix and Full Split Confirmation
Stop:
- Security suites fail or split topology regresses.
Go:
- Security matrix green and full split pipeline green.
Gate 7: Mandatory Coverage and Codecov Patch Triage
Stop:
- Coverage runs not completed for modified production code.
- Codecov Patch view not reviewed after coverage upload.
- Patch coverage for modified lines is < 100%.
- Missing/partial patch line ranges are not documented with mapped targeted tests.
Go:
- Coverage runs completed.
- Codecov Patch coverage for modified lines is exactly 100%.
- All missing/partial ranges are triaged and resolved with targeted tests.
Order:
- Run required coverage suites for modified areas.
- Open Codecov Patch view.
- Copy exact missing/partial modified line ranges.
- Map each range to targeted tests.
- Add/adjust tests and rerun coverage until patch coverage is 100%.
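The Gate 7 check can be sketched as a function over the modified lines and their coverage status. This is a minimal illustration, assuming that partial lines count as uncovered (the plan treats both missing and partial lines as blocking); `patchSummary`, `toLineRanges`, and the input shape are hypothetical and do not reflect any Codecov API.

```typescript
type LineStatus = "hit" | "partial" | "miss";

interface ModifiedLine {
  line: number;
  status: LineStatus;
}

// Collapse line numbers into compact ranges such as "11-12" or "20".
function toLineRanges(lines: number[]): string[] {
  const sorted = lines.slice().sort((a, b) => a - b);
  const ranges: string[] = [];
  let i = 0;
  while (i < sorted.length) {
    const start = sorted[i];
    let end = start;
    // Extend the range while the next line is a duplicate or directly adjacent.
    while (i + 1 < sorted.length && sorted[i + 1] <= end + 1) {
      end = sorted[i + 1];
      i++;
    }
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
    i++;
  }
  return ranges;
}

function patchSummary(modified: ModifiedLine[]) {
  const hits = modified.filter((m) => m.status === "hit").length;
  const uncovered = modified
    .filter((m) => m.status !== "hit")
    .map((m) => m.line);
  return {
    // Percentage of modified lines that are fully covered.
    percent: modified.length === 0 ? 100 : (hits * 100) / modified.length,
    gatePasses: uncovered.length === 0,
    uncoveredRanges: toLineRanges(uncovered),
  };
}
```

The `uncoveredRanges` output is what would be copied into the triage table and mapped to targeted tests; the gate passes only when that list is empty.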
Performance and Stability Thresholds
Runtime Thresholds
- Baseline source: most recent known-good split run on same branch/runtime profile.
- Non-security shard runtime regression allowed: <= 10% per shard.
- Total non-security wall-clock regression allowed: <= 10%.
- Security matrix runtime regression allowed: <= 15%.
Runtime fail:
- Any threshold exceeded without explicit documented acceptance as follow-up.
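The runtime thresholds reduce to three percentage comparisons against the baseline. The sketch below assumes durations in whole seconds and a hypothetical `SplitRun` shape; the limits (10%, 10%, 15%) are the ones stated above.

```typescript
interface Runtime {
  baselineSec: number; // most recent known-good run
  currentSec: number;  // run under evaluation
}

interface SplitRun {
  nonSecurityShards: Record<string, Runtime>;
  nonSecurityWallClock: Runtime;
  securityMatrix: Runtime;
}

// True when current <= baseline * (1 + limitPct / 100), using integer math
// to avoid floating-point edge cases at the exact boundary.
function withinLimit(r: Runtime, limitPct: number): boolean {
  return r.currentSec * 100 <= r.baselineSec * (100 + limitPct);
}

// Returns a label for every threshold that is exceeded.
function runtimeViolations(run: SplitRun): string[] {
  const violations: string[] = [];
  for (const name of Object.keys(run.nonSecurityShards)) {
    if (!withinLimit(run.nonSecurityShards[name], 10)) {
      violations.push(`shard ${name}`);
    }
  }
  if (!withinLimit(run.nonSecurityWallClock, 10)) {
    violations.push("non-security wall clock");
  }
  if (!withinLimit(run.securityMatrix, 15)) {
    violations.push("security matrix");
  }
  return violations;
}
```

A shard that moves from 100 s to 110 s sits exactly on the 10% limit and passes; 200 s to 221 s does not. Any non-empty result would need documented acceptance as a follow-up to avoid a runtime fail.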
Stability / Flake Thresholds
- Targeted repaired failure: 3 consecutive passes required.
- Impacted shard: 2 consecutive passes required.
- Browser fan-out: 2 consecutive passes per browser on impacted scope.
- Final full split confirmation: 1 full run with zero retry-required failures.
- Consecutive-pass gates use Playwright-native --repeat-each.
- Retry runs are instrumentation only (small CI retry count) and must be paired with --fail-on-flaky-tests.
Stability fail:
- Any non-deterministic reappearance inside required consecutive pass window.
- Any flaky-pass classified by Playwright as flaky.
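The consecutive-pass thresholds can be turned into Playwright CLI arguments so each gate is always run the same way. This is a sketch under assumptions: `gateArgs` and the gate names are hypothetical, and the project name in the usage note assumes the common chromium/firefox/webkit naming, which the plan does not confirm.

```typescript
type Gate = "targeted" | "impactedShard" | "browserFanOut" | "fullSplit";

// Required consecutive passes per gate, as listed in the thresholds.
const REQUIRED_PASSES: Record<Gate, number> = {
  targeted: 3,
  impactedShard: 2,
  browserFanOut: 2,
  fullSplit: 1,
};

interface GateOptions {
  project?: string; // Playwright project name, used for browser fan-out
  retries?: number; // instrumentation only; forces --fail-on-flaky-tests
}

function gateArgs(gate: Gate, specs: string[], opts: GateOptions = {}): string[] {
  const args = ["playwright", "test"]
    .concat(specs)
    .concat([`--repeat-each=${REQUIRED_PASSES[gate]}`]);
  if (opts.project) {
    args.push(`--project=${opts.project}`);
  }
  if (opts.retries && opts.retries > 0) {
    // A test that passes only on retry must still fail the run.
    args.push(`--retries=${opts.retries}`, "--fail-on-flaky-tests");
  }
  return args;
}
```

For instance, `gateArgs("browserFanOut", ["tests/core"], { project: "firefox", retries: 1 })` produces `playwright test tests/core --repeat-each=2 --project=firefox --retries=1 --fail-on-flaky-tests`, to be run through `npx`.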
Auth-State Reuse Nuance
- Shared storageState reuse is acceptable only when server-side state mutation conflicts are controlled.
- If tests mutate shared server-side entities (user/session/settings records), isolate state per test/suite or reset deterministically before reuse.
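One way shared auth state is typically wired is through a setup project that browser projects depend on. The object below is a hedged sketch of such a configuration: the `AUTH_STATE` path and the project names are assumptions, and only tests/auth.setup.ts is taken from this plan. In a real playwright.config.ts the object would be the default export, optionally wrapped in `defineConfig`.

```typescript
// Hypothetical location of the saved login state; the plan does not name it.
const AUTH_STATE = "playwright/.auth/admin.json";

const config = {
  projects: [
    // Runs first and writes the storage state file.
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    {
      name: "chromium",
      use: { browserName: "chromium", storageState: AUTH_STATE },
      dependencies: ["setup"],
    },
    {
      name: "firefox",
      use: { browserName: "firefox", storageState: AUTH_STATE },
      dependencies: ["setup"],
    },
    {
      name: "webkit",
      use: { browserName: "webkit", storageState: AUTH_STATE },
      dependencies: ["setup"],
    },
  ],
};
```

This shares only browser-side cookies and storage. It does nothing to isolate server-side records, so suites that mutate users, sessions, or settings would still need their own data or a deterministic reset.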
Root-Cause-First Focus Areas
Orchestration and helpers
- tests/global-setup.ts (waitForContainer, emergencySecurityReset)
- tests/auth.setup.ts (performLoginAndSaveState, resetAdminCredentials)
- tests/utils/wait-helpers.ts
- tests/utils/ui-helpers.ts
- tests/utils/TestDataManager.ts
Flake-prone suites
- tests/core/navigation.spec.ts
- tests/core/proxy-hosts.spec.ts
- tests/core/data-consistency.spec.ts
- tests/settings/user-management.spec.ts
- tests/settings/smtp-settings.spec.ts
- tests/settings/notifications.spec.ts
- tests/tasks/*.spec.ts
UI/component hotspots
- frontend/src/components/ProxyHostForm.tsx
- frontend/src/pages/ProxyHosts.tsx
- frontend/src/pages/UsersPage.tsx
- frontend/src/pages/Settings.tsx
- frontend/src/pages/Certificates.tsx
Backend integrity path
- backend/internal/api/handlers/proxy_host_handler.go (Update)
- backend/internal/services/proxyhost_service.go (Update, validation paths)
- backend/internal/models/proxy_host.go
Data-Consistency Spec Policy (Aligned)
tests/core/data-consistency.spec.ts is consistently in scope for this plan:
- Included in targeted reproduction when there is an active failure signal.
- Included in impacted shard reruns after relevant fixes.
- Included in cross-browser fan-out for impacted scope.
- Included in final full split confirmation.
No phase excludes this spec while claiming full-green readiness.
Phased Task Plan
Phase 1: Environment and Reproduction
- Complete Gate 1 and Gate 2.
- Produce failure map with taxonomy and ownership.
Phase 2: Root-Cause Fix Loop
- Complete Gate 3.
- Prioritize helper/contract/product fixes over retries/timeouts.
Phase 3: Impacted Validation
- Complete Gate 4 and Gate 5.
- Enforce consecutive-pass thresholds.
Phase 4: Full Confirmation
- Complete Gate 6.
- Complete Gate 7.
- Verify full-green state for split topology and security matrix.
Phase 5: Patch Coverage Triage Closure
- This phase runs after implementation changes and after coverage is executed/uploaded.
- Capture exact missing/partial line ranges from Codecov Patch view.
- Maintain line-range-to-test mapping until each range is covered.
- Re-run only targeted suites first, then required final confirmation suite.
Status convention for this phase:
- Pending Execution: acceptable before implementation and before coverage + Codecov Patch review are run.
- Closed: required at completion for every triage entry.
Codecov Patch Triage Table (Mandatory)
Note: Codecov Patch line range and related placeholders below are execution artifacts. Populate them only in Phase 5 after running coverage and opening the Codecov Patch view.
| Codecov Patch line range | File | Coverage status | Targeted test(s) to add/run | Owner | Status |
|---|---|---|---|---|---|
| <paste exact range from Codecov> | <path> | Missing/Partial | <test file + test name> | <name> | Pending Execution |
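If the triage table is tracked in code rather than by hand, each row can be modeled as a typed entry with a closure check. This is an illustrative sketch only: `TriageEntry` and `blockingEntries` are hypothetical names, and the field set simply mirrors the table columns.

```typescript
type TriageStatus = "Pending Execution" | "Closed";

interface TriageEntry {
  lineRange: string;              // exact range copied from Codecov Patch view
  file: string;
  coverage: "Missing" | "Partial";
  targetedTests: string[];        // test file + test name
  owner: string;
  status: TriageStatus;
}

// Entries that still block sign-off: not closed, or closed without any
// targeted test mapped to the range.
function blockingEntries(entries: TriageEntry[]): TriageEntry[] {
  return entries.filter(
    (e) => e.status !== "Closed" || e.targetedTests.length === 0,
  );
}
```

Sign-off would require `blockingEntries(...)` to return an empty list, which matches the rule that every triage entry is Closed and mapped to a targeted test at completion.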
Critical Path Exclusions
Removed from critical path for this request:
- .gitignore audit
- codecov.yml audit
- .dockerignore audit
- Dockerfile audit
These are non-blocking unless directly proven as root cause of active E2E failures.
Non-Blocking Follow-Up (Optional)
- Config hygiene review for .gitignore, codecov.yml, .dockerignore, Dockerfile.
- Additional CI/runtime optimization outside current pass criteria.
Definition of Done
- Hard-gated execution order completed without skipped stop/go checks.
- Active failures fixed via verified root causes.
- tests/core/data-consistency.spec.ts handled consistently per policy.
- Impacted shards green with required consecutive passes.
- Browser fan-out green with required consecutive passes.
- Security matrix and full split confirmation green.
- Runtime/stability thresholds satisfied, or explicit follow-up recorded for approved exceptions.
- Coverage completion is documented for all modified production code.
- Codecov Patch coverage for modified lines is 100%.
- Codecov Patch missing/partial line ranges are explicitly captured and each is mapped to targeted tests, with all entries closed.
Policy-Bound Caveat (Coverage)
- Repository policy remains a hard gate: modified production lines require 100% Codecov patch coverage.
- No exceptions are allowed unless repository policy itself is changed.
- Filler tests are not acceptable; each missing/partial patch line must be covered by behavior-linked targeted tests tied to the affected scenario.