## Playwright E2E Green Plan (Single Objective)

Date: 2026-02-15
Owner: Planning Agent
Scope: Achieve 100% green Playwright E2E quickly through root-cause fixes and performance/stability optimization.

---

## Objective

Deliver a fully green Playwright E2E suite with deterministic results and controlled runtime by fixing root causes first rather than masking symptoms with retries.

This file contains one objective only for today’s request.

---

## Requirements (EARS)

- WHEN the E2E environment is invalid or stale, THE SYSTEM SHALL stop execution and rebuild before test reproduction.
- WHEN a failure is reproduced, THE SYSTEM SHALL identify the root cause in frontend/backend/helpers before mitigation.
- WHEN synchronization is required, THE SYSTEM SHALL use condition-based waits and deterministic locators rather than fixed sleeps.
- WHEN `tests/core/data-consistency.spec.ts` is relevant to the failing path, THE SYSTEM SHALL include it consistently in targeted reproduction, impacted rerun, browser fan-out, and final confirmation.
- WHEN all impacted fixes are complete, THE SYSTEM SHALL pass the security matrix and full split confirmation without regressions.
- WHEN production code is modified, THE SYSTEM SHALL complete coverage runs before final sign-off.
- WHEN Codecov Patch view reports missing or partial coverage, THE SYSTEM SHALL require 100% patch coverage for modified lines before go-live.
- WHEN patch coverage is below 100%, THE SYSTEM SHALL capture exact missing/partial line ranges and map each range to targeted tests.
- WHEN stability is being validated, THE SYSTEM SHALL use Playwright-native `--repeat-each` consecutive-pass gates rather than ad hoc loops.
- WHEN classifying possible state contamination, THE SYSTEM SHALL run an early single-worker diagnostic branch (`--workers=1`) before parallel reruns.
- WHEN retries are enabled for instrumentation, THE SYSTEM SHALL enforce `--fail-on-flaky-tests` so flaky-pass results do not qualify as green.

---

## Hard-Gated Execution Order (Stop/Go)

### Gate 1: Environment Validity and Rebuild Decision

Stop:
- `charon-e2e` is unhealthy, required env is missing, or the runtime is stale after app/runtime changes.

Go:
- Health checks pass and runtime mode matches test scope.

Order:
1. Validate health and required env.
2. Prefer runner-managed setup where feasible:
   - Use the Playwright project-dependency setup flow (for example, a setup project plus dependent browser projects) instead of external-only setup scripts when the two are equivalent.
   - Use the `webServer` readiness gating option when applicable so the runner controls startup/ready checks.
3. Decide rebuild using the testing protocol:
   - Rebuild is required for runtime changes (`backend/**`, `frontend/**`, runtime/Docker inputs).
   - Reuse the container for test-only changes if it is healthy.
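
The rebuild rule in step 3 can be sketched as a small pure function. This is a minimal illustration rather than the project's actual tooling: `decideRebuild`, `isRuntimeChange`, and `RUNTIME_INPUTS` are hypothetical names, and treating `Dockerfile` and `.dockerignore` as the "runtime/Docker inputs" is an assumption to be replaced with the repository's real list.

```typescript
type RebuildDecision = "rebuild" | "reuse";

// Path prefixes taken from the plan's rebuild rule.
const RUNTIME_PREFIXES = ["backend/", "frontend/"];
// Assumption: stand-in for "runtime/Docker inputs"; adjust to the real repo.
const RUNTIME_INPUTS = ["Dockerfile", ".dockerignore"];

// True when a changed path affects the running application image.
function isRuntimeChange(path: string): boolean {
  return (
    RUNTIME_PREFIXES.some((prefix) => path.startsWith(prefix)) ||
    RUNTIME_INPUTS.includes(path)
  );
}

// Runtime changes always force a rebuild. Test-only changes may reuse the
// container, but only when it is healthy; otherwise rebuild as well.
function decideRebuild(
  changedPaths: string[],
  containerHealthy: boolean
): RebuildDecision {
  if (changedPaths.some(isRuntimeChange)) return "rebuild";
  return containerHealthy ? "reuse" : "rebuild";
}
```

Feeding the function the changed-file list of the current branch makes the Gate 1 decision reproducible instead of a judgment call.
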

### Gate 2: Targeted Shard Reproduction

Stop:
- Failure not reproducible in targeted shard/spec.

Go:
- Failure reproduced with clear signal and trace.

Order:
1. Reproduce in smallest shard/spec containing failure.
2. Capture trace/artifacts once failure is reproduced.

### Gate 3: Root-Cause Fix Loop

Stop:
- Change does not map to verified cause.
- Proposed fix is only timeout/retry inflation.

Go:
- Cause mapped to specific file/function/component.

Loop:
1. Classify cause (`SYNC_WAIT`, `LOCATOR_AMBIGUITY`, `STATE_LEAK`, `ENV_ORCHESTRATION`, `PERF_TIMEOUT`).
2. Run the fastest diagnostic move for the classified cause (taxonomy first):
   - `SYNC_WAIT`: run the failing test with trace and UI mode, leaving timeouts unchanged; verify the missing awaited state transition first.
   - `LOCATOR_AMBIGUITY`: run a strict locator count/assertion first (`toHaveCount(1)` or role+name narrowing) before rewriting selectors.
   - `STATE_LEAK`: run an early deterministic branch with `--workers=1` on the same shard/spec to confirm an isolation issue.
   - `ENV_ORCHESTRATION`: verify setup project execution order and the `webServer` ready signal before making app-code changes.
   - `PERF_TIMEOUT`: run the smallest repro with unchanged expectations and capture slow step timings before raising any timeout.
3. Apply root-cause fix.
4. Re-run smallest failing scope.
5. Repeat until deterministic pass.
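
The taxonomy in the loop above can be held as a typed lookup so that every classified failure gets the same first diagnostic run. This is a sketch under stated assumptions: `FIRST_MOVE`, `DiagnosticMove`, and `firstDiagnosticArgs` are hypothetical names, and only the Playwright CLI flags `--trace` and `--workers` are real; the `verifyFirst` strings paraphrase the plan.

```typescript
type CauseClass =
  | "SYNC_WAIT"
  | "LOCATOR_AMBIGUITY"
  | "STATE_LEAK"
  | "ENV_ORCHESTRATION"
  | "PERF_TIMEOUT";

interface DiagnosticMove {
  extraArgs: string[]; // extra Playwright CLI arguments for the first run
  verifyFirst: string; // what to confirm before writing any fix
}

// One entry per cause class; the Record type forces the table to stay complete.
const FIRST_MOVE: Record<CauseClass, DiagnosticMove> = {
  SYNC_WAIT: {
    extraArgs: ["--trace", "on"],
    verifyFirst: "missing awaited state transition",
  },
  LOCATOR_AMBIGUITY: {
    extraArgs: [],
    verifyFirst: "locator resolves to exactly one element",
  },
  STATE_LEAK: {
    extraArgs: ["--workers=1"],
    verifyFirst: "whether the failure is an isolation issue",
  },
  ENV_ORCHESTRATION: {
    extraArgs: [],
    verifyFirst: "setup project order and webServer ready signal",
  },
  PERF_TIMEOUT: {
    extraArgs: ["--trace", "on"],
    verifyFirst: "slow step timings with unchanged expectations",
  },
};

// Builds the argument list for the first diagnostic run of one spec.
function firstDiagnosticArgs(cause: CauseClass, spec: string): string[] {
  return ["playwright", "test", spec, ...FIRST_MOVE[cause].extraArgs];
}
```

Because the table is keyed by `CauseClass`, adding a new cause without defining its first move fails type checking, which keeps the loop from drifting back to ad hoc timeout changes.
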

### Gate 4: Impacted Shard Rerun

Stop:
- Any failure in impacted shard after fix.

Go:
- Impacted shard fully green.

### Gate 5: Browser Fan-Out

Stop:
- Any browser fails on impacted scope.

Go:
- Chromium + Firefox + WebKit pass impacted scope.

Order:
1. Single-browser deterministic validation first (primary browser baseline).
2. Cross-browser fan-out second (Chromium + Firefox + WebKit).

### Gate 6: Security Matrix and Full Split Confirmation

Stop:
- Security suites fail or split topology regresses.

Go:
- Security matrix green and full split pipeline green.

### Gate 7: Mandatory Coverage and Codecov Patch Triage

Stop:
- Coverage runs not completed for modified production code.
- Codecov Patch view not reviewed after coverage upload.
- Patch coverage for modified lines is below 100%.
- Missing/partial patch line ranges are not documented with mapped targeted tests.

Go:
- Coverage runs completed.
- Codecov Patch coverage for modified lines is exactly 100%.
- All missing/partial ranges are triaged and resolved with targeted tests.

Order:
1. Run required coverage suites for modified areas.
2. Open Codecov Patch view.
3. Copy exact missing/partial modified line ranges.
4. Map each range to targeted tests.
5. Add/adjust tests and rerun coverage until patch coverage is 100%.
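
Steps 3 and 5 involve two small calculations: collapsing uncovered line numbers into ranges, and checking the 100% gate. The sketch below illustrates both. `toLineRanges` and `patchCoveragePercent` are hypothetical helpers, and the formula hits / (hits + partials + misses) is an assumption about how the patch percentage is derived; the Codecov Patch view remains the source of truth.

```typescript
// Collapse uncovered line numbers into contiguous "start-end" ranges.
function toLineRanges(lines: number[]): string[] {
  const sorted = Array.from(new Set(lines)).sort((a, b) => a - b);
  const ranges: Array<[number, number]> = [];
  for (const line of sorted) {
    const last = ranges[ranges.length - 1];
    if (last !== undefined && line === last[1] + 1) {
      last[1] = line; // extend the current range
    } else {
      ranges.push([line, line]); // start a new range
    }
  }
  return ranges.map(([start, end]) =>
    start === end ? `${start}` : `${start}-${end}`
  );
}

interface PatchCounts {
  hits: number;
  partials: number;
  misses: number;
}

// Assumed formula: partially covered lines count against the patch.
// A patch with no measurable lines is treated as 100% (also an assumption).
function patchCoveragePercent(counts: PatchCounts): number {
  const total = counts.hits + counts.partials + counts.misses;
  return total === 0 ? 100 : (100 * counts.hits) / total;
}
```

For example, uncovered lines 12, 13, 14, and 20 become the two triage rows `12-14` and `20`, and a patch with 9 hits and 1 partial line sits at 90%, which does not pass the gate.
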

---

## Performance and Stability Thresholds

### Runtime Thresholds

- Baseline source: most recent known-good split run on the same branch/runtime profile.
- Non-security shard runtime regression allowed: <= 10% per shard.
- Total non-security wall-clock regression allowed: <= 10%.
- Security matrix runtime regression allowed: <= 15%.

Runtime fail:
- Any threshold exceeded without explicit, documented acceptance recorded as a follow-up.
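
The runtime thresholds reduce to one comparison per measurement: (current - baseline) / baseline must not exceed the allowed fraction. A minimal sketch follows, with hypothetical names (`RuntimeSample`, `exceededThresholds`) and made-up timings used purely for illustration.

```typescript
interface RuntimeSample {
  label: string;
  baselineSec: number; // most recent known-good run
  currentSec: number; // run under evaluation
  allowedRegression: number; // fraction, e.g. 0.10 for 10%
}

const NON_SECURITY_LIMIT = 0.1; // per shard and for total wall-clock
const SECURITY_LIMIT = 0.15; // security matrix

// Relative slowdown against the baseline; negative values mean a speed-up.
function regressionFraction(sample: RuntimeSample): number {
  return (sample.currentSec - sample.baselineSec) / sample.baselineSec;
}

// Labels of every measurement that exceeds its allowed regression.
function exceededThresholds(samples: RuntimeSample[]): string[] {
  return samples
    .filter((sample) => regressionFraction(sample) > sample.allowedRegression)
    .map((sample) => sample.label);
}

// Illustrative timings only; real values come from the split pipeline.
const exampleSamples: RuntimeSample[] = [
  { label: "shard-1", baselineSec: 100, currentSec: 110, allowedRegression: NON_SECURITY_LIMIT },
  { label: "shard-2", baselineSec: 100, currentSec: 112, allowedRegression: NON_SECURITY_LIMIT },
  { label: "non-security total", baselineSec: 400, currentSec: 430, allowedRegression: NON_SECURITY_LIMIT },
  { label: "security matrix", baselineSec: 200, currentSec: 225, allowedRegression: SECURITY_LIMIT },
];

// Only shard-2 (+12%) is over its limit; shard-1 sits exactly at +10%.
console.log(exceededThresholds(exampleSamples));
```

The total wall-clock figure is passed in as its own sample rather than summed from shards, since shards that run in parallel do not add up to pipeline duration.
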

### Stability / Flake Thresholds

- Targeted repaired failure: 3 consecutive passes required.
- Impacted shard: 2 consecutive passes required.
- Browser fan-out: 2 consecutive passes per browser on impacted scope.
- Final full split confirmation: 1 full run with zero retry-required failures.
- Consecutive-pass gates use Playwright-native `--repeat-each`.
- Retry runs are for instrumentation only (small CI retry count) and must be paired with `--fail-on-flaky-tests`.

Stability fail:
- Any non-deterministic reappearance of the failure inside the required consecutive-pass window.
- Any result that passes only on retry and is therefore classified by Playwright as flaky.
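
The consecutive-pass counts above map directly onto Playwright CLI arguments. The sketch below builds those argument lists; `stabilityArgs` and `REPEAT_EACH` are hypothetical names, and the project names `chromium`, `firefox`, and `webkit` (with `chromium` as the primary baseline) are assumptions about the Playwright config. The flags `--project`, `--repeat-each`, `--retries`, and `--fail-on-flaky-tests` are real CLI options.

```typescript
type StabilityGate = "targeted" | "impactedShard" | "browserFanOut";

// Required consecutive passes per gate, taken from the thresholds.
const REPEAT_EACH: Record<StabilityGate, number> = {
  targeted: 3,
  impactedShard: 2,
  browserFanOut: 2,
};

// Assumption: project names as defined in the Playwright config.
const BROWSERS = ["chromium", "firefox", "webkit"];

// Returns one argument list per browser project to run for the gate.
function stabilityArgs(
  gate: StabilityGate,
  scope: string,
  retries = 0
): string[][] {
  const projects = gate === "browserFanOut" ? BROWSERS : [BROWSERS[0]];
  return projects.map((project) => {
    const args = [
      "playwright",
      "test",
      scope,
      `--project=${project}`,
      `--repeat-each=${REPEAT_EACH[gate]}`,
    ];
    if (retries > 0) {
      // Retries are instrumentation only: a pass on retry must still fail.
      args.push(`--retries=${retries}`, "--fail-on-flaky-tests");
    }
    return args;
  });
}
```

Pairing `--retries` with `--fail-on-flaky-tests` inside one helper makes it hard to enable retries without the guard that keeps a flaky pass from counting as green.
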

### Auth-State Reuse Nuance

- Shared `storageState` reuse is acceptable only when server-side state mutation conflicts are controlled.
- If tests mutate shared server-side entities (user/session/settings records), isolate state per test/suite or reset deterministically before reuse.
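
One way to express this split in a Playwright config is to give mutating suites their own project and their own saved state. The fragment below is a sketch under several assumptions, not the repository's actual config: both state file paths are hypothetical, `tests/auth.setup.ts` is assumed to write both files (the second for a separate account), and treating `tests/settings/` as the mutating suites is an illustration only.

```typescript
// playwright.config.ts (illustrative fragment)
import { defineConfig, devices } from "@playwright/test";

// Assumption: files written by tests/auth.setup.ts.
const SHARED_STATE = "playwright/.auth/admin.json";
const ISOLATED_STATE = "playwright/.auth/settings-admin.json";

export default defineConfig({
  projects: [
    // Setup project: runs first because the other projects depend on it.
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    {
      // Suites that do not mutate shared records reuse one signed-in state.
      name: "chromium",
      testIgnore: /settings\/.*\.spec\.ts/,
      use: { ...devices["Desktop Chrome"], storageState: SHARED_STATE },
      dependencies: ["setup"],
    },
    {
      // Suites that mutate user/session/settings records get a separate
      // account so their changes cannot invalidate the shared state.
      name: "chromium-settings",
      testMatch: /settings\/.*\.spec\.ts/,
      use: { ...devices["Desktop Chrome"], storageState: ISOLATED_STATE },
      dependencies: ["setup"],
    },
  ],
});
```

A separate account only removes conflicts on the session itself; suites that edit the same server-side records still need a deterministic reset before reuse.
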

---

## Root-Cause-First Focus Areas

### Orchestration and helpers
- `tests/global-setup.ts` (`waitForContainer`, `emergencySecurityReset`)
- `tests/auth.setup.ts` (`performLoginAndSaveState`, `resetAdminCredentials`)
- `tests/utils/wait-helpers.ts`
- `tests/utils/ui-helpers.ts`
- `tests/utils/TestDataManager.ts`

### Flake-prone suites
- `tests/core/navigation.spec.ts`
- `tests/core/proxy-hosts.spec.ts`
- `tests/core/data-consistency.spec.ts`
- `tests/settings/user-management.spec.ts`
- `tests/settings/smtp-settings.spec.ts`
- `tests/settings/notifications.spec.ts`
- `tests/tasks/*.spec.ts`

### UI/component hotspots
- `frontend/src/components/ProxyHostForm.tsx`
- `frontend/src/pages/ProxyHosts.tsx`
- `frontend/src/pages/UsersPage.tsx`
- `frontend/src/pages/Settings.tsx`
- `frontend/src/pages/Certificates.tsx`

### Backend integrity path
- `backend/internal/api/handlers/proxy_host_handler.go` (`Update`)
- `backend/internal/services/proxyhost_service.go` (`Update`, validation paths)
- `backend/internal/models/proxy_host.go`

---

## Data-Consistency Spec Policy (Aligned)

`tests/core/data-consistency.spec.ts` is consistently in scope for this plan:

- Included in targeted reproduction when it carries an active failure signal.
- Included in impacted shard reruns after relevant fixes.
- Included in cross-browser fan-out for impacted scope.
- Included in final full split confirmation.

No phase may exclude this spec while claiming full-green readiness.

---

## Phased Task Plan

### Phase 1: Environment and Reproduction
- Complete Gate 1 and Gate 2.
- Produce failure map with taxonomy and ownership.

### Phase 2: Root-Cause Fix Loop
- Complete Gate 3.
- Prioritize helper/contract/product fixes over retries/timeouts.

### Phase 3: Impacted Validation
- Complete Gate 4 and Gate 5.
- Enforce consecutive-pass thresholds.

### Phase 4: Full Confirmation
- Complete Gate 6.
- Complete Gate 7.
- Verify full-green state for split topology and security matrix.

### Phase 5: Patch Coverage Triage Closure
- This phase runs after implementation changes and after coverage is executed/uploaded.
- Capture exact missing/partial line ranges from Codecov Patch view.
- Maintain line-range-to-test mapping until each range is covered.
- Re-run only the targeted suites first, then the required final confirmation suite.

Status convention for this phase:
- `Pending Execution`: acceptable before implementation and before coverage + Codecov Patch review are run.
- `Closed`: required at completion for every triage entry.

#### Codecov Patch Triage Table (Mandatory)

Note: `Codecov Patch line range` and related placeholders below are execution artifacts. Populate them only in Phase 5 after running coverage and opening the Codecov Patch view.

| Codecov Patch line range | File | Coverage status | Targeted test(s) to add/run | Owner | Status |
| --- | --- | --- | --- | --- | --- |
| `<paste exact range from Codecov>` | `<path>` | Missing/Partial | `<test file + test name>` | `<name>` | Pending Execution |
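
The closure rule for this table (every range mapped to a test, every entry `Closed`) can be checked mechanically. The sketch below mirrors the table columns in a type and lists what still blocks sign-off; `TriageEntry` and `openTriageIssues` are hypothetical names, not part of any existing tooling.

```typescript
type CoverageStatus = "Missing" | "Partial";
type TriageStatus = "Pending Execution" | "Closed";

// One row of the triage table.
interface TriageEntry {
  lineRange: string;
  file: string;
  coverage: CoverageStatus;
  targetedTests: string[];
  owner: string;
  status: TriageStatus;
}

// Returns a human-readable list of blockers; an empty list means the
// table satisfies the Phase 5 closure rule.
function openTriageIssues(entries: TriageEntry[]): string[] {
  const issues: string[] = [];
  for (const entry of entries) {
    const where = `${entry.file}:${entry.lineRange}`;
    if (entry.targetedTests.length === 0) {
      issues.push(`${where} has no mapped test`);
    }
    if (entry.status !== "Closed") {
      issues.push(`${where} is still ${entry.status}`);
    }
  }
  return issues;
}
```

An empty table also returns no issues, so the check is only meaningful after the Codecov Patch view has been reviewed and its ranges entered.
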

---

## Critical Path Exclusions

Removed from critical path for this request:
- `.gitignore` audit
- `codecov.yml` audit
- `.dockerignore` audit
- `Dockerfile` audit

These are non-blocking unless directly proven to be the root cause of active E2E failures.

---

## Non-Blocking Follow-Up (Optional)

- Config hygiene review for `.gitignore`, `codecov.yml`, `.dockerignore`, `Dockerfile`.
- Additional CI/runtime optimization outside current pass criteria.

---

## Definition of Done

1. Hard-gated execution order completed without skipped stop/go checks.
2. Active failures fixed via verified root causes.
3. `tests/core/data-consistency.spec.ts` handled consistently per policy.
4. Impacted shards green with required consecutive passes.
5. Browser fan-out green with required consecutive passes.
6. Security matrix and full split confirmation green.
7. Runtime/stability thresholds satisfied, or explicit follow-up recorded for approved exceptions.
8. Coverage completion is documented for all modified production code.
9. Codecov Patch coverage for modified lines is 100%.
10. Codecov Patch missing/partial line ranges are explicitly captured and each is mapped to targeted tests, with all entries closed.

---

## Policy-Bound Caveat (Coverage)

- Repository policy remains a hard gate: modified production lines require 100% Codecov patch coverage.
- No exceptions are allowed unless repository policy itself is changed.
- Filler tests are not acceptable; each missing/partial patch line must be covered by behavior-linked targeted tests tied to the affected scenario.