Charon/docs/plans/current_spec.md

## Playwright E2E Green Plan (Single Objective)

Date: 2026-02-15
Owner: Planning Agent
Scope: Achieve 100% green Playwright E2E quickly through root-cause fixes and performance/stability optimization.

---

## Objective

Deliver a fully green Playwright E2E suite with deterministic results and controlled runtime by fixing root causes first (not symptom retries).

This file contains one objective only for today’s request.

---

## Requirements (EARS)

- WHEN the E2E environment is invalid or stale, THE SYSTEM SHALL stop execution and rebuild before test reproduction.
- WHEN a failure is reproduced, THE SYSTEM SHALL identify root cause in frontend/backend/helpers before mitigation.
- WHEN synchronization is required, THE SYSTEM SHALL use condition-based waits and deterministic locators rather than fixed sleeps.
- WHEN `tests/core/data-consistency.spec.ts` is relevant to the failing path, THE SYSTEM SHALL include it consistently in targeted reproduction, impacted rerun, browser fan-out, and final confirmation.
- WHEN all impacted fixes are complete, THE SYSTEM SHALL pass security matrix and full split confirmation without regressions.
- WHEN production code is modified, THE SYSTEM SHALL complete coverage runs before final sign-off.
- WHEN Codecov Patch view reports missing or partial coverage, THE SYSTEM SHALL require 100% patch coverage for modified lines before go-live.
- WHEN patch coverage is below 100%, THE SYSTEM SHALL capture exact missing/partial line ranges and map each range to targeted tests.
- WHEN stability is being validated, THE SYSTEM SHALL use Playwright-native `--repeat-each` consecutive-pass gates rather than ad hoc loops.
- WHEN classifying possible state contamination, THE SYSTEM SHALL run an early single-worker diagnostic branch (`--workers=1`) before parallel reruns.
- WHEN retries are enabled for instrumentation, THE SYSTEM SHALL enforce `--fail-on-flaky-tests` so flaky-pass results do not qualify as green.

---

## Hard-Gated Execution Order (Stop/Go)

### Gate 1: Environment Validity and Rebuild Decision

Stop:
- `charon-e2e` is unhealthy, required env is missing, or the runtime is stale after app/runtime changes.

Go:
- Health checks pass and runtime mode matches test scope.

Order:
1. Validate health and required env.
2. Prefer runner-managed setup where feasible:
   - Use Playwright project-dependency setup flow (for example, setup project + dependent browser projects) instead of external-only setup scripts when equivalent.
   - Use `webServer` readiness gating option when applicable so runner controls startup/ready checks.
3. Decide rebuild using testing protocol:
   - Rebuild required for runtime changes (`backend/**`, `frontend/**`, runtime/Docker inputs).
   - Reuse container for test-only changes if healthy.
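
The rebuild decision in step 3 can be sketched as a small pure function. This is an illustration only, not the project's testing protocol: `decideRebuild`, `isRuntimeChange`, and the Docker-related file names are hypothetical, and only the `backend/` and `frontend/` prefixes come from the rule above.

```typescript
type Decision = "rebuild" | "reuse";

// `backend/` and `frontend/` are taken from the plan.
const RUNTIME_PREFIXES = ["backend/", "frontend/"];
// Hypothetical examples of "runtime/Docker inputs"; adjust to the real repo layout.
const RUNTIME_FILES = ["Dockerfile", "docker-compose.yml"];

// True when a changed path affects the running application image.
function isRuntimeChange(path: string): boolean {
  return (
    RUNTIME_PREFIXES.some((prefix) => path.startsWith(prefix)) ||
    RUNTIME_FILES.includes(path)
  );
}

// Runtime changes always force a rebuild. Test-only changes may reuse the
// container, but only when it is currently healthy.
function decideRebuild(changedPaths: string[], containerHealthy: boolean): Decision {
  if (changedPaths.some(isRuntimeChange)) return "rebuild";
  return containerHealthy ? "reuse" : "rebuild";
}
```

For example, a change limited to `tests/core/navigation.spec.ts` with a healthy container yields `"reuse"`, while any change under `backend/` yields `"rebuild"` regardless of health.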

### Gate 2: Targeted Shard Reproduction

Stop:
- Failure not reproducible in targeted shard/spec.

Go:
- Failure reproduced with clear signal and trace.

Order:
1. Reproduce in smallest shard/spec containing failure.
2. Capture trace/artifacts once failure is reproduced.

### Gate 3: Root-Cause Fix Loop

Stop:
- Change does not map to verified cause.
- Proposed fix is only timeout/retry inflation.

Go:
- Cause mapped to specific file/function/component.

Loop:
1. Classify cause (`SYNC_WAIT`, `LOCATOR_AMBIGUITY`, `STATE_LEAK`, `ENV_ORCHESTRATION`, `PERF_TIMEOUT`).
2. Run taxonomy-first fastest diagnostic move:
   - `SYNC_WAIT`: run the failing test with trace and UI mode, leaving timeouts unchanged; first check whether an awaited state transition is missing.
   - `LOCATOR_AMBIGUITY`: run strict locator count/assertion first (`toHaveCount(1)`/role+name narrowing) before selector rewrites.
   - `STATE_LEAK`: run early deterministic branch with `--workers=1` and same shard/spec to confirm isolation issue.
   - `ENV_ORCHESTRATION`: verify setup project execution order and `webServer` ready signal before app-code changes.
   - `PERF_TIMEOUT`: run smallest repro with unchanged expectations and capture slow step timings before raising timeout.
3. Apply root-cause fix.
4. Re-run smallest failing scope.
5. Repeat until deterministic pass.
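
For the `ENV_ORCHESTRATION` branch, the setup order and ready signal being verified would normally live in the Playwright config. The sketch below shows the relevant fields, assuming the setup project matches `*.setup.ts` files such as `tests/auth.setup.ts`. The `command`, `url`, and `timeout` values are placeholders, not the project's actual settings.

```typescript
// Sketch of the relevant playwright.config.ts fields. In a real config this
// object would be the default export, usually wrapped in defineConfig from
// @playwright/test. All concrete values below are placeholders.
const config = {
  projects: [
    // Runs first; the browser projects below start only after it passes.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    { name: "chromium", use: { browserName: "chromium" }, dependencies: ["setup"] },
    { name: "firefox", use: { browserName: "firefox" }, dependencies: ["setup"] },
    { name: "webkit", use: { browserName: "webkit" }, dependencies: ["setup"] },
  ],
  // The runner starts `command` and waits for `url` to respond before any
  // test runs, so readiness is gated by Playwright rather than a script.
  webServer: {
    command: "docker compose up charon-e2e", // hypothetical command
    url: "http://localhost:8080", // hypothetical; point at the real health endpoint
    reuseExistingServer: true,
    timeout: 120_000, // milliseconds allowed for startup
  },
};
```

If a browser project starts before `setup` finishes, or tests begin before `url` responds, the cause is in this configuration rather than in application code.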

### Gate 4: Impacted Shard Rerun

Stop:
- Any failure in impacted shard after fix.

Go:
- Impacted shard fully green.

### Gate 5: Browser Fan-Out

Stop:
- Any browser fails on impacted scope.

Go:
- Chromium + Firefox + WebKit pass impacted scope.

Order:
1. Single-browser deterministic validation first (primary browser baseline).
2. Cross-browser fan-out second (Chromium + Firefox + WebKit).

### Gate 6: Security Matrix and Full Split Confirmation

Stop:
- Security suites fail or split topology regresses.

Go:
- Security matrix green and full split pipeline green.

### Gate 7: Mandatory Coverage and Codecov Patch Triage

Stop:
- Coverage runs not completed for modified production code.
- Codecov Patch view not reviewed after coverage upload.
- Patch coverage for modified lines is < 100%.
- Missing/partial patch line ranges are not documented with mapped targeted tests.

Go:
- Coverage runs completed.
- Codecov Patch coverage for modified lines is exactly 100%.
- All missing/partial ranges are triaged and resolved with targeted tests.

Order:
1. Run required coverage suites for modified areas.
2. Open Codecov Patch view.
3. Copy exact missing/partial modified line ranges.
4. Map each range to targeted tests.
5. Add/adjust tests and rerun coverage until patch coverage is 100%.

---

## Performance and Stability Thresholds

### Runtime Thresholds

- Baseline source: most recent known-good split run on same branch/runtime profile.
- Non-security shard runtime regression allowed: <= 10% per shard.
- Total non-security wall-clock regression allowed: <= 10%.
- Security matrix runtime regression allowed: <= 15%.

Runtime fail:
- Any threshold exceeded without explicit, documented acceptance recorded as a follow-up.
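
The threshold arithmetic can be sketched as follows. The limits are the ones listed above; the function names, the `ShardTiming` shape, and the use of seconds as the unit are assumptions made for illustration.

```typescript
// Allowed regression ratios from the runtime thresholds.
const LIMITS = { shard: 0.10, totalNonSecurity: 0.10, security: 0.15 };

// Relative regression of the current run against the known-good baseline.
function regression(baselineSec: number, currentSec: number): number {
  return (currentSec - baselineSec) / baselineSec;
}

// True when the regression is strictly above the allowed limit.
function exceedsLimit(baselineSec: number, currentSec: number, limit: number): boolean {
  return regression(baselineSec, currentSec) > limit;
}

interface ShardTiming {
  name: string; // hypothetical shard label
  baselineSec: number;
  currentSec: number;
}

// Names of non-security shards that break the per-shard limit.
function shardsOverLimit(shards: ShardTiming[]): string[] {
  return shards
    .filter((s) => exceedsLimit(s.baselineSec, s.currentSec, LIMITS.shard))
    .map((s) => s.name);
}
```

A shard that moves from 200 s to 230 s regresses by 15% and fails the 10% per-shard limit; a shard that moves from 100 s to 105 s regresses by 5% and passes.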

### Stability / Flake Thresholds

- Targeted repaired failure: 3 consecutive passes required.
- Impacted shard: 2 consecutive passes required.
- Browser fan-out: 2 consecutive passes per browser on impacted scope.
- Final full split confirmation: 1 full run with zero retry-required failures.
- Consecutive-pass gates use Playwright-native `--repeat-each`.
- Retry runs are instrumentation only (small CI retry count) and must be paired with `--fail-on-flaky-tests`.

Stability fail:
- Any non-deterministic reappearance inside required consecutive pass window.
- Any test that Playwright classifies as flaky (passed only after a retry).
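
A minimal sketch of how the consecutive-pass gates might translate into Playwright CLI arguments. `--repeat-each`, `--project`, `--retries`, and `--fail-on-flaky-tests` are existing Playwright flags; the `Gate` names, `buildArgs`, and the assumption that projects are named `chromium`, `firefox`, and `webkit` are illustrative only.

```typescript
type Gate = "targeted" | "impactedShard" | "browserFanOut";

// Consecutive passes required per gate, taken from the thresholds above.
const REPEAT_EACH: Record<Gate, number> = {
  targeted: 3,
  impactedShard: 2,
  browserFanOut: 2,
};

interface RunOptions {
  gate: Gate;
  specs: string[]; // spec files or directories to run
  projects?: string[]; // Playwright project names (assumed to match the config)
  retries?: number; // instrumentation only; keep small
}

// Builds the argument list for a gate run.
function buildArgs(opts: RunOptions): string[] {
  const args = ["playwright", "test", ...opts.specs, `--repeat-each=${REPEAT_EACH[opts.gate]}`];
  for (const project of opts.projects ?? []) {
    args.push(`--project=${project}`);
  }
  // Retries never qualify a run as green: pair them with the flaky-test gate.
  if (opts.retries !== undefined && opts.retries > 0) {
    args.push(`--retries=${opts.retries}`, "--fail-on-flaky-tests");
  }
  return args;
}
```

Prefixing the result with `npx` gives a runnable command; for the targeted gate on the data-consistency spec it would be `npx playwright test tests/core/data-consistency.spec.ts --repeat-each=3`.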

### Auth-State Reuse Nuance

- Shared `storageState` reuse is acceptable only when server-side state mutation conflicts are controlled.
- If tests mutate shared server-side entities (user/session/settings records), isolate state per test/suite or reset deterministically before reuse.

---

## Root-Cause-First Focus Areas

### Orchestration and helpers
- `tests/global-setup.ts` (`waitForContainer`, `emergencySecurityReset`)
- `tests/auth.setup.ts` (`performLoginAndSaveState`, `resetAdminCredentials`)
- `tests/utils/wait-helpers.ts`
- `tests/utils/ui-helpers.ts`
- `tests/utils/TestDataManager.ts`

### Flake-prone suites
- `tests/core/navigation.spec.ts`
- `tests/core/proxy-hosts.spec.ts`
- `tests/core/data-consistency.spec.ts`
- `tests/settings/user-management.spec.ts`
- `tests/settings/smtp-settings.spec.ts`
- `tests/settings/notifications.spec.ts`
- `tests/tasks/*.spec.ts`

### UI/component hotspots
- `frontend/src/components/ProxyHostForm.tsx`
- `frontend/src/pages/ProxyHosts.tsx`
- `frontend/src/pages/UsersPage.tsx`
- `frontend/src/pages/Settings.tsx`
- `frontend/src/pages/Certificates.tsx`

### Backend integrity path
- `backend/internal/api/handlers/proxy_host_handler.go` (`Update`)
- `backend/internal/services/proxyhost_service.go` (`Update`, validation paths)
- `backend/internal/models/proxy_host.go`

---

## Data-Consistency Spec Policy (Aligned)

`tests/core/data-consistency.spec.ts` is consistently in scope for this plan:

- Included in targeted reproduction when it shows an active failure signal.
- Included in impacted shard reruns after relevant fixes.
- Included in cross-browser fan-out for impacted scope.
- Included in final full split confirmation.

No phase excludes this spec while claiming full-green readiness.

---

## Phased Task Plan

### Phase 1: Environment and Reproduction
- Complete Gate 1 and Gate 2.
- Produce failure map with taxonomy and ownership.

### Phase 2: Root-Cause Fix Loop
- Complete Gate 3.
- Prioritize helper/contract/product fixes over retries/timeouts.

### Phase 3: Impacted Validation
- Complete Gate 4 and Gate 5.
- Enforce consecutive-pass thresholds.

### Phase 4: Full Confirmation
- Complete Gate 6.
- Complete Gate 7.
- Verify full-green state for split topology and security matrix.

### Phase 5: Patch Coverage Triage Closure
- This phase runs after implementation changes and after coverage is executed/uploaded.
- Capture exact missing/partial line ranges from Codecov Patch view.
- Maintain line-range-to-test mapping until each range is covered.
- Re-run only targeted suites first, then required final confirmation suite.

Status convention for this phase:
- `Pending Execution`: acceptable before implementation and before coverage + Codecov Patch review are run.
- `Closed`: required at completion for every triage entry.

#### Codecov Patch Triage Table (Mandatory)

Note: `Codecov Patch line range` and related placeholders below are execution artifacts. Populate them only in Phase 5 after running coverage and opening the Codecov Patch view.

| Codecov Patch line range | File | Coverage status | Targeted test(s) to add/run | Owner | Status |
| --- | --- | --- | --- | --- | --- |
| `<paste exact range from Codecov>` | `<path>` | Missing/Partial | `<test file + test name>` | `<name>` | Pending Execution |
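
If the triage table is tracked programmatically, a row and its closure check could be modeled as in the sketch below. The `TriageEntry` shape mirrors the table columns; the type and function names are hypothetical, and nothing here reads from Codecov.

```typescript
type TriageStatus = "Pending Execution" | "Closed";

// One row of the Codecov Patch triage table.
interface TriageEntry {
  lineRange: string; // exact range copied from the Codecov Patch view
  file: string;
  coverage: "Missing" | "Partial";
  targetedTests: string[]; // test file + test name for each mapped test
  owner: string;
  status: TriageStatus;
}

// Entries that still block sign-off: not closed, or closed without a mapped test.
function blockingEntries(entries: TriageEntry[]): TriageEntry[] {
  return entries.filter((e) => e.status !== "Closed" || e.targetedTests.length === 0);
}

// Phase 5 is complete only when no entry blocks sign-off.
function triageComplete(entries: TriageEntry[]): boolean {
  return blockingEntries(entries).length === 0;
}
```

An empty table counts as complete here, which matches the case where Codecov reports no missing or partial lines in the patch.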

---

## Critical Path Exclusions

Removed from critical path for this request:
- `.gitignore` audit
- `codecov.yml` audit
- `.dockerignore` audit
- `Dockerfile` audit

These are non-blocking unless directly proven as root cause of active E2E failures.

---

## Non-Blocking Follow-Up (Optional)

- Config hygiene review for `.gitignore`, `codecov.yml`, `.dockerignore`, `Dockerfile`.
- Additional CI/runtime optimization outside current pass criteria.

---

## Definition of Done

1. Hard-gated execution order completed without skipped stop/go checks.
2. Active failures fixed via verified root causes.
3. `tests/core/data-consistency.spec.ts` handled consistently per policy.
4. Impacted shards green with required consecutive passes.
5. Browser fan-out green with required consecutive passes.
6. Security matrix and full split confirmation green.
7. Runtime/stability thresholds satisfied, or explicit follow-up recorded for approved exceptions.
8. Coverage completion is documented for all modified production code.
9. Codecov Patch coverage for modified lines is 100%.
10. Codecov Patch missing/partial line ranges are explicitly captured and each is mapped to targeted tests, with all entries closed.

---

## Policy-Bound Caveat (Coverage)

- Repository policy remains a hard gate: modified production lines require 100% Codecov patch coverage.
- No exceptions are allowed unless repository policy itself is changed.
- Filler tests are not acceptable; each missing/partial patch line must be covered by behavior-linked targeted tests tied to the affected scenario.