## Playwright E2E Green Plan (Single Objective)
Date: 2026-02-15
Owner: Planning Agent
Scope: Achieve 100% green Playwright E2E quickly through root-cause fixes and performance/stability optimization.
---
## Objective
Deliver a fully green Playwright E2E suite with deterministic results and controlled runtime by fixing root causes first (not symptom retries).
This file contains one objective only, for today's request.
---
## Requirements (EARS)
- WHEN the E2E environment is invalid or stale, THE SYSTEM SHALL stop execution and rebuild before test reproduction.
- WHEN a failure is reproduced, THE SYSTEM SHALL identify root cause in frontend/backend/helpers before mitigation.
- WHEN synchronization is required, THE SYSTEM SHALL use condition-based waits and deterministic locators rather than fixed sleeps.
- WHEN `tests/core/data-consistency.spec.ts` is relevant to the failing path, THE SYSTEM SHALL include it consistently in targeted reproduction, impacted rerun, browser fan-out, and final confirmation.
- WHEN all impacted fixes are complete, THE SYSTEM SHALL pass security matrix and full split confirmation without regressions.
- WHEN production code is modified, THE SYSTEM SHALL complete coverage runs before final sign-off.
- WHEN Codecov Patch view reports missing or partial coverage, THE SYSTEM SHALL require 100% patch coverage for modified lines before go-live.
- WHEN patch coverage is below 100%, THE SYSTEM SHALL capture exact missing/partial line ranges and map each range to targeted tests.
- WHEN stability is being validated, THE SYSTEM SHALL use Playwright-native `--repeat-each` consecutive-pass gates rather than ad hoc loops.
- WHEN classifying possible state contamination, THE SYSTEM SHALL run an early single-worker diagnostic branch (`--workers=1`) before parallel reruns.
- WHEN retries are enabled for instrumentation, THE SYSTEM SHALL enforce `--fail-on-flaky-tests` so flaky-pass results do not qualify as green.
---
## Hard-Gated Execution Order (Stop/Go)
### Gate 1: Environment Validity and Rebuild Decision
Stop:
- `charon-e2e` is unhealthy, a required env variable is missing, or the runtime is stale after app/runtime changes.
Go:
- Health checks pass and runtime mode matches test scope.
Order:
1. Validate health and required env.
2. Prefer runner-managed setup where feasible:
- Use Playwright project-dependency setup flow (for example, setup project + dependent browser projects) instead of external-only setup scripts when equivalent.
- Use `webServer` readiness gating option when applicable so runner controls startup/ready checks.
3. Decide rebuild using testing protocol:
- Rebuild required for runtime changes (`backend/**`, `frontend/**`, runtime/Docker inputs).
- Reuse container for test-only changes if healthy.
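
The runner-managed setup in step 2 could look roughly like the sketch below. It is a plain-object `playwright.config.ts` (normally wrapped in `defineConfig` from `@playwright/test`). The project names, the storage-state path, the start command, and the health URL are all assumptions for illustration, not values taken from the repository.

```typescript
// playwright.config.ts (sketch). Names, paths, URL, and command are hypothetical.
type BrowserName = "chromium" | "firefox" | "webkit";

// Each browser project waits for the "setup" project to pass first.
const browserProject = (browserName: BrowserName) => ({
  name: browserName,
  use: {
    browserName,
    storageState: "playwright/.auth/admin.json", // hypothetical path written by setup
  },
  dependencies: ["setup"],
});

const config = {
  testDir: "./tests",
  projects: [
    // Setup project: matches files such as tests/auth.setup.ts.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    browserProject("chromium"),
    browserProject("firefox"),
    browserProject("webkit"),
  ],
  // The runner starts (or reuses) the server and waits until `url` responds.
  webServer: {
    command: "docker compose up charon-e2e", // hypothetical start command
    url: "http://localhost:8080/health", // hypothetical readiness URL
    reuseExistingServer: true, // reuse a healthy container for test-only changes
    timeout: 120_000,
  },
};

export default config;
```

With this shape, a failed setup project stops the dependent browser projects from running, which matches the Stop condition of this gate.
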
### Gate 2: Targeted Shard Reproduction
Stop:
- Failure not reproducible in targeted shard/spec.
Go:
- Failure reproduced with clear signal and trace.
Order:
1. Reproduce in smallest shard/spec containing failure.
2. Capture trace/artifacts once failure is reproduced.
### Gate 3: Root-Cause Fix Loop
Stop:
- Change does not map to verified cause.
- Proposed fix is only timeout/retry inflation.
Go:
- Cause mapped to specific file/function/component.
Loop:
1. Classify cause (`SYNC_WAIT`, `LOCATOR_AMBIGUITY`, `STATE_LEAK`, `ENV_ORCHESTRATION`, `PERF_TIMEOUT`).
2. Run taxonomy-first fastest diagnostic move:
- `SYNC_WAIT`: run the failing test with trace and UI mode, leaving timeouts unchanged; verify the missing awaited state transition first.
- `LOCATOR_AMBIGUITY`: run strict locator count/assertion first (`toHaveCount(1)`/role+name narrowing) before selector rewrites.
- `STATE_LEAK`: run early deterministic branch with `--workers=1` and same shard/spec to confirm isolation issue.
- `ENV_ORCHESTRATION`: verify setup project execution order and `webServer` ready signal before app-code changes.
- `PERF_TIMEOUT`: run smallest repro with unchanged expectations and capture slow step timings before raising timeout.
3. Apply root-cause fix.
4. Re-run smallest failing scope.
5. Repeat until deterministic pass.
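
One way to keep the taxonomy-first moves in step 2 consistent is a small lookup from cause to first diagnostic command. This is a minimal sketch: the `DiagnosticMove` shape and the function name are hypothetical, and using `--trace=on` for `SYNC_WAIT` and `PERF_TIMEOUT` is an assumption about how the trace would be captured.

```typescript
type Cause =
  | "SYNC_WAIT"
  | "LOCATOR_AMBIGUITY"
  | "STATE_LEAK"
  | "ENV_ORCHESTRATION"
  | "PERF_TIMEOUT";

interface DiagnosticMove {
  extraArgs: string[]; // flags appended to `npx playwright test <spec>`
  checkFirst: string; // what to verify before touching code or timeouts
}

const DIAGNOSTIC_MOVES: Record<Cause, DiagnosticMove> = {
  SYNC_WAIT: { extraArgs: ["--trace=on"], checkFirst: "missing awaited state transition" },
  LOCATOR_AMBIGUITY: { extraArgs: [], checkFirst: "strict locator count, e.g. toHaveCount(1)" },
  STATE_LEAK: { extraArgs: ["--workers=1"], checkFirst: "isolation issue under a single worker" },
  ENV_ORCHESTRATION: { extraArgs: [], checkFirst: "setup project order and webServer ready signal" },
  PERF_TIMEOUT: { extraArgs: ["--trace=on"], checkFirst: "slow step timings at unchanged timeouts" },
};

// Builds the argv for the first diagnostic run of a classified failure.
function diagnosticCommand(cause: Cause, spec: string): string[] {
  return ["npx", "playwright", "test", spec, ...DIAGNOSTIC_MOVES[cause].extraArgs];
}
```

For example, `diagnosticCommand("STATE_LEAK", "tests/core/data-consistency.spec.ts")` yields the single-worker run of the same spec. None of the moves changes a timeout or retry count.
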
### Gate 4: Impacted Shard Rerun
Stop:
- Any failure in impacted shard after fix.
Go:
- Impacted shard fully green.
### Gate 5: Browser Fan-Out
Stop:
- Any browser fails on impacted scope.
Go:
- Chromium + Firefox + WebKit pass impacted scope.
Order:
1. Single-browser deterministic validation first (primary browser baseline).
2. Cross-browser fan-out second (Chromium + Firefox + WebKit).
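
The two-step order above can be expressed as an ordered list of commands. The sketch assumes the Playwright projects are named `chromium`, `firefox`, and `webkit`, and that Chromium is the primary baseline browser. Neither assumption is confirmed by this plan.

```typescript
type Browser = "chromium" | "firefox" | "webkit";

const ALL_BROWSERS: Browser[] = ["chromium", "firefox", "webkit"];

// Returns the commands in execution order: primary browser first, then all three.
function fanOutStages(specs: string[], primary: Browser = "chromium"): string[][] {
  const base = ["npx", "playwright", "test", ...specs];
  return [
    [...base, `--project=${primary}`],
    [...base, ...ALL_BROWSERS.map((b) => `--project=${b}`)],
  ];
}
```

The second stage should run only after the first stage is green, so a browser-independent failure is caught once instead of three times.
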
### Gate 6: Security Matrix and Full Split Confirmation
Stop:
- Security suites fail or split topology regresses.
Go:
- Security matrix green and full split pipeline green.
### Gate 7: Mandatory Coverage and Codecov Patch Triage
Stop:
- Coverage runs not completed for modified production code.
- Codecov Patch view not reviewed after coverage upload.
- Patch coverage for modified lines is < 100%.
- Missing/partial patch line ranges are not documented with mapped targeted tests.
Go:
- Coverage runs completed.
- Codecov Patch coverage for modified lines is exactly 100%.
- All missing/partial ranges are triaged and resolved with targeted tests.
Order:
1. Run required coverage suites for modified areas.
2. Open Codecov Patch view.
3. Copy exact missing/partial modified line ranges.
4. Map each range to targeted tests.
5. Add/adjust tests and rerun coverage until patch coverage is 100%.
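
For steps 3 and 4, uncovered line numbers need to be collapsed into contiguous ranges before they are mapped to tests. The helper below is a hypothetical sketch of that collapse. It does not read from Codecov, and the input is assumed to be a list of line numbers copied by hand from the Patch view.

```typescript
// Collapses line numbers into sorted, de-duplicated range strings.
function toRanges(lines: number[]): string[] {
  const sorted = Array.from(new Set(lines)).sort((a, b) => a - b);
  const ranges: string[] = [];
  let i = 0;
  while (i < sorted.length) {
    let j = i;
    // Extend the current range while the next line is adjacent.
    while (j + 1 < sorted.length && sorted[j + 1] === sorted[j] + 1) j++;
    ranges.push(i === j ? `${sorted[i]}` : `${sorted[i]}-${sorted[j]}`);
    i = j + 1;
  }
  return ranges;
}

// toRanges([22, 12, 13, 14, 20, 23]) returns ["12-14", "20", "22-23"]
```

Each returned range then becomes one triage entry with its own targeted test.
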
---
## Performance and Stability Thresholds
### Runtime Thresholds
- Baseline source: most recent known-good split run on same branch/runtime profile.
- Non-security shard runtime regression allowed: <= 10% per shard.
- Total non-security wall-clock regression allowed: <= 10%.
- Security matrix runtime regression allowed: <= 15%.
Runtime fail:
- Any threshold exceeded without explicit, documented acceptance recorded as a follow-up.
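
The runtime thresholds reduce to a simple comparison against the baseline. This sketch uses integer milliseconds to avoid floating-point edge cases at exactly 10% or 15%. The `Timing` and `RunTimings` shapes are hypothetical, and the caller is assumed to supply the baseline from the most recent known-good split run.

```typescript
interface Timing {
  baselineMs: number;
  currentMs: number;
}

interface RunTimings {
  nonSecurityShards: { name: string; timing: Timing }[];
  nonSecurityWallClock: Timing;
  securityMatrix: Timing;
}

const NON_SECURITY_LIMIT_PCT = 10;
const SECURITY_LIMIT_PCT = 15;

// True when the regression is at most `limitPct` percent of the baseline.
function withinLimit(t: Timing, limitPct: number): boolean {
  return t.currentMs * 100 <= t.baselineMs * (100 + limitPct);
}

// Returns a label for every exceeded threshold; an empty list means the gate passes.
function runtimeViolations(run: RunTimings): string[] {
  const violations = run.nonSecurityShards
    .filter((s) => !withinLimit(s.timing, NON_SECURITY_LIMIT_PCT))
    .map((s) => `shard:${s.name}`);
  if (!withinLimit(run.nonSecurityWallClock, NON_SECURITY_LIMIT_PCT)) {
    violations.push("non-security-wall-clock");
  }
  if (!withinLimit(run.securityMatrix, SECURITY_LIMIT_PCT)) {
    violations.push("security-matrix");
  }
  return violations;
}
```

A shard that moves from 100 s to 110 s passes (exactly 10%), while a security matrix that moves from 100 s to 116 s fails (16%).
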
### Stability / Flake Thresholds
- Targeted repaired failure: 3 consecutive passes required.
- Impacted shard: 2 consecutive passes required.
- Browser fan-out: 2 consecutive passes per browser on impacted scope.
- Final full split confirmation: 1 full run with zero retry-required failures.
- Consecutive-pass gates use Playwright-native `--repeat-each`.
- Retry runs are instrumentation only (small CI retry count) and must be paired with `--fail-on-flaky-tests`.
Stability fail:
- Any non-deterministic reappearance inside required consecutive pass window.
- Any test that passes only on retry (classified by Playwright as flaky).
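
The consecutive-pass counts above map onto `--repeat-each`, and instrumentation retries onto `--retries` paired with `--fail-on-flaky-tests`. The sketch below builds those argument lists. The gate names and function names are hypothetical, and pinning `--retries=0` on stability runs is an assumption made so that no retry can mask a failure. Note that `--repeat-each` repeats may be scheduled across parallel workers, so it approximates "consecutive" rather than enforcing a strict sequence.

```typescript
type Gate = "targeted" | "impactedShard" | "browserFanOut";

// Required passes per gate, from the thresholds above.
const REQUIRED_PASSES: Record<Gate, number> = {
  targeted: 3,
  impactedShard: 2,
  browserFanOut: 2, // per browser; combine with --project flags
};

// Stability run: every repeat must pass, with no retries to hide a failure.
function stabilityArgs(gate: Gate, scope: string[]): string[] {
  return [
    "npx", "playwright", "test", ...scope,
    `--repeat-each=${REQUIRED_PASSES[gate]}`,
    "--retries=0",
  ];
}

// Instrumentation run: retries allowed, but a flaky pass fails the run.
function instrumentationArgs(scope: string[], retries: number): string[] {
  return [
    "npx", "playwright", "test", ...scope,
    `--retries=${retries}`,
    "--fail-on-flaky-tests",
  ];
}
```

For example, `stabilityArgs("targeted", ["tests/core/navigation.spec.ts"])` produces a run in which each test executes three times and any single failure fails the gate.
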
### Auth-State Reuse Nuance
- Shared `storageState` reuse is acceptable only when server-side state mutation conflicts are controlled.
- If tests mutate shared server-side entities (user/session/settings records), isolate state per test/suite or reset deterministically before reuse.
---
## Root-Cause-First Focus Areas
### Orchestration and helpers
- `tests/global-setup.ts` (`waitForContainer`, `emergencySecurityReset`)
- `tests/auth.setup.ts` (`performLoginAndSaveState`, `resetAdminCredentials`)
- `tests/utils/wait-helpers.ts`
- `tests/utils/ui-helpers.ts`
- `tests/utils/TestDataManager.ts`
### Flake-prone suites
- `tests/core/navigation.spec.ts`
- `tests/core/proxy-hosts.spec.ts`
- `tests/core/data-consistency.spec.ts`
- `tests/settings/user-management.spec.ts`
- `tests/settings/smtp-settings.spec.ts`
- `tests/settings/notifications.spec.ts`
- `tests/tasks/*.spec.ts`
### UI/component hotspots
- `frontend/src/components/ProxyHostForm.tsx`
- `frontend/src/pages/ProxyHosts.tsx`
- `frontend/src/pages/UsersPage.tsx`
- `frontend/src/pages/Settings.tsx`
- `frontend/src/pages/Certificates.tsx`
### Backend integrity path
- `backend/internal/api/handlers/proxy_host_handler.go` (`Update`)
- `backend/internal/services/proxyhost_service.go` (`Update`, validation paths)
- `backend/internal/models/proxy_host.go`
---
## Data-Consistency Spec Policy (Aligned)
`tests/core/data-consistency.spec.ts` is consistently in scope for this plan:
- Included in targeted reproduction when it is part of the active failure signal.
- Included in impacted shard reruns after relevant fixes.
- Included in cross-browser fan-out for impacted scope.
- Included in final full split confirmation.
No phase excludes this spec while claiming full-green readiness.
---
## Phased Task Plan
### Phase 1: Environment and Reproduction
- Complete Gate 1 and Gate 2.
- Produce failure map with taxonomy and ownership.
### Phase 2: Root-Cause Fix Loop
- Complete Gate 3.
- Prioritize helper/contract/product fixes over retries/timeouts.
### Phase 3: Impacted Validation
- Complete Gate 4 and Gate 5.
- Enforce consecutive-pass thresholds.
### Phase 4: Full Confirmation
- Complete Gate 6.
- Complete Gate 7.
- Verify full-green state for split topology and security matrix.
### Phase 5: Patch Coverage Triage Closure
- This phase runs after implementation changes and after coverage is executed/uploaded.
- Capture exact missing/partial line ranges from Codecov Patch view.
- Maintain line-range-to-test mapping until each range is covered.
- Re-run only targeted suites first, then required final confirmation suite.
Status convention for this phase:
- `Pending Execution`: acceptable before implementation and before coverage + Codecov Patch review are run.
- `Closed`: required at completion for every triage entry.
#### Codecov Patch Triage Table (Mandatory)
Note: `Codecov Patch line range` and related placeholders below are execution artifacts. Populate them only in Phase 5 after running coverage and opening the Codecov Patch view.
| Codecov Patch line range | File | Coverage status | Targeted test(s) to add/run | Owner | Status |
| --- | --- | --- | --- | --- | --- |
| `<paste exact range from Codecov>` | `<path>` | Missing/Partial | `<test file + test name>` | `<name>` | Pending Execution |
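
If the triage table is maintained by script instead of by hand, the rows and the closure check could be modeled as below. This is only a sketch: the `TriageEntry` type and both helpers are hypothetical, and the column order simply mirrors the table above.

```typescript
type TriageStatus = "Pending Execution" | "Closed";

interface TriageEntry {
  range: string; // exact range copied from the Codecov Patch view
  file: string;
  coverage: "Missing" | "Partial";
  tests: string; // test file + test name
  owner: string;
  status: TriageStatus;
}

// Renders one entry as a Markdown row in the table's column order.
function renderRow(e: TriageEntry): string {
  return `| \`${e.range}\` | \`${e.file}\` | ${e.coverage} | \`${e.tests}\` | \`${e.owner}\` | ${e.status} |`;
}

// Phase 5 completes only when every entry is Closed.
function triageClosed(entries: TriageEntry[]): boolean {
  return entries.every((e) => e.status === "Closed");
}
```

An empty entry list counts as closed, which corresponds to a patch that already has 100% coverage.
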
---
## Critical Path Exclusions
Removed from critical path for this request:
- `.gitignore` audit
- `codecov.yml` audit
- `.dockerignore` audit
- `Dockerfile` audit
These are non-blocking unless directly proven as root cause of active E2E failures.
---
## Non-Blocking Follow-Up (Optional)
- Config hygiene review for `.gitignore`, `codecov.yml`, `.dockerignore`, `Dockerfile`.
- Additional CI/runtime optimization outside current pass criteria.
---
## Definition of Done
1. Hard-gated execution order completed without skipped stop/go checks.
2. Active failures fixed via verified root causes.
3. `tests/core/data-consistency.spec.ts` handled consistently per policy.
4. Impacted shards green with required consecutive passes.
5. Browser fan-out green with required consecutive passes.
6. Security matrix and full split confirmation green.
7. Runtime/stability thresholds satisfied, or explicit follow-up recorded for approved exceptions.
8. Coverage completion is documented for all modified production code.
9. Codecov Patch coverage for modified lines is 100%.
10. Codecov Patch missing/partial line ranges are explicitly captured and each is mapped to targeted tests, with all entries closed.
---
## Policy-Bound Caveat (Coverage)
- Repository policy remains a hard gate: modified production lines require 100% Codecov patch coverage.
- No exceptions are allowed unless repository policy itself is changed.
- Filler tests are not acceptable; each missing/partial patch line must be covered by behavior-linked targeted tests tied to the affected scenario.