Charon/docs/plans/current_spec.md at b835a59b21c7ae7abadeeff0e2afddee00aacb3e

Playwright E2E Green Plan (Single Objective)

Date: 2026-02-15
Owner: Planning Agent
Scope: Achieve 100% green Playwright E2E quickly through root-cause fixes and performance/stability optimization.

Objective

Deliver a fully green Playwright E2E suite with deterministic results and controlled runtime by fixing root causes first (not symptom retries).

This file contains one objective only for today’s request.

Requirements (EARS)

WHEN the E2E environment is invalid or stale, THE SYSTEM SHALL stop execution and rebuild before test reproduction.
WHEN a failure is reproduced, THE SYSTEM SHALL identify root cause in frontend/backend/helpers before mitigation.
WHEN synchronization is required, THE SYSTEM SHALL use condition-based waits and deterministic locators rather than fixed sleeps.
WHEN tests/core/data-consistency.spec.ts is relevant to the failing path, THE SYSTEM SHALL include it consistently in targeted reproduction, impacted rerun, browser fan-out, and final confirmation.
WHEN all impacted fixes are complete, THE SYSTEM SHALL pass security matrix and full split confirmation without regressions.
WHEN production code is modified, THE SYSTEM SHALL complete coverage runs before final sign-off.
WHEN Codecov Patch view reports missing or partial coverage, THE SYSTEM SHALL require 100% patch coverage for modified lines before go-live.
WHEN patch coverage is below 100%, THE SYSTEM SHALL capture exact missing/partial line ranges and map each range to targeted tests.
WHEN stability is being validated, THE SYSTEM SHALL use Playwright-native --repeat-each consecutive-pass gates rather than ad hoc loops.
WHEN classifying possible state contamination, THE SYSTEM SHALL run an early single-worker diagnostic branch (--workers=1) before parallel reruns.
WHEN retries are enabled for instrumentation, THE SYSTEM SHALL enforce --fail-on-flaky-tests so flaky-pass results do not qualify as green.
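
The condition-based wait requirement can be illustrated with a small polling helper. This is a minimal sketch and not the contents of tests/utils/wait-helpers.ts; `waitForCondition` and its option names are hypothetical. In Playwright specs the same role is normally filled by web-first assertions such as `expect(locator).toBeVisible()` or by `expect.poll(...)`.

```typescript
// Hypothetical helper: poll a condition until it holds or a deadline passes.
// Unlike a fixed sleep, it returns as soon as the condition becomes true.
interface WaitOptions {
  timeoutMs: number; // upper bound on the wait, not an expected duration
  intervalMs: number; // delay between two checks
}

async function waitForCondition(
  check: () => boolean | Promise<boolean>,
  { timeoutMs, intervalMs }: WaitOptions,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true; // condition met: stop waiting immediately
    if (Date.now() >= deadline) return false; // timed out: let the caller fail loudly
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would treat a `false` result as a test failure rather than extending the timeout, which keeps the fix focused on the missing state transition.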

Hard-Gated Execution Order (Stop/Go)

Gate 1: Environment Validity and Rebuild Decision

Stop:

The charon-e2e container is unhealthy, required environment variables are missing, or the runtime is stale after app/runtime changes.

Go:

Health checks pass and runtime mode matches test scope.

Order:

Validate health and required env.
Prefer runner-managed setup where feasible:
- Use Playwright project-dependency setup flow (for example, setup project + dependent browser projects) instead of external-only setup scripts when equivalent.
- Use webServer readiness gating option when applicable so runner controls startup/ready checks.
Decide rebuild using testing protocol:
- Rebuild required for runtime changes (backend/**, frontend/**, runtime/Docker inputs).
- Reuse container for test-only changes if healthy.
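
The rebuild decision above can be sketched as a pure function. This is an illustration under assumptions: `decideRebuild` is a hypothetical name, the path prefixes mirror the `backend/**` and `frontend/**` patterns from the plan, and the Docker input file names are guesses rather than a list taken from the repository.

```typescript
// Hypothetical sketch of the Gate 1 rebuild decision.
const RUNTIME_PREFIXES = ["backend/", "frontend/"]; // from the plan
const RUNTIME_FILES = ["Dockerfile", ".dockerignore"]; // assumed runtime/Docker inputs

type Decision = "rebuild" | "reuse";

function decideRebuild(changedFiles: string[], containerHealthy: boolean): Decision {
  const runtimeChanged = changedFiles.some(
    (file) =>
      RUNTIME_PREFIXES.some((prefix) => file.startsWith(prefix)) ||
      RUNTIME_FILES.some((name) => file === name),
  );
  if (runtimeChanged) return "rebuild"; // runtime changes always force a rebuild
  // Test-only changes may reuse the container, but only if it is healthy.
  return containerHealthy ? "reuse" : "rebuild";
}
```

Keeping the decision free of I/O means the health check and the changed-file list can come from whatever tooling the runner already has.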

Gate 2: Targeted Shard Reproduction

Stop:

Failure not reproducible in targeted shard/spec.

Go:

Failure reproduced with clear signal and trace.

Order:

Reproduce in smallest shard/spec containing failure.
Capture trace/artifacts once failure is reproduced.

Gate 3: Root-Cause Fix Loop

Stop:

Change does not map to verified cause.
Proposed fix is only timeout/retry inflation.

Go:

Cause mapped to specific file/function/component.

Loop:

Classify cause (SYNC_WAIT, LOCATOR_AMBIGUITY, STATE_LEAK, ENV_ORCHESTRATION, PERF_TIMEOUT).
Run taxonomy-first fastest diagnostic move:
- SYNC_WAIT: run the failing test with trace and UI mode, leaving timeouts unchanged; verify first whether an awaited state transition is missing.
- LOCATOR_AMBIGUITY: run strict locator count/assertion first (toHaveCount(1)/role+name narrowing) before selector rewrites.
- STATE_LEAK: run early deterministic branch with --workers=1 and same shard/spec to confirm isolation issue.
- ENV_ORCHESTRATION: verify setup project execution order and webServer ready signal before app-code changes.
- PERF_TIMEOUT: run smallest repro with unchanged expectations and capture slow step timings before raising timeout.
Apply root-cause fix.
Re-run smallest failing scope.
Repeat until deterministic pass.
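
The taxonomy-first step in the loop above can be written down as a lookup table, so the first diagnostic move is chosen by classification rather than by habit. This is a sketch: the type and function names are hypothetical, and only flags that the Playwright CLI is known to accept (`--trace`, `--workers`) are used.

```typescript
// Hypothetical mapping from failure class to the first diagnostic move.
type Cause =
  | "SYNC_WAIT"
  | "LOCATOR_AMBIGUITY"
  | "STATE_LEAK"
  | "ENV_ORCHESTRATION"
  | "PERF_TIMEOUT";

interface DiagnosticMove {
  extraArgs: string[]; // extra Playwright CLI flags for the repro run
  checkFirst: string; // what to verify before changing any code
}

const FIRST_MOVE: Record<Cause, DiagnosticMove> = {
  SYNC_WAIT: { extraArgs: ["--trace", "on"], checkFirst: "missing awaited state transition" },
  LOCATOR_AMBIGUITY: { extraArgs: [], checkFirst: "strict locator count (toHaveCount(1))" },
  STATE_LEAK: { extraArgs: ["--workers=1"], checkFirst: "failure disappears when run serially" },
  ENV_ORCHESTRATION: { extraArgs: [], checkFirst: "setup project order and webServer ready signal" },
  PERF_TIMEOUT: { extraArgs: [], checkFirst: "slow step timings with unchanged expectations" },
};

// Builds the command for the smallest repro of a classified failure.
function firstDiagnosticCommand(cause: Cause, spec: string): string[] {
  return ["npx", "playwright", "test", spec, ...FIRST_MOVE[cause].extraArgs];
}
```

None of the entries raise a timeout or add retries, which matches the Gate 3 stop condition on timeout/retry inflation.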

Gate 4: Impacted Shard Rerun

Stop:

Any failure in impacted shard after fix.

Go:

Impacted shard fully green.

Gate 5: Browser Fan-Out

Stop:

Any browser fails on impacted scope.

Go:

Chromium + Firefox + WebKit pass impacted scope.

Order:

Single-browser deterministic validation first (primary browser baseline).
Cross-browser fan-out second (Chromium + Firefox + WebKit).

Gate 6: Security Matrix and Full Split Confirmation

Stop:

Security suites fail or split topology regresses.

Go:

Security matrix green and full split pipeline green.

Gate 7: Mandatory Coverage and Codecov Patch Triage

Stop:

Coverage runs not completed for modified production code.
Codecov Patch view not reviewed after coverage upload.
Patch coverage for modified lines is < 100%.
Missing/partial patch line ranges are not documented with mapped targeted tests.

Go:

Coverage runs completed.
Codecov Patch coverage for modified lines is exactly 100%.
All missing/partial ranges are triaged and resolved with targeted tests.

Order:

Run required coverage suites for modified areas.
Open Codecov Patch view.
Copy exact missing/partial modified line ranges.
Map each range to targeted tests.
Add/adjust tests and rerun coverage until patch coverage is 100%.
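
The Gate 7 stop/go conditions can be checked mechanically once the triage entries exist. The sketch below is illustrative only: `TriageEntry` and `gate7Blockers` are hypothetical names, and the patch coverage percentage is assumed to be read manually from the Codecov Patch view rather than fetched from any API.

```typescript
// Hypothetical model of one row in the Codecov Patch triage table.
interface TriageEntry {
  range: string; // exact line range copied from the Codecov Patch view
  file: string;
  coverage: "Missing" | "Partial";
  tests: string[]; // targeted tests mapped to this range
  status: "Pending Execution" | "Closed";
}

// Returns the Gate 7 stop reasons; an empty array means "go".
function gate7Blockers(
  coverageRunsCompleted: boolean,
  patchCoveragePct: number,
  entries: TriageEntry[],
): string[] {
  const blockers: string[] = [];
  if (!coverageRunsCompleted) blockers.push("coverage runs not completed");
  if (patchCoveragePct < 100) blockers.push(`patch coverage ${patchCoveragePct}% is below 100%`);
  for (const entry of entries) {
    if (entry.tests.length === 0) blockers.push(`${entry.file}:${entry.range} has no mapped test`);
    if (entry.status !== "Closed") blockers.push(`${entry.file}:${entry.range} is not closed`);
  }
  return blockers;
}
```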

Performance and Stability Thresholds

Runtime Thresholds

Baseline source: most recent known-good split run on same branch/runtime profile.
Non-security shard runtime regression allowed: <= 10% per shard.
Total non-security wall-clock regression allowed: <= 10%.
Security matrix runtime regression allowed: <= 15%.

Runtime fail:

Any threshold is exceeded without explicit, documented acceptance recorded as a follow-up.
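
The runtime thresholds reduce to a percentage comparison against the baseline run. A minimal sketch follows, assuming timings are collected in seconds by some external step; all type and function names are hypothetical.

```typescript
interface Timing {
  baselineSec: number; // from the most recent known-good split run
  currentSec: number;
}

interface ShardTiming extends Timing {
  name: string;
}

interface RuntimeReport {
  nonSecurityShards: ShardTiming[];
  nonSecurityWallClock: Timing;
  securityMatrix: Timing;
}

// True when current exceeds baseline by more than limitPct percent.
// Written as a multiplication so whole-second inputs compare exactly.
function exceeds(t: Timing, limitPct: number): boolean {
  return t.currentSec * 100 > t.baselineSec * (100 + limitPct);
}

// Returns every violated threshold; an empty array means the runtime gate passes.
function runtimeViolations(report: RuntimeReport): string[] {
  const violations: string[] = [];
  for (const shard of report.nonSecurityShards) {
    if (exceeds(shard, 10)) violations.push(`shard ${shard.name} regressed more than 10%`);
  }
  if (exceeds(report.nonSecurityWallClock, 10)) {
    violations.push("non-security wall-clock regressed more than 10%");
  }
  if (exceeds(report.securityMatrix, 15)) {
    violations.push("security matrix regressed more than 15%");
  }
  return violations;
}
```

A regression of exactly 10% (or 15% for the security matrix) is still allowed, since the thresholds are stated as `<=`.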

Stability / Flake Thresholds

Targeted repaired failure: 3 consecutive passes required.
Impacted shard: 2 consecutive passes required.
Browser fan-out: 2 consecutive passes per browser on impacted scope.
Final full split confirmation: 1 full run with zero retry-required failures.
Consecutive-pass gates use Playwright-native --repeat-each.
Retry runs are instrumentation only (small CI retry count) and must be paired with --fail-on-flaky-tests.

Stability fail:

Any non-deterministic reappearance inside required consecutive pass window.
Any test that Playwright reports as flaky, that is, one that passes only on retry.
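
The consecutive-pass gates can be turned into Playwright invocations as sketched below. This assumes that N passing repetitions under `--repeat-each=N` are accepted as N consecutive passes; `stabilityArgs` and the gate names are hypothetical, while `--repeat-each`, `--retries`, and `--fail-on-flaky-tests` are existing Playwright CLI flags.

```typescript
type StabilityGate = "targeted" | "impactedShard" | "browserFanOut";

// Required pass counts from the thresholds above.
const REQUIRED_PASSES: Record<StabilityGate, number> = {
  targeted: 3,
  impactedShard: 2,
  browserFanOut: 2,
};

// Builds the command for one stability gate over the given spec files.
function stabilityArgs(gate: StabilityGate, scope: string[], retries = 0): string[] {
  const args = ["npx", "playwright", "test", ...scope, `--repeat-each=${REQUIRED_PASSES[gate]}`];
  if (retries > 0) {
    // Retries are instrumentation only: a flaky pass must still fail the run.
    args.push(`--retries=${retries}`, "--fail-on-flaky-tests");
  }
  return args;
}
```

For example, `stabilityArgs("targeted", ["tests/core/proxy-hosts.spec.ts"])` yields a command that runs each test in that spec three times without retries.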

Auth-State Reuse Nuance

Shared storageState reuse is acceptable only when server-side state mutation conflicts are controlled.
If tests mutate shared server-side entities (user/session/settings records), isolate state per test/suite or reset deterministically before reuse.
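
For reference, shared storageState is typically wired through a setup project that the browser projects depend on. The configuration below is a sketch under assumptions, not the repository's actual playwright.config.ts: the project names and the `playwright/.auth/user.json` path are hypothetical, and only `tests/auth.setup.ts` is taken from this plan.

```typescript
import { defineConfig, devices } from "@playwright/test";

// Assumed location of the saved auth state; adjust to the real path.
const AUTH_STATE = "playwright/.auth/user.json";

export default defineConfig({
  testDir: "tests",
  projects: [
    // Runs tests/auth.setup.ts once and writes the storage state file.
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: AUTH_STATE },
      dependencies: ["setup"], // the runner orders setup before this project
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: AUTH_STATE },
      dependencies: ["setup"],
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: AUTH_STATE },
      dependencies: ["setup"],
    },
  ],
});
```

This only shares browser-side state (cookies and local storage). Tests that mutate server-side user, session, or settings records still need per-test isolation or a deterministic reset, as stated above.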

Root-Cause-First Focus Areas

Orchestration and helpers

tests/global-setup.ts (waitForContainer, emergencySecurityReset)
tests/auth.setup.ts (performLoginAndSaveState, resetAdminCredentials)
tests/utils/wait-helpers.ts
tests/utils/ui-helpers.ts
tests/utils/TestDataManager.ts

Flake-prone suites

tests/core/navigation.spec.ts
tests/core/proxy-hosts.spec.ts
tests/core/data-consistency.spec.ts
tests/settings/user-management.spec.ts
tests/settings/smtp-settings.spec.ts
tests/settings/notifications.spec.ts
tests/tasks/*.spec.ts

UI/component hotspots

frontend/src/components/ProxyHostForm.tsx
frontend/src/pages/ProxyHosts.tsx
frontend/src/pages/UsersPage.tsx
frontend/src/pages/Settings.tsx
frontend/src/pages/Certificates.tsx

Backend integrity path

backend/internal/api/handlers/proxy_host_handler.go (Update)
backend/internal/services/proxyhost_service.go (Update, validation paths)
backend/internal/models/proxy_host.go

Data-Consistency Spec Policy (Aligned)

tests/core/data-consistency.spec.ts is consistently in scope for this plan:

Included in targeted reproduction when it shows an active failure signal.
Included in impacted shard reruns after relevant fixes.
Included in cross-browser fan-out for impacted scope.
Included in final full split confirmation.

No phase excludes this spec while claiming full-green readiness.

Phased Task Plan

Phase 1: Environment and Reproduction

Complete Gate 1 and Gate 2.
Produce failure map with taxonomy and ownership.

Phase 2: Root-Cause Fix Loop

Complete Gate 3.
Prioritize helper/contract/product fixes over retries/timeouts.

Phase 3: Impacted Validation

Complete Gate 4 and Gate 5.
Enforce consecutive-pass thresholds.

Phase 4: Full Confirmation

Complete Gate 6.
Complete Gate 7.
Verify full-green state for split topology and security matrix.

Phase 5: Patch Coverage Triage Closure

This phase runs after implementation changes and after coverage is executed/uploaded.
Capture exact missing/partial line ranges from Codecov Patch view.
Maintain line-range-to-test mapping until each range is covered.
Re-run only targeted suites first, then required final confirmation suite.

Status convention for this phase:

Pending Execution: acceptable before implementation and before coverage + Codecov Patch review are run.
Closed: required at completion for every triage entry.

Codecov Patch Triage Table (Mandatory)

Note: Codecov Patch line range and related placeholders below are execution artifacts. Populate them only in Phase 5 after running coverage and opening the Codecov Patch view.

| Codecov Patch line range | File | Coverage status | Targeted test(s) to add/run | Owner | Status |
| --- | --- | --- | --- | --- | --- |
| `<paste exact range from Codecov>` | `<path>` | Missing/Partial | `<test file + test name>` | `<name>` | Pending Execution |

Critical Path Exclusions

Removed from critical path for this request:

.gitignore audit
codecov.yml audit
.dockerignore audit
Dockerfile audit

These are non-blocking unless directly proven as root cause of active E2E failures.

Non-Blocking Follow-Up (Optional)

Config hygiene review for .gitignore, codecov.yml, .dockerignore, Dockerfile.
Additional CI/runtime optimization outside current pass criteria.

Definition of Done

Hard-gated execution order completed without skipped stop/go checks.
Active failures fixed via verified root causes.
tests/core/data-consistency.spec.ts handled consistently per policy.
Impacted shards green with required consecutive passes.
Browser fan-out green with required consecutive passes.
Security matrix and full split confirmation green.
Runtime/stability thresholds satisfied, or explicit follow-up recorded for approved exceptions.
Coverage completion is documented for all modified production code.
Codecov Patch coverage for modified lines is 100%.
Codecov Patch missing/partial line ranges are explicitly captured and each is mapped to targeted tests, with all entries closed.

Policy-Bound Caveat (Coverage)

Repository policy remains a hard gate: modified production lines require 100% Codecov patch coverage.
No exceptions are allowed unless repository policy itself is changed.
Filler tests are not acceptable; each missing/partial patch line must be covered by behavior-linked targeted tests tied to the affected scenario.
