| post_title | author1 | post_slug | microsoft_alias | featured_image | categories | tags | ai_note | summary | post_date |
|---|---|---|---|---|---|---|---|---|---|
| E2E Test Remediation Plan | Charon Team | e2e-test-remediation-plan | charon-team | https://wikid82.github.io/charon/assets/images/featured/charon.png | | | true | Phased remediation plan for Charon Playwright E2E tests, covering inventory, dependencies, runtime estimates, and quick start commands. | 2026-01-28 |
1. Introduction
This plan replaces the current spec with a comprehensive, phased remediation strategy for the Playwright E2E test suite under the tests directory. The goal is to stabilize execution, align dependencies, and sequence remediation work so that core management flows, security controls, and integration workflows become reliable in Docker-based E2E runs.
2. Research Findings
2.1 Test Harness and Global Dependencies
- Global setup and teardown are enforced by tests/global-setup.ts, tests/auth.setup.ts, and tests/security-teardown.setup.ts.
- Global setup validates the emergency token, checks health endpoints, and resets security settings, which impacts all security-enforcement suites.
- Multiple suites depend on the emergency server (port 2020) and Cerberus modules with explicit admin whitelist configuration.
2.2 Test Inventory and Feature Areas
- Core management flows: authentication, navigation, dashboard, proxy hosts, certificates, access lists in tests/core.
- DNS providers and ACME workflows: tests/dns-provider-crud.spec.ts, tests/dns-provider-types.spec.ts, tests/manual-dns-provider.spec.ts.
- Monitoring: uptime and log streaming in tests/monitoring.
- Settings: system, account, SMTP, notifications, encryption, user management in tests/settings.
- Tasks and imports: backups, Caddyfile import flows, CrowdSec import, and log viewing in tests/tasks.
- Security UI: dashboard, WAF, CrowdSec, headers, rate limiting, and audit logs in tests/security.
- Security enforcement: ACL, WAF, rate limits, CrowdSec, emergency token, and break-glass recovery in tests/security-enforcement.
- Integration workflows: cross-feature scenarios in tests/integration.
- Browser-specific regressions for import flows in tests/webkit-specific and tests/firefox-specific.
- Debug and diagnostics: certificates and Caddy import debug coverage in tests/debug/certificates-debug.spec.ts, tests/tasks/caddy-import-gaps.spec.ts, tests/tasks/caddy-import-cross-browser.spec.ts, and tests/debug.
- UI triage and regression coverage: dropdown/modal coverage in tests/modal-dropdown-triage.spec.ts and tests/proxy-host-dropdown-fix.spec.ts.
- Shared utilities validation: wait helpers in tests/utils/wait-helpers.spec.ts.
2.3 Dependency and Ordering Constraints
- The security-enforcement suite assumes Cerberus can be toggled on, and its final tests intentionally restore admin whitelist state (see tests/security-enforcement/zzzz-break-glass-recovery.spec.ts).
- Admin whitelist blocking is designed to run late in the suite, ahead of the zzzz-prefixed break-glass recovery, using a zzz prefix (see tests/security-enforcement/zzz-admin-whitelist-blocking.spec.ts).
- Emergency server tests depend on port 2020 availability (see tests/security-enforcement/emergency-server).
- Some import suites use real APIs and TestDataManager cleanup; others mock requests. Remediation must avoid mixing mocked and real flows in a single phase without clear isolation.
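The prefix convention above relies on plain lexicographic ordering, which is the order Playwright uses for test files when parallel execution is disabled. A minimal sketch follows; only the two zzz-prefixed names come from this plan, and the other file names are hypothetical placeholders.

```typescript
// Hypothetical listing of tests/security-enforcement. Only the two zzz* names
// appear in this plan; the other entries are illustrative placeholders.
const specFiles: string[] = [
  "zzzz-break-glass-recovery.spec.ts",
  "waf-enforcement.spec.ts",
  "zzz-admin-whitelist-blocking.spec.ts",
  "acl-enforcement.spec.ts",
];

// The default string sort compares UTF-16 code units, so "-" (0x2D) sorts
// before "z" (0x7A) and "zzz-" lands ahead of "zzzz-".
function executionOrder(files: string[]): string[] {
  return [...files].sort();
}

const ordered = executionOrder(specFiles);
console.log(ordered.join("\n"));
```

With more than one worker, files are distributed across processes, so this ordering is only meaningful under --workers=1.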
2.4 Runtime and Flake Hotspots
- Security-enforcement suites include extended retries, network propagation delays, and rate limit loops.
- Import debug and gap-coverage suites perform real uploads, data creation, and commit flows, making them sensitive to backend state and Caddy reload timing.
- Monitoring WebSocket tests require stable log streaming state.
3. Technical Specifications
3.1 Test Grouping and Shards
- Foundation: global setup, auth storage state, security teardown.
- Core UI: authentication, navigation, dashboard, proxy hosts, certificates, access lists.
- Settings: system, account, SMTP, notifications, encryption, users.
- Tasks: backups, logs, Caddyfile import, CrowdSec import.
- Monitoring: uptime monitoring and real-time logs.
- Security UI: Cerberus dashboard, WAF config, headers, rate limiting, CrowdSec config, audit logs.
- Security Enforcement: ACL/WAF/CrowdSec/rate limit enforcement, emergency token and break-glass recovery, admin whitelist blocking.
- Integration: proxy + cert, proxy + DNS, backup restore, import workflows, multi-feature workflows.
- Browser-specific: WebKit and Firefox import regressions.
- Debug/POC: diagnostics and investigation suites (Caddy import debug).
3.2 Dependency Graph (High-Level)
flowchart TD
A[global-setup + auth.setup] --> B[Core UI + Settings]
A --> C[Tasks + Monitoring]
A --> D[Security UI]
D --> E[Security Enforcement]
E --> F[Break-Glass Recovery]
B --> G[Integration Workflows]
C --> G
G --> H[Browser-specific Suites]
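One way to turn the flowchart above into a concrete run order is a topological sort. The sketch below uses Kahn's algorithm over the same edges; the function name is illustrative and not part of the Charon codebase.

```typescript
// Edges copied from the flowchart above: [prerequisite, dependent].
const edges: Array<[string, string]> = [
  ["global-setup + auth.setup", "Core UI + Settings"],
  ["global-setup + auth.setup", "Tasks + Monitoring"],
  ["global-setup + auth.setup", "Security UI"],
  ["Security UI", "Security Enforcement"],
  ["Security Enforcement", "Break-Glass Recovery"],
  ["Core UI + Settings", "Integration Workflows"],
  ["Tasks + Monitoring", "Integration Workflows"],
  ["Integration Workflows", "Browser-specific Suites"],
];

// Kahn's algorithm: repeatedly emit a group whose prerequisites have all run.
function topologicalOrder(pairs: Array<[string, string]>): string[] {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [from, to] of pairs) {
    if (!inDegree.has(from)) inDegree.set(from, 0);
    inDegree.set(to, (inDegree.get(to) ?? 0) + 1);
    dependents.set(from, [...(dependents.get(from) ?? []), to]);
  }
  const ready: string[] = [];
  inDegree.forEach((degree, name) => {
    if (degree === 0) ready.push(name);
  });
  const order: string[] = [];
  while (ready.length > 0) {
    const group = ready.shift() as string;
    order.push(group);
    for (const next of dependents.get(group) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== inDegree.size) throw new Error("dependency cycle detected");
  return order;
}

console.log(topologicalOrder(edges).join(" -> "));
```

Any order the function returns respects every arrow in the graph. Groups with no path between them, such as Security UI and Tasks + Monitoring, may appear in either relative order.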
3.3 Runtime Estimates (Docker Mode)
| Group | Suite Examples | Expected Runtime | Prerequisites |
|---|---|---|---|
| Foundation | global setup + auth | 1-2 min | Docker E2E container, emergency token |
| Core UI | core specs | 6-10 min | Auth storage state, clean data |
| Settings | settings specs | 6-10 min | Auth storage state |
| Tasks | backups/import/logs | 10-16 min | Auth storage state, API mocks and real flows |
| Monitoring | monitoring specs | 5-8 min | WebSocket stability |
| Security UI | security specs | 10-14 min | Cerberus enabled, admin whitelist |
| Security Enforcement | enforcement specs | 15-25 min | Emergency token, port 2020, admin whitelist |
| Integration | integration specs | 12-20 min | Stable core + settings + tasks |
| Browser-specific | firefox/webkit | 8-12 min | Import baseline stable |
| Debug/POC | caddy import debug | 4-6 min | Docker logs available |
Assumed worker count: 4 (default), except security-enforcement, which requires --workers=1. Serial execution increases runtime for enforcement suites.
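As a sanity check, the per-group ranges in the table can be added up per phase. The sketch below assumes the groups in a phase run back to back and uses the group-to-phase mapping from the Implementation Plan. Foundation is left out because Phase 1 also includes a baseline login/navigation run that the table does not list separately.

```typescript
type Range = { min: number; max: number };

// Per-group estimates in minutes, copied from the table above.
const groupMinutes: Record<string, Range> = {
  "Core UI": { min: 6, max: 10 },
  "Settings": { min: 6, max: 10 },
  "Tasks": { min: 10, max: 16 },
  "Monitoring": { min: 5, max: 8 },
  "Security UI": { min: 10, max: 14 },
  "Security Enforcement": { min: 15, max: 25 },
  "Integration": { min: 12, max: 20 },
  "Browser-specific": { min: 8, max: 12 },
  "Debug/POC": { min: 4, max: 6 },
};

// Assumed mapping of groups to phases, following the Implementation Plan.
const phases: Record<string, string[]> = {
  "Phase 2": ["Core UI", "Settings", "Tasks", "Monitoring"],
  "Phase 3": ["Security UI", "Security Enforcement"],
  "Phase 4": ["Integration", "Browser-specific", "Debug/POC"],
};

// Adds lower and upper bounds separately, assuming sequential execution.
function phaseTotal(groups: string[]): Range {
  return groups.reduce<Range>(
    (total, name) => ({
      min: total.min + groupMinutes[name].min,
      max: total.max + groupMinutes[name].max,
    }),
    { min: 0, max: 0 },
  );
}

for (const phase of Object.keys(phases)) {
  const { min, max } = phaseTotal(phases[phase]);
  console.log(`${phase}: ${min}-${max} min`);
}
```

With the table's figures this gives 27-44 minutes for Phase 2, 25-39 for Phase 3, and 24-38 for Phase 4. These sums are close to, though not identical with, the rounded per-phase estimates in the Implementation Plan.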
3.4 Environment Preconditions
- E2E container built and healthy via .github/skills/scripts/skill-runner.sh docker-rebuild-e2e.
- Ports 8080 (UI/API) and 2020 (emergency server) reachable.
- CHARON_EMERGENCY_TOKEN configured and valid.
- Admin whitelist includes test runner ranges when Cerberus is enabled.
- Caddy admin health endpoints reachable for import workflows.
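The preconditions above could be collapsed into a single preflight report before any suite starts. The sketch below is a hypothetical helper, not existing Charon code. The Preflight interface and its field names are assumptions, and the caller is expected to gather the facts (container health, port probes, environment variables) however the project already does.

```typescript
// Hypothetical snapshot of the environment; field names are illustrative.
interface Preflight {
  containerHealthy: boolean;
  reachablePorts: number[];
  emergencyToken: string | undefined; // value of CHARON_EMERGENCY_TOKEN
  cerberusEnabled: boolean;
  whitelistCoversRunner: boolean;
  caddyAdminHealthy: boolean;
}

// Ports named in the list above: 8080 (UI/API) and 2020 (emergency server).
const REQUIRED_PORTS: number[] = [8080, 2020];

// Returns one message per unmet precondition; an empty array means "go".
function unmetPreconditions(p: Preflight): string[] {
  const problems: string[] = [];
  if (!p.containerHealthy) {
    problems.push("E2E container is not healthy; rerun docker-rebuild-e2e");
  }
  for (const port of REQUIRED_PORTS) {
    if (p.reachablePorts.indexOf(port) === -1) {
      problems.push(`port ${port} is not reachable`);
    }
  }
  // Only presence is checked here; validity must be confirmed against the server.
  if (!p.emergencyToken || p.emergencyToken.trim() === "") {
    problems.push("CHARON_EMERGENCY_TOKEN is not set");
  }
  if (p.cerberusEnabled && !p.whitelistCoversRunner) {
    problems.push("admin whitelist does not cover the test runner");
  }
  if (!p.caddyAdminHealthy) {
    problems.push("Caddy admin health endpoint is not reachable");
  }
  return problems;
}
```

Returning a list instead of throwing on the first failure lets a single run report every missing prerequisite at once.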
3.5 Emergency Server and Security Prerequisites
- Port 2020 (emergency server) available and reachable for tests/security-enforcement/emergency-server.
- Port 2019 is reserved for the Caddy admin API; use 2020 for emergency server tests to avoid conflicts.
- Basic Auth credentials required for emergency server tests. Defaults in test fixtures are admin/changeme and should match the E2E compose config.
- Admin whitelist bypass must be configured before enforcement tests that toggle Cerberus settings.
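For reference, HTTP Basic Auth sends the user name and password joined by a colon and Base64-encoded. The sketch below builds that header value from the default fixture credentials named above. The helper name is illustrative, and how the actual test fixtures attach the header is not shown in this plan.

```typescript
// Builds the value of the HTTP "Authorization" header for Basic Auth (RFC 7617).
// Note: btoa only accepts Latin-1 input, which is sufficient for the defaults here.
function basicAuthHeader(user: string, password: string): string {
  return "Basic " + btoa(`${user}:${password}`);
}

// Defaults named in this plan; override them if the E2E compose config differs.
const emergencyAuth = basicAuthHeader("admin", "changeme");
console.log(emergencyAuth);
```

If the compose config and the fixtures drift apart, the emergency server tests would be expected to fail with authentication errors instead of exercising the break-glass flow.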
4. Implementation Plan
Phase 1: Foundation and Test Harness Reliability
Objective: Ensure the shared test harness is stable before touching feature flows.
- Validate global setup and storage state creation (see tests/global-setup.ts and tests/auth.setup.ts).
- Confirm emergency server availability and credentials for break-glass suites.
- Establish baseline run for core login/navigation suites.
Estimated runtime: 2-4 minutes
Success criteria:
- Storage state created once and reused without re-auth flake.
- Emergency token validation passes and security reset executes.
Phase 2: Core UI, Settings, Monitoring, and Task Flows
Objective: Remediate the highest-traffic user journeys and tasks.
- Core UI: authentication, navigation, dashboard, proxy hosts, certificates, access lists (core CRUD and navigation).
- Settings: system, account, SMTP, notifications, encryption, users.
- Monitoring: uptime and real-time logs.
- Tasks: backups, logs viewing, and base Caddyfile import flows.
- Include modal/dropdown triage coverage and wait helpers validation.
Estimated runtime: 25-40 minutes
Success criteria:
- Core CRUD and navigation pass without retries.
- Monitoring WebSocket tests pass without timeouts.
- Backups and log viewing flows pass with mocks and deterministic waits.
Phase 3: Security UI and Enforcement
Objective: Stabilize Cerberus UI configuration and enforcement workflows.
- Security dashboard and configuration pages.
- WAF, headers, rate limiting, CrowdSec, audit logs.
- Enforcement suites, including emergency token and whitelist blocking order.
Estimated runtime: 30-45 minutes
Success criteria:
- Security UI toggles and pages load without state leakage.
- Enforcement suites pass with Cerberus enabled and whitelist configured.
- Break-glass recovery restores bypass state for subsequent suites.
Phase 4: Integration, Browser-Specific, and Debug Suites
Objective: Close cross-feature and browser-specific regressions.
- Integration workflows: proxy + cert, proxy + DNS, backup restore, import to production, multi-feature workflows.
- Browser-specific Caddy import regressions (Firefox/WebKit).
- Debug/POC suites (Caddy import debug, diagnostics) run as opt-in, including caddy-import-gaps and cross-browser import coverage.
Estimated runtime: 25-40 minutes
Success criteria:
- Integration workflows pass with stable TestDataManager cleanup.
- Browser-specific import tests show consistent API request handling.
- Debug suites remain optional and do not block core pipelines.
5. Acceptance Criteria (EARS)
- WHEN the E2E harness initializes, THE SYSTEM SHALL validate emergency token and create a reusable auth state without flake.
- WHEN core management tests execute, THE SYSTEM SHALL complete CRUD flows without manual retries or timeouts.
- WHEN security enforcement suites execute, THE SYSTEM SHALL apply Cerberus settings with admin whitelist bypass and SHALL restore security state after completion.
- WHEN integration workflows execute, THE SYSTEM SHALL complete cross-feature journeys without data collisions or residual state.
6. Quick Start Commands
# Run all commands from the repository root
cd /projects/Charon
# Rebuild and start E2E container
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
# PHASE 1: Foundation
npx playwright test tests/global-setup.ts tests/auth.setup.ts --project=firefox
# PHASE 2: Core UI, Settings, Tasks, Monitoring
# NOTE: PLAYWRIGHT_SKIP_SECURITY_DEPS=1 is automatically set in E2E scripts
# Security suites will NOT execute as dependencies
npx playwright test tests/core --project=firefox
npx playwright test tests/settings --project=firefox
npx playwright test tests/tasks --project=firefox
npx playwright test tests/monitoring --project=firefox
# PHASE 3: Security UI and Enforcement (SERIAL)
npx playwright test tests/security --project=firefox
npx playwright test tests/security-enforcement --project=firefox --workers=1
# PHASE 4: Integration, Browser-Specific, Debug (Optional)
npx playwright test tests/integration --project=firefox
npx playwright test tests/firefox-specific --project=firefox
npx playwright test tests/webkit-specific --project=webkit
npx playwright test tests/debug --project=firefox
npx playwright test tests/tasks/caddy-import-gaps.spec.ts --project=firefox
7. Risks and Mitigations
- Risk: Security suite state leaks across tests. Mitigation: enforce admin whitelist reset and break-glass recovery ordering.
- Risk: File-name ordering (zzz-) not enforced without --workers=1. Mitigation: document the --workers=1 requirement and make it mandatory in CI and quick-start commands.
- Risk: Emergency server unavailable. Mitigation: gate enforcement suites on health checks and document port 2020 requirements.
- Risk: Import suites combine mocked and real flows. Mitigation: isolate by phase and keep debug suites opt-in.
- Risk: Missing test suites hide regressions. Mitigation: inventory now includes all suites and maps them to phases.
8. Dependencies and Impacted Files
- Harness: tests/global-setup.ts, tests/auth.setup.ts, tests/security-teardown.setup.ts.
- Core UI: tests/core.
- Settings: tests/settings.
- Tasks: tests/tasks.
- Monitoring: tests/monitoring.
- Security UI: tests/security.
- Security enforcement: tests/security-enforcement.
- Integration: tests/integration.
- Browser-specific: tests/firefox-specific, tests/webkit-specific.
9. Confidence Score
Confidence: 79 percent
Rationale: The suite inventory and dependencies are well understood. The main unknowns are timing-sensitive security propagation and emergency server availability in varied environments.