post_title: E2E Test Remediation Plan
author1: Charon Team
post_slug: e2e-test-remediation-plan
microsoft_alias: charon-team
featured_image: https://wikid82.github.io/charon/assets/images/featured/charon.png
categories, tags: testing, playwright, e2e, remediation, security
ai_note: true
summary: Phased remediation plan for Charon Playwright E2E tests, covering inventory, dependencies, runtime estimates, and quick start commands.
post_date: 2026-01-28

1. Introduction

This plan replaces the current spec with a phased remediation strategy for the Playwright E2E test suite in the tests directory. The goal is to stabilize execution, align dependencies, and sequence the remediation work so that core management flows, security controls, and integration workflows run reliably in Docker-based E2E runs.

2. Research Findings

2.1 Test Harness and Global Dependencies

  • Global setup and teardown are enforced by tests/global-setup.ts, tests/auth.setup.ts, and tests/security-teardown.setup.ts.
  • Global setup validates the emergency token, checks health endpoints, and resets security settings, which impacts all security-enforcement suites.
  • Multiple suites depend on the emergency server (port 2020) and Cerberus modules with explicit admin whitelist configuration.

2.2 Test Inventory and Feature Areas

2.3 Dependency and Ordering Constraints

  • The security-enforcement suite assumes Cerberus can be toggled on, and its final tests intentionally restore admin whitelist state (see tests/security-enforcement/zzzz-break-glass-recovery.spec.ts).
  • Admin whitelist blocking is designed to run after the other enforcement specs by using a zzz- prefix (see tests/security-enforcement/zzz-admin-whitelist-blocking.spec.ts); only the zzzz- break-glass recovery spec sorts after it.
  • Emergency server tests depend on port 2020 availability (see tests/security-enforcement/emergency-server).
  • Some import suites use real APIs and TestDataManager cleanup; others mock requests. Remediation must avoid mixing mocked and real flows in a single phase without clear isolation.
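
The prefix convention above depends on spec files being picked up in lexicographic order during serial runs. The sketch below illustrates that comparison under the assumption of plain code-unit string ordering; the exact collation the test runner applies is not confirmed by this plan, and acl-enforcement.spec.ts is a hypothetical stand-in for an ordinary enforcement spec.

```typescript
// Spec file names. The first and third come from this plan; the second is
// a hypothetical example of an ordinary enforcement spec.
const specFiles: string[] = [
  "zzzz-break-glass-recovery.spec.ts",
  "acl-enforcement.spec.ts",
  "zzz-admin-whitelist-blocking.spec.ts",
];

// The default sort compares UTF-16 code units, so "-" (45) sorts before
// "z" (122) and "zzz-..." lands ahead of "zzzz-...".
function executionOrder(files: string[]): string[] {
  return [...files].sort();
}

const ordered = executionOrder(specFiles);
console.log(ordered);
```

Under that assumption the whitelist-blocking spec runs second to last and break-glass recovery runs last, which is the order the enforcement suite relies on.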

2.4 Runtime and Flake Hotspots

  • Security-enforcement suites include extended retries, network propagation delays, and rate limit loops.
  • Import debug and gap-coverage suites perform real uploads, data creation, and commit flows, making them sensitive to backend state and Caddy reload timing.
  • Monitoring WebSocket tests require stable log streaming state.

3. Technical Specifications

3.1 Test Grouping and Shards

  • Foundation: global setup, auth storage state, security teardown.
  • Core UI: authentication, navigation, dashboard, proxy hosts, certificates, access lists.
  • Settings: system, account, SMTP, notifications, encryption, users.
  • Tasks: backups, logs, Caddyfile import, CrowdSec import.
  • Monitoring: uptime monitoring and real-time logs.
  • Security UI: Cerberus dashboard, WAF config, headers, rate limiting, CrowdSec config, audit logs.
  • Security Enforcement: ACL/WAF/CrowdSec/rate limit enforcement, emergency token and break-glass recovery, admin whitelist blocking.
  • Integration: proxy + cert, proxy + DNS, backup restore, import workflows, multi-feature workflows.
  • Browser-specific: WebKit and Firefox import regressions.
  • Debug/POC: diagnostics and investigation suites (Caddy import debug).
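
The grouping above can be expressed as a lookup from spec path to group, which also makes unmapped specs easy to spot. This is a minimal sketch: the directory names are the ones used in the quick start commands of this plan, while the helper names and the prefix-matching rule are assumptions.

```typescript
// Directory-to-group mapping. Directory names come from the quick start
// commands in this plan; matching by directory prefix is an assumption.
const GROUP_BY_DIR: Record<string, string> = {
  "tests/core": "Core UI",
  "tests/settings": "Settings",
  "tests/tasks": "Tasks",
  "tests/monitoring": "Monitoring",
  "tests/security": "Security UI",
  "tests/security-enforcement": "Security Enforcement",
  "tests/integration": "Integration",
  "tests/firefox-specific": "Browser-specific",
  "tests/webkit-specific": "Browser-specific",
  "tests/debug": "Debug/POC",
};

// Return the group for a spec path, or undefined when no directory matches.
function groupForSpec(specPath: string): string | undefined {
  // Match on "dir/" so tests/security does not swallow tests/security-enforcement.
  const dir = Object.keys(GROUP_BY_DIR).find((d) => specPath.startsWith(d + "/"));
  return dir === undefined ? undefined : GROUP_BY_DIR[dir];
}

// List spec paths that no group claims, so inventory gaps stay visible.
function unmappedSpecs(specPaths: string[]): string[] {
  return specPaths.filter((p) => groupForSpec(p) === undefined);
}
```

Running unmappedSpecs over the full list of spec files would flag any suite that has not yet been assigned to a group.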

3.2 Dependency Graph (High-Level)

flowchart TD
   A[global-setup + auth.setup] --> B[Core UI + Settings]
   A --> C[Tasks + Monitoring]
   A --> D[Security UI]
   D --> E[Security Enforcement]
   E --> F[Break-Glass Recovery]
   B --> G[Integration Workflows]
   C --> G
   G --> H[Browser-specific Suites]
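
The graph above constrains execution order without fixing a single sequence. As a rough illustration, the sketch below encodes the same edges and derives one valid order with Kahn's algorithm; the function and constant names are hypothetical, and the phased plan in this document is simply one of the orders the graph allows.

```typescript
// Edges copied from the flowchart above, as [prerequisite, dependent] pairs.
const EDGES: Array<[string, string]> = [
  ["global-setup + auth.setup", "Core UI + Settings"],
  ["global-setup + auth.setup", "Tasks + Monitoring"],
  ["global-setup + auth.setup", "Security UI"],
  ["Security UI", "Security Enforcement"],
  ["Security Enforcement", "Break-Glass Recovery"],
  ["Core UI + Settings", "Integration Workflows"],
  ["Tasks + Monitoring", "Integration Workflows"],
  ["Integration Workflows", "Browser-specific Suites"],
];

// Kahn's algorithm: repeatedly emit nodes whose prerequisites are all done.
function topologicalOrder(edges: Array<[string, string]>): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [from, to] of edges) {
    if (!indegree.has(from)) indegree.set(from, 0);
    indegree.set(to, (indegree.get(to) ?? 0) + 1);
    dependents.set(from, [...(dependents.get(from) ?? []), to]);
  }
  // Start with nodes that have no prerequisites.
  const ready = Array.from(indegree.keys()).filter((n) => indegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const next of dependents.get(node) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== indegree.size) throw new Error("dependency cycle detected");
  return order;
}

const runOrder = topologicalOrder(EDGES);
console.log(runOrder.join(" -> "));
```

Any order this produces satisfies every arrow in the flowchart; a cycle introduced by a future edit would surface as an error rather than a silent mis-ordering.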

3.3 Runtime Estimates (Docker Mode)

| Group | Suite Examples | Expected Runtime | Prerequisites |
|---|---|---|---|
| Foundation | global setup + auth | 1-2 min | Docker E2E container, emergency token |
| Core UI | core specs | 6-10 min | Auth storage state, clean data |
| Settings | settings specs | 6-10 min | Auth storage state |
| Tasks | backups/import/logs | 10-16 min | Auth storage state, API mocks and real flows |
| Monitoring | monitoring specs | 5-8 min | WebSocket stability |
| Security UI | security specs | 10-14 min | Cerberus enabled, admin whitelist |
| Security Enforcement | enforcement specs | 15-25 min | Emergency token, port 2020, admin whitelist |
| Integration | integration specs | 12-20 min | Stable core + settings + tasks |
| Browser-specific | firefox/webkit | 8-12 min | Import baseline stable |
| Debug/POC | caddy import debug | 4-6 min | Docker logs available |

Estimates assume 4 workers (the default), except for security-enforcement, which requires --workers=1. Running serially lengthens the runtime of the enforcement suites.
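
For a full-run budget, the per-group ranges in the runtime table can be added up. The sketch below does that arithmetic, assuming the groups run back to back with no overlap; the constant and function names are illustrative.

```typescript
// [group, min minutes, max minutes], copied from the runtime table above.
const ESTIMATES: Array<[string, number, number]> = [
  ["Foundation", 1, 2],
  ["Core UI", 6, 10],
  ["Settings", 6, 10],
  ["Tasks", 10, 16],
  ["Monitoring", 5, 8],
  ["Security UI", 10, 14],
  ["Security Enforcement", 15, 25],
  ["Integration", 12, 20],
  ["Browser-specific", 8, 12],
  ["Debug/POC", 4, 6],
];

// Sum the lower and upper bounds separately.
function totalRange(rows: Array<[string, number, number]>): { min: number; max: number } {
  return rows.reduce(
    (acc, [, lo, hi]) => ({ min: acc.min + lo, max: acc.max + hi }),
    { min: 0, max: 0 },
  );
}

const fullRun = totalRange(ESTIMATES);
console.log(`${fullRun.min}-${fullRun.max} min`); // 77-123 min
```

Dropping the opt-in Debug/POC row brings the range down to 73-117 minutes.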

3.4 Environment Preconditions

  • E2E container built and healthy via .github/skills/scripts/skill-runner.sh docker-rebuild-e2e.
  • Ports 8080 (UI/API) and 2020 (emergency server) reachable.
  • CHARON_EMERGENCY_TOKEN configured and valid.
  • Admin whitelist includes test runner ranges when Cerberus is enabled.
  • Caddy admin health endpoints reachable for import workflows.
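
Part of this checklist can be verified before any browser starts. The sketch below is a minimal pre-flight check, assuming the caller has already probed which ports are reachable (the probing itself is omitted); checkPreconditions is a hypothetical helper, and it only confirms that the token is present, not that the server accepts it.

```typescript
// UI/API and emergency server ports, per the list above.
const REQUIRED_PORTS: number[] = [8080, 2020];

// Return human-readable problems; an empty list means the static checks passed.
// reachablePorts is supplied by the caller (the probe logic is out of scope here).
function checkPreconditions(
  env: Record<string, string | undefined>,
  reachablePorts: number[],
): string[] {
  const problems: string[] = [];
  const token = env["CHARON_EMERGENCY_TOKEN"];
  // Presence check only: token validity must still be confirmed by the server.
  if (token === undefined || token.trim() === "") {
    problems.push("CHARON_EMERGENCY_TOKEN is missing or empty");
  }
  for (const port of REQUIRED_PORTS) {
    if (!reachablePorts.includes(port)) {
      problems.push(`port ${port} is not reachable`);
    }
  }
  return problems;
}
```

Failing fast on a non-empty result keeps environment problems from showing up later as test flakes.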

3.5 Emergency Server and Security Prerequisites

  • Port 2020 (emergency server) available and reachable for tests/security-enforcement/emergency-server.
  • Port 2019 is reserved for the Caddy admin API; use 2020 for emergency server tests to avoid conflicts.
  • Basic Auth credentials are required for emergency server tests. The test-fixture defaults are admin / changeme and should match the E2E compose config.
  • Admin whitelist bypass must be configured before enforcement tests that toggle Cerberus settings.
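
For reference, HTTP Basic authentication sends the base64 encoding of "user:password" in the Authorization header. The sketch below builds that header value for the fixture defaults named above; basicAuthHeader is a hypothetical helper, and real runs should take the credentials from the E2E compose config rather than hard-coding them.

```typescript
// Build an HTTP Basic Authorization header value: "Basic " + base64("user:password").
// btoa handles Latin-1 input only, which is sufficient for these ASCII defaults.
function basicAuthHeader(user: string, password: string): string {
  return "Basic " + btoa(`${user}:${password}`);
}

// Fixture defaults named in this plan.
const emergencyAuth = basicAuthHeader("admin", "changeme");
console.log(emergencyAuth);
```

A request to the emergency server on port 2020 would carry this value in its Authorization header.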

4. Implementation Plan

Phase 1: Foundation and Test Harness Reliability

Objective: Ensure the shared test harness is stable before touching feature flows.

  • Validate global setup and storage state creation (see tests/global-setup.ts and tests/auth.setup.ts).
  • Confirm emergency server availability and credentials for break-glass suites.
  • Establish a baseline run for the core login/navigation suites.

Estimated runtime: 2-4 minutes

Success criteria:

  • Storage state created once and reused without re-auth flake.
  • Emergency token validation passes and security reset executes.

Phase 2: Core UI, Settings, Monitoring, and Task Flows

Objective: Remediate the highest-traffic user journeys and tasks.

  • Core UI: authentication, navigation, dashboard, proxy hosts, certificates, access lists (core CRUD and navigation).
  • Settings: system, account, SMTP, notifications, encryption, users.
  • Monitoring: uptime and real-time logs.
  • Tasks: backups, logs viewing, and base Caddyfile import flows.
  • Include modal/dropdown triage coverage and validation of the wait helpers.

Estimated runtime: 25-40 minutes

Success criteria:

  • Core CRUD and navigation pass without retries.
  • Monitoring WebSocket tests pass without timeouts.
  • Backups and log viewing flows pass with mocks and deterministic waits.

Phase 3: Security UI and Enforcement

Objective: Stabilize Cerberus UI configuration and enforcement workflows.

  • Security dashboard and configuration pages.
  • WAF, headers, rate limiting, CrowdSec, audit logs.
  • Enforcement suites, including emergency token and whitelist blocking order.

Estimated runtime: 30-45 minutes

Success criteria:

  • Security UI toggles and pages load without state leakage.
  • Enforcement suites pass with Cerberus enabled and whitelist configured.
  • Break-glass recovery restores bypass state for subsequent suites.

Phase 4: Integration, Browser-Specific, and Debug Suites

Objective: Close cross-feature and browser-specific regressions.

  • Integration workflows: proxy + cert, proxy + DNS, backup restore, import to production, multi-feature workflows.
  • Browser-specific Caddy import regressions (Firefox/WebKit).
  • Debug/POC suites (Caddy import debug, diagnostics) run as opt-in, including caddy-import-gaps and cross-browser import coverage.

Estimated runtime: 25-40 minutes

Success criteria:

  • Integration workflows pass with stable TestDataManager cleanup.
  • Browser-specific import tests show consistent API request handling.
  • Debug suites remain optional and do not block core pipelines.

5. Acceptance Criteria (EARS)

  • WHEN the E2E harness initializes, THE SYSTEM SHALL validate the emergency token and create a reusable auth state without flake.
  • WHEN core management tests execute, THE SYSTEM SHALL complete CRUD flows without manual retries or timeouts.
  • WHEN security enforcement suites execute, THE SYSTEM SHALL apply Cerberus settings with admin whitelist bypass and SHALL restore security state after completion.
  • WHEN integration workflows execute, THE SYSTEM SHALL complete cross-feature journeys without data collisions or residual state.

6. Quick Start Commands

# Rebuild and start E2E container (run from the repository root)
cd /projects/Charon
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e

# PHASE 1: Foundation
npx playwright test tests/global-setup.ts tests/auth.setup.ts --project=firefox

# PHASE 2: Core UI, Settings, Tasks, Monitoring
# NOTE: PLAYWRIGHT_SKIP_SECURITY_DEPS=1 is automatically set in E2E scripts
# Security suites will NOT execute as dependencies
npx playwright test tests/core --project=firefox
npx playwright test tests/settings --project=firefox
npx playwright test tests/tasks --project=firefox
npx playwright test tests/monitoring --project=firefox

# PHASE 3: Security UI and Enforcement (SERIAL)
npx playwright test tests/security --project=firefox
npx playwright test tests/security-enforcement --project=firefox --workers=1

# PHASE 4: Integration, Browser-Specific, Debug (Optional)
npx playwright test tests/integration --project=firefox
npx playwright test tests/firefox-specific --project=firefox
npx playwright test tests/webkit-specific --project=webkit
npx playwright test tests/debug --project=firefox
npx playwright test tests/tasks/caddy-import-gaps.spec.ts --project=firefox
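
The Phase 2 note says PLAYWRIGHT_SKIP_SECURITY_DEPS=1 keeps security suites from running as dependencies. How the project's Playwright config actually wires this is not shown in this plan; the sketch below is one possible shape, using plain objects that mirror the name, testMatch, and dependencies fields of Playwright project entries. The project names and globs are hypothetical.

```typescript
// Minimal subset of the fields a Playwright project entry can carry.
interface ProjectSketch {
  name: string;
  testMatch?: string;
  dependencies?: string[];
}

// Project names and globs are hypothetical; only the environment variable
// name comes from this plan.
function buildProjects(env: Record<string, string | undefined>): ProjectSketch[] {
  const skipSecurity = env["PLAYWRIGHT_SKIP_SECURITY_DEPS"] === "1";
  return [
    { name: "setup", testMatch: "**/auth.setup.ts" },
    {
      name: "security-tests",
      testMatch: "**/security-enforcement/**/*.spec.ts",
      dependencies: ["setup"],
    },
    {
      name: "firefox",
      // With the flag set, browser runs depend on auth setup only.
      dependencies: skipSecurity ? ["setup"] : ["setup", "security-tests"],
    },
  ];
}
```

With the flag set to 1, running the firefox project would pull in only the setup project, which matches the behavior described in the Phase 2 comment.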

7. Risks and Mitigations

  • Risk: Security suite state leaks across tests. Mitigation: enforce admin whitelist reset and break-glass recovery ordering.
  • Risk: File-name ordering (zzz-) not enforced without --workers=1. Mitigation: document --workers=1 requirement and make it mandatory in CI and quick-start commands.
  • Risk: Emergency server unavailable. Mitigation: gate enforcement suites on health checks and document port 2020 requirements.
  • Risk: Import suites combine mocked and real flows. Mitigation: isolate by phase and keep debug suites opt-in.
  • Risk: Missing test suites hide regressions. Mitigation: inventory now includes all suites and maps them to phases.
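
The --workers=1 mitigation is easier to keep mandatory if commands are generated rather than typed by hand. A minimal sketch, assuming a hypothetical playwrightArgs helper that builds the argument list for one npx playwright test invocation:

```typescript
// Build the argument list for one `npx playwright test` invocation.
function playwrightArgs(target: string, project: string): string[] {
  const args = ["playwright", "test", target, `--project=${project}`];
  // Enforcement specs rely on file-name ordering, which needs a single worker.
  if (target.startsWith("tests/security-enforcement")) {
    args.push("--workers=1");
  }
  return args;
}

console.log("npx " + playwrightArgs("tests/security-enforcement", "firefox").join(" "));
console.log("npx " + playwrightArgs("tests/security", "firefox").join(" "));
```

The first command gains --workers=1 automatically, while the Security UI command keeps the default worker count.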

8. Dependencies and Impacted Files

9. Confidence Score

Confidence: 79 percent

Rationale: The suite inventory and dependencies are well understood. The main unknowns are timing-sensitive security propagation and emergency server availability in varied environments.