# E2E Playwright Shard Timeout Investigation — Current Spec

Last updated: 2026-02-10

## Goal

- Concise summary: investigate GitHub Actions run https://github.com/Wikid82/Charon/actions/runs/21865692694, where the E2E Playwright job reports Shard 3 stopping at ~30 minutes despite a configured timeout of ~40 minutes. Produce reproducible diagnostics, collect artifacts/logs, identify root-cause hypotheses, and provide prioritized remediations and short-term unblock steps.

## Phases

- Discover: collect logs and artifacts.
- Analyze: review config and correlate shard → tests.
- Remediate: short-term and long-term fixes.
- Verify: reproduce and confirm the fix.

---

## 1) Discover — exact places to collect logs & artifacts

### GitHub Actions (run-level)

- Run page: https://github.com/Wikid82/Charon/actions/runs/21865692694
- Run logs (zip): `GET https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs`
- Programmatic commands:

```bash
export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export RUN_ID=21865692694

# Requires GITHUB_TOKEN set with repo access
curl -H "Accept: application/vnd.github+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  -L "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/logs" \
  -o "run-${RUN_ID}-logs.zip"
unzip -d "run-${RUN_ID}-logs" "run-${RUN_ID}-logs.zip"
```

- Artifacts list (API):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/artifacts" | jq '.'
```

- gh CLI (interactive/script):

```bash
gh run view $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --log > run-$RUN_ID-summary.log
gh run download $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --dir artifacts-$RUN_ID
```

### GitHub Actions (job-level)

- List jobs for the run and find the Playwright shard job(s):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```

- For the JOB_ID identified as the shard job, download the job logs (note: unlike the run-logs endpoint, this endpoint returns plain text, not a zip):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID/logs" -o "job-${JOB_ID}.log"
```

### Playwright test outputs used by this project

- Search for and collect the following files in the repo root (or workflow-run directories):
  - `playwright.config.ts`, `playwright.config.js`, `playwright.config.mjs`
  - `package.json` scripts invoking Playwright (e.g., `test:e2e`, `e2e:ci`)
  - `.github/workflows/*` steps that run Playwright
- Typical Playwright outputs to collect (per shard):
  - `<outputDir>/trace.zip`
  - `<outputDir>/test-results.json` or `test-results/*`
  - `<outputDir>/video/*`
  - `<outputDir>/*.log` (stdout/stderr)

Observed local example (for context): the developer ran
`npx playwright test --project=chromium --output=/tmp/playwright-chromium-output --reporter=list > /tmp/playwright-chromium.log 2>&1` — look for similar invocations in workflows/scripts.

### Repository container logs (containers/)

- containers/charon:
  - Files to check: `containers/charon/docker-compose.yml` and any `logs/` or `data/` directories under `containers/charon/`.
  - Local commands (when reproducing):

```bash
docker compose -f containers/charon/docker-compose.yml logs --no-color --timestamps > containers-charon-logs.txt
docker logs --timestamps --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
```

- containers/caddy:
  - Files: `containers/caddy/Caddyfile`, `containers/caddy/config/`, `containers/caddy/logs/`
  - Local checks:

```bash
docker logs --timestamps caddy > caddy.log 2>&1 || true
curl -sS http://127.0.0.1:2019/ || true  # admin
curl -sS http://127.0.0.1:2020/ || true  # emergency
```

---

## 2) Analyze — specific files and config to review (exact paths)

- Workflows (search these paths):
  - `.github/workflows/*.yml` — likely candidates: `.github/workflows/e2e.yml`, `.github/workflows/ci.yml`, `.github/workflows/playwright.yml` (run `grep -R "playwright" .github/workflows || true`).
  - Look for `timeout-minutes:` under `jobs.<job>.timeout-minutes` or on individual steps.

- Playwright config files:
  - `/projects/Charon/playwright.config.ts`
  - `/projects/Charon/playwright.config.js`
  - `/projects/Charon/playwright.config.mjs`
  - Inspect the `projects`, `workers`, `retries`, `outputDir`, and `reporter` sections.

- package.json and scripts:
  - `/projects/Charon/package.json` — inspect `scripts` for entries such as `test:e2e` and `e2e:ci`, and the exact Playwright CLI flags used by CI.

- GitHub skill scripts & E2E runner:
  - `.github/skills/scripts/skill-runner.sh` — used in docs and testing instructions; check for `docker-rebuild-e2e` and `test-e2e-playwright-coverage`.
  - Commands:

```bash
sed -n '1,240p' .github/skills/scripts/skill-runner.sh
grep -Rn "docker-rebuild-e2e\|test-e2e-playwright-coverage\|playwright" .github/skills || true
```

- Makefile:
  - `/projects/Charon/Makefile` — search for targets related to `e2e`, `playwright`, and `rebuild`.

---

## 3) Steps to download GitHub Actions logs & artifacts for run 21865692694

### Programmatic (API)

1. List artifacts for the run:

```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/artifacts" | jq '.'
```

2. Download the run logs (zip):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
  "https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs" -o run-21865692694-logs.zip
unzip -d run-21865692694-logs run-21865692694-logs.zip
```

3. List jobs to find the Playwright shard job id(s):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```

4. Download job logs by JOB_ID (this endpoint returns plain text, not a zip):

```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
  "https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID.log
```

### Using gh CLI

```bash
gh run view 21865692694 --repo Wikid82/Charon --log > run-21865692694-summary.log
gh run download 21865692694 --repo Wikid82/Charon --dir artifacts-21865692694
```

### Manual web UI

- Visit the run page and download artifacts and job logs from the job view.

---

## 4) How to locate shard-specific logs and correlate shard indices to tests

- Typical patterns to inspect:
  - Look for Playwright CLI flags in the job step (e.g., `--shard=INDEX/TOTAL`, `--output=/tmp/...`).
  - If the job ran `npx playwright test --output=/tmp/...`, search the downloaded job logs for that exact command to find the shard index.

- Commands to list the tests assigned to a shard (dry run):

```bash
# Show which tests a given shard would run (no execution)
npx playwright test --list --shard=INDEX/TOTAL

# Or run with reporter=list (shows test items as executed)
npx playwright test --shard=INDEX/TOTAL --reporter=list
```

- Note: Playwright shard indices are one-based, so `--shard=3/4` means the third of four shards. Confirm exactly which tests that shard picks up by re-running the `--list` command.

Expected per-shard artifact names (if implemented):

- `e2e-shard-<INDEX>-output` containing `trace.zip`, `video/*`, `test-results.json`, and shard-specific logs (stdout/stderr files).
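
To double-check that the shard assignment covers every test exactly once, the per-shard `--list` outputs can be compared programmatically. A minimal sketch in Python — the shard contents and file names below are hypothetical placeholders; in practice, populate them from `npx playwright test --list --shard=i/4`:

```python
# Sketch: verify that per-shard test lists are disjoint and cover the full suite.
from itertools import combinations

# Hypothetical placeholders for the parsed `--list` output of each shard.
shards = {
    1: ["core/auth.spec.ts", "core/nav.spec.ts"],
    2: ["settings/smtp.spec.ts"],
    3: ["security/waf.spec.ts", "security/acl.spec.ts"],
    4: ["tasks/backup.spec.ts"],
}
# Hypothetical full suite, e.g. from an unsharded `--list` run.
all_tests = {
    "core/auth.spec.ts", "core/nav.spec.ts", "settings/smtp.spec.ts",
    "security/waf.spec.ts", "security/acl.spec.ts", "tasks/backup.spec.ts",
}

# No test may appear in two shards...
for a, b in combinations(shards, 2):
    overlap = set(shards[a]) & set(shards[b])
    assert not overlap, f"shards {a} and {b} overlap: {overlap}"

# ...and the union of all shards must equal the full suite.
union = {t for tests in shards.values() for t in tests}
missing = all_tests - union
assert not missing, f"tests assigned to no shard: {missing}"
print("shard partition OK:", {k: len(v) for k, v in shards.items()})
```

If a test appears in no shard (or in two), the partition logic in CI is the first suspect.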

---

## 5) Runner/container logs to inspect

- GitHub-hosted runner: review the Actions job logs for runner messages and any `Runner` diagnostic lines. You cannot access host-level logs.

- Self-hosted runner (if used): retrieve host system logs (requires access to the runner host):

```bash
sudo journalctl -u 'actions.runner.*' -n 1000 > runner-service-journal.log
sudo journalctl -k --since "1 hour ago" | grep -i oom > runner-kernel-oom.log || true
sudo journalctl -u docker.service -n 200 > docker-journal.log
```

- Docker container logs (charon, caddy, charon-e2e):

```bash
docker ps -a --filter "name=charon" --format "{{.Names}} {{.Status}}" > containers-ps.txt
docker logs --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
docker logs --since "1h" caddy > caddy.log 2>&1 || true
```

Check the Caddy admin/emergency ports (2019 and 2020) to confirm the proxy was healthy during the test run:

```bash
curl -sS --max-time 5 http://127.0.0.1:2019/ || echo "admin not responding"
curl -sS --max-time 5 http://127.0.0.1:2020/ || echo "emergency not responding"
```

---

## 6) Hypotheses for why Shard 3 stopped at ~30m (descriptions + exact artifacts to search)

H1 — Workflow/job timeout configured smaller than expected

- Search:
  - `.github/workflows/*` for `timeout-minutes:`
  - job logs for `Timeout` or `Job execution time exceeded`
- Commands:

```bash
grep -Rn "timeout-minutes" .github/workflows || true
grep -Ri "timeout" run-${RUN_ID}-logs || true
```

- Confirmed by: `timeout-minutes: 30` or job logs showing `aborting execution due to timeout`.

H2 — Runner preemption / connection loss

- Search job logs for: `Runner lost`, `The runner has been shutdown`, `Connection to the server was lost`.
- Commands:

```bash
grep -RiE "runner lost|runner.*shutdown|connection.*lost|Job canceled|cancelled by" run-${RUN_ID}-logs || true
```

- Confirmed by: runner-disconnect lines and an abrupt end of the logs with no Playwright stack trace.

H3 — E2E environment container (charon/caddy) died or became unhealthy

- Search container logs for crash/fatal/panic messages with timestamps matching the job stop time.
- Commands:

```bash
docker ps -a --filter "name=charon" --format '{{.Names}} {{.Status}}'
docker logs charon-e2e --since "2h" | sed -n '1,200p'
grep -RiE "panic|fatal|segfault|exited|health.*unhealthy|503|502" containers || true
```

- Confirmed by: a container exit matching the job finish time, or Caddy returning 502/503 during the run.

H4 — Playwright/Node process killed by OOM

- Search for `Killed`, kernel `oom_reaper` lines, and system `dmesg` output.
- Commands:

```bash
grep -R "Killed" job-${JOB_ID}-logs || true
# on a self-hosted runner host
sudo journalctl -k --since '2 hours ago' | grep -i oom || true
```

- Confirmed by: kernel OOM logs at the same timestamp, or `Killed` in the job logs.

H5 — Script-level early timeout (explicit `timeout 30m` or `kill`)

- Search `.github/skills` and workflow steps for `timeout 30m`, `timeout 1800`, or `kill` calls.
- Commands:

```bash
grep -RnE "\btimeout\b|kill -9|kill -15|pkill" .github || true
```

- Confirmed by: a script with `timeout 30m` or a similar wrapper used in the job.

H6 — Misinterpreted units or misconfiguration (seconds vs minutes)

- Search for numeric values used in scripts and steps (e.g., `1800` used where minutes were expected).
- Commands:

```bash
grep -RnE "\b1800\b|\b3600\b|timeout-minutes" .github || true
```

- Confirmed by: a value of `1800` where `timeout-minutes` (or a similar minutes-based setting) was expected.

For each hypothesis, the exact lines/entries returned by the grep/journal/docker commands are the evidence to confirm or refute it. Keep timestamps so they can be correlated with the job start/completion times in the run logs.
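
Correlating those timestamps by hand is error-prone; a small script can flag log events that fall within a window around the job stop time. A minimal Python sketch — the sample log lines and stop time are hypothetical, and the leading ISO-8601 timestamp format is an assumption about how the collected logs are formatted:

```python
from datetime import datetime, timedelta, timezone

def events_near(lines, stop_time, window_minutes=5):
    """Return log lines whose leading ISO-8601 timestamp falls within
    +/- window_minutes of stop_time; lines without a parseable
    timestamp are skipped."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            # fromisoformat() on older Pythons does not accept a trailing Z
            ts = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            continue
        if abs(ts - stop_time) <= window:
            hits.append(line)
    return hits

# Hypothetical sample: the job stopped at 12:30 UTC; which events are nearby?
stop = datetime(2026, 2, 10, 12, 30, tzinfo=timezone.utc)
log = [
    "2026-02-10T11:05:00Z charon-e2e started",
    "2026-02-10T12:29:40Z caddy: upstream 502",
    "2026-02-10T12:30:10Z charon-e2e exited (137)",
    "no-timestamp noise line",
]
for hit in events_near(log, stop):
    print(hit)
```

Run against the merged container and runner logs, this narrows each hypothesis to the handful of lines that actually coincide with the shard's death.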

---

## 7) Prioritized remediation plan (short-term → long-term)

### Short-term (unblock re-runs quickly)

1. Download and attach all logs/artifacts for run 21865692694 (use `gh run download`) and share them with the E2E test author.
2. Temporarily bump `timeout-minutes` for the failing workflow to 60 to allow full runs while diagnosing.
3. Add an `if: always()` step to the E2E job that collects diagnostics and uploads them as artifacts (free memory, `dmesg`, `ps aux`, `docker ps -a`, `docker logs charon-e2e`).
4. Re-run just the failing shard with `DEBUG=pw:api` and `PWDEBUG=1` set, and persist the shard outputs.

### Medium-term

1. Persist per-shard Playwright outputs via `actions/upload-artifact@v4` for traces/videos/test-results.
2. Add Playwright `retries` for transient failures, plus the `--trace` and `--video` options.
3. Add a CI smoke check before full shard execution to confirm environment health.
4. If self-hosted, add runner health checks and alerting (memory, disk, Docker status).

### Long-term

1. Implement stable test splitting based on historical test durations rather than equal-file sharding.
2. Introduce resource constraints and monitoring to protect against OOM and flapping containers.
3. Build a golden-minimal E2E smoke job that must pass before running the full shards.
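
To illustrate the long-term item on duration-based splitting: a greedy longest-processing-time assignment balances shard wall time far better than equal-file chunking. A minimal sketch with made-up per-file durations (the file names and timings are hypothetical, not measured):

```python
import heapq

def split_by_duration(durations, num_shards):
    """Greedy LPT: assign each test file (longest first) to the
    currently least-loaded shard; returns (assignments, loads)."""
    heap = [(0.0, i) for i in range(num_shards)]  # (accumulated load, shard index)
    heapq.heapify(heap)
    assignments = {i: [] for i in range(num_shards)}
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, idx = heapq.heappop(heap)
        assignments[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    loads = {idx: sum(durations[n] for n in tests)
             for idx, tests in assignments.items()}
    return assignments, loads

# Hypothetical historical durations (seconds per spec file)
durations = {
    "security-enforcement.spec.ts": 900,
    "integration.spec.ts": 600,
    "tasks.spec.ts": 500,
    "core.spec.ts": 300,
    "settings.spec.ts": 300,
    "monitoring.spec.ts": 200,
}
assignments, loads = split_by_duration(durations, 3)
print("per-shard load (s):", loads)
```

With these numbers the per-shard loads land within ~100 seconds of each other, whereas naive contiguous chunking would put the two slowest files on one shard — exactly the imbalance that makes a single shard hit the job timeout.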

---

## 8) Minimal reproduction checklist (local)

1. Rebuild the E2E image used by CI (per the repo skill):

```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```

2. Start the environment (example):

```bash
docker compose -f containers/charon/docker-compose.yml up -d
```

3. Set the base URL and run the same shard (replace INDEX/TOTAL with the values from CI):

```bash
export PLAYWRIGHT_BASE_URL=http://localhost:5173
DEBUG=pw:api PWDEBUG=1 \
  npx playwright test --shard=INDEX/TOTAL --project=chromium \
  --output=/tmp/playwright-shard-INDEX --reporter=list > /tmp/playwright-shard-INDEX.log 2>&1
```

4. If the timeout reproduces, immediately collect:

```bash
docker ps -a --format '{{.Names}} {{.Status}}' > reproduce-docker-ps.txt
docker logs --since '1h' charon-e2e > reproduce-charon-e2e.log 2>&1 || true
tail -n 500 /tmp/playwright-shard-INDEX.log > reproduce-pw-tail.log
```

---

## 9) Required workflow/scripts changes to improve diagnostics & prevent recurrence

- Add `timeout-minutes: 60` to `.github/workflows/<e2e workflow>.yml` while diagnosing; later set it to a reasoned SLA (e.g., 50m).
- Add an `if: always()` step to collect diagnostics on failure and upload them as artifacts. Example YAML snippet (note the upload step also needs `if: always()` so it runs after a failed test step):

```yaml
- name: Collect diagnostics
  if: always()
  run: |
    uptime > uptime.txt
    free -m > free-m.txt
    df -h > df-h.txt
    ps aux > ps-aux.txt
    docker ps -a > docker-ps.txt || true
    docker logs --tail 500 charon-e2e > docker-charon-e2e.log || true
- uses: actions/upload-artifact@v4
  if: always()
  with:
    name: e2e-diagnostics-${{ github.run_id }}
    path: |
      uptime.txt
      free-m.txt
      df-h.txt
      ps-aux.txt
      docker-ps.txt
      docker-charon-e2e.log
```

- Ensure each Playwright shard runs with `--output` pointing to a shard-specific path, and upload that path as an artifact:
  - artifact name convention: `e2e-shard-${{ matrix.index }}-output`.
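
One way this could be wired up is a matrix job per shard — a sketch only, assuming a four-shard matrix and that `npx playwright test` is the CI entry point; the job name, paths, and shard count are placeholders to adapt to the real workflow:

```yaml
jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    strategy:
      fail-fast: false          # let the other shards finish for comparison
      matrix:
        index: [1, 2, 3, 4]     # Playwright shard indices are one-based
    steps:
      - uses: actions/checkout@v4
      - name: Run shard ${{ matrix.index }}
        run: |
          npx playwright test \
            --shard=${{ matrix.index }}/4 \
            --output=test-results/shard-${{ matrix.index }}
      - name: Upload shard output
        if: always()            # upload even when the shard fails or times out
        uses: actions/upload-artifact@v4
        with:
          name: e2e-shard-${{ matrix.index }}-output
          path: test-results/shard-${{ matrix.index }}
```

With `fail-fast: false` and unconditional uploads, a single stuck shard no longer takes the other shards' diagnostics down with it.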

---

## 10) People/roles to notify & recommended next actions

- Notify:
  - the CI/Infra owner or the person in `CODEOWNERS` for `.github/workflows`
  - the E2E test author(s) (owners of the failing tests)
  - the self-hosted runner owner (if `runner_name` in the job JSON indicates self-hosted)

- Recommended immediate actions for them:
  1. Download the run artifacts and job logs for run 21865692694 and share them with the test author.
  2. Re-run the shard with `DEBUG=pw:api` and `PWDEBUG=1` enabled, and ensure per-shard artifacts are uploaded.
  3. If self-hosted, check the runner host kernel logs for OOM kills and Docker container exits at the job time.

---

## 11) Verification steps (post-remediation)

1. Re-run the E2E workflow end-to-end; verify that Shard 3 completes.
2. Confirm the `e2e-shard-3-output` artifact exists and contains `trace.zip`, `video/*`, and `test-results.json`.
3. Confirm there are no `oom_reaper` or `Killed` messages in the runner host logs during the run.

---

## Appendix — quick extraction commands summary

```bash
# Download all artifacts and logs for RUN_ID
gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694

# List jobs and find the Playwright shard job(s)
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'

# Download job logs for JOB_ID (plain text)
curl -H "Authorization: token $GITHUB_TOKEN" -L \
  "https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID.log

# Grep for likely causes
grep -RiE "timeout|minut|runner lost|cancelled|Killed|OOM|oom_reaper|Out of memory|panic|fatal" run-21865692694-logs || true
```

---

## Next three immediate actions (checklist)

1. Run `gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694` and unzip the run logs.
2. Search the downloaded logs for `timeout-minutes`, `Runner lost`, `Killed`, and `oom_reaper` to triage H1–H4.
3. Re-run the failing shard locally with `DEBUG=pw:api PWDEBUG=1` and `--output=/tmp/playwright-shard-INDEX`, capture the outputs, and upload them as artifacts.
---
post_title: "E2E Test Remediation Plan"
author1: "Charon Team"
post_slug: "e2e-test-remediation-plan"
microsoft_alias: "charon-team"
featured_image: "https://wikid82.github.io/charon/assets/images/featured/charon.png"
categories: ["testing"]
tags: ["playwright", "e2e", "remediation", "security"]
ai_note: "true"
summary: "Phased remediation plan for Charon Playwright E2E tests, covering inventory, dependencies, runtime estimates, and quick start commands."
post_date: "2026-01-28"
---

## 1. Introduction

This plan replaces the current spec with a comprehensive, phased remediation strategy for the Playwright E2E test suite under [tests](tests). The goal is to stabilize execution, align dependencies, and sequence remediation work so that core management flows, security controls, and integration workflows become reliable in Docker-based E2E runs.

## 2. Research Findings

### 2.1 Test Harness and Global Dependencies

- Global setup and teardown are enforced by [tests/global-setup.ts](tests/global-setup.ts), [tests/auth.setup.ts](tests/auth.setup.ts), and [tests/security-teardown.setup.ts](tests/security-teardown.setup.ts).
- Global setup validates the emergency token, checks health endpoints, and resets security settings, which impacts all security-enforcement suites.
- Multiple suites depend on the emergency server (port 2020) and Cerberus modules with explicit admin whitelist configuration.

### 2.2 Test Inventory and Feature Areas

- Core management flows: authentication, navigation, dashboard, proxy hosts, certificates, and access lists in [tests/core](tests/core).
- DNS providers and ACME workflows: [tests/dns-provider-crud.spec.ts](tests/dns-provider-crud.spec.ts), [tests/dns-provider-types.spec.ts](tests/dns-provider-types.spec.ts), and [tests/manual-dns-provider.spec.ts](tests/manual-dns-provider.spec.ts).
- Monitoring: uptime and log streaming in [tests/monitoring](tests/monitoring).
- Settings: system, account, SMTP, notifications, encryption, and user management in [tests/settings](tests/settings).
- Tasks and imports: backups, Caddyfile import flows, CrowdSec import, and log viewing in [tests/tasks](tests/tasks).
- Security UI: dashboard, WAF, CrowdSec, headers, rate limiting, and audit logs in [tests/security](tests/security).
- Security enforcement: ACL, WAF, rate limits, CrowdSec, emergency token, and break-glass recovery in [tests/security-enforcement](tests/security-enforcement).
- Integration workflows: cross-feature scenarios in [tests/integration](tests/integration).
- Browser-specific regressions for import flows in [tests/webkit-specific](tests/webkit-specific) and [tests/firefox-specific](tests/firefox-specific).
- Debug and diagnostics: certificates and Caddy import debug coverage in [tests/debug/certificates-debug.spec.ts](tests/debug/certificates-debug.spec.ts), [tests/tasks/caddy-import-gaps.spec.ts](tests/tasks/caddy-import-gaps.spec.ts), [tests/tasks/caddy-import-cross-browser.spec.ts](tests/tasks/caddy-import-cross-browser.spec.ts), and [tests/debug](tests/debug).
- UI triage and regression coverage: dropdown/modal coverage in [tests/modal-dropdown-triage.spec.ts](tests/modal-dropdown-triage.spec.ts) and [tests/proxy-host-dropdown-fix.spec.ts](tests/proxy-host-dropdown-fix.spec.ts).
- Shared utilities validation: wait helpers in [tests/utils/wait-helpers.spec.ts](tests/utils/wait-helpers.spec.ts).

### 2.3 Dependency and Ordering Constraints

- The security-enforcement suite assumes Cerberus can be toggled on, and its final tests intentionally restore admin whitelist state (see [tests/security-enforcement/zzzz-break-glass-recovery.spec.ts](tests/security-enforcement/zzzz-break-glass-recovery.spec.ts)).
- Admin whitelist blocking is designed to run last using a zzz prefix (see [tests/security-enforcement/zzz-admin-whitelist-blocking.spec.ts](tests/security-enforcement/zzz-admin-whitelist-blocking.spec.ts)).
- Emergency server tests depend on port 2020 availability (see [tests/security-enforcement/emergency-server](tests/security-enforcement/emergency-server)).
- Some import suites use real APIs and TestDataManager cleanup; others mock requests. Remediation must avoid mixing mocked and real flows in a single phase without clear isolation.

### 2.4 Runtime and Flake Hotspots

- Security-enforcement suites include extended retries, network propagation delays, and rate limit loops.
- Import debug and gap-coverage suites perform real uploads, data creation, and commit flows, making them sensitive to backend state and Caddy reload timing.
- Monitoring WebSocket tests require stable log streaming state.

## 3. Technical Specifications

### 3.1 Test Grouping and Shards

- **Foundation:** global setup, auth storage state, security teardown.
- **Core UI:** authentication, navigation, dashboard, proxy hosts, certificates, access lists.
- **Settings:** system, account, SMTP, notifications, encryption, users.
- **Tasks:** backups, logs, Caddyfile import, CrowdSec import.
- **Monitoring:** uptime monitoring and real-time logs.
- **Security UI:** Cerberus dashboard, WAF config, headers, rate limiting, CrowdSec config, audit logs.
- **Security Enforcement:** ACL/WAF/CrowdSec/rate limit enforcement, emergency token and break-glass recovery, admin whitelist blocking.
- **Integration:** proxy + cert, proxy + DNS, backup restore, import workflows, multi-feature workflows.
- **Browser-specific:** WebKit and Firefox import regressions.
- **Debug/POC:** diagnostics and investigation suites (Caddy import debug).

### 3.2 Dependency Graph (High-Level)

```mermaid
flowchart TD
    A[global-setup + auth.setup] --> B[Core UI + Settings]
    A --> C[Tasks + Monitoring]
    A --> D[Security UI]
    D --> E[Security Enforcement]
    E --> F[Break-Glass Recovery]
    B --> G[Integration Workflows]
    C --> G
    G --> H[Browser-specific Suites]
```

### 3.3 Runtime Estimates (Docker Mode)

| Group | Suite Examples | Expected Runtime | Prerequisites |
| --- | --- | --- | --- |
| Foundation | global setup + auth | 1-2 min | Docker E2E container, emergency token |
| Core UI | core specs | 6-10 min | Auth storage state, clean data |
| Settings | settings specs | 6-10 min | Auth storage state |
| Tasks | backups/import/logs | 10-16 min | Auth storage state, API mocks and real flows |
| Monitoring | monitoring specs | 5-8 min | WebSocket stability |
| Security UI | security specs | 10-14 min | Cerberus enabled, admin whitelist |
| Security Enforcement | enforcement specs | 15-25 min | Emergency token, port 2020, admin whitelist |
| Integration | integration specs | 12-20 min | Stable core + settings + tasks |
| Browser-specific | firefox/webkit | 8-12 min | Import baseline stable |
| Debug/POC | caddy import debug | 4-6 min | Docker logs available |

Assumed worker count: 4 (the default) except for security-enforcement, which requires `--workers=1`. Serial execution increases runtime for the enforcement suites.
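
Summing the table's estimates gives a rough wall-time envelope for a fully serial pass — a sketch only; the numbers simply restate the table above, and groups that run in parallel will finish sooner:

```python
# Rough total-runtime envelope from the estimate table (minutes).
estimates = {
    "Foundation": (1, 2),
    "Core UI": (6, 10),
    "Settings": (6, 10),
    "Tasks": (10, 16),
    "Monitoring": (5, 8),
    "Security UI": (10, 14),
    "Security Enforcement": (15, 25),
    "Integration": (12, 20),
    "Browser-specific": (8, 12),
    "Debug/POC": (4, 6),
}
low = sum(lo for lo, hi in estimates.values())
high = sum(hi for lo, hi in estimates.values())
print(f"serial total: {low}-{high} min")
```

Even the optimistic serial total sits well above a 30-minute job limit, which is why per-phase runs (and adequate `timeout-minutes`) matter.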
### 3.4 Environment Preconditions

- E2E container built and healthy via `.github/skills/scripts/skill-runner.sh docker-rebuild-e2e`.
- Ports 8080 (UI/API) and 2020 (emergency server) reachable.
- `CHARON_EMERGENCY_TOKEN` configured and valid.
- Admin whitelist includes the test-runner ranges when Cerberus is enabled.
- Caddy admin health endpoints reachable for import workflows.

### 3.5 Emergency Server and Security Prerequisites

- Port 2020 (emergency server) available and reachable for [tests/security-enforcement/emergency-server](tests/security-enforcement/emergency-server).
- Port 2019 is reserved for the Caddy admin API; use 2020 for emergency server tests to avoid conflicts.
- Basic Auth credentials are required for emergency server tests. The defaults in the test fixtures are `admin` / `changeme` and should match the E2E compose config.
- Admin whitelist bypass must be configured before enforcement tests that toggle Cerberus settings.

## 4. Implementation Plan

### Phase 1: Foundation and Test Harness Reliability

Objective: Ensure the shared test harness is stable before touching feature flows.

- Validate global setup and storage state creation (see [tests/global-setup.ts](tests/global-setup.ts) and [tests/auth.setup.ts](tests/auth.setup.ts)).
- Confirm emergency server availability and credentials for the break-glass suites.
- Establish a baseline run for the core login/navigation suites.

Estimated runtime: 2-4 minutes

Success criteria:

- Storage state is created once and reused without re-auth flake.
- Emergency token validation passes and the security reset executes.

### Phase 2: Core UI, Settings, Monitoring, and Task Flows

Objective: Remediate the highest-traffic user journeys and tasks.

- Core UI: authentication, navigation, dashboard, proxy hosts, certificates, access lists (core CRUD and navigation).
- Settings: system, account, SMTP, notifications, encryption, users.
- Monitoring: uptime and real-time logs.
- Tasks: backups, log viewing, and base Caddyfile import flows.
- Include modal/dropdown triage coverage and wait-helpers validation.

Estimated runtime: 25-40 minutes

Success criteria:

- Core CRUD and navigation pass without retries.
- Monitoring WebSocket tests pass without timeouts.
- Backup and log-viewing flows pass with mocks and deterministic waits.

### Phase 3: Security UI and Enforcement

Objective: Stabilize Cerberus UI configuration and enforcement workflows.

- Security dashboard and configuration pages.
- WAF, headers, rate limiting, CrowdSec, audit logs.
- Enforcement suites, including emergency token and whitelist blocking order.

Estimated runtime: 30-45 minutes

Success criteria:

- Security UI toggles and pages load without state leakage.
- Enforcement suites pass with Cerberus enabled and the whitelist configured.
- Break-glass recovery restores bypass state for subsequent suites.

### Phase 4: Integration, Browser-Specific, and Debug Suites

Objective: Close cross-feature and browser-specific regressions.

- Integration workflows: proxy + cert, proxy + DNS, backup restore, import to production, multi-feature workflows.
- Browser-specific Caddy import regressions (Firefox/WebKit).
- Debug/POC suites (Caddy import debug, diagnostics) run as opt-in, including caddy-import-gaps and cross-browser import coverage.

Estimated runtime: 25-40 minutes

Success criteria:

- Integration workflows pass with stable TestDataManager cleanup.
- Browser-specific import tests show consistent API request handling.
- Debug suites remain optional and do not block core pipelines.

## 5. Acceptance Criteria (EARS)

- WHEN the E2E harness initializes, THE SYSTEM SHALL validate the emergency token and create a reusable auth state without flake.
- WHEN core management tests execute, THE SYSTEM SHALL complete CRUD flows without manual retries or timeouts.
- WHEN security enforcement suites execute, THE SYSTEM SHALL apply Cerberus settings with admin whitelist bypass and SHALL restore security state after completion.
- WHEN integration workflows execute, THE SYSTEM SHALL complete cross-feature journeys without data collisions or residual state.

## 6. Quick Start Commands

```bash
# Rebuild and start the E2E container
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e

# PHASE 1: Foundation
cd /projects/Charon
npx playwright test tests/global-setup.ts tests/auth.setup.ts --project=firefox

# PHASE 2: Core UI, Settings, Tasks, Monitoring
# NOTE: PLAYWRIGHT_SKIP_SECURITY_DEPS=1 is automatically set in the E2E scripts,
# so the security suites will NOT execute as dependencies
npx playwright test tests/core --project=firefox
npx playwright test tests/settings --project=firefox
npx playwright test tests/tasks --project=firefox
npx playwright test tests/monitoring --project=firefox

# PHASE 3: Security UI and Enforcement (SERIAL)
npx playwright test tests/security --project=firefox
npx playwright test tests/security-enforcement --project=firefox --workers=1

# PHASE 4: Integration, Browser-Specific, Debug (optional)
npx playwright test tests/integration --project=firefox
npx playwright test tests/firefox-specific --project=firefox
npx playwright test tests/webkit-specific --project=webkit
npx playwright test tests/debug --project=firefox
npx playwright test tests/tasks/caddy-import-gaps.spec.ts --project=firefox
```

## 7. Risks and Mitigations

- Risk: Security suite state leaks across tests. Mitigation: enforce admin
  whitelist reset and break-glass recovery ordering.
- Risk: File-name ordering (zzz-) is not enforced without `--workers=1`.
  Mitigation: document the `--workers=1` requirement and make it mandatory in
  CI and quick-start commands.
- Risk: Emergency server unavailable. Mitigation: gate enforcement suites on
  health checks and document port 2020 requirements.
- Risk: Import suites combine mocked and real flows. Mitigation: isolate by
  phase and keep debug suites opt-in.
- Risk: Missing test suites hide regressions. Mitigation: the inventory now
  includes all suites and maps them to phases.

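The emergency-server mitigation above calls for gating on health checks. A minimal pre-flight sketch, assuming the UI port (8080) and emergency-server port (2020) from this spec; the URL paths are placeholders for the real health endpoints:

```shell
#!/bin/sh
# Pre-flight gate: fail fast if the UI or the emergency server is unreachable,
# instead of letting enforcement suites hang. Ports are this spec's
# assumptions; the "/" paths stand in for real health endpoints.
check() {
  url=$1; name=$2
  if curl -fsS --max-time 10 "$url" > /dev/null 2>&1; then
    echo "ok: $name"
  else
    echo "FAIL: $name unreachable at $url" >&2
    return 1
  fi
}

# Invoke explicitly (e.g. RUN_HEALTHCHECK=1 in a CI step) so sourcing this
# file has no side effects.
if [ -n "${RUN_HEALTHCHECK:-}" ]; then
  check "${PLAYWRIGHT_BASE_URL:-http://localhost:8080}/" "UI" || exit 1
  check "http://localhost:2020/" "emergency server" || exit 1
fi
```

Gating the enforcement phase on this check turns a 30-minute silent hang into an immediate, attributable failure.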
## 8. Dependencies and Impacted Files

- Harness: [tests/global-setup.ts](tests/global-setup.ts),
  [tests/auth.setup.ts](tests/auth.setup.ts),
  [tests/security-teardown.setup.ts](tests/security-teardown.setup.ts).
- Core UI: [tests/core](tests/core).
- Settings: [tests/settings](tests/settings).
- Tasks: [tests/tasks](tests/tasks).
- Monitoring: [tests/monitoring](tests/monitoring).
- Security UI: [tests/security](tests/security).
- Security enforcement: [tests/security-enforcement](tests/security-enforcement).
- Integration: [tests/integration](tests/integration).
- Browser-specific: [tests/firefox-specific](tests/firefox-specific),
  [tests/webkit-specific](tests/webkit-specific).

## 9. Confidence Score

Confidence: 79 percent

Rationale: The suite inventory and dependencies are well understood. The main
unknowns are timing-sensitive security propagation and emergency server
availability in varied environments.

## Review Feedback & Required Additions

Summary: the spec is thorough and well-structured but is missing several
concrete forensic and reproduction details needed to reliably diagnose shard
timeouts and to make CI-side fixes repeatable. The items below add those
missing artifacts, commands, and prioritized mitigations.

1) Test forensics (how to analyze Playwright traces and map failing tests to shards)
- Open traces per shard (`show-trace` accepts the trace zip directly; no need to extract it first):
```bash
npx playwright show-trace e2e-shard-<INDEX>-output/trace.zip
```
- Use the JSON reporter to map tests to files, lines, and durations:
```bash
# run locally to produce a reporter JSON for the shard
npx playwright test --shard=INDEX/TOTAL --project=chromium --reporter=json \
  --output=/tmp/playwright-shard-INDEX --trace=on > /tmp/playwright-shard-INDEX.json
# specs carry title/file/line; per-attempt durations live under tests[].results[]
jq '[.. | objects | select(has("tests")) | {title, file, line, durations: [.tests[].results[].duration]}]' /tmp/playwright-shard-INDEX.json
```
- Correlate test start/stop timestamps (from the reporter JSON) with job logs and container logs to find the precise point where execution stopped.
- If only one test is hanging, re-run just that test (by spec file path or with `--grep`) using `--trace=on` and `DEBUG=pw:api`, and capture the trace and stdout.

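To bracket the hang quickly, the latest `startTime` recorded anywhere in the shard's reporter JSON is useful: anything scheduled after that timestamp never began. A sketch, assuming Playwright's JSON reporter schema (the report path is a placeholder):

```shell
# Latest test start recorded anywhere in the reporter JSON. Tests scheduled
# after this timestamp never began, which brackets where the shard stalled.
last_start() {
  jq -r '[.. | objects | .startTime? // empty] | sort | last // "none"' "$1"
}

# Usage: last_start /tmp/playwright-shard-INDEX.json
```

Comparing this value against the job's `completed_at` shows whether the shard died mid-test or between tests.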
2) CI / Workflow checks (where to inspect timeouts and cancellation causes)
- Inspect `.github/workflows/*.yml` for `timeout-minutes` at both the job level
  (`jobs.<job>.timeout-minutes`) and the step level:
```bash
grep -rn "timeout-minutes" .github/workflows || true
```
- From the job JSON (API), check the `status`, `conclusion`, `started_at`, and `completed_at` fields:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID" | jq '.'
```
- Search job logs for runner messages indicating preemption, OOM, or cancellation:
```bash
grep -riE "cancel|runner lost|killed|oom|timeout" job-$JOB_ID-logs || true
```
- Confirm whether the runner was self-hosted (job JSON `runner_name` / `runner_group_id` / `labels`). If self-hosted, collect `journalctl` and Docker host logs for the timestamp window.

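The self-hosted check can be scripted against the job JSON's `labels` array. A sketch (fields per the GitHub Actions Jobs API; the token and ID variables are the ones used above):

```shell
# Classify the runner from a job JSON object (GitHub Actions Jobs API).
# Self-hosted runners always carry the "self-hosted" label.
classify_runner() {
  jq -r 'if ((.labels // []) | index("self-hosted")) != null
         then "self-hosted" else "github-hosted" end'
}

# Usage: pipe the job JSON through the classifier.
if [ -n "${GITHUB_TOKEN:-}" ]; then
  curl -s -H "Authorization: token $GITHUB_TOKEN" \
    "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID" \
    | classify_runner
fi
```

If this prints `self-hosted`, proceed to the host-log collection steps; otherwise runner-side forensics are limited to what GitHub exposes in the job logs.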
3) Reproduction instructions (how to reproduce the shard locally exactly)
- Rebuild the image used by CI (recommended, to match CI):
```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```
- Start the E2E environment (use the same compose file as CI):
```bash
docker compose -f containers/charon/docker-compose.yml up -d
```
- Environment variables to set (use the values CI uses):
  - `PLAYWRIGHT_BASE_URL` – CI base URL (e.g. `http://localhost:8080` for Docker mode; `http://localhost:5173` for Vite dev).
  - `CHARON_EMERGENCY_TOKEN` – emergency token used by tests.
  - `PLAYWRIGHT_JOBS` (if used by the project's scripts) and debug toggles as
    needed: `DEBUG=pw:api` for verbose API logging. Note that `PWDEBUG=1`
    pauses execution and opens the inspector, so omit it for non-interactive
    runs.
  - Optional toggles used in CI: `PLAYWRIGHT_SKIP_SECURITY_DEPS=1`.
- Exact shard reproduction command (example matching CI; keep the JSON report
  on stdout separate from debug output on stderr so the report stays
  parseable):
```bash
export PLAYWRIGHT_BASE_URL=http://localhost:8080
export CHARON_EMERGENCY_TOKEN=changeme
DEBUG=pw:api \
npx playwright test --shard=INDEX/TOTAL --project=chromium \
  --output=/tmp/playwright-shard-INDEX --reporter=json --trace=on \
  > /tmp/playwright-shard-INDEX.json 2> /tmp/playwright-shard-INDEX.log
```
- To re-run a single failing test found in the JSON:
```bash
npx playwright test tests/path/to/spec.ts -g "Exact test title" --project=chromium --trace=on --output=/tmp/playwright-single
```

4) Required artifacts & evidence to collect (exact list and commands)
- Per-shard Playwright outputs: `trace.zip`, `video/*`, `test-results.json` (or the reporter JSON), and the shard stdout/stderr log. Ensure `--output` points to a shard-specific path and upload it as an artifact.
- Job-level artifacts: the GitHub Actions run logs ZIP, job logs ZIP, and `gh run download` output.
- Runner/host diagnostics (self-hosted): `journalctl -u actions.runner.*`, `dmesg | grep -i oom`, `sudo journalctl -u docker.service`, `docker ps -a`, and `docker logs --since <job start>` for charon-e2e and caddy.
- Capture a timestamped mapping file that lists: job start, shard start, last test start, last trace timestamp, and job end. Example CSV header: `job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion`.
- Attach a minimal repro package: the Docker image tag, the docker-compose file, the exact Playwright command line, and the failing test id/title.

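The mapping file above can be assembled mechanically from artifacts already on the list. A sketch that emits one CSV row per shard from the job JSON and the shard's reporter JSON; here `shard_start` is approximated by the job start, since the per-step timestamp lives only in the step logs:

```shell
# Emit one row of the timestamp-mapping CSV from a job JSON (GitHub Jobs API)
# and a shard's Playwright reporter JSON. shard_start is approximated by the
# job start; refine it from the step logs if needed.
map_row() {
  job_json=$1; report_json=$2; shard_index=$3
  last_start=$(jq -r '[.. | objects | .startTime? // empty] | sort | last // "n/a"' "$report_json")
  jq -r --arg shard "$shard_index" --arg last "$last_start" \
    '[.id, .started_at, $shard, .started_at, $last, .completed_at, .conclusion] | @csv' \
    "$job_json"
}

echo 'job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion'
# map_row job-3.json playwright-shard-3.json 3
```

One row per shard in a single file makes the "where did the 30 minutes go" question answerable at a glance.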
5) Prioritization of fixes and quick mitigations (concrete)
- P0 (Immediate unblock):
  - Temporarily increase `timeout-minutes` to 60 for the failing workflow; add an `if: always()` diagnostics step and artifact upload.
  - Ensure each shard uses a per-shard `--output` path and is uploaded (`actions/upload-artifact`) so traces are available even on cancellation.
  - Re-run the failing shard locally with `DEBUG=pw:api` and collect traces.
- P1 (Same-day):
  - Add a CI smoke healthcheck step that validates the UI and emergency server before shards start (quick `curl` checks and a small Playwright smoke test).
  - If the runner is self-hosted, add a simple resource guard (prevent unintended systemd service restarts) and an OOM monitoring alert.
  - Configure a small number of Playwright retries for flaky tests and run expensive suites with `--workers=1`.
- P2 (Next sprint):
  - Implement historical-duration-based shard splitting to avoid heavy concentration in one shard.
  - Add test-level tagging and targeted prioritization for long-running security-enforcement suites.
  - Add CI-level telemetry: test-duration history and a flaky-test dashboard.

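The P2 shard-splitting item can be prototyped entirely in `jq` with greedy longest-first bin packing over historical durations. A sketch; the `durations.json` input shape is an assumption (e.g. `[{"file":"a.spec.ts","ms":120000}, ...]`, harvested from past reporter JSONs):

```shell
# Greedy longest-first packing of spec files into N shards by historical
# duration: sort descending, then always drop the next file into the
# currently lightest shard. Input shape is assumed, not Playwright-native.
split_shards() {
  jq -c --argjson n "$2" '
    sort_by(-.ms)
    | reduce .[] as $t ([range($n)] | map({shard: ., ms: 0, files: []});
        (min_by(.ms) | .shard) as $s
        | map(if .shard == $s
              then {shard, ms: (.ms + $t.ms), files: (.files + [$t.file])}
              else . end))
  ' "$1"
}

# Usage: split_shards durations.json 4
```

Feeding each shard's `files` list to `npx playwright test <files>` replaces Playwright's count-based `--shard` split with a duration-balanced one.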
Verdict: NEEDS CHANGES — the existing spec is a solid base, but add the forensic commands, reproducible shard reproduction steps, explicit artifact list, and CI checks above before marking this plan approved.

Actionable next steps (short list):

- Add the `if: always()` diagnostics step to `.github/workflows/<e2e-workflow>.yml` and upload the diagnostics as artifacts.
- Modify the E2E job to set `--output` to `e2e-shard-${{ matrix.index }}-output` and upload that path.
- Run `gh run download 21865692694` and extract the per-job logs; parse the job JSON to determine whether the runner was self-hosted and collect host logs if so.
- Reproduce the failing shard locally using the exact commands above and attach `trace.zip` and the JSON reporter output to the issue.

If you want, I can apply the small CI YAML snippets (diagnostics + upload) as a targeted patch or download the run artifacts now (requires `GITHUB_TOKEN`).
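For reference, a sketch of that diagnostics + upload step; the step names, container name, and the `matrix.index` variable are assumptions based on this spec, so adapt them to the real workflow file:

```yaml
# Append to the E2E job's steps; runs even when the job fails or is cancelled.
- name: Collect diagnostics
  if: always()
  run: |
    mkdir -p diagnostics
    docker ps -a > diagnostics/docker-ps.txt 2>&1 || true
    docker logs charon-e2e > diagnostics/charon-e2e.log 2>&1 || true
    dmesg | tail -n 200 > diagnostics/dmesg-tail.txt 2>&1 || true

- name: Upload diagnostics
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: e2e-shard-${{ matrix.index }}-diagnostics
    path: diagnostics
```

The `|| true` guards keep the diagnostics step itself from masking the job's real conclusion.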