# E2E Playwright Shard Timeout Investigation — Current Spec
Last updated: 2026-02-10
## Goal
- Concise summary: investigate GitHub Actions run https://github.com/Wikid82/Charon/actions/runs/21865692694 where the E2E Playwright job reports Shard 3 stopping at ~30 minutes despite configured timeouts of ~40 minutes. Produce reproducible diagnostics, collect artifacts/logs, identify root cause hypotheses, and provide prioritized remediations and short-term unblock steps.
## Phases
- Discover: collect logs and artifacts.
- Analyze: review config and correlate shard → tests.
- Remediate: short-term and long-term fixes.
- Verify: reproduce and confirm the fix.
---
## 1) Discover — exact places to collect logs & artifacts
### GitHub Actions (run-level)
- Run page: https://github.com/Wikid82/Charon/actions/runs/21865692694
- Run logs (zip): GET https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs
- Programmatic commands:
```bash
export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export RUN_ID=21865692694
# Requires GITHUB_TOKEN set with repo access
curl -H "Accept: application/vnd.github+json" \
-H "Authorization: token $GITHUB_TOKEN" \
-L "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/logs" \
-o run-${RUN_ID}-logs.zip
unzip -d run-${RUN_ID}-logs run-${RUN_ID}-logs.zip
```
- Artifacts list (API):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/artifacts" | jq '.'
```
- gh CLI (interactive/script):
```bash
gh run view $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --log > run-$RUN_ID-summary.log
gh run download $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --dir artifacts-$RUN_ID
```
### GitHub Actions (job-level)
- List jobs for the run and find Playwright shard job(s):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```
- For JOB_ID identified as the shard job, download job logs:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID/logs" -o job-${JOB_ID}-logs.zip
unzip -d job-${JOB_ID}-logs job-${JOB_ID}-logs.zip
```
### Playwright test outputs used by this project
- Search and collect the following files in the repo root (or workflow-run directories):
- `playwright.config.ts`, `playwright.config.js`, `playwright.config.mjs`
- `package.json` scripts invoking Playwright (e.g., `test:e2e`, `e2e:ci`)
- `.github/workflows/*` steps that run Playwright
- Typical Playwright outputs to collect (per-shard):
- `<outputDir>/trace.zip`
- `<outputDir>/test-results.json` or `test-results/*`
- `<outputDir>/video/*`
- `<outputDir>/*.log` (stdout/stderr)
Observed local example (for context): the developer ran
`npx playwright test --project=chromium --output=/tmp/playwright-chromium-output --reporter=list > /tmp/playwright-chromium.log 2>&1` — look for similar invocations in workflows/scripts.
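To gather these outputs after a local or CI run, a minimal collection sketch (assumes GNU tar; `OUT` is a placeholder for the actual `--output` path):
```bash
# Bundle the typical per-shard outputs listed above into one archive for sharing.
# OUT is a placeholder; substitute the --output path used by the CI invocation.
OUT=/tmp/playwright-chromium-output
find "$OUT" \( -name 'trace.zip' -o -name 'test-results.json' \
  -o -name '*.webm' -o -name '*.log' \) -print0 \
  | tar --null -czf playwright-outputs.tgz -T -
```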
### Repository container logs (containers/)
- containers/charon:
- Files to check: `containers/charon/docker-compose.yml`, any `logs/` or `data/` directories under `containers/charon/`.
- Local commands (when reproducing):
```bash
docker compose -f containers/charon/docker-compose.yml logs --no-color --timestamps > containers-charon-logs.txt
docker logs --timestamps --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
```
- containers/caddy:
- Files: `containers/caddy/Caddyfile`, `containers/caddy/config/`, `containers/caddy/logs/`
- Local checks:
```bash
docker logs --timestamps caddy > caddy.log 2>&1 || true
curl -sS http://127.0.0.1:2019/ || true # admin
curl -sS http://127.0.0.1:2020/ || true # emergency
```
---
## 2) Analyze — specific files and config to review (exact paths)
- Workflows (search these paths):
- `.github/workflows/*.yml` — likely candidates: `.github/workflows/e2e.yml`, `.github/workflows/ci.yml`, `.github/workflows/playwright.yml` (run `grep -R "playwright" .github/workflows || true`).
- Look for `timeout-minutes:` either at the top level of the workflow or under `jobs.<job>.timeout-minutes` (see the grep sketch at the end of this section).
- Playwright config files:
- `/projects/Charon/playwright.config.ts`
- `/projects/Charon/playwright.config.js`
- `/projects/Charon/playwright.config.mjs`
- Inspect `projects`, `workers`, `retries`, `outputDir`, `reporter` sections.
- package.json and scripts:
- `/projects/Charon/package.json` — inspect `scripts` for e.g. `test:e2e`, `e2e:ci` and the exact Playwright CLI flags used by CI.
- GitHub skill scripts & E2E runner:
- `.github/skills/scripts/skill-runner.sh` — used in `docs` and testing instructions; check for `docker-rebuild-e2e`, `test-e2e-playwright-coverage`.
- Commands:
```bash
sed -n '1,240p' .github/skills/scripts/skill-runner.sh
grep -n "docker-rebuild-e2e\|test-e2e-playwright-coverage\|playwright" -n .github/skills || true
```
- Makefile:
- `/projects/Charon/Makefile` — search for targets related to `e2e`, `playwright`, `rebuild`.
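A combined sweep for the timeout settings referenced in this section; a sketch run from the repo root:
```bash
# Workflow- and job-level timeouts (minutes).
grep -Rn "timeout-minutes" .github/workflows || true
# Playwright-level timeouts: per-test `timeout` and suite-wide `globalTimeout` (both in ms).
grep -n "globalTimeout\|timeout:" playwright.config.* 2>/dev/null || true
# Playwright CLI flags baked into npm scripts or Make targets.
grep -n "playwright" package.json Makefile 2>/dev/null || true
```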
---
## 3) Steps to download GitHub Actions logs & artifacts for run 21865692694
### Programmatic (API)
1. List artifacts for run:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/artifacts" | jq '.'
```
2. Download run logs (zip):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs" -o run-21865692694-logs.zip
unzip -d run-21865692694-logs run-21865692694-logs.zip
```
3. List jobs to find Playwright shard job id(s):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```
4. Download job logs by JOB_ID:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID-logs.zip
unzip -d job-$JOB_ID-logs job-$JOB_ID-logs.zip
```
### Using gh CLI
```bash
gh run view 21865692694 --repo Wikid82/Charon --log > run-21865692694-summary.log
gh run download 21865692694 --repo Wikid82/Charon --dir artifacts-21865692694
```
### Manual web UI
- Visit run page and download artifacts and job logs from the job view.
---
## 4) How to locate shard-specific logs and correlate shard indices to tests
- Typical patterns to inspect:
- Look for Playwright CLI flags in the job step (e.g., `--shard=INDEX/TOTAL`, `--output=/tmp/...`).
- If the job ran `npx playwright test --output=/tmp/...`, search the downloaded job logs for that exact command to find the shard index (see the grep sketch after this list).
- Commands to list tests assigned to a shard (dry-run):
```bash
# Show which tests a given shard would run (no execution)
npx playwright test --list --shard=INDEX/TOTAL
# Or run with reporter=list (shows test items as executed)
npx playwright test --shard=INDEX/TOTAL --reporter=list
```
- Note: Playwright shard indices are one-based: `--shard=1/4` through `--shard=4/4`. If CI logs show `--shard=3/4`, that is the third of four shards; confirm which tests it covers by re-running the `--list` command.
Expected per-shard artifact names (if implemented):
- `e2e-shard-<INDEX>-output` containing `trace.zip`, `video/*`, `test-results.json`, and shard-specific logs (stdout/stderr files).
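To pull the exact shard invocation out of the downloaded job logs (as referenced above), a sketch assuming the unzip layout from section 3:
```bash
# Locate the playwright command line and its shard/output flags in the job logs.
grep -RniE "playwright test.*--shard=[0-9]+/[0-9]+" job-${JOB_ID}-logs || true
# Extract just the shard values for quick correlation with artifact names.
grep -RhoE -e '--shard=[0-9]+/[0-9]+' job-${JOB_ID}-logs | sort -u
```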
---
## 5) Runner/container logs to inspect
- GitHub-hosted runner: review the Actions job logs for runner messages and any `Runner` diagnostic lines. You cannot access host-level logs.
- Self-hosted runner (if used): retrieve host system logs (requires access to runner host):
```bash
sudo journalctl -u actions.runner.* -n 1000 > runner-service-journal.log
sudo journalctl -k --since "1 hour ago" | grep -i oom > runner-kernel-oom.log || true
sudo journalctl -u docker.service -n 200 > docker-journal.log
```
- Docker container logs (charon, caddy, charon-e2e):
```bash
docker ps -a --filter "name=charon" --format "{{.Names}} {{.Status}}" > containers-ps.txt
docker logs --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
docker logs --since "1h" caddy > caddy.log 2>&1 || true
```
Check Caddy admin/emergency ports (2019 & 2020) to confirm the proxy was healthy during the test run:
```bash
curl -sS --max-time 5 http://127.0.0.1:2019/ || echo "admin not responding"
curl -sS --max-time 5 http://127.0.0.1:2020/ || echo "emergency not responding"
```
---
## 6) Hypotheses for why Shard 3 stopped at ~30m (descriptions + exact artifacts to search)
H1 — Workflow/job timeout configured smaller than expected
- Search:
- `.github/workflows/*` for `timeout-minutes:`
- job logs for `Timeout` or `Job execution time exceeded`
- Commands:
```bash
grep -n "timeout-minutes" .github/workflows -R || true
grep -i "timeout" -R run-${RUN_ID}-logs || true
```
- Confirmed by: `timeout-minutes: 30` or job logs showing `aborting execution due to timeout`.
H2 — Runner preemption / connection loss
- Search job logs for: `Runner lost`, `The runner has been shutdown`, `Connection to the server was lost`.
- Commands:
```bash
grep -iE "runner lost|runner.*shutdown|connection.*lost|Job canceled|cancelled by" -R run-${RUN_ID}-logs || true
```
- Confirmed by: runner disconnect lines and abrupt end of logs with no Playwright stack trace.
H3 — E2E environment container (charon/caddy) died or became unhealthy
- Search container logs for crash/fatal/panic messages and timestamps matching the job stop time.
- Commands:
```bash
docker ps -a --filter "name=charon" --format '{{.Names}} {{.Status}}'
docker logs charon-e2e --since "2h" | sed -n '1,200p'
grep -iE "panic|fatal|segfault|exited|health.*unhealthy|503|502" containers -R || true
```
- Confirmed by: container exit matching job finish time and Caddy returning 502/503 during run.
H4 — Playwright/Node process killed by OOM
- Search for `Killed`, kernel `oom_reaper` lines, system `dmesg` outputs.
- Commands:
```bash
grep -R "Killed" job-${JOB_ID}-logs || true
# on self-hosted runner host
sudo journalctl -k --since '2 hours ago' | grep -i oom || true
```
- Confirmed by: kernel OOM logs at same timestamp or `Killed` in job logs.
H5 — Script-level early timeout (explicit `timeout 30m` or `kill`)
- Search `.github/skills` and workflow steps for `timeout 30m`, `timeout 1800`, or `kill` calls.
- Commands:
```bash
grep -R "\btimeout\b\|kill -9\|kill -15\|pkill" -n .github || true
```
- Confirmed by: a script with `timeout 30m` or similar wrapper used in the job.
H6 — Misinterpreted units or mis-configuration (seconds vs minutes)
- Search for numeric values used in scripts and steps (e.g., `1800` used where minutes expected).
- Commands:
```bash
grep -R "\b1800\b\|\b3600\b\|timeout-minutes" -n .github || true
```
- Confirmed by: a value of `1800` where `timeout-minutes` or similar was expected to be minutes.
For each hypothesis, the exact lines/entries returned by the grep/journal/docker commands are the evidence to confirm or refute it. Keep timestamps to correlate with the job start/completion times in the run logs.
---
## 7) Prioritized remediation plan (short-term → long-term)
### Short-term (unblock re-runs quickly)
1. Download and attach all logs/artifacts for run 21865692694 (use `gh run download`) and share with E2E test author.
2. Temporarily bump `timeout-minutes` for the failing workflow to 60 to allow full runs while diagnosing.
3. Add an `if: always()` step to the E2E job that collects diagnostics and uploads them as artifacts (free memory, `dmesg`, `ps aux`, `docker ps -a`, `docker logs charon-e2e`).
4. Re-run just the failing shard with added `DEBUG=pw:api` and `PWDEBUG=1` and persist shard outputs.
### Medium-term
1. Persist per-shard Playwright outputs via `actions/upload-artifact@v4` for traces/videos/test-results.
2. Add Playwright `retries` for transient failures and `--trace`/`--video` options.
3. Add a CI smoke check before full shard execution to confirm env health.
4. If self-hosted, add runner health checks and alerting (memory, disk, Docker status).
### Long-term
1. Implement stable test splitting based on historical test durations rather than equal-file sharding.
2. Introduce resource constraints and monitoring to protect against OOM and flapping containers.
3. Build a golden-minimal E2E smoke job that must pass before running full shards.
---
## 8) Minimal reproduction checklist (local)
1. Rebuild E2E image used by CI (per repo skill):
```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```
2. Start the environment (example):
```bash
docker compose -f containers/charon/docker-compose.yml up -d
```
3. Set base URL and run the same shard (replace INDEX/TOTAL with values from CI):
```bash
export PLAYWRIGHT_BASE_URL=http://localhost:5173
DEBUG=pw:api PWDEBUG=1 \
npx playwright test --shard=INDEX/TOTAL --project=chromium \
--output=/tmp/playwright-shard-INDEX --reporter=list > /tmp/playwright-shard-INDEX.log 2>&1
```
4. If reproducing a timeout, immediately collect:
```bash
docker ps -a --format '{{.Names}} {{.Status}}' > reproduce-docker-ps.txt
docker logs --since '1h' charon-e2e > reproduce-charon-e2e.log || true
tail -n 500 /tmp/playwright-shard-INDEX.log > reproduce-pw-tail.log
```
---
## 9) Required workflow/scripts changes to improve diagnostics & prevent recurrence
- Add `timeout-minutes: 60` to `.github/workflows/<e2e workflow>.yml` while diagnosing; later set to a reasoned SLA (e.g., 50m).
- Add an `always()` step to collect diagnostics on failure and upload artifacts. Example YAML snippet:
```yaml
- name: Collect diagnostics
if: always()
run: |
uptime > uptime.txt
free -m > free-m.txt
df -h > df-h.txt
ps aux > ps-aux.txt
docker ps -a > docker-ps.txt || true
docker logs --tail 500 charon-e2e > docker-charon-e2e.log || true
- uses: actions/upload-artifact@v4
with:
name: e2e-diagnostics-${{ github.run_id }}
path: |
uptime.txt
free-m.txt
df-h.txt
ps-aux.txt
docker-ps.txt
docker-charon-e2e.log
```
- Ensure each Playwright shard runs with `--output` pointing to a shard-specific path and upload that path as artifact:
- artifact name convention: `e2e-shard-${{ matrix.index }}-output`.
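A sketch of the shell body for such a step; `SHARD_INDEX`/`SHARD_TOTAL` are hypothetical names for values supplied by the job matrix:
```bash
# Run one shard with a shard-specific output directory the upload step can target.
SHARD_INDEX="${SHARD_INDEX:?set from the job matrix, e.g. 3}"
SHARD_TOTAL="${SHARD_TOTAL:?set from the job matrix, e.g. 4}"
npx playwright test \
  --shard="${SHARD_INDEX}/${SHARD_TOTAL}" \
  --output="e2e-shard-${SHARD_INDEX}-output" \
  --trace=on --reporter=list
```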
---
## 10) People/roles to notify & recommended next actions
- Notify:
- CI/Infra owner or person in `CODEOWNERS` for `.github/workflows`
- E2E test author(s) (owners of failing tests)
- Self-hosted runner owner (if runner_name in job JSON indicates self-hosted)
- Recommended immediate actions for them:
1. Download run artifacts and job logs for run 21865692694 and share them with the test author.
2. Re-run the shard with `DEBUG=pw:api` and `PWDEBUG=1` enabled and ensure per-shard artifacts are uploaded.
3. If self-hosted, check runner host kernel logs for OOM and Docker container exits at the job time.
---
## 11) Verification steps (post-remediation)
1. Re-run E2E workflow end-to-end; verify Shard 3 completes.
2. Confirm artifacts `e2e-shard-3-output` exist and contain `trace.zip`, `video/*`, and `test-results.json`.
3. Confirm no `oom_reaper` or `Killed` messages in runner host logs during the run.
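A quick check for step 3, assuming self-hosted host access; substitute the job window from the run JSON:
```bash
# Scan kernel logs for OOM activity inside the job's time window (placeholders below).
JOB_START="2026-02-10 21:30:00"   # placeholder: job started_at from the API
JOB_END="2026-02-10 22:10:00"     # placeholder: job completed_at from the API
sudo journalctl -k --since "$JOB_START" --until "$JOB_END" \
  | grep -iE "oom_reaper|out of memory|killed process" || echo "no OOM events in window"
```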
---
## Appendix — quick extraction commands summary
```bash
# Download all artifacts and logs for RUN_ID
gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694
# List jobs and find Playwright shard job(s)
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
# Download job logs for JOB_ID
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID-logs.zip
unzip -d job-$JOB_ID-logs job-$JOB_ID-logs.zip
# Grep for likely causes
grep -iE "timeout|minut|runner lost|cancelled|Killed|OOM|oom_reaper|Out of memory|panic|fatal" -R run-21865692694-logs || true
```
---
## Next three immediate actions (checklist)
1. Run `gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694` and unzip the run logs.
2. Search the downloaded logs for `timeout-minutes`, `Runner lost`, `Killed`, and `oom_reaper` to triage H1-H4.
3. Re-run the failing shard locally with `DEBUG=pw:api PWDEBUG=1` and `--output=/tmp/playwright-shard-INDEX`, capture outputs, and upload them as artifacts.
---
If you want, I can now (A) download the run artifacts & logs for run 21865692694 using gh/API (requires your GITHUB_TOKEN) and list the job IDs, or (B) open the workflow files in `.github/workflows` and search for `timeout-minutes` and Playwright invocations. Which would you like me to do first?
---
Confidence: 79 percent
Rationale: The suite inventory and dependencies are well understood. The main
unknowns are timing-sensitive security propagation and emergency server
availability in varied environments.
## Review Feedback & Required Additions
Summary: the spec is thorough and well-structured but is missing several concrete
forensic and reproduction details needed to reliably diagnose shard timeouts
and to make CI-side fixes repeatable. The items below add those missing
artifacts, commands, and prioritized mitigations.
1) Test-forensics (how to analyze Playwright traces & map failing tests to shards)
- Extract and open traces per-shard: unzip the artifact and run:
```bash
unzip e2e-shard-<INDEX>-output/trace.zip -d /tmp/trace-INDEX
npx playwright show-trace /tmp/trace-INDEX
```
- Use JSON reporter to map test IDs to trace files and timestamps:
```bash
# run locally to produce a reporter JSON for the shard
npx playwright test --shard=INDEX/TOTAL --project=chromium --reporter=json --output=/tmp/playwright-shard-INDEX --trace=on > /tmp/playwright-shard-INDEX.json
jq '.suites[].specs[]? | {title: .title, file: .file, line: .line, durations: [.tests[].results[].duration], annotations: [.tests[].annotations[]]}' /tmp/playwright-shard-INDEX.json
```
- Correlate test start/stop timestamps (from the reporter JSON) with job logs and container logs to find the precise point where execution stopped (see the jq sketch after this list).
- If only one test is hanging, re-run just that test (via `--grep` or by passing the spec file path) with `--trace=on` and `DEBUG=pw:api` set, and capture the trace and stdout.
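For the timestamp correlation above, a jq sketch that assumes the Playwright JSON report schema (specs carry `title`/`file`, results carry `startTime`/`duration`):
```bash
# Emit "startTime duration status file :: title" per test result, sorted by start time;
# recursive descent (..) also walks nested describe() suites.
jq -r '.. | objects | select(has("specs")) | .specs[]
       | .title as $t | .file as $f
       | .tests[].results[]
       | "\(.startTime) \(.duration)ms \(.status) \($f) :: \($t)"' \
  /tmp/playwright-shard-INDEX.json | sort
```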
2) CI / Workflow checks (where to inspect timeouts and cancellation causes)
- Inspect `.github/workflows/*.yml` for both top-level `timeout-minutes:` and job-level `jobs.<job>.timeout-minutes`.
```bash
grep -n "timeout-minutes" .github/workflows -R || true
```
- From the run/job JSON (API) check `status` and `conclusion` fields and `cancelled_by` / `cancelled_at` times:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID" | jq '.'
```
- Search job logs for runner messages indicating preemption, OOM, or cancellation:
```bash
grep -iE "Job canceled|cancelled|runner lost|Runner|Killed|OOM|oom_reaper|Timeout" -R job-$JOB_ID-logs || true
```
- Confirm whether the runner was `self-hosted` (job JSON `runner_name` / `runner_group_id`). If self-hosted, collect `journalctl` and docker host logs for the timestamp window.
3) Reproduction instructions (how to reproduce the shard locally exactly)
- Rebuild image used by CI (recommended to match CI):
```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```
- Start E2E environment (use the same compose used in CI):
```bash
docker compose -f containers/charon/docker-compose.yml up -d
```
- Environment variables to set (use the values CI uses):
- `PLAYWRIGHT_BASE_URL`: the base URL CI targets (e.g. `http://localhost:8080` for Docker mode; `http://localhost:5173` for Vite dev).
- `CHARON_EMERGENCY_TOKEN`: the emergency token used by tests.
- `PLAYWRIGHT_JOBS` or `PWDEBUG` as needed: `DEBUG=pw:api PWDEBUG=1`.
- Optional toggles used in CI: `PLAYWRIGHT_SKIP_SECURITY_DEPS=1`.
- Exact shard reproduction command (example matching CI):
```bash
export PLAYWRIGHT_BASE_URL=http://localhost:8080
export CHARON_EMERGENCY_TOKEN=changeme
DEBUG=pw:api PWDEBUG=1 \
npx playwright test --shard=INDEX/TOTAL --project=chromium \
--output=/tmp/playwright-shard-INDEX --reporter=json --trace=on > /tmp/playwright-shard-INDEX.log 2>&1
```
- To re-run a single failing test found in JSON:
```bash
npx playwright test tests/path/to/spec.ts -g "Exact test title" --project=chromium --trace=on --output=/tmp/playwright-single
```
4) Required artifacts & evidence to collect (exact list and commands)
- Per-shard Playwright outputs: `trace.zip`, `video/*`, `test-results.json` or `reporter json` and shard stdout/stderr log. Ensure `--output` points to shard-specific path and upload as artifact.
- Job-level artifacts: GitHub Actions run logs ZIP, job logs ZIP, `gh run download` output.
- Runner/host diagnostics (self-hosted): `journalctl -u actions.runner.*`, `dmesg | grep -i oom`, `sudo journalctl -u docker.service`, `docker ps -a`, `docker logs --since` for charon-e2e and caddy.
- Capture a timestamped mapping file that lists: job start, shard start, last test start, last trace timestamp, job end. Example CSV header: `job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion` (seed sketch after this list).
- Attach a minimal repro package: Docker image tag, docker-compose file, the exact Playwright command-line, and the failing test id/title.
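A seed sketch for that mapping file, using the jobs API fields; shard and per-test timestamps get filled in afterwards from the per-shard reporter JSON:
```bash
# Start the timeline CSV from job-level timestamps; per-shard columns are filled later.
echo "job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion" > shard-timeline.csv
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/jobs" \
  | jq -r '.jobs[] | "\(.id),\(.started_at),,,,\(.completed_at),\(.conclusion)"' \
  >> shard-timeline.csv
```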
5) Prioritization of fixes and quick mitigations (concrete)
- P0 (Immediate unblock):
- Temporarily increase `timeout-minutes` to 60 for failing workflow; add `if: always()` diagnostics step and artifact upload.
- Ensure each shard uses `--output` per-shard and is uploaded (`actions/upload-artifact`) so traces are available even on cancellation.
- Re-run failing shard locally with `DEBUG=pw:api PWDEBUG=1` and collect traces.
- P1 (Same-day):
- Add a CI smoke healthcheck step that validates the UI and emergency server before shards start (quick `curl` checks and a small Playwright smoke test; see the sketch after this list).
- If self-hosted runner, add simple resource guard (systemd service restart prevention) and OOM monitoring alert.
- Configure Playwright retries for flaky tests (small number) and mark expensive suites as `--workers=1`.
- P2 (Next sprint):
- Implement historical-duration-based shard splitting to avoid heavy concentration in one shard.
- Add test-level tagging and targeted prioritization for long-running security-enforcement suites.
- Add CI-level telemetry: test-duration history, flaky-test dashboard.
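A minimal pre-shard healthcheck sketch for the P1 item above, using the ports referenced elsewhere in this spec:
```bash
# Fail fast if the UI or the emergency server is not answering before shards start.
set -euo pipefail
curl -sSf --max-time 10 "${PLAYWRIGHT_BASE_URL:-http://localhost:8080}/" > /dev/null
curl -sSf --max-time 5 "http://127.0.0.1:2020/" > /dev/null   # emergency server
echo "environment healthy; starting shards"
```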
Verdict: NEEDS CHANGES — the existing spec is a solid base, but add the forensic commands, reproducible shard reproduction steps, explicit artifact list, and CI checks above before marking this plan approved.
Actionable next steps (short list):
- Add the `always()` diagnostics step to `.github/workflows/<e2e-workflow>.yml` and upload diagnostics as artifacts.
- Modify the E2E job to set `--output` to `e2e-shard-${{ matrix.index }}-output` and upload that path.
- Run `gh run download 21865692694` and extract the per-job logs; parse the job JSON to determine if the runner was self-hosted and collect host logs if so.
- Reproduce the failing shard locally using the exact commands above and attach `trace.zip` and JSON reporter output to the issue.
If you want, I can apply the small CI YAML snippets (diagnostics + upload) as a targeted patch or download the run artifacts now (requires `GITHUB_TOKEN`).