# E2E Playwright Shard Timeout Investigation — Current Spec
Last updated: 2026-02-10
## Goal
- Concise summary: investigate GitHub Actions run https://github.com/Wikid82/Charon/actions/runs/21865692694 where the E2E Playwright job reports Shard 3 stopping at ~30 minutes despite configured timeouts of ~40 minutes. Produce reproducible diagnostics, collect artifacts/logs, identify root cause hypotheses, and provide prioritized remediations and short-term unblock steps.
## Phases
- Discover: collect logs and artifacts.
- Analyze: review config and correlate shard → tests.
- Remediate: short-term and long-term fixes.
- Verify: reproduce and confirm the fix.
---
## 1) Discover — exact places to collect logs & artifacts
### GitHub Actions (run-level)
- Run page: https://github.com/Wikid82/Charon/actions/runs/21865692694
- Run logs (zip): GET https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs
- Programmatic commands:
```bash
export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export RUN_ID=21865692694
# Requires GITHUB_TOKEN set with repo access
curl -H "Accept: application/vnd.github+json" \
-H "Authorization: token $GITHUB_TOKEN" \
-L "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/logs" \
-o run-${RUN_ID}-logs.zip
unzip -d run-${RUN_ID}-logs run-${RUN_ID}-logs.zip
```
- Artifacts list (API):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/artifacts" | jq '.'
```
- gh CLI (interactive/script):
```bash
gh run view $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --log > run-$RUN_ID-summary.log
gh run download $RUN_ID --repo $GITHUB_OWNER/$GITHUB_REPO --dir artifacts-$RUN_ID
```
### GitHub Actions (job-level)
- List jobs for the run and find Playwright shard job(s):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```
- For JOB_ID identified as the shard job, download job logs:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID/logs" -o job-${JOB_ID}-logs.zip
unzip -d job-${JOB_ID}-logs job-${JOB_ID}-logs.zip
```
### Playwright test outputs used by this project
- Search and collect the following files in the repo root (or workflow-run directories):
- `playwright.config.ts`, `playwright.config.js`, `playwright.config.mjs`
- `package.json` scripts invoking Playwright (e.g., `test:e2e`, `e2e:ci`)
- `.github/workflows/*` steps that run Playwright
- Typical Playwright outputs to collect (per-shard):
- `<outputDir>/trace.zip`
- `<outputDir>/test-results.json` or `test-results/*`
- `<outputDir>/video/*`
- `<outputDir>/*.log` (stdout/stderr)
Observed local example (for context): the developer ran
`npx playwright test --project=chromium --output=/tmp/playwright-chromium-output --reporter=list > /tmp/playwright-chromium.log 2>&1` — look for similar invocations in workflows/scripts.
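To gather these outputs after a local or CI run, a minimal collection sketch (assumes GNU tar; `OUT` is a placeholder for the actual `--output` path):
```bash
# Bundle the typical per-shard outputs listed above into one archive for sharing.
# OUT is a placeholder; substitute the --output path used by the CI invocation.
OUT=/tmp/playwright-chromium-output
find "$OUT" \( -name 'trace.zip' -o -name 'test-results.json' \
  -o -name '*.webm' -o -name '*.log' \) -print0 \
  | tar --null -czf playwright-outputs.tgz -T -
```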
### Repository container logs (containers/)
- containers/charon:
- Files to check: `containers/charon/docker-compose.yml`, any `logs/` or `data/` directories under `containers/charon/`.
- Local commands (when reproducing):
```bash
docker compose -f containers/charon/docker-compose.yml logs --no-color --timestamps > containers-charon-logs.txt
docker logs --timestamps --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
```
- containers/caddy:
- Files: `containers/caddy/Caddyfile`, `containers/caddy/config/`, `containers/caddy/logs/`
- Local checks:
```bash
docker logs --timestamps caddy > caddy.log 2>&1 || true
curl -sS http://127.0.0.1:2019/ || true # admin
curl -sS http://127.0.0.1:2020/ || true # emergency
```
---
## 2) Analyze — specific files and config to review (exact paths)
- Workflows (search these paths):
- `.github/workflows/*.yml` — likely candidates: `.github/workflows/e2e.yml`, `.github/workflows/ci.yml`, `.github/workflows/playwright.yml` (run `grep -R "playwright" .github/workflows || true`).
- Look for `timeout-minutes:` either at the top level of the workflow or under `jobs.<job>.timeout-minutes` (see the grep sketch at the end of this section).
- Playwright config files:
- `/projects/Charon/playwright.config.ts`
- `/projects/Charon/playwright.config.js`
- `/projects/Charon/playwright.config.mjs`
- Inspect `projects`, `workers`, `retries`, `outputDir`, `reporter` sections.
- package.json and scripts:
- `/projects/Charon/package.json` — inspect `scripts` for e.g. `test:e2e`, `e2e:ci` and the exact Playwright CLI flags used by CI.
- GitHub skill scripts & E2E runner:
- `.github/skills/scripts/skill-runner.sh` — used in `docs` and testing instructions; check for `docker-rebuild-e2e`, `test-e2e-playwright-coverage`.
- Commands:
```bash
sed -n '1,240p' .github/skills/scripts/skill-runner.sh
grep -n "docker-rebuild-e2e\|test-e2e-playwright-coverage\|playwright" -n .github/skills || true
```
- Makefile:
- `/projects/Charon/Makefile` — search for targets related to `e2e`, `playwright`, `rebuild`.
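A combined sweep for the timeout settings referenced in this section; a sketch run from the repo root:
```bash
# Workflow- and job-level timeouts (minutes).
grep -Rn "timeout-minutes" .github/workflows || true
# Playwright-level timeouts: per-test `timeout` and suite-wide `globalTimeout` (both in ms).
grep -n "globalTimeout\|timeout:" playwright.config.* 2>/dev/null || true
# Playwright CLI flags baked into npm scripts or Make targets.
grep -n "playwright" package.json Makefile 2>/dev/null || true
```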
---
## 3) Steps to download GitHub Actions logs & artifacts for run 21865692694
### Programmatic (API)
1. List artifacts for run:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/artifacts" | jq '.'
```
2. Download run logs (zip):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/logs" -o run-21865692694-logs.zip
unzip -d run-21865692694-logs run-21865692694-logs.zip
```
3. List jobs to find Playwright shard job id(s):
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
```
4. Download job logs by JOB_ID:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID-logs.zip
unzip -d job-$JOB_ID-logs job-$JOB_ID-logs.zip
```
### Using gh CLI
```bash
gh run view 21865692694 --repo Wikid82/Charon --log > run-21865692694-summary.log
gh run download 21865692694 --repo Wikid82/Charon --dir artifacts-21865692694
```
### Manual web UI
- Visit run page and download artifacts and job logs from the job view.
---
## 4) How to locate shard-specific logs and correlate shard indices to tests
- Typical patterns to inspect:
- Look for Playwright CLI flags in the job step (e.g., `--shard=INDEX/TOTAL`, `--output=/tmp/...`).
- If the job ran `npx playwright test --output=/tmp/...`, search the downloaded job logs for that exact command to find the shard index (see the grep sketch after this list).
- Commands to list tests assigned to a shard (dry-run):
```bash
# Show which tests a given shard would run (no execution)
npx playwright test --list --shard=INDEX/TOTAL
# Or run with reporter=list (shows test items as executed)
npx playwright test --shard=INDEX/TOTAL --reporter=list
```
- Note: Playwright shard indices are one-based: `--shard=1/4` through `--shard=4/4`. If CI logs show `--shard=3/4`, that is the third of four shards; confirm which tests it covers by re-running the `--list` command.
Expected per-shard artifact names (if implemented):
- `e2e-shard-<INDEX>-output` containing `trace.zip`, `video/*`, `test-results.json`, and shard-specific logs (stdout/stderr files).
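To pull the exact shard invocation out of the downloaded job logs (as referenced above), a sketch assuming the unzip layout from section 3:
```bash
# Locate the playwright command line and its shard/output flags in the job logs.
grep -RniE "playwright test.*--shard=[0-9]+/[0-9]+" job-${JOB_ID}-logs || true
# Extract just the shard values for quick correlation with artifact names.
grep -RhoE -e '--shard=[0-9]+/[0-9]+' job-${JOB_ID}-logs | sort -u
```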
---
## 5) Runner/container logs to inspect
- GitHub-hosted runner: review the Actions job logs for runner messages and any `Runner` diagnostic lines. You cannot access host-level logs.
- Self-hosted runner (if used): retrieve host system logs (requires access to runner host):
```bash
sudo journalctl -u actions.runner.* -n 1000 > runner-service-journal.log
sudo journalctl -k --since "1 hour ago" | grep -i oom > runner-kernel-oom.log || true
sudo journalctl -u docker.service -n 200 > docker-journal.log
```
- Docker container logs (charon, caddy, charon-e2e):
```bash
docker ps -a --filter "name=charon" --format "{{.Names}} {{.Status}}" > containers-ps.txt
docker logs --since "1h" charon-e2e > charon-e2e.log 2>&1 || true
docker logs --since "1h" caddy > caddy.log 2>&1 || true
```
Check Caddy admin/emergency ports (2019 & 2020) to confirm the proxy was healthy during the test run:
```bash
curl -sS --max-time 5 http://127.0.0.1:2019/ || echo "admin not responding"
curl -sS --max-time 5 http://127.0.0.1:2020/ || echo "emergency not responding"
```
---
## 6) Hypotheses for why Shard 3 stopped at ~30m (descriptions + exact artifacts to search)
H1 — Workflow/job timeout configured smaller than expected
- Search:
- `.github/workflows/*` for `timeout-minutes:`
- job logs for `Timeout` or `Job execution time exceeded`
- Commands:
```bash
grep -n "timeout-minutes" .github/workflows -R || true
grep -i "timeout" -R run-${RUN_ID}-logs || true
```
- Confirmed by: `timeout-minutes: 30` or job logs showing `aborting execution due to timeout`.
H2 — Runner preemption / connection loss
- Search job logs for: `Runner lost`, `The runner has been shutdown`, `Connection to the server was lost`.
- Commands:
```bash
grep -iE "runner lost|runner.*shutdown|connection.*lost|Job canceled|cancelled by" -R run-${RUN_ID}-logs || true
```
- Confirmed by: runner disconnect lines and abrupt end of logs with no Playwright stack trace.
H3 — E2E environment container (charon/caddy) died or became unhealthy
- Search container logs for crash/fatal/panic messages and timestamps matching the job stop time.
- Commands:
```bash
docker ps -a --filter "name=charon" --format '{{.Names}} {{.Status}}'
docker logs charon-e2e --since "2h" | sed -n '1,200p'
grep -iE "panic|fatal|segfault|exited|health.*unhealthy|503|502" containers -R || true
```
- Confirmed by: container exit matching job finish time and Caddy returning 502/503 during run.
H4 — Playwright/Node process killed by OOM
- Search for `Killed`, kernel `oom_reaper` lines, system `dmesg` outputs.
- Commands:
```bash
grep -R "Killed" job-${JOB_ID}-logs || true
# on self-hosted runner host
sudo journalctl -k --since '2 hours ago' | grep -i oom || true
```
- Confirmed by: kernel OOM logs at same timestamp or `Killed` in job logs.
H5 — Script-level early timeout (explicit `timeout 30m` or `kill`)
- Search `.github/skills` and workflow steps for `timeout 30m`, `timeout 1800`, or `kill` calls.
- Commands:
```bash
grep -R "\btimeout\b\|kill -9\|kill -15\|pkill" -n .github || true
```
- Confirmed by: a script with `timeout 30m` or similar wrapper used in the job.
H6 — Misinterpreted units or mis-configuration (seconds vs minutes)
- Search for numeric values used in scripts and steps (e.g., `1800` used where minutes expected).
- Commands:
```bash
grep -R "\b1800\b\|\b3600\b\|timeout-minutes" -n .github || true
```
- Confirmed by: a value of `1800` where `timeout-minutes` or similar was expected to be minutes.
For each hypothesis, the exact lines/entries returned by the grep/journal/docker commands are the evidence to confirm or refute it. Keep timestamps to correlate with the job start/completion times in the run logs.
---
## 7) Prioritized remediation plan (short-term → long-term)
### Short-term (unblock re-runs quickly)
1. Download and attach all logs/artifacts for run 21865692694 (use `gh run download`) and share with E2E test author.
2. Temporarily bump `timeout-minutes` for the failing workflow to 60 to allow full runs while diagnosing.
3. Add an `if: always()` step to the E2E job that collects diagnostics and uploads them as artifacts (free memory, `dmesg`, `ps aux`, `docker ps -a`, `docker logs charon-e2e`).
4. Re-run just the failing shard with added `DEBUG=pw:api` and `PWDEBUG=1` and persist shard outputs.
### Medium-term
1. Persist per-shard Playwright outputs via `actions/upload-artifact@v4` for traces/videos/test-results.
2. Add Playwright `retries` for transient failures and `--trace`/`--video` options.
3. Add a CI smoke check before full shard execution to confirm env health.
4. If self-hosted, add runner health checks and alerting (memory, disk, Docker status).
### Long-term
1. Implement stable test splitting based on historical test durations rather than equal-file sharding.
2. Introduce resource constraints and monitoring to protect against OOM and flapping containers.
3. Build a golden-minimal E2E smoke job that must pass before running full shards.
---
## 8) Minimal reproduction checklist (local)
1. Rebuild E2E image used by CI (per repo skill):
```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```
2. Start the environment (example):
```bash
docker compose -f containers/charon/docker-compose.yml up -d
```
3. Set base URL and run the same shard (replace INDEX/TOTAL with values from CI):
```bash
export PLAYWRIGHT_BASE_URL=http://localhost:5173
DEBUG=pw:api PWDEBUG=1 \
npx playwright test --shard=INDEX/TOTAL --project=chromium \
--output=/tmp/playwright-shard-INDEX --reporter=list > /tmp/playwright-shard-INDEX.log 2>&1
```
4. If reproducing a timeout, immediately collect:
```bash
docker ps -a --format '{{.Names}} {{.Status}}' > reproduce-docker-ps.txt
docker logs --since '1h' charon-e2e > reproduce-charon-e2e.log || true
tail -n 500 /tmp/playwright-shard-INDEX.log > reproduce-pw-tail.log
```
---
## 9) Required workflow/scripts changes to improve diagnostics & prevent recurrence
- Add `timeout-minutes: 60` to `.github/workflows/<e2e workflow>.yml` while diagnosing; later set to a reasoned SLA (e.g., 50m).
- Add an `always()` step to collect diagnostics on failure and upload artifacts. Example YAML snippet:
```yaml
- name: Collect diagnostics
if: always()
run: |
uptime > uptime.txt
free -m > free-m.txt
df -h > df-h.txt
ps aux > ps-aux.txt
docker ps -a > docker-ps.txt || true
docker logs --tail 500 charon-e2e > docker-charon-e2e.log || true
- uses: actions/upload-artifact@v4
with:
name: e2e-diagnostics-${{ github.run_id }}
path: |
uptime.txt
free-m.txt
df-h.txt
ps-aux.txt
docker-ps.txt
docker-charon-e2e.log
```
- Ensure each Playwright shard runs with `--output` pointing to a shard-specific path and upload that path as artifact:
- artifact name convention: `e2e-shard-${{ matrix.index }}-output`.
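A sketch of the shell body for such a step; `SHARD_INDEX`/`SHARD_TOTAL` are hypothetical names for values supplied by the job matrix:
```bash
# Run one shard with a shard-specific output directory the upload step can target.
SHARD_INDEX="${SHARD_INDEX:?set from the job matrix, e.g. 3}"
SHARD_TOTAL="${SHARD_TOTAL:?set from the job matrix, e.g. 4}"
npx playwright test \
  --shard="${SHARD_INDEX}/${SHARD_TOTAL}" \
  --output="e2e-shard-${SHARD_INDEX}-output" \
  --trace=on --reporter=list
```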
---
## 10) People/roles to notify & recommended next actions
- Notify:
- CI/Infra owner or person in `CODEOWNERS` for `.github/workflows`
- E2E test author(s) (owners of failing tests)
- Self-hosted runner owner (if runner_name in job JSON indicates self-hosted)
- Recommended immediate actions for them:
1. Download run artifacts and job logs for run 21865692694 and share them with the test author.
2. Re-run the shard with `DEBUG=pw:api` and `PWDEBUG=1` enabled and ensure per-shard artifacts are uploaded.
3. If self-hosted, check runner host kernel logs for OOM and Docker container exits at the job time.
---
## 11) Verification steps (post-remediation)
1. Re-run E2E workflow end-to-end; verify Shard 3 completes.
2. Confirm artifacts `e2e-shard-3-output` exist and contain `trace.zip`, `video/*`, and `test-results.json`.
3. Confirm no `oom_reaper` or `Killed` messages in runner host logs during the run.
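A quick check for step 3, assuming self-hosted host access; substitute the job window from the run JSON:
```bash
# Scan kernel logs for OOM activity inside the job's time window (placeholders below).
JOB_START="2026-02-10 21:30:00"   # placeholder: job started_at from the API
JOB_END="2026-02-10 22:10:00"     # placeholder: job completed_at from the API
sudo journalctl -k --since "$JOB_START" --until "$JOB_END" \
  | grep -iE "oom_reaper|out of memory|killed process" || echo "no OOM events in window"
```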
---
## Appendix — quick extraction commands summary
```bash
# Download all artifacts and logs for RUN_ID
gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694
# List jobs and find Playwright shard job(s)
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/Wikid82/Charon/actions/runs/21865692694/jobs" | jq '.jobs[] | {id: .id, name: .name, runner_name: .runner_name, started_at: .started_at, completed_at: .completed_at}'
# Download job logs for JOB_ID
curl -H "Authorization: token $GITHUB_TOKEN" -L \
"https://api.github.com/repos/Wikid82/Charon/actions/jobs/$JOB_ID/logs" -o job-$JOB_ID-logs.zip
unzip -d job-$JOB_ID-logs job-$JOB_ID-logs.zip
# Grep for likely causes
grep -iE "timeout|minut|runner lost|cancelled|Killed|OOM|oom_reaper|Out of memory|panic|fatal" -R run-21865692694-logs || true
```
---
## Next three immediate actions (checklist)
1. Run `gh run download 21865692694 --repo Wikid82/Charon --dir ./artifacts-21865692694` and unzip the run logs.
2. Search the downloaded logs for `timeout-minutes`, `Runner lost`, `Killed`, and `oom_reaper` to triage H1-H4.
3. Re-run the failing shard locally with `DEBUG=pw:api PWDEBUG=1` and `--output=/tmp/playwright-shard-INDEX`, capture outputs, and upload them as artifacts.
---
If you want, I can now (A) download the run artifacts & logs for run 21865692694 using gh/API (requires your GITHUB_TOKEN) and list the job IDs, or (B) open the workflow files in `.github/workflows` and search for `timeout-minutes` and Playwright invocations. Which would you like me to do first?
---
Confidence: 79 percent
Rationale: The suite inventory and dependencies are well understood. The main
unknowns are timing-sensitive security propagation and emergency server
availability in varied environments.
## Review Feedback & Required Additions
Summary: the spec is thorough and well-structured but is missing several concrete
forensic and reproduction details needed to reliably diagnose shard timeouts
and to make CI-side fixes repeatable. The items below add those missing
artifacts, commands, and prioritized mitigations.
1) Test-forensics (how to analyze Playwright traces & map failing tests to shards)
- Extract and open traces per-shard: unzip the artifact and run:
```bash
unzip e2e-shard-<INDEX>-output/trace.zip -d /tmp/trace-INDEX
npx playwright show-trace /tmp/trace-INDEX
```
- Use JSON reporter to map test IDs to trace files and timestamps:
```bash
# run locally to produce a reporter JSON for the shard
npx playwright test --shard=INDEX/TOTAL --project=chromium --reporter=json --output=/tmp/playwright-shard-INDEX --trace=on > /tmp/playwright-shard-INDEX.json
jq '.suites[].specs[]? | {title: .title, file: .file, line: .line, durations: [.tests[].results[].duration], annotations: [.tests[].annotations[]]}' /tmp/playwright-shard-INDEX.json
```
- Correlate test start/stop timestamps (from the reporter JSON) with job logs and container logs to find the precise point where execution stopped (see the jq sketch after this list).
- If only one test is hanging, re-run just that test (via `--grep` or by passing the spec file path) with `--trace=on` and `DEBUG=pw:api` set, and capture the trace and stdout.
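For the timestamp correlation above, a jq sketch that assumes the Playwright JSON report schema (specs carry `title`/`file`, results carry `startTime`/`duration`):
```bash
# Emit "startTime duration status file :: title" per test result, sorted by start time;
# recursive descent (..) also walks nested describe() suites.
jq -r '.. | objects | select(has("specs")) | .specs[]
       | .title as $t | .file as $f
       | .tests[].results[]
       | "\(.startTime) \(.duration)ms \(.status) \($f) :: \($t)"' \
  /tmp/playwright-shard-INDEX.json | sort
```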
2) CI / Workflow checks (where to inspect timeouts and cancellation causes)
- Inspect `.github/workflows/*.yml` for both top-level `timeout-minutes:` and job-level `jobs.<job>.timeout-minutes`.
```bash
grep -n "timeout-minutes" .github/workflows -R || true
```
- From the run/job JSON (API) check `status` and `conclusion` fields and `cancelled_by` / `cancelled_at` times:
```bash
curl -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID" | jq '.'
```
- Search job logs for runner messages indicating preemption, OOM, or cancellation:
```bash
grep -iE "Job canceled|cancelled|runner lost|Runner|Killed|OOM|oom_reaper|Timeout" -R job-$JOB_ID-logs || true
```
- Confirm whether the runner was `self-hosted` (job JSON `runner_name` / `runner_group_id`). If self-hosted, collect `journalctl` and docker host logs for the timestamp window.
3) Reproduction instructions (how to reproduce the shard locally exactly)
- Rebuild image used by CI (recommended to match CI):
```bash
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e
```
- Start E2E environment (use the same compose used in CI):
```bash
docker compose -f containers/charon/docker-compose.yml up -d
```
- Environment variables to set (use the values CI uses):
- `PLAYWRIGHT_BASE_URL`: the base URL CI targets (e.g. `http://localhost:8080` for Docker mode; `http://localhost:5173` for Vite dev).
- `CHARON_EMERGENCY_TOKEN`: the emergency token used by tests.
- `PLAYWRIGHT_JOBS` or `PWDEBUG` as needed: `DEBUG=pw:api PWDEBUG=1`.
- Optional toggles used in CI: `PLAYWRIGHT_SKIP_SECURITY_DEPS=1`.
- Exact shard reproduction command (example matching CI):
```bash
export PLAYWRIGHT_BASE_URL=http://localhost:8080
export CHARON_EMERGENCY_TOKEN=changeme
DEBUG=pw:api PWDEBUG=1 \
npx playwright test --shard=INDEX/TOTAL --project=chromium \
--output=/tmp/playwright-shard-INDEX --reporter=json --trace=on > /tmp/playwright-shard-INDEX.log 2>&1
```
- To re-run a single failing test found in JSON:
```bash
npx playwright test tests/path/to/spec.ts -g "Exact test title" --project=chromium --trace=on --output=/tmp/playwright-single
```
4) Required artifacts & evidence to collect (exact list and commands)
- Per-shard Playwright outputs: `trace.zip`, `video/*`, `test-results.json` or `reporter json` and shard stdout/stderr log. Ensure `--output` points to shard-specific path and upload as artifact.
- Job-level artifacts: GitHub Actions run logs ZIP, job logs ZIP, `gh run download` output.
- Runner/host diagnostics (self-hosted): `journalctl -u actions.runner.*`, `dmesg | grep -i oom`, `sudo journalctl -u docker.service`, `docker ps -a`, `docker logs --since` for charon-e2e and caddy.
- Capture a timestamped mapping file that lists: job start, shard start, last test start, last trace timestamp, job end. Example CSV header: `job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion` (seed sketch after this list).
- Attach a minimal repro package: Docker image tag, docker-compose file, the exact Playwright command-line, and the failing test id/title.
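A seed sketch for that mapping file, using the jobs API fields; shard and per-test timestamps get filled in afterwards from the per-shard reporter JSON:
```bash
# Start the timeline CSV from job-level timestamps; per-shard columns are filled later.
echo "job_id,job_start,shard_index,shard_start,last_test_started_at,job_end,conclusion" > shard-timeline.csv
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/jobs" \
  | jq -r '.jobs[] | "\(.id),\(.started_at),,,,\(.completed_at),\(.conclusion)"' \
  >> shard-timeline.csv
```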
5) Prioritization of fixes and quick mitigations (concrete)
- P0 (Immediate unblock):
- Temporarily increase `timeout-minutes` to 60 for failing workflow; add `if: always()` diagnostics step and artifact upload.
- Ensure each shard uses `--output` per-shard and is uploaded (`actions/upload-artifact`) so traces are available even on cancellation.
- Re-run failing shard locally with `DEBUG=pw:api PWDEBUG=1` and collect traces.
- P1 (Same-day):
- Add a CI smoke healthcheck step that validates the UI and emergency server before shards start (quick `curl` checks and a small Playwright smoke test; see the sketch after this list).
- If self-hosted runner, add simple resource guard (systemd service restart prevention) and OOM monitoring alert.
- Configure Playwright retries for flaky tests (small number) and mark expensive suites as `--workers=1`.
- P2 (Next sprint):
- Implement historical-duration-based shard splitting to avoid heavy concentration in one shard.
- Add test-level tagging and targeted prioritization for long-running security-enforcement suites.
- Add CI-level telemetry: test-duration history, flaky-test dashboard.
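A minimal pre-shard healthcheck sketch for the P1 item above, using the ports referenced elsewhere in this spec:
```bash
# Fail fast if the UI or the emergency server is not answering before shards start.
set -euo pipefail
curl -sSf --max-time 10 "${PLAYWRIGHT_BASE_URL:-http://localhost:8080}/" > /dev/null
curl -sSf --max-time 5 "http://127.0.0.1:2020/" > /dev/null   # emergency server
echo "environment healthy; starting shards"
```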
Verdict: NEEDS CHANGES — the existing spec is a solid base, but add the forensic commands, reproducible shard reproduction steps, explicit artifact list, and CI checks above before marking this plan approved.
Actionable next steps (short list):
- Add the `always()` diagnostics step to `.github/workflows/<e2e-workflow>.yml` and upload diagnostics as artifacts.
- Modify the E2E job to set `--output` to `e2e-shard-${{ matrix.index }}-output` and upload that path.
- Run `gh run download 21865692694` and extract the per-job logs; parse the job JSON to determine if the runner was self-hosted and collect host logs if so.
- Reproduce the failing shard locally using the exact commands above and attach `trace.zip` and JSON reporter output to the issue.
If you want, I can apply the small CI YAML snippets (diagnostics + upload) as a targeted patch or download the run artifacts now (requires `GITHUB_TOKEN`).