Charon/docs/reports/ci_workflow_analysis.md
2026-02-10 22:01:45 +00:00

CI Workflow Analysis - E2E Timeout Investigation

Scope

Reviewed CI workflow configuration and the provided E2E job logs to identify timeout and shard-related risks, per sections 2, 3, 7, and 9 of the current spec.

CI Evidence Collection (Spec Sections 2, 3, 7, 9)

The following commands capture the exact evidence sources used for this investigation.

Run Artifacts Download (gh)

gh run download 21865692694 --repo Wikid82/Charon --dir artifacts-21865692694

Job Logs API Call (curl)

export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export JOB_ID=<JOB_ID>
curl -H "Accept: application/vnd.github+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  -L "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID/logs" \
  -o job-$JOB_ID-logs.txt
# The job logs endpoint redirects to a plain-text log file, so no unzip step is needed.

Artifact List API Call (curl)

export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export RUN_ID=21865692694
curl -H "Accept: application/vnd.github+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/runs/$RUN_ID/artifacts" | jq '.'

Job JSON Inspection (Cancellation Evidence)

export GITHUB_OWNER=Wikid82
export GITHUB_REPO=Charon
export JOB_ID=<JOB_ID>
curl -H "Accept: application/vnd.github+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/$GITHUB_OWNER/$GITHUB_REPO/actions/jobs/$JOB_ID" | jq '.'

E2E Playwright Invocation and Shard Strategy

Reproduction Command Coverage (Spec Sections 3, 8)

The steps below mirror the CI flow, using the same compose file, environment variables, and Playwright CLI flags.

Image Rebuild Steps (CI Parity)

# CI build job produces a local image and saves it as a tar.
# To match CI locally, rebuild the E2E image using the project skill:
.github/skills/scripts/skill-runner.sh docker-rebuild-e2e

Environment Start Commands (CI Compose)

# CI uses the Playwright CI compose file.
docker compose -f .docker/compose/docker-compose.playwright-ci.yml up -d

# Single health probe; CI repeats this check in a wait loop until it succeeds.
curl -sf http://127.0.0.1:8080/api/v1/health > /dev/null 2>&1
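
The single probe above can be wrapped in a retry loop to approximate the CI wait behavior locally. This is a minimal sketch, not the workflow's actual loop: the function name `wait_for_health` and the default attempt count and delay are assumptions chosen for illustration.

```shell
# Poll a health endpoint until it responds or the attempt budget runs out.
# The defaults (30 attempts, 2 seconds apart) are illustrative, not CI's values.
wait_for_health() {
  url="$1"
  max_attempts="${2:-30}"
  delay_seconds="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Same probe as the reproduction step: -s silences output, -f fails on HTTP errors.
    if curl -sf "$url" > /dev/null 2>&1; then
      echo "healthy after $attempt attempt(s)"
      return 0
    fi
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
  echo "not healthy after $max_attempts attempts" >&2
  return 1
}
```

Usage would be `wait_for_health http://127.0.0.1:8080/api/v1/health` after `docker compose ... up -d`. A non-zero exit signals that the environment never became ready.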

Exact Playwright CLI Invocation (Non-Security Shards)

export PLAYWRIGHT_BASE_URL=http://127.0.0.1:8080
export CI=true
export TEST_WORKER_INDEX=<SHARD_INDEX>
export CHARON_EMERGENCY_TOKEN=<SECRET>
export CHARON_EMERGENCY_SERVER_ENABLED=true
export CHARON_SECURITY_TESTS_ENABLED=false
export CHARON_E2E_IMAGE_TAG=<IMAGE_TAG>

npx playwright test \
  --project=chromium \
  --shard=<SHARD_INDEX>/<TOTAL_SHARDS> \
  --output=playwright-output/chromium-shard-<SHARD_INDEX> \
  tests/core \
  tests/dns-provider-crud.spec.ts \
  tests/dns-provider-types.spec.ts \
  tests/integration \
  tests/manual-dns-provider.spec.ts \
  tests/monitoring \
  tests/settings \
  tests/tasks
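
To reproduce more than one shard, the invocation above can be generated per shard index. The sketch below only prints the commands (a dry run) so they can be reviewed before execution. The function name `print_shard_commands` is hypothetical, and the suite paths are passed in as arguments rather than hard-coded.

```shell
# Print one Playwright command per shard, mirroring the flags used above.
# Usage: print_shard_commands <total_shards> <suite paths...>
print_shard_commands() {
  total_shards="$1"
  shift
  shard=1
  while [ "$shard" -le "$total_shards" ]; do
    # TEST_WORKER_INDEX follows the shard index, as in the reproduction env block.
    echo "TEST_WORKER_INDEX=${shard} npx playwright test --project=chromium --shard=${shard}/${total_shards} --output=playwright-output/chromium-shard-${shard} $*"
    shard=$((shard + 1))
  done
}
```

For example, `print_shard_commands 4 tests/core tests/settings` prints four commands, one per shard, each with its own `--output` directory.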

Post-Failure Diagnostic Collection (CI Always-Run)

mkdir -p diagnostics
uptime > diagnostics/uptime.txt
free -m > diagnostics/free-m.txt
df -h > diagnostics/df-h.txt
ps aux > diagnostics/ps-aux.txt
docker ps -a > diagnostics/docker-ps.txt || true
docker logs --tail 500 charon-e2e > diagnostics/docker-charon-e2e.log 2>&1 || true
docker compose -f .docker/compose/docker-compose.playwright-ci.yml logs > docker-logs-shard.txt 2>&1

Emergency Server Port (2020) Configuration

  • No explicit references to port 2020 were found in workflow YAMLs. The E2E workflow sets CHARON_EMERGENCY_SERVER_ENABLED=true but does not validate port 2020 availability.
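
A port availability check could close this gap. The sketch below assumes the emergency server listens on 127.0.0.1:2020 and that the port is published to the host. Neither is confirmed by the workflow YAMLs. The function names are hypothetical, and the `/dev/tcp` redirection is a bash feature, so the check requires bash.

```shell
# Return 0 if a TCP connection to host:port succeeds (bash-only: uses /dev/tcp).
port_is_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Print a one-line status for host:port, suitable for the diagnostics bundle.
report_port() {
  if port_is_open "$1" "$2"; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
```

Under these assumptions, `report_port 127.0.0.1 2020 > diagnostics/emergency-port.txt` would record whether the emergency server was reachable at the time diagnostics were collected.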

Job Log Evidence (Shard 3)

  • No runner cancellation, runner lost, or OOM strings were present in the reviewed job log text.
  • The job log shows Playwright test-level timeouts (10s and 60s expectations), not a job-level timeout.
  • The job log shows the shard command executed with --shard=3/4 and the standard suite list, indicating the job did run sharded Playwright as expected.

Excerpt:

2026-02-10T12:58:19.5379132Z npx playwright test \
2026-02-10T12:58:19.5379658Z   --shard=3/4 \
2026-02-10T13:06:49.1304667Z     Test timeout of 60000ms exceeded.
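
The checks behind these findings can be repeated against a downloaded job log with two grep counts. This is a sketch: the runner-level pattern list is an assumption about which strings indicate cancellation, a lost runner, or OOM, and should be adjusted to the messages your runner emits. The timeout pattern matches the message shown in the excerpt.

```shell
# Count lines that hint at runner-level termination rather than test failures.
# The pattern list is an assumption, not an exhaustive set of runner messages.
count_runner_signals() {
  grep -Eic 'operation was canceled|lost communication|shutdown signal|out of memory|oom-kill' "$1" || true
}

# Count Playwright test-level timeout lines, as seen in the excerpt above.
count_test_timeouts() {
  grep -Ec 'Test timeout of [0-9]+ms exceeded' "$1" || true
}
```

A result of zero from `count_runner_signals` combined with a non-zero `count_test_timeouts` is consistent with test-level timeouts rather than a job-level cancellation.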

Proposed Workflow YAML Changes (Section 9)

The following changes were applied to the E2E workflow to align with the spec:

# Timeout increase (temporary)
  e2e-chromium:
    timeout-minutes: 60

# Per-shard output + artifact upload
      - name: Run Chromium Non-Security Tests (Shard ${{ matrix.shard }}/${{ matrix.total-shards }})
        run: |
          npx playwright test \
            --project=chromium \
            --shard=${{ matrix.shard }}/${{ matrix.total-shards }} \
            --output=playwright-output/chromium-shard-${{ matrix.shard }} \
            ...

      - name: Upload Playwright output (Chromium shard ${{ matrix.shard }})
        if: always()
        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
        with:
          name: playwright-output-chromium-shard-${{ matrix.shard }}
          path: playwright-output/chromium-shard-${{ matrix.shard }}/

# Diagnostics (always)
      - name: Collect diagnostics
        if: always()
        run: |
          mkdir -p diagnostics
          uptime > diagnostics/uptime.txt
          free -m > diagnostics/free-m.txt
          df -h > diagnostics/df-h.txt
          ps aux > diagnostics/ps-aux.txt
          docker ps -a > diagnostics/docker-ps.txt || true
          docker logs --tail 500 charon-e2e > diagnostics/docker-charon-e2e.log 2>&1 || true

      - name: Upload diagnostics
        if: always()
        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
        with:
          name: e2e-diagnostics-chromium-shard-${{ matrix.shard }}
          path: diagnostics/

Quick Mitigation Checklist (P0)

  • Increase E2E job timeouts to 60 minutes in the E2E workflow to reduce the risk of premature job cancellation.
  • Collect diagnostics on every shard with if: always() and upload artifacts.
  • Enforce per-shard --output paths and upload them as artifacts so traces and JSON are preserved even on failure.
  • Re-run the failing shard locally with the exact shard flags and diagnostics enabled to capture a trace.

CI Remediation Priority Labels (Spec Section 5)

P0 (Immediate - already applied)

  • Timeout increase to 60 minutes for E2E shard jobs.
  • Always-run diagnostics collection and artifact upload.

P1 (Same-day)

  • Add a lightweight CI smoke check step before shard execution (health check + minimal Playwright smoke).
  • Add basic resource monitoring output (CPU/memory/disk) to the diagnostics bundle.

P2 (Next sprint)

  • Implement shard balancing based on historical test durations.
  • Stand up a test-duration/flake telemetry dashboard for CI trends.
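
One possible approach to the shard balancing item is a greedy longest-first assignment: sort suites by historical duration and give each one to the shard with the smallest running total. The sketch below is an illustration only. The function name `balance_shards` and the input format (`<seconds> <suite path>` per line) are assumptions, and any durations fed to it would need to come from real CI history.

```shell
# Greedy longest-first assignment.
# Reads "<seconds> <suite>" lines on stdin and prints "<shard> <suite>" lines.
# Usage: balance_shards <total_shards> < durations.txt
balance_shards() {
  total_shards="$1"
  # Longest suites first, then assign each to the currently lightest shard.
  sort -rn | awk -v n="$total_shards" '
    {
      best = 1
      for (i = 2; i <= n; i++) {
        if (load[i] + 0 < load[best] + 0) best = i
      }
      load[best] += $1
      print best, $2
    }'
}
```

Playwright's `--shard` flag divides tests on its own, so applying a custom assignment like this would likely mean passing each shard an explicit list of suite paths instead of relying on `--shard`.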

Explicit Confirmation Checklist