247 lines
11 KiB
Markdown
247 lines
11 KiB
Markdown
---
|
|
title: "CI Workflow Reliability Fixes"
|
|
status: "docs_complete"
|
|
scope: "ci/workflows"
|
|
notes: Finalize E2E split workflow execution and aggregation logic, prevent security tests from leaking into non-security shards, align Docker build and Codecov defaults, and ensure workflows run only on pull_request or workflow_dispatch.
|
|
---
|
|
|
|
## 1. Introduction
|
|
|
|
This plan finalizes CI workflow specifications based on Supervisor feedback and deeper analysis. The scope covers three workflows:
|
|
|
|
- E2E split workflow reliability and accurate results aggregation.
|
|
- Docker build workflow trigger logic for PRs and manual dispatches.
|
|
- Codecov upload workflow default behavior on non-dispatch events.
|
|
|
|
Objectives:
|
|
|
|
- Ensure Playwright jobs fail deterministically when tests fail and do not get masked by aggregation.
|
|
- Ensure results aggregation reflects expected runs based on `inputs.browser` and `inputs.test_category`.
|
|
- Ensure non-security jobs do not run security test suites.
|
|
- Preserve intended Docker build scan/test job triggers for pull_request and workflow_dispatch only.
|
|
- Preserve Codecov default behavior on non-dispatch events.
|
|
- Enforce policy that no push events trigger CI run jobs.
|
|
|
|
## 2. Research Findings
|
|
|
|
### 2.1 E2E Split Workflow Execution and Aggregation
|
|
|
|
[.github/workflows/e2e-tests-split.yml](../../.github/workflows/e2e-tests-split.yml) currently:
|
|
|
|
- Executes Playwright in multiple jobs without a consistent exit-code capture pattern.
|
|
- Aggregates results in `e2e-results` by converting any `skipped` job into `success` regardless of whether it should have run.
|
|
|
|
This can hide failures and unexpected skips when inputs or secrets are misconfigured.
|
|
|
|
### 2.2 Security Test Leakage into Non-Security Shards
|
|
|
|
Non-security jobs call Playwright with explicit paths including `tests/integration` and other folders. Security-only jobs explicitly run:
|
|
|
|
- `tests/security-enforcement/`
|
|
- `tests/security/`
|
|
|
|
If `tests/integration` contains security-sensitive tests, those could execute in non-security shards. This violates the isolation goal of the split workflow.
|
|
|
|
### 2.3 Docker Build Workflow Trigger Logic
|
|
|
|
[.github/workflows/docker-build.yml](../../.github/workflows/docker-build.yml) is intended to:
|
|
|
|
- Run `scan-pr-image` for pull requests.
|
|
- Run `test-image` for pull_request and workflow_dispatch.
|
|
|
|
The workflow also currently triggers on `push`. The user requirement is that no push events trigger CI run jobs, so all push triggers must be removed and job logic updated accordingly.
|
|
|
|
### 2.4 Codecov Default Behavior on Non-Dispatch Events
|
|
|
|
[.github/workflows/codecov-upload.yml](../../.github/workflows/codecov-upload.yml) uses `inputs.run_backend` and `inputs.run_frontend`. On `pull_request`, `inputs` is undefined, leading to unintended skips. The intended behavior is to run by default on non-dispatch events and honor inputs only on `workflow_dispatch`. The workflow must not be triggered by `push`.
|
|
|
|
## 3. Technical Specifications
|
|
|
|
### 3.1 E2E Workflow Execution Hardening
|
|
|
|
Workflow: [.github/workflows/e2e-tests-split.yml](../../.github/workflows/e2e-tests-split.yml)
|
|
|
|
Requirements:
|
|
|
|
- Apply a robust execution pattern in every Playwright run step (security and non-security jobs) to capture exit codes and fail the job consistently.
|
|
- Keep timing logs but do not allow Playwright failures to be hidden by a non-zero exit path.
|
|
|
|
Required run-step pattern:
|
|
|
|
```bash
|
|
set -euo pipefail
|
|
STATUS=0
|
|
npx playwright test ... || STATUS=$?
|
|
echo "PLAYWRIGHT_STATUS=$STATUS" >> "$GITHUB_ENV"
|
|
exit "$STATUS"
|
|
```
|
|
|
|
### 3.2 E2E Results Aggregation with Expected-Run Logic
|
|
|
|
Workflow: [.github/workflows/e2e-tests-split.yml](../../.github/workflows/e2e-tests-split.yml)
|
|
|
|
Requirements:
|
|
|
|
- Replicate the same input filtering logic used in the job `if` clauses to determine if each job should have run.
|
|
- `e2e-results` MUST declare `needs: [shard-jobs...]` covering every shard job so the aggregation evaluates actual results.
|
|
- Treat `skipped`, `failure`, or `cancelled` as failure outcomes when a job should have run.
|
|
- Ignore `skipped` when a job should not have run.
|
|
|
|
Expected-run logic (derived from inputs and defaults):
|
|
|
|
- `effective_browser = inputs.browser || 'all'`
|
|
- `effective_category = inputs.test_category || 'all'`
|
|
|
|
For each job:
|
|
|
|
- Security jobs should run when `effective_browser` matches the browser (or `all`) AND `effective_category` is `security` or `all`.
|
|
- Non-security jobs should run when `effective_browser` matches the browser (or `all`) AND `effective_category` is `non-security` or `all`.
|
|
|
|
Aggregation behavior:
|
|
|
|
- If a job should run and result is `skipped`, `failure`, or `cancelled`, the aggregation fails.
|
|
- If a job should not run and result is `skipped`, ignore it.
|
|
- Only return success when all expected jobs are successful.
|
|
|
|
### 3.3 Security Isolation for Non-Security Shards
|
|
|
|
Workflow: [.github/workflows/e2e-tests-split.yml](../../.github/workflows/e2e-tests-split.yml)
|
|
|
|
Requirements:
|
|
|
|
- Ensure non-security jobs do not execute any tests from `tests/security-enforcement/` or `tests/security/`.
|
|
- Validate whether `tests/integration` includes security tests. If it does, split those tests into the security suites or exclude them explicitly from non-security runs.
|
|
- Maintain explicit path lists for non-security jobs to avoid accidental inclusion via globbing.
|
|
|
|
### 3.4 Docker Build Workflow Conditions (Preserve Intended Behavior)
|
|
|
|
Workflow: [.github/workflows/docker-build.yml](../../.github/workflows/docker-build.yml)
|
|
|
|
Requirements:
|
|
|
|
- Remove `push` triggers from the workflow `on:` block.
|
|
- Keep `scan-pr-image` for `pull_request` only.
|
|
- Keep `test-image` for `pull_request` and `workflow_dispatch` only.
|
|
- Preserve `skip_build` gating.
|
|
- Remove any job-level checks for `github.event_name == 'push'`.
|
|
|
|
Expected `on:` block:
|
|
|
|
```yaml
|
|
on:
|
|
pull_request:
|
|
branches:
|
|
- main
|
|
- development
|
|
workflow_dispatch:
|
|
```
|
|
|
|
Expected `if` conditions:
|
|
|
|
- `scan-pr-image`: `needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && github.event_name == 'pull_request'`
|
|
- `test-image`: `needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && (github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch')`
|
|
|
|
### 3.5 Codecov Default Behavior on Non-Dispatch Events
|
|
|
|
Workflow: [.github/workflows/codecov-upload.yml](../../.github/workflows/codecov-upload.yml)
|
|
|
|
Requirements:
|
|
|
|
- Remove `push` triggers from the workflow `on:` block.
|
|
- Keep default coverage uploads on non-dispatch events.
|
|
- Only honor `inputs.run_backend` and `inputs.run_frontend` for `workflow_dispatch`.
|
|
- Remove any job-level checks for `github.event_name == 'push'`.
|
|
|
|
Expected `on:` block:
|
|
|
|
```yaml
|
|
on:
|
|
pull_request:
|
|
branches:
|
|
- main
|
|
- development
|
|
workflow_dispatch:
|
|
```
|
|
|
|
Expected job conditions:
|
|
|
|
- `backend-codecov`: `${{ github.event_name != 'workflow_dispatch' || inputs.run_backend != 'false' }}`
|
|
- `frontend-codecov`: `${{ github.event_name != 'workflow_dispatch' || inputs.run_frontend != 'false' }}`
|
|
|
|
### 3.6 Trigger Policy: No Push CI Runs
|
|
|
|
Workflows: [.github/workflows/e2e-tests-split.yml](../../.github/workflows/e2e-tests-split.yml),
|
|
[.github/workflows/docker-build.yml](../../.github/workflows/docker-build.yml),
|
|
[.github/workflows/codecov-upload.yml](../../.github/workflows/codecov-upload.yml)
|
|
|
|
Requirements:
|
|
|
|
- Remove `push` from the `on:` block in each workflow.
|
|
- Ensure only `pull_request` and `workflow_dispatch` events trigger runs.
|
|
- Remove any job-level `if` clauses that include `github.event_name == 'push'`.
|
|
|
|
Expected `on:` blocks:
|
|
|
|
```yaml
|
|
on:
|
|
pull_request:
|
|
branches:
|
|
- main
|
|
- development
|
|
workflow_dispatch:
|
|
```
|
|
|
|
## 4. Implementation Plan
|
|
|
|
### Phase 1: Playwright Tests (Behavior Definition)
|
|
|
|
- No new Playwright tests are required; this change focuses on CI execution and aggregation.
|
|
- Confirm current Playwright suites already cover security enforcement and non-security coverage across browsers.
|
|
|
|
### Phase 2: Backend Implementation
|
|
|
|
- Not applicable. No backend code changes are required.
|
|
|
|
### Phase 3: Frontend Implementation
|
|
|
|
- Not applicable. No frontend code changes are required.
|
|
|
|
### Phase 4: Integration and Testing
|
|
|
|
- Update all Playwright run steps in the split workflow to use the robust execution pattern.
|
|
- Update `e2e-results` to compute expected-run logic and fail on unexpected skips.
|
|
- Audit non-security test path lists to ensure security tests are not executed outside security jobs.
|
|
- Update Docker workflow `on:` triggers and job conditions to remove `push` and align with `pull_request` and `workflow_dispatch` only.
|
|
- Update Codecov workflow `on:` triggers and job conditions to default on non-dispatch events (no push triggers).
|
|
- Update E2E split workflow `on:` triggers to remove `push` and align with `pull_request` and `workflow_dispatch` only.
|
|
|
|
### Phase 5: Documentation and Deployment
|
|
|
|
- Update this plan with final specs and ensure it matches workflow behavior.
|
|
- No deployment or release changes are required.
|
|
|
|
## 5. Acceptance Criteria (EARS)
|
|
|
|
- WHEN a Playwright job runs, THE SYSTEM SHALL record the Playwright exit status and exit the step with that status.
|
|
- WHEN a job should have run based on `inputs.browser` and `inputs.test_category`, THE SYSTEM SHALL fail aggregation if the job result is `skipped`, `failure`, or `cancelled`.
|
|
- WHEN a job should not have run based on `inputs.browser` and `inputs.test_category`, THE SYSTEM SHALL ignore its `skipped` result.
|
|
- WHEN non-security shards run, THE SYSTEM SHALL NOT execute tests from `tests/security-enforcement/` or `tests/security/`.
|
|
- WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute `scan-pr-image`.
|
|
- WHEN the Docker build workflow runs on a workflow dispatch, THE SYSTEM SHALL execute `test-image`.
|
|
- WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute `test-image`.
|
|
- WHEN the Codecov workflow runs on a non-dispatch event, THE SYSTEM SHALL execute backend and frontend coverage jobs by default.
|
|
- WHEN the Codecov workflow is manually dispatched and inputs disable a job, THE SYSTEM SHALL skip the disabled job.
|
|
- WHEN any of the target workflows are triggered, THE SYSTEM SHALL only allow `pull_request` and `workflow_dispatch` events and SHALL NOT run for `push` events.
|
|
|
|
## 6. Risks and Mitigations
|
|
|
|
- Risk: Aggregation logic becomes too strict for selective runs. Mitigation: derive expected-run logic from inputs with the same defaults used in the workflow.
|
|
- Risk: Security tests remain in `tests/integration` and leak into non-security shards. Mitigation: audit `tests/integration` and relocate or exclude security tests explicitly.
|
|
- Risk: Removing push triggers reduces CI coverage for direct branch pushes. Mitigation: enforce pull_request-based checks and require manual dispatch for emergency validations.
|
|
|
|
## 7. Confidence Score
|
|
|
|
Confidence: 86 percent
|
|
|
|
Rationale: Required changes are localized to workflow triggers, job conditions, and test path selection. The main remaining uncertainty is whether any security tests reside in `tests/integration`, which must be verified before finalizing non-security path lists.
|