11 KiB
title, status, scope, notes
| title | status | scope | notes |
|---|---|---|---|
| CI Workflow Reliability Fixes | docs_complete | ci/workflows | Finalize E2E split workflow execution and aggregation logic, prevent security tests from leaking into non-security shards, align Docker build and Codecov defaults, and ensure workflows run only on pull_request or workflow_dispatch. |
1. Introduction
This plan finalizes CI workflow specifications based on Supervisor feedback and deeper analysis. The scope covers three workflows:
- E2E split workflow reliability and accurate results aggregation.
- Docker build workflow trigger logic for PRs and manual dispatches.
- Codecov upload workflow default behavior on non-dispatch events.
Objectives:
- Ensure Playwright jobs fail deterministically when tests fail and do not get masked by aggregation.
- Ensure results aggregation reflects expected runs based on
inputs.browserandinputs.test_category. - Ensure non-security jobs do not run security test suites.
- Preserve intended Docker build scan/test job triggers for pull_request and workflow_dispatch only.
- Preserve Codecov default behavior on non-dispatch events.
- Enforce policy that no push events trigger CI run jobs.
2. Research Findings
2.1 E2E Split Workflow Execution and Aggregation
.github/workflows/e2e-tests-split.yml currently:
- Executes Playwright in multiple jobs without a consistent exit-code capture pattern.
- Aggregates results in
e2e-resultsby converting anyskippedjob intosuccessregardless of whether it should have run.
This can hide failures and unexpected skips when inputs or secrets are misconfigured.
2.2 Security Test Leakage into Non-Security Shards
Non-security jobs call Playwright with explicit paths including tests/integration and other folders. Security-only jobs explicitly run:
tests/security-enforcement/tests/security/
If tests/integration contains security-sensitive tests, those could execute in non-security shards. This violates the isolation goal of the split workflow.
2.3 Docker Build Workflow Trigger Logic
.github/workflows/docker-build.yml is intended to:
- Run
scan-pr-imagefor pull requests. - Run
test-imagefor pull_request and workflow_dispatch.
The workflow also currently triggers on push. The user requirement is that no push events trigger CI run jobs, so all push triggers must be removed and job logic updated accordingly.
2.4 Codecov Default Behavior on Non-Dispatch Events
.github/workflows/codecov-upload.yml uses inputs.run_backend and inputs.run_frontend. On pull_request, inputs is undefined, leading to unintended skips. The intended behavior is to run by default on non-dispatch events and honor inputs only on workflow_dispatch. The workflow must not be triggered by push.
3. Technical Specifications
3.1 E2E Workflow Execution Hardening
Workflow: .github/workflows/e2e-tests-split.yml
Requirements:
- Apply a robust execution pattern in every Playwright run step (security and non-security jobs) to capture exit codes and fail the job consistently.
- Keep timing logs but do not allow Playwright failures to be hidden by a non-zero exit path.
Required run-step pattern:
set -euo pipefail
STATUS=0
npx playwright test ... || STATUS=$?
echo "PLAYWRIGHT_STATUS=$STATUS" >> "$GITHUB_ENV"
exit "$STATUS"
3.2 E2E Results Aggregation with Expected-Run Logic
Workflow: .github/workflows/e2e-tests-split.yml
Requirements:
- Replicate the same input filtering logic used in the job
ifclauses to determine if each job should have run. e2e-resultsMUST declareneeds: [shard-jobs...]covering every shard job so the aggregation evaluates actual results.- Treat
skipped,failure, orcancelledas failure outcomes when a job should have run. - Ignore
skippedwhen a job should not have run.
Expected-run logic (derived from inputs and defaults):
effective_browser = inputs.browser || 'all'effective_category = inputs.test_category || 'all'
For each job:
- Security jobs should run when
effective_browsermatches the browser (orall) ANDeffective_categoryissecurityorall. - Non-security jobs should run when
effective_browsermatches the browser (orall) ANDeffective_categoryisnon-securityorall.
Aggregation behavior:
- If a job should run and result is
skipped,failure, orcancelled, the aggregation fails. - If a job should not run and result is
skipped, ignore it. - Only return success when all expected jobs are successful.
3.3 Security Isolation for Non-Security Shards
Workflow: .github/workflows/e2e-tests-split.yml
Requirements:
- Ensure non-security jobs do not execute any tests from
tests/security-enforcement/ortests/security/. - Validate whether
tests/integrationincludes security tests. If it does, split those tests into the security suites or exclude them explicitly from non-security runs. - Maintain explicit path lists for non-security jobs to avoid accidental inclusion via globbing.
3.4 Docker Build Workflow Conditions (Preserve Intended Behavior)
Workflow: .github/workflows/docker-build.yml
Requirements:
- Remove
pushtriggers from the workflowon:block. - Keep
scan-pr-imageforpull_requestonly. - Keep
test-imageforpull_requestandworkflow_dispatchonly. - Preserve
skip_buildgating. - Remove any job-level checks for
github.event_name == 'push'.
Expected on: block:
on:
pull_request:
branches:
- main
- development
workflow_dispatch:
Expected if conditions:
scan-pr-image:needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && github.event_name == 'pull_request'test-image:needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && (github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch')
3.5 Codecov Default Behavior on Non-Dispatch Events
Workflow: .github/workflows/codecov-upload.yml
Requirements:
- Remove
pushtriggers from the workflowon:block. - Keep default coverage uploads on non-dispatch events.
- Only honor
inputs.run_backendandinputs.run_frontendforworkflow_dispatch. - Remove any job-level checks for
github.event_name == 'push'.
Expected on: block:
on:
pull_request:
branches:
- main
- development
workflow_dispatch:
Expected job conditions:
backend-codecov:${{ github.event_name != 'workflow_dispatch' || inputs.run_backend != 'false' }}frontend-codecov:${{ github.event_name != 'workflow_dispatch' || inputs.run_frontend != 'false' }}
3.6 Trigger Policy: No Push CI Runs
Workflows: .github/workflows/e2e-tests-split.yml, .github/workflows/docker-build.yml, .github/workflows/codecov-upload.yml
Requirements:
- Remove
pushfrom theon:block in each workflow. - Ensure only
pull_requestandworkflow_dispatchevents trigger runs. - Remove any job-level
ifclauses that includegithub.event_name == 'push'.
Expected on: blocks:
on:
pull_request:
branches:
- main
- development
workflow_dispatch:
4. Implementation Plan
Phase 1: Playwright Tests (Behavior Definition)
- No new Playwright tests are required; this change focuses on CI execution and aggregation.
- Confirm current Playwright suites already cover security enforcement and non-security coverage across browsers.
Phase 2: Backend Implementation
- Not applicable. No backend code changes are required.
Phase 3: Frontend Implementation
- Not applicable. No frontend code changes are required.
Phase 4: Integration and Testing
- Update all Playwright run steps in the split workflow to use the robust execution pattern.
- Update
e2e-resultsto compute expected-run logic and fail on unexpected skips. - Audit non-security test path lists to ensure security tests are not executed outside security jobs.
- Update Docker workflow
on:triggers and job conditions to removepushand align withpull_requestandworkflow_dispatchonly. - Update Codecov workflow
on:triggers and job conditions to default on non-dispatch events (no push triggers). - Update E2E split workflow
on:triggers to removepushand align withpull_requestandworkflow_dispatchonly.
Phase 5: Documentation and Deployment
- Update this plan with final specs and ensure it matches workflow behavior.
- No deployment or release changes are required.
5. Acceptance Criteria (EARS)
- WHEN a Playwright job runs, THE SYSTEM SHALL record the Playwright exit status and exit the step with that status.
- WHEN a job should have run based on
inputs.browserandinputs.test_category, THE SYSTEM SHALL fail aggregation if the job result isskipped,failure, orcancelled. - WHEN a job should not have run based on
inputs.browserandinputs.test_category, THE SYSTEM SHALL ignore itsskippedresult. - WHEN non-security shards run, THE SYSTEM SHALL NOT execute tests from
tests/security-enforcement/ortests/security/. - WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute
scan-pr-image. - WHEN the Docker build workflow runs on a workflow dispatch, THE SYSTEM SHALL execute
test-image. - WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute
test-image. - WHEN the Codecov workflow runs on a non-dispatch event, THE SYSTEM SHALL execute backend and frontend coverage jobs by default.
- WHEN the Codecov workflow is manually dispatched and inputs disable a job, THE SYSTEM SHALL skip the disabled job.
- WHEN any of the target workflows are triggered, THE SYSTEM SHALL only allow
pull_requestandworkflow_dispatchevents and SHALL NOT run forpushevents.
6. Risks and Mitigations
- Risk: Aggregation logic becomes too strict for selective runs. Mitigation: derive expected-run logic from inputs with the same defaults used in the workflow.
- Risk: Security tests remain in
tests/integrationand leak into non-security shards. Mitigation: audittests/integrationand relocate or exclude security tests explicitly. - Risk: Removing push triggers reduces CI coverage for direct branch pushes. Mitigation: enforce pull_request-based checks and require manual dispatch for emergency validations.
7. Confidence Score
Confidence: 86 percent
Rationale: Required changes are localized to workflow triggers, job conditions, and test path selection. The main remaining uncertainty is whether any security tests reside in tests/integration, which must be verified before finalizing non-security path lists.