Charon/docs/plans/current_spec.md at 5bdaffe6b7a097787f325338de0741450264229d

Files

GitHub Actions 6caa82935e fix: update CI workflows to enhance execution reliability and prevent security test leakage

2026-02-09 06:09:45 +00:00

11 KiB

Raw Blame History

title, status, scope, notes

title	status	scope	notes
CI Workflow Reliability Fixes	docs_complete	ci/workflows	Finalize E2E split workflow execution and aggregation logic, prevent security tests from leaking into non-security shards, align Docker build and Codecov defaults, and ensure workflows run only on pull_request or workflow_dispatch.

1. Introduction

This plan finalizes CI workflow specifications based on Supervisor feedback and deeper analysis. The scope covers three workflows:

E2E split workflow reliability and accurate results aggregation.
Docker build workflow trigger logic for PRs and manual dispatches.
Codecov upload workflow default behavior on non-dispatch events.

Objectives:

Ensure Playwright jobs fail deterministically when tests fail and do not get masked by aggregation.
Ensure results aggregation reflects expected runs based on inputs.browser and inputs.test_category.
Ensure non-security jobs do not run security test suites.
Preserve intended Docker build scan/test job triggers for pull_request and workflow_dispatch only.
Preserve Codecov default behavior on non-dispatch events.
Enforce policy that no push events trigger CI run jobs.

2. Research Findings

2.1 E2E Split Workflow Execution and Aggregation

.github/workflows/e2e-tests-split.yml currently:

Executes Playwright in multiple jobs without a consistent exit-code capture pattern.
Aggregates results in e2e-results by converting any skipped job into success regardless of whether it should have run.

This can hide failures and unexpected skips when inputs or secrets are misconfigured.

2.2 Security Test Leakage into Non-Security Shards

Non-security jobs call Playwright with explicit paths including tests/integration and other folders. Security-only jobs explicitly run:

tests/security-enforcement/
tests/security/

If tests/integration contains security-sensitive tests, those could execute in non-security shards. This violates the isolation goal of the split workflow.

2.3 Docker Build Workflow Trigger Logic

.github/workflows/docker-build.yml is intended to:

Run scan-pr-image for pull requests.
Run test-image for pull_request and workflow_dispatch.

The workflow also currently triggers on push. The user requirement is that no push events trigger CI run jobs, so all push triggers must be removed and job logic updated accordingly.

2.4 Codecov Default Behavior on Non-Dispatch Events

.github/workflows/codecov-upload.yml uses inputs.run_backend and inputs.run_frontend. On pull_request, inputs is undefined, leading to unintended skips. The intended behavior is to run by default on non-dispatch events and honor inputs only on workflow_dispatch. The workflow must not be triggered by push.

3. Technical Specifications

3.1 E2E Workflow Execution Hardening

Workflow: .github/workflows/e2e-tests-split.yml

Requirements:

Apply a robust execution pattern in every Playwright run step (security and non-security jobs) to capture exit codes and fail the job consistently.
Keep timing logs but do not allow Playwright failures to be hidden by a non-zero exit path.

Required run-step pattern:

set -euo pipefail
STATUS=0
npx playwright test ... || STATUS=$?
echo "PLAYWRIGHT_STATUS=$STATUS" >> "$GITHUB_ENV"
exit "$STATUS"

3.2 E2E Results Aggregation with Expected-Run Logic

Workflow: .github/workflows/e2e-tests-split.yml

Requirements:

Replicate the same input filtering logic used in the job if clauses to determine if each job should have run.
e2e-results MUST declare needs: [shard-jobs...] covering every shard job so the aggregation evaluates actual results.
Treat skipped, failure, or cancelled as failure outcomes when a job should have run.
Ignore skipped when a job should not have run.

Expected-run logic (derived from inputs and defaults):

effective_browser = inputs.browser || 'all'
effective_category = inputs.test_category || 'all'

For each job:

Security jobs should run when effective_browser matches the browser (or all) AND effective_category is security or all.
Non-security jobs should run when effective_browser matches the browser (or all) AND effective_category is non-security or all.

Aggregation behavior:

If a job should run and result is skipped, failure, or cancelled, the aggregation fails.
If a job should not run and result is skipped, ignore it.
Only return success when all expected jobs are successful.

3.3 Security Isolation for Non-Security Shards

Workflow: .github/workflows/e2e-tests-split.yml

Requirements:

Ensure non-security jobs do not execute any tests from tests/security-enforcement/ or tests/security/.
Validate whether tests/integration includes security tests. If it does, split those tests into the security suites or exclude them explicitly from non-security runs.
Maintain explicit path lists for non-security jobs to avoid accidental inclusion via globbing.

3.4 Docker Build Workflow Conditions (Preserve Intended Behavior)

Workflow: .github/workflows/docker-build.yml

Requirements:

Remove push triggers from the workflow on: block.
Keep scan-pr-image for pull_request only.
Keep test-image for pull_request and workflow_dispatch only.
Preserve skip_build gating.
Remove any job-level checks for github.event_name == 'push'.

Expected on: block:

on:
	pull_request:
		branches:
			- main
			- development
	workflow_dispatch:

Expected if conditions:

scan-pr-image: needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && github.event_name == 'pull_request'
test-image: needs.build-and-push.outputs.skip_build != 'true' && needs.build-and-push.result == 'success' && (github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch')

3.5 Codecov Default Behavior on Non-Dispatch Events

Workflow: .github/workflows/codecov-upload.yml

Requirements:

Remove push triggers from the workflow on: block.
Keep default coverage uploads on non-dispatch events.
Only honor inputs.run_backend and inputs.run_frontend for workflow_dispatch.
Remove any job-level checks for github.event_name == 'push'.

Expected on: block:

on:
	pull_request:
		branches:
			- main
			- development
	workflow_dispatch:

Expected job conditions:

backend-codecov: ${{ github.event_name != 'workflow_dispatch' || inputs.run_backend != 'false' }}
frontend-codecov: ${{ github.event_name != 'workflow_dispatch' || inputs.run_frontend != 'false' }}

3.6 Trigger Policy: No Push CI Runs

Workflows: .github/workflows/e2e-tests-split.yml, .github/workflows/docker-build.yml, .github/workflows/codecov-upload.yml

Requirements:

Remove push from the on: block in each workflow.
Ensure only pull_request and workflow_dispatch events trigger runs.
Remove any job-level if clauses that include github.event_name == 'push'.

Expected on: blocks:

on:
	pull_request:
		branches:
			- main
			- development
	workflow_dispatch:

4. Implementation Plan

Phase 1: Playwright Tests (Behavior Definition)

No new Playwright tests are required; this change focuses on CI execution and aggregation.
Confirm current Playwright suites already cover security enforcement and non-security coverage across browsers.

Phase 2: Backend Implementation

Not applicable. No backend code changes are required.

Phase 3: Frontend Implementation

Not applicable. No frontend code changes are required.

Phase 4: Integration and Testing

Update all Playwright run steps in the split workflow to use the robust execution pattern.
Update e2e-results to compute expected-run logic and fail on unexpected skips.
Audit non-security test path lists to ensure security tests are not executed outside security jobs.
Update Docker workflow on: triggers and job conditions to remove push and align with pull_request and workflow_dispatch only.
Update Codecov workflow on: triggers and job conditions to default on non-dispatch events (no push triggers).
Update E2E split workflow on: triggers to remove push and align with pull_request and workflow_dispatch only.

Phase 5: Documentation and Deployment

Update this plan with final specs and ensure it matches workflow behavior.
No deployment or release changes are required.

5. Acceptance Criteria (EARS)

WHEN a Playwright job runs, THE SYSTEM SHALL record the Playwright exit status and exit the step with that status.
WHEN a job should have run based on inputs.browser and inputs.test_category, THE SYSTEM SHALL fail aggregation if the job result is skipped, failure, or cancelled.
WHEN a job should not have run based on inputs.browser and inputs.test_category, THE SYSTEM SHALL ignore its skipped result.
WHEN non-security shards run, THE SYSTEM SHALL NOT execute tests from tests/security-enforcement/ or tests/security/.
WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute scan-pr-image.
WHEN the Docker build workflow runs on a workflow dispatch, THE SYSTEM SHALL execute test-image.
WHEN the Docker build workflow runs on a pull request, THE SYSTEM SHALL execute test-image.
WHEN the Codecov workflow runs on a non-dispatch event, THE SYSTEM SHALL execute backend and frontend coverage jobs by default.
WHEN the Codecov workflow is manually dispatched and inputs disable a job, THE SYSTEM SHALL skip the disabled job.
WHEN any of the target workflows are triggered, THE SYSTEM SHALL only allow pull_request and workflow_dispatch events and SHALL NOT run for push events.

6. Risks and Mitigations

Risk: Aggregation logic becomes too strict for selective runs. Mitigation: derive expected-run logic from inputs with the same defaults used in the workflow.
Risk: Security tests remain in tests/integration and leak into non-security shards. Mitigation: audit tests/integration and relocate or exclude security tests explicitly.
Risk: Removing push triggers reduces CI coverage for direct branch pushes. Mitigation: enforce pull_request-based checks and require manual dispatch for emergency validations.

7. Confidence Score

Confidence: 86 percent

Rationale: Required changes are localized to workflow triggers, job conditions, and test path selection. The main remaining uncertainty is whether any security tests reside in tests/integration, which must be verified before finalizing non-security path lists.

11 KiB Raw Blame History

1. Introduction

2. Research Findings

2.1 E2E Split Workflow Execution and Aggregation

2.2 Security Test Leakage into Non-Security Shards

2.3 Docker Build Workflow Trigger Logic

2.4 Codecov Default Behavior on Non-Dispatch Events

3. Technical Specifications

3.1 E2E Workflow Execution Hardening

3.2 E2E Results Aggregation with Expected-Run Logic

3.3 Security Isolation for Non-Security Shards

3.4 Docker Build Workflow Conditions (Preserve Intended Behavior)

3.5 Codecov Default Behavior on Non-Dispatch Events

3.6 Trigger Policy: No Push CI Runs

4. Implementation Plan

Phase 1: Playwright Tests (Behavior Definition)

Phase 2: Backend Implementation

Phase 3: Frontend Implementation

Phase 4: Integration and Testing

Phase 5: Documentation and Deployment

5. Acceptance Criteria (EARS)

6. Risks and Mitigations

7. Confidence Score

11 KiB

Raw Blame History