Charon/docs/plans/archive/ci_sequencing_spec.md
2026-03-04 18:34:49 +00:00

title: CI Sequencing and Parallelization
status: draft
scope: ci/pipeline, ci/integration, ci/coverage, ci/security

1. Introduction

This plan reorders CI job dependencies so lint remains first, integration runs after image build, E2E depends on integration-gate, and coverage/security jobs run in parallel with E2E once integration has passed.

Objectives:

  • Keep lint as the earliest gate before build-image.
  • Ensure integration tests run after build-image and before e2e.
  • Make e2e depend on integration-gate.
  • Start coverage and security jobs after integration-gate so they complete while E2E runs.
  • Preserve strict gating: if a required stage is skipped or fails, downstream gates fail.

2. Research Findings

2.1 Current CI Ordering

  • lint runs first and gates build-image.
  • Integration jobs depend on build-image, then integration-gate aggregates results.
  • e2e depends only on build-image.
  • Coverage and security jobs depend on build-image (CodeQL has no dependencies).
  • codecov-upload depends on coverage-gate and e2e.
  • pipeline-gate evaluates gate jobs based on input flags, not on integration status.

2.2 Impact of Current Ordering

  • E2E can run without integration tests completing.
  • Coverage and security jobs start before or during integration, competing for runners.
  • security-codeql can start immediately, cluttering early logs.

3. Technical Specifications

3.1 Target Dependency Graph

setup -> lint -> build-image
build-image -> integration-* -> integration-gate
integration-gate -> e2e -> e2e-gate
integration-gate -> coverage-* -> coverage-gate
integration-gate -> security-* -> security-gate
coverage-gate + e2e -> codecov-upload -> codecov-gate
pipeline-gate depends on lint, build-image, integration-gate, e2e-gate, coverage-gate, codecov-gate, security-gate
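
Expressed as needs entries, the graph above might look like the skeleton below. It is a sketch of the wiring only: runs-on, steps, and if guards are omitted, integration-example is a hypothetical stand-in for each integration-* job, and the gate jobs are assumed to depend on exactly the jobs they aggregate. The build-image entries on e2e, security-trivy, and security-supply-chain follow Section 3.2.

```yaml
# Target dependency wiring only; not a complete workflow.
jobs:
  lint:
    needs: setup
  build-image:
    needs: lint
  integration-example:          # hypothetical stand-in for each integration-* job
    needs: build-image
  integration-gate:
    needs: [integration-example]
  e2e:
    needs: [build-image, integration-gate]
  e2e-gate:
    needs: e2e
  coverage-backend:
    needs: integration-gate
  coverage-frontend:
    needs: integration-gate
  coverage-gate:
    needs: [coverage-backend, coverage-frontend]
  security-codeql:
    needs: integration-gate
  security-trivy:
    needs: [build-image, integration-gate]
  security-supply-chain:
    needs: [build-image, integration-gate]
  security-gate:
    needs: [security-codeql, security-trivy, security-supply-chain]
  codecov-upload:
    needs: [coverage-gate, e2e]
  codecov-gate:
    needs: codecov-upload
  pipeline-gate:
    needs: [lint, build-image, integration-gate, e2e-gate, coverage-gate, codecov-gate, security-gate]
```

Any job whose if expression reads needs.build-image.outputs would also have to list build-image in its own needs, since the needs context only exposes direct dependencies.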

3.2 Job Dependency Updates

Update .github/workflows/ci-pipeline.yml:

  • e2e:

    • needs: build-image, integration-gate.
    • if: require needs.integration-gate.result == 'success' and needs.build-image.result == 'success' and the existing run_e2e input guard.
    • Keep build-image in needs for image_ref outputs.
  • e2e-gate:

    • Keep needs: e2e.
    • Update if guard to require integration to have run (see Section 3.4) so skips are treated consistently.
  • coverage-backend and coverage-frontend:

    • needs: integration-gate.
    • if: existing run_coverage input guard AND needs.integration-gate.result == 'success'.
    • No build-image dependency is required for coverage tests.
  • security-codeql:

    • Add needs: integration-gate.
    • if: existing run_security_scans guard AND needs.integration-gate.result == 'success' AND the existing fork guard.
  • security-trivy and security-supply-chain:

    • needs: build-image, integration-gate.
    • if: existing guards AND needs.integration-gate.result == 'success'.
  • security-gate:

    • Keep needs on all security jobs.
    • if: same as today, plus integration-enabled guard (Section 3.4).
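
The updates above could translate into workflow entries along the following lines. This is a minimal sketch, not the actual contents of ci-pipeline.yml: the existing input and fork guards are not reproduced and would need to be AND-ed into each if, and runs-on and steps are left as they are today.

```yaml
jobs:
  e2e:
    # build-image stays in needs so its image_ref outputs remain readable here.
    needs: [build-image, integration-gate]
    # AND this with the existing run_e2e input guard (not reproduced here).
    if: >-
      ${{ needs.build-image.result == 'success'
          && needs.integration-gate.result == 'success' }}

  coverage-backend:
    # No build-image dependency: coverage tests do not use the image.
    needs: integration-gate
    # AND this with the existing run_coverage input guard (not reproduced here).
    if: ${{ needs.integration-gate.result == 'success' }}

  security-trivy:
    needs: [build-image, integration-gate]
    # AND this with the existing run_security_scans and fork guards (not reproduced here).
    if: ${{ needs.integration-gate.result == 'success' }}
```

Because none of these conditions calls a status-check function, GitHub Actions still applies its default success() check, so each job is skipped whenever a needed job fails or is skipped. Section 3.4 covers the case where a skipped integration-gate should not block downstream jobs.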

3.3 Parallelization Strategy

  • Once integration-gate succeeds, start e2e, coverage-*, and security-* concurrently.
  • E2E has the longest runtime of the three groups, so coverage and security work is expected to finish within the E2E runtime window rather than extend the pipeline.

3.4 Enablement and Strict Gating Logic

Adjust enablement expressions so skip behavior is intentional and strict:

  • Define an integration-enabled expression for reuse:
    • integration_enabled = needs.build-image.outputs.run_integration == 'true'
  • Define an integration-gate pass-or-skip expression for reuse:
    • integration_gate_ok = needs.integration-gate.result == 'success' || needs.integration-gate.result == 'skipped'
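
A job-level if can read the github, needs, vars, and inputs contexts but not env, so one way to "reuse" these expressions is simply to repeat them inline in each job. The sketch below does that for coverage-backend under that assumption; the existing run_coverage guard is not reproduced.

```yaml
jobs:
  coverage-backend:
    needs: integration-gate
    # integration_gate_ok written out inline. always() replaces the default
    # success() check, which would otherwise skip this job whenever
    # integration-gate itself is skipped.
    # AND this with the existing run_coverage guard (not reproduced here).
    if: >-
      ${{ always()
          && (needs.integration-gate.result == 'success'
              || needs.integration-gate.result == 'skipped') }}
```

Accepting 'skipped' has a side effect worth checking during validation: a job that does not also test needs.build-image.result could start when integration-gate was skipped because of an upstream failure rather than a fork PR. Whether that can happen depends on how integration-gate itself is guarded.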

Update job and gate conditions to use integration_gate_ok:

  • e2e and e2e-gate:

    • Only enabled if run_e2e is not false.
    • Add guard: always() && integration_gate_ok so fork PRs (integration skipped) still run.
  • coverage-*, coverage-gate, codecov-upload, codecov-gate:

    • Only enabled if run_coverage is not false.
    • Add guard: always() && integration_gate_ok so fork PRs (integration skipped) still run.
  • security-*, security-gate:

    • Only enabled if run_security_scans is not false (and fork guard for CodeQL/Trivy/SBOM).
    • Add guard: always() && integration_gate_ok so fork PRs (integration skipped) still run.
  • pipeline-gate:

    • Update enabled checks (e2e_enabled, coverage_enabled, security_enabled) to use integration_gate_ok (not integration_enabled).
    • Keep strict gating: when enabled, skipped results remain failures.
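
The strict-gating rule for pipeline-gate can be sketched as follows for the E2E gate alone. The step name, environment variable names, and runner label are hypothetical, and the run_e2e portion of e2e_enabled is omitted because the existing input guard is not shown in this plan.

```yaml
jobs:
  pipeline-gate:
    needs: [lint, build-image, integration-gate, e2e-gate, coverage-gate, codecov-gate, security-gate]
    if: always()                # evaluate even when upstream jobs failed or were skipped
    runs-on: ubuntu-latest      # assumed runner label
    steps:
      - name: Evaluate e2e gate
        env:
          # Boolean expressions are passed to the shell as the strings 'true' or 'false'.
          INTEGRATION_GATE_OK: ${{ needs.integration-gate.result == 'success' || needs.integration-gate.result == 'skipped' }}
          E2E_GATE_RESULT: ${{ needs.e2e-gate.result }}
        run: |
          # Strict gating: once e2e counts as enabled, any result other than
          # success (including skipped) fails the pipeline.
          if [ "$INTEGRATION_GATE_OK" = "true" ] && [ "$E2E_GATE_RESULT" != "success" ]; then
            echo "e2e-gate finished with result: $E2E_GATE_RESULT" >&2
            exit 1
          fi
```

The coverage, codecov, and security gates would presumably follow the same pattern, each combined with its own enablement flag.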

3.5 Artifact and Output Dependencies

  • codecov-upload continues to depend on e2e for E2E coverage artifacts and on coverage-gate for unit coverage.
  • security-trivy and security-supply-chain continue using needs.build-image.outputs.image_ref_dockerhub.
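
The output dependency is the reason build-image stays in needs for the image-based security jobs: the needs context only exposes outputs of jobs listed as direct dependencies. A minimal sketch for security-trivy, with a placeholder step in place of the real scan and an assumed runner label:

```yaml
jobs:
  security-trivy:
    # Dropping build-image here would make the image_ref_dockerhub output unreadable.
    needs: [build-image, integration-gate]
    runs-on: ubuntu-latest            # assumed runner label
    steps:
      - name: Show image under scan   # placeholder; the real scan steps are unchanged
        env:
          IMAGE_REF: ${{ needs.build-image.outputs.image_ref_dockerhub }}
        run: echo "Scanning $IMAGE_REF"
```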

3.6 Data Flow and Runners

  • Integration is isolated to its phase, reducing early runner contention.
  • The post-integration phase allows e2e, coverage-*, and security-* to run in parallel.

4. Implementation Plan

Phase 1: Playwright Tests (Behavior Baseline)

  • No UI behavior changes are expected; treat as baseline verification only.

Phase 2: Dependency Rewire

  • Update e2e to require integration-gate and build-image.
  • Add integration-gate to coverage-* and security-* needs.
  • Move security-codeql behind integration-gate.

Phase 3: Strict Gating Alignment

  • Update job if conditions to include the integration guards defined in Section 3.4 (always() && integration_gate_ok).
  • Update pipeline-gate enablement logic to match new gating rules.

Phase 4: Validation

  • Verify that lint -> build-image -> integration -> e2e is enforced.
  • Confirm coverage and security jobs start only after integration succeeds.
  • Confirm codecov-upload and codecov-gate still run after coverage and E2E are complete.

5. Acceptance Criteria (EARS)

  • WHEN a pipeline starts, THE SYSTEM SHALL run lint before build-image.
  • WHEN build-image succeeds and integration is enabled, THE SYSTEM SHALL run all integration tests and aggregate them in integration-gate.
  • WHEN integration-gate succeeds or is skipped and E2E is enabled, THE SYSTEM SHALL start e2e and require it to pass before e2e-gate succeeds.
  • WHEN integration-gate succeeds or is skipped and coverage is enabled, THE SYSTEM SHALL start coverage-backend and coverage-frontend in parallel with E2E.
  • WHEN integration-gate succeeds or is skipped and security scans are enabled, THE SYSTEM SHALL start security-codeql, security-trivy, and security-supply-chain in parallel with E2E.
  • WHEN integration is skipped for fork PRs (run_integration=false), THE SYSTEM SHALL still run e2e, coverage-*, and security-* if their respective enablement flags are true.
  • IF integration-gate is not successful while integration is enabled, THEN THE SYSTEM SHALL skip e2e, coverage-*, and security-* and fail the appropriate gates.
  • WHEN coverage and E2E complete successfully, THE SYSTEM SHALL run codecov-upload and codecov-gate.

6. Risks and Mitigations

  • Risk: Coupling coverage/security to integration could reduce flexibility for ad-hoc runs. Mitigation: Keep run_integration default true; document that disabling integration disables downstream stages.

  • Risk: CodeQL no longer starts early, so its results arrive later in the run. Mitigation: CodeQL runs in parallel with E2E, so total pipeline time should stay stable as long as CodeQL finishes within the E2E window.

  • Risk: Misaligned gate logic could mark expected skips as failures. Mitigation: Centralize enablement logic (integration_enabled) and apply consistently in job if conditions and pipeline-gate.

7. Confidence Score

Confidence: 88 percent

Rationale: The sequencing changes are localized to job needs and if expressions. The main uncertainty is ensuring gate logic stays strict while respecting the new integration-first requirement.