CI Pipeline Integration and Gate Enforcement Fix Plan
Introduction
This plan addresses two pipeline defects in .github/workflows/ci-pipeline.yml:
- Integration jobs are skipped even when the image build/push is successful.
- Gate jobs report success even when upstream jobs are skipped.
The goal is to make the execution order deterministic and strict: Setup -> Build/Push -> Integration -> Integration Gate -> E2E -> E2E Gate, with gates failing if any required dependency is not successful.
Research Findings
Integration jobs are conditionally skipped
The integration jobs (integration-cerberus, integration-crowdsec, integration-waf, integration-ratelimit) are gated by the same if: expression in .github/workflows/ci-pipeline.yml. That expression requires:
- needs.build.result == 'success'
- needs.build.outputs.image_ref != ''
- the workflow not being explicitly disabled via workflow_dispatch input
This creates two likely skip paths:
- Image reference availability is tied to Docker Hub only. If the build job does not push or resolve a Docker Hub reference, integration jobs skip even if an image exists elsewhere (e.g., GHCR).
- Push policy is not part of the integration condition. The build job exposes image_pushed, but integration jobs do not check it. This prevents a predictable decision about whether an image is actually available in a registry the jobs can pull from.
Gate jobs accept skipped dependencies
The gate jobs (integration-gate, coverage-gate, codecov-gate, pipeline-gate) use if: always() and only fail on failure or cancelled. They do not fail on skipped, which allows skipped dependencies to be treated as a success.
Examples in .github/workflows/ci-pipeline.yml:
- integration-gate exits 0 when integration is skipped due to build state or run_integration being false.
- coverage-gate and pipeline-gate do not enforce a strict success-only check across dependencies.
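To make the failure mode concrete, the following is an illustrative sketch of a lenient gate of the kind described above. It is not copied from ci-pipeline.yml: the step name and the exact shape of the check are assumptions. Because the check only looks for failure or cancelled, a skipped dependency falls through to exit 0.

```yaml
# Illustrative only: a gate that treats skipped dependencies as success.
integration-gate:
  if: always()
  needs: [integration-cerberus, integration-crowdsec, integration-waf, integration-ratelimit]
  runs-on: ubuntu-latest
  steps:
    - name: Check integration results (lenient)
      env:
        # Each expression evaluates to the string "true" or "false".
        HAS_FAILURE: ${{ contains(needs.*.result, 'failure') }}
        HAS_CANCELLED: ${{ contains(needs.*.result, 'cancelled') }}
      run: |
        if [[ "$HAS_FAILURE" == "true" || "$HAS_CANCELLED" == "true" ]]; then
          echo "::error::An integration job failed or was cancelled"
          exit 1
        fi
        # A result of "skipped" matches neither check, so the gate passes.
        exit 0
```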
Reusable E2E workflow masks skipped jobs
The reusable workflow .github/workflows/e2e-tests-split.yml includes a final job that explicitly converts skipped to success. That behavior is useful for partial workflow_dispatch runs, but in CI (where browser=all and test_category=all) it allows a silent skip to pass.
Technical Specifications
Requirements (EARS Notation)
- WHEN the build-and-push stage completes and produces a successful push, THE SYSTEM SHALL start all integration jobs.
- WHEN integration is required, THE SYSTEM SHALL fail the integration gate if any integration job result is not success.
- WHEN E2E tests are required, THE SYSTEM SHALL fail the E2E gate if the reusable workflow result is not success.
- WHEN coverage jobs are required, THE SYSTEM SHALL fail the coverage gate if any coverage or E2E dependency is not success.
- WHEN any required gate fails, THE SYSTEM SHALL fail the pipeline gate.
- WHEN a stage is enabled, THE SYSTEM SHALL treat any skipped or missing dependency as a gate failure.
- IF a stage is explicitly disabled via workflow_dispatch or workflow_call input, THEN THE SYSTEM SHALL skip the stage and its gate by using the same stage-enabled condition on the gate job.
Integration job eligibility and image selection
Define a single computed boolean output that decides whether integration should run. This avoids duplicating conditions across jobs, aligns with the image availability policy, and normalizes input booleans across workflow_dispatch and workflow_call.
Definitive architecture:
- Job setup outputs input_run_integration (user intent only).
- Job build-and-push computes final run_integration.
- Computed logic: run_integration = (needs.setup.outputs.input_run_integration == 'true') && (steps.push.outcome == 'success').
- Dependent jobs (integration + gate) use the exact same if expression: ${{ needs.build-and-push.outputs.run_integration == 'true' }}.
- Gate logic fails if any needs is not success.
- run_integration=true if and only if:
  - needs.setup.outputs.input_run_integration is true, and
  - the push step in build-and-push succeeds.
- Integration tests run in a separate job and require the image to be available in a registry. A pull_request event alone does not permit integration to run without a pushed image.
Recommended outputs:
- setup.outputs.input_run_integration: normalized input boolean derived from workflow_dispatch or workflow_call
- build-and-push.outputs.image_ref: resolved image reference with fallback to GHCR
- build-and-push.outputs.image_registry: dockerhub or ghcr
- build-and-push.outputs.image_pushed: true only when a registry push occurred
- build-and-push.outputs.run_integration: computed eligibility boolean
Integration jobs should use the same if: expression based on needs.build-and-push.outputs.run_integration and should pull from the resolved image_ref.
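The outputs listed above could be wired roughly as follows. This is a minimal sketch, not the final implementation: only the step id push comes from this plan, while the resolve step id, the dockerhub_ref and ghcr_ref step outputs, and the build script path are hypothetical names introduced for illustration.

```yaml
build-and-push:
  needs: setup
  runs-on: ubuntu-latest
  outputs:
    image_ref: ${{ steps.resolve.outputs.image_ref }}
    image_registry: ${{ steps.resolve.outputs.image_registry }}
    # Approximation: treats a successful push step as "a registry push occurred".
    image_pushed: ${{ steps.push.outcome == 'success' }}
    # User intent from setup AND a successful push in this job.
    run_integration: ${{ (needs.setup.outputs.input_run_integration == 'true') && (steps.push.outcome == 'success') }}
  steps:
    - id: push
      name: Build and push image
      # Hypothetical script; assumed to write dockerhub_ref and/or ghcr_ref to $GITHUB_OUTPUT.
      run: ./scripts/build-and-push.sh
    - id: resolve
      name: Resolve image reference (Docker Hub first, GHCR fallback)
      env:
        DOCKERHUB_REF: ${{ steps.push.outputs.dockerhub_ref }}  # hypothetical output
        GHCR_REF: ${{ steps.push.outputs.ghcr_ref }}            # hypothetical output
      run: |
        if [[ -n "$DOCKERHUB_REF" ]]; then
          echo "image_ref=$DOCKERHUB_REF" >> "$GITHUB_OUTPUT"
          echo "image_registry=dockerhub" >> "$GITHUB_OUTPUT"
        else
          echo "image_ref=$GHCR_REF" >> "$GITHUB_OUTPUT"
          echo "image_registry=ghcr" >> "$GITHUB_OUTPUT"
        fi
```

Job outputs are always strings, so the boolean expressions above reach downstream jobs as 'true' or 'false', which is why the dependent conditions compare against 'true'.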
Gate enforcement pattern (fail on skipped or failed)
Use a strict pattern that fails on anything other than success when a stage is required. This should be reusable across integration, coverage, E2E, and pipeline gates. Gate jobs MUST use the same stage-enabled if as the jobs in the stage.
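For integration, the gate job if condition must be ${{ needs.build-and-push.outputs.run_integration == 'true' }}. Note that GitHub Actions skips a job whose needs include a failed or skipped job unless its if contains a status-check function, so in practice the gate condition has to be combined with always() (for example, always() && needs.build-and-push.outputs.run_integration == 'true') for the gate to run and report the failure instead of being skipped itself.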
Gate logic details (explicit YAML/script pattern):
- Gate job uses the same stage-enabled if as the jobs in the stage.
- Gate job uses a single verification step that inspects needs via JSON and fails if any required job is not success (including skipped or missing).
- Gate job is skipped when the stage is intentionally disabled, since the job-level if matches the stage condition.
Reusable pattern (standard block or composite action):
- Inputs:
  - required_jobs: JSON array of job ids in scope for that gate.
- Logic:
  - Iterate required_jobs and fail on any result not equal to success.
Canonical gate step example (for plan reference):
steps:
  - name: Evaluate gate
    env:
      # Results of every job listed in this gate's needs, as JSON.
      NEEDS_JSON: ${{ toJSON(needs) }}
      # JSON array of job ids that must have succeeded.
      REQUIRED_JOBS: ${{ inputs.required_jobs }}
    run: |
      set -euo pipefail
      for job in $(echo "$REQUIRED_JOBS" | jq -r '.[]'); do
        # A job absent from needs is reported as "missing" and fails the gate.
        result=$(echo "$NEEDS_JSON" | jq -r --arg job "$job" '.[$job].result // "missing"')
        if [[ "$result" != "success" ]]; then
          echo "::error::Gate failed: $job result is $result"
          exit 1
        fi
      done
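As a usage sketch, the gate step could sit inside a job like the one below. Two points are assumptions rather than part of the plan: REQUIRED_JOBS is written as a literal JSON array instead of coming from inputs.required_jobs, and always() is added to the condition, because without a status-check function GitHub Actions skips a job as soon as one of its needs fails or is skipped, which would prevent the gate from reporting anything.

```yaml
integration-gate:
  # always() keeps the gate from being auto-skipped when a dependency fails or is skipped.
  if: ${{ always() && needs.build-and-push.outputs.run_integration == 'true' }}
  needs:
    - build-and-push
    - integration-cerberus
    - integration-crowdsec
    - integration-waf
    - integration-ratelimit
  runs-on: ubuntu-latest
  steps:
    - name: Evaluate gate
      env:
        NEEDS_JSON: ${{ toJSON(needs) }}
        # Literal list for this sketch; a reusable block would receive it as an input.
        REQUIRED_JOBS: '["integration-cerberus","integration-crowdsec","integration-waf","integration-ratelimit"]'
      run: |
        set -euo pipefail
        for job in $(echo "$REQUIRED_JOBS" | jq -r '.[]'); do
          result=$(echo "$NEEDS_JSON" | jq -r --arg job "$job" '.[$job].result // "missing"')
          if [[ "$result" != "success" ]]; then
            echo "::error::Gate failed: $job result is $result"
            exit 1
          fi
        done
```

When build-and-push itself fails or is skipped, its run_integration output is empty, so the condition is false and the gate is skipped; the pipeline gate would then need to treat that skipped gate as a failure.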
Example stage_enabled signals by gate:
- Integration gate: needs.build-and-push.outputs.run_integration == 'true'
- E2E gate: inputs.run_e2e == 'true' (or the equivalent workflow input)
- Coverage gate: inputs.run_coverage == 'true'
- Pipeline gate: always true, but only depends on gates and required security jobs
E2E strictness
In .github/workflows/e2e-tests-split.yml, the final e2e-results job should only convert skipped to success when the skip is intentional (for example, the workflow is manually dispatched with browser or test_category not including that job). For CI runs with browser=all and test_category=all, any skipped job should be treated as a failure.
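A strict e2e-results job along these lines might look like the sketch below. The upstream job ids (e2e-chromium, e2e-firefox, e2e-webkit) are hypothetical placeholders, and the sketch simplifies by treating every skip in a partial run as intentional; a real implementation would probably check which jobs the selected browser and test_category were expected to run.

```yaml
e2e-results:
  if: always()
  needs: [e2e-chromium, e2e-firefox, e2e-webkit]  # hypothetical job ids
  runs-on: ubuntu-latest
  steps:
    - name: Evaluate E2E results
      env:
        NEEDS_JSON: ${{ toJSON(needs) }}
        BROWSER: ${{ inputs.browser }}
        TEST_CATEGORY: ${{ inputs.test_category }}
      run: |
        set -euo pipefail
        # Strict mode: a full CI run must not contain any skipped job.
        strict=false
        if [[ "$BROWSER" == "all" && "$TEST_CATEGORY" == "all" ]]; then
          strict=true
        fi
        failed=0
        while IFS=$'\t' read -r job result; do
          if [[ "$result" == "success" ]]; then
            continue
          fi
          if [[ "$result" == "skipped" && "$strict" == "false" ]]; then
            echo "Ignoring skip in partial run: $job"
            continue
          fi
          echo "::error::E2E job $job finished with result $result"
          failed=1
        done < <(jq -r 'to_entries[] | [.key, .value.result] | @tsv' <<< "$NEEDS_JSON")
        exit "$failed"
```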
Integration run logic (must match actual build/push)
Integration jobs must depend on the actual execution of the build/push step and the explicit input toggle. Use a single source of truth from setup and build-and-push outputs:
- setup.outputs.input_run_integration: normalized input boolean derived from workflow_dispatch or workflow_call
- build-and-push.outputs.image_ref: resolved registry reference from the same push
- build-and-push.outputs.image_pushed: true only when a registry push occurred
- build-and-push.outputs.run_integration: computed boolean that validates input enablement and push availability
Integration job if: should be:
if: ${{ needs.build-and-push.outputs.run_integration == 'true' }}
run_integration must be computed using the strict integration requirement:
run_integration: ${{ (needs.setup.outputs.input_run_integration == 'true') && (steps.push.outcome == 'success') }}
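On the consuming side, an integration job could use these outputs as sketched here. The test script path is a hypothetical placeholder; the job id, the condition, and the image_ref output follow the names used in this plan.

```yaml
integration-waf:
  needs: [setup, build-and-push]
  if: ${{ needs.build-and-push.outputs.run_integration == 'true' }}
  runs-on: ubuntu-latest
  env:
    # Passing the reference through env avoids interpolating it into the script text.
    IMAGE_REF: ${{ needs.build-and-push.outputs.image_ref }}
  steps:
    - name: Pull the image produced by build-and-push
      run: docker pull "$IMAGE_REF"
    - name: Run integration tests
      run: ./scripts/integration-waf.sh "$IMAGE_REF"  # hypothetical script
```

The other integration jobs would repeat the same needs and if lines, so a single change to the computed output controls the whole stage.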
Boolean/type safety
- Normalize workflow_dispatch string inputs using fromJSON before comparison.
- Preserve workflow_call boolean inputs as-is, and pass them through inputs.* without string comparisons.
- Use a setup step to emit normalized boolean outputs (for example, inputs.run_integration) so job conditions stay consistent and avoid mixed string/boolean logic.
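One way to emit a normalized value from setup is sketched below. It normalizes the input as a string in shell rather than with fromJSON, which is a substitution made for this example, and it assumes integration stays enabled unless the input is explicitly false (so push and pull_request runs, where the input is empty, default to enabled). The step id normalize is a hypothetical name.

```yaml
setup:
  runs-on: ubuntu-latest
  outputs:
    input_run_integration: ${{ steps.normalize.outputs.input_run_integration }}
  steps:
    - id: normalize
      name: Normalize run_integration input
      env:
        # Renders as "true", "false", or an empty string depending on the trigger.
        RAW: ${{ inputs.run_integration }}
      run: |
        # Lowercase the value so "False" and "false" are treated alike.
        case "${RAW,,}" in
          false) value=false ;;
          *)     value=true ;;  # assumption: enabled unless explicitly disabled
        esac
        echo "input_run_integration=$value" >> "$GITHUB_OUTPUT"
```

Downstream conditions can then compare against the string 'true' everywhere, regardless of which event triggered the run.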
Fail-fast strategy (efficiency)
Document and enforce a fail-fast strategy to reduce wasted runtime:
- For matrix jobs (E2E, coverage, or any parallel test suites), set strategy.fail-fast: true for CI runs so other matrix jobs stop when one fails.
- Downstream stages must list the preceding gate job in needs to prevent unnecessary execution after a failure.
- Use workflow concurrency with cancel-in-progress: true for CI workflows targeting the same branch to avoid redundant runs.
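The three settings above could be combined as in this sketch. The concurrency group name, the shard matrix, and the script path are assumptions chosen for illustration.

```yaml
# Cancel an in-flight run when a newer run starts for the same workflow and ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  coverage-backend:
    # Depending on the gate means this job is skipped when the gate fails.
    needs: e2e-gate
    runs-on: ubuntu-latest
    strategy:
      # Stop the remaining matrix jobs as soon as one of them fails.
      fail-fast: true
      matrix:
        shard: [1, 2, 3]  # hypothetical sharding
    steps:
      - name: Run one coverage shard
        env:
          SHARD: ${{ matrix.shard }}
        run: ./scripts/coverage.sh --shard "$SHARD"  # hypothetical script
```

fail-fast already defaults to true for matrix strategies, so setting it explicitly mainly documents the intent and guards against a later override.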
Sequence enforcement
Ensure the dependency chain is explicit and strict:
- setup
- build-and-push
- integration jobs
- integration-gate
- e2e (reusable workflow)
- e2e-gate (new)
- coverage jobs
- coverage-gate
- codecov-gate
- security jobs
- pipeline-gate
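Expressed as needs wiring, the chain above might look like the following skeleton. It shows dependencies only: runs-on, steps, and if conditions are omitted, security-scan is a hypothetical stand-in for the security jobs, and the exact needs of codecov-gate and the security jobs are assumptions based on the ordering in this plan.

```yaml
# Dependency skeleton only; not a complete workflow.
jobs:
  setup: {}
  build-and-push:
    needs: setup
  integration-cerberus:
    needs: build-and-push
  integration-crowdsec:
    needs: build-and-push
  integration-waf:
    needs: build-and-push
  integration-ratelimit:
    needs: build-and-push
  integration-gate:
    needs: [build-and-push, integration-cerberus, integration-crowdsec, integration-waf, integration-ratelimit]
  e2e:
    needs: integration-gate
    uses: ./.github/workflows/e2e-tests-split.yml
  e2e-gate:
    needs: e2e
  coverage-backend:
    needs: e2e-gate
  coverage-frontend:
    needs: e2e-gate
  coverage-gate:
    needs: [e2e-gate, coverage-backend, coverage-frontend]
  codecov-gate:
    needs: coverage-gate
  security-scan:  # hypothetical job id
    needs: codecov-gate
  pipeline-gate:
    needs: [integration-gate, e2e-gate, coverage-gate, codecov-gate, security-scan]
```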
Implementation Plan
Phase 1: CI Workflow Validation Plan
- Add or update workflow validation checks to detect skipped jobs in CI mode.
- Update e2e-tests-split.yml so the final e2e-results job fails if any job is skipped when inputs.browser=all and inputs.test_category=all.
Phase 2: Integration Stage Fix
- Add input_run_integration output in setup.
- Add a computed run_integration output in build-and-push using the push step outcome.
- Add a resolved image_ref output that can use GHCR as a fallback if Docker Hub is unavailable.
- Update all integration jobs to use the computed run_integration output and the resolved image_ref.
Phase 3: Gate Standardization
- Add a new e2e-gate job that fails if needs.e2e.result is not success when E2E is required.
- Implement a reusable gate-check block or composite action that accepts required_jobs and stage_enabled inputs.
- Update integration-gate, coverage-gate, codecov-gate, and pipeline-gate to enforce a strict success-only check for required dependencies.
Phase 4: Sequence and Dependency Updates
- Wire dependencies so coverage-backend and coverage-frontend depend on e2e-gate rather than integration-gate directly.
- Ensure pipeline-gate depends on all gates and required security jobs.
Phase 5: Documentation and Verification
- Update this plan with any final implementation decisions once validated.
- Document the new gating behavior in relevant CI documentation if present.
Acceptance Criteria
- Integration jobs run whenever input_run_integration is true and the build/push step succeeds.
- Integration gate fails if any integration job is skipped, failure, or cancelled while integration is required.
- E2E gate fails if the reusable E2E workflow result is not success while E2E is required.
- Coverage gate fails if any coverage or E2E dependency is not success while coverage is required.
- Pipeline gate fails if any required gate or security job is not success.
- The execution order is enforced as: Build -> Integration -> Integration Gate -> E2E -> E2E Gate -> Coverage -> Coverage Gate -> Codecov Gate -> Security -> Pipeline Gate.
- Fail-fast behavior is documented and applied for matrix jobs in CI runs.