Charon/docs/plans/current_spec.md
GitHub Actions e7f791044d chore: Refactor CI workflows for pipeline consolidation and manual dispatch triggers
- Updated quality-checks.yml to support manual dispatch with frontend checks.
- Modified rate-limit-integration.yml to remove workflow_run triggers and adjust conditions for execution.
- Removed pull request triggers from repo-health.yml, retaining only scheduled and manual dispatch.
- Adjusted security-pr.yml and supply-chain-pr.yml to eliminate workflow_run dependencies and refine execution conditions.
- Cleaned up supply-chain-verify.yml by removing workflow_run triggers and ensuring proper execution conditions.
- Updated waf-integration.yml to remove workflow_run triggers, allowing manual dispatch only.
- Revised current_spec.md to reflect the consolidation of CI workflows into a single pipeline, detailing objectives, research findings, and implementation plans.
2026-02-08 05:36:29 +00:00

---
title: "CI Pipeline Consolidation"
status: "draft"
scope: "ci/pipeline"
notes: This plan replaces the current CI workflow chain with a single pipeline that supports PR triggers while keeping maintenance workflows scheduled.
---
## 1. Introduction
This plan consolidates the existing CI workflows into a single pipeline
workflow that triggers on pull requests across branches as well as on
manual dispatch. The pipeline runs in the strict order requested by the
user: lint, build, parallel integration prerequisites, E2E, parallel
coverage, then security scans. Every stage after the build consumes the
same built Docker image so that test results are consistent.
Maintenance workflows remain scheduled (nightly/weekly/Renovate/repo
health) and are explicitly out of scope for trigger changes.
Out of scope: Alpine migration. Any base-image migration work must be
captured in a separate plan/spec.
Objectives:
- Enable the pipeline to run on pull requests across branches in
addition to manual dispatch.
- Create one pipeline workflow that sequences jobs in the requested
order with explicit dependencies.
- Ensure all integration, E2E, coverage, and security checks use the
same image digest produced by the pipeline build job.
- Push the pipeline image to Docker Hub and GHCR, but use Docker Hub as
the test image source.
- Keep the E2E image tag unchanged from the current convention.
- Align the pipeline with the current Definition of Done (DoD) by
mapping required checks into pipeline stages.
- Preserve scheduled maintenance workflows and do not convert them to
manual-only triggers.
## 2. Research Findings
### 2.1 Current Workflow Topology
The CI chain is currently split across multiple workflows linked by
workflow_run triggers. The core files in scope are:
- .github/workflows/docker-build.yml
- .github/workflows/docker-lint.yml
- .github/workflows/e2e-tests-split.yml
- .github/workflows/quality-checks.yml
- .github/workflows/codecov-upload.yml
- .github/workflows/codeql.yml
- .github/workflows/security-pr.yml
- .github/workflows/supply-chain-pr.yml
- .github/workflows/cerberus-integration.yml
- .github/workflows/crowdsec-integration.yml
- .github/workflows/waf-integration.yml
- .github/workflows/rate-limit-integration.yml
- .github/workflows/benchmark.yml
- .github/workflows/supply-chain-verify.yml
Several maintenance workflows also exist (nightly builds, weekly
security rebuilds, repository health, Renovate automation). They are
not part of the requested pipeline order and will remain scheduled
with their existing triggers.
### 2.2 Current Image Tagging and Digest Sources
- docker-build.yml outputs a build digest from the buildx iidfile and
pushes images to GHCR and Docker Hub.
- Tags currently include:
  - pr-{number}-{short-sha} for PRs
  - {sanitized-branch}-{short-sha} for feature branches
  - latest/dev/nightly for main/development/nightly builds
  - sha-{short-sha} for non-PR builds
  - nightly branch tag (per user request) for nightly branch builds
### 2.3 Definition of Done (DoD) Alignment
The DoD requires E2E tests to run first, then security scans, pre-commit
checks, static analysis, coverage gates, type checks, and build
verification. The requested pipeline order differs by placing E2E after
integration prerequisites and before coverage and security scans.
Decision: the pipeline order is authoritative for CI. The DoD
order remains guidance for local workflows, but CI ordering will follow
the requested pipeline sequence and map required checks into stages.
## 3. Technical Specifications
### 3.1 Workflow Trigger Strategy
The new pipeline workflow will trigger on pull_request across branches
and workflow_dispatch. Existing CI workflows listed in Section 2.1 will
be converted to workflow_dispatch only (no PR triggers). Existing
workflow_run triggers will be removed. Scheduled maintenance workflows
will keep their schedules intact.
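
As a minimal sketch of the conversion, assume a legacy workflow such as waf-integration.yml previously declared pull_request or workflow_run triggers. Its trigger block would shrink to manual dispatch only. The display name and job name below are assumptions, and the existing steps are left as a placeholder:

```yaml
# .github/workflows/waf-integration.yml (sketch; names are assumed)
name: WAF Integration
on:
  workflow_dispatch:   # manual runs only; pull_request and workflow_run removed
jobs:
  waf-integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... existing integration steps stay unchanged ...
```

Scheduled maintenance workflows are not touched by this change and keep their existing schedule entries.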
### 3.2 New Pipeline Workflow
Create a new workflow file that runs the entire pipeline in one run:
- File: .github/workflows/ci-pipeline.yml
- Trigger: workflow_dispatch and pull_request across branches
- Inputs (workflow_dispatch only):
  - image_tag_override (optional)
  - run_coverage (boolean)
  - run_security_scans (boolean)
  - run_integration (boolean)
  - run_e2e (boolean)
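
The trigger section of the new workflow could look roughly like the sketch below. The workflow name, input descriptions, and default values are assumptions, not part of the plan. Because the inputs context is only populated for workflow_dispatch runs, the example job condition treats a pull_request run as "run everything":

```yaml
# .github/workflows/ci-pipeline.yml (trigger section; defaults are assumed)
name: CI Pipeline
on:
  pull_request:          # no branches filter, so PRs targeting any branch run
  workflow_dispatch:
    inputs:
      image_tag_override:
        description: Optional tag to use instead of the computed one
        type: string
        required: false
      run_coverage:
        type: boolean
        default: true
      run_security_scans:
        type: boolean
        default: true
      run_integration:
        type: boolean
        default: true
      run_e2e:
        type: boolean
        default: true

jobs:
  coverage-backend:
    # On pull_request the inputs are empty, so fall back to always running.
    if: ${{ github.event_name == 'pull_request' || inputs.run_coverage }}
    runs-on: ubuntu-latest
    steps:
      - run: echo "placeholder for Go tests with coverage"
```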
### 3.3 Job Order and Dependencies
The pipeline job graph will enforce the requested order.
Job dependency table:
| Job | Purpose | Needs |
| --- | --- | --- |
| lint | Dockerfile lint, Go lint, frontend lint, repo health | none |
| build-image | Build and push Docker image, emit digest | lint |
| integration-cerberus | Cerberus integration tests | build-image |
| integration-crowdsec | CrowdSec integration tests | build-image |
| integration-waf | WAF integration tests | build-image |
| integration-ratelimit | Rate limit integration tests | build-image |
| e2e | Playwright E2E split workflow equivalent | integration-* |
| coverage-backend | Go tests with coverage and Codecov upload | e2e |
| coverage-frontend | Frontend tests with coverage and Codecov upload | e2e |
| coverage-e2e | Optional E2E coverage job | e2e |
| security-codeql | CodeQL Go and JS scans | coverage-* |
| security-trivy | Trivy image scan | coverage-* |
| security-supply-chain | SBOM generation and attestation | coverage-* |
Integration jobs should run in parallel. Coverage and security jobs
should run in parallel within their stages.
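
The dependency table can be sketched as a needs graph. The needs key does not accept wildcards, so integration-* and coverage-* have to be spelled out as explicit lists. Every step here is a placeholder, and the runner label is an assumption:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [{ run: "echo lint" }]
  build-image:
    needs: lint
    runs-on: ubuntu-latest
    steps: [{ run: "echo build" }]
  integration-cerberus:
    needs: build-image
    runs-on: ubuntu-latest
    steps: [{ run: "echo cerberus" }]
  integration-crowdsec:
    needs: build-image
    runs-on: ubuntu-latest
    steps: [{ run: "echo crowdsec" }]
  integration-waf:
    needs: build-image
    runs-on: ubuntu-latest
    steps: [{ run: "echo waf" }]
  integration-ratelimit:
    needs: build-image
    runs-on: ubuntu-latest
    steps: [{ run: "echo ratelimit" }]
  e2e:
    # Waits for all four integration jobs, which run in parallel.
    needs: [integration-cerberus, integration-crowdsec, integration-waf, integration-ratelimit]
    runs-on: ubuntu-latest
    steps: [{ run: "echo e2e" }]
  coverage-backend:
    needs: e2e
    runs-on: ubuntu-latest
    steps: [{ run: "echo coverage-backend" }]
  coverage-frontend:
    needs: e2e
    runs-on: ubuntu-latest
    steps: [{ run: "echo coverage-frontend" }]
  coverage-e2e:
    needs: e2e
    runs-on: ubuntu-latest
    steps: [{ run: "echo coverage-e2e" }]
  security-codeql:
    needs: [coverage-backend, coverage-frontend, coverage-e2e]
    runs-on: ubuntu-latest
    steps: [{ run: "echo codeql" }]
  security-trivy:
    needs: [coverage-backend, coverage-frontend, coverage-e2e]
    runs-on: ubuntu-latest
    steps: [{ run: "echo trivy" }]
  security-supply-chain:
    needs: [coverage-backend, coverage-frontend, coverage-e2e]
    runs-on: ubuntu-latest
    steps: [{ run: "echo supply-chain" }]
```

By default a job is skipped when any job it needs fails or is skipped. If the optional coverage-e2e job is ever skipped, the security jobs would therefore need an explicit if condition to keep running.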
### 3.4 Shared Image Strategy
All downstream jobs must use the same image digest produced by the
build-image job. The build-image job will output:
- image_digest: from docker/build-push-action or iidfile
- image_ref: docker.io/wikid82/charon@sha256:...
- image_ref_ghcr: ghcr.io/wikid82/charon@sha256:...
- image_tag: pr-{number}-{short-sha} or sha-{short-sha}
Downstream jobs will pull the image by digest to ensure immutability and
retag it locally as charon:e2e-test for docker compose usage. For test
stages, the image source registry must be Docker Hub even though GHCR is
also pushed. The E2E image tag must remain unchanged from the current
convention.
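
A hedged sketch of how the build-image job might expose these outputs and how a downstream job might consume them. The secret names, the 7-character short SHA, and the action versions are assumptions. On pull_request events GITHUB_SHA is the merge commit, which may differ from the SHA the current tagging convention uses:

```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # needed to push to GHCR with GITHUB_TOKEN
    outputs:
      image_digest: ${{ steps.build.outputs.digest }}
      image_ref: docker.io/wikid82/charon@${{ steps.build.outputs.digest }}
      image_ref_ghcr: ghcr.io/wikid82/charon@${{ steps.build.outputs.digest }}
      image_tag: ${{ steps.tag.outputs.image_tag }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}   # assumed secret name
          password: ${{ secrets.DOCKERHUB_TOKEN }}      # assumed secret name
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - id: tag
        env:
          EVENT_NAME: ${{ github.event_name }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          short_sha="${GITHUB_SHA::7}"   # short SHA length is an assumption
          if [ "$EVENT_NAME" = "pull_request" ]; then
            echo "image_tag=pr-${PR_NUMBER}-${short_sha}" >> "$GITHUB_OUTPUT"
          else
            echo "image_tag=sha-${short_sha}" >> "$GITHUB_OUTPUT"
          fi
      - id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            docker.io/wikid82/charon:${{ steps.tag.outputs.image_tag }}
            ghcr.io/wikid82/charon:${{ steps.tag.outputs.image_tag }}

  e2e:
    needs: build-image
    runs-on: ubuntu-latest
    steps:
      - name: Pull by digest from Docker Hub and retag for compose
        env:
          IMAGE_REF: ${{ needs.build-image.outputs.image_ref }}
        run: |
          docker pull "$IMAGE_REF"
          docker tag "$IMAGE_REF" charon:e2e-test
```

Pulling by digest rather than by tag means a later push to the same tag cannot change what the test stages run.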
### 3.5 Required File Updates
Workflow updates to manual-only triggers:
- .github/workflows/docker-build.yml
- .github/workflows/docker-lint.yml
- .github/workflows/e2e-tests-split.yml
- .github/workflows/quality-checks.yml
- .github/workflows/codecov-upload.yml
- .github/workflows/codeql.yml
- .github/workflows/security-pr.yml
- .github/workflows/supply-chain-pr.yml
- .github/workflows/cerberus-integration.yml
- .github/workflows/crowdsec-integration.yml
- .github/workflows/waf-integration.yml
- .github/workflows/rate-limit-integration.yml
- .github/workflows/benchmark.yml
- .github/workflows/supply-chain-verify.yml
Workflow additions (PR + manual triggers):
- .github/workflows/ci-pipeline.yml
Optional configuration updates if required for image reuse:
- .docker/compose/docker-compose.playwright-ci.yml (use image ref or
tag via environment variable)
- scripts/*.sh or .github/skills/scripts/skill-runner.sh, only if
necessary to accept image ref overrides
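
If the compose file needs to accept an image override, one option is Compose variable interpolation with a default. This is only a sketch: the service name and the CHARON_E2E_IMAGE variable are hypothetical, not taken from the existing file:

```yaml
# .docker/compose/docker-compose.playwright-ci.yml (fragment; names assumed)
services:
  charon:
    # Falls back to the locally retagged image when the variable is unset.
    image: ${CHARON_E2E_IMAGE:-charon:e2e-test}
```

With this shape, existing callers that rely on charon:e2e-test keep working without setting anything.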
### 3.6 Error Handling and Gates
- Fail fast in lint and build stages.
- Integration, E2E, coverage, and security stages should fail the
pipeline if any job fails.
- Preserve existing retry behavior for registry pushes and pulls.
### 3.7 Required Checks and Branch Protection
- Add a pipeline summary job (e.g., pipeline-gate) that depends on all
pipeline jobs and fails if any required job fails.
- Require the pipeline-gate status check in branch protection/rulesets
for main and release branches.
- Keep the pipeline required by enforcing that it has run against the
merge commit or branch HEAD before merge.
- Keep admin bypass disabled for protected branches unless explicitly
approved.
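
A minimal sketch of the pipeline-gate job, using the job names from the dependency table in Section 3.3. The if: always() condition matters because a gate that is skipped after an upstream failure would not block the merge:

```yaml
jobs:
  pipeline-gate:
    if: always()     # run even when an upstream job failed or was skipped
    needs:
      - lint
      - build-image
      - integration-cerberus
      - integration-crowdsec
      - integration-waf
      - integration-ratelimit
      - e2e
      - coverage-backend
      - coverage-frontend
      - coverage-e2e
      - security-codeql
      - security-trivy
      - security-supply-chain
    runs-on: ubuntu-latest
    steps:
      - name: Fail if any upstream job failed or was cancelled
        if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
      - name: Report success
        run: echo "All required pipeline jobs passed"
```

This sketch treats skipped jobs as acceptable, which suits an optional job such as coverage-e2e. A stricter gate could also reject the 'skipped' result for jobs that must always run.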
### 3.8 Requirements (EARS Notation)
- WHEN a user manually dispatches the pipeline or opens a pull request,
THE SYSTEM SHALL run the lint stage before any build or test jobs.
- WHEN the build stage completes, THE SYSTEM SHALL publish a single
image digest that all later jobs consume.
- WHEN any integration test fails, THE SYSTEM SHALL stop the pipeline
before E2E execution.
- WHEN E2E completes, THE SYSTEM SHALL run coverage jobs in parallel.
- WHEN coverage completes, THE SYSTEM SHALL run security scans in
parallel using the same image digest.
- WHEN the pipeline pushes images, THE SYSTEM SHALL push to Docker Hub
and GHCR but use Docker Hub as the test image source.
- WHEN E2E runs, THE SYSTEM SHALL keep the existing E2E image tag and
preserve the security shard as a separate shard with the current
timeout-safe layout.
- IF any required DoD check fails, THEN THE SYSTEM SHALL fail the
pipeline and report the failing stage.
## 4. Implementation Plan
### Phase 1: Playwright Tests (Behavior Baseline)
- Validate the existing Playwright suites used by e2e-tests-split.yml
can run under the new pipeline using the shared image digest.
- Confirm the E2E stage still honors security and non-security shards
and that Cerberus toggle logic is preserved.
### Phase 2: Backend and CI Workflow Refactor
- Add the new pipeline workflow file.
- Modify existing CI workflows in Section 3.5 to use workflow_dispatch
only (no pull_request triggers).
- Move the docker-build logic into the pipeline build-image job and
export digest and tag outputs.
- Update integration job steps to consume the digest and retag locally
as needed for existing scripts.
### Phase 3: Frontend and E2E Workflow Refactor
- Update the E2E steps to pull the Docker Hub digest and retag to
charon:e2e-test before docker compose starts.
- Ensure environment variables or compose overrides reference the
shared image and keep the E2E tag unchanged.
- Preserve E2E sharding so the security shard remains separate and the
shard layout avoids timeouts.
### Phase 4: Coverage and Security Stage Consolidation
- Replace codecov-upload.yml and codeql.yml with pipeline jobs that run
after E2E completion.
- Ensure Codecov uploads and CodeQL scans run with the same code
checkout and digest metadata for traceability.
### Phase 5: Documentation and DoD Alignment
- Update docs/plans/current_spec.md with the final pipeline plan.
- Document the DoD ordering impact. Per the decision in Section 2.3 the
pipeline order is authoritative for CI, so confirm whether the DoD text
should be updated to match it.
### Phase 6: Branch Protection Updates
- Update branch protection/rulesets to require the pipeline-gate check.
- Document how PRs are validated: automatic pull_request runs of the
pipeline, with manual dispatch available for re-runs.
## 5. Acceptance Criteria
- The pipeline workflow triggers via pull_request across branches and
workflow_dispatch.
- All CI workflows listed in Section 3.5 trigger via
workflow_dispatch only and no longer use workflow_run or
pull_request.
- Maintenance workflows (nightly/weekly/Renovate/repo health) retain
their scheduled triggers and are not changed to PR/manual-only.
- The new pipeline workflow runs lint, build, integration, E2E,
coverage, and security stages in the requested order.
- Integration, E2E, coverage, and security jobs consume the same image
digest produced by the build stage.
- The pipeline exposes image_digest and image_ref outputs for audit
and debugging.
- All DoD-required checks are represented in the pipeline and fail the
run when they do not pass.
- The pipeline pushes images to Docker Hub and GHCR, and test stages
pull from Docker Hub.
- E2E sharding keeps the security shard separate and retains the
timeout-safe shard layout.
- The nightly branch tag remains part of the image tagging scheme.
## 6. Risks and Mitigations
- Risk: PR-triggered pipeline increases CI load and could cause noisy
failures on draft or experimental branches.
- Mitigation: keep legacy workflows manual-only, enforce the
pipeline-gate required check, and allow maintainers to re-run the
pipeline as needed.
## 7. Confidence Score
Confidence: 86 percent
Rationale: The pipeline consolidation is well scoped, but it requires
careful coordination with branch protection and required checks.