---
title: "CI Pipeline Reliability and Docker Tagging"
status: "draft"
scope: "ci/linting, ci/integration, docker/publishing"
notes: Restore Go linting parity, prevent integration-stage cancellation after successful image builds, and correct Docker tag outputs across CI workflows.
---

## 1. Introduction

This plan expands the CI scope to address three related gaps: missing Go
lint enforcement, integration jobs being cancelled after a successful
image build, and incomplete Docker tag outputs on Docker Hub. The
intended outcome is a predictable pipeline where linting blocks early,
integration and E2E gates complete reliably, and registries receive the
full tag set required for traceability and stable consumption.

Objectives:

- Reinstate golangci-lint in the pipeline lint stage.
- Use the fast config that already blocks local commits.
- Ensure golangci-lint config is valid for the version used in CI.
- Remove CI-only leniency so lint failures block merges.
- Prevent integration jobs from being cancelled when image builds have
  already completed successfully.
- Ensure Docker Hub and GHCR receive SHA-only and branch+SHA tags, plus
  latest/dev/nightly tags for main/development/nightly branches.
- Keep CI behavior consistent across pre-commit, Makefile, VS Code
  tasks, and GitHub Actions workflows.

## 2. Research Findings

### 2.1 Current CI State (Linting)

- The main pipeline is [ .github/workflows/ci-pipeline.yml ] and its
  lint job runs repo health, Hadolint, GORM scanner, and frontend lint.
  There is no Go lint step in this pipeline.
- A separate manual workflow, [ .github/workflows/quality-checks.yml ],
  runs golangci-lint with `continue-on-error: true`, which means CI does
  not block on Go lint failures.

### 2.2 Integration Cancellation Symptoms

- [ .github/workflows/ci-pipeline.yml ] defines workflow-level
  concurrency:
  `group: ci-manual-pipeline-${{ github.workflow }}-${{ github.ref_name }}`
  with `cancel-in-progress: true`.
- Integration jobs depend on `build-image` and gate on
  `inputs.run_integration != false` and
  `needs.build-image.outputs.push_image == 'true'`.
- Integration-gate runs with `if: always()` and fails if any dependent
  integration job reports `failure` or `cancelled`.
- Because `cancel-in-progress: true` applies to the whole workflow, a
  newer run in the same concurrency group cancels the in-progress run,
  including downstream integration jobs that start after `build-image`
  has already succeeded.

### 2.3 Current Image Tag Outputs

- In [ .github/workflows/ci-pipeline.yml ], the `Compute image tags`
  step emits:
  - `DEFAULT_TAG` (`sha-<short>` or `pr-<number>-<short>`)
  - latest/dev/nightly tags based on `github.ref_name`
- In [ .github/workflows/docker-build.yml ], `docker/metadata-action`
  emits tags including:
  - `type=raw,value=pr-${{ env.TRIGGER_PR_NUMBER }}-{{sha}}` for PRs
  - `type=sha,format=short` for non-PRs
  - feature branch tag via `steps.feature-tag.outputs.tag`
  - `latest` only when `is_default_branch` is true
  - `dev` only when `env.TRIGGER_REF == 'refs/heads/development'`
- Docker Hub currently shows only PR and SHA-prefixed tags for some
  builds; SHA-only and branch+SHA tags are not emitted consistently.
- Nightly tagging exists in [ .github/workflows/nightly-build.yml ],
  but the main Docker build workflow does not emit a `nightly` tag based
  on branch detection.

## 3. Technical Specifications

### 3.1 CI Lint Job (Pipeline)

Add a Go lint step to the lint job in
[ .github/workflows/ci-pipeline.yml ]:

- Tooling: `golangci/golangci-lint-action`.
- Working directory: `backend`.
- Config: `backend/.golangci-fast.yml`.
- Timeout: match config intent (2m fast, or 5m if parity with other
  pipeline steps is preferred).
- Failures: do not allow `continue-on-error`.

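
The action wraps the `golangci-lint` CLI, so the same check can be reproduced from a shell to keep pre-commit, Makefile, and CI behavior aligned. The following is a minimal sketch under stated assumptions: it is run from the repository root, the function name `run_go_lint` and the `GOLANGCI_LINT_BIN` override are hypothetical, and the 2m timeout is the fast value listed above.

```sh
# Run the fast Go lint from the repository root, the way the pipeline
# step is specified: inside backend/, with the fast config, blocking.
# GOLANGCI_LINT_BIN is a hypothetical override (for example, a pinned binary).
run_go_lint() (
  bin="${GOLANGCI_LINT_BIN:-golangci-lint}"
  cd backend || exit 1
  # Print the version first so logs show which release produced the result,
  # then lint; a non-zero exit status propagates to the caller.
  "$bin" --version &&
    "$bin" run --config .golangci-fast.yml --timeout 2m
)
```

Because the function returns the linter's exit status unchanged, any caller (a Makefile target, a pre-commit hook, or a workflow `run` step) fails when violations are found.
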
### 3.2 CI Lint Job (Manual Quality Checks)

Update [ .github/workflows/quality-checks.yml ] to align with local
blocking behavior:

- Remove `continue-on-error: true` from the golangci-lint step.
- Ensure the step points to `backend/.golangci-fast.yml` or runs in
  `backend` so that the config is picked up deterministically.
- Pin the golangci-lint version to the same major version used in the
  CI pipeline to avoid config drift.

### 3.3 Integration Cancellation Root Cause and Fix

Investigate and address workflow-level cancellation affecting
integration jobs after `build-image` completes.

Required investigation steps:

- Inspect recent CI runs for cancellation reasons in the Actions UI
  (workflow-level cancellation vs job-level failure).
- Confirm whether cancellations coincide with the workflow-level
  concurrency group in [ .github/workflows/ci-pipeline.yml ].
- Verify `inputs.run_integration` values are only populated on
  `workflow_dispatch` events and evaluate the behavior on
  `pull_request` events, where the `inputs` context is empty. GitHub
  expressions coerce mismatched types to numbers before comparing, so
  `inputs.run_integration != false` may not evaluate as intended there.
- Verify `needs.build-image.outputs.push_image` and
  `needs.build-image.outputs.image_ref_dockerhub` are set for non-fork
  pull requests and branch pushes.

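
One way to gather evidence for the first two investigation steps is the GitHub CLI. This sketch assumes `gh` is installed and authenticated for the repository; the function name `list_cancelled_runs` and the default limit of 20 are hypothetical choices for illustration.

```sh
# List recent cancelled runs of the pipeline workflow as JSON.
# Usage: list_cancelled_runs [limit]
# Runs that share a branch and have closely spaced createdAt values are
# candidates for cancellation by the workflow-level concurrency group.
list_cancelled_runs() {
  gh run list \
    --workflow ci-pipeline.yml \
    --status cancelled \
    --limit "${1:-20}" \
    --json databaseId,event,headBranch,createdAt,url
}
```

Comparing `headBranch` and `createdAt` across the results against newer runs on the same branch should indicate whether the concurrency group, rather than a job failure, ended those runs.
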
Proposed fix (preferred):

- Remove workflow-level concurrency from
  [ .github/workflows/ci-pipeline.yml ] and instead apply job-level
  concurrency to the build-image job only, keeping cancellation limited
  to redundant builds while allowing downstream integration/E2E/coverage
  jobs to finish.
- Add explicit guards to integration jobs:
  `if: needs.build-image.result == 'success' &&
  needs.build-image.outputs.push_image == 'true' &&
  needs.build-image.outputs.image_ref_dockerhub != '' &&
  (inputs.run_integration != false)`.
- Update the integration-gate logic to treat `skipped` jobs as
  non-fatal and only fail on `failure` or `cancelled` when
  `needs.build-image.result == 'success'` and `push_image == 'true'`.

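
The gate rule in the last bullet can be sketched as a small shell function for the `integration-gate` step. This is an illustration, not the existing gate script: the function name and argument order are assumptions, and the values would be passed in from `needs.build-image.result`, `needs.build-image.outputs.push_image`, and each integration job's `result`.

```sh
# Decide the integration gate outcome.
# Usage: integration_gate <build_result> <push_image> <job_result>...
# Returns 0 (pass) or 1 (fail).
integration_gate() (
  build_result="$1"; push_image="$2"; shift 2
  gate_status=0
  # Integration is only expected when the image was built and pushed.
  if [ "$build_result" = "success" ] && [ "$push_image" = "true" ]; then
    for result in "$@"; do
      case "$result" in
        failure|cancelled)
          echo "integration job result: $result"
          gate_status=1
          ;;
        # success and skipped are treated as non-fatal
      esac
    done
  fi
  [ "$gate_status" -eq 0 ]
)
```

With this shape, a run where image push is disabled passes the gate even though every integration job reports `skipped`, while a cancelled integration job after a successful, pushed build still fails it.
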
Alternative fix (not recommended; does not meet primary objective):

- Keep workflow-level concurrency but change to
  `cancel-in-progress: ${{ github.event_name == 'pull_request' }}` so
  branch pushes and manual dispatches complete all downstream jobs.
- This option still cancels PR runs after successful builds, which
  conflicts with the primary objective of allowing integration gates
  to complete reliably.

### 3.4 Image Tag Outputs (CI Pipeline)

Update the `Compute image tags` step in
[ .github/workflows/ci-pipeline.yml ] to emit additional tags.

Required additions:

- SHA-only tag (short SHA, no prefix):
  `${SHORT_SHA}` for both GHCR and Docker Hub.
- Tag normalization rules for `SANITIZED_BRANCH`:
  - Ensure the tag is non-empty after sanitization.
  - Ensure the first character is `[a-z0-9]`; if it would start with
    `-` or `.`, normalize by trimming leading `-` or `.` and recheck.
  - Replace non-alphanumeric characters with `-` and collapse multiple
    `-` characters into one.
  - Limit the complete tag, including the `-${SHORT_SHA}` suffix, to
    128 characters after normalization.
  - Fallback: if the sanitized result is empty or still invalid after
    normalization, use `branch` as the fallback prefix.
- Branch+SHA tag for non-PR events using a sanitized branch name derived
  from `github.ref_name` (lowercase, `/` → `-`, non-alnum → `-`,
  trimmed, collapsed). Format:
  `${SANITIZED_BRANCH}-${SHORT_SHA}`.
- Preserve existing `pr-${PR_NUMBER}-${SHORT_SHA}` for PRs.
- Keep `latest`, `dev`, and `nightly` tags, emitted when
  `github.ref_name` is `main`, `development`, or `nightly`
  respectively.

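
The normalization rules above can be sketched as a POSIX shell function suitable for a workflow `run` step. The function name `sanitize_branch` is hypothetical, and the default prefix limit of 120 is an assumption: it reserves room for `-` plus a 7-character short SHA inside a 128-character tag.

```sh
# Normalize a branch name into a Docker-tag-safe prefix.
# Usage: sanitize_branch <branch-name> [max-length]
sanitize_branch() (
  LC_ALL=C; export LC_ALL   # byte-wise matching, independent of runner locale
  max_len="${2:-120}"       # assumption: 128 minus "-" and a 7-char short SHA
  s=$(printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]/-/g' -e 's/--*/-/g' -e 's/^-//' \
    | cut -c "1-${max_len}" \
    | sed -e 's/-$//')
  # Fallback prefix when nothing usable remains after normalization.
  if [ -z "$s" ]; then s="branch"; fi
  printf '%s\n' "$s"
)
```

Because every character outside `[a-z0-9]` is replaced before the leading `-` is trimmed, a leading `.` never survives, so the result always starts with `[a-z0-9]` or is the `branch` fallback. For example, `sanitize_branch 'Feature/Add_Login'` prints `feature-add-login`.
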
Decision point: SHA-only tags for PR builds

- Option A (recommended): publish SHA-only tags only for trusted
  branches (main/development/nightly and non-fork pushes). PR builds
  continue to use `pr-${PR_NUMBER}-${SHORT_SHA}` without SHA-only tags.
- Option B: publish SHA-only tags for PR builds when image push is
  enabled for a non-fork authorized run (e.g., same-repo PRs), in
  addition to PR-prefixed tags.
- Assumption (default until decided): follow Option A to avoid
  ambiguous SHA-only tags for untrusted PR contexts.

Required step-level variables and expressions:

- Step: `Compute image tags` (id: `tags`).
- Variables: `SHORT_SHA`, `DEFAULT_TAG`, `PR_NUMBER`, `SANITIZED_BRANCH`.
- Expressions:
  - `${{ github.event_name }}`
  - `${{ github.ref_name }}`
  - `${{ github.event.pull_request.number }}`

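
As an illustration of how these inputs could combine under Option A, the sketch below prints the tag list for one build. It is not the existing step script: the function name `emit_tags`, the argument order, and the 7-character short SHA are assumptions, and the sanitized branch is expected to have been normalized already.

```sh
# Print the tag list for one build, one tag per line.
# Usage: emit_tags <event_name> <ref_name> <full_sha> <sanitized_branch> [pr_number]
emit_tags() (
  event_name="$1"; ref_name="$2"; full_sha="$3"
  sanitized_branch="$4"; pr_number="${5:-}"
  short_sha=$(printf '%s' "$full_sha" | cut -c 1-7)   # assumption: 7 characters

  if [ "$event_name" = "pull_request" ]; then
    # Option A: PR builds keep only the PR-prefixed tag.
    printf 'pr-%s-%s\n' "$pr_number" "$short_sha"
  else
    printf 'sha-%s\n' "$short_sha"                        # existing default tag
    printf '%s\n' "$short_sha"                            # SHA-only tag
    printf '%s-%s\n' "$sanitized_branch" "$short_sha"     # branch+SHA tag
    case "$ref_name" in
      main)        printf 'latest\n' ;;
      development) printf 'dev\n' ;;
      nightly)     printf 'nightly\n' ;;
    esac
  fi
)
```

A push to `main` would then yield `sha-<short>`, `<short>`, `main-<short>`, and `latest`, while a pull request yields only its `pr-` tag. Switching to Option B would mean also emitting the SHA-only line in the pull request branch.
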
### 3.5 Image Tag Outputs (docker-build.yml)

Update [ .github/workflows/docker-build.yml ] `Generate Docker metadata`
tags to match the required outputs.

Required additions:

- Add SHA-only short tag for all events:
  `type=sha,format=short,prefix=,suffix=`.
- Add branch+SHA short tag for non-PR events using a sanitized branch
  name derived from `env.TRIGGER_REF` or `env.TRIGGER_HEAD_BRANCH`.
- Apply the same tag normalization rules as the CI pipeline
  (`SANITIZED_BRANCH` non-empty, leading character normalized, length
  <= 128, fallback to `branch`).
- Add explicit branch tags for main/development/nightly based on
  `env.TRIGGER_REF` (do not rely on `is_default_branch` for
  workflow_run triggers):
  - `type=raw,value=latest,enable=${{ env.TRIGGER_REF == 'refs/heads/main' }}`
  - `type=raw,value=dev,enable=${{ env.TRIGGER_REF == 'refs/heads/development' }}`
  - `type=raw,value=nightly,enable=${{ env.TRIGGER_REF == 'refs/heads/nightly' }}`

Required step names and variables:

- Step: `Compute feature branch tag` (id: `feature-tag`) remains for
  `refs/heads/feature/*`.
- New step: `Compute branch+sha tag` (id: `branch-tag`) for all
  non-PR events using `TRIGGER_REF`.
- Metadata step: `Generate Docker metadata` (id: `meta`).
- Expressions:
  - `${{ env.TRIGGER_EVENT }}`
  - `${{ env.TRIGGER_REF }}`
  - `${{ env.TRIGGER_HEAD_SHA }}`
  - `${{ env.TRIGGER_PR_NUMBER }}`
  - `${{ steps.branch-tag.outputs.tag }}`

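
A possible script body for the new `Compute branch+sha tag` step is sketched below. It assumes `TRIGGER_EVENT`, `TRIGGER_REF`, and `TRIGGER_HEAD_SHA` are exported as environment variables as listed above, repeats the CI pipeline's normalization rules inline so the step stays self-contained, and uses a hypothetical function name, a 120-character prefix limit, and a 7-character short SHA.

```sh
# Write "tag=<sanitized-branch>-<short-sha>" to $GITHUB_OUTPUT for non-PR events.
write_branch_tag() (
  LC_ALL=C; export LC_ALL
  if [ "${TRIGGER_EVENT:-}" != "pull_request" ]; then
    branch="${TRIGGER_REF#refs/heads/}"                       # drop the ref prefix
    short_sha=$(printf '%s' "$TRIGGER_HEAD_SHA" | cut -c 1-7)  # assumption: 7 chars
    prefix=$(printf '%s\n' "$branch" \
      | tr '[:upper:]' '[:lower:]' \
      | sed -e 's/[^a-z0-9]/-/g' -e 's/--*/-/g' -e 's/^-//' \
      | cut -c 1-120 \
      | sed -e 's/-$//')
    # Fallback prefix when the branch name sanitizes to nothing.
    [ -n "$prefix" ] || prefix="branch"
    printf 'tag=%s-%s\n' "$prefix" "$short_sha" >> "$GITHUB_OUTPUT"
  fi
)
```

For pull request events nothing is written, so `steps.branch-tag.outputs.tag` stays empty and the metadata step would need to skip that entry when the value is empty.
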
### 3.6 Repository Hygiene Review (Requested)

- [ .gitignore ]: No change required for CI updates; no new artifacts
  introduced by the tag changes.
- [ codecov.yml ]: No change required; coverage configuration remains
  correct.
- [ .dockerignore ]: No change required; CI-only YAML edits are already
  excluded from Docker build context.
- [ Dockerfile ]: No change required; tagging logic is CI-only.
- [ Branch tag normalization ]: No new files required; logic should be
  implemented in existing CI steps only.

## 4. Implementation Plan

### Phase 1: Playwright Tests (Behavior Baseline)

- Confirm that no UI behavior is affected by CI-only changes.
- Keep this phase as a verification note: E2E is unchanged and can be
  re-run if CI changes surface unexpected side effects.

### Phase 2: Pipeline Lint Restoration

- Add a Go lint step to the lint job in
  [ .github/workflows/ci-pipeline.yml ].
- Use `backend/.golangci-fast.yml` and ensure the step blocks on
  failure.
- Keep the step order within the lint job intact (repo health →
  Hadolint → GORM scan → Go lint → frontend lint).

### Phase 3: Integration Cancellation Fix

- Remove workflow-level concurrency from
  [ .github/workflows/ci-pipeline.yml ] and add job-level concurrency
  on `build-image` only.
- Add explicit `if` guards to integration jobs based on
  `needs.build-image.result`, `needs.build-image.outputs.push_image`,
  and `needs.build-image.outputs.image_ref_dockerhub`.
- Update `integration-gate` to ignore `skipped` results when integration
  is not expected to run and only fail on `failure` or `cancelled` when
  build-image succeeded and pushed an image.

### Phase 4: Docker Tagging Updates

- Update `Compute image tags` in
  [ .github/workflows/ci-pipeline.yml ] to emit SHA-only and
  branch+SHA tags in addition to the existing PR and branch tags.
- Update `Generate Docker metadata` in
  [ .github/workflows/docker-build.yml ] to emit SHA-only, branch+SHA,
  and explicit latest/dev/nightly tags based on `env.TRIGGER_REF`.
- Add tag normalization logic in both workflows to ensure valid Docker
  tag prefixes (non-empty, valid leading character, <= 128 length,
  fallback when sanitized branch is empty or invalid).

### Phase 5: Validation and Guardrails

- Verify CI logs show the golangci-lint version and config in use.
- Confirm integration jobs are no longer cancelled after successful
  builds when new runs are queued.
- Validate that Docker Hub and GHCR tags include:
  - SHA-only short tags
  - Branch+SHA short tags
  - latest/dev/nightly tags for main/development/nightly branches

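
The tag validation in the last item could be scripted rather than checked by hand. The sketch below assumes the Docker CLI is available and logged in to the registry being checked; the function name `missing_tags` is hypothetical, and the image name and tag values are supplied by the caller.

```sh
# Report which expected tags are missing for one image repository.
# Usage: missing_tags <image> <tag>...
# Returns 0 when every tag resolves, 1 otherwise.
missing_tags() (
  image="$1"; shift
  missing=0
  for tag in "$@"; do
    # "docker manifest inspect" fails when the tag does not exist.
    if ! docker manifest inspect "${image}:${tag}" >/dev/null 2>&1; then
      echo "missing: ${image}:${tag}"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)
```

Running it once per registry (Docker Hub and GHCR) with the SHA-only, branch+SHA, and floating tags expected for a given commit gives a pass/fail result that can be recorded alongside the CI run.
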
## 5. Acceptance Criteria (EARS)

- WHEN a pull request or manual pipeline run executes, THE SYSTEM SHALL
  run golangci-lint in the pipeline lint stage using
  `backend/.golangci-fast.yml`.
- WHEN golangci-lint finds violations, THE SYSTEM SHALL fail the
  pipeline lint stage and block downstream jobs.
- WHEN the manual quality workflow runs, THE SYSTEM SHALL enforce the
  same blocking behavior and fast config as pre-commit.
- WHEN a build-image job completes successfully and image push is
  enabled for a non-fork authorized run, THE SYSTEM SHALL allow
  integration jobs to run to completion without being cancelled by
  workflow-level concurrency.
- WHEN integration jobs are skipped by configuration while image push
  is disabled or not authorized for the run, THE SYSTEM SHALL NOT mark
  the integration gate as failed.
- WHEN a non-PR build runs on main/development/nightly branches and
  image push is enabled for a non-fork authorized run, THE SYSTEM SHALL
  publish `latest`, `dev`, or `nightly` tags respectively to Docker Hub
  and GHCR.
- WHEN any image is built in CI and image push is enabled for a
  non-fork authorized run, THE SYSTEM SHALL publish SHA-only and
  branch+SHA tags in addition to existing PR or default tags, subject
  to the SHA-only decision for PR builds in Section 3.4.

## 6. Risks and Mitigations

- Risk: CI runtime increases due to added golangci-lint execution.
  Mitigation: use the fast config and keep timeout tight (2m) with
  caching enabled by the action.
- Risk: Config incompatibility with CI golangci-lint version.
  Mitigation: pin the version and log it in CI; validate config format.
- Risk: Reduced cancellation leads to overlapping integration runs.
  Mitigation: keep job-level concurrency on build-image; monitor queue
  time and adjust if needed.
- Risk: Tag proliferation complicates image selection for users.
  Mitigation: document tag matrix in release notes or README once
  verified in CI.
- Risk: Sanitized branch names may collapse to empty or invalid tags.
  Mitigation: enforce normalization rules with a safe fallback prefix
  to keep tag generation deterministic.

## 7. Confidence Score

Confidence: 84 percent

Rationale: The linting changes are straightforward, but integration
job cancellation behavior depends on workflow-level concurrency and may
require validation in Actions history to select the most appropriate
fix. Tagging changes are predictable once metadata-action inputs are
aligned with branch detection.