- Refactor concurrency settings in `e2e-tests-split.yml` and `codecov-upload.yml` to remove SHA and run_id from group strings, allowing for proper cancellation of in-progress runs. - Ensure that new pushes to the same branch cancel any ongoing workflow runs, improving CI efficiency and reducing queue times.
293 lines
17 KiB
Markdown
293 lines
17 KiB
Markdown
---
|
|
post_title: "Current Spec: Fix Workflow Concurrency Groups to Enable Run Cancellation"
|
|
categories:
|
|
- planning
|
|
- ci-cd
|
|
- github-actions
|
|
tags:
|
|
- concurrency
|
|
- e2e-tests
|
|
- workflow-optimization
|
|
status: draft
|
|
created: 2026-02-26
|
|
---
|
|
|
|
# Fix Workflow Concurrency Groups to Enable Run Cancellation
|
|
|
|
## 1. Introduction
|
|
|
|
### Overview
|
|
|
|
GitHub Actions workflow runs are queueing for hours instead of canceling prior runs when new commits are pushed to the same branch. The user observed 9+ pages of stacked E2E workflow runs.
|
|
|
|
### Objective
|
|
|
|
Audit all 36 workflow files in `.github/workflows/`, identify misconfigured concurrency groups that prevent run cancellation, and define the fix for each affected workflow.
|
|
|
|
## 2. Root Cause Analysis
|
|
|
|
### How GitHub Actions Concurrency Works
|
|
|
|
GitHub Actions uses the `concurrency` block to control parallel execution:
|
|
|
|
```yaml
|
|
concurrency:
|
|
group: <string> # Runs sharing the same group string are subject to concurrency control
|
|
cancel-in-progress: true # If true, a new run cancels any in-progress run in the same group
|
|
```
|
|
|
|
**The critical rule**: Two runs will only cancel each other if they resolve to the **exact same** `group` string at runtime.
|
|
|
|
### The SHA-in-Group Anti-Pattern
|
|
|
|
The primary offender (`e2e-tests-split.yml`) uses:
|
|
|
|
```yaml
|
|
concurrency:
|
|
group: e2e-split-${{ github.workflow }}-${{ github.ref }}-${{ github.event.pull_request.head.sha || github.sha }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
**Why this prevents cancellation:**
|
|
|
|
| Push # | Branch | SHA | Resolved Group String |
|
|
|--------|--------|-----|----------------------|
|
|
| 1 | `refs/heads/feat-x` | `abc1234` | `e2e-split-E2E Tests-refs/heads/feat-x-abc1234` |
|
|
| 2 | `refs/heads/feat-x` | `def5678` | `e2e-split-E2E Tests-refs/heads/feat-x-def5678` |
|
|
| 3 | `refs/heads/feat-x` | `ghi9012` | `e2e-split-E2E Tests-refs/heads/feat-x-ghi9012` |
|
|
|
|
Every push produces a different SHA, so every run gets a **unique** concurrency group. Since no two runs share a group, `cancel-in-progress: true` has no effect — all runs execute to completion, creating the observed hour-long queue.
|
|
|
|
### The `run_id`-in-Group Anti-Pattern
|
|
|
|
`codecov-upload.yml` uses:
|
|
|
|
```yaml
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.run_id }}
|
|
```
|
|
|
|
`github.run_id` is unique per workflow run by definition, so this has the same effect as the SHA anti-pattern — runs never cancel each other.
|
|
|
|
### The Correct Pattern
|
|
|
|
For workflows where you want a new push on the same branch to cancel the prior run:
|
|
|
|
```yaml
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.ref }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
This produces the same group string for all runs of the same workflow on the same branch, enabling proper cancellation.
|
|
|
|
## 3. Full Audit Table
|
|
|
|
### Legend
|
|
|
|
| Symbol | Meaning |
|
|
|--------|---------|
|
|
| `BUG` | Has SHA/run_id in concurrency group — prevents cancellation |
|
|
| `OK` | Concurrency group is branch-scoped and works correctly |
|
|
| `NO-CANCEL` | `cancel-in-progress: false` — intentional (review needed) |
|
|
| `NONE` | No concurrency block at all |
|
|
| `N/A` | Workflow nature doesn't need cancellation (schedule-only, manual-only, etc.) |
|
|
|
|
### Workflow Audit
|
|
|
|
| # | Workflow File | Name | Triggers | Concurrency Group | cancel-in-progress | SHA/run_id Bug? | Verdict | Fix? |
|
|
|---|--------------|------|----------|-------------------|-------------------|-----------------|---------|------|
|
|
| 1 | `e2e-tests-split.yml` | E2E Tests | `workflow_call`, `workflow_dispatch`, `pull_request` | `e2e-split-${{ github.workflow }}-${{ github.ref }}-${{ github.event.pull_request.head.sha \|\| github.sha }}` | `true` | **YES — SHA** | **BUG** | **YES** |
|
|
| 2 | `codecov-upload.yml` | Upload Coverage to Codecov | `pull_request`, `push(main)`, `workflow_dispatch` | `${{ github.workflow }}-${{ github.ref_name }}-${{ github.run_id }}` | `true` | **YES — run_id** | **BUG** | **YES** |
|
|
| 3 | `codeql.yml` | CodeQL - Analyze | `pull_request`, `push(main)`, `workflow_dispatch`, `schedule` | `${{ github.workflow }}-${{ github.event_name }}-${{ github.head_ref \|\| github.ref_name }}` | `true` | No | OK | No |
|
|
| 4 | `quality-checks.yml` | Quality Checks | `pull_request`, `push(main)` | `${{ github.workflow }}-${{ github.ref }}` | `true` | No | OK | No |
|
|
| 5 | `docker-build.yml` | Docker Build, Publish & Test | `pull_request`, `push(main)`, `workflow_dispatch`, `workflow_run` | `${{ github.workflow }}-${{ github.event_name }}-${{ ... head_branch fallback }}` | `true` | No | OK | No |
|
|
| 6 | `benchmark.yml` | Go Benchmark | `pull_request`, `push(main)`, `workflow_dispatch` | `${{ github.workflow }}-${{ github.event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 7 | `cerberus-integration.yml` | Cerberus Integration | `workflow_dispatch`, `pull_request`, `push(main)` | `${{ github.workflow }}-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 8 | `crowdsec-integration.yml` | CrowdSec Integration | `workflow_dispatch`, `pull_request`, `push(main)` | `${{ github.workflow }}-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 9 | `waf-integration.yml` | WAF integration | `workflow_dispatch`, `pull_request`, `push(main)` | `${{ github.workflow }}-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 10 | `rate-limit-integration.yml` | Rate Limit integration | `workflow_dispatch`, `pull_request`, `push(main)` | `${{ github.workflow }}-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 11 | `supply-chain-pr.yml` | Supply Chain Verification (PR) | `workflow_dispatch`, `pull_request`, `push(main)` | `supply-chain-pr-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 12 | `security-pr.yml` | Security Scan (PR) | `workflow_run`, `workflow_dispatch`, `pull_request`, `push(main)` | `security-pr-${{ ... event_name }}-${{ ... head_branch \|\| github.ref }}` | `true` | No | OK | No |
|
|
| 13 | `docker-lint.yml` | Docker Lint | `workflow_dispatch` | `${{ github.workflow }}-${{ github.event_name }}-${{ github.head_ref \|\| github.ref_name }}` | `true` | No | OK | No |
|
|
| 14 | `repo-health.yml` | Repo Health Check | `schedule`, `workflow_dispatch` | `${{ github.workflow }}-${{ github.event_name }}-${{ github.head_ref \|\| github.ref_name }}` | `true` | No | OK | No |
|
|
| 15 | `auto-changelog.yml` | Auto Changelog | `workflow_run`, `release` | `${{ github.workflow }}-${{ github.event_name }}-${{ ... head_branch \|\| ... ref_name }}` | `true` | No | OK | No |
|
|
| 16 | `history-rewrite-tests.yml` | History Rewrite Tests | `workflow_run` | `${{ github.workflow }}-${{ github.event_name }}-${{ ... head_branch \|\| ... ref_name }}` | `true` | No | OK | No |
|
|
| 17 | `dry-run-history-rewrite.yml` | History Rewrite Dry-Run | `workflow_run`, `schedule`, `workflow_dispatch` | `${{ github.workflow }}-${{ github.event_name }}-${{ ... head_branch \|\| ... ref_name }}` | `true` | No | OK | No |
|
|
| 18 | `pr-checklist.yml` | PR Checklist Validation | `workflow_dispatch` | `${{ github.workflow }}-${{ inputs.pr_number \|\| ... }}` | `true` | No | OK | No |
|
|
| 19 | `auto-label-issues.yml` | Auto-label Issues | `issues` | `${{ github.workflow }}-${{ github.event.issue.number }}` | `true` | No | OK | No |
|
|
| 20 | `renovate_prune.yml` | Prune Renovate Branches | `workflow_dispatch`, `schedule` | `prune-renovate-branches` (job-level) | `true` | No | OK | No |
|
|
| 21 | `docs.yml` | Deploy Docs to Pages | `workflow_run`, `workflow_dispatch` | `pages-${{ github.event_name }}-${{ ... head_branch \|\| github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 22 | `propagate-changes.yml` | Propagate Changes | `workflow_run` | `${{ github.workflow }}-${{ ... head_branch \|\| github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 23 | `docs-to-issues.yml` | Convert Docs to Issues | `workflow_run`, `workflow_dispatch` | `${{ github.workflow }}-${{ ... head_branch \|\| github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 24 | `auto-versioning.yml` | Auto Versioning and Release | `workflow_run(main)` | `${{ github.workflow }}-${{ ... head_branch \|\| github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 25 | `release-goreleaser.yml` | Release (GoReleaser) | `push(tags: v*)` | `${{ github.workflow }}-${{ github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 26 | `weekly-nightly-promotion.yml` | Weekly Nightly Promotion | `schedule`, `workflow_dispatch` | `${{ github.workflow }}` | `false` | No | NO-CANCEL | No |
|
|
| 27 | `caddy-major-monitor.yml` | Monitor Caddy Major | `schedule`, `workflow_dispatch` | `${{ github.workflow }}` | `false` | No | N/A | No |
|
|
| 28 | `renovate.yml` | Renovate | `schedule`, `workflow_dispatch` | `${{ github.workflow }}` | `false` | No | N/A | No |
|
|
| 29 | `create-labels.yml` | Create Project Labels | `workflow_dispatch` | `${{ github.workflow }}` | `false` | No | N/A | No |
|
|
| 30 | `auto-add-to-project.yml` | Auto-add to Project | `issues` | `${{ github.workflow }}-${{ ... issue.number }}` | `false` | No | N/A | No |
|
|
| 31 | `security-weekly-rebuild.yml` | Weekly Security Rebuild | `schedule`, `workflow_dispatch` | `${{ github.workflow }}-${{ github.ref }}` | `false` | No | NO-CANCEL | No |
|
|
| 32 | `nightly-build.yml` | Nightly Build & Package | `schedule`, `workflow_dispatch` | **None** | — | — | NONE | Optional |
|
|
| 33 | `supply-chain-verify.yml` | Supply Chain Verification | `workflow_dispatch`, `schedule`, `workflow_run`, `release` | **None** | — | — | NONE | Optional |
|
|
| 34 | `update-geolite2.yml` | Update GeoLite2 Checksum | `schedule`, `workflow_dispatch` | **None** | — | — | NONE | No |
|
|
| 35 | `gh_cache_cleanup.yml` | Cleanup GH caches | `workflow_dispatch` | **None** | — | — | NONE | No |
|
|
| 36 | `container-prune.yml` | Container Registry Prune | `pull_request`, `schedule`, `workflow_dispatch` | **None** | — | — | NONE | Optional |
|
|
|
|
## 4. Detailed Fix Plan
|
|
|
|
### 4.1 FIX: `e2e-tests-split.yml` — PRIMARY OFFENDER
|
|
|
|
**File:** `.github/workflows/e2e-tests-split.yml`, line 97-99
|
|
|
|
**Current (broken):**
|
|
```yaml
|
|
concurrency:
|
|
group: e2e-split-${{ github.workflow }}-${{ github.ref }}-${{ github.event.pull_request.head.sha || github.sha }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
**Fixed:**
|
|
```yaml
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.ref }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
**Rationale:**
|
|
- Remove `e2e-split-` prefix: redundant since `${{ github.workflow }}` already resolves to `"E2E Tests"`.
|
|
- Remove `${{ github.event.pull_request.head.sha || github.sha }}`: this is the root cause — makes every commit get its own group.
|
|
- `github.ref` ensures PRs use `refs/pull/N/merge` and branches use `refs/heads/branch-name`.
|
|
|
|
**Impact:** A new push to the same PR or branch will immediately cancel any in-progress E2E test run for that branch/PR.
|
|
|
|
### 4.2 FIX: `codecov-upload.yml` — SECONDARY OFFENDER
|
|
|
|
**File:** `.github/workflows/codecov-upload.yml`, line 21-23
|
|
|
|
**Current (broken):**
|
|
```yaml
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.ref_name }}-${{ github.run_id }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
**Fixed:**
|
|
```yaml
|
|
concurrency:
|
|
group: ${{ github.workflow }}-${{ github.ref }}
|
|
cancel-in-progress: true
|
|
```
|
|
|
|
**Rationale:**
|
|
- Remove `${{ github.run_id }}`: unique per run, completely defeats concurrency cancellation.
|
|
- Switch `github.ref_name` to `github.ref` for consistency with other workflows and to avoid name collisions between branches and tags with the same name.
|
|
|
|
**Impact:** A new push to the same branch will cancel any in-progress Codecov upload for that branch.
|
|
|
|
## 5. Workflows Without Concurrency Blocks (Review)
|
|
|
|
| Workflow | Risk | Recommendation |
|
|
|----------|------|----------------|
|
|
| `nightly-build.yml` | Low — schedule/dispatch only | **Optional**: Add `group: ${{ github.workflow }}` with `cancel-in-progress: false` |
|
|
| `supply-chain-verify.yml` | Low — schedule/dispatch/workflow_run | **Optional**: Add `group: ${{ github.workflow }}-${{ github.ref }}` with `cancel-in-progress: true` |
|
|
| `update-geolite2.yml` | Negligible — weekly schedule | No action needed |
|
|
| `gh_cache_cleanup.yml` | Negligible — manual only | No action needed |
|
|
| `container-prune.yml` | Low — PR + weekly schedule | **Optional**: Add concurrency for PR trigger runs |
|
|
|
|
## 6. Workflow Call Interaction Analysis
|
|
|
|
`e2e-tests-split.yml` defines `workflow_call` inputs, meaning it can be invoked by other workflows as a reusable workflow. However:
|
|
|
|
- **No workflow in the repository currently calls it via `uses:`**.
|
|
- References found in `nightly-build.yml` (line 104) and `weekly-nightly-promotion.yml` (lines 83, 443) are JavaScript code within `actions/github-script` steps that *monitor* workflow run status — they do not invoke `e2e-tests-split.yml` as a reusable workflow.
|
|
- The `pull_request` trigger on `e2e-tests-split.yml` is the main trigger that causes the queueing problem.
|
|
|
|
**Important note about `workflow_call` concurrency**: When a workflow is called via `workflow_call`, the concurrency block in the **called** workflow is evaluated in the caller's context. The simplified group (`${{ github.workflow }}-${{ github.ref }}`) works correctly in both direct-trigger and `workflow_call` contexts.
|
|
|
|
## 7. Risk Assessment
|
|
|
|
### Workflows Where We Should NOT Change Concurrency
|
|
|
|
| Workflow | Reason |
|
|
|----------|--------|
|
|
| `release-goreleaser.yml` | Releases must complete — canceling mid-publish could leave artifacts broken |
|
|
| `auto-versioning.yml` | Version bumps must complete atomically |
|
|
| `propagate-changes.yml` | Branch synchronization must complete |
|
|
| `docs.yml` (Pages deploy) | GitHub Pages deployment should not be interrupted |
|
|
| `weekly-nightly-promotion.yml` | Promotion PR creation must finish cleanly |
|
|
| `security-weekly-rebuild.yml` | Security rebuild must complete for compliance |
|
|
| `docs-to-issues.yml` | Issue creation should not be interrupted |
|
|
| `create-labels.yml` | Manual-only, singleton |
|
|
| `renovate.yml` | Dependency updates should complete |
|
|
| `caddy-major-monitor.yml` | Monitoring check must complete |
|
|
| `auto-add-to-project.yml` | Issue/PR project assignment must complete |
|
|
|
|
**All of these are correctly configured. Do not modify them.**
|
|
|
|
### Risks of the Proposed Fix
|
|
|
|
| Risk | Severity | Mitigation |
|
|
|------|----------|-----------|
|
|
| In-flight E2E results discarded on cancel | Low | Desired behavior — stale results for an old commit are useless |
|
|
| Codecov partial upload on cancel | Low | Codecov handles partial uploads gracefully; next full run uploads complete data |
|
|
| `workflow_call` context mismatch if caller added later | None | Fix uses standard pattern that works in both direct and called contexts |
|
|
|
|
## 8. Acceptance Criteria
|
|
|
|
- [ ] `e2e-tests-split.yml` concurrency group does not contain SHA or run_id
|
|
- [ ] `codecov-upload.yml` concurrency group does not contain SHA or run_id
|
|
- [ ] Pushing a new commit to a PR cancels any in-progress E2E test run on that PR
|
|
- [ ] Pushing a new commit to a PR cancels any in-progress Codecov upload on that PR
|
|
- [ ] All other 34 workflows remain unchanged
|
|
- [ ] No workflows with `cancel-in-progress: false` are modified
|
|
|
|
## 9. Implementation Plan
|
|
|
|
### Phase 1: Fix (Single PR)
|
|
|
|
| Task | File | Line(s) | Change |
|
|
|------|------|---------|--------|
|
|
| 1 | `.github/workflows/e2e-tests-split.yml` | 97-99 | Replace concurrency group: remove SHA, simplify to `${{ github.workflow }}-${{ github.ref }}` |
|
|
| 2 | `.github/workflows/codecov-upload.yml` | 21-23 | Replace concurrency group: remove `run_id`, simplify to `${{ github.workflow }}-${{ github.ref }}` |
|
|
|
|
### Phase 2: Validate
|
|
|
|
1. Push to a test branch, wait for workflows to start
|
|
2. Push again to the same branch within 60 seconds
|
|
3. Verify the first E2E run is labeled "cancelled" in the Actions tab
|
|
4. Verify first Codecov run is labeled "cancelled"
|
|
5. Verify all other workflows are unaffected
|
|
|
|
## 10. PR Slicing Strategy
|
|
|
|
**Decision: Single PR**
|
|
|
|
**Rationale:**
|
|
- Config-only change: 2 YAML files, ~4 lines changed total
|
|
- No code changes, no build changes, no runtime impact
|
|
- Changes are atomic and self-contained
|
|
- Rollback is a single revert commit
|
|
- Risk is minimal — worst case restores the existing (broken) behavior
|
|
|
|
**PR Scope:**
|
|
|
|
| ID | Scope | Files | Dependencies | Validation Gate |
|
|
|----|-------|-------|--------------|----------------|
|
|
| PR-1 | Fix concurrency groups | `e2e-tests-split.yml`, `codecov-upload.yml` | None | Push 2 commits in quick succession; confirm first run is canceled |
|
|
|
|
**Rollback:** `git revert <commit-sha>` — restores prior concurrency groups immediately.
|
|
|
|
## 11. Summary
|
|
|
|
| Metric | Value |
|
|
|--------|-------|
|
|
| Total workflows audited | 36 |
|
|
| Workflows with concurrency blocks | 31 |
|
|
| Workflows without concurrency blocks | 5 |
|
|
| **Workflows with SHA/run_id bug** | **2** |
|
|
| Workflows with intentional no-cancel | 11 |
|
|
| Workflows correctly configured | 18 |
|
|
| Files to change | 2 |
|
|
| Lines to change | ~4 |
|