## 1. Introduction

### Overview

`Nightly Build & Package` currently has two active workflow failures that must
be fixed together in one minimal-scope PR:

1. SBOM generation failure in `Generate SBOM` (Syft fetch/version resolution).
2. Dispatch failure from the nightly workflow with
   `Missing required input 'pr_number' not provided`.

This plan hard-locks runtime code changes to
`.github/workflows/nightly-build.yml` only.

### Objectives

1. Restore deterministic nightly SBOM generation.
2. Enforce strict default-deny dispatch behavior for non-PR nightly events
   (`schedule`, `workflow_dispatch`).
3. Preserve GitHub Actions best practices: pinned SHAs, least privilege, and
   deterministic behavior.
4. Keep both current failures in a single scope and do not pivot to unrelated
   fixes.
5. Remove `security-pr.yml` from the nightly dispatch list unless a hard
   requirement is proven.

## 2. Research Findings

### 2.1 Primary Workflow Scope

File analyzed: `.github/workflows/nightly-build.yml`

Relevant areas:

1. Job `build-and-push-nightly`, step `Generate SBOM` uses
   `anchore/sbom-action@17ae1740179002c89186b61233e0f892c3118b11`.
2. Job `trigger-nightly-validation` dispatches downstream workflows using
   `actions/github-script` and currently includes `security-pr.yml`.

### 2.2 Root Cause: Missing `pr_number`

Directly related called workflow: `.github/workflows/security-pr.yml`. Its
trigger contract includes:

- `workflow_dispatch.inputs.pr_number.required: true`

Impact:

1. The nightly dispatcher invokes `createWorkflowDispatch` for `security-pr.yml`
   without `pr_number`.
2. For nightly non-PR contexts (scheduled or manual nightly runs), there is no
   natural PR number, so the dispatch fails by contract.
3. PR lookup by nightly head SHA is not a valid safety mechanism for nightly
   non-PR trigger types (a head SHA can map to zero, one, or several PRs) and
   must not be relied on for `schedule` or `workflow_dispatch`.

### 2.3 Decision: Remove PR-Only Workflow from Nightly Dispatch List

Assessment result:

1. No hard requirement was found that requires the nightly workflow to dispatch
   `security-pr.yml`.
2. `security-pr.yml` is contractually PR/manual-oriented because it requires
   `pr_number`.
3. Keeping it in the nightly fan-out adds avoidable failure risk and encourages
   synthesizing an invalid PR context.

Decision:

1. Remove `security-pr.yml` from the nightly dispatch list.
2. Keep strict default-deny guard logic to prevent accidental future dispatch
   from non-PR events.

Risk reduction from removal:

1. Eliminates the `pr_number` contract mismatch in nightly non-PR events.
2. Removes a class of false failures from nightly reliability metrics.
3. Simplifies dispatcher logic and review surface.

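
The default-deny rule in the decision above can be modeled as a single predicate. The sketch below is written in shell to match the validation commands in this plan. The real dispatcher step uses `actions/github-script` (JavaScript), so treat this as a model of the logic rather than the implementation. Several details are assumptions, not names from the workflow: the names `PR_ONLY_WORKFLOWS`, `is_pr_only` and `dispatch_allowed`, and the treatment of `pull_request`/`pull_request_target` as the only PR-bearing events.

```bash
# Hypothetical model of the nightly default-deny rule; the real step is
# actions/github-script (JavaScript), not shell.

# PR-only workflows: their workflow_dispatch contract requires pr_number.
PR_ONLY_WORKFLOWS="security-pr.yml"

# is_pr_only WORKFLOW: exit 0 if WORKFLOW is in the PR-only set.
is_pr_only() {
  case " $PR_ONLY_WORKFLOWS " in
    *" $1 "*) return 0 ;;
  esac
  return 1
}

# dispatch_allowed WORKFLOW EVENT PR_NUMBER
# Workflows outside the PR-only set are always allowed.
# PR-only workflows are allowed ONLY for an assumed PR-bearing event with a
# concrete positive numeric pr_number. Everything else is denied by default;
# nothing is looked up by SHA and nothing is synthesized.
dispatch_allowed() {
  workflow=$1
  event=$2
  pr_number=$3

  is_pr_only "$workflow" || return 0

  case "$event" in
    pull_request | pull_request_target) ;;  # assumed PR-bearing events
    *) return 1 ;;                          # schedule, workflow_dispatch, ...
  esac

  case "$pr_number" in
    '' | 0 | *[!0-9]*) return 1 ;;          # missing, zero, or non-numeric
  esac
  return 0
}
```

For example, `dispatch_allowed security-pr.yml schedule ""` exits non-zero, while `dispatch_allowed codeql.yml schedule ""` exits zero.
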
### 2.4 Root Cause: SBOM/Syft Fetch Failure

Observed behavior indicates Syft retrieval/version resolution instability during
the SBOM step. The current `nightly-build.yml` sets no explicit `syft-version`.
As a result, the Syft version is not pinned at the workflow layer and is left to
the action's own resolution.

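
Pinning at the workflow layer means the Syft version is a literal, not something resolved at run time. The sketch below shows what a pinned CLI fetch for the fallback path could look like. It assumes Syft's GitHub release assets are named `syft_<version>_<os>_<arch>.tar.gz` under the `v<version>` tag. `syft_asset_url` and `verify_sha256` are hypothetical helper names. The expected digest must be copied from the official release and committed, not computed at run time.

```bash
# Hypothetical helpers for a pinned Syft CLI fetch (fallback path).
# Assumed asset layout:
#   https://github.com/anchore/syft/releases/download/v<ver>/syft_<ver>_<os>_<arch>.tar.gz

# Bare version; the release tag adds the leading "v" (v1.42.1). Never "latest".
SYFT_VERSION="1.42.1"

# syft_asset_url VERSION [OS] [ARCH]: print the release tarball URL.
syft_asset_url() {
  version=$1
  os=${2:-linux}
  arch=${3:-amd64}
  printf 'https://github.com/anchore/syft/releases/download/v%s/syft_%s_%s_%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# verify_sha256 FILE EXPECTED_HEX: exit 0 only if FILE's SHA-256 matches.
verify_sha256() {
  file=$1
  expected=$2
  [ -f "$file" ] || return 1
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}
```

In a fallback step these might be combined as `curl -sSfL -o syft.tar.gz "$(syft_asset_url "$SYFT_VERSION")"` followed by `verify_sha256 syft.tar.gz "$SYFT_SHA256" || exit 1` before extracting. `SYFT_SHA256` is a placeholder for the pinned digest.
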
### 2.5 Constraints and Policy Alignment

1. Keep action SHAs pinned.
2. Keep permission scopes unchanged unless required.
3. Keep the change minimal and limited to the nightly workflow path.

## 3. Technical Specification (EARS)

1. WHEN nightly runs from `schedule` or `workflow_dispatch`, THE SYSTEM SHALL
   enforce strict default-deny for PR-only dispatches.

2. WHEN nightly runs from `schedule` or `workflow_dispatch`, THE SYSTEM SHALL
   NOT perform PR-number lookup from the nightly head SHA.

3. WHEN evaluating downstream nightly dispatches, THE SYSTEM SHALL exclude
   `security-pr.yml` from nightly dispatch targets unless a hard requirement
   is explicitly introduced and documented.

4. IF `security-pr.yml` is reintroduced in the future, THEN THE SYSTEM SHALL
   dispatch it ONLY when a real PR context includes a concrete `pr_number`,
   and SHALL deny by default in all other contexts.

5. WHEN `Generate SBOM` runs in the nightly workflow, THE SYSTEM SHALL use a
   deterministic two-stage strategy in the same PR scope:
   - Primary path: `syft-version: v1.42.1` via `anchore/sbom-action`.
   - In-PR fallback path: explicit Syft CLI installation and generation with a
     pinned version and checksum, followed by hard verification.

6. IF primary SBOM generation fails or does not produce a valid file, THEN THE
   SYSTEM SHALL execute fallback generation and SHALL fail the job when the
   fallback also fails or output validation fails.

7. THE SYSTEM SHALL keep GitHub Actions pinned to immutable SHAs and SHALL NOT
   broaden token permissions for this fix.

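
Items 5 and 6 reduce to one decision after the primary step: run the fallback if the step did not succeed or left no usable file. This sketch assumes the primary step's result is passed in from `steps.<id>.outcome`. With `continue-on-error: true`, GitHub Actions records the real result in `outcome` (before `continue-on-error` is applied). It forces `conclusion` to `success`, so `outcome` is the value to check. `needs_fallback` is a hypothetical name. JSON validity is left to the mandatory verification step.

```bash
# needs_fallback OUTCOME SBOM_FILE
# Hypothetical decision helper. OUTCOME is the primary step's result before
# continue-on-error is applied (success, failure, cancelled, skipped).
# Exit 0 means "run the pinned-CLI fallback"; exit 1 means "primary output is usable".
needs_fallback() {
  outcome=$1
  sbom_file=$2

  [ "$outcome" = "success" ] || return 0   # primary failed or did not run
  [ -s "$sbom_file" ] || return 0          # output missing or zero bytes
  return 1
}
```
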
## 4. Exact Implementation Edits

### 4.1 `.github/workflows/nightly-build.yml`

#### Edit A: Harden downstream dispatch for non-PR triggers

Location: job `trigger-nightly-validation`, step
`Dispatch Missing Nightly Validation Workflows`.

Exact change intent:

1. Remove `security-pr.yml` from the nightly dispatch list.
2. Keep dispatch for `e2e-tests-split.yml`, `codecov-upload.yml`,
   `supply-chain-verify.yml`, and `codeql.yml` unchanged.
3. Add explicit guard comments and logging stating that non-PR nightly events
   are default-deny for PR-only workflows.
4. Explicitly prohibit PR number synthesis and PR lookup from the nightly SHA
   for `schedule` and `workflow_dispatch`.

Implementation shape (script-level):

1. Keep the workflow list explicit.
2. Keep a local denylist (set) of PR-only workflows and ensure they are never
   dispatched from nightly non-PR events.
3. Synthesize no PR-number inputs from the nightly SHA or any non-PR context.
4. Execute no PR lookup calls for nightly non-PR events.

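
The implementation shape above amounts to an explicit target list filtered through a PR-only denylist, with a log line for every denial. This is a shell model of that shape; the actual step remains `actions/github-script`. The names `NIGHTLY_TARGETS`, `PR_ONLY_DENYLIST` and `plan_nightly_dispatch` are hypothetical. The four targets are the ones this plan keeps. `security-pr.yml` appears only in the denylist, so an accidental re-addition to the target list would still be skipped.

```bash
# Hypothetical shell model of the Edit A dispatch shape.
# Explicit targets kept by this plan; security-pr.yml is intentionally absent.
NIGHTLY_TARGETS="e2e-tests-split.yml codecov-upload.yml supply-chain-verify.yml codeql.yml"

# PR-only workflows. Nightly runs never carry a PR context, so these are
# denied for every event this dispatcher sees.
PR_ONLY_DENYLIST="security-pr.yml"

# plan_nightly_dispatch EVENT [TARGETS]
# Prints each workflow that may be dispatched (one per line, stdout) and logs
# each default-deny decision to stderr. No PR lookup, no pr_number synthesis.
plan_nightly_dispatch() {
  event=$1
  targets=${2:-$NIGHTLY_TARGETS}

  for wf in $targets; do
    case " $PR_ONLY_DENYLIST " in
      *" $wf "*)
        echo "default-deny: skipping PR-only workflow $wf for event '$event'" >&2
        continue
        ;;
    esac
    echo "$wf"
  done
}
```
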
#### Edit B: Stabilize Syft source in `Generate SBOM`

Location: job `build-and-push-nightly`, step `Generate SBOM`.

Exact change intent:

1. Keep the existing pinned `anchore/sbom-action` SHA unless evidence shows
   that SHA itself is the failure source.
2. Add explicit `syft-version: v1.42.1` in the `with:` block as the primary
   pin.
3. Set the primary SBOM step to `continue-on-error: true` so the in-PR fallback
   can run deterministically.
4. Add a fallback step gated on primary step failure OR missing/invalid output:
   - Install Syft CLI `v1.42.1` from the official release with checksum
     validation.
   - Generate `sbom-nightly.json` via the CLI.
5. Add a mandatory verification step (no `continue-on-error`) with explicit
   pass/fail criteria:
   - `sbom-nightly.json` exists.
   - File size is greater than 0 bytes.
   - JSON parses successfully (`jq empty`).
   - Expected top-level fields exist for the selected format.
6. If verification fails, the job fails. The SBOM step cannot pass silently
   without a generated artifact.

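
The mandatory verification step's checks can be sketched in the order listed. The sketch assumes `jq` is on the runner, which the plan already relies on for `jq empty`. The plan does not name the SBOM format, so "expected top-level fields" is an assumption here: `spdxVersion` and `packages` for SPDX JSON, or `bomFormat` and `specVersion` for CycloneDX JSON. Pick the branch that matches the format actually configured. `verify_sbom` is a hypothetical name.

```bash
# verify_sbom FILE [FORMAT]
# Hypothetical mandatory check; a non-zero exit fails the job (no continue-on-error).
# FORMAT: spdx-json (assumed default) or cyclonedx-json.
verify_sbom() {
  file=$1
  format=${2:-spdx-json}

  if [ ! -f "$file" ]; then
    echo "SBOM verification failed: $file does not exist" >&2
    return 1
  fi
  if [ ! -s "$file" ]; then
    echo "SBOM verification failed: $file is empty" >&2
    return 1
  fi
  if ! jq empty "$file" >/dev/null 2>&1; then
    echo "SBOM verification failed: $file is not valid JSON" >&2
    return 1
  fi

  # Assumed top-level fields per format.
  case "$format" in
    spdx-json) required='has("spdxVersion") and has("packages")' ;;
    cyclonedx-json) required='has("bomFormat") and has("specVersion")' ;;
    *)
      echo "SBOM verification failed: unknown format $format" >&2
      return 1
      ;;
  esac

  # jq -e exits non-zero when the filter yields false or null.
  if ! jq -e "$required" "$file" >/dev/null 2>&1; then
    echo "SBOM verification failed: missing top-level fields for $format" >&2
    return 1
  fi
  echo "SBOM verification passed: $file ($format)"
}
```

In a `run:` step this would be called as `verify_sbom sbom-nightly.json spdx-json || exit 1`.
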
### 4.2 Scope Lock

1. No edits to `.github/workflows/security-pr.yml` in this plan.
2. The `security-pr.yml` contract remains unchanged:
   `workflow_dispatch.inputs.pr_number.required: true`.

## 5. Reconfirmation: Non-Target Files

No changes required:

1. `.gitignore`
2. `codecov.yml`
3. `.dockerignore`
4. `Dockerfile`

Rationale: both failures are workflow orchestration issues, not source-ignore,
coverage policy, Docker context, or image build recipe issues.

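
This non-target list could be enforced mechanically during review. The sketch below reads changed paths on stdin and fails if any path falls outside the files PR-1 declares in Section 10. The `check_pr_scope` name, its allowlist and the `git diff --name-only origin/main...HEAD | check_pr_scope` invocation are assumptions about how a reviewer might run it. It is not an existing check in this repository.

```bash
# check_pr_scope: read changed file paths (one per line) on stdin.
# Exit 0 only if every path is in the PR-1 allowlist; print offenders to stderr.
# Hypothetical helper; the allowlist mirrors the PR-1 "Files" list.
check_pr_scope() {
  status=0
  # "|| [ -n ... ]" also handles a final line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    case "$path" in
      .github/workflows/nightly-build.yml | docs/plans/current_spec.md) ;;
      *)
        echo "out of scope: $path" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}
```
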
## 6. Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| `security-pr.yml` accidentally dispatched in non-PR mode | Low | Remove from nightly dispatch list and enforce default-deny comments/guards |
| Primary Syft acquisition fails (`v1.42.1`) | Medium | Execute deterministic in-PR fallback with pinned checksum and hard output verification |
| SBOM step appears green without real artifact | High | Mandatory verification step with explicit file/JSON checks and hard fail |
| Action SHA update introduces side effects | Medium | Limit SHA change to `Generate SBOM` step only and validate end-to-end nightly path |
| Over-dispatch/under-dispatch in validation job | Low | Preserve existing dispatch logic for all non-PR-dependent workflows |

## 7. Rollback Plan

1. Revert runtime behavior changes in `.github/workflows/nightly-build.yml`:
   - `trigger-nightly-validation` dispatch logic
   - `Generate SBOM` primary + fallback + verification sequence
2. Re-run the nightly workflow manually (`workflow_dispatch`) to verify the
   previous baseline runtime behavior.

Rollback scope: runtime workflow behavior only in
`.github/workflows/nightly-build.yml`. Documentation updates are not part of the
runtime rollback.

## 8. Validation Plan

### 8.1 Static Validation

```bash
cd /projects/Charon
pre-commit run actionlint --files .github/workflows/nightly-build.yml
```

### 8.2 Behavioral Validation (Nightly non-PR)

```bash
gh workflow run nightly-build.yml --ref nightly -f reason="nightly dual-fix validation" -f skip_tests=true
# The new run can take a few seconds to appear; repeat the list if it is not there yet.
gh run list --workflow "Nightly Build & Package" --branch nightly --event workflow_dispatch --limit 1
# Block until the run finishes; logs are only available for completed runs.
gh run watch <run-id> --exit-status
gh run view <run-id> --json databaseId,headSha,event,status,conclusion,createdAt
gh run view <run-id> --log
```

Expected outcomes:

1. `Generate SBOM` succeeds through the primary path or the deterministic
   fallback, and `sbom-nightly.json` is uploaded.
2. The dispatch step does not attempt `security-pr.yml` from the nightly run.
3. No `Missing required input 'pr_number' not provided` error appears.
4. Both targeted nightly failures (the `pr_number` dispatch failure and the
   Syft/SBOM failure) are resolved in the same run.

### 8.3 Explicit Negative Dispatch Verification (Run-Scoped/Time-Scoped)

Verify that `security-pr.yml` was not dispatched by this specific nightly run,
using time scope and actor scope (not SHA only):

```bash
RUN_JSON=$(gh run view <nightly-run-id> --json databaseId,createdAt,updatedAt,event,headBranch)
START=$(echo "$RUN_JSON" | jq -r '.createdAt')
END=$(echo "$RUN_JSON" | jq -r '.updatedAt')

# gh api switches to POST when -f fields are given, so force GET for this list call.
# --paginate prints one JSON object per page; jq -s merges the pages before counting.
gh api --method GET repos/<owner>/<repo>/actions/workflows/security-pr.yml/runs \
  --paginate \
  -f event=workflow_dispatch \
  -f per_page=100 | \
  jq -s --arg start "$START" --arg end "$END" '
    [ .[].workflow_runs[]
      | select(.created_at >= $start and .created_at <= $end)
      | select(.head_branch == "nightly")
      | select(.triggering_actor.login == "github-actions[bot]")
    ] | length'
```

Expected result: `0`

### 8.4 Positive Validation: Manual `security-pr.yml` Dispatch Still Works

Run a manual dispatch with a valid PR number and verify successful start:

```bash
gh workflow run security-pr.yml --ref <pr-branch> -f pr_number=<valid-pr-number>
gh run list --workflow "Security Scan (PR)" --limit 5 \
  --json databaseId,event,status,conclusion,createdAt,headBranch
gh run view <security-pr-run-id> --log
```

Expected results:

1. Workflow is accepted (no missing-input validation errors).
2. Run event is `workflow_dispatch`.
3. Run completes according to existing workflow behavior.

### 8.5 Contract Validation (No Contract Change)

1. `security-pr.yml` contract remains PR/manual specific and unchanged.
2. Nightly non-PR paths do not consume or synthesize `pr_number`.

## 9. Acceptance Criteria

1. `Nightly Build & Package` no longer fails in `Generate SBOM` due to Syft
   fetch/version resolution, with deterministic in-PR fallback.
2. Nightly validation dispatch no longer fails with missing required
   `pr_number`.
3. For non-PR nightly triggers (`schedule`/`workflow_dispatch`), PR-only
   dispatch of `security-pr.yml` is default-deny and not attempted from nightly
   dispatch targets.
4. Workflow remains SHA-pinned and permissions are not broadened.
5. Validation evidence includes explicit run-scoped/time-scoped proof that
   `security-pr.yml` was not dispatched by the tested nightly run.
6. No changes made to `.gitignore`, `codecov.yml`, `.dockerignore`, or
   `Dockerfile`.
7. Manual dispatch of `security-pr.yml` with valid `pr_number` is validated to
   still work.
8. SBOM step fails hard when neither primary nor fallback path produces a valid
   SBOM artifact.

## 10. PR Slicing Strategy

### Decision

Single PR.

### Trigger Reasons

1. Changes are tightly coupled inside one workflow path.
2. Shared validation path (nightly run) verifies both fixes together.
3. Rollback safety is high with one-file revert.

### Ordered Slices

#### PR-1: Nightly Dual-Failure Workflow Fix

Scope:

1. Runtime changes limited to `.github/workflows/nightly-build.yml`.
2. SBOM Syft stabilization with explicit tag pin + fallback rule.
3. Remove `security-pr.yml` from the nightly dispatch list and enforce strict
   default-deny semantics for non-PR nightly events.

Files:

1. `.github/workflows/nightly-build.yml`
2. `docs/plans/current_spec.md`

Dependencies:

1. `security-pr.yml` keeps its required `workflow_dispatch` `pr_number` contract.

Validation gates:

1. `actionlint` passes.
2. A manually dispatched nightly run passes both targeted failure points.
3. SBOM artifact upload succeeds through the primary path or the fallback path.
4. An explicit run-scoped/time-scoped negative check confirms zero
   bot-triggered `security-pr.yml` dispatches during the nightly run window.
5. A positive manual dispatch check with a valid `pr_number` succeeds.

Rollback and contingency:

1. Revert PR-1.
2. If both primary and fallback Syft paths fail, treat this as a blocking
   regression and do not merge until the generation criteria pass.

## 11. Complexity Estimate

1. Implementation complexity: Low.
2. Validation complexity: Medium (requires workflow run completion).
3. Blast radius: Low (single workflow file, no application code changes).
|