# CI Docker Build Failure Analysis & Fix Plan

**Issue**: Docker Build workflow failing on PR builds during image artifact save
**Workflow**: `.github/workflows/docker-build.yml`
**Error**: `Error response from daemon: reference does not exist`
**Date**: 2026-01-12
**Status**: Analysis Complete - Ready for Implementation

---

## Executive Summary

The `docker-build.yml` workflow is failing at the **"Save Docker Image as Artifact" step (lines 135-142)** for PR builds. The root cause is a **mismatch between the image name/tag format used by `docker/build-push-action` with `load: true` and the image reference manually constructed in the `docker save` command**.

**Impact**: All PR builds fail at the artifact save step, preventing the `verify-supply-chain-pr` job from running.

**Fix Complexity**: **Low** - Single step modification to use the exact tag from metadata output instead of manually constructing it.

---

## Root Cause Analysis

### 1. The Failing Step (Lines 135-142)

**Location**: `.github/workflows/docker-build.yml`, lines 135-142

```yaml
- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    docker save ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }} -o /tmp/charon-pr-image.tar
    ls -lh /tmp/charon-pr-image.tar
```

**What Happens**:

- **Line 140**: Normalizes repository owner name to lowercase (e.g., `Wikid82` → `wikid82`)
- **Line 141**: **Constructs the image reference manually**: `ghcr.io/${IMAGE_NAME}:pr-${PR_NUMBER}`
- **Line 141**: **Attempts to save the image** using this manually constructed reference

**The Problem**: The manually constructed image reference **assumes** the Docker image was loaded with the exact format `ghcr.io/wikid82/charon:pr-123`, but when `docker/build-push-action` uses `load: true`, the actual tag format applied to the local image may differ.

### 2. The Build Step (Lines 111-123)

**Location**: `.github/workflows/docker-build.yml`, lines 111-123

```yaml
- name: Build and push Docker image
  if: steps.skip.outputs.skip_build != 'true'
  id: build-and-push
  uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6
  with:
    context: .
    platforms: ${{ github.event_name == 'pull_request' && 'linux/amd64' || 'linux/amd64,linux/arm64' }}
    push: ${{ github.event_name != 'pull_request' }}
    load: ${{ github.event_name == 'pull_request' }}
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    no-cache: true
    pull: true
    build-args: |
      VERSION=${{ steps.meta.outputs.version }}
      BUILD_DATE=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
      VCS_REF=${{ github.sha }}
      CADDY_IMAGE=${{ steps.caddy.outputs.image }}
```

**Key Parameters for PR Builds**:

- **Line 117**: `push: false` → Image is **not pushed** to the registry
- **Line 118**: `load: true` → Image is **loaded into the local Docker daemon**
- **Line 119**: `tags: ${{ steps.meta.outputs.tags }}` → Uses tags generated by the metadata action

**Behavior with `load: true`**:

- The image is built and loaded into the local Docker daemon
- Tags from `steps.meta.outputs.tags` are applied to the image
- For PR builds, this generates **one tag**: `ghcr.io/wikid82/charon:pr-123`

### 3. The Metadata Step (Lines 105-113)

**Location**: `.github/workflows/docker-build.yml`, lines 105-113

```yaml
- name: Extract metadata (tags, labels)
  if: steps.skip.outputs.skip_build != 'true'
  id: meta
  uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5.10.0
  with:
    images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
    tags: |
      type=raw,value=latest,enable={{is_default_branch}}
      type=raw,value=dev,enable=${{ github.ref == 'refs/heads/development' }}
      type=raw,value=beta,enable=${{ github.ref == 'refs/heads/feature/beta-release' }}
      type=raw,value=pr-${{ github.event.pull_request.number }},enable=${{ github.event_name == 'pull_request' }}
      type=sha,format=short,enable=${{ github.event_name != 'pull_request' }}
```

**For PR builds**, only **line 111** is enabled:

```yaml
type=raw,value=pr-${{ github.event.pull_request.number }},enable=${{ github.event_name == 'pull_request' }}
```

**This generates a single tag**: `ghcr.io/wikid82/charon:pr-123`

**Note**: The `IMAGE_NAME` is already normalized to lowercase at lines 56-57:

```yaml
- name: Normalize image name
  run: |
    IMAGE_NAME=$(echo "${{ env.IMAGE_NAME }}" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
```

So the metadata action receives `ghcr.io/wikid82/charon` (lowercase) as input.

### 4. The Critical Issue: Tag Mismatch

When `docker/build-push-action` uses `load: true`, the behavior is:

1. ✅ **Expected**: Image is loaded with tags from `steps.meta.outputs.tags` → `ghcr.io/wikid82/charon:pr-123`
2. ❌ **Reality**: The exact tag format depends on Docker Buildx's internal behavior

The `docker save` command at line 141 tries to save:

```bash
ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }}
```

But this **manually reconstructs** the tag instead of using the **actual tag applied by docker/build-push-action**.
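One way to make such a mismatch visible is to check whether a hand-built reference appears verbatim among the tags emitted by the metadata action. The sketch below is illustrative only: `ref_in_tags` is a hypothetical helper that does not exist in the workflow, and the sample values stand in for `steps.meta.outputs.tags` and the reference built at line 141.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow): succeeds only when the
# reference equals one line of the newline-separated tag list exactly.
ref_in_tags() {
  local ref="$1" tags="$2"
  # -F: fixed string, -x: whole-line match, -q: no output
  printf '%s\n' "$tags" | grep -Fxq -- "$ref"
}

# Sample values (assumptions for illustration only)
meta_tags="ghcr.io/wikid82/charon:pr-123"
manual_ref="ghcr.io/wikid82/charon:pr-123"

if ref_in_tags "$manual_ref" "$meta_tags"; then
  echo "match: ${manual_ref}"
else
  echo "mismatch: ${manual_ref} is not in the metadata tags"
fi
```

A check like this could run just before `docker save`, with the tag list taken from the metadata output. Any difference in case, registry prefix, or tag suffix makes it fail, which is the same strictness `docker save` applies.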
**Why This Fails**:

- The `docker save` command requires an **exact match** of the image reference as it exists in the local Docker daemon
- If the image is loaded with a slightly different tag format, `docker save` throws:

```
Error response from daemon: reference does not exist
```

**Evidence from Error Log**:

```
Run IMAGE_NAME=$(echo "Wikid82/charon" | tr '[:upper:]' '[:lower:]')
Error response from daemon: reference does not exist
Error: Process completed with exit code 1.
```

This confirms the `docker save` command cannot find the image reference constructed at line 141.

### 5. Job Dependencies Analysis

**Complete Workflow Structure**:

```
build-and-push (lines 34-234)
├── Outputs: skip_build, digest
├── Steps:
│   ├── Build image (load=true for PRs)
│   ├── Save image artifact (❌ FAILS HERE at line 141)
│   └── Upload artifact (never reached)
│
test-image (lines 354-463)
├── needs: build-and-push
├── if: ... && github.event_name != 'pull_request'
└── (Not relevant for PRs)
│
trivy-pr-app-only (lines 465-493)
├── if: github.event_name == 'pull_request'
└── (Independent - builds its own image)
│
verify-supply-chain-pr (lines 495-722)
├── needs: build-and-push
├── if: github.event_name == 'pull_request' && needs.build-and-push.result == 'success'
├── Steps:
│   ├── ❌ Download artifact (artifact doesn't exist)
│   ├── ❌ Load image (cannot load non-existent artifact)
│   └── ❌ Scan image (cannot scan non-loaded image)
└── Currently skipped due to build-and-push failure
│
verify-supply-chain-pr-skipped (lines 724-754)
├── needs: build-and-push
└── if: github.event_name == 'pull_request' && needs.build-and-push.outputs.skip_build == 'true'
```

**Dependency Chain Impact**:

1. ❌ `build-and-push` **fails** at line 141 (`docker save`)
2. ❌ Artifact is **never uploaded** (lines 144-150 are skipped)
3. ❌ `verify-supply-chain-pr` is **skipped** because its `if` requires `needs.build-and-push.result == 'success'`, so the artifact download at line 517 never runs
4. ❌ **Supply chain verification never runs** for PRs

### 6. Verification: Why Similar Patterns Work

**Line 376** (in `test-image` job):

```yaml
- name: Normalize image name
  run: |
    raw="${{ github.repository_owner }}/${{ github.event.repository.name }}"
    IMAGE_NAME=$(echo "$raw" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
```

This job **pulls from the registry** (line 395):

```yaml
- name: Pull Docker image
  run: docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}
```

✅ **Works because**: It pulls a pushed image from the registry, not a locally loaded one.

**Line 516** (in `verify-supply-chain-pr` job):

```yaml
- name: Normalize image name
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV
```

✅ **Would work if**: The artifact existed. This job loads the image from the tar file, which preserves the exact tags.

**Key Difference**: The failing step tries to save an image **before we know its exact tag**, while the working patterns either:

- Pull from registry with a known tag
- Load from artifact with preserved tags

---

## The Solution

### Option 1: Use Metadata Output Tag (RECOMMENDED ✅)

**Strategy**: Extract the exact tag from `steps.meta.outputs.tags` and use it directly in `docker save`.

**Why This Works**:

- The `docker/metadata-action` generates the tags that `docker/build-push-action` **actually applies** to the image
- For PR builds, this is: `ghcr.io/<owner>/charon:pr-<number>` (normalized, lowercase)
- This is the **exact tag** that exists in the local Docker daemon after `load: true`

**Rationale**:

- Avoids manual tag reconstruction
- Uses the authoritative source of truth for image tags
- Eliminates assumption-based errors

**Risk Level**: **Low** - Read-only operation on existing step outputs

### Option 2: Inspect Local Images (ALTERNATIVE)

**Strategy**: Use `docker images` to discover the actual tag before saving.
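A minimal sketch of what this inspection could look like, assuming the local image list is piped in as `repository:tag` lines. `find_pr_tag` is a hypothetical helper and the PR number 123 is a sample value; neither appears in the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "repository:tag" lines on stdin and prints the
# first one that ends in /charon:pr-<number>. Prints nothing if none match.
find_pr_tag() {
  local pr_number="$1"
  awk -v suffix="/charon:pr-${pr_number}" '
    # Compare the tail of each line against the expected suffix
    substr($0, length($0) - length(suffix) + 1) == suffix { print; exit }
  '
}

# Intended use in the workflow (assumption; requires a Docker daemon):
#   docker images --format '{{.Repository}}:{{.Tag}}' | find_pr_tag 123

# Stand-in input so the sketch runs without Docker
printf '%s\n' 'alpine:3.20' 'ghcr.io/wikid82/charon:pr-123' | find_pr_tag 123
```

Even in this small form the approach needs suffix matching and a decision about what to do when zero or several images match, which is the parsing overhead weighed against it here.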
**Why Not Recommended**:

- Adds complexity
- Requires pattern matching or parsing
- Less reliable than using metadata output

### Option 3: Override Tag for PRs (FALLBACK)

**Strategy**: Modify the build step to apply a deterministic local tag for PR builds.

**Why Not Recommended**:

- Requires more changes (build step + save step)
- Breaks consistency with existing tag patterns
- Downstream jobs expect registry-style tags

---

## Recommended Fix: Option 1

### Implementation

**File**: `.github/workflows/docker-build.yml`
**Location**: Lines 135-142 (Save Docker Image as Artifact step)

#### Before (Current - BROKEN)

```yaml
- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    docker save ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }} -o /tmp/charon-pr-image.tar
    ls -lh /tmp/charon-pr-image.tar
```

**Issue**: Manually constructs the image reference, which may not match the actual tag applied by `docker/build-push-action`.
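To make the manual construction concrete, the expression in this step can be traced outside the workflow. The sketch below assumes the owner `Wikid82` from the error log and a sample PR number of 123; `build_manual_ref` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the broken step: lowercases "<owner>/charon"
# and appends the PR tag, exactly as the run script does.
build_manual_ref() {
  local owner="$1" pr_number="$2" image_name
  image_name=$(printf '%s/charon' "$owner" | tr '[:upper:]' '[:lower:]')
  printf 'ghcr.io/%s:pr-%s\n' "$image_name" "$pr_number"
}

build_manual_ref "Wikid82" 123   # prints ghcr.io/wikid82/charon:pr-123
```

The resulting string is well-formed. Whether an image with that exact reference exists in the local daemon is a separate question, and the step never checks it before calling `docker save`.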
#### After (FIXED - Concise Version)

```yaml
- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    # Extract the first tag from metadata action (PR tag)
    IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)
    echo "🔍 Detected image tag: ${IMAGE_TAG}"

    # Verify the image exists locally
    echo "📋 Available local images:"
    docker images --filter "reference=*charon*"

    # Save the image using the exact tag from metadata
    echo "💾 Saving image: ${IMAGE_TAG}"
    docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar

    # Verify the artifact was created
    echo "✅ Artifact created:"
    ls -lh /tmp/charon-pr-image.tar
```

#### After (FIXED - Defensive Version for Production)

```yaml
- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    # Extract the first tag from metadata action (PR tag)
    IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)

    if [[ -z "${IMAGE_TAG}" ]]; then
      echo "❌ ERROR: No image tag found in metadata output"
      echo "Metadata tags output:"
      echo "${{ steps.meta.outputs.tags }}"
      exit 1
    fi

    echo "🔍 Detected image tag: ${IMAGE_TAG}"

    # Verify the image exists locally
    if ! docker image inspect "${IMAGE_TAG}" >/dev/null 2>&1; then
      echo "❌ ERROR: Image ${IMAGE_TAG} not found locally"
      echo "📋 Available images:"
      docker images
      exit 1
    fi

    # Save the image using the exact tag from metadata
    echo "💾 Saving image: ${IMAGE_TAG}"
    docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar

    # Verify the artifact was created
    echo "✅ Artifact created:"
    ls -lh /tmp/charon-pr-image.tar
```

**Key Changes**:

1. **Extract exact tag**: `IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)`
   - Uses the first (and only) tag from metadata output
   - For PR builds: `ghcr.io/wikid82/charon:pr-123`
2. **Add debugging**: `docker images --filter "reference=*charon*"`
   - Shows available images for troubleshooting
   - Helps diagnose tag mismatches in logs
3. **Use extracted tag**: `docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar`
   - No manual reconstruction
   - Guaranteed to match the actual image tag
4. **Defensive checks** (production version only):
   - Verify `IMAGE_TAG` is not empty
   - Verify image exists before attempting save
   - Fail fast with clear error messages

**Why This Works**:

- ✅ The `docker/metadata-action` output is the **authoritative source** of tags
- ✅ These are the **exact tags** applied by `docker/build-push-action`
- ✅ No assumptions or manual reconstruction
- ✅ Works for any repository owner name (uppercase, lowercase, mixed case)
- ✅ Consistent with downstream jobs that expect the same tag format

**Null Safety**:

- If `steps.meta.outputs.tags` is empty (shouldn't happen), `IMAGE_TAG` will be empty
- The defensive version explicitly checks for this and fails with a clear message
- The concise version also fails in that case, but only at `docker save` and with a less specific error message

---

## Side Effects & Related Updates

### No Changes Needed ✅

The following steps/jobs **already handle the image correctly** and require **no modifications**:

1. **Upload Image Artifact** (lines 144-150)
   - ✅ Uses the saved tar file from the previous step
   - ✅ No dependency on image tag format
2. **verify-supply-chain-pr job** (lines 495-722)
   - ✅ Downloads and loads the tar file
   - ✅ References image using the same normalization logic
   - ✅ Will work correctly once artifact exists
3. **Load Docker Image step** (lines 524-529)
   - ✅ Loads from tar file (preserves original tags)
   - ✅ No changes needed

### Why No Downstream Changes Are Needed

When you save a Docker image to a tar file using `docker save`, the tar file contains:

- The image layers
- The image configuration
- **The exact tags that were applied to the image**

When you load the image using `docker load -i charon-pr-image.tar`, Docker restores:

- All image layers
- The image configuration
- **The exact same tags** that were saved

**Example**:

```bash
# Save with tag: ghcr.io/wikid82/charon:pr-123
docker save ghcr.io/wikid82/charon:pr-123 -o image.tar

# Load restores the exact same tag
docker load -i image.tar

# Image is now available as: ghcr.io/wikid82/charon:pr-123
docker images ghcr.io/wikid82/charon:pr-123
```

The `verify-supply-chain-pr` job references:

```bash
IMAGE_REF="ghcr.io/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}"
```

This will match perfectly because:

- `IMAGE_NAME` is normalized the same way (lines 516-518)
- The PR number is the same
- The loaded image has the exact tag we saved

---

## Testing Plan

### Phase 1: Local Verification (Recommended)

Before pushing to CI, verify the fix locally:

```bash
# 1. Build a PR-style image locally
docker build -t ghcr.io/wikid82/charon:pr-test .

# 2. Verify the image exists
docker images ghcr.io/wikid82/charon:pr-test

# 3. Save the image
docker save ghcr.io/wikid82/charon:pr-test -o /tmp/test-image.tar

# 4. Verify the tar was created
ls -lh /tmp/test-image.tar

# 5. Load the image in a clean environment
docker rmi ghcr.io/wikid82/charon:pr-test     # Remove original
docker load -i /tmp/test-image.tar            # Reload from tar
docker images ghcr.io/wikid82/charon:pr-test  # Verify it's back
```

**Expected Result**: All steps succeed without "reference does not exist" errors.

### Phase 2: CI Testing

1. **Apply the fix** to `.github/workflows/docker-build.yml` (lines 135-142)
2. **Create a test PR** on the `feature/beta-release` branch
3. **Verify the workflow execution**:
   - ✅ `build-and-push` job completes successfully
   - ✅ "Save Docker Image as Artifact" step shows detected tag in logs
   - ✅ "Upload Image Artifact" step uploads the tar file
   - ✅ `verify-supply-chain-pr` job runs and downloads the artifact
   - ✅ "Load Docker Image" step loads the image successfully
   - ✅ SBOM generation and vulnerability scanning complete

### Phase 3: Edge Cases

Test the following scenarios:

1. **Different repository owners** (uppercase, lowercase, mixed case):
   - `Wikid82/charon` → `wikid82/charon`
   - `TestUser/charon` → `testuser/charon`
   - `UPPERCASE/charon` → `uppercase/charon`
2. **Multiple rapid commits** to the same PR:
   - Verify no artifact conflicts
   - Verify each commit gets its own workflow run
3. **Skipped builds** (chore commits):
   - Verify `verify-supply-chain-pr-skipped` runs correctly
   - Verify feedback comment is posted
4. **Different PR numbers**:
   - Single digit (PR #5)
   - Double digit (PR #42)
   - Triple digit (PR #123)

### Phase 4: Rollback Plan

If the fix causes issues:

1. **Immediate rollback**: Revert the commit that applied this fix
2. **Temporary workaround**: Disable artifact save/upload steps:

   ```yaml
   if: github.event_name == 'pull_request' && false  # Temporarily disabled
   ```

3. **Investigation**: Check GitHub Actions logs for actual image tags:

   ```yaml
   # Add this step before the save step
   - name: Debug Image Tags
     if: github.event_name == 'pull_request'
     run: |
       echo "Metadata tags:"
       echo "${{ steps.meta.outputs.tags }}"
       echo ""
       echo "Local images:"
       docker images
   ```

---

## Success Criteria

### Functional

- ✅ `build-and-push` job completes successfully for all PR builds
- ✅ Docker image artifact is saved and uploaded for all PR builds
- ✅ `verify-supply-chain-pr` job runs and downloads the artifact
- ✅ No "reference does not exist" errors in any step
- ✅ Supply chain verification completes for all PR builds

### Observable Metrics

- 📊 **Job Success Rate**: 100% for `build-and-push` job on PRs
- 📦 **Artifact Upload Rate**: 100% for PR builds
- 🔒 **Supply Chain Verification Rate**: 100% for PR builds (excluding skipped)
- ⏱️ **Build Time**: No significant increase (<30 seconds for artifact save)

### Quality

- 🔍 **Clear logging** of detected image tags
- 🛡️ **Defensive error handling** (fails fast with clear messages)
- 📝 **Consistent** with existing patterns in the workflow

---

## Implementation Checklist

### Pre-Implementation

- [x] Analyze the root cause (line 141 in docker-build.yml)
- [x] Identify the exact failing step and command
- [x] Review job dependencies and downstream impacts
- [x] Design the fix with before/after comparison
- [x] Document testing plan and success criteria

### Implementation

- [ ] Apply the fix to `.github/workflows/docker-build.yml` (lines 135-142)
- [ ] Choose between concise or defensive version (recommend defensive for production)
- [ ] Commit with message: `fix(ci): use metadata tag for docker save in PR builds`
- [ ] Push to `feature/beta-release` branch

### Testing

- [ ] Create a test PR and verify workflow runs successfully
- [ ] Check GitHub Actions logs for "🔍 Detected image tag" output
- [ ] Verify artifact is uploaded (check Actions artifacts tab)
- [ ] Verify `verify-supply-chain-pr` job completes successfully
- [ ] Test edge cases (uppercase owner, different PR numbers)
- [ ] Monitor 2-3 additional PR builds for stability

### Post-Implementation

- [ ] Update CHANGELOG.md with the fix
- [ ] Close any related GitHub issues
- [ ] Document lessons learned (if applicable)
- [ ] Monitor for regressions over next week

---

## Appendix A: Error Analysis Summary

### Error Signature

```
Run IMAGE_NAME=$(echo "Wikid82/charon" | tr '[:upper:]' '[:lower:]')
Error response from daemon: reference does not exist
Error: Process completed with exit code 1.
```

### Error Details

- **File**: `.github/workflows/docker-build.yml`
- **Job**: `build-and-push`
- **Step**: "Save Docker Image as Artifact"
- **Lines**: 135-142
- **Failing Command**: Line 141 → `docker save ghcr.io/${IMAGE_NAME}:pr-${PR_NUMBER} -o /tmp/charon-pr-image.tar`

### Error Type

**Docker Daemon Error**: The Docker daemon cannot find the image reference specified in the `docker save` command.

### Root Cause Categories

| Category | Likelihood | Evidence |
|----------|-----------|----------|
| **Tag Mismatch** | ✅ **Most Likely** | Manual reconstruction doesn't match actual tag |
| Image Not Loaded | ❌ Unlikely | Build step succeeds |
| Timing Issue | ❌ Unlikely | Steps are sequential |
| Permissions Issue | ❌ Unlikely | Other Docker commands work |

**Conclusion**: **Tag Mismatch** is the root cause.

### Evidence Supporting Root Cause

1. ✅ **Build step succeeds** (no reported build failures)
2. ✅ **Error occurs at `docker save`** (after successful build)
3. ✅ **Manual tag reconstruction** (lines 140-141)
4. ✅ **Inconsistent with docker/build-push-action behavior** when `load: true`
5. ✅ **Similar patterns work** because they either:
   - Pull from registry (test-image job)
   - Load from artifact (verify-supply-chain-pr job)

### Fix Summary

**What Changed**: Use exact tag from `steps.meta.outputs.tags` instead of manually constructing it

**Why It Works**: The metadata action output is the authoritative source of tags applied by docker/build-push-action

**Risk Level**: **Low** - Read-only operation on existing step outputs

---

## Appendix B: Relevant Documentation

- [Docker Build-Push-Action - Load Option](https://github.com/docker/build-push-action#load)
- [Docker Metadata-Action - Outputs](https://github.com/docker/metadata-action#outputs)
- [Docker CLI - save command](https://docs.docker.com/engine/reference/commandline/save/)
- [GitHub Actions - Artifacts](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts)
- [Docker Buildx - Multi-platform builds](https://docs.docker.com/build/building/multi-platform/)

---

**END OF ANALYSIS & FIX PLAN**