Files
Charon/docs/plans/current_spec.md

22 KiB

CI Docker Build Failure Analysis & Fix Plan

Issue: Docker Build workflow failing on PR builds during image artifact save Workflow: .github/workflows/docker-build.yml Error: Error response from daemon: reference does not exist Date: 2026-01-12 Status: Analysis Complete - Ready for Implementation


Executive Summary

The docker-build.yml workflow is failing at the "Save Docker Image as Artifact" step (lines 135-142) for PR builds. The root cause is a mismatch between the image name/tag format used by docker/build-push-action with load: true and the image reference manually constructed in the docker save command.

Impact: All PR builds fail at the artifact save step, preventing the verify-supply-chain-pr job from running.

Fix Complexity: Low - Single step modification to use the exact tag from metadata output instead of manually constructing it.


Root Cause Analysis

1. The Failing Step (Lines 135-142)

Location: .github/workflows/docker-build.yml, lines 135-142

- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    docker save ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }} -o /tmp/charon-pr-image.tar
    ls -lh /tmp/charon-pr-image.tar

What Happens:

  • Line 140: Normalizes repository owner name to lowercase (e.g., Wikid82wikid82)
  • Line 141: Constructs the image reference manually: ghcr.io/${IMAGE_NAME}:pr-${PR_NUMBER}
  • Line 141: Attempts to save the image using this manually constructed reference

The Problem: The manually constructed image reference assumes the Docker image was loaded with the exact format ghcr.io/wikid82/charon:pr-123, but when docker/build-push-action uses load: true, the actual tag format applied to the local image may differ.

2. The Build Step (Lines 111-123)

Location: .github/workflows/docker-build.yml, lines 111-123

- name: Build and push Docker image
  if: steps.skip.outputs.skip_build != 'true'
  id: build-and-push
  uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6
  with:
    context: .
    platforms: ${{ github.event_name == 'pull_request' && 'linux/amd64' || 'linux/amd64,linux/arm64' }}
    push: ${{ github.event_name != 'pull_request' }}
    load: ${{ github.event_name == 'pull_request' }}
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    no-cache: true
    pull: true
    build-args: |
      VERSION=${{ steps.meta.outputs.version }}
      BUILD_DATE=${{ fromJSON(steps.meta.outputs.json).labels['org.opencontainers.image.created'] }}
      VCS_REF=${{ github.sha }}
      CADDY_IMAGE=${{ steps.caddy.outputs.image }}

Key Parameters for PR Builds:

  • Line 117: push: false → Image is not pushed to the registry
  • Line 118: load: true → Image is loaded into the local Docker daemon
  • Line 119: tags: ${{ steps.meta.outputs.tags }} → Uses tags generated by the metadata action

Behavior with load: true:

  • The image is built and loaded into the local Docker daemon
  • Tags from steps.meta.outputs.tags are applied to the image
  • For PR builds, this generates one tag: ghcr.io/wikid82/charon:pr-123

3. The Metadata Step (Lines 105-113)

Location: .github/workflows/docker-build.yml, lines 105-113

- name: Extract metadata (tags, labels)
  if: steps.skip.outputs.skip_build != 'true'
  id: meta
  uses: docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051 # v5.10.0
  with:
    images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
    tags: |
      type=raw,value=latest,enable={{is_default_branch}}
      type=raw,value=dev,enable=${{ github.ref == 'refs/heads/development' }}
      type=raw,value=beta,enable=${{ github.ref == 'refs/heads/feature/beta-release' }}
      type=raw,value=pr-${{ github.event.pull_request.number }},enable=${{ github.event_name == 'pull_request' }}
      type=sha,format=short,enable=${{ github.event_name != 'pull_request' }}

For PR builds, only line 111 is enabled:

type=raw,value=pr-${{ github.event.pull_request.number }},enable=${{ github.event_name == 'pull_request' }}

This generates a single tag: ghcr.io/wikid82/charon:pr-123

Note: The IMAGE_NAME is already normalized to lowercase at lines 56-57:

- name: Normalize image name
  run: |
    IMAGE_NAME=$(echo "${{ env.IMAGE_NAME }}" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV

So the metadata action receives ghcr.io/wikid82/charon (lowercase) as input.

4. The Critical Issue: Tag Mismatch

When docker/build-push-action uses load: true, the behavior is:

  1. Expected: Image is loaded with tags from steps.meta.outputs.tagsghcr.io/wikid82/charon:pr-123
  2. Reality: The exact tag format depends on Docker Buildx's internal behavior

The docker save command at line 141 tries to save:

ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }}

But this manually reconstructs the tag instead of using the actual tag applied by docker/build-push-action.

Why This Fails:

  • The docker save command requires an exact match of the image reference as it exists in the local Docker daemon
  • If the image is loaded with a slightly different tag format, docker save throws:
    Error response from daemon: reference does not exist
    

Evidence from Error Log:

Run IMAGE_NAME=$(echo "Wikid82/charon" | tr '[:upper:]' '[:lower:]')
Error response from daemon: reference does not exist
Error: Process completed with exit code 1.

This confirms the docker save command cannot find the image reference constructed at line 141.

5. Job Dependencies Analysis

Complete Workflow Structure:

build-and-push (lines 34-234)
├── Outputs: skip_build, digest
├── Steps:
│   ├── Build image (load=true for PRs)
│   ├── Save image artifact (❌ FAILS HERE at line 141)
│   └── Upload artifact (never reached)
│
test-image (lines 354-463)
├── needs: build-and-push
├── if: ... && github.event_name != 'pull_request'
└── (Not relevant for PRs)
│
trivy-pr-app-only (lines 465-493)
├── if: github.event_name == 'pull_request'
└── (Independent - builds its own image)
│
verify-supply-chain-pr (lines 495-722)
├── needs: build-and-push
├── if: github.event_name == 'pull_request' && needs.build-and-push.result == 'success'
├── Steps:
│   ├── ❌ Download artifact (artifact doesn't exist)
│   ├── ❌ Load image (cannot load non-existent artifact)
│   └── ❌ Scan image (cannot scan non-loaded image)
└── Currently skipped due to build-and-push failure
│
verify-supply-chain-pr-skipped (lines 724-754)
├── needs: build-and-push
└── if: github.event_name == 'pull_request' && needs.build-and-push.outputs.skip_build == 'true'

Dependency Chain Impact:

  1. build-and-push fails at line 141 (docker save)
  2. Artifact is never uploaded (lines 144-150 are skipped)
  3. verify-supply-chain-pr cannot download artifact (line 517 fails)
  4. Supply chain verification never runs for PRs

6. Verification: Why Similar Patterns Work

Line 376 (in test-image job):

- name: Normalize image name
  run: |
    raw="${{ github.repository_owner }}/${{ github.event.repository.name }}"
    IMAGE_NAME=$(echo "$raw" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV

This job pulls from the registry (line 395):

- name: Pull Docker image
  run: docker pull ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ steps.tag.outputs.tag }}

Works because: It pulls a pushed image from the registry, not a locally loaded one.

Line 516 (in verify-supply-chain-pr job):

- name: Normalize image name
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    echo "IMAGE_NAME=${IMAGE_NAME}" >> $GITHUB_ENV

Would work if: The artifact existed. This job loads the image from the tar file, which preserves the exact tags.

Key Difference: The failing step tries to save an image before we know its exact tag, while the working patterns either:

  • Pull from registry with a known tag
  • Load from artifact with preserved tags

The Solution

Strategy: Extract the exact tag from steps.meta.outputs.tags and use it directly in docker save.

Why This Works:

  • The docker/metadata-action generates the tags that docker/build-push-action actually applies to the image
  • For PR builds, this is: ghcr.io/<owner>/charon:pr-<number> (normalized, lowercase)
  • This is the exact tag that exists in the local Docker daemon after load: true

Rationale:

  • Avoids manual tag reconstruction
  • Uses the authoritative source of truth for image tags
  • Eliminates assumption-based errors

Risk Level: Low - Read-only operation on existing step outputs

Option 2: Inspect Local Images (ALTERNATIVE)

Strategy: Use docker images to discover the actual tag before saving.

Why Not Recommended:

  • Adds complexity
  • Requires pattern matching or parsing
  • Less reliable than using metadata output

Option 3: Override Tag for PRs (FALLBACK)

Strategy: Modify the build step to apply a deterministic local tag for PR builds.

Why Not Recommended:

  • Requires more changes (build step + save step)
  • Breaks consistency with existing tag patterns
  • Downstream jobs expect registry-style tags

Implementation

File: .github/workflows/docker-build.yml Location: Lines 135-142 (Save Docker Image as Artifact step)

Before (Current - BROKEN)

- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    docker save ghcr.io/${IMAGE_NAME}:pr-${{ github.event.pull_request.number }} -o /tmp/charon-pr-image.tar
    ls -lh /tmp/charon-pr-image.tar

Issue: Manually constructs the image reference, which may not match the actual tag applied by docker/build-push-action.

After (FIXED - Concise Version)

- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    # Extract the first tag from metadata action (PR tag)
    IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)
    echo "🔍 Detected image tag: ${IMAGE_TAG}"

    # Verify the image exists locally
    echo "📋 Available local images:"
    docker images --filter "reference=*charon*"

    # Save the image using the exact tag from metadata
    echo "💾 Saving image: ${IMAGE_TAG}"
    docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar

    # Verify the artifact was created
    echo "✅ Artifact created:"
    ls -lh /tmp/charon-pr-image.tar

After (FIXED - Defensive Version for Production)

- name: Save Docker Image as Artifact
  if: github.event_name == 'pull_request'
  run: |
    # Extract the first tag from metadata action (PR tag)
    IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)

    if [[ -z "${IMAGE_TAG}" ]]; then
      echo "❌ ERROR: No image tag found in metadata output"
      echo "Metadata tags output:"
      echo "${{ steps.meta.outputs.tags }}"
      exit 1
    fi

    echo "🔍 Detected image tag: ${IMAGE_TAG}"

    # Verify the image exists locally
    if ! docker image inspect "${IMAGE_TAG}" >/dev/null 2>&1; then
      echo "❌ ERROR: Image ${IMAGE_TAG} not found locally"
      echo "📋 Available images:"
      docker images
      exit 1
    fi

    # Save the image using the exact tag from metadata
    echo "💾 Saving image: ${IMAGE_TAG}"
    docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar

    # Verify the artifact was created
    echo "✅ Artifact created:"
    ls -lh /tmp/charon-pr-image.tar

Key Changes:

  1. Extract exact tag: IMAGE_TAG=$(echo "${{ steps.meta.outputs.tags }}" | head -n 1)

    • Uses the first (and only) tag from metadata output
    • For PR builds: ghcr.io/wikid82/charon:pr-123
  2. Add debugging: docker images --filter "reference=*charon*"

    • Shows available images for troubleshooting
    • Helps diagnose tag mismatches in logs
  3. Use extracted tag: docker save "${IMAGE_TAG}" -o /tmp/charon-pr-image.tar

    • No manual reconstruction
    • Guaranteed to match the actual image tag
  4. Defensive checks (production version only):

    • Verify IMAGE_TAG is not empty
    • Verify image exists before attempting save
    • Fail fast with clear error messages

Why This Works:

  • The docker/metadata-action output is the authoritative source of tags
  • These are the exact tags applied by docker/build-push-action
  • No assumptions or manual reconstruction
  • Works for any repository owner name (uppercase, lowercase, mixed case)
  • Consistent with downstream jobs that expect the same tag format

Null Safety:

  • If steps.meta.outputs.tags is empty (shouldn't happen), IMAGE_TAG will be empty
  • The defensive version explicitly checks for this and fails with a clear message
  • The concise version will fail at docker save with a clear error about missing image reference

No Changes Needed

The following steps/jobs already handle the image correctly and require no modifications:

  1. Upload Image Artifact (lines 144-150)

    • Uses the saved tar file from the previous step
    • No dependency on image tag format
  2. verify-supply-chain-pr job (lines 495-722)

    • Downloads and loads the tar file
    • References image using the same normalization logic
    • Will work correctly once artifact exists
  3. Load Docker Image step (lines 524-529)

    • Loads from tar file (preserves original tags)
    • No changes needed

Why No Downstream Changes Are Needed

When you save a Docker image to a tar file using docker save, the tar file contains:

  • The image layers
  • The image configuration
  • The exact tags that were applied to the image

When you load the image using docker load -i charon-pr-image.tar, Docker restores:

  • All image layers
  • The image configuration
  • The exact same tags that were saved

Example:

# Save with tag: ghcr.io/wikid82/charon:pr-123
docker save ghcr.io/wikid82/charon:pr-123 -o image.tar

# Load restores the exact same tag
docker load -i image.tar

# Image is now available as: ghcr.io/wikid82/charon:pr-123
docker images ghcr.io/wikid82/charon:pr-123

The verify-supply-chain-pr job references:

IMAGE_REF="ghcr.io/${{ env.IMAGE_NAME }}:pr-${{ github.event.pull_request.number }}"

This will match perfectly because:

  • IMAGE_NAME is normalized the same way (lines 516-518)
  • The PR number is the same
  • The loaded image has the exact tag we saved

Testing Plan

Before pushing to CI, verify the fix locally:

# 1. Build a PR-style image locally
docker build -t ghcr.io/wikid82/charon:pr-test .

# 2. Verify the image exists
docker images ghcr.io/wikid82/charon:pr-test

# 3. Save the image
docker save ghcr.io/wikid82/charon:pr-test -o /tmp/test-image.tar

# 4. Verify the tar was created
ls -lh /tmp/test-image.tar

# 5. Load the image in a clean environment
docker rmi ghcr.io/wikid82/charon:pr-test  # Remove original
docker load -i /tmp/test-image.tar          # Reload from tar
docker images ghcr.io/wikid82/charon:pr-test  # Verify it's back

Expected Result: All steps succeed without "reference does not exist" errors.

Phase 2: CI Testing

  1. Apply the fix to .github/workflows/docker-build.yml (lines 135-142)
  2. Create a test PR on the feature/beta-release branch
  3. Verify the workflow execution:
    • build-and-push job completes successfully
    • "Save Docker Image as Artifact" step shows detected tag in logs
    • "Upload Image Artifact" step uploads the tar file
    • verify-supply-chain-pr job runs and downloads the artifact
    • "Load Docker Image" step loads the image successfully
    • SBOM generation and vulnerability scanning complete

Phase 3: Edge Cases

Test the following scenarios:

  1. Different repository owners (uppercase, lowercase, mixed case):

    • Wikid82/charonwikid82/charon
    • TestUser/charontestuser/charon
    • UPPERCASE/charonuppercase/charon
  2. Multiple rapid commits to the same PR:

    • Verify no artifact conflicts
    • Verify each commit gets its own workflow run
  3. Skipped builds (chore commits):

    • Verify verify-supply-chain-pr-skipped runs correctly
    • Verify feedback comment is posted
  4. Different PR numbers:

    • Single digit (PR #5)
    • Double digit (PR #42)
    • Triple digit (PR #123)

Phase 4: Rollback Plan

If the fix causes issues:

  1. Immediate rollback: Revert the commit that applied this fix
  2. Temporary workaround: Disable artifact save/upload steps:
    if: github.event_name == 'pull_request' && false  # Temporarily disabled
    
  3. Investigation: Check GitHub Actions logs for actual image tags:
    # Add this step before the save step
    - name: Debug Image Tags
      if: github.event_name == 'pull_request'
      run: |
        echo "Metadata tags:"
        echo "${{ steps.meta.outputs.tags }}"
        echo ""
        echo "Local images:"
        docker images
    

Success Criteria

Functional

  • build-and-push job completes successfully for all PR builds
  • Docker image artifact is saved and uploaded for all PR builds
  • verify-supply-chain-pr job runs and downloads the artifact
  • No "reference does not exist" errors in any step
  • Supply chain verification completes for all PR builds

Observable Metrics

  • 📊 Job Success Rate: 100% for build-and-push job on PRs
  • 📦 Artifact Upload Rate: 100% for PR builds
  • 🔒 Supply Chain Verification Rate: 100% for PR builds (excluding skipped)
  • ⏱️ Build Time: No significant increase (<30 seconds for artifact save)

Quality

  • 🔍 Clear logging of detected image tags
  • 🛡️ Defensive error handling (fails fast with clear messages)
  • 📝 Consistent with existing patterns in the workflow

Implementation Checklist

Pre-Implementation

  • Analyze the root cause (line 141 in docker-build.yml)
  • Identify the exact failing step and command
  • Review job dependencies and downstream impacts
  • Design the fix with before/after comparison
  • Document testing plan and success criteria

Implementation

  • Apply the fix to .github/workflows/docker-build.yml (lines 135-142)
  • Choose between concise or defensive version (recommend defensive for production)
  • Commit with message: fix(ci): use metadata tag for docker save in PR builds
  • Push to feature/beta-release branch

Testing

  • Create a test PR and verify workflow runs successfully
  • Check GitHub Actions logs for "🔍 Detected image tag" output
  • Verify artifact is uploaded (check Actions artifacts tab)
  • Verify verify-supply-chain-pr job completes successfully
  • Test edge cases (uppercase owner, different PR numbers)
  • Monitor 2-3 additional PR builds for stability

Post-Implementation

  • Update CHANGELOG.md with the fix
  • Close any related GitHub issues
  • Document lessons learned (if applicable)
  • Monitor for regressions over next week

Appendix A: Error Analysis Summary

Error Signature

Run IMAGE_NAME=$(echo "Wikid82/charon" | tr '[:upper:]' '[:lower:]')
Error response from daemon: reference does not exist
Error: Process completed with exit code 1.

Error Details

  • File: .github/workflows/docker-build.yml
  • Job: build-and-push
  • Step: "Save Docker Image as Artifact"
  • Lines: 135-142
  • Failing Command: Line 141 → docker save ghcr.io/${IMAGE_NAME}:pr-${PR_NUMBER} -o /tmp/charon-pr-image.tar

Error Type

Docker Daemon Error: The Docker daemon cannot find the image reference specified in the docker save command.

Root Cause Categories

Category Likelihood Evidence
Tag Mismatch Most Likely Manual reconstruction doesn't match actual tag
Image Not Loaded Unlikely Build step succeeds
Timing Issue Unlikely Steps are sequential
Permissions Issue Unlikely Other Docker commands work

Conclusion: Tag Mismatch is the root cause.

Evidence Supporting Root Cause

  1. Build step succeeds (no reported build failures)
  2. Error occurs at docker save (after successful build)
  3. Manual tag reconstruction (lines 140-141)
  4. Inconsistent with docker/build-push-action behavior when load: true
  5. Similar patterns work because they either:
    • Pull from registry (test-image job)
    • Load from artifact (verify-supply-chain-pr job)

Fix Summary

What Changed: Use exact tag from steps.meta.outputs.tags instead of manually constructing it

Why It Works: The metadata action output is the authoritative source of tags applied by docker/build-push-action

Risk Level: Low - Read-only operation on existing step outputs


Appendix B: Relevant Documentation


END OF ANALYSIS & FIX PLAN