Docker Compose CI Failure Remediation Plan

Status: Active
Created: 2026-01-30
Priority: CRITICAL (Blocking CI)


Executive Summary

The E2E test workflow (e2e-tests.yml) fails when it starts containers via docker-compose.playwright-ci.yml. The root cause is an invalid Docker image reference: the compose file's image: field resolves to a bare SHA256 digest instead of a fully-qualified reference with registry and repository.

Error Message:

charon-app Error pull access denied for sha256, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Root Cause: The compose file's image: directive evaluates to a bare SHA256 digest (e.g., sha256:057a9998...) instead of a properly formatted image reference like ghcr.io/wikid82/charon@sha256:057a9998....


Root Cause Analysis

Current Implementation (Broken)

File: .docker/compose/docker-compose.playwright-ci.yml Lines: 29-37

charon-app:
  # CI default (digest-pinned via workflow output):
  # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
  # Local override (tag-based):
  # CHARON_E2E_IMAGE=charon:e2e-test
  image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}

Workflow Environment Variable

File: .github/workflows/e2e-tests.yml Line: 158

env:
  CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}

Problem: The needs.build.outputs.image_digest from the build job in e2e-tests.yml returns only the SHA256 digest (e.g., sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c), not a fully-qualified image reference.
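Compose's ${VAR:-default} interpolation follows the same rule as the shell: the default applies only when the variable is unset or empty. The fallback chain can therefore be sketched in plain shell. This is an illustration only; resolve_image is a hypothetical helper whose two arguments stand in for CHARON_E2E_IMAGE_DIGEST and CHARON_E2E_IMAGE, and the digest is abbreviated.

```shell
# Hypothetical helper mirroring the compose expression
#   ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
# $1 stands in for CHARON_E2E_IMAGE_DIGEST, $2 for CHARON_E2E_IMAGE.
resolve_image() {
  echo "${1:-${2:-charon:e2e-test}}"
}

# A non-empty digest variable always wins, even when it is not a usable reference
resolve_image "sha256:057a9998" "charon:local-dev"   # prints sha256:057a9998
# Empty digest variable: the tag variable is used
resolve_image "" "charon:local-dev"                  # prints charon:local-dev
# Both empty: the hard-coded default is used
resolve_image "" ""                                  # prints charon:e2e-test
```

The first call shows the failure mode: because the workflow sets the digest variable to a non-empty value, the working tag-based fallbacks are never reached.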

Why Docker Fails

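Docker Compose passes the value of the image: field to Docker unchanged:

  • sha256:057a9998... (bare digest, no registry or repository)

Docker then:

  1. Splits the value at the colon, reading sha256 as the repository name and the hex string as a tag
  2. Looks for a repository literally named "sha256" (on Docker Hub, the default registry for unqualified names)
  3. Fails with "pull access denied" because no such repository exists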
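The parsing steps above can be illustrated with a simplified model of reference splitting. This is not Docker's actual parser (it ignores registry ports and other edge cases), and split_ref is a hypothetical helper; it only shows why a bare digest is read as repository "sha256".

```shell
# Simplified model of image-reference splitting, for illustration only.
# Not Docker's real parser: registries with ports (host:5000/img) are not handled.
split_ref() {
  case "$1" in
    *@*) echo "repo=${1%%@*} digest=${1#*@}" ;;  # name@digest form
    *:*) echo "repo=${1%%:*} tag=${1#*:}" ;;     # name:tag form
    *)   echo "repo=$1 tag=latest" ;;            # bare name defaults to latest
  esac
}

split_ref "sha256:057a9998"                         # prints repo=sha256 tag=057a9998
split_ref "charon:e2e-test"                         # prints repo=charon tag=e2e-test
split_ref "ghcr.io/wikid82/charon@sha256:057a9998"  # prints repo=ghcr.io/wikid82/charon digest=sha256:057a9998
```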

Correct Reference Format

Docker requires one of these formats:

  1. Tag-based: charon:e2e-test (local image)
  2. Digest-pinned: ghcr.io/wikid82/charon@sha256:057a9998... (registry + repo + digest)

Technical Investigation

How the Image is Built and Loaded

Workflow Flow (e2e-tests.yml):

  1. Build Job (lines 90-148):

    • Builds Docker image with tag charon:e2e-test
    • Saves image to charon-e2e-image.tar artifact
    • Outputs image digest from build step
  2. E2E Test Job (lines 173-177):

    • Downloads charon-e2e-image.tar artifact
    • Loads image with: docker load -i charon-e2e-image.tar
    • Loaded image has tag: charon:e2e-test (from build step)
  3. Start Container (line 219):

    • Runs: docker compose -f .docker/compose/docker-compose.playwright-ci.yml up -d
    • Compose file tries to use $CHARON_E2E_IMAGE_DIGEST (bare SHA256)
    • Docker finds no local image under that name (the loaded image is tagged charon:e2e-test), so it attempts a pull, which fails

Mismatch Between Build and Reference

| Step | Image Reference | Status |
| --- | --- | --- |
| Build | charon:e2e-test | Image tagged |
| Save/Load | charon:e2e-test | Tag preserved in tar |
| Compose | sha256:057a9998... | Wrong reference type |

The loaded image is available as charon:e2e-test, but the compose file is looking for sha256:...
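A mismatch like this could be caught before docker compose up with a small guard that rejects bare digests. The sketch below is illustrative only; is_bare_digest is a hypothetical helper and is not part of the existing workflow.

```shell
# Hypothetical guard: succeeds when the argument is a bare sha256 digest,
# i.e. it has no registry, repository, or tag around it.
is_bare_digest() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[a-f0-9]{64}$'
}

ref="sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c"
if is_bare_digest "$ref"; then
  echo "bare digest: needs a registry/repository prefix or a local tag"
fi
```

In a workflow step, such a check would turn the opaque "pull access denied" error into an immediate, descriptive failure.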


Comparison with Working Workflow

playwright.yml (Working) vs e2e-tests.yml (Broken)

playwright.yml (lines 207-209):

- name: Load Docker image
  run: |
    docker load < charon-pr-image.tar
    docker images | grep charon

Container Start (lines 213-277):

- name: Start Charon container
  run: |
    # Explicitly constructs image reference from variables
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    IMAGE_REF="ghcr.io/${IMAGE_NAME}:pr-${{ steps.pr-info.outputs.pr_number }}"

    docker run -d \
      --name charon-test \
      -e CHARON_ENV="${CHARON_ENV}" \
      # ... (uses constructed IMAGE_REF)

Key Difference: playwright.yml uses docker run directly with explicit image reference construction, not Docker Compose with environment variable substitution.
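The reference construction used by playwright.yml can be sketched as a standalone function. build_pr_ref is a hypothetical name, and the owner and PR number passed below are placeholder values rather than values from any real run.

```shell
# Hypothetical helper: builds a registry-qualified, tag-based reference
# from a repository owner ($1) and a PR number ($2).
build_pr_ref() {
  # Registry paths must be lowercase, so normalize the owner first
  owner_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  echo "ghcr.io/${owner_lc}/charon:pr-$2"
}

build_pr_ref "Wikid82" 42   # prints ghcr.io/wikid82/charon:pr-42
```

The result always carries a registry, a repository, and a tag, which is exactly what the bare digest in e2e-tests.yml lacks.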


Solution Architecture

Option 1: Use Local Tag (Recommended)

Rationale: The loaded image is already tagged as charon:e2e-test. We should use this tag directly instead of trying to use a digest.

Change: Set CHARON_E2E_IMAGE_DIGEST to the tag instead of the digest, or use a different variable name.

Option 2: Re-tag Image with Digest

Rationale: Re-tag the loaded image to match the digest-based reference expected by the compose file.

Change: After loading, re-tag the image with a registry-qualified reference derived from the build digest.

Option 3: Simplify Compose File

Rationale: Remove the digest-based environment variable and always use the local tag for CI.

Change: Hard-code charon:e2e-test or use a simpler env var pattern.


Strategy

Use the pre-built tag for CI, not the digest. The digest output from the build is metadata but not needed for referencing a locally loaded image.

Implementation

Change 1: Remove Digest from Workflow Environment

File: .github/workflows/e2e-tests.yml Lines: 155-158

Current:

env:
  # Required for security teardown (emergency reset fallback when ACL blocks API)
  CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
  # Enable security-focused endpoints and test gating
  CHARON_EMERGENCY_SERVER_ENABLED: "true"
  CHARON_SECURITY_TESTS_ENABLED: "true"
  CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}

Corrected:

env:
  # Required for security teardown (emergency reset fallback when ACL blocks API)
  CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
  # Enable security-focused endpoints and test gating
  CHARON_EMERGENCY_SERVER_ENABLED: "true"
  CHARON_SECURITY_TESTS_ENABLED: "true"
  # Use local tag for pre-built image (loaded from artifact)
  CHARON_E2E_IMAGE: charon:e2e-test

Rationale:

  • The docker load command restores the image with its original tag charon:e2e-test
  • We should use this tag, not the digest
  • The digest is only useful for verifying image integrity, not for referencing locally loaded images

Change 2: Simplify Compose File Image Reference and Comments

File: .docker/compose/docker-compose.playwright-ci.yml Lines: 31-37

Current:

  charon-app:
    # CI default (digest-pinned via workflow output):
    # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
    # Local override (tag-based):
    # CHARON_E2E_IMAGE=charon:e2e-test
    image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}

Corrected:

  charon-app:
    # CI default: Uses pre-built image loaded from artifact
    # Set via workflow: CHARON_E2E_IMAGE=charon:e2e-test
    # Local development: Uses locally built image
    # Override with: CHARON_E2E_IMAGE=charon:local-dev
    image: ${CHARON_E2E_IMAGE:-charon:e2e-test}

Rationale:

  • Simplify the environment variable fallback chain
  • Remove confusing CHARON_E2E_IMAGE_DIGEST variable that was set incorrectly
  • Document the actual behavior: CI loads pre-built image with known tag
  • Make local development override clearer

Alternative Solution: Option 2 (If Digest-Pinning Required)

If there's a requirement to use digest-based references for security/reproducibility, we must re-tag the loaded image.

Implementation

Change 1: Re-tag After Load

File: .github/workflows/e2e-tests.yml After Line: 177 (in "Load Docker image" step)

Add:

      - name: Load and re-tag Docker image
        run: |
          # Load the pre-built image
          docker load -i charon-e2e-image.tar
          docker images | grep charon

          # Re-tag with a digest-derived reference if a digest is available
          IMAGE_DIGEST="${{ needs.build.outputs.image_digest }}"
          if [[ -n "$IMAGE_DIGEST" ]]; then
            # Extract just the digest hash (sha256:...)
            DIGEST_HASH=$(echo "$IMAGE_DIGEST" | grep -oP 'sha256:[a-f0-9]{64}')

            # docker tag refuses digest-form targets (repo@sha256:...),
            # so carry the digest in the tag instead (repo:sha256-<hex>)
            FULL_REF="ghcr.io/wikid82/charon:${DIGEST_HASH/:/-}"

            echo "Re-tagging charon:e2e-test as $FULL_REF"
            docker tag charon:e2e-test "$FULL_REF"

            # Export for compose file
            echo "CHARON_E2E_IMAGE_DIGEST=$FULL_REF" >> "$GITHUB_ENV"
          else
            # Fallback to tag-based reference
            echo "CHARON_E2E_IMAGE=charon:e2e-test" >> "$GITHUB_ENV"
          fi

Change 2: Update Compose File

File: .docker/compose/docker-compose.playwright-ci.yml Lines: 31-37

Keep the current implementation but fix the comment:

  charon-app:
    # CI: Digest-derived tag (re-tagged from loaded artifact)
    #     CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:sha256-<hex>
    # Local: Tag-based reference for development
    #     CHARON_E2E_IMAGE=charon:e2e-test
    image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}

Rationale:

  • Keeps the build digest visible in the image reference for traceability
  • Re-tagging creates a local image reference that Docker can resolve without a pull
  • Falls back gracefully to tag-based reference for local development
  • Caveat: the result is a local tag, not a content-addressed pin. docker tag cannot create repo@sha256:<digest> references, so true digest pinning requires pulling the image from a registry
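The digest-extraction step can also be sketched without grep -P, which is a GNU extension and is missing from some other grep builds. extract_digest is a hypothetical helper shown only to isolate that step; the workflow itself runs on ubuntu-latest, where the original grep call works.

```shell
# Hypothetical helper: prints the first-found sha256 digest embedded in $1,
# or returns non-zero when the input contains none.
extract_digest() {
  digest=$(printf '%s\n' "$1" | sed -nE 's/.*(sha256:[a-f0-9]{64}).*/\1/p')
  [ -n "$digest" ] && printf '%s\n' "$digest"
}

extract_digest "ghcr.io/wikid82/charon@sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c"
# prints sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c
```

Returning non-zero on a missing digest lets the calling step decide explicitly whether to fall back to the tag-based reference.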

Recommendation

Why Option 1:

  1. Simpler: No re-tagging logic needed
  2. Faster: Fewer Docker operations
  3. Sufficient: The image is already built and loaded; tag reference is adequate
  4. Consistent: Matches playwright.yml, which also references its loaded image by tag rather than by digest
  5. Local-first: The image is local after docker load, not in a registry

When to use Option 2:

  • If there's a compliance requirement to use digest references
  • If SBOM/attestation workflows need digest traceability
  • If multi-registry scenarios require content-addressable references

Implementation Steps

Phase 1: Apply Option 1 Changes

  1. Update workflow environment variables

    • File: .github/workflows/e2e-tests.yml
    • Line: 158
    • Change: Replace CHARON_E2E_IMAGE_DIGEST with CHARON_E2E_IMAGE: charon:e2e-test
  2. Update compose file documentation

    • File: .docker/compose/docker-compose.playwright-ci.yml
    • Lines: 31-37
    • Change: Simplify variable fallback and update comments
  3. Verify changes

    • Run: docker compose -f .docker/compose/docker-compose.playwright-ci.yml config
    • Ensure: image: charon:e2e-test in output
    • Validate: No environment variable warnings
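The verification step can be scripted so that it fails loudly instead of relying on a visual check. The sketch below assumes the rendered config contains a line of the form "image: <reference>"; check_image_line is a hypothetical helper, and the sample YAML is a hand-written stand-in, not captured output.

```shell
# Hypothetical helper: reads rendered compose YAML on stdin and succeeds
# only if some "image:" line carries exactly the expected reference ($1).
check_image_line() {
  awk -v want="$1" '$1 == "image:" && $2 == want { found = 1 } END { exit !found }'
}

# Hand-written stand-in for rendered config; real output has many more fields
rendered='services:
  charon-app:
    image: charon:e2e-test'

if printf '%s\n' "$rendered" | check_image_line "charon:e2e-test"; then
  echo "image reference OK"
fi
```

Against the real file, the same function would be fed by the command from the step above: docker compose -f .docker/compose/docker-compose.playwright-ci.yml config | check_image_line charon:e2e-test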

Phase 2: Test in CI

  1. Create test PR

    • Branch: fix/docker-compose-image-reference
    • Include: Both file changes from Phase 1
  2. Monitor workflow execution

    • Watch: e2e-tests.yml workflow
    • Check: "Start test environment" step succeeds
    • Verify: Container starts and health check passes
  3. Validate container

    • Check: docker ps shows charon-playwright running
    • Test: Health endpoint responds at http://localhost:8080/api/v1/health
    • Confirm: Playwright tests execute successfully
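The health check in the validation step can be sketched as a generic retry loop. wait_until is a hypothetical helper; its timeout counts one-second sleeps between attempts, so time spent inside the probe itself is not included.

```shell
# Hypothetical helper: retries a command once per second until it succeeds
# or roughly $1 seconds of sleeping have elapsed. Remaining args are the probe.
wait_until() {
  remaining="$1"; shift
  until "$@"; do
    [ "$remaining" -le 0 ] && return 1   # budget exhausted
    sleep 1
    remaining=$((remaining - 1))
  done
}

# Demonstration with a probe that succeeds immediately
if wait_until 3 true; then
  echo "ready"
fi
```

For the container, the probe would be the health endpoint named above, for example: wait_until 30 curl -fsS http://localhost:8080/api/v1/health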

Phase 3: Documentation Update

  1. Update workflow documentation

    • File: .github/workflows/e2e-tests.yml
    • Section: Top-level comments (lines 1-29)
    • Add: Note about using local tag vs. digest
  2. Update compose file documentation

    • File: .docker/compose/docker-compose.playwright-ci.yml
    • Section: Usage section (lines 11-16)
    • Clarify: Environment variable expectations

Verification Checklist

Pre-Deployment Validation

  • Syntax Check: Run docker compose config with test environment variables
  • Variable Resolution: Confirm image: field resolves to charon:e2e-test
  • Local Test: Load image locally and run compose up
  • Workflow Dry-run: Test changes in a draft PR before merging

CI Validation Points

  • Build Job: Completes successfully, uploads image artifact
  • Download: Image artifact downloads correctly
  • Load: docker load succeeds, image appears in docker images
  • Compose Up: Container starts without pull errors
  • Health Check: Container becomes healthy within timeout
  • Test Execution: Playwright tests run and report results

Post-Deployment Monitoring

  • Success Rate: Monitor e2e-tests.yml success rate for 10 runs
  • Startup Time: Verify container startup time remains under 30s
  • Resource Usage: Check for memory/CPU regressions
  • Flake Rate: Ensure no new test flakiness introduced
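The success-rate check can be reduced to a small calculation over run conclusions. success_rate is a hypothetical helper that reads one conclusion per line; the sample input is invented for illustration and does not reflect real runs.

```shell
# Hypothetical helper: reads one run conclusion per line on stdin and prints
# the integer percentage of lines equal to "success" (0 for empty input).
success_rate() {
  awk '{ total++; if ($1 == "success") ok++ }
       END { if (total) printf "%d\n", ok * 100 / total; else print 0 }'
}

# Invented sample: 9 successes and 1 failure across 10 runs
printf '%s\n' success success success failure success \
              success success success success success | success_rate   # prints 90
```

Assuming the GitHub CLI is available, the input could come from: gh run list --workflow e2e-tests.yml --limit 10 --json conclusion --jq '.[].conclusion'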

Risk Assessment

Low Risk Changes

  • Workflow environment variable change (isolated to CI)
  • Compose file comment updates (documentation only)

Medium Risk Changes

⚠️ Compose file image: field modification

  • Mitigation: Test locally before pushing
  • Rollback: Revert single line in compose file

No Risk

  • Read-only investigation and analysis
  • Documentation improvements


Rollback Plan

If Option 1 Fails

Symptoms:

  • Container still fails to start
  • Error: "No such image: charon:e2e-test"

Rollback:

git revert <commit-hash>  # Revert the workflow change

Alternative Fix: Switch to Option 2 (re-tagging approach)

If Option 2 Fails

Symptoms:

  • Re-tag logic fails
  • Digest extraction errors

Rollback:

  1. Remove re-tagging step
  2. Fall back to simple tag reference: CHARON_E2E_IMAGE=charon:e2e-test

Success Metrics

Immediate Success Indicators

  • docker compose up starts container without errors
  • Container health check passes within 30 seconds
  • Playwright tests execute (pass or fail is separate concern)

Long-term Success Indicators

  • E2E workflow success rate returns to baseline (>95%)
  • No image reference errors in CI logs for 2 weeks
  • Local development workflow unaffected

Why Was Digest Being Used?

Comment from compose file (line 33):

# CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>

Hypothesis: The original intent was to support digest-pinned references for security/reproducibility, but the implementation was incomplete:

  1. The workflow sets only the digest hash, not the full reference
  2. The compose file expects the full reference format
  3. No re-tagging step bridges the gap

Why Does playwright.yml Work?

Key difference (lines 213-277):

  • Uses docker run directly with explicit image reference
  • Constructs full ghcr.io/... reference from variables
  • Does not rely on environment variable substitution in compose file

Lesson: Compose interpolates the variable verbatim, so the workflow must supply a complete, resolvable image reference. playwright.yml avoids the problem by constructing the reference explicitly.


Dependencies

Required Secrets

  • CHARON_EMERGENCY_TOKEN (already configured)
  • CHARON_CI_ENCRYPTION_KEY (generated in workflow)

Required Tools

  • Docker Compose (available in GitHub Actions)
  • Docker CLI (available in GitHub Actions)

No External Dependencies

  • No registry authentication needed (local image)
  • No network calls required (image pre-loaded)

Timeline

| Phase | Duration | Blocking |
| --- | --- | --- |
| Analysis & Planning | Complete | |
| Implementation | 30 minutes | |
| Testing (PR) | 10-15 minutes (CI runtime) | |
| Verification | 2 hours (10 workflow runs) | |
| Documentation | 15 minutes | |

Estimated Total: 3-4 hours from start to complete verification


Next Actions

  1. Immediate: Implement Option 1 changes (2 file modifications)
  2. Test: Create PR and monitor e2e-tests.yml workflow
  3. Verify: Check container startup and health check success
  4. Document: Update this plan with results
  5. Close: Mark as complete once verified in main branch

Appendix: Full File Changes

File 1: .github/workflows/e2e-tests.yml

Line 158: Change environment variable

  e2e-tests:
    name: E2E Tests (Shard ${{ matrix.shard }}/${{ matrix.total-shards }})
    runs-on: ubuntu-latest
    needs: build
    timeout-minutes: 30
    env:
      # Required for security teardown (emergency reset fallback when ACL blocks API)
      CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
      # Enable security-focused endpoints and test gating
      CHARON_EMERGENCY_SERVER_ENABLED: "true"
      CHARON_SECURITY_TESTS_ENABLED: "true"
-     CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}
+     # Use local tag for pre-built image (loaded from artifact)
+     CHARON_E2E_IMAGE: charon:e2e-test

File 2: .docker/compose/docker-compose.playwright-ci.yml

Lines 31-37: Simplify image reference

  charon-app:
-   # CI default (digest-pinned via workflow output):
-   # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
-   # Local override (tag-based):
+   # CI default: Uses pre-built image loaded from artifact
+   # Set via workflow: CHARON_E2E_IMAGE=charon:e2e-test
+   # Local development: Uses locally built image
+   # Override with: CHARON_E2E_IMAGE=charon:local-dev
-   image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
+   image: ${CHARON_E2E_IMAGE:-charon:e2e-test}

End of Remediation Plan