Docker Compose CI Failure Remediation Plan
Status: Active
Created: 2026-01-30
Priority: CRITICAL (Blocking CI)
Executive Summary
The E2E test workflow (e2e-tests.yml) is failing when attempting to start containers via docker-compose.playwright-ci.yml. The root cause is an incorrect Docker image reference format in the compose file that attempts to use a bare SHA256 digest instead of a fully-qualified image reference with registry and repository.
Error Message:
charon-app Error pull access denied for sha256, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Root Cause: The compose file's image: directive evaluates to a bare SHA256 digest (e.g., sha256:057a9998...) instead of a properly formatted image reference like ghcr.io/wikid82/charon@sha256:057a9998....
Root Cause Analysis
Current Implementation (Broken)
File: .docker/compose/docker-compose.playwright-ci.yml
Lines: 29-37
charon-app:
  # CI default (digest-pinned via workflow output):
  # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
  # Local override (tag-based):
  # CHARON_E2E_IMAGE=charon:e2e-test
  image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
Workflow Environment Variable
File: .github/workflows/e2e-tests.yml
Line: 158
env:
  CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}
Problem: The needs.build.outputs.image_digest from the build job in e2e-tests.yml returns only the SHA256 digest (e.g., sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c), not a fully-qualified image reference.
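The substitution can be reproduced outside CI. The sketch below assumes that POSIX shell parameter expansion behaves like the compose file's `${VAR:-default}` expression for this case; `resolve_image` is a hypothetical helper written for illustration, not something in the repository.

```shell
# Sketch only: POSIX shell expansion standing in for Compose interpolation.
# resolve_image is a hypothetical helper, not part of the repository.
resolve_image() (
  CHARON_E2E_IMAGE_DIGEST="$1"   # value exported by the workflow (may be empty)
  CHARON_E2E_IMAGE="$2"          # optional tag-based override (may be empty)
  printf '%s\n' "${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}"
)

# Broken case: the workflow passes only the digest, so the digest wins.
resolve_image "sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c" ""

# Nothing set: the expression falls back to the local tag.
resolve_image "" ""
```

The first call prints the bare digest and the second prints `charon:e2e-test`, which is the difference between the failing CI run and a local run.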
Why Docker Fails
Docker Compose interprets the image: field as:
sha256:057a9998... ← Bare digest, no registry/repository
Docker then tries to:
- Parse this as a repository name
- Look for a repository literally named "sha256"
- Fail with "pull access denied" because no such repository exists
Correct Reference Format
Docker requires one of these formats:
- Tag-based: charon:e2e-test (local image)
- Digest-pinned: ghcr.io/wikid82/charon@sha256:057a9998... (registry + repo + digest)
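A guard of the following kind could catch a bare digest before it reaches Compose. This is a minimal sketch; `classify_image_ref` and its three labels are hypothetical names chosen for illustration, and the patterns only cover the cases discussed in this plan.

```shell
# Hypothetical helper: label a reference string using the cases from this plan.
classify_image_ref() {
  case "$1" in
    sha256:*)   echo "bare-digest" ;;    # no registry/repository before the digest
    *@sha256:*) echo "digest-pinned" ;;  # repository (and optional registry) + digest
    *)          echo "tag-or-name" ;;    # e.g. charon:e2e-test
  esac
}

classify_image_ref "sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c"
classify_image_ref "ghcr.io/wikid82/charon@sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c"
classify_image_ref "charon:e2e-test"
```

A workflow step could fail early when the result is `bare-digest` instead of waiting for the "pull access denied" error.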
Technical Investigation
How the Image is Built and Loaded
Workflow Flow (e2e-tests.yml):
- Build Job (lines 90-148):
  - Builds Docker image with tag charon:e2e-test
  - Saves image to charon-e2e-image.tar artifact
  - Outputs image digest from build step
- E2E Test Job (lines 173-177):
  - Downloads charon-e2e-image.tar artifact
  - Loads image with: docker load -i charon-e2e-image.tar
  - Loaded image has tag: charon:e2e-test (from build step)
- Start Container (line 219):
  - Runs: docker compose -f .docker/compose/docker-compose.playwright-ci.yml up -d
  - Compose file tries to use $CHARON_E2E_IMAGE_DIGEST (bare SHA256)
  - Docker cannot resolve the bare digest to the loaded image, which is only known locally by its tag charon:e2e-test
Mismatch Between Build and Reference
| Step | Image Reference | Status |
|---|---|---|
| Build | charon:e2e-test | ✅ Image tagged |
| Save/Load | charon:e2e-test | ✅ Tag preserved in tar |
| Compose | sha256:057a9998... | ❌ Wrong reference type |
The loaded image is available as charon:e2e-test, but the compose file is looking for sha256:...
Comparison with Working Workflow
playwright.yml (Working) vs e2e-tests.yml (Broken)
playwright.yml (lines 207-209):
- name: Load Docker image
  run: |
    docker load < charon-pr-image.tar
    docker images | grep charon
Container Start (lines 213-277):
- name: Start Charon container
  run: |
    # Explicitly constructs image reference from variables
    IMAGE_NAME=$(echo "${{ github.repository_owner }}/charon" | tr '[:upper:]' '[:lower:]')
    IMAGE_REF="ghcr.io/${IMAGE_NAME}:pr-${{ steps.pr-info.outputs.pr_number }}"
    docker run -d \
      --name charon-test \
      -e CHARON_ENV="${CHARON_ENV}" \
      # ... (uses constructed IMAGE_REF)
Key Difference: playwright.yml uses docker run directly with explicit image reference construction, not Docker Compose with environment variable substitution.
Solution Architecture
Option 1: Use Local Tag Reference (Recommended)
Rationale: The loaded image is already tagged as charon:e2e-test. We should use this tag directly instead of trying to use a digest.
Change: Set CHARON_E2E_IMAGE_DIGEST to the tag instead of the digest, or use a different variable name.
Option 2: Re-tag Image with Digest
Rationale: Re-tag the loaded image to match the digest-based reference expected by the compose file.
Change: After loading, re-tag the image with the full digest reference.
Option 3: Simplify Compose File
Rationale: Remove the digest-based environment variable and always use the local tag for CI.
Change: Hard-code charon:e2e-test or use a simpler env var pattern.
Recommended Solution: Option 1 (Modified Approach)
Strategy
Use the pre-built tag for CI, not the digest. The digest output from the build is metadata but not needed for referencing a locally loaded image.
Implementation
Change 1: Remove Digest from Workflow Environment
File: .github/workflows/e2e-tests.yml
Lines: 155-158
Current:
env:
  # Required for security teardown (emergency reset fallback when ACL blocks API)
  CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
  # Enable security-focused endpoints and test gating
  CHARON_EMERGENCY_SERVER_ENABLED: "true"
  CHARON_SECURITY_TESTS_ENABLED: "true"
  CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}
Corrected:
env:
  # Required for security teardown (emergency reset fallback when ACL blocks API)
  CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
  # Enable security-focused endpoints and test gating
  CHARON_EMERGENCY_SERVER_ENABLED: "true"
  CHARON_SECURITY_TESTS_ENABLED: "true"
  # Use local tag for pre-built image (loaded from artifact)
  CHARON_E2E_IMAGE: charon:e2e-test
Rationale:
- The docker load command restores the image with its original tag charon:e2e-test
- We should use this tag, not the digest
- The digest is only useful for verifying image integrity, not for referencing locally loaded images
Change 2: Update Compose File Comment Documentation
File: .docker/compose/docker-compose.playwright-ci.yml
Lines: 31-37
Current:
charon-app:
  # CI default (digest-pinned via workflow output):
  # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
  # Local override (tag-based):
  # CHARON_E2E_IMAGE=charon:e2e-test
  image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
Corrected:
charon-app:
  # CI default: Uses pre-built image loaded from artifact
  # Set via workflow: CHARON_E2E_IMAGE=charon:e2e-test
  # Local development: Uses locally built image
  # Override with: CHARON_E2E_IMAGE=charon:local-dev
  image: ${CHARON_E2E_IMAGE:-charon:e2e-test}
Rationale:
- Simplify the environment variable fallback chain
- Remove confusing CHARON_E2E_IMAGE_DIGEST variable that was set incorrectly
- Document the actual behavior: CI loads pre-built image with known tag
- Make local development override clearer
Alternative Solution: Option 2 (If Digest-Pinning Required)
If there's a requirement to use digest-based references for security/reproducibility, we must re-tag the loaded image.
Implementation
Change 1: Re-tag After Load
File: .github/workflows/e2e-tests.yml
After Line: 177 (in "Load Docker image" step)
Add:
- name: Load and re-tag Docker image
  run: |
    # Load the pre-built image
    docker load -i charon-e2e-image.tar
    docker images | grep charon
    # Re-tag for digest-based reference if needed
    IMAGE_DIGEST="${{ needs.build.outputs.image_digest }}"
    if [[ -n "$IMAGE_DIGEST" ]]; then
      # Extract just the digest hash (sha256:...)
      DIGEST_HASH=$(echo "$IMAGE_DIGEST" | grep -oP 'sha256:[a-f0-9]{64}')
      # docker tag refuses a target that contains a digest (@sha256:...),
      # so carry the digest in the tag instead
      FULL_REF="ghcr.io/wikid82/charon:sha256-${DIGEST_HASH#sha256:}"
      echo "Re-tagging charon:e2e-test as $FULL_REF"
      docker tag charon:e2e-test "$FULL_REF"
      # Export for compose file
      echo "CHARON_E2E_IMAGE_DIGEST=$FULL_REF" >> "$GITHUB_ENV"
    else
      # Fallback to tag-based reference
      echo "CHARON_E2E_IMAGE=charon:e2e-test" >> "$GITHUB_ENV"
    fi
Change 2: Update Compose File
File: .docker/compose/docker-compose.playwright-ci.yml
Lines: 31-37
Keep the current implementation but fix the comment:
charon-app:
  # CI: Digest-derived tag (re-tagged from loaded artifact)
  # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:sha256-<digest>
  # Local: Tag-based reference for development
  # CHARON_E2E_IMAGE=charon:e2e-test
  image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
Rationale:
- Keeps the build digest visible in the image reference for traceability (a locally re-tagged image cannot carry a true @sha256: reference, because docker tag rejects digest targets)
- Re-tagging creates a local image reference that Docker can resolve
- Falls back gracefully to tag-based reference for local development
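The digest-extraction step in the re-tag script uses `grep -oP`, which needs a grep built with PCRE support. A variant using extended regular expressions might look like the sketch below; `extract_digest` is a hypothetical helper name, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# Hypothetical helper: print the first sha256:<64 hex chars> found in $1.
# Returns non-zero when the input contains no well-formed digest.
extract_digest() {
  digest=$(printf '%s\n' "$1" | grep -oE 'sha256:[a-f0-9]{64}' | head -n 1)
  [ -n "$digest" ] && printf '%s\n' "$digest"
}

# Works on a bare digest as well as on a full reference.
extract_digest "ghcr.io/wikid82/charon@sha256:057a9998fa7a5b224a06ec8989c892d2ac8f9323530470965baaf5fcaab7557c"
```

Because the function returns non-zero on a miss, the calling step can branch to the tag-based fallback instead of continuing with an empty value.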
Recommended Approach: Option 1 (Simplicity)
Why Option 1:
- Simpler: No re-tagging logic needed
- Faster: Fewer Docker operations
- Sufficient: The image is already built and loaded; tag reference is adequate
- Consistent: Matches how playwright.yml handles loaded images
- Local-first: The image is local after docker load, not in a registry
When to use Option 2:
- If there's a compliance requirement to use digest references
- If SBOM/attestation workflows need digest traceability
- If multi-registry scenarios require content-addressable references
Implementation Steps
Phase 1: Apply Recommended Fix (Option 1)
- Update workflow environment variables
  - File: .github/workflows/e2e-tests.yml
  - Line: 158
  - Change: Replace CHARON_E2E_IMAGE_DIGEST with CHARON_E2E_IMAGE: charon:e2e-test
- Update compose file documentation
  - File: .docker/compose/docker-compose.playwright-ci.yml
  - Lines: 31-37
  - Change: Simplify variable fallback and update comments
- Verify changes
  - Run: docker compose -f .docker/compose/docker-compose.playwright-ci.yml config
  - Ensure: image: charon:e2e-test in output
  - Validate: No environment variable warnings
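The "Verify changes" check can be scripted so it fails loudly. The sketch below assumes the rendered config contains a plain `image: <value>` line for the service; `resolved_image` is a hypothetical helper, and the sample input is hand-written rather than captured from a real run.

```shell
# Hypothetical helper: read `docker compose ... config` output on stdin and
# print the value of the first image: key.
resolved_image() {
  sed -n 's/^[[:space:]]*image:[[:space:]]*//p' | head -n 1
}

# Hand-written sample standing in for real config output.
printf 'services:\n  charon-app:\n    image: charon:e2e-test\n' | resolved_image
```

In the workflow, the same function would be fed from `docker compose -f .docker/compose/docker-compose.playwright-ci.yml config`, and the step would exit non-zero if the result differs from `charon:e2e-test`.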
Phase 2: Test in CI
- Create test PR
  - Branch: fix/docker-compose-image-reference
  - Include: Both file changes from Phase 1
- Monitor workflow execution
  - Watch: e2e-tests.yml workflow
  - Check: "Start test environment" step succeeds
  - Verify: Container starts and health check passes
- Validate container
  - Check: docker ps shows charon-playwright running
  - Test: Health endpoint responds at http://localhost:8080/api/v1/health
  - Confirm: Playwright tests execute successfully
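For the health-endpoint check, a small polling loop avoids a fixed sleep. This is a sketch under assumptions: `wait_until` is a hypothetical helper, and the attempt count and delay in the usage comment are illustrative values, not taken from the workflow.

```shell
# Hypothetical helper: wait_until <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
wait_until() {
  wu_attempts=$1
  wu_delay=$2
  shift 2
  wu_i=1
  while [ "$wu_i" -le "$wu_attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the final failure.
    [ "$wu_i" -lt "$wu_attempts" ] && sleep "$wu_delay"
    wu_i=$((wu_i + 1))
  done
  return 1
}

# Example use in a workflow step (not executed here):
#   wait_until 15 2 curl -fsS http://localhost:8080/api/v1/health
```

Fifteen attempts two seconds apart keeps the wait close to the 30-second startup target mentioned in this plan, assuming each request returns quickly.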
Phase 3: Documentation Update
- Update workflow documentation
  - File: .github/workflows/e2e-tests.yml
  - Section: Top-level comments (lines 1-29)
  - Add: Note about using local tag vs. digest
- Update compose file documentation
  - File: .docker/compose/docker-compose.playwright-ci.yml
  - Section: Usage section (lines 11-16)
  - Clarify: Environment variable expectations
Verification Checklist
Pre-Deployment Validation
- Syntax Check: Run docker compose config with test environment variables
- Variable Resolution: Confirm image: field resolves to charon:e2e-test
- Local Test: Load image locally and run compose up
- Workflow Dry-run: Test changes in a draft PR before merging
CI Validation Points
- Build Job: Completes successfully, uploads image artifact
- Download: Image artifact downloads correctly
- Load: docker load succeeds, image appears in docker images
- Compose Up: Container starts without pull errors
- Health Check: Container becomes healthy within timeout
- Test Execution: Playwright tests run and report results
Post-Deployment Monitoring
- Success Rate: Monitor e2e-tests.yml success rate for 10 runs
- Startup Time: Verify container startup time remains under 30s
- Resource Usage: Check for memory/CPU regressions
- Flake Rate: Ensure no new test flakiness introduced
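The success-rate check over the last 10 runs can be reduced to a one-line calculation. The sketch below assumes the run conclusions arrive one per line on stdin; `success_rate` is a hypothetical helper, and the `gh` invocation in the comment is one possible way to produce that input.

```shell
# Hypothetical helper: read one run conclusion per line and print the
# integer percentage of lines equal to "success".
success_rate() {
  awk '$0 == "success" { ok++ }
       NF              { total++ }
       END             { if (total) printf "%d\n", ok * 100 / total; else print 0 }'
}

# Possible input source (not executed here):
#   gh run list --workflow e2e-tests.yml --limit 10 \
#     --json conclusion --jq '.[].conclusion' | success_rate
printf 'success\nsuccess\nfailure\nsuccess\n' | success_rate
```

Three successes out of four sample lines print `75`; against real data the result would be compared with the plan's baseline of more than 95%.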
Risk Assessment
Low Risk Changes
✅ Workflow environment variable change (isolated to CI)
✅ Compose file comment updates (documentation only)
Medium Risk Changes
⚠️ Compose file image: field modification
- Mitigation: Test locally before pushing
- Rollback: Revert single line in compose file
No Risk
✅ Read-only investigation and analysis
✅ Documentation improvements
Rollback Plan
If Option 1 Fails
Symptoms:
- Container still fails to start
- Error: "No such image: charon:e2e-test"
Rollback:
git revert <commit-hash> # Revert the workflow change
Alternative Fix: Switch to Option 2 (re-tagging approach)
If Option 2 Fails
Symptoms:
- Re-tag logic fails
- Digest extraction errors
Rollback:
- Remove re-tagging step
- Fall back to simple tag reference: CHARON_E2E_IMAGE=charon:e2e-test
Success Metrics
Immediate Success Indicators
- ✅ docker compose up starts container without errors
- ✅ Container health check passes within 30 seconds
- ✅ Playwright tests execute (pass or fail is separate concern)
Long-term Success Indicators
- ✅ E2E workflow success rate returns to baseline (>95%)
- ✅ No image reference errors in CI logs for 2 weeks
- ✅ Local development workflow unaffected
Related Issues and Context
Why Was Digest Being Used?
Comment from compose file (line 33):
# CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
Hypothesis: The original intent was to support digest-pinned references for security/reproducibility, but the implementation was incomplete:
- The workflow sets only the digest hash, not the full reference
- The compose file expects the full reference format
- No re-tagging step bridges the gap
Why Does playwright.yml Work?
Key difference (lines 213-277):
- Uses docker run directly with explicit image reference
- Constructs full ghcr.io/... reference from variables
- Does not rely on environment variable substitution in compose file
Lesson: An image reference assembled explicitly in the workflow is easier to verify than one that depends on whatever value reaches the compose file through environment variable interpolation.
Dependencies
Required Secrets
- ✅ CHARON_EMERGENCY_TOKEN (already configured)
- ✅ CHARON_CI_ENCRYPTION_KEY (generated in workflow)
Required Tools
- ✅ Docker Compose (available in GitHub Actions)
- ✅ Docker CLI (available in GitHub Actions)
No External Dependencies
- ✅ No registry authentication needed (local image)
- ✅ No network calls required (image pre-loaded)
Timeline
| Phase | Duration | Status |
|---|---|---|
| Analysis & Planning | Complete | ✅ |
| Implementation | 30 minutes | ⏳ |
| Testing (PR) | 10-15 minutes (CI runtime) | ⏳ |
| Verification | 2 hours (10 workflow runs) | ⏳ |
| Documentation | 15 minutes | ⏳ |
Estimated Total: 3-4 hours from start to complete verification
Next Actions
- Immediate: Implement Option 1 changes (2 file modifications)
- Test: Create PR and monitor e2e-tests.yml workflow
- Verify: Check container startup and health check success
- Document: Update this plan with results
- Close: Mark as complete once verified in main branch
Appendix: Full File Changes
File 1: .github/workflows/e2e-tests.yml
Line 158: Change environment variable
 e2e-tests:
   name: E2E Tests (Shard ${{ matrix.shard }}/${{ matrix.total-shards }})
   runs-on: ubuntu-latest
   needs: build
   timeout-minutes: 30
   env:
     # Required for security teardown (emergency reset fallback when ACL blocks API)
     CHARON_EMERGENCY_TOKEN: ${{ secrets.CHARON_EMERGENCY_TOKEN }}
     # Enable security-focused endpoints and test gating
     CHARON_EMERGENCY_SERVER_ENABLED: "true"
     CHARON_SECURITY_TESTS_ENABLED: "true"
-    CHARON_E2E_IMAGE_DIGEST: ${{ needs.build.outputs.image_digest }}
+    # Use local tag for pre-built image (loaded from artifact)
+    CHARON_E2E_IMAGE: charon:e2e-test
File 2: .docker/compose/docker-compose.playwright-ci.yml
Lines 31-37: Simplify image reference
 charon-app:
-  # CI default (digest-pinned via workflow output):
-  # CHARON_E2E_IMAGE_DIGEST=ghcr.io/wikid82/charon:nightly@sha256:<digest>
-  # Local override (tag-based):
-  # CHARON_E2E_IMAGE=charon:e2e-test
+  # CI default: Uses pre-built image loaded from artifact
+  # Set via workflow: CHARON_E2E_IMAGE=charon:e2e-test
+  # Local development: Uses locally built image
+  # Override with: CHARON_E2E_IMAGE=charon:local-dev
-  image: ${CHARON_E2E_IMAGE_DIGEST:-${CHARON_E2E_IMAGE:-charon:e2e-test}}
+  image: ${CHARON_E2E_IMAGE:-charon:e2e-test}
End of Remediation Plan