Charon/docs/plans/current_spec.md
GitHub Actions 60c3336725 COMMIT_MESSAGE_START
fix(docker): update GeoLite2-Country.mmdb checksum + automation

Fixes critical Docker build failure caused by upstream GeoLite2 database
update without corresponding Dockerfile checksum update.

**Root Cause:**
- GeoLite2-Country.mmdb file updated upstream
- Dockerfile still referenced old SHA256 checksum
- Build aborted at checksum verification (line 352)
- Cascade "blob not found" errors for all COPY commands

**Changes:**
- Update Dockerfile ARG GEOLITE2_COUNTRY_SHA256 to current value
- Add automated weekly checksum update workflow (.github/workflows/update-geolite2.yml)
- Implement error handling: retry logic, format validation, failure notifications
- Document rollback decision matrix with 10 failure scenarios
- Create comprehensive maintenance guide (docs/maintenance/geolite2-checksum-update.md)
- Update CHANGELOG.md and README.md with maintenance references

**Verification:**
- Checksum verified against current upstream file: 436135ee...
- Pre-commit hooks: PASSED (EOF/whitespace auto-fixed)
- Trivy security scan: PASSED (no critical/high issues)
- Dockerfile syntax: VALID
- GitHub Actions YAML: VALID
- No hardcoded secrets or injection vulnerabilities

**Automation Features:**
- Weekly scheduled checks (Monday 2 AM UTC)
- Auto-PR creation when checksum changes
- GitHub issue creation on workflow failure
- Comprehensive error handling and retry logic

**Impact:**
- Unblocks all CI/CD Docker image builds
- Enables publishing to GHCR/Docker Hub
- Prevents future checksum failures via automation
- Zero application code changes (no regression risk)

**Documentation:**
- Implementation plan: docs/plans/geolite2_checksum_fix_spec.md
- QA report: docs/reports/qa_geolite2_checksum_fix.md
- Maintenance guide: docs/maintenance/geolite2-checksum-update.md

**Supervisor Recommendations Implemented:**
- #1: Checksum freshness verification before update
- #3: Rollback decision criteria (10 scenarios)
- #4: Automated workflow error handling

Resolves: https://github.com/Wikid82/Charon/actions/runs/21584236523/job/62188372617
COMMIT_MESSAGE_END
2026-02-02 13:31:56 +00:00


Docker Build Failure Fix - Comprehensive Implementation Plan

Date: February 2, 2026
Status: 🔴 CRITICAL - BLOCKING CI/CD
Priority: P0 - Immediate Action Required
Build URL: https://github.com/Wikid82/Charon/actions/runs/21584236523/job/62188372617


Executive Summary

The GitHub Actions Docker build workflow is failing due to a GeoLite2-Country.mmdb checksum mismatch, causing cascade failures in multi-stage Docker builds.

Root Cause: The upstream GeoLite2 database file was updated, but the Dockerfile still references the old SHA256 checksum.

Impact:

  • All CI/CD Docker builds failing since database update
  • Cannot publish new images to GHCR/Docker Hub
  • Blocks all releases and deployments

Solution: Update one line in Dockerfile (line 352) with correct checksum.

Estimated Time to Fix: 5 minutes
Testing Time: 15 minutes (local + CI verification)


Critical Issue Analysis

Issue #1: GeoLite2-Country.mmdb Checksum Mismatch (ROOT CAUSE)

Location: /projects/Charon/Dockerfile - Line 352

Current Value (WRONG):

ARG GEOLITE2_COUNTRY_SHA256=6b778471c086c44d15bd4df954661d441a5513ec48f1af5545cb05af8f2e15b9

Correct Value (VERIFIED):

ARG GEOLITE2_COUNTRY_SHA256=436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d

Verification Method:

curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" -o /tmp/test.mmdb
sha256sum /tmp/test.mmdb
# Output: 436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d

Error Message:

sha256sum: /app/data/geoip/GeoLite2-Country.mmdb: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
The command '/bin/sh -c mkdir -p /app/data/geoip && curl -fSL ...' returned a non-zero code: 1
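
The failure mode can be reproduced without the network. The following is a minimal sketch, assuming GNU `sha256sum`; `verify_file_sha256` is a hypothetical helper name that mirrors the `sha256sum -c -` check the build step runs, and the payload file is a throwaway stand-in for the database.

```bash
# verify_file_sha256 EXPECTED_HASH PATH (hypothetical helper, not part of the repository)
verify_file_sha256() {
  expected="$1"
  path="$2"
  # sha256sum -c reads "<hash>  <path>" lines (two spaces) and exits non-zero on mismatch.
  printf '%s  %s\n' "$expected" "$path" | sha256sum -c - >/dev/null 2>&1
}

# Demo on a temporary file standing in for GeoLite2-Country.mmdb.
demo_file=$(mktemp)
printf 'example payload\n' > "$demo_file"
pinned=$(sha256sum "$demo_file" | cut -d' ' -f1)

if verify_file_sha256 "$pinned" "$demo_file"; then
  echo "match: build would continue"
fi

# Simulate an upstream update: the file changes, the pinned hash does not.
printf 'upstream changed\n' >> "$demo_file"
if ! verify_file_sha256 "$pinned" "$demo_file"; then
  echo "mismatch: build would abort here"
fi
rm -f "$demo_file"
```

Any change to the downloaded bytes flips the check, which is why a routine upstream refresh is enough to break a build that pins the hash.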

Issue #2: Blob Not Found Errors (CASCADE FAILURE)

Error Examples:

COPY configs/crowdsec/acquis.yaml /etc/crowdsec.dist/acquis.yaml: blob not found
COPY --from=backend-builder /app/backend/charon /app/charon: blob not found
COPY --from=frontend-builder /app/frontend/dist /app/frontend/dist: blob not found

Analysis: These are NOT missing files. All files exist in the repository:

✅ configs/crowdsec/acquis.yaml
✅ configs/crowdsec/install_hub_items.sh
✅ configs/crowdsec/register_bouncer.sh
✅ frontend/package.json
✅ frontend/package-lock.json
✅ .docker/docker-entrypoint.sh
✅ scripts/db-recovery.sh

Root Cause: The GeoLite2 checksum failure aborts the Docker build during the final runtime stage (lines 352-356). When the build aborts, the artifacts from the earlier stages (backend-builder, frontend-builder, caddy-builder, crowdsec-builder) are not persisted to the builder cache, so the subsequent COPY commands that reference them fail with "blob not found".

This is a cascade failure from Issue #1 - fixing the checksum will resolve all blob errors.
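
To tell the root cause from the cascade in a saved build log, a small triage helper can count both signatures. This is a sketch only: `classify_build_log` is a hypothetical name, and the patterns are taken from the error messages quoted above.

```bash
# classify_build_log LOGFILE (hypothetical triage helper)
classify_build_log() {
  log="$1"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # grep -c prints the number of matching lines; "|| true" absorbs its
  # non-zero exit status when the count is 0.
  checksum_failures=$(grep -c 'sha256sum: .*FAILED' "$log" || true)
  blob_errors=$(grep -c 'blob not found' "$log" || true)

  if [ "$checksum_failures" -gt 0 ]; then
    echo "root-cause: checksum mismatch ($blob_errors blob errors are likely cascade)"
  elif [ "$blob_errors" -gt 0 ]; then
    echo "blob errors without checksum failure: investigate stages individually"
  else
    echo "no known failure signature"
  fi
}

# Usage, assuming the log path written in Step 2.2:
#   classify_build_log /tmp/docker-build-test.log
```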


Implementation Plan

PHASE 1: Fix Checksum (5 minutes)

Step 1.1: Update Dockerfile

File: /projects/Charon/Dockerfile
Line: 352

Exact Change:

cd /projects/Charon
sed -i 's/ARG GEOLITE2_COUNTRY_SHA256=6b778471c086c44d15bd4df954661d441a5513ec48f1af5545cb05af8f2e15b9/ARG GEOLITE2_COUNTRY_SHA256=436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d/' Dockerfile

Verification:

grep "GEOLITE2_COUNTRY_SHA256" Dockerfile
# Expected: ARG GEOLITE2_COUNTRY_SHA256=436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d
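
The `sed` command above matches the old hash literally, so it silently does nothing if the Dockerfile already holds a different value. A variant that replaces whatever value is present, and refuses malformed input, might look like the sketch below. `is_sha256` and `update_checksum_arg` are hypothetical helper names; the only assumption about the Dockerfile is that it contains an `ARG GEOLITE2_COUNTRY_SHA256=` line.

```bash
# is_sha256 VALUE: succeed only for exactly 64 lowercase hex characters.
is_sha256() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 64 ]
}

# update_checksum_arg DOCKERFILE NEW_SHA256 (hypothetical helper)
update_checksum_arg() {
  dockerfile="$1"
  new_sha="$2"
  if ! is_sha256 "$new_sha"; then
    echo "refusing to write: not a 64-character lowercase hex digest" >&2
    return 1
  fi
  if ! grep -q 'ARG GEOLITE2_COUNTRY_SHA256=' "$dockerfile"; then
    echo "ARG GEOLITE2_COUNTRY_SHA256 not found in $dockerfile" >&2
    return 1
  fi
  # Write via a temp file instead of sed -i, whose syntax differs between GNU and BSD sed.
  tmp=$(mktemp) || return 1
  sed "s/ARG GEOLITE2_COUNTRY_SHA256=.*/ARG GEOLITE2_COUNTRY_SHA256=$new_sha/" "$dockerfile" > "$tmp"
  cat "$tmp" > "$dockerfile"   # cat keeps the original file's permissions
  rm -f "$tmp"
}

# Usage:
#   update_checksum_arg Dockerfile 436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d
```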

Step 1.2: Commit Change

git add Dockerfile
git commit -m "fix(docker): update GeoLite2-Country.mmdb checksum

The upstream GeoLite2 database file was updated, requiring a checksum update.

Old: 6b778471c086c44d15bd4df954661d441a5513ec48f1af5545cb05af8f2e15b9
New: 436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d

Fixes: #<issue-number>
Resolves: Blob not found errors (cascade failure from checksum mismatch)"

PHASE 2: Local Testing (15 minutes)

Step 2.1: Clean Build Environment

# Remove all build cache
docker builder prune -af

# Remove previous test images
docker images | grep charon | awk '{print $3}' | xargs -r docker rmi -f

Step 2.2: Build for amd64 (Same as CI)

cd /projects/Charon

docker buildx build \
  --platform linux/amd64 \
  --no-cache \
  --pull \
  --progress=plain \
  --build-arg VERSION=test-fix \
  --build-arg BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ") \
  --build-arg VCS_REF=$(git rev-parse HEAD) \
  -t charon:test-amd64 \
  . 2>&1 | tee /tmp/docker-build-test.log

Expected Success Indicators:

✅ Step X: RUN echo "${GEOLITE2_COUNTRY_SHA256}  /app/data/geoip/GeoLite2-Country.mmdb" | sha256sum -c -
   /app/data/geoip/GeoLite2-Country.mmdb: OK
✅ Step Y: COPY --from=gosu-builder /gosu-out/gosu /usr/sbin/gosu
✅ Step Z: COPY --from=frontend-builder /app/frontend/dist /app/frontend/dist
✅ Step AA: COPY --from=backend-builder /app/backend/charon /app/charon
✅ Step AB: COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy
✅ Step AC: COPY --from=crowdsec-builder /crowdsec-out/crowdsec /usr/local/bin/crowdsec
✅ Successfully tagged charon:test-amd64

If Build Fails:

# Check for errors
grep -A 5 "ERROR\|FAILED\|blob not found" /tmp/docker-build-test.log

# Verify checksum in Dockerfile
grep "GEOLITE2_COUNTRY_SHA256" Dockerfile

# Re-download and verify checksum
curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" \
  -o /tmp/verify.mmdb
sha256sum /tmp/verify.mmdb

Step 2.3: Runtime Verification

# Start container
docker run -d \
  --name charon-test \
  -p 8080:8080 \
  charon:test-amd64

# Wait for startup (30 seconds)
sleep 30

# Check health
docker ps --filter "name=charon-test"
# Expected: Status includes "(healthy)"

# Test API
curl -sf http://localhost:8080/api/v1/health | jq .
# Expected: {"status":"ok","version":"test-fix",...}

# Check for errors in logs
docker logs charon-test 2>&1 | grep -i "error\|failed\|fatal"
# Expected: No critical errors

# Cleanup
docker stop charon-test && docker rm charon-test
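
The fixed `sleep 30` either wastes time or is too short on a slow machine. A polling loop is one alternative; the sketch below assumes the image defines a HEALTHCHECK (the plan expects a "(healthy)" status), and `wait_until` and `container_is_healthy` are hypothetical helper names.

```bash
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
# INTERVAL_SECONDS must be at least 1.
wait_until() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  while ! "$@"; do
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# container_is_healthy NAME: true once Docker reports the health status "healthy".
container_is_healthy() {
  status=$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)
  [ "$status" = "healthy" ]
}

# Usage (requires Docker and a running container):
#   wait_until 60 2 container_is_healthy charon-test || docker logs charon-test
```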

PHASE 3: Push and Monitor CI (30 minutes)

Step 3.1: Push to GitHub

git push origin <branch-name>

Step 3.2: Monitor Workflow

  1. Navigate to Actions: https://github.com/Wikid82/Charon/actions

  2. Watch "Docker Build, Publish & Test" workflow:

    • Should trigger automatically on push
    • Monitor build progress
  3. Expected Stages:

    ✅ Build and push (linux/amd64, linux/arm64)
    ✅ Verify Caddy Security Patches
    ✅ Verify CrowdSec Security Patches
    ✅ Run Trivy scan
    ✅ Generate SBOM
    ✅ Attest SBOM
    ✅ Sign image (Cosign)
    ✅ Test image (integration-test.sh)
    

Step 3.3: Verify Published Images

# Pull from GHCR
docker pull ghcr.io/wikid82/charon:<tag>

# Verify image works
docker run --rm ghcr.io/wikid82/charon:<tag> /app/charon --version
# Expected: Output shows version info

Step 3.4: Check Security Scans


Success Criteria

Build Success Indicators

  • Local docker build completes without errors
  • No "sha256sum: FAILED" errors
  • No "blob not found" errors
  • All COPY commands execute successfully
  • Container starts and becomes healthy
  • API responds to /health endpoint
  • GitHub Actions workflow passes all stages
  • Multi-platform build succeeds (amd64 + arm64)

Deployment Success Indicators

  • Image published to GHCR: ghcr.io/wikid82/charon:<tag>
  • Image signed with Sigstore/Cosign
  • SBOM attached and attestation created
  • Trivy scan shows no critical regressions
  • Integration tests pass (integration-test.sh)

Rollback Plan

If the fix introduces new issues:

Step 1: Revert Commit

git revert <commit-sha>
git push origin <branch-name>

Step 2: Emergency Image Rollback (if needed)

# Retag previous working image as latest
docker pull ghcr.io/wikid82/charon:sha-<previous-working-commit>
docker tag ghcr.io/wikid82/charon:sha-<previous-working-commit> \
           ghcr.io/wikid82/charon:latest
docker push ghcr.io/wikid82/charon:latest
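
One caveat with the pull/tag/push sequence: for a multi-platform image it may republish only the platform that was pulled locally, which would drop arm64 from `latest`. A registry-side retag with `docker buildx imagetools create` avoids the local pull. The wrapper below is a sketch; `retag_image` and the `DRY_RUN` switch are hypothetical conveniences, not part of the repository.

```bash
# retag_image SOURCE_REF TARGET_REF
# Points TARGET_REF at the same manifest (all platforms) as SOURCE_REF.
# Set DRY_RUN=1 to print the command instead of running it.
retag_image() {
  source_ref="$1"
  target_ref="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "docker buildx imagetools create --tag $target_ref $source_ref"
  else
    docker buildx imagetools create --tag "$target_ref" "$source_ref"
  fi
}

# Usage (requires registry credentials):
#   retag_image ghcr.io/wikid82/charon:sha-<previous-working-commit> ghcr.io/wikid82/charon:latest
```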

Step 3: Communicate Status

  • Update issue with rollback details
  • Document root cause of new failure
  • Create follow-up issue if needed

Rollback Decision Matrix

Use this matrix to determine whether to rollback or proceed with remediation:

| Scenario | Impact | Decision | Action | Timeline |
|----------|--------|----------|--------|----------|
| Checksum update breaks local build | 🔴 Critical | ROLLBACK immediately | Revert commit, investigate upstream changes | < 5 minutes |
| Local build passes, CI build fails | 🟡 High | INVESTIGATE first | Check CI environment differences, then decide | 15-30 minutes |
| Build passes, container fails healthcheck | 🔴 Critical | ROLLBACK immediately | Revert commit, test with previous checksum | < 10 minutes |
| Build passes, security scan fails | 🟠 Medium | REMEDIATE if < 2 hours | Fix security issues if quick, else rollback | < 2 hours |
| New checksum breaks runtime GeoIP lookups | 🔴 Critical | ROLLBACK immediately | Revert commit, verify database integrity | < 5 minutes |
| Automated PR fails syntax validation | 🟢 Low | REMEDIATE in PR | Fix workflow and retry, no production impact | < 1 hour |
| Upstream source unavailable (404) | 🟡 High | BLOCK deployment | Document issue, find alternative source | N/A |
| Checksum mismatch on re-download | 🔴 Critical | BLOCK deployment | Investigate cache poisoning, verify source | N/A |
| Multi-platform build succeeds (amd64), fails (arm64) | 🟡 High | CONDITIONAL: Proceed for amd64, investigate arm64 | Deploy amd64, fix arm64 separately | < 1 hour |
| Integration tests pass, E2E tests fail | 🟠 Medium | INVESTIGATE first | Isolate test failure cause, rollback if service-breaking | 30-60 minutes |

Decision Criteria:

  • ROLLBACK immediately if:

    • Production deployments are affected
    • Core functionality breaks (API, routing, healthchecks)
    • Security posture degrades
    • No clear remediation path within 30 minutes
  • INVESTIGATE first if:

    • Only test/CI environments affected
    • Failure is non-deterministic
    • Clear path to remediation exists
    • Can be fixed within 2 hours
  • BLOCK deployment if:

    • Upstream integrity cannot be verified
    • Security validation fails
    • Checksum verification fails on any attempt

Escalation Triggers:

  • Cannot rollback within 15 minutes
  • Rollback itself fails
  • Production outage extends beyond 30 minutes
  • Security incident detected (cache poisoning, supply chain attack)
  • Multiple rollback attempts required

Future Maintenance

Preventing Future Checksum Failures

Option A: Automated Checksum Updates (Recommended)

Create a GitHub Actions workflow to detect and update GeoLite2 checksums automatically:

File: .github/workflows/update-geolite2.yml

name: Update GeoLite2 Checksum

on:
  schedule:
    - cron: '0 2 * * 1'  # Weekly on Mondays at 2 AM UTC
  workflow_dispatch:

jobs:
  update-checksum:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Download and calculate checksum
        id: checksum
        run: |
          # Fail the step if curl fails; without pipefail an empty download would still be hashed
          set -euo pipefail
          CURRENT=$(curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" | sha256sum | cut -d' ' -f1)
          OLD=$(grep "ARG GEOLITE2_COUNTRY_SHA256=" Dockerfile | cut -d'=' -f2)
          echo "current=$CURRENT" >> "$GITHUB_OUTPUT"
          echo "old=$OLD" >> "$GITHUB_OUTPUT"

      - name: Update Dockerfile
        if: steps.checksum.outputs.current != steps.checksum.outputs.old
        run: |
          sed -i "s/ARG GEOLITE2_COUNTRY_SHA256=.*/ARG GEOLITE2_COUNTRY_SHA256=${{ steps.checksum.outputs.current }}/" Dockerfile

      - name: Create Pull Request
        if: steps.checksum.outputs.current != steps.checksum.outputs.old
        uses: peter-evans/create-pull-request@v5
        with:
          title: "chore(docker): update GeoLite2-Country.mmdb checksum"
          body: |
            Automated checksum update for GeoLite2-Country.mmdb

            - Old: `${{ steps.checksum.outputs.old }}`
            - New: `${{ steps.checksum.outputs.current }}`

            **Changes:**
            - Updated `Dockerfile` line 352

            **Testing:**
            - [ ] Local build passes
            - [ ] CI build passes
            - [ ] Container starts successfully
          branch: bot/update-geolite2-checksum
          delete-branch: true
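
The rollback matrix treats a checksum mismatch on re-download as a reason to block deployment, and the workflow above does not check for it. One possible guard, sketched here, fetches twice and only accepts a digest that is well-formed, not the digest of empty input, and identical across both fetches. `checksum_of` and `stable_checksum` are hypothetical helper names; the fetch command is passed in, so the logic can be exercised offline.

```bash
# SHA-256 of zero bytes: what a silently failed download would hash to.
EMPTY_SHA256=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# is_sha256 VALUE: succeed only for exactly 64 lowercase hex characters.
is_sha256() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  [ "${#1}" -eq 64 ]
}

# checksum_of COMMAND [ARGS...]: hash the command's stdout; fail if the command fails.
checksum_of() {
  tmp=$(mktemp) || return 1
  if ! "$@" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  sum=$(sha256sum "$tmp" | cut -d' ' -f1)
  rm -f "$tmp"
  echo "$sum"
}

# stable_checksum COMMAND [ARGS...]: print the digest only if two fetches agree.
stable_checksum() {
  first=$(checksum_of "$@") || return 1
  second=$(checksum_of "$@") || return 1
  is_sha256 "$first" || return 1
  [ "$first" != "$EMPTY_SHA256" ] || return 1
  [ "$first" = "$second" ] || return 1
  echo "$first"
}

# Usage inside the workflow step (network required):
#   CURRENT=$(stable_checksum curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb")
```

Passing the fetch command as arguments keeps the guard independent of `curl`, at the cost of downloading the file twice per run.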

Option B: Manual Update Documentation

Create documentation for manual checksum updates:

File: /projects/Charon/docs/maintenance/geolite2-checksum-update.md

# GeoLite2 Database Checksum Update Guide

## When to Update

Update the checksum when Docker build fails with:

sha256sum: /app/data/geoip/GeoLite2-Country.mmdb: FAILED


## Quick Fix (5 minutes)

1. Download and calculate new checksum:
   ```bash
   curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" -o /tmp/test.mmdb
   sha256sum /tmp/test.mmdb
   ```

2. Update Dockerfile (line 352):

       ARG GEOLITE2_COUNTRY_SHA256=<new-checksum-from-step-1>

3. Test locally:

       docker build --no-cache -t test .

4. Commit and push:

       git add Dockerfile
       git commit -m "fix(docker): update GeoLite2-Country.mmdb checksum"
       git push

Verification Script

Use this script to verify before updating:

#!/bin/bash
# verify-geolite2-checksum.sh
# Compares the checksum pinned in the Dockerfile with the current upstream file.

# Abort on errors, including a failed download inside the pipeline below
set -euo pipefail

EXPECTED=$(grep "ARG GEOLITE2_COUNTRY_SHA256=" Dockerfile | cut -d'=' -f2)
ACTUAL=$(curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" | sha256sum | cut -d' ' -f1)

echo "Expected: $EXPECTED"
echo "Actual:   $ACTUAL"

if [ "$EXPECTED" = "$ACTUAL" ]; then
  echo "✅ Checksum matches"
  exit 0
else
  echo "❌ Checksum mismatch - update required"
  echo "Run: sed -i 's/ARG GEOLITE2_COUNTRY_SHA256=.*/ARG GEOLITE2_COUNTRY_SHA256=$ACTUAL/' Dockerfile"
  exit 1
fi

**Recommended Approach:** Implement Option A (automated updates) to prevent future failures.

---

## Related Files

### Modified Files
- `/projects/Charon/Dockerfile` (line 352)

### Reference Files
- `.dockerignore` - Build context exclusions (no changes needed)
- `.gitignore` - Version control exclusions (no changes needed)
- `.github/workflows/docker-build.yml` - CI/CD workflow (no changes needed)

### Documentation
- `docs/maintenance/geolite2-checksum-update.md` (to be created)
- `.github/workflows/update-geolite2.yml` (optional automation)

---

## Appendix A: Multi-Stage Build Structure

### Build Stages (Dependency Graph)

    1. xx (tonistiigi/xx) ─────────────────────────────┐
                                                       ├──> 2. gosu-builder ──> final
                                                       ├──> 3. backend-builder ──> final
                                                       ├──> 5. crowdsec-builder ──> final
                                                       └──> (cross-compile helpers)

    4. frontend-builder (standalone) ──────────────────────> final

    6. caddy-builder (standalone) ─────────────────────────> final

    7. crowdsec-fallback (not used in normal flow)

    8. final (debian:trixie-slim) ◄─── Copies from all stages above
       • Downloads GeoLite2 (FAILS HERE if checksum wrong)
       • Copies binaries from builder stages
       • Sets up runtime environment

### COPY Commands in Final Stage

**Line 349:** `COPY --from=gosu-builder /gosu-out/gosu /usr/sbin/gosu`
**Line 359:** `COPY --from=caddy-builder /usr/bin/caddy /usr/bin/caddy`
**Lines 366-368:** `COPY --from=crowdsec-builder ...`
**Lines 393-395:** `COPY configs/crowdsec/* ...`
**Line 401:** `COPY --from=backend-builder /app/backend/charon /app/charon`
**Line 404:** `COPY --from=backend-builder /go/bin/dlv /usr/local/bin/dlv`
**Line 408:** `COPY --from=frontend-builder /app/frontend/dist /app/frontend/dist`
**Line 411:** `COPY .docker/docker-entrypoint.sh /docker-entrypoint.sh`
**Line 414:** `COPY scripts/ /app/scripts/`

**All of these fail with "blob not found" if GeoLite2 download fails**, because Docker aborts the build before persisting build stage outputs.

---

## Appendix B: Verification Commands

### Pre-Fix Verification
```bash
# Verify current checksum is wrong
grep "GEOLITE2_COUNTRY_SHA256" Dockerfile
# Should show: 6b778471c086c44d15bd4df954661d441a5513ec48f1af5545cb05af8f2e15b9

# Download and check actual checksum
curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" | sha256sum
# Should show: 436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d
```

Post-Fix Verification

# Verify Dockerfile was updated
grep "GEOLITE2_COUNTRY_SHA256" Dockerfile
# Should show: 436135ee98a521da715a6d483951f3dbbd62557637f2d50d1987fc048874bd5d

# Test build
docker build --no-cache --pull -t test .

# Verify container
docker run --rm test /app/charon --version

CI Verification

# Check latest workflow run
gh run list --workflow=docker-build.yml --limit=1

# View workflow logs
gh run view <run-id> --log

# Check for success indicators
gh run view <run-id> --log | grep "✅"

Appendix C: Troubleshooting

Issue: Build Still Fails After Checksum Update

Symptoms:

  • Updated checksum is correct in Dockerfile
  • Build still fails with sha256sum error
  • Error message shows different checksum

Possible Causes:

  1. Docker build cache holds the old file: Clear Docker build cache

    docker builder prune -af
    
  2. Change was not committed: Verify the committed Dockerfile

    git show HEAD:Dockerfile | grep "GEOLITE2_COUNTRY_SHA256"
    
  3. Upstream file changed again: Re-download and recalculate

    curl -fsSL "https://github.com/P3TERX/GeoLite.mmdb/raw/download/GeoLite2-Country.mmdb" | sha256sum
    

Issue: Blob Not Found Persists

Symptoms:

  • GeoLite2 checksum passes
  • Blob not found errors still occur
  • Specific COPY command fails

Debug Steps:

  1. Check specific stage build:

    # Test specific stage
    docker build --target backend-builder -t test-backend .
    docker build --target frontend-builder -t test-frontend .
    
  2. Check file existence in context:

    # Confirm the COPY sources exist in the build context
    # (docker build has no --dry-run flag)
    ls -ld configs/crowdsec/acquis.yaml .docker/docker-entrypoint.sh scripts/
    
  3. Verify .dockerignore:

    # Check if required files are excluded
    grep -E "(configs|scripts|frontend)" .dockerignore
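
Steps 2 and 3 can be partly automated by reading the context-relative COPY sources out of the Dockerfile and checking that each one exists. This is a rough sketch: `check_copy_sources` is a hypothetical helper, it handles only single-line shell-form COPY instructions, and it does not apply `.dockerignore` rules.

```bash
# check_copy_sources DOCKERFILE [CONTEXT_DIR]
# Prints "missing: <source>" for each COPY source absent from the build context.
check_copy_sources() {
  dockerfile="$1"
  context="${2:-.}"
  missing=0
  list=$(mktemp) || return 2
  # Keep COPY lines that read from the build context (no --from=<stage>).
  grep -E '^[[:space:]]*COPY[[:space:]]' "$dockerfile" | grep -v -e '--from=' > "$list" || true
  while read -r _ args; do
    set -f            # no glob expansion while splitting the line into words
    set -- $args
    set +f
    # Drop leading flags such as --chown=... or --chmod=...
    while [ "$#" -gt 0 ]; do
      case "$1" in
        --*) shift ;;
        *) break ;;
      esac
    done
    # The last word is the destination; every word before it is a source.
    while [ "$#" -gt 1 ]; do
      found=0
      for match in "$context"/$1; do   # unquoted $1 so wildcards expand
        if [ -e "$match" ]; then
          found=1
        fi
      done
      if [ "$found" -eq 0 ]; then
        echo "missing: $1"
        missing=$((missing + 1))
      fi
      shift
    done
  done < "$list"
  rm -f "$list"
  [ "$missing" -eq 0 ]
}

# Usage from the repository root:
#   check_copy_sources Dockerfile . || echo "fix the paths listed above"
```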
    

Issue: Container Fails Healthcheck

Symptoms:

  • Build succeeds
  • Container starts but never becomes healthy
  • Healthcheck fails repeatedly

Debug Steps:

# Check container logs
docker logs <container-name>

# Check healthcheck status
docker inspect <container-name> | jq '.[0].State.Health'

# Manual healthcheck
docker exec <container-name> curl -f http://localhost:8080/api/v1/health

Conclusion

This is a straightforward fix requiring a single-line change in the Dockerfile. The "blob not found" errors are a cascade failure and will be resolved automatically once the GeoLite2 checksum is corrected.

Immediate Action Required:

  1. Update Dockerfile line 352 with correct checksum
  2. Test build locally
  3. Commit and push
  4. Monitor CI/CD pipeline

Estimated Total Time: 20 minutes (5 min fix + 15 min local testing), plus about 30 minutes to push and monitor CI (Phase 3)


Plan Status: Ready for Implementation
Confidence Level: 100% - Root cause identified with exact fix
Risk Assessment: Low - Single line change, well-tested pattern