Docker Socket Trace Analysis

Date: 2025-12-22
Issue: Creating a new proxy host using the local Docker socket fails with 503 (previously 500)
Status: Root cause identified


Executive Summary

ROOT CAUSE: The container runs as the non-root user charon (uid=1000, gid=1000), but the Docker socket mounted into the container is owned by root:docker (gid=988 on the host). The charon user is not a member of that group, so socket access fails with EACCES (Permission denied).

The 503 is correct behavior - it accurately reflects that Docker is unavailable due to permission restrictions. The error handling code change from 500 to 503 was an improvement, not a bug.


1. Full Workflow Trace

Frontend Layer

A. ProxyHostForm Component

  • File: frontend/src/components/ProxyHostForm.tsx

  • State: connectionSource - defaults to 'custom', can be 'local' or a remote server UUID

  • Hook invocation (line ~146):

    const { containers: dockerContainers, isLoading: dockerLoading, error: dockerError } = useDocker(
      connectionSource === 'local' ? 'local' : undefined,
      connectionSource !== 'local' && connectionSource !== 'custom' ? connectionSource : undefined
    )
    
  • Error display (line ~361):

    {dockerError && connectionSource !== 'custom' && (
      <p className="text-xs text-red-400 mt-1">
        Failed to connect: {(dockerError as Error).message}
      </p>
    )}
    

B. useDocker Hook

  • File: frontend/src/hooks/useDocker.ts

  • Function: useDocker(host?: string | null, serverId?: string | null)

  • Query configuration:

    useQuery({
      queryKey: ['docker-containers', host, serverId],
      queryFn: () => dockerApi.listContainers(host || undefined, serverId || undefined),
      enabled: Boolean(host) || Boolean(serverId),
      retry: 1,
    })
    
  • When connectionSource === 'local', calls dockerApi.listContainers('local', undefined)

C. Docker API Client

  • File: frontend/src/api/docker.ts
  • Function: dockerApi.listContainers(host?: string, serverId?: string)
  • Request: GET /api/v1/docker/containers?host=local
  • Response type: DockerContainer[]

Backend Layer

D. Routes Registration

  • File: backend/internal/api/routes/routes.go

  • Registration (lines 199-204):

    dockerService, err := services.NewDockerService()
    if err == nil { // Only register if Docker is available
        dockerHandler := handlers.NewDockerHandler(dockerService, remoteServerService)
        dockerHandler.RegisterRoutes(protected)
    } else {
        logger.Log().WithError(err).Warn("Docker service unavailable")
    }
    
  • CRITICAL: Docker routes register only if NewDockerService() succeeds, and that constructor only builds the client object - it does not verify socket access

  • Route: GET /api/v1/docker/containers (protected, requires auth)

E. Docker Handler

  • File: backend/internal/api/handlers/docker_handler.go

  • Function: ListContainers(c *gin.Context)

  • Input validation (SSRF hardening):

    host := strings.TrimSpace(c.Query("host"))
    serverID := strings.TrimSpace(c.Query("server_id"))
    
    // SSRF hardening: only allow "local" or empty
    if host != "" && host != "local" {
        c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid docker host selector"})
        return
    }
    
  • Service call: h.dockerService.ListContainers(c.Request.Context(), host)

  • Error handling (lines 60-69):

    if err != nil {
        var unavailableErr *services.DockerUnavailableError
        if errors.As(err, &unavailableErr) {
            c.JSON(http.StatusServiceUnavailable, gin.H{"error": "Docker daemon unavailable"})  // 503
            return
        }
        c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to list containers"})  // 500
        return
    }
    

F. Docker Service

  • File: backend/internal/services/docker_service.go

  • Constructor: NewDockerService()

    cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
    
    • Uses client.FromEnv which reads DOCKER_HOST env var (defaults to unix:///var/run/docker.sock)
    • Does NOT verify socket access - only constructs client object
  • Function: ListContainers(ctx context.Context, host string)

    if host == "" || host == "local" {
        cli = s.client  // Use default local client
    }
    containers, err := cli.ContainerList(ctx, container.ListOptions{All: false})
    if err != nil {
        if isDockerConnectivityError(err) {
            return nil, &DockerUnavailableError{err: err}  // Triggers 503
        }
        return nil, fmt.Errorf("failed to list containers: %w", err)  // Triggers 500
    }
    
  • Error detection: isDockerConnectivityError(err) (lines 104-152)

    • Checks for: "cannot connect to docker daemon", "is the docker daemon running", timeout errors
    • Checks syscall errors: ENOENT, EACCES, EPERM, ECONNREFUSED
    • Matches syscall.EACCES (permission denied) → returns DockerUnavailableError, which the handler maps to 503

2. Request/Response Shapes

Frontend → Backend Request

GET /api/v1/docker/containers?host=local
Authorization: Bearer <jwt_token>

Backend → Frontend Response (Success - 200)

[
  {
    "id": "abc123def456",
    "names": ["my-container"],
    "image": "nginx:latest",
    "state": "running",
    "status": "Up 2 hours",
    "network": "bridge",
    "ip": "172.17.0.2",
    "ports": [{"private_port": 80, "public_port": 8080, "type": "tcp"}]
  }
]

Backend → Frontend Response (Error - 503)

{
  "error": "Docker daemon unavailable"
}

3. Error Conditions Triggering 503

The 503 Service Unavailable is returned when isDockerConnectivityError() returns true:

| Condition | Check in Code | Matches Our Case |
|---|---|---|
| Socket missing | syscall.ENOENT or os.ErrNotExist | No |
| Permission denied | syscall.EACCES or syscall.EPERM | YES |
| Connection refused | syscall.ECONNREFUSED | No |
| Timeout | net.Error.Timeout() or context.DeadlineExceeded | No |
| Daemon not running | String contains "cannot connect" / "daemon running" | No |

4. Docker Configuration Analysis

Dockerfile

  • File: Dockerfile

  • User creation (lines 154-156):

    RUN addgroup -g 1000 charon && \
        adduser -D -u 1000 -G charon -h /app -s /sbin/nologin charon
    
  • Runtime user (line 286):

    USER charon
    
  • Result: Container runs as uid=1000, gid=1000 (charon:charon)

Docker Compose Files

All compose files mount the socket identically:

volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
| File | Mount Present |
|---|---|
| .docker/compose/docker-compose.yml | Yes |
| .docker/compose/docker-compose.local.yml | Yes |
| .docker/compose/docker-compose.dev.yml | Yes |
| docker-compose.test.yml | Yes |

Runtime Verification (from live container)

# Socket exists inside container
$ ls -la /var/run/docker.sock
srw-rw----    1 root     988     0 Dec 12 22:40 /var/run/docker.sock

# Container user identity
$ id
uid=1000(charon) gid=1000(charon) groups=1000(charon)

# Direct socket access test
$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json
# Returns: exit code 7 (connection refused due to permission denied)

# Explicit permission check
$ cat /var/run/docker.sock
cat: can't open '/var/run/docker.sock': Permission denied

Host System

$ getent group 988
docker:x:988:

$ stat -c '%U:%G' /var/run/docker.sock
root:docker

5. Root Cause Analysis

The Permission Gap

| Component | Value |
|---|---|
| Socket owner | root:docker (gid=988) |
| Socket permissions | srw-rw---- (660) |
| Container user | charon (uid=1000, gid=1000) |
| Container groups | Only charon (1000) |
| Docker group in container | Does not exist |

The charon user cannot access the socket because:

  1. Not owner (not root)
  2. Not in the socket's group (gid=988 doesn't exist in container, and charon isn't in it)
  3. No "other" permissions on socket

Why This Happens

The Docker socket's group ID (988 on this host) is a host-specific value. Different systems assign different GIDs to the docker group:

  • Debian/Ubuntu: often 999 or 998
  • Alpine: often 101 (from docker package)
  • RHEL/CentOS: varies
  • This host: 988

The container has no knowledge of the host's group mappings. When the socket is mounted, it retains the host's numeric GID, but the container has no group with that GID.


6. Why 503 (Not 500) Is Correct

The error mapping change that returned 503 instead of 500 was correct and intentional:

  • 500 Internal Server Error: Indicates a bug or unexpected failure in the application
  • 503 Service Unavailable: Indicates the requested service is temporarily unavailable due to external factors

Docker being inaccessible due to socket permissions is an environmental/configuration issue, not an application bug. The 503 correctly signals:

  1. The API endpoint is working
  2. The underlying Docker service is unavailable
  3. The issue is likely external (deployment configuration)

7. Solutions

Option A: Run the Container as Root

Remove USER charon from the Dockerfile. Breaks security best practices (CIS Docker Benchmark 4.1).

Option B: Add Docker Group to Container at Build Time

# Problem: GID varies by host system
RUN addgroup -g 988 docker && adduser charon docker

Issue: Assumes host Docker GID is 988; breaks on other systems.

Option C: Add Docker Group at Runtime (Entrypoint)

Modify the entrypoint to detect the socket's group and add the charon user to it:

# In docker-entrypoint.sh, before starting the app:
if [ -S /var/run/docker.sock ]; then
    DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
    if ! getent group "$DOCKER_GID" >/dev/null 2>&1; then
        # Create a group with the socket's GID
        addgroup -g "$DOCKER_GID" docker 2>/dev/null || true
    fi
    # Add charon user to the docker group
    adduser charon docker 2>/dev/null || true
fi

Issue: Requires container to start as root, then drop privileges.
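A deployment-side variant worth noting, hedged: Compose's standard group_add option can grant the host's docker GID at deploy time, avoiding both a root entrypoint and a GID baked into the image. The value 988 matches this host and must be adjusted per system:

```
services:
  charon:
    # Host docker group GID; find it with: stat -c '%g' /var/run/docker.sock
    group_add:
      - "988"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
```

The limitation mirrors Option B - the GID is still host-specific - but here it lives in deploy-time configuration rather than the image.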

Option D: Use DOCKER_HOST Environment Variable

Allow users to specify an alternative Docker endpoint (TCP, SSH, or different socket path):

environment:
  - DOCKER_HOST=tcp://host.docker.internal:2375

Issue: Requires exposing Docker API over network (security implications).

Option E: Document User Requirement (Workaround)

Add documentation requiring users to either:

  1. Run the container with --user root (not recommended)
  2. Change socket permissions on host: chmod 666 /var/run/docker.sock (security risk)
  3. Accept that Docker integration is unavailable when running as non-root

8. Recommendations

Immediate (No Code Change)

  1. Update documentation to explain the permission requirement
  2. Add health check for Docker availability in the UI (show "Docker integration unavailable" gracefully)

Short Term

  1. Add startup warning log when Docker socket is inaccessible:

    // In routes.go or docker_service.go
    if _, err := cli.Ping(ctx); err != nil {
        logger.Log().Warn("Docker socket inaccessible - container discovery disabled")
    }
    

Medium Term

  1. Implement Option C with proper privilege dropping
  2. Add environment variable CHARON_DOCKER_ENABLED=false to explicitly disable Docker integration

Long Term

  1. Consider podman socket compatibility
  2. Consider Docker SDK over TCP as alternative

9. Conclusion

The 503 error is working as designed. The Docker socket's permission model fundamentally conflicts with running containers as non-root users unless deployment-time configuration explicitly grants socket access.

The fix is not in the code, but in deployment configuration or documentation.