Docker Socket Trace Analysis
Date: 2025-12-22
Issue: Creating a new proxy host using the local docker socket fails with 503 (previously 500)
Status: Root cause identified
Executive Summary
ROOT CAUSE: The container runs as non-root user charon (uid=1000, gid=1000), but the Docker socket mounted into the container is owned by root:docker (gid=988 on host). The charon user is not a member of the docker group, so socket access is denied with Permission denied.
The 503 is correct behavior - it accurately reflects that Docker is unavailable due to permission restrictions. The error handling code change from 500 to 503 was an improvement, not a bug.
1. Full Workflow Trace
Frontend Layer
A. ProxyHostForm Component
- State: connectionSource defaults to 'custom'; can be 'local' or a remote server UUID
- Hook invocation (line ~146):

const { containers: dockerContainers, isLoading: dockerLoading, error: dockerError } = useDocker(
  connectionSource === 'local' ? 'local' : undefined,
  connectionSource !== 'local' && connectionSource !== 'custom' ? connectionSource : undefined
)

- Error display (line ~361):

{dockerError && connectionSource !== 'custom' && (
  <p className="text-xs text-red-400 mt-1">
    Failed to connect: {(dockerError as Error).message}
  </p>
)}
B. useDocker Hook
- Function: useDocker(host?: string | null, serverId?: string | null)
- Query configuration:

useQuery({
  queryKey: ['docker-containers', host, serverId],
  queryFn: () => dockerApi.listContainers(host || undefined, serverId || undefined),
  enabled: Boolean(host) || Boolean(serverId),
  retry: 1,
})

- When connectionSource === 'local', calls dockerApi.listContainers('local', undefined)
C. Docker API Client
- File: frontend/src/api/docker.ts
- Function: dockerApi.listContainers(host?: string, serverId?: string)
- Request: GET /api/v1/docker/containers?host=local
- Response type: DockerContainer[]
Backend Layer
D. Routes Registration
- Registration (lines 199-204):

dockerService, err := services.NewDockerService()
if err == nil { // Only register if Docker is available
    dockerHandler := handlers.NewDockerHandler(dockerService, remoteServerService)
    dockerHandler.RegisterRoutes(protected)
} else {
    logger.Log().WithError(err).Warn("Docker service unavailable")
}

- CRITICAL: Docker routes are only registered if NewDockerService() succeeds (client construction only; socket access is not checked)
- Route: GET /api/v1/docker/containers (protected, requires auth)
E. Docker Handler
- Function: ListContainers(c *gin.Context)
- Input validation (SSRF hardening):

host := strings.TrimSpace(c.Query("host"))
serverID := strings.TrimSpace(c.Query("server_id"))
// SSRF hardening: only allow "local" or empty
if host != "" && host != "local" {
    c.JSON(http.StatusBadRequest, gin.H{"error": "Invalid docker host selector"})
    return
}

- Service call: h.dockerService.ListContainers(c.Request.Context(), host)
- Error handling (lines 60-69):

if err != nil {
    var unavailableErr *services.DockerUnavailableError
    if errors.As(err, &unavailableErr) {
        c.JSON(http.StatusServiceUnavailable, gin.H{"error": "Docker daemon unavailable"}) // 503
        return
    }
    c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to list containers"}) // 500
    return
}
F. Docker Service
- Constructor: NewDockerService()

cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())

  - Uses client.FromEnv, which reads the DOCKER_HOST env var (defaults to unix:///var/run/docker.sock)
  - Does NOT verify socket access; it only constructs the client object
- Function: ListContainers(ctx context.Context, host string)

if host == "" || host == "local" {
    cli = s.client // Use default local client
}
containers, err := cli.ContainerList(ctx, container.ListOptions{All: false})
if err != nil {
    if isDockerConnectivityError(err) {
        return nil, &DockerUnavailableError{err: err} // Triggers 503
    }
    return nil, fmt.Errorf("failed to list containers: %w", err) // Triggers 500
}

- Error detection: isDockerConnectivityError(err) (lines 104-152)
  - Checks for: "cannot connect to docker daemon", "is the docker daemon running", timeout errors
  - Checks syscall errors: ENOENT, EACCES, EPERM, ECONNREFUSED
  - Matches syscall.EACCES (permission denied) → returns DockerUnavailableError → 503
2. Request/Response Shapes
Frontend → Backend Request
GET /api/v1/docker/containers?host=local
Authorization: Bearer <jwt_token>
Backend → Frontend Response (Success - 200)
[
{
"id": "abc123def456",
"names": ["my-container"],
"image": "nginx:latest",
"state": "running",
"status": "Up 2 hours",
"network": "bridge",
"ip": "172.17.0.2",
"ports": [{"private_port": 80, "public_port": 8080, "type": "tcp"}]
}
]
Backend → Frontend Response (Error - 503)
{
"error": "Docker daemon unavailable"
}
3. Error Conditions Triggering 503
The 503 Service Unavailable is returned when isDockerConnectivityError() returns true:
| Condition | Check in Code | Matches Our Case |
|---|---|---|
| Socket missing | syscall.ENOENT or os.ErrNotExist | No |
| Permission denied | syscall.EACCES or syscall.EPERM | YES ✓ |
| Connection refused | syscall.ECONNREFUSED | No |
| Timeout | net.Error.Timeout() or context.DeadlineExceeded | No |
| Daemon not running | String contains "cannot connect" / "daemon running" | No |
4. Docker Configuration Analysis
Dockerfile
- File: Dockerfile
- User creation (lines 154-156):

RUN addgroup -g 1000 charon && \
    adduser -D -u 1000 -G charon -h /app -s /sbin/nologin charon

- Runtime user (line 286): USER charon
- Result: Container runs as uid=1000, gid=1000 (charon:charon)
Docker Compose Files
All compose files mount the socket identically:
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
| File | Mount Present |
|---|---|
| .docker/compose/docker-compose.yml | ✓ |
| .docker/compose/docker-compose.local.yml | ✓ |
| .docker/compose/docker-compose.dev.yml | ✓ |
| docker-compose.test.yml | ✓ |
Runtime Verification (from live container)
# Socket exists inside container
$ ls -la /var/run/docker.sock
srw-rw---- 1 root 988 0 Dec 12 22:40 /var/run/docker.sock
# Container user identity
$ id
uid=1000(charon) gid=1000(charon) groups=1000(charon)
# Direct socket access test
$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json
# Fails with exit code 7 (couldn't connect): connect() on the socket is denied with EACCES
# Explicit permission check
$ cat /var/run/docker.sock
cat: can't open '/var/run/docker.sock': Permission denied
Host System
$ getent group 988
docker:x:988:
$ stat -c '%U:%G' /var/run/docker.sock
root:docker
5. Root Cause Analysis
The Permission Gap
| Component | Value |
|---|---|
| Socket owner | root:docker (gid=988) |
| Socket permissions | srw-rw---- (660) |
| Container user | charon (uid=1000, gid=1000) |
| Container groups | Only charon (1000) |
| Docker group in container | Does not exist |
The charon user cannot access the socket because:
- Not owner (not root)
- Not in the socket's group (gid=988 doesn't exist in container, and charon isn't in it)
- No "other" permissions on socket
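The three denials above are just the standard Unix access check. Here is a small sketch of that check (a simplification of kernel semantics, written for this scenario) applied to the observed identities:

```go
package main

import "fmt"

// canReadWrite sketches the Unix permission check for the socket described
// above. mode is the 9-bit permission value (0660 for the socket). Only one
// class applies: owner if uid matches, else group, else other.
func canReadWrite(uid, gid uint32, extraGids []uint32, ownerUID, ownerGID uint32, mode uint32) bool {
	if uid == ownerUID {
		return mode&0o600 == 0o600 // owner bits only
	}
	if gid == ownerGID {
		return mode&0o060 == 0o060 // group bits
	}
	for _, g := range extraGids {
		if g == ownerGID {
			return mode&0o060 == 0o060 // supplementary group also counts
		}
	}
	return mode&0o006 == 0o006 // "other" bits
}

func main() {
	// charon (uid=1000, gid=1000, no supplementary groups) vs root:988, mode 0660
	fmt.Println(canReadWrite(1000, 1000, nil, 0, 988, 0o660)) // no path grants access

	// A user in the docker group (gid 988) would pass via the group bits
	fmt.Println(canReadWrite(1000, 1000, []uint32{988}, 0, 988, 0o660))
}
```

This also shows why every fix below works by changing exactly one input: run as root (uid match), add gid 988 to the process (group match), or chmod 666 (other bits).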
Why This Happens
The Docker socket's group ID (988 on this host) is a host-specific value. Different systems assign different GIDs to the docker group:
- Debian/Ubuntu: often 999 or 998
- Alpine: often 101 (from the docker package)
- RHEL/CentOS: varies
- This host: 988
The container has no knowledge of the host's group mappings. When the socket is mounted, it retains the host's numeric GID, but the container has no group with that GID.
6. Why 503 (Not 500) Is Correct
The error mapping change that returned 503 instead of 500 was correct and intentional:
- 500 Internal Server Error: Indicates a bug or unexpected failure in the application
- 503 Service Unavailable: Indicates the requested service is temporarily unavailable due to external factors
Docker being inaccessible due to socket permissions is an environmental/configuration issue, not an application bug. The 503 correctly signals:
- The API endpoint is working
- The underlying Docker service is unavailable
- The issue is likely external (deployment configuration)
7. Solutions
Option A: Run Container as Root (Not Recommended)
Remove USER charon from Dockerfile. Breaks security best practices (CIS Docker Benchmark 4.1).
Option B: Add Docker Group to Container at Build Time
# Problem: GID varies by host system
RUN addgroup -g 988 docker && adduser charon docker
Issue: Assumes host Docker GID is 988; breaks on other systems.
Option C: Dynamic Group Assignment at Runtime (Recommended)
Modify entrypoint to detect and add the socket's group:
# In docker-entrypoint.sh, before starting the app:
if [ -S /var/run/docker.sock ]; then
DOCKER_GID=$(stat -c '%g' /var/run/docker.sock)
if ! getent group "$DOCKER_GID" >/dev/null 2>&1; then
# Create a group with the socket's GID
addgroup -g "$DOCKER_GID" docker 2>/dev/null || true
fi
# Add charon user to the docker group
adduser charon docker 2>/dev/null || true
fi
Issue: Requires container to start as root, then drop privileges.
Option D: Use DOCKER_HOST Environment Variable
Allow users to specify an alternative Docker endpoint (TCP, SSH, or different socket path):
environment:
- DOCKER_HOST=tcp://host.docker.internal:2375
Issue: Requires exposing Docker API over network (security implications).
Option E: Document User Requirement (Workaround)
Add documentation requiring users to either:
- Run the container with --user root (not recommended)
- Change socket permissions on the host: chmod 666 /var/run/docker.sock (security risk)
- Accept that Docker integration is unavailable when running as non-root
8. Recommendations
Immediate (No Code Change)
- Update documentation to explain the permission requirement
- Add health check for Docker availability in the UI (show "Docker integration unavailable" gracefully)
Short Term
- Add a startup warning log when the Docker socket is inaccessible:

// In routes.go or docker_service.go
if _, err := cli.Ping(ctx); err != nil {
    logger.Log().Warn("Docker socket inaccessible - container discovery disabled")
}
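A dependency-free variant of that startup probe is sketched below. It is an assumption-labeled sketch (probeDockerSocket is a hypothetical helper, and the default path ignores DOCKER_HOST), but it shows how "missing" can be distinguished from "permission denied" before the first user request:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// probeDockerSocket reports whether the local Docker socket is usable,
// distinguishing the failure modes from the table in section 3.
// A production version would honor DOCKER_HOST instead of a fixed path.
func probeDockerSocket(path string) string {
	if _, err := os.Stat(path); errors.Is(err, os.ErrNotExist) {
		return "missing" // would map to ENOENT
	}
	conn, err := net.DialTimeout("unix", path, 2*time.Second)
	if err != nil {
		if errors.Is(err, syscall.EACCES) || errors.Is(err, syscall.EPERM) {
			return "permission denied" // this incident's case
		}
		return "unreachable" // refused, timeout, etc.
	}
	conn.Close()
	return "ok"
}

func main() {
	fmt.Println(probeDockerSocket("/var/run/docker.sock"))
}
```

Logging the probe result once at startup (e.g. "docker socket: permission denied - container discovery disabled") would make this class of deployment misconfiguration obvious without waiting for a 503.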
Medium Term
- Implement Option C with proper privilege dropping
- Add an environment variable CHARON_DOCKER_ENABLED=false to explicitly disable Docker integration
Long Term
- Consider podman socket compatibility
- Consider Docker SDK over TCP as alternative
9. Conclusion
The 503 error is working as designed. The Docker socket permission model fundamentally conflicts with running containers as non-root users unless explicit configuration is done at deployment time.
The fix is not in the code, but in deployment configuration or documentation.