Docker for AI Workloads: Isolation & GPU Access
AI workloads need proper Docker isolation — untrusted code execution, GPU access for inference, and resource limits. Here's the production configuration I use.
AI workloads have unusual Docker requirements: some need GPU access for inference, others need strict isolation for running untrusted LLM-generated code, and all need careful resource limits to prevent runaway processes. Here's how I configure Docker for each pattern.
Pattern 1: Isolated code execution (no GPU)
QuantumSketch runs LLM-generated Manim code in Docker. The code is untrusted — it could contain import subprocess or file system operations that shouldn't run on the host.
Security requirements:
- No network access (prevent data exfiltration, no outbound calls)
- Read-only filesystem (prevent host modification)
- Limited resources (prevent DoS from infinite loops)
- No privilege escalation
# --memory-swap equal to --memory means the container gets no swap.
# The /output bind mount is the only writable host path (read-write, not write-only).
docker run --rm \
  --network none \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=512m \
  --memory 2g \
  --memory-swap 2g \
  --cpus 2.0 \
  --pids-limit 128 \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  -v /host/output:/output \
  my-sandbox-image:latest \
  python /workspace/scene.py
Flag breakdown:
- --network none: completely disables networking
- --read-only: root filesystem is read-only
- --tmpfs /tmp: writable tmpfs at /tmp only; noexec prevents executing binaries from it
- --memory-swap equal to --memory: disables swap (prevents swap exhaustion on the host)
- --cap-drop ALL: removes all Linux capabilities
- --security-opt no-new-privileges: prevents setuid escalation
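When a service launches one sandbox per request, the same flags can be assembled in code. This is a minimal sketch, not QuantumSketch's actual launcher: build_sandbox_command and run_sandbox are hypothetical helpers, the image name and paths come from the command above, and the 120-second wall-clock limit is an assumed value.

```python
import subprocess
import uuid
from typing import List


def build_sandbox_command(
    output_dir: str,
    name: str,
    image: str = "my-sandbox-image:latest",
    script: str = "/workspace/scene.py",
) -> List[str]:
    """Return the docker run argv for one untrusted job (same flags as above)."""
    return [
        "docker", "run", "--rm",
        "--name", name,
        "--network", "none",
        "--read-only",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=512m",
        "--memory", "2g",
        "--memory-swap", "2g",
        "--cpus", "2.0",
        "--pids-limit", "128",
        "--cap-drop", "ALL",
        "--security-opt", "no-new-privileges",
        "-v", f"{output_dir}:/output",
        image,
        "python", script,
    ]


def run_sandbox(output_dir: str, timeout_s: float = 120.0) -> subprocess.CompletedProcess:
    """Run one job with a wall-clock limit (assumed value, tune per workload)."""
    name = f"sandbox-{uuid.uuid4().hex[:12]}"
    cmd = build_sandbox_command(output_dir, name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Killing the docker CLI does not stop the container, so kill it by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

Passing the command as a list avoids shell interpolation of the arguments, and the explicit docker kill covers the case where CPU and memory limits are respected but the script simply never finishes.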
Pattern 2: GPU inference container
For GPU-accelerated inference (vLLM, Whisper, Stable Diffusion):
# --gpus all exposes every GPU; CUDA_VISIBLE_DEVICES=0 then restricts the process to the first one.
# --ipc=host gives PyTorch the host's shared memory; --shm-size only takes effect if --ipc=host is removed.
# --ulimit memlock=-1 lifts the locked-memory limit used for CUDA pinned memory.
docker run --rm \
  --gpus all \
  --ipc=host \
  --ulimit memlock=-1 \
  --shm-size=8g \
  -e CUDA_VISIBLE_DEVICES=0 \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct
Key flags for GPU containers:
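- --gpus all: exposes the host GPUs through the NVIDIA Container Toolkit
- --ipc=host: PyTorch multiprocessing passes tensors between processes through shared memory, and Docker's default 64 MB /dev/shm is too small for that; sharing the host IPC namespace removes the limit
- --ulimit memlock=-1: removes the locked-memory limit, which CUDA pinned memory needs
- --shm-size=8g: the alternative to --ipc=host, sizing a private /dev/shm for the container; it has no effect while --ipc=host is set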
Do not run this GPU inference container with --network none: it downloads model weights on first run and serves requests on port 8000. Keep networking enabled for GPU inference and disabled for untrusted code.
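Because the first start includes a weight download, the port can be published long before the server answers requests. Below is a minimal readiness-polling sketch using only the standard library. The helper names, the 600-second deadline, and the 5-second interval are assumptions, and the /health path in the usage note should be checked against the vLLM version in use.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, reset, timeout, or a non-2xx response.
        return False


def wait_until_ready(
    url: str,
    deadline_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe succeeds or roughly deadline_s seconds have been slept."""
    waited = 0.0
    while True:
        if probe(url):
            return True
        if waited >= deadline_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A caller would use wait_until_ready("http://localhost:8000/health") before sending the first inference request. The probe and sleep parameters exist so the loop can be exercised without a running container.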
Pattern 3: Multi-stage build for lean images
Production AI service images should be small — faster pulls, faster cold starts:
# Multi-stage: build stage
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --prefix=/install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim
# Don't run as root
RUN useradd -m -u 1000 appuser
WORKDIR /app
COPY --from=builder /install /usr/local
COPY --chown=appuser:appuser . .
USER appuser
# Exec-form CMD (JSON array), not shell form
CMD ["/usr/local/bin/python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0"]
Key practices:
- Multi-stage: build dependencies in one stage, copy only what's needed to production
- Non-root user: USER appuser, never run production as root
- CMD in exec form (array), not shell form: proper signal handling for graceful shutdown
- --no-cache-dir on pip: reduces image size
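Exec form matters because the process then runs as PID 1 and receives the SIGTERM that docker stop sends. With shell form, /bin/sh is PID 1 and does not forward the signal. uvicorn handles SIGTERM on its own; for a custom worker process, a handler along these lines is one option. This is a sketch with hypothetical names (GracefulShutdown, worker_loop), not code from QuantumSketch.

```python
import signal
import time
from typing import Callable


class GracefulShutdown:
    """Flips `requested` to True when the process receives SIGTERM."""

    def __init__(self) -> None:
        self.requested = False
        # Must be called from the main thread, as signal.signal requires.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only set a flag here; the loop below finishes the current job first.
        self.requested = True


def worker_loop(
    shutdown: GracefulShutdown,
    process_one_job: Callable[[], bool],
    idle_sleep_s: float = 1.0,
) -> None:
    """Run jobs until shutdown is requested; process_one_job returns True if it did work."""
    while not shutdown.requested:
        did_work = process_one_job()
        if not did_work:
            time.sleep(idle_sleep_s)
```

By default docker stop waits 10 seconds after SIGTERM before sending SIGKILL, so a job that can run longer than that needs a larger stop timeout or a way to checkpoint.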
Pattern 4: Docker Compose for local development
# docker-compose.yml for QuantumSketch local dev
version: "3.9"
services:
  api:
    build: ./services/quantumsketch-api
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://dev:dev@postgres:5432/qs
      REDIS_URL: redis://redis:6379
      TEMPORAL_HOST: temporal:7233
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
  worker:
    build: ./services/quantumsketch-worker
    environment:
      TEMPORAL_HOST: temporal:7233
    depends_on:
      - temporal
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: qs
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "dev"]
      interval: 5s
      timeout: 5s
      retries: 5
  redis:
    image: redis:7-alpine
  temporal:
    image: temporalio/auto-setup:latest
    environment:
      DB: postgresql
      DB_PORT: 5432
      POSTGRES_USER: dev
      POSTGRES_PWD: dev
      POSTGRES_SEEDS: postgres
    depends_on:
      postgres:
        condition: service_healthy
volumes:
  pgdata:
pgvector/pgvector:pg16 is the pgvector project's image: PostgreSQL 16 with the pgvector extension pre-installed. Nothing needs compiling; enable it per database with CREATE EXTENSION vector;.
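On the application side, the three environment variables set for the api service can be read into a typed settings object at startup, so a missing or malformed value fails fast instead of surfacing on the first request. A minimal sketch: Settings and load_settings are hypothetical names, and falling back to port 7233 when TEMPORAL_HOST has no port is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    temporal_host: str
    temporal_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the Compose file; raises KeyError if one is missing."""
    database_url = env["DATABASE_URL"]
    if urlsplit(database_url).scheme not in ("postgres", "postgresql"):
        raise ValueError("DATABASE_URL must be a postgres:// URL")
    # TEMPORAL_HOST is host:port, e.g. temporal:7233.
    host, _, port = env["TEMPORAL_HOST"].partition(":")
    return Settings(
        database_url=database_url,
        redis_url=env["REDIS_URL"],
        temporal_host=host,
        temporal_port=int(port or 7233),
    )
```

The hostnames in these values (postgres, redis, temporal) are the Compose service names, which resolve on the project's default network.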
Resource limits for production (ECS Fargate)
ECS Fargate enforces CPU and memory hard limits:
{
  "containerDefinitions": [{
    "name": "manim-worker",
    "image": "123456789.dkr.ecr.ap-south-1.amazonaws.com/manim-worker:latest",
    "cpu": 2048,
    "memory": 4096,
    "memoryReservation": 2048,
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/manim-worker",
        "awslogs-region": "ap-south-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "ulimits": [
      {"name": "nofile", "softLimit": 65536, "hardLimit": 65536}
    ]
  }]
}
memoryReservation < memory: memoryReservation is the soft limit reserved for the container, and memory is the hard limit at which the container is killed. The container can burst between the two when other containers in the same task aren't using their full allocation.
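The units are easy to misread: cpu is in CPU units (1024 per vCPU) and both memory fields are in MiB. The sketch below converts a container definition into readable numbers and rejects a soft limit above the hard limit. summarize_limits is a hypothetical helper, not part of any AWS SDK.

```python
from typing import Dict


def summarize_limits(container: Dict) -> Dict[str, float]:
    """Convert ECS cpu units and MiB values into vCPUs and GiB."""
    hard_mib = container["memory"]
    # Without memoryReservation, treat the hard limit as the reservation.
    soft_mib = container.get("memoryReservation", hard_mib)
    if soft_mib > hard_mib:
        raise ValueError("memoryReservation must not exceed memory")
    return {
        "vcpus": container["cpu"] / 1024,
        "hard_limit_gib": hard_mib / 1024,
        "soft_limit_gib": soft_mib / 1024,
        "burst_headroom_mib": hard_mib - soft_mib,
    }
```

For the definition above this yields 2 vCPUs, a 4 GiB hard limit, a 2 GiB reservation, and 2048 MiB of burst headroom.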
FAQ
How do you isolate LLM-generated code in Docker?
Use --network none, --read-only, --cap-drop ALL, --security-opt no-new-privileges, and --pids-limit. Together, these prevent network access, filesystem writes, privilege escalation, and process proliferation.
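A launcher can refuse to start a sandbox whose argument list is missing one of these flags. The following is a rough lint sketch with hypothetical names. Treating --ipc=host as forbidden follows this article's own advice; forbidding --privileged is an added assumption. It scans every token, so it does not distinguish Docker options from arguments passed to the container's command.

```python
from typing import List, Optional, Tuple

Flag = Tuple[str, Optional[str]]

REQUIRED: List[Flag] = [
    ("--network", "none"),
    ("--read-only", None),
    ("--cap-drop", "ALL"),
    ("--security-opt", "no-new-privileges"),
    ("--pids-limit", None),  # any value is accepted
]
FORBIDDEN: List[Flag] = [("--privileged", None), ("--ipc", "host")]


def _tokens(argv: List[str]) -> List[str]:
    """Split --flag=value into two tokens so both spellings look the same."""
    out: List[str] = []
    for arg in argv:
        if arg.startswith("--") and "=" in arg:
            out.extend(arg.split("=", 1))
        else:
            out.append(arg)
    return out


def _has(tokens: List[str], flag: str, value: Optional[str]) -> bool:
    for i, tok in enumerate(tokens):
        if tok == flag and (value is None or tokens[i + 1:i + 2] == [value]):
            return True
    return False


def _label(flag: str, value: Optional[str]) -> str:
    return flag if value is None else f"{flag} {value}"


def sandbox_violations(argv: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the argv passes."""
    tokens = _tokens(argv)
    problems = [f"missing {_label(f, v)}" for f, v in REQUIRED if not _has(tokens, f, v)]
    problems += [f"forbidden {_label(f, v)}" for f, v in FORBIDDEN if _has(tokens, f, v)]
    return problems
```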
What is --ipc=host and when do I need it?
--ipc=host shares the host's IPC namespace with the container, enabling shared memory between processes. Required for PyTorch multi-GPU inference and some multiprocessing workloads. Do not use for untrusted code containers — it reduces isolation.
Why does --gpus all require NVIDIA Container Toolkit?
Docker doesn't have native GPU passthrough. NVIDIA Container Toolkit (nvidia-container-runtime) intercepts --gpus flags and sets up device bindings between the GPU driver on the host and the container. Install it on the host before using --gpus.
How do I use pgvector in Docker for local development?
Use the pgvector/pgvector:pg16 image: it's PostgreSQL 16 with pgvector pre-installed. Start it with docker run -e POSTGRES_PASSWORD=dev pgvector/pgvector:pg16 (the container exits without a password), then run CREATE EXTENSION vector; in the target database.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: Building QuantumSketch: AI + Manim for STEM Video · Microservices as One Engineer.