Debugging Slow Docker Containers: CPU Throttling, I/O Bottlenecks, and Misconfigured Limits
Slow containers are rarely “just Docker being slow.” In most cases, performance problems come from one (or several) of these categories:
- CPU throttling (CFS quota/period, cpuset pinning, noisy neighbors, host CPU saturation)
- I/O bottlenecks (storage driver behavior, overlay filesystem overhead, slow/contended disks, sync-heavy workloads)
- Misconfigured limits (memory, swap, ulimits, pids limit, kernel settings, cgroup version mismatch)
This tutorial walks through a practical, command-driven workflow to identify the bottleneck and fix it. It assumes Docker Engine on Linux (most of the tooling and cgroup paths below are Linux-specific).
1) Establish a Baseline: “Is it the host or the container?”
Before diving into container internals, confirm the host isn’t already saturated.
Host-level quick checks
# CPU usage and load
uptime
top -o %CPU
# Per-core utilization and run queue
mpstat -P ALL 1
# Memory pressure and swapping
free -h
vmstat 1
# Disk I/O saturation and latency
iostat -xz 1
# If you have it: per-process I/O
sudo iotop -oPa
Interpretation tips:
- High load average with low CPU utilization can indicate I/O wait or blocked tasks.
- iostat -xz: %util near 100% indicates the disk is saturated. High await (latency) suggests contention or slow storage.
- vmstat: Non-zero si/so (swap in/out) indicates swapping, which can devastate container performance. A non-zero b column means processes are blocked, often on I/O.
If the host looks healthy but the container is slow, proceed.
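As a first-pass sanity check, the 1-minute load average can be compared directly against the core count from /proc/loadavg (Linux-only). This is a minimal sketch, not a substitute for mpstat/iostat:

```shell
# Compare the 1-minute load average against the CPU count (Linux).
# A load well above the core count suggests host-level saturation.
host_cpu_check() {
  load1=$(cut -d' ' -f1 /proc/loadavg)
  cores=$(nproc)
  awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l > c) print "host CPU likely saturated (load " l " > " c " cores)"
    else       print "host load looks OK (load " l " <= " c " cores)"
  }'
}
host_cpu_check
```

Remember that load average also counts tasks blocked on I/O, so pair this with iostat before concluding anything.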
2) Identify the Slow Container and Its Limits
List containers and observe live stats:
docker ps
docker stats
docker stats shows CPU%, memory usage, network, and block I/O. It’s a good starting point, but it can hide why CPU is low (e.g., throttling) or why I/O is high (overlay overhead vs physical disk).
Inspect container configuration:
CID=<container_id_or_name>
docker inspect "$CID" --format '{{json .HostConfig}}' | jq
Look for:
- CPU: NanoCpus, CpuQuota, CpuPeriod, CpuShares, CpusetCpus
- Memory: Memory, MemorySwap, OomKillDisable
- Block I/O: BlkioWeight, BlkioDeviceReadBps, BlkioDeviceWriteBps
- Process limits: PidsLimit, Ulimits
Also check what Docker thinks the container is doing:
docker inspect "$CID" --format 'Name={{.Name}} Image={{.Config.Image}} Cmd={{json .Config.Cmd}}'
3) Determine Your Cgroup Version (Important for Paths and Metrics)
Many debugging steps depend on whether your system uses cgroup v1 or cgroup v2.
stat -fc %T /sys/fs/cgroup
- cgroup2fs → cgroup v2
- anything else (often tmpfs) → likely cgroup v1
You can also check:
mount | grep cgroup
Docker supports both, but metric file locations differ.
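The check above can be wrapped in a small helper so later steps branch on the result (a sketch; the fallback branch simply reports whatever filesystem type it found):

```shell
# Detect which cgroup version the host uses, so later steps read the
# correct metric files.
cgroup_version() {
  fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
  if [ "$fstype" = "cgroup2fs" ]; then
    echo "cgroup v2"
  else
    echo "cgroup v1 (or unknown: $fstype)"
  fi
}
cgroup_version
```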
4) CPU Throttling: Detect, Measure, Fix
4.1 Understand CPU limits in Docker
Docker can limit CPU in several ways:
- CFS quota/period (hard cap): --cpus 1.0 sets the quota to allow roughly one CPU's worth of time. Internally this maps to cpu.cfs_quota_us and cpu.cfs_period_us (cgroup v1).
- CPU shares (relative weight, not a hard cap): --cpu-shares affects scheduling under contention.
- cpuset (pin to specific cores): --cpuset-cpus="0,2" restricts which cores can be used.
A container can show low CPU usage because it’s:
- blocked on I/O,
- waiting on locks,
- or being throttled by CFS quota.
4.2 Check CPU throttling from inside the container (or from host)
First, get the container’s main PID:
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
echo "$PID"
cgroup v2 throttling metrics
On cgroup v2, CPU stats are often in:
/sys/fs/cgroup/<scope>/cpu.stat
Docker’s exact cgroup path varies by distro/systemd. A robust approach is to find the cgroup path for the PID:
cat /proc/$PID/cgroup
You may see something like:
- v2: 0::/system.slice/docker-<id>.scope
- v1: multiple controllers listed separately
For cgroup v2:
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat
Example output:
usage_usec 123456789
user_usec 100000000
system_usec 23456789
nr_periods 1200
nr_throttled 800
throttled_usec 987654321
Key fields:
- nr_throttled: number of periods in which throttling occurred
- throttled_usec: total time throttled (microseconds)
If nr_throttled grows rapidly and throttled_usec increases steadily during slowness, you have CPU throttling.
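To put a number on "grows rapidly," diff two cpu.stat snapshots taken a few seconds apart. A sketch with awk, demonstrated on hard-coded sample snapshots rather than live files:

```shell
# Percentage of wall-clock time the cgroup spent throttled between two
# cpu.stat snapshots taken $3 seconds apart.
throttle_pct() {
  { printf '%s\n' "$1"; printf '%s\n' "$2"; } | awk -v secs="$3" '
    { if ($1 in a) b[$1] = $2; else a[$1] = $2 }   # 1st snapshot -> a, 2nd -> b
    END {
      dt = b["throttled_usec"] - a["throttled_usec"]
      printf "%.1f\n", dt / (secs * 1000000) * 100
    }'
}

# Illustrative sample data in cgroup v2 cpu.stat format
snap1='nr_periods 1200
nr_throttled 800
throttled_usec 5000000'
snap2='nr_periods 1300
nr_throttled 890
throttled_usec 9000000'

throttle_pct "$snap1" "$snap2" 10   # 4s throttled over 10s -> 40.0
```

In live use, capture the two snapshots with `cat /sys/fs/cgroup$CGPATH/cpu.stat`, separated by a `sleep`. Anything consistently above a few percent during slowness is worth fixing.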
cgroup v1 throttling metrics
For cgroup v1, locate the cpu cgroup path:
cat /proc/$PID/cgroup | grep cpu
Then read:
# Example path; yours will differ
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.stat
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_period_us
cpu.stat often includes:
nr_periods 1200
nr_throttled 800
throttled_time 987654321000
(throttled_time is usually in nanoseconds.)
4.3 Verify the configured CPU limits
Check what Docker set:
docker inspect "$CID" --format \
'NanoCpus={{.HostConfig.NanoCpus}} CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} CpuShares={{.HostConfig.CpuShares}} CpusetCpus={{.HostConfig.CpusetCpus}}'
Common gotchas:
- --cpus=0.5 might be too low for bursty workloads (GC, JIT, compaction).
- --cpuset-cpus pinned to a busy core can look like "random slowness."
- CPU shares don't help if you have a hard quota set.
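The quota/period pair (from docker inspect or the cgroup v1 files) maps to an effective CPU count. A small sketch of the arithmetic:

```shell
# Effective CPU cap = cfs_quota_us / cfs_period_us; quota <= 0 means no cap.
effective_cpus() {
  awk -v q="$1" -v p="$2" 'BEGIN {
    if (q <= 0) print "no hard cap"
    else printf "%.2f CPUs\n", q / p
  }'
}
effective_cpus 50000 100000   # -> 0.50 CPUs
effective_cpus -1 100000      # -> no hard cap
```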
4.4 Fix CPU throttling
Option A: Increase CPU quota
Run a new container with more CPU:
docker run --cpus=2.0 yourimage
Or update an existing container:
docker update --cpus=2.0 "$CID"
Option B: Remove quota (no hard cap)
If you previously set a quota, note that setting --cpus to 0 does not remove it; instead set the quota to -1 via docker update:
docker update --cpu-quota=-1 "$CID"
(You may also need to ensure --cpu-period is default.)
Option C: Adjust cpuset pinning
If pinned to a congested core:
docker update --cpuset-cpus="0-3" "$CID"
Option D: Diagnose application-level CPU stalls
If there’s no throttling but CPU is low, the app may be blocked. Use perf (host) against the container process:
sudo perf top -p "$PID"
Or capture a short profile:
sudo perf record -F 99 -p "$PID" -g -- sleep 15
sudo perf report
This helps distinguish “CPU-bound and slow” from “not getting CPU.”
5) I/O Bottlenecks: Storage Driver, Overlay Overhead, Disk Saturation
I/O issues are extremely common and often misdiagnosed as “CPU is low so it must be fine.” A container can be slow with low CPU because it’s waiting on disk.
5.1 Start with container-visible symptoms
Check block I/O in docker stats:
docker stats "$CID"
If block I/O grows quickly during slowness, suspect disk.
Inside the container, you can also check if processes are stuck in I/O wait. From the host:
ps -o pid,stat,wchan,comm -p "$PID"
- D state indicates uninterruptible sleep (often I/O).
- wchan may show kernel wait function names.
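A sustained, nonzero count of D-state processes during the slow period is a strong I/O-wait signal. A quick host-wide count (sketch; assumes ps is available):

```shell
# Count processes currently in uninterruptible sleep (state D).
dstate_count() {
  ps -eo stat= | awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}
dstate_count
```

Run it repeatedly (e.g., under `watch -n1`) while the container is slow; an occasional blip is normal, a steady count is not.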
5.2 Identify the storage driver and filesystem
docker info | grep -E 'Storage Driver|Backing Filesystem|Supports d_type|Native Overlay Diff'
Common drivers:
- overlay2 (most common; generally good)
- devicemapper (older; can be slow when misconfigured)
- btrfs, zfs (feature-rich, with different performance characteristics)
Also check where Docker stores data:
docker info | grep "Docker Root Dir"
df -hT /var/lib/docker
If /var/lib/docker is on a slow disk (or nearly full), performance suffers.
5.3 Overlay filesystem overhead and “small write” workloads
overlay2 merges layers. Heavy write workloads into the container’s writable layer can be slower than writing to a mounted volume because:
- copy-on-write behavior can trigger extra metadata operations
- lots of small fsyncs can amplify latency
Rule of thumb: If your workload writes frequently (databases, queues, build caches), prefer volumes or bind mounts.
Check mounts:
docker inspect "$CID" --format '{{json .Mounts}}' | jq
If your database is writing to /var/lib/... inside the container filesystem rather than a volume, consider moving it.
5.4 Measure disk latency and saturation on the host
Use iostat:
iostat -xz 1
Look at the device backing Docker’s root dir (e.g., nvme0n1, sda).
If %util is high and await is high, the disk is saturated or slow.
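Rather than eyeballing columns, a captured iostat -xz report can be scanned for saturated devices. A hypothetical parser that locates %util and await by header name, since column positions vary across sysstat versions; the report text here is sample data:

```shell
# Flag devices whose %util exceeds 90 in an `iostat -xz`-style report.
flag_saturated() {
  awk '
    $1 == "Device" { for (i = 1; i <= NF; i++) { if ($i == "%util") u = i; if ($i == "await") a = i }; next }
    u && $u + 0 > 90 { printf "%s saturated: util=%s%% await=%sms\n", $1, $u, $a }'
}

# Illustrative sample report (real iostat output has more columns)
report='Device r/s w/s await %util
nvme0n1 120.0 300.0 8.50 97.3
sda 1.0 2.0 0.40 1.2'

echo "$report" | flag_saturated
```

In real use, pipe a capture through it: `iostat -xz 1 5 | flag_saturated`.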
Use pidstat to see per-process I/O:
sudo pidstat -d 1 -p "$PID"
If the container spawns multiple processes, you may want the whole cgroup rather than one PID, but this still helps.
5.5 Find which files are hot
If you have lsof:
sudo lsof -p "$PID" | head
For deeper I/O tracing, use strace carefully (it adds overhead):
sudo strace -ff -p "$PID" -e trace=openat,read,write,fdatasync,fsync -ttT
If you see frequent fsync() calls taking milliseconds to seconds, storage latency is hurting you.
5.6 Docker-specific I/O limits (blkio)
Docker can throttle I/O via blkio settings. Check:
docker inspect "$CID" --format \
'BlkioWeight={{.HostConfig.BlkioWeight}} ReadBps={{json .HostConfig.BlkioDeviceReadBps}} WriteBps={{json .HostConfig.BlkioDeviceWriteBps}} ReadIOps={{json .HostConfig.BlkioDeviceReadIOps}} WriteIOps={{json .HostConfig.BlkioDeviceWriteIOps}}'
If limits are set too low, the container will be artificially slow.
Update to remove or raise limits (example):
docker update --blkio-weight 500 "$CID"
Or remove device limits by re-creating the container without them.
5.7 Fix common I/O bottlenecks
Use volumes for write-heavy paths
Example: PostgreSQL data directory:
docker run -d --name pg \
-v pgdata:/var/lib/postgresql/data \
postgres:16
Ensure enough free space and healthy filesystem
df -h
sudo dmesg -T | tail -n 200
Kernel logs showing I/O errors, resets, or filesystem warnings are red flags.
Consider storage options
- Put /var/lib/docker on SSD/NVMe.
- If running in the cloud, verify the volume type and provisioned IOPS.
- Avoid running many write-heavy containers on a single slow disk.
6) Memory Pressure and Misconfigured Limits (Often Masquerading as CPU/I/O)
A container can be “slow” because it’s constantly reclaiming memory, swapping, or being OOM-killed and restarted.
6.1 Check container memory limits
docker inspect "$CID" --format \
'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}} MemoryReservation={{.HostConfig.MemoryReservation}} OomKillDisable={{.HostConfig.OomKillDisable}}'
Values are in bytes. 0 often means “no explicit limit.”
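Since inspect reports raw bytes, a tiny formatter helps when scanning many containers (a sketch; it treats 0 as "unlimited," matching the common default):

```shell
# Render a byte count from docker inspect readably; 0 = no explicit limit.
human_bytes() {
  awk -v b="$1" 'BEGIN {
    if (b == 0) { print "unlimited"; exit }
    n = split("B KiB MiB GiB TiB", units, " ")
    i = 1
    while (b >= 1024 && i < n) { b /= 1024; i++ }
    printf "%.1f %s\n", b, units[i]
  }'
}
human_bytes 0            # -> unlimited
human_bytes 2147483648   # -> 2.0 GiB
```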
6.2 Detect OOM kills and memory reclaim
Check container events:
docker events --since 1h | grep -i oom
Check kernel logs:
sudo dmesg -T | grep -i -E 'oom|killed process' | tail -n 50
If you see OOM kills, the container may restart or the app may degrade.
6.3 cgroup memory stats (v2)
Again, find cgroup path:
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
Read memory metrics:
cat /sys/fs/cgroup$CGPATH/memory.current
cat /sys/fs/cgroup$CGPATH/memory.max
cat /sys/fs/cgroup$CGPATH/memory.stat | head -n 50
Useful fields in memory.stat:
- anon, file
- pgfault, pgmajfault (major faults imply disk I/O)
- workingset_refault (can indicate cache thrash)
Check pressure stall information (PSI), which is extremely helpful:
cat /sys/fs/cgroup$CGPATH/cpu.pressure
cat /sys/fs/cgroup$CGPATH/memory.pressure
cat /sys/fs/cgroup$CGPATH/io.pressure
If memory.pressure shows high some/full stall time, the container is spending time waiting on memory reclaim.
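PSI files share a fixed format (some/full lines with avg10/avg60/avg300/total), so the 10-second average is easy to extract for monitoring. A sketch shown on sample text rather than a live pressure file:

```shell
# Extract the "some" avg10 value (percent) from a PSI pressure file.
psi_some_avg10() {
  awk '$1 == "some" { sub("avg10=", "", $2); print $2 }'
}

# Illustrative sample in the kernel's PSI format
sample='some avg10=12.34 avg60=8.00 avg300=2.00 total=123456789
full avg10=4.00 avg60=2.00 avg300=0.50 total=23456789'

echo "$sample" | psi_some_avg10   # -> 12.34
```

In real use: `psi_some_avg10 < /sys/fs/cgroup$CGPATH/memory.pressure`.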
6.4 Swap behavior
On many systems, containers share host swap behavior unless configured. If the host is swapping, containers slow down.
Host swap check:
swapon --show
vmstat 1
If swap is active and si/so are non-zero during slowness, consider:
- adding RAM,
- reducing memory-limit contention,
- tuning vm.swappiness,
- or moving workloads.
6.5 Fix memory misconfiguration
Increase container memory limit
docker update --memory 2g --memory-swap 2g "$CID"
Notes:
- Setting --memory-swap equal to --memory effectively disables swap for that container (behavior depends on kernel/cgroup version).
- If you set --memory too low, the app may GC/compact constantly, appearing "CPU slow."
Set a reservation (soft limit) to reduce contention
docker update --memory-reservation 1g "$CID"
7) PIDs Limit, ulimits, and “It’s Slow Because It Can’t Spawn”
Sometimes “slowness” is actually the app failing to create threads/processes or open files, leading to timeouts and retries.
7.1 Check pids limit
docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'
If it’s low (e.g., 100) and your runtime needs many threads (JVM, Node, Python gunicorn), you can hit the limit.
Update:
docker update --pids-limit 1000 "$CID"
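To gauge how close a process is to a pids limit, compare its current thread count against the configured limit. A Linux-only sketch using /proc:

```shell
# Current thread count of a process, from /proc/<pid>/status.
threads_of() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
threads_of $$   # thread count of this shell (at least 1)
```

Point it at the container's main PID (`threads_of "$PID"`) and compare with PidsLimit from docker inspect; note that the limit counts all tasks in the cgroup, not just one process.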
7.2 Check ulimits (nofile, nproc)
docker inspect "$CID" --format '{{json .HostConfig.Ulimits}}' | jq
Inside container:
docker exec "$CID" sh -lc 'ulimit -a'
If nofile is too low, network servers can degrade under load.
Run with higher ulimit:
docker run --ulimit nofile=1048576:1048576 yourimage
8) Network Isn’t the Focus, But Don’t Ignore It
A container can be slow because it’s waiting on remote services. Quick checks:
docker exec "$CID" sh -lc 'getent hosts example.com'
docker exec "$CID" sh -lc 'time wget -qO- https://example.com >/dev/null'
On the host, look for retransmits:
ss -s
netstat -s | grep -i retrans
If network is the issue, CPU/I/O tuning won’t help.
9) A Practical Step-by-Step Workflow (Repeatable)
Use this sequence when you’re on-call and need answers quickly.
Step 1: Confirm symptoms and scope
- Is one container slow or many?
- Is the host slow too?
Commands:
docker stats
uptime
iostat -xz 1
vmstat 1
Step 2: Check if the container is throttled
CID=<id>
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
cat /proc/$PID/cgroup
- cgroup v2:
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat
If throttling is high, raise/remove CPU quota:
docker update --cpus 2.0 "$CID"
# or
docker update --cpu-quota=-1 "$CID"
Step 3: If not throttled, check I/O pressure and disk saturation
- PSI (v2):
cat /sys/fs/cgroup$CGPATH/io.pressure
- Host disk:
iostat -xz 1
- Per-process I/O:
sudo pidstat -d 1 -p "$PID"
If I/O is the bottleneck:
- move write-heavy paths to volumes,
- remove blkio limits,
- improve underlying storage.
Step 4: Check memory pressure/OOM
docker inspect "$CID" --format 'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}}'
sudo dmesg -T | grep -i oom | tail
If memory pressure is high:
- increase memory limit,
- reduce co-located workloads,
- avoid host swapping.
Step 5: Check pids/ulimits
docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'
docker exec "$CID" sh -lc 'ulimit -a'
10) Common Misconfigurations and Their “Slow” Signatures
Misconfig: CPU quota too low
Signature:
- cpu.stat shows increasing throttling
- Response times spike under load even though CPU% looks capped
Fix:
- Increase --cpus or remove the quota.
Misconfig: Writing to container layer instead of a volume
Signature:
- High block I/O
- Many fsyncs, slow metadata operations
- Performance degrades as writable layer grows
Fix:
- Use -v volumes/bind mounts for write-heavy directories.
Misconfig: Memory limit too low for workload
Signature:
- Frequent GC/compaction (language-dependent)
- High major page faults
- OOM kills or near-OOM reclaim stalls (PSI)
Fix:
- Increase memory, tune app memory usage, reduce co-tenancy.
Misconfig: PIDs limit too low
Signature:
- Timeouts, inability to spawn workers/threads
- Logs show “resource temporarily unavailable” or fork failures
Fix:
- Raise --pids-limit.
Misconfig: nofile too low
Signature:
- Connection failures under concurrency
- “Too many open files” errors, degraded throughput
Fix:
- Raise --ulimit nofile=....
11) Reproducing and Proving the Root Cause (So Fixes Stick)
Performance debugging goes better when you can prove the bottleneck with a metric that changes when you apply a fix.
Examples of “proof” metrics:
- CPU throttling:
  - Before: nr_throttled climbs fast
  - After raising CPU: nr_throttled barely increases; latency improves
- I/O bottleneck:
  - Before: iostat await high, %util high, io.pressure high
  - After moving data to a faster disk/volume: latency drops, await drops
- Memory pressure:
  - Before: memory.pressure shows high stall time; major faults increase
  - After: stall time decreases; throughput improves
Keep a short capture before and after:
# Capture a 30s snapshot of key host metrics
iostat -xz 1 30 > /tmp/iostat.txt
vmstat 1 30 > /tmp/vmstat.txt
# Capture cgroup cpu throttling (v2)
for i in $(seq 1 30); do
date +%s
cat /sys/fs/cgroup$CGPATH/cpu.stat
sleep 1
done > /tmp/cpu_stat.txt
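A capture in that timestamp-plus-cpu.stat format can then be summarized, e.g., first versus last nr_throttled. A sketch run on an inline sample instead of the real /tmp/cpu_stat.txt:

```shell
# Summarize nr_throttled growth across a cpu_stat.txt-style capture.
throttle_summary() {
  awk '
    $1 == "nr_throttled" { if (first == "") first = $2; last = $2 }
    END { printf "nr_throttled: %s -> %s (delta %d)\n", first, last, last - first }'
}

# Illustrative sample: epoch timestamp followed by cpu.stat fields
sample='1700000000
nr_periods 100
nr_throttled 10
1700000001
nr_periods 110
nr_throttled 25'

echo "$sample" | throttle_summary   # -> nr_throttled: 10 -> 25 (delta 15)
```

In real use: `throttle_summary < /tmp/cpu_stat.txt` for the before and after captures.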
12) Appendix: Handy One-Liners
Show container limits quickly
docker inspect "$CID" --format \
'CPUs: Nano={{.HostConfig.NanoCpus}} Quota={{.HostConfig.CpuQuota}} Period={{.HostConfig.CpuPeriod}} Cpuset={{.HostConfig.CpusetCpus}}
Mem: Max={{.HostConfig.Memory}} Swap={{.HostConfig.MemorySwap}} Res={{.HostConfig.MemoryReservation}}
PIDs: {{.HostConfig.PidsLimit}}
Ulimits: {{json .HostConfig.Ulimits}}'
Find top CPU containers
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}"
Attach a shell and check app-level behavior
docker exec -it "$CID" sh
# or bash if available
docker exec -it "$CID" bash
Closing Notes
Debugging slow Docker containers is mostly about observability and correct attribution:
- If CPU is low, don’t assume CPU is fine—check throttling and I/O wait.
- If disk I/O is high, determine whether it’s the underlying disk, overlay overhead, or blkio limits.
- If everything looks “normal,” check memory pressure, OOMs, pids/ulimits, and application-level blocking.
If you share (1) docker inspect HostConfig, (2) cpu.stat throttling metrics, and (3) iostat -xz output during slowness, you can usually pinpoint the cause quickly and choose the right fix instead of guessing.