
Debugging Slow Docker Containers: CPU Throttling, I/O Bottlenecks, and Misconfigured Limits

Tags: docker, container-performance, cpu-throttling, io-bottlenecks, cgroups, linux-performance, observability, resource-limits, devops, troubleshooting


Slow containers are rarely “just Docker being slow.” In most cases, performance problems come from one (or several) of these categories:

  • CPU throttling from CFS quotas or cpuset pinning
  • I/O bottlenecks (storage driver overhead, disk saturation, blkio limits)
  • Memory pressure, swapping, or OOM kills
  • Misconfigured limits (pids, ulimits)
  • Waiting on slow network dependencies

This tutorial walks through a practical, command-driven workflow to identify the bottleneck and fix it. It assumes Docker Engine on Linux (most of the tooling and cgroup paths below are Linux-specific).


1) Establish a Baseline: “Is it the host or the container?”

Before diving into container internals, confirm the host isn’t already saturated.

Host-level quick checks

# CPU usage and load
uptime
top -o %CPU

# Per-core utilization and run queue
mpstat -P ALL 1

# Memory pressure and swapping
free -h
vmstat 1

# Disk I/O saturation and latency
iostat -xz 1

# If you have it: per-process I/O
sudo iotop -oPa

Interpretation tips:

  • A load average persistently above the core count suggests host CPU saturation.
  • Non-zero si/so columns in vmstat mean the host is swapping.
  • In iostat, high %util combined with high await points to saturated or slow disks.
  • If iotop shows one process dominating I/O, note whether it belongs to the slow container.

If the host looks healthy but the container is slow, proceed.
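To make the “is the host saturated?” check concrete, a small helper (the `load_per_core` name is hypothetical) compares the 1-minute load average to the core count:

```shell
#!/bin/sh
# Load average relative to core count: a 1-minute load well above 1.0
# per core suggests the host itself is CPU-saturated.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

load_per_core 8.40 4   # prints: 2.10  (heavily loaded)
load_per_core 1.20 8   # prints: 0.15  (plenty of headroom)

# On a real host you would feed it live values:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```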


2) Identify the Slow Container and Its Limits

List containers and observe live stats:

docker ps
docker stats

docker stats shows CPU%, memory usage, network, and block I/O. It’s a good starting point, but it can hide why CPU is low (e.g., throttling) or why I/O is high (overlay overhead vs physical disk).

Inspect container configuration:

CID=<container_id_or_name>
docker inspect "$CID" --format '{{json .HostConfig}}' | jq

Look for:

  • NanoCpus, CpuQuota/CpuPeriod, CpuShares, CpusetCpus (CPU limits)
  • Memory, MemorySwap, MemoryReservation (memory limits)
  • BlkioWeight and the BlkioDevice* throttles (I/O limits)
  • PidsLimit and Ulimits

Also check what Docker thinks the container is doing:

docker inspect "$CID" --format 'Name={{.Name}} Image={{.Config.Image}} Cmd={{json .Config.Cmd}}'

3) Determine Your Cgroup Version (Important for Paths and Metrics)

Many debugging steps depend on whether your system uses cgroup v1 or cgroup v2.

stat -fc %T /sys/fs/cgroup

You can also check:

mount | grep cgroup

Docker supports both, but metric file locations differ.
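If you want to script the check, a small helper (the `cgroup_version` name is hypothetical) can map the filesystem type reported by stat to a cgroup version:

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup version:
# cgroup2fs => unified v2 hierarchy; tmpfs => v1 hierarchies mounted beneath it.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

cgroup_version cgroup2fs   # prints: v2
cgroup_version tmpfs       # prints: v1

# On a real host: cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```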


4) CPU Throttling: Detect, Measure, Fix

4.1 Understand CPU limits in Docker

Docker can limit CPU in several ways:

  1. CFS quota/period (hard cap)
    • --cpus 1.0 sets quota to allow ~1 CPU worth of time.
    • Internally: cpu.cfs_quota_us and cpu.cfs_period_us (cgroup v1)
  2. CPU shares (relative weight, not a hard cap)
    • --cpu-shares affects scheduling under contention.
  3. cpuset (pin to specific cores)
    • --cpuset-cpus="0,2" restricts which cores can be used.

A container can show low CPU usage because it’s:

  • throttled by a CFS quota (it wants CPU but isn’t allowed to run),
  • pinned via cpuset to busy or too few cores, or
  • blocked on something else entirely (disk, locks, network).

4.2 Check CPU throttling from inside the container (or from host)

First, get the container’s main PID:

PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
echo "$PID"

cgroup v2 throttling metrics

On cgroup v2, a container’s CPU stats often live under a path like /sys/fs/cgroup/system.slice/docker-<container_id>.scope/cpu.stat (systemd cgroup driver) or /sys/fs/cgroup/docker/<container_id>/cpu.stat (cgroupfs driver).

Docker’s exact cgroup path varies by distro/systemd. A robust approach is to find the cgroup path for the PID:

cat /proc/$PID/cgroup

You may see something like:

0::/system.slice/docker-<container_id>.scope

For cgroup v2:

CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat

Example output:

usage_usec 123456789
user_usec 100000000
system_usec 23456789
nr_periods 1200
nr_throttled 800
throttled_usec 987654321

Key fields:

  • nr_periods: number of CFS enforcement periods that have elapsed
  • nr_throttled: periods in which the group hit its quota and was throttled
  • throttled_usec: total time (in microseconds) tasks spent throttled

If nr_throttled grows rapidly and throttled_usec increases steadily during slowness, you have CPU throttling.
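To turn the raw counters into something readable, compute the fraction of periods that were throttled. This sketch feeds in the example output above via printf; on a real host, replace the printf with `cat` of the cpu.stat file:

```shell
#!/bin/sh
# Summarize CFS throttling from cpu.stat text.
# Real input would come from: cat /sys/fs/cgroup$CGPATH/cpu.stat
printf 'nr_periods 1200\nnr_throttled 800\nthrottled_usec 987654321\n' |
awk '
  /^nr_periods/     { periods = $2 }
  /^nr_throttled/   { throttled = $2 }
  /^throttled_usec/ { usec = $2 }
  END {
    if (periods > 0)
      printf "throttled in %.1f%% of periods, %.1f s total throttled time\n",
             100 * throttled / periods, usec / 1e6
  }'
```

With the example numbers above, this reports roughly two thirds of periods throttled, which is a clear smoking gun.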

cgroup v1 throttling metrics

For cgroup v1, locate the cpu cgroup path:

cat /proc/$PID/cgroup | grep cpu

Then read:

# Example path; yours will differ
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.stat
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_period_us

cpu.stat often includes:

nr_periods 1200
nr_throttled 800
throttled_time 987654321000

(throttled_time is usually in nanoseconds.)

4.3 Verify the configured CPU limits

Check what Docker set:

docker inspect "$CID" --format \
'NanoCpus={{.HostConfig.NanoCpus}} CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} CpuShares={{.HostConfig.CpuShares}} CpusetCpus={{.HostConfig.CpusetCpus}}'
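From these values you can derive the effective CPU cap: quota divided by period (or NanoCpus / 1e9 when --cpus was used). A minimal sketch with a hypothetical `effective_cpus` helper:

```shell
#!/bin/sh
# Effective CPU count implied by a CFS quota/period pair.
# A quota of -1 (or 0/unset) means "no hard cap".
effective_cpus() {
  quota=$1; period=$2
  if [ "$quota" -le 0 ]; then
    echo "uncapped"
  else
    awk -v q="$quota" -v p="$period" 'BEGIN { printf "%.2f\n", q / p }'
  fi
}

effective_cpus 200000 100000   # prints: 2.00
effective_cpus -1 100000       # prints: uncapped
```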

Common gotchas:

  • --cpus is stored as NanoCpus; CpuQuota/CpuPeriod may read 0 even though a cap is in effect.
  • CpuShares is only a relative weight under contention, not a cap; a low value doesn’t throttle an idle host.
  • CpusetCpus silently caps parallelism: pinning to two cores limits you to ~2 CPUs regardless of quota.
  • Orchestrators (Compose, Kubernetes, Swarm) may set limits you didn’t configure by hand.

4.4 Fix CPU throttling

Option A: Increase CPU quota

Run a new container with more CPU:

docker run --cpus=2.0 yourimage

Or update an existing container:

docker update --cpus=2.0 "$CID"

Option B: Remove quota (no hard cap)

If you previously set a quota, note that --cpus=0 does not remove it; instead, clear the quota explicitly via docker update:

docker update --cpu-quota=-1 "$CID"

(You may also need to ensure --cpu-period is default.)

Option C: Adjust cpuset pinning

If pinned to a congested core:

docker update --cpuset-cpus="0-3" "$CID"

Option D: Diagnose application-level CPU stalls

If there’s no throttling but CPU is low, the app may be blocked. Use perf (host) against the container process:

sudo perf top -p "$PID"

Or capture a short profile:

sudo perf record -F 99 -p "$PID" -g -- sleep 15
sudo perf report

This helps distinguish “CPU-bound and slow” from “not getting CPU.”


5) I/O Bottlenecks: Storage Driver, Overlay Overhead, Disk Saturation

I/O issues are extremely common and often misdiagnosed as “CPU is low so it must be fine.” A container can be slow with low CPU because it’s waiting on disk.

5.1 Start with container-visible symptoms

Check block I/O in docker stats:

docker stats "$CID"

If block I/O grows quickly during slowness, suspect disk.

Inside the container, you can also check if processes are stuck in I/O wait. From the host:

ps -o pid,stat,wchan,comm -p "$PID"

5.2 Identify the storage driver and filesystem

docker info | grep -E 'Storage Driver|Backing Filesystem|Supports d_type|Native Overlay Diff'

Common drivers:

Also check where Docker stores data:

docker info | grep "Docker Root Dir"
df -hT /var/lib/docker

If /var/lib/docker is on a slow disk (or nearly full), performance suffers.

5.3 Overlay filesystem overhead and “small write” workloads

overlay2 merges layers. Heavy write workloads into the container’s writable layer can be slower than writing to a mounted volume because:

  • the first write to a file from a lower layer triggers a copy-up of the whole file into the writable layer,
  • overlayfs adds metadata and lookup overhead across layers, and
  • volumes bypass the overlay entirely and write straight to the backing filesystem.

Rule of thumb: If your workload writes frequently (databases, queues, build caches), prefer volumes or bind mounts.

Check mounts:

docker inspect "$CID" --format '{{json .Mounts}}' | jq

If your database is writing to /var/lib/... inside the container filesystem rather than a volume, consider moving it.
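A quick, rough way to check whether a hot path is covered by a mount is to search the Mounts JSON for a matching Destination. The sample JSON below is hard-coded for illustration; on a real host you would feed in the `docker inspect` output:

```shell
#!/bin/sh
# Sample Mounts JSON; real input: docker inspect "$CID" --format '{{json .Mounts}}'
mounts='[{"Type":"volume","Name":"pgdata","Destination":"/var/lib/postgresql/data"}]'
path=/var/lib/postgresql/data

# Crude string match on the Destination field (jq is cleaner if available).
if printf '%s' "$mounts" | grep -q "\"Destination\":\"$path\""; then
  echo "covered by a mount"
else
  echo "writes hit the container layer"
fi
```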

5.4 Measure disk latency and saturation on the host

Use iostat:

iostat -xz 1

Look at the device backing Docker’s root dir (e.g., nvme0n1, sda).

If %util is high and await is high, the disk is saturated or slow.

Use pidstat to see per-process I/O:

sudo pidstat -d 1 -p "$PID"

If the container spawns multiple processes, you may want the whole cgroup rather than one PID, but this still helps.

5.5 Find which files are hot

If you have lsof:

sudo lsof -p "$PID" | head

For deeper I/O tracing, use strace carefully (it adds overhead):

sudo strace -ff -p "$PID" -e trace=openat,read,write,fdatasync,fsync -ttT

If you see frequent fsync() calls taking milliseconds to seconds, storage latency is hurting you.

5.6 Docker-specific I/O limits (blkio)

Docker can throttle I/O via blkio settings. Check:

docker inspect "$CID" --format \
'BlkioWeight={{.HostConfig.BlkioWeight}} ReadBps={{json .HostConfig.BlkioDeviceReadBps}} WriteBps={{json .HostConfig.BlkioDeviceWriteBps}} ReadIOps={{json .HostConfig.BlkioDeviceReadIOps}} WriteIOps={{json .HostConfig.BlkioDeviceWriteIOps}}'

If limits are set too low, the container will be artificially slow.

Update to remove or raise limits (example):

docker update --blkio-weight 500 "$CID"

Or remove device limits by re-creating the container without them.

5.7 Fix common I/O bottlenecks

Use volumes for write-heavy paths

Example: PostgreSQL data directory:

docker run -d --name pg \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16

Ensure enough free space and healthy filesystem

df -h
sudo dmesg -T | tail -n 200

Kernel logs showing I/O errors, resets, or filesystem warnings are red flags.

Consider storage options:

  • Move /var/lib/docker to a faster device (e.g., NVMe) or a dedicated disk.
  • Keep write-heavy data on volumes backed by fast storage.
  • On cloud instances, check whether the disk’s IOPS/throughput tier matches the workload.


6) Memory Pressure and Misconfigured Limits (Often Masquerading as CPU/I/O)

A container can be “slow” because it’s constantly reclaiming memory, swapping, or being OOM-killed and restarted.

6.1 Check container memory limits

docker inspect "$CID" --format \
'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}} MemoryReservation={{.HostConfig.MemoryReservation}} OomKillDisable={{.HostConfig.OomKillDisable}}'

Values are in bytes. 0 often means “no explicit limit.”
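To make those byte values readable at a glance, numfmt (GNU coreutils) converts them to IEC units:

```shell
#!/bin/sh
# Convert raw byte values from docker inspect into human-readable IEC sizes.
numfmt --to=iec 2147483648   # 2 GiB memory limit
numfmt --to=iec 536870912    # 512 MiB reservation
```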

6.2 Detect OOM kills and memory reclaim

Check container events:

docker events --since 1h | grep -i oom

Check kernel logs:

sudo dmesg -T | grep -i -E 'oom|killed process' | tail -n 50

If you see OOM kills, the container may restart or the app may degrade.

6.3 cgroup memory stats (v2)

Again, find cgroup path:

PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)

Read memory metrics:

cat /sys/fs/cgroup$CGPATH/memory.current
cat /sys/fs/cgroup$CGPATH/memory.max
cat /sys/fs/cgroup$CGPATH/memory.stat | head -n 50

Useful fields in memory.stat:

  • anon / file: anonymous memory vs page cache
  • pgscan / pgsteal: reclaim activity (rising fast under pressure)
  • workingset_refault_anon / workingset_refault_file: pages evicted and then needed again, a strong thrashing signal

Check pressure stall information (PSI), which is extremely helpful:

cat /sys/fs/cgroup$CGPATH/cpu.pressure
cat /sys/fs/cgroup$CGPATH/memory.pressure
cat /sys/fs/cgroup$CGPATH/io.pressure

If memory.pressure shows high some/full stall time, the container is spending time waiting on memory reclaim.
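PSI lines have the shape `some avg10=... avg60=... avg300=... total=...`. A small awk filter makes the avg10 figures readable; the sample line below is illustrative, and real input would come from the memory.pressure file:

```shell
#!/bin/sh
# Summarize PSI: "some" = at least one task stalled; "full" = all tasks stalled.
# Real input: cat /sys/fs/cgroup$CGPATH/memory.pressure
printf 'some avg10=12.50 avg60=8.20 avg300=3.10 total=123456\nfull avg10=4.00 avg60=2.50 avg300=1.00 total=45678\n' |
awk '{
  split($2, a, "=")   # a[2] is the avg10 percentage
  printf "%s: %.1f%% of the last 10s spent stalled\n", $1, a[2]
}'
```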

6.4 Swap behavior

On many systems, containers share host swap behavior unless configured. If the host is swapping, containers slow down.

Host swap check:

swapon --show
vmstat 1

If swap is active and si/so are non-zero during slowness, consider:

  • adding host memory or reducing total container footprint so everything fits in RAM,
  • lowering vm.swappiness, and
  • setting explicit container memory limits so one container can’t push the host into swap.

6.5 Fix memory misconfiguration

Increase container memory limit

docker update --memory 2g --memory-swap 2g "$CID"

Notes:

  • Setting --memory-swap equal to --memory disables swap for the container.
  • --memory-swap -1 allows unlimited swap; leaving it unset typically defaults to twice the memory limit.
  • On cgroup v1, swap accounting may require kernel boot parameters (e.g., swapaccount=1).

Set a reservation (soft limit) to reduce contention

docker update --memory-reservation 1g "$CID"

7) PIDs Limit, ulimits, and “It’s Slow Because It Can’t Spawn”

Sometimes “slowness” is actually the app failing to create threads/processes or open files, leading to timeouts and retries.

7.1 Check pids limit

docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'

If it’s low (e.g., 100) and your runtime needs many threads (JVM, Node, Python gunicorn), you can hit the limit.

Update:

docker update --pids-limit 1000 "$CID"

7.2 Check ulimits (nofile, nproc)

docker inspect "$CID" --format '{{json .HostConfig.Ulimits}}' | jq

Inside container:

docker exec "$CID" sh -lc 'ulimit -a'

If nofile is too low, network servers can degrade under load.

Run with higher ulimit:

docker run --ulimit nofile=1048576:1048576 yourimage

8) Network Isn’t the Focus, But Don’t Ignore It

A container can be slow because it’s waiting on remote services. Quick checks:

docker exec "$CID" sh -lc 'getent hosts example.com'
docker exec "$CID" sh -lc 'time wget -qO- https://example.com >/dev/null'

On the host, look for retransmits:

ss -s
netstat -s | grep -i retrans

If network is the issue, CPU/I/O tuning won’t help.


9) A Practical Step-by-Step Workflow (Repeatable)

Use this sequence when you’re on-call and need answers quickly.

Step 1: Confirm symptoms and scope

Commands:

docker stats
uptime
iostat -xz 1
vmstat 1

Step 2: Check if the container is throttled

CID=<id>
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
cat /proc/$PID/cgroup
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat

If throttling is high, raise/remove CPU quota:

docker update --cpus 2.0 "$CID"
# or
docker update --cpu-quota=-1 "$CID"

Step 3: If not throttled, check I/O pressure and disk saturation

cat /sys/fs/cgroup$CGPATH/io.pressure
iostat -xz 1
sudo pidstat -d 1 -p "$PID"

If I/O is the bottleneck:

  • Move write-heavy paths onto volumes or bind mounts.
  • Raise or remove blkio limits.
  • Put /var/lib/docker (or the volume backing store) on faster storage.

Step 4: Check memory pressure/OOM

docker inspect "$CID" --format 'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}}'
sudo dmesg -T | grep -i oom | tail

If memory pressure is high:

  • Raise the memory limit (docker update --memory ...).
  • Check for leaks or an oversized working set in the application.
  • Make sure the host itself isn’t swapping.

Step 5: Check pids/ulimits

docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'
docker exec "$CID" sh -lc 'ulimit -a'

10) Common Misconfigurations and Their “Slow” Signatures

Misconfig: CPU quota too low

Signature:

  • docker stats shows low-to-moderate CPU% while the app is visibly slow.
  • cpu.stat shows nr_throttled climbing and throttled_usec growing during the slowness.

Fix:

  • Raise the cap (docker update --cpus ...) or remove it (docker update --cpu-quota=-1).

Misconfig: Writing to container layer instead of a volume

Signature:

  • High block I/O in docker stats; fsync/write calls are slow under strace.
  • .Mounts shows no volume covering the hot write path.

Fix:

  • Move the write-heavy directory onto a named volume or bind mount.

Misconfig: Memory limit too low for workload

Signature:

  • OOM kills in dmesg / docker events; the container restarts or degrades.
  • memory.current sits pinned at memory.max and memory.pressure shows sustained stalls.

Fix:

  • Raise --memory (and --memory-swap), or reduce the app’s footprint.

Misconfig: PIDs limit too low

Signature:

  • “cannot fork” / “Resource temporarily unavailable” errors; thread pools fail to grow.

Fix:

  • Raise --pids-limit.

Misconfig: nofile too low

Signature:

  • “too many open files” (EMFILE) errors; connections dropped or refused under load.

Fix:

  • Raise nofile via --ulimit nofile=... (or in the daemon defaults).


11) Reproducing and Proving the Root Cause (So Fixes Stick)

Performance debugging goes better when you can prove the bottleneck with a metric that changes when you apply a fix.

Examples of “proof” metrics:

  • nr_throttled / throttled_usec growth rate before vs after raising the CPU cap
  • iostat await and %util on the backing device before vs after moving data to a volume
  • PSI (cpu/memory/io .pressure) stall percentages
  • The application’s own p95/p99 latency

Keep a short capture before and after:

# Capture a 30s snapshot of key host metrics
iostat -xz 1 30 > /tmp/iostat.txt
vmstat 1 30 > /tmp/vmstat.txt

# Capture cgroup cpu throttling (v2)
for i in $(seq 1 30); do
  date +%s
  cat /sys/fs/cgroup$CGPATH/cpu.stat
  sleep 1
done > /tmp/cpu_stat.txt

12) Appendix: Handy One-Liners

Show container limits quickly

docker inspect "$CID" --format \
'CPUs: Nano={{.HostConfig.NanoCpus}} Quota={{.HostConfig.CpuQuota}} Period={{.HostConfig.CpuPeriod}} Cpuset={{.HostConfig.CpusetCpus}}
Mem:  Max={{.HostConfig.Memory}} Swap={{.HostConfig.MemorySwap}} Res={{.HostConfig.MemoryReservation}}
PIDs: {{.HostConfig.PidsLimit}}
Ulimits: {{json .HostConfig.Ulimits}}'

Find top CPU containers

docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}"

Attach a shell and check app-level behavior

docker exec -it "$CID" sh
# or bash if available
docker exec -it "$CID" bash

Closing Notes

Debugging slow Docker containers is mostly about observability and correct attribution:

  • Measure first; don’t guess.
  • Attribute the wait: is the container starved of CPU, waiting on disk, reclaiming memory, or blocked on the network?
  • Change one limit at a time and confirm the proof metric moves.

If you share (1) docker inspect HostConfig, (2) cpu.stat throttling metrics, and (3) iostat -xz output during slowness, you can usually pinpoint the cause quickly and choose the right fix instead of guessing.