Debugging Slow Docker Containers: CPU Throttling, I/O Bottlenecks, and Misconfigured Limits
Slow containers are rarely “just Docker being slow.” In most cases, performance problems come from one (or several) of these categories:
- CPU throttling (CFS quota/period, cpuset pinning, noisy neighbors, host CPU saturation)
- I/O bottlenecks (storage driver behavior, overlay filesystem overhead, slow/contended disks, sync-heavy workloads)
- Misconfigured limits (memory, swap, ulimits, pids limit, kernel settings, cgroup version mismatch)
This tutorial walks through a practical, command-driven workflow to identify the bottleneck and fix it. It assumes Docker Engine on Linux (most of the tooling and cgroup paths below are Linux-specific).
1) Establish a Baseline: “Is it the host or the container?”
Before diving into container internals, confirm the host isn’t already saturated.
Host-level quick checks
# CPU usage and load
uptime
top -o %CPU
# Per-core utilization and run queue
mpstat -P ALL 1
# Memory pressure and swapping
free -h
vmstat 1
# Disk I/O saturation and latency
iostat -xz 1
# If you have it: per-process I/O
sudo iotop -oPa
Interpretation tips:
- High load average with low CPU utilization can indicate I/O wait or blocked tasks.
- iostat -xz: %util near 100% indicates the disk is saturated. High await (latency) suggests contention or slow storage.
- vmstat: Non-zero si/so (swap in/out) indicates swapping, which can devastate container performance. A non-zero b column means processes are blocked, often on I/O.
If the host looks healthy but the container is slow, proceed.
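As a first-pass sanity check, the 1-minute load average can be compared directly against the core count from /proc/loadavg (Linux-only). This is a minimal sketch, not a substitute for mpstat/iostat:

```shell
# Compare the 1-minute load average against the CPU count (Linux).
# A load well above the core count suggests host-level saturation.
host_cpu_check() {
  load1=$(cut -d' ' -f1 /proc/loadavg)
  cores=$(nproc)
  awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l > c) print "host CPU likely saturated (load " l " > " c " cores)"
    else       print "host load looks OK (load " l " <= " c " cores)"
  }'
}
host_cpu_check
```

Remember that load average also counts tasks blocked on I/O, so pair this with iostat before concluding anything.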
2) Identify the Slow Container and Its Limits
List containers and observe live stats:
docker ps
docker stats
docker stats shows CPU%, memory usage, network, and block I/O. It’s a good starting point, but it can hide why CPU is low (e.g., throttling) or why I/O is high (overlay overhead vs physical disk).
Inspect container configuration:
CID=<container_id_or_name>
docker inspect "$CID" --format '{{json .HostConfig}}' | jq
Look for:
- CPU: NanoCpus, CpuQuota, CpuPeriod, CpuShares, CpusetCpus
- Memory: Memory, MemorySwap, OomKillDisable
- Block I/O: BlkioWeight, BlkioDeviceReadBps, BlkioDeviceWriteBps
- Process limits: PidsLimit, Ulimits
Also check what Docker thinks the container is doing:
docker inspect "$CID" --format 'Name={{.Name}} Image={{.Config.Image}} Cmd={{json .Config.Cmd}}'
3) Determine Your Cgroup Version (Important for Paths and Metrics)
Many debugging steps depend on whether your system uses cgroup v1 or cgroup v2.
stat -fc %T /sys/fs/cgroup
- cgroup2fs → cgroup v2
- anything else (often tmpfs) → likely cgroup v1
You can also check:
mount | grep cgroup
Docker supports both, but metric file locations differ.
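The check above can be wrapped in a small helper so later steps branch on the result (a sketch; the fallback branch simply reports whatever filesystem type it found):

```shell
# Detect which cgroup version the host uses, so later steps read the
# correct metric files.
cgroup_version() {
  fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null)
  if [ "$fstype" = "cgroup2fs" ]; then
    echo "cgroup v2"
  else
    echo "cgroup v1 (or unknown: $fstype)"
  fi
}
cgroup_version
```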
4) CPU Throttling: Detect, Measure, Fix
4.1 Understand CPU limits in Docker
Docker can limit CPU in several ways:
- CFS quota/period (hard cap): --cpus 1.0 sets the quota to allow roughly one CPU's worth of time. Internally this maps to cpu.cfs_quota_us and cpu.cfs_period_us (cgroup v1).
- CPU shares (relative weight, not a hard cap): --cpu-shares affects scheduling under contention.
- cpuset (pin to specific cores): --cpuset-cpus="0,2" restricts which cores can be used.
A container can show low CPU usage because it’s:
- blocked on I/O,
- waiting on locks,
- or being throttled by CFS quota.
4.2 Check CPU throttling from inside the container (or from host)
First, get the container’s main PID:
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
echo "$PID"
cgroup v2 throttling metrics
On cgroup v2, CPU stats are often in:
/sys/fs/cgroup/<scope>/cpu.stat
Docker’s exact cgroup path varies by distro/systemd. A robust approach is to find the cgroup path for the PID:
cat /proc/$PID/cgroup
You may see something like:
- v2: 0::/system.slice/docker-<id>.scope
- v1: multiple controllers listed separately
For cgroup v2:
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat
Example output:
usage_usec 123456789
user_usec 100000000
system_usec 23456789
nr_periods 1200
nr_throttled 800
throttled_usec 987654321
Key fields:
- nr_throttled: number of periods in which throttling occurred
- throttled_usec: total time throttled (microseconds)
If nr_throttled grows rapidly and throttled_usec increases steadily during slowness, you have CPU throttling.
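To put a number on "grows rapidly," diff two cpu.stat snapshots taken a few seconds apart. A sketch with awk, demonstrated on hard-coded sample snapshots rather than live files:

```shell
# Percentage of wall-clock time the cgroup spent throttled between two
# cpu.stat snapshots taken $3 seconds apart.
throttle_pct() {
  { printf '%s\n' "$1"; printf '%s\n' "$2"; } | awk -v secs="$3" '
    { if ($1 in a) b[$1] = $2; else a[$1] = $2 }   # 1st snapshot -> a, 2nd -> b
    END {
      dt = b["throttled_usec"] - a["throttled_usec"]
      printf "%.1f\n", dt / (secs * 1000000) * 100
    }'
}

# Illustrative sample data in cgroup v2 cpu.stat format
snap1='nr_periods 1200
nr_throttled 800
throttled_usec 5000000'
snap2='nr_periods 1300
nr_throttled 890
throttled_usec 9000000'

throttle_pct "$snap1" "$snap2" 10   # 4s throttled over 10s -> 40.0
```

In live use, capture the two snapshots with `cat /sys/fs/cgroup$CGPATH/cpu.stat`, separated by a `sleep`. Anything consistently above a few percent during slowness is worth fixing.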
cgroup v1 throttling metrics
For cgroup v1, locate the cpu cgroup path:
cat /proc/$PID/cgroup | grep cpu
Then read:
# Example path; yours will differ
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.stat
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_quota_us
cat /sys/fs/cgroup/cpu/docker/<container_id>/cpu.cfs_period_us
cpu.stat often includes:
nr_periods 1200
nr_throttled 800
throttled_time 987654321000
(throttled_time is usually in nanoseconds.)
4.3 Verify the configured CPU limits
Check what Docker set:
docker inspect "$CID" --format \
'NanoCpus={{.HostConfig.NanoCpus}} CpuQuota={{.HostConfig.CpuQuota}} CpuPeriod={{.HostConfig.CpuPeriod}} CpuShares={{.HostConfig.CpuShares}} CpusetCpus={{.HostConfig.CpusetCpus}}'
Common gotchas:
- --cpus=0.5 might be too low for bursty workloads (GC, JIT, compaction).
- --cpuset-cpus pinned to a busy core can look like "random slowness."
- CPU shares don't help if you have a hard quota set.
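The quota/period pair (from docker inspect or the cgroup v1 files) maps to an effective CPU count. A small sketch of the arithmetic:

```shell
# Effective CPU cap = cfs_quota_us / cfs_period_us; quota <= 0 means no cap.
effective_cpus() {
  awk -v q="$1" -v p="$2" 'BEGIN {
    if (q <= 0) print "no hard cap"
    else printf "%.2f CPUs\n", q / p
  }'
}
effective_cpus 50000 100000   # -> 0.50 CPUs
effective_cpus -1 100000      # -> no hard cap
```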
4.4 Fix CPU throttling
Option A: Increase CPU quota
Run a new container with more CPU:
docker run --cpus=2.0 yourimage
Or update an existing container:
docker update --cpus=2.0 "$CID"
Option B: Remove quota (no hard cap)
If you previously set a quota, note that setting --cpus to 0 does not remove it; instead set the quota to -1 via docker update:
docker update --cpu-quota=-1 "$CID"
(You may also need to ensure --cpu-period is default.)
Option C: Adjust cpuset pinning
If pinned to a congested core:
docker update --cpuset-cpus="0-3" "$CID"
Option D: Diagnose application-level CPU stalls
If there’s no throttling but CPU is low, the app may be blocked. Use perf (host) against the container process:
sudo perf top -p "$PID"
Or capture a short profile:
sudo perf record -F 99 -p "$PID" -g -- sleep 15
sudo perf report
This helps distinguish “CPU-bound and slow” from “not getting CPU.”
5) I/O Bottlenecks: Storage Driver, Overlay Overhead, Disk Saturation
I/O issues are extremely common and often misdiagnosed as “CPU is low so it must be fine.” A container can be slow with low CPU because it’s waiting on disk.
5.1 Start with container-visible symptoms
Check block I/O in docker stats:
docker stats "$CID"
If block I/O grows quickly during slowness, suspect disk.
Inside the container, you can also check if processes are stuck in I/O wait. From the host:
ps -o pid,stat,wchan,comm -p "$PID"
- D state indicates uninterruptible sleep (often I/O).
- wchan may show kernel wait function names.
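A sustained, nonzero count of D-state processes during the slow period is a strong I/O-wait signal. A quick host-wide count (sketch; assumes ps is available):

```shell
# Count processes currently in uninterruptible sleep (state D).
dstate_count() {
  ps -eo stat= | awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}
dstate_count
```

Run it repeatedly (e.g., under `watch -n1`) while the container is slow; an occasional blip is normal, a steady count is not.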
5.2 Identify the storage driver and filesystem
docker info | grep -E 'Storage Driver|Backing Filesystem|Supports d_type|Native Overlay Diff'
Common drivers:
- overlay2 (most common; generally good)
- devicemapper (older; can be slow when misconfigured)
- btrfs, zfs (feature-rich, with different performance characteristics)
Also check where Docker stores data:
docker info | grep "Docker Root Dir"
df -hT /var/lib/docker
If /var/lib/docker is on a slow disk (or nearly full), performance suffers.
5.3 Overlay filesystem overhead and “small write” workloads
overlay2 merges layers. Heavy write workloads into the container’s writable layer can be slower than writing to a mounted volume because:
- copy-on-write behavior can trigger extra metadata operations
- lots of small fsyncs can amplify latency
Rule of thumb: If your workload writes frequently (databases, queues, build caches), prefer volumes or bind mounts.
Check mounts:
docker inspect "$CID" --format '{{json .Mounts}}' | jq
If your database is writing to /var/lib/... inside the container filesystem rather than a volume, consider moving it.
5.4 Measure disk latency and saturation on the host
Use iostat:
iostat -xz 1
Look at the device backing Docker’s root dir (e.g., nvme0n1, sda).
If %util is high and await is high, the disk is saturated or slow.
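Rather than eyeballing columns, a captured iostat -xz report can be scanned for saturated devices. A hypothetical parser that locates %util and await by header name, since column positions vary across sysstat versions; the report text here is sample data:

```shell
# Flag devices whose %util exceeds 90 in an `iostat -xz`-style report.
flag_saturated() {
  awk '
    $1 == "Device" { for (i = 1; i <= NF; i++) { if ($i == "%util") u = i; if ($i == "await") a = i }; next }
    u && $u + 0 > 90 { printf "%s saturated: util=%s%% await=%sms\n", $1, $u, $a }'
}

# Illustrative sample report (real iostat output has more columns)
report='Device r/s w/s await %util
nvme0n1 120.0 300.0 8.50 97.3
sda 1.0 2.0 0.40 1.2'

echo "$report" | flag_saturated
```

In real use, pipe a capture through it: `iostat -xz 1 5 | flag_saturated`.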
Use pidstat to see per-process I/O:
sudo pidstat -d 1 -p "$PID"
If the container spawns multiple processes, you may want the whole cgroup rather than one PID, but this still helps.
5.5 Find which files are hot
If you have lsof:
sudo lsof -p "$PID" | head
For deeper I/O tracing, use strace carefully (it adds overhead):
sudo strace -ff -p "$PID" -e trace=openat,read,write,fdatasync,fsync -ttT
If you see frequent fsync() calls taking milliseconds to seconds, storage latency is hurting you.
5.6 Docker-specific I/O limits (blkio)
Docker can throttle I/O via blkio settings. Check:
docker inspect "$CID" --format \
'BlkioWeight={{.HostConfig.BlkioWeight}} ReadBps={{json .HostConfig.BlkioDeviceReadBps}} WriteBps={{json .HostConfig.BlkioDeviceWriteBps}} ReadIOps={{json .HostConfig.BlkioDeviceReadIOps}} WriteIOps={{json .HostConfig.BlkioDeviceWriteIOps}}'
If limits are set too low, the container will be artificially slow.
Update to remove or raise limits (example):
docker update --blkio-weight 500 "$CID"
Or remove device limits by re-creating the container without them.
5.7 Fix common I/O bottlenecks
Use volumes for write-heavy paths
Example: PostgreSQL data directory:
docker run -d --name pg \
-v pgdata:/var/lib/postgresql/data \
postgres:16
Ensure enough free space and healthy filesystem
df -h
sudo dmesg -T | tail -n 200
Kernel logs showing I/O errors, resets, or filesystem warnings are red flags.
Consider storage options
- Put /var/lib/docker on SSD/NVMe.
- If running in the cloud, verify the volume type and provisioned IOPS.
- Avoid running many write-heavy containers on a single slow disk.
6) Memory Pressure and Misconfigured Limits (Often Masquerading as CPU/I/O)
A container can be “slow” because it’s constantly reclaiming memory, swapping, or being OOM-killed and restarted.
6.1 Check container memory limits
docker inspect "$CID" --format \
'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}} MemoryReservation={{.HostConfig.MemoryReservation}} OomKillDisable={{.HostConfig.OomKillDisable}}'
Values are in bytes. 0 often means “no explicit limit.”
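Since inspect reports raw bytes, a tiny formatter helps when scanning many containers (a sketch; it treats 0 as "unlimited," matching the common default):

```shell
# Render a byte count from docker inspect readably; 0 = no explicit limit.
human_bytes() {
  awk -v b="$1" 'BEGIN {
    if (b == 0) { print "unlimited"; exit }
    n = split("B KiB MiB GiB TiB", units, " ")
    i = 1
    while (b >= 1024 && i < n) { b /= 1024; i++ }
    printf "%.1f %s\n", b, units[i]
  }'
}
human_bytes 0            # -> unlimited
human_bytes 2147483648   # -> 2.0 GiB
```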
6.2 Detect OOM kills and memory reclaim
Check container events:
docker events --since 1h | grep -i oom
Check kernel logs:
sudo dmesg -T | grep -i -E 'oom|killed process' | tail -n 50
If you see OOM kills, the container may restart or the app may degrade.
6.3 cgroup memory stats (v2)
Again, find cgroup path:
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
Read memory metrics:
cat /sys/fs/cgroup$CGPATH/memory.current
cat /sys/fs/cgroup$CGPATH/memory.max
cat /sys/fs/cgroup$CGPATH/memory.stat | head -n 50
Useful fields in memory.stat:
- anon, file
- pgfault, pgmajfault (major faults imply disk I/O)
- workingset_refault (can indicate cache thrash)
Check pressure stall information (PSI), which is extremely helpful:
cat /sys/fs/cgroup$CGPATH/cpu.pressure
cat /sys/fs/cgroup$CGPATH/memory.pressure
cat /sys/fs/cgroup$CGPATH/io.pressure
If memory.pressure shows high some/full stall time, the container is spending time waiting on memory reclaim.
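PSI files share a fixed format (some/full lines with avg10/avg60/avg300/total), so the 10-second average is easy to extract for monitoring. A sketch shown on sample text rather than a live pressure file:

```shell
# Extract the "some" avg10 value (percent) from a PSI pressure file.
psi_some_avg10() {
  awk '$1 == "some" { sub("avg10=", "", $2); print $2 }'
}

# Illustrative sample in the kernel's PSI format
sample='some avg10=12.34 avg60=8.00 avg300=2.00 total=123456789
full avg10=4.00 avg60=2.00 avg300=0.50 total=23456789'

echo "$sample" | psi_some_avg10   # -> 12.34
```

In real use: `psi_some_avg10 < /sys/fs/cgroup$CGPATH/memory.pressure`.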
6.4 Swap behavior
On many systems, containers share host swap behavior unless configured. If the host is swapping, containers slow down.
Host swap check:
swapon --show
vmstat 1
If swap is active and si/so are non-zero during slowness, consider:
- adding RAM,
- reducing memory-limit contention,
- tuning vm.swappiness,
- or moving workloads.
6.5 Fix memory misconfiguration
Increase container memory limit
docker update --memory 2g --memory-swap 2g "$CID"
Notes:
- Setting --memory-swap equal to --memory effectively disables swap for that container (behavior depends on kernel/cgroup version).
- If you set --memory too low, the app may GC/compact constantly, appearing "CPU slow."
Set a reservation (soft limit) to reduce contention
docker update --memory-reservation 1g "$CID"
7) PIDs Limit, ulimits, and “It’s Slow Because It Can’t Spawn”
Sometimes “slowness” is actually the app failing to create threads/processes or open files, leading to timeouts and retries.
7.1 Check pids limit
docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'
If it’s low (e.g., 100) and your runtime needs many threads (JVM, Node, Python gunicorn), you can hit the limit.
Update:
docker update --pids-limit 1000 "$CID"
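To gauge how close a process is to a pids limit, compare its current thread count against the configured limit. A Linux-only sketch using /proc:

```shell
# Current thread count of a process, from /proc/<pid>/status.
threads_of() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}
threads_of $$   # thread count of this shell (at least 1)
```

Point it at the container's main PID (`threads_of "$PID"`) and compare with PidsLimit from docker inspect; note that the limit counts all tasks in the cgroup, not just one process.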
7.2 Check ulimits (nofile, nproc)
docker inspect "$CID" --format '{{json .HostConfig.Ulimits}}' | jq
Inside container:
docker exec "$CID" sh -lc 'ulimit -a'
If nofile is too low, network servers can degrade under load.
Run with higher ulimit:
docker run --ulimit nofile=1048576:1048576 yourimage
8) Network Isn’t the Focus, But Don’t Ignore It
A container can be slow because it’s waiting on remote services. Quick checks:
docker exec "$CID" sh -lc 'getent hosts example.com'
docker exec "$CID" sh -lc 'time wget -qO- https://example.com >/dev/null'
On the host, look for retransmits:
ss -s
netstat -s | grep -i retrans
If network is the issue, CPU/I/O tuning won’t help.
9) A Practical Step-by-Step Workflow (Repeatable)
Use this sequence when you’re on-call and need answers quickly.
Step 1: Confirm symptoms and scope
- Is one container slow or many?
- Is the host slow too?
Commands:
docker stats
uptime
iostat -xz 1
vmstat 1
Step 2: Check if the container is throttled
CID=<id>
PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
cat /proc/$PID/cgroup
- cgroup v2:
CGPATH=$(awk -F: '$1=="0"{print $3}' /proc/$PID/cgroup)
cat /sys/fs/cgroup$CGPATH/cpu.stat
If throttling is high, raise/remove CPU quota:
docker update --cpus 2.0 "$CID"
# or
docker update --cpu-quota=-1 "$CID"
Step 3: If not throttled, check I/O pressure and disk saturation
- PSI (v2):
cat /sys/fs/cgroup$CGPATH/io.pressure
- Host disk:
iostat -xz 1
- Per-process I/O:
sudo pidstat -d 1 -p "$PID"
If I/O is the bottleneck:
- move write-heavy paths to volumes,
- remove blkio limits,
- improve underlying storage.
Step 4: Check memory pressure/OOM
docker inspect "$CID" --format 'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}}'
sudo dmesg -T | grep -i oom | tail
If memory pressure is high:
- increase memory limit,
- reduce co-located workloads,
- avoid host swapping.
Step 5: Check pids/ulimits
docker inspect "$CID" --format 'PidsLimit={{.HostConfig.PidsLimit}}'
docker exec "$CID" sh -lc 'ulimit -a'
10) Common Misconfigurations and Their “Slow” Signatures
Misconfig: CPU quota too low
Signature:
- cpu.stat shows increasing throttling
- Response times spike under load even though CPU% looks capped
Fix:
- Increase --cpus or remove the quota.
Misconfig: Writing to container layer instead of a volume
Signature:
- High block I/O
- Many fsyncs, slow metadata operations
- Performance degrades as writable layer grows
Fix:
- Use -v volumes/bind mounts for write-heavy directories.
Misconfig: Memory limit too low for workload
Signature:
- Frequent GC/compaction (language-dependent)
- High major page faults
- OOM kills or near-OOM reclaim stalls (PSI)
Fix:
- Increase memory, tune app memory usage, reduce co-tenancy.
Misconfig: PIDs limit too low
Signature:
- Timeouts, inability to spawn workers/threads
- Logs show “resource temporarily unavailable” or fork failures
Fix:
- Raise --pids-limit.
Misconfig: nofile too low
Signature:
- Connection failures under concurrency
- “Too many open files” errors, degraded throughput
Fix:
- Raise --ulimit nofile=....
11) Reproducing and Proving the Root Cause (So Fixes Stick)
Performance debugging goes better when you can prove the bottleneck with a metric that changes when you apply a fix.
Examples of “proof” metrics:
- CPU throttling:
  - Before: nr_throttled climbs fast
  - After raising CPU: nr_throttled barely increases; latency improves
- I/O bottleneck:
  - Before: iostat await high, %util high, io.pressure high
  - After moving data to a faster disk/volume: latency drops, await drops
- Memory pressure:
  - Before: memory.pressure shows high stall time; major faults increase
  - After: stall time decreases; throughput improves
Keep a short capture before and after:
# Capture a 30s snapshot of key host metrics
iostat -xz 1 30 > /tmp/iostat.txt
vmstat 1 30 > /tmp/vmstat.txt
# Capture cgroup cpu throttling (v2)
for i in $(seq 1 30); do
date +%s
cat /sys/fs/cgroup$CGPATH/cpu.stat
sleep 1
done > /tmp/cpu_stat.txt
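A capture in that timestamp-plus-cpu.stat format can then be summarized, e.g., first versus last nr_throttled. A sketch run on an inline sample instead of the real /tmp/cpu_stat.txt:

```shell
# Summarize nr_throttled growth across a cpu_stat.txt-style capture.
throttle_summary() {
  awk '
    $1 == "nr_throttled" { if (first == "") first = $2; last = $2 }
    END { printf "nr_throttled: %s -> %s (delta %d)\n", first, last, last - first }'
}

# Illustrative sample: epoch timestamp followed by cpu.stat fields
sample='1700000000
nr_periods 100
nr_throttled 10
1700000001
nr_periods 110
nr_throttled 25'

echo "$sample" | throttle_summary   # -> nr_throttled: 10 -> 25 (delta 15)
```

In real use: `throttle_summary < /tmp/cpu_stat.txt` for the before and after captures.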
12) Appendix: Handy One-Liners
Show container limits quickly
docker inspect "$CID" --format \
'CPUs: Nano={{.HostConfig.NanoCpus}} Quota={{.HostConfig.CpuQuota}} Period={{.HostConfig.CpuPeriod}} Cpuset={{.HostConfig.CpusetCpus}}
Mem: Max={{.HostConfig.Memory}} Swap={{.HostConfig.MemorySwap}} Res={{.HostConfig.MemoryReservation}}
PIDs: {{.HostConfig.PidsLimit}}
Ulimits: {{json .HostConfig.Ulimits}}'
Find top CPU containers
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.BlockIO}}"
Attach a shell and check app-level behavior
docker exec -it "$CID" sh
# or bash if available
docker exec -it "$CID" bash
Closing Notes
Debugging slow Docker containers is mostly about observability and correct attribution:
- If CPU is low, don’t assume CPU is fine—check throttling and I/O wait.
- If disk I/O is high, determine whether it’s the underlying disk, overlay overhead, or blkio limits.
- If everything looks “normal,” check memory pressure, OOMs, pids/ulimits, and application-level blocking.
If you share (1) docker inspect HostConfig, (2) cpu.stat throttling metrics, and (3) iostat -xz output during slowness, you can usually pinpoint the cause quickly and choose the right fix instead of guessing.