Handling Graceful Shutdowns: Fixing Stuck or Zombie Containers in Production
Production container platforms are optimized for starting and stopping workloads quickly. But “stop” is not a single action: it is a sequence of signals, timeouts, process behavior, and kernel mechanics. When that sequence breaks, you get containers that won’t die, containers that are “Exited” but still hold resources, or “zombie” processes accumulating inside a container. This tutorial explains why that happens and how to fix it—using real commands and production-safe patterns.
Table of Contents
- 1. What “graceful shutdown” means for containers
- 2. The signal flow: Docker, containerd, Kubernetes
- 3. Common failure modes that create stuck or zombie containers
- 4. Diagnosing a stuck container (host and inside-container)
- 5. Fixing zombie processes: PID 1, init systems, and reaping
- 6. Fixing containers that ignore SIGTERM
- 7. Fixing containers stuck in Stopping or unkillable (D state)
- 8. Kubernetes specifics: terminationGracePeriodSeconds, preStop, and probes
- 9. Practical hardening patterns (Dockerfile, entrypoint, app code)
- 10. Incident playbook: step-by-step commands
- 11. Prevention checklist
1. What “graceful shutdown” means for containers
A container is not a VM; it’s a set of Linux processes isolated by namespaces and controlled by cgroups. Stopping a container typically means:
- Send a “please exit” signal (usually SIGTERM) to the container’s main process (PID 1 inside the container).
- Wait for a grace period.
- If it hasn’t exited, send SIGKILL (force kill).
- Tear down networking, cgroups, mounts, and release resources.
A graceful shutdown is successful when:
- The application receives the termination signal.
- It stops accepting new work.
- It finishes or cancels in-flight work within the grace period.
- It flushes buffers, closes sockets, releases locks, and exits.
- The process tree is cleaned up (no zombies), and the runtime can remove the container.
When it fails, you may observe:
- docker stop hangs or takes the full timeout.
- docker rm -f fails or hangs.
- Kubernetes pods stuck in Terminating.
- Containers that “Exited” but still have child processes (rare but possible with misconfigured runtimes or host issues).
- Zombie processes inside the container (processes in Z state).
- “Unkillable” processes in D state (uninterruptible sleep), often due to kernel/I/O issues.
2. The signal flow: Docker, containerd, Kubernetes
Docker (classic behavior)
docker stop <container>:
- Sends SIGTERM to PID 1 in the container.
- Waits --time seconds (default 10).
- Sends SIGKILL if still running.
Commands:
docker stop --time 20 myapp
docker kill --signal=SIGTERM myapp
docker kill --signal=SIGKILL myapp
containerd / runc (under the hood)
Docker and Kubernetes ultimately rely on an OCI runtime (commonly runc). The runtime sends signals to the container process and manages cgroups and namespaces. If the runtime can’t signal or can’t reap, you can see “stuck” states.
Kubernetes
Kubernetes termination sequence (simplified):
- Pod gets a deletion timestamp.
- Endpoints are updated (pod removed from Service endpoints).
- If defined, the preStop hook runs.
- Kubelet asks the runtime to stop the container:
  - Sends SIGTERM.
  - Waits terminationGracePeriodSeconds.
  - Sends SIGKILL.
If your app needs 30 seconds to drain connections, but grace is 10 seconds, you’ll see forced kills and potentially corrupted work.
3. Common failure modes that create stuck or zombie containers
A) PID 1 doesn’t forward signals
Inside a container, PID 1 has special semantics: it may ignore some signals by default, and it is responsible for reaping orphaned child processes. If PID 1 is a shell script that doesn’t exec the real app, signals may never reach the app.
Bad pattern:
#!/bin/sh
myserver & # runs in background
wait # PID 1 waits, but signal handling is often wrong here
Better pattern:
#!/bin/sh
exec myserver
B) PID 1 doesn’t reap children → zombies
If your app spawns child processes and doesn’t wait() for them, they become zombies (STAT=Z). On a regular Linux host, orphaned children are reparented to systemd (PID 1), which reaps them when they exit. In a container, your app is PID 1 and must do that reaping itself, or you need a minimal init.
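The mechanism is easy to reproduce. A minimal sketch (Python on Linux, run as any ordinary process): spawn a child, let it exit, and never wait for it; the kernel keeps the child’s process-table entry in Z state until the parent collects the exit status.

```python
import subprocess
import time

# Spawn a short-lived child and deliberately never wait() for it.
child = subprocess.Popen(["true"])
time.sleep(0.5)  # give the child time to exit

# The third field of /proc/<pid>/stat is the process state;
# "Z" marks a zombie awaiting reaping by its parent.
with open(f"/proc/{child.pid}/stat") as f:
    state = f.read().split()[2]
print(state)  # "Z" until child.wait() collects the exit status
```

Calling child.wait() afterwards removes the entry; an app that never does this accumulates one zombie per exited child.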
C) App ignores SIGTERM or blocks shutdown
Common causes:
- Not registering signal handlers (or using frameworks incorrectly).
- Long blocking I/O without cancellation.
- Deadlocks on shutdown (e.g., waiting for a goroutine/thread that waits for a lock held by shutdown path).
- Not closing listeners, so the process never exits.
D) Uninterruptible sleep (D state)
If a process is stuck in kernel space (often I/O), SIGKILL won’t kill it. This is not a “container problem”; it’s a host/kernel/storage problem. Symptoms:
- docker kill --signal=SIGKILL has no effect.
- ps shows D state.
- Often related to NFS, hung disks, FUSE, overlayfs issues, or kernel bugs.
E) Runtime / cgroup cleanup issues
Sometimes the process exits but cgroup cleanup hangs due to kernel or runtime issues. You might see containers stuck in “Removing” or “Dead”.
4. Diagnosing a stuck container (host and inside-container)
4.1 Identify the container and state
docker ps -a --no-trunc
docker inspect -f '{{.State.Status}} {{.State.Running}} {{.State.Pid}} {{.State.FinishedAt}}' myapp
If .State.Pid is non-zero, the container still has a running init process on the host.
4.2 Check what PID 1 is doing (from the host)
Get the host PID:
PID=$(docker inspect -f '{{.State.Pid}}' myapp)
echo "$PID"
Inspect process state:
ps -o pid,ppid,stat,etime,cmd -p "$PID"
cat /proc/"$PID"/status | sed -n '1,40p'
If you see State: D (disk sleep) or STAT includes D, you likely have an unkillable process.
Check open files and what it’s waiting on:
sudo ls -l /proc/"$PID"/fd | head
sudo cat /proc/"$PID"/wchan
If wchan shows something like nfs_*, fuse_*, or block I/O wait, suspect storage.
4.3 Enter the container’s namespaces without relying on docker exec
If docker exec hangs (it can if the runtime is unhealthy), use nsenter:
sudo nsenter -t "$PID" -m -u -i -n -p -- bash -lc 'ps auxf'
If the image doesn’t have bash, use sh:
sudo nsenter -t "$PID" -m -u -i -n -p -- sh -lc 'ps -eo pid,ppid,stat,cmd --forest'
4.4 Look for zombies
Inside the container namespace:
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /Z/ {print}'
Or a quick count:
ps -eo stat | grep -c Z
If zombies exist and PID 1 is not reaping, they will accumulate over time.
4.5 Check signal handling quickly
From the host, send SIGTERM and see if it exits:
docker kill --signal=SIGTERM myapp
sleep 2
docker inspect -f '{{.State.Running}}' myapp
If it stays running, either it ignores SIGTERM, is stuck, or PID 1 is not your app.
5. Fixing zombie processes: PID 1, init systems, and reaping
5.1 Why zombies happen in containers
A zombie process is a process that has exited but still has an entry in the process table because its parent hasn’t collected its exit status via wait().
In a container:
- If your application is PID 1 and spawns children, it must wait() for them.
- Many apps do not implement a proper reaper loop.
- Shell scripts used as entrypoints often mishandle child processes.
5.2 Use a minimal init: tini (recommended)
tini is a tiny init process that:
- Forwards signals to your app.
- Reaps zombie processes.
Docker run:
docker run --init myimage:latest
Docker’s --init uses tini under the hood on many installations.
Dockerfile approach (explicit):
FROM debian:stable-slim
RUN apt-get update && apt-get install -y --no-install-recommends tini ca-certificates \
&& rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini","--"]
CMD ["./myserver"]
5.3 If you must use a shell entrypoint, exec properly
Bad:
#!/bin/sh
./myserver
This keeps the shell as PID 1; signals go to the shell, not necessarily to myserver.
Good:
#!/bin/sh
exec ./myserver
Now myserver becomes PID 1 and receives signals directly.
5.4 For apps that spawn children: ensure reaping
If you’re writing the app, implement child reaping or avoid spawning unmanaged children. For example, in Go you typically don’t need to spawn OS processes for concurrency; use goroutines. If you do spawn processes, call Wait() and handle SIGCHLD.
If you can’t change the app, use tini or dumb-init.
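If changing the app is an option, the reaper can be as small as a SIGCHLD handler that drains every finished child with a non-blocking waitpid loop. A minimal Python sketch (the function name is illustrative):

```python
import os
import signal

def reap_children(signum, frame):
    # Collect exit statuses of all finished children so none
    # linger as zombies in the process table.
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet

signal.signal(signal.SIGCHLD, reap_children)
```

The WNOHANG loop matters: a single SIGCHLD can stand in for several exited children, so the handler must keep reaping until there is nothing left to collect.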
6. Fixing containers that ignore SIGTERM
6.1 Confirm what signal is sent and what the app receives
Docker sends SIGTERM by default. Some apps only handle SIGINT (Ctrl+C) in dev setups. You can test:
docker kill --signal=SIGINT myapp
If SIGINT works but SIGTERM doesn’t, fix the app to handle SIGTERM correctly.
6.2 Ensure PID 1 is the app (not a wrapper)
Check:
docker exec myapp ps -p 1 -o pid,cmd
If PID 1 is sh, bash, python entrypoint.py, or a supervisor, ensure it forwards signals and exits when the child exits.
6.3 Increase stop timeout (as a mitigation)
If the app is slow but correct:
docker stop --time 60 myapp
For Compose:
docker compose stop -t 60
This is not a “fix” if the app never exits, but it prevents premature SIGKILL for workloads that legitimately need time to drain.
6.4 Application-level shutdown patterns (what “good” looks like)
A robust server shutdown generally does:
- Stop accepting new connections (close listener).
- Signal worker pools to stop.
- Set deadlines for in-flight requests.
- Flush logs/metrics.
- Exit with a clean code.
If you run HTTP services behind a load balancer, also consider:
- Draining keep-alive connections.
- Returning 503 quickly during the shutdown window.
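For illustration, here is a sketch of that sequence using Python’s stdlib http.server; the handler class is a placeholder, and a real service would also set deadlines on in-flight requests:

```python
import signal
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the example quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
shutting_down = threading.Event()

def handle_term(signum, frame):
    shutting_down.set()
    # shutdown() makes serve_forever() return after the request it is
    # currently handling; run it off the signal handler so the handler
    # itself returns promptly.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_term)
signal.signal(signal.SIGINT, handle_term)
# In the real service: server.serve_forever(), then flush, server_close(), exit 0.
```

After serve_forever() returns, server.server_close() releases the listening socket so the process can exit with a clean code.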
7. Fixing containers stuck in Stopping or unkillable (D state)
7.1 First attempt: normal stop, then SIGKILL
docker stop --time 20 myapp
docker kill --signal=SIGKILL myapp
If docker kill returns success but the container remains running, the process may be in D state or the runtime is stuck.
7.2 Inspect host PID and process state
PID=$(docker inspect -f '{{.State.Pid}}' myapp)
ps -o pid,stat,wchan,cmd -p "$PID"
If stat includes D, you cannot kill it from userspace. Your options shift to fixing the underlying kernel wait condition.
7.3 Typical root causes of D state in production
- NFS mount hung (common with network storage hiccups).
- Block device latency/hang.
- overlayfs issues under heavy I/O.
- FUSE filesystem deadlock.
- Kernel bugs or resource exhaustion.
Check kernel logs:
dmesg -T | tail -n 200
journalctl -k --since "30 min ago"
Look for I/O errors, NFS timeouts, or hung task warnings.
7.4 If the container uses NFS or remote volumes
List mounts used by the process:
sudo cat /proc/"$PID"/mountinfo | head -n 50
sudo lsof -p "$PID" | head
If you suspect NFS, see NFS stats:
nfsstat -m 2>/dev/null || true
Mitigations:
- Fix the storage/network issue.
- Consider mounting NFS with options that fail faster (careful: this changes semantics).
- Avoid putting critical shutdown paths on remote storage (e.g., writing final state to NFS during SIGTERM).
7.5 When removal is stuck: restart runtime services (last resort)
On a Docker host (systemd-based), restarting Docker can release runtime deadlocks, but it can also disrupt running containers. Use extreme caution.
sudo systemctl status docker
sudo systemctl restart docker
On Kubernetes nodes with containerd:
sudo systemctl status containerd
sudo systemctl restart containerd
If a process is truly unkillable (D state), even restarting the runtime may not help. The process remains until the kernel wait resolves or the host reboots.
7.6 Host reboot decision
If you have confirmed:
- the PID is in D state,
- storage is hung or the kernel is wedged,
- the container blocks critical operations (e.g., node drain),
then a controlled node reboot may be the only resolution. In Kubernetes, cordon and drain first when possible:
kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --grace-period=60 --timeout=10m
If drain cannot complete due to stuck pods, you may need forced deletion (see Kubernetes section), but understand it may leave resources behind until reboot.
8. Kubernetes specifics: terminationGracePeriodSeconds, preStop, and probes
8.1 Understand the termination timeline
- SIGTERM is sent when the pod is terminating.
- The preStop hook runs before SIGTERM is sent, and it counts against the grace period.
- After the grace period, kubelet issues SIGKILL.
If your preStop sleeps 20 seconds and your grace period is 30 seconds, your app has at most ~10 seconds to shut down after preStop completes.
8.2 Configure a realistic grace period
Example:
kubectl get pod myapp -o jsonpath='{.spec.terminationGracePeriodSeconds}{"\n"}'
A typical web service might need 30–60 seconds depending on request duration and connection draining.
8.3 Use preStop to drain, not to “wait and hope”
A useful preStop might call an internal endpoint to start draining:
kubectl exec deploy/myapp -- curl -sf http://127.0.0.1:8080/drain
In a Pod spec, the hook could be:
- Execute a command that flips the app into “draining” mode.
- Then sleep briefly to allow endpoints to update.
Be careful: preStop failures can shorten your effective shutdown time.
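Put together, a sketch of a Pod spec using this pattern. The /drain endpoint, image name, and sleep length are assumptions about your app and environment; the field names (lifecycle.preStop, terminationGracePeriodSeconds) are standard Kubernetes API:

```yaml
spec:
  terminationGracePeriodSeconds: 45
  containers:
    - name: myapp
      image: myimage:latest
      lifecycle:
        preStop:
          exec:
            # Flip the app into draining mode, then pause briefly so
            # endpoint updates propagate before SIGTERM arrives.
            command: ["sh", "-c", "curl -sf http://127.0.0.1:8080/drain || true; sleep 5"]
```

Note the `|| true`: if the drain call fails, the hook still succeeds, so a flaky endpoint does not burn extra grace time on hook retries.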
8.4 Readiness probes and termination
A strong pattern:
- On SIGTERM, immediately fail readiness (or stop responding to readiness endpoint).
- This removes the pod from load balancer rotation quickly.
- Then finish in-flight work.
If readiness stays “ready” during shutdown, traffic may continue to hit the pod until it dies.
8.5 Pods stuck in Terminating
Get details:
kubectl get pod -n myns mypod -o wide
kubectl describe pod -n myns mypod
kubectl get pod -n myns mypod -o json | jq '.metadata.finalizers, .status.containerStatuses'
Common causes:
- Finalizers (e.g., PVC protection, custom controllers).
- Kubelet can’t kill container due to node/runtime issues.
- Volume unmount hangs (again often storage).
Force delete (dangerous; use when node is unhealthy and you accept cleanup later):
kubectl delete pod -n myns mypod --grace-period=0 --force
If the node is unreachable, Kubernetes will remove the API object, but the process may still run on the node until it recovers or reboots.
9. Practical hardening patterns (Dockerfile, entrypoint, app code)
9.1 Prefer exec-form ENTRYPOINT/CMD
Exec form avoids an extra shell and preserves signal delivery:
ENTRYPOINT ["./myserver"]
If you need arguments:
CMD ["--port=8080","--log-level=info"]
Avoid:
ENTRYPOINT ./myserver --port=8080
That uses a shell and can break signal handling.
9.2 Add an init for reaping
Use Docker --init in runtime config, or bake tini in the image (especially for Kubernetes where --init is not a Pod setting).
9.3 Ensure logs flush on shutdown
If you use buffered logging, flush on SIGTERM. Otherwise you’ll see truncated logs exactly when you need them most.
9.4 Avoid shutdown work that depends on fragile dependencies
Common mistake: on SIGTERM, write final state to an NFS mount or a remote DB and block indefinitely. Use timeouts and fallbacks.
9.5 Add explicit timeouts everywhere
- HTTP server shutdown timeout
- DB connection close timeout
- Queue consumer stop timeout
If the app can’t stop within the platform grace period, it will eventually be SIGKILLed.
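The same principle in miniature: stop a worker with a bounded join instead of an indefinite wait (names are illustrative):

```python
import threading
import time

stop = threading.Event()

def worker():
    while not stop.is_set():
        time.sleep(0.05)  # stand-in for a unit of real work

t = threading.Thread(target=worker, daemon=True)
t.start()

def shutdown(deadline_seconds=2.0):
    stop.set()
    t.join(timeout=deadline_seconds)  # bounded wait, never indefinite
    return not t.is_alive()  # True if the worker stopped in time
```

If shutdown() returns False, log it and exit anyway; a missed internal deadline should degrade into a forced exit you control, not a SIGKILL you don’t.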
10. Incident playbook: step-by-step commands
This section is a practical sequence you can run during an incident on a Docker host. Adjust names and be mindful of impact.
10.1 Identify the problem container
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'
docker ps -a --no-trunc | grep -E 'Stopping|Dead|Exited'
10.2 Attempt graceful stop with longer timeout
docker stop --time 60 myapp
10.3 If still running, inspect PID and state
PID=$(docker inspect -f '{{.State.Pid}}' myapp)
ps -o pid,ppid,stat,etime,wchan,cmd -p "$PID"
10.4 Check container process tree via nsenter
sudo nsenter -t "$PID" -m -u -i -n -p -- sh -lc 'ps -eo pid,ppid,stat,cmd --forest | sed -n "1,200p"'
10.5 Look for zombies
sudo nsenter -t "$PID" -m -p -- sh -lc 'ps -eo pid,ppid,stat,cmd | awk "\$3 ~ /Z/ {print}"'
If zombies are present, plan a redeploy with tini/proper PID 1 behavior.
10.6 If ignoring SIGTERM, send SIGKILL
docker kill --signal=SIGKILL myapp
10.7 If SIGKILL doesn’t work: check for D state and kernel logs
ps -o pid,stat,wchan,cmd -p "$PID"
dmesg -T | tail -n 100
journalctl -k --since "15 min ago" | tail -n 200
If D state correlates with storage/network issues, engage the storage layer. If the node is wedged, prepare for reboot.
10.8 If Docker metadata is stuck (container “Dead”)
Sometimes Docker shows a container as Dead and it can’t be removed. Try:
docker rm -f myapp
If it hangs, you may need to restart Docker after assessing impact:
sudo systemctl restart docker
On containerd-based systems:
sudo systemctl restart containerd
If the underlying process is unkillable, runtime restarts won’t fix it—only resolution of the kernel wait or reboot will.
11. Prevention checklist
Use this as a pre-production and post-incident checklist.
Container image and entrypoint
- Use exec-form ENTRYPOINT/CMD.
- Ensure PID 1 is the actual app process (or a minimal init like tini).
- Avoid shell wrappers; if needed, exec the child.
- Add tini/dumb-init to reap zombies.
Application behavior
- Handle SIGTERM (and ideally SIGINT) explicitly.
- Stop accepting new work immediately on shutdown.
- Drain connections and stop background workers with timeouts.
- Avoid indefinite waits; always use deadlines.
- Flush logs/metrics on exit.
Platform configuration
- Set realistic stop/grace timeouts (docker stop -t, Kubernetes terminationGracePeriodSeconds).
- Ensure readiness fails quickly during shutdown (drain pattern).
- Use preStop only when necessary; remember it consumes grace time.
Storage and kernel realities
- Be cautious with NFS/remote mounts in critical paths.
- Monitor for hung tasks and I/O latency.
- Have a node reboot playbook for unkillable D-state processes.
Closing notes
“Zombie containers” in production usually boil down to two root causes:
- Bad PID 1 behavior (signals not forwarded, children not reaped) — fixable by using tini, exec-form entrypoints, and correct shutdown code.
- Kernel-level unkillable waits (D state) — not fixable by signals; requires resolving the underlying I/O issue or rebooting the node.
If you want, share:
- your Dockerfile and entrypoint,
- the output of docker inspect -f '{{.State.Pid}}' and ps -o pid,stat,wchan,cmd -p <PID>,
- and whether you’re on Docker or Kubernetes,
and I can suggest a targeted remediation plan for your specific shutdown behavior.