
Handling Graceful Shutdowns: Fixing Stuck or Zombie Containers in Production

Tags: devops, containers, docker, kubernetes, graceful-shutdown, signal-handling, production-ops, troubleshooting


Production container platforms are optimized for starting and stopping workloads quickly. But “stop” is not a single action: it is a sequence of signals, timeouts, process behavior, and kernel mechanics. When that sequence breaks, you get containers that won’t die, containers that are “Exited” but still hold resources, or “zombie” processes accumulating inside a container. This tutorial explains why that happens and how to fix it—using real commands and production-safe patterns.


Table of Contents

  1. What “graceful shutdown” means for containers
  2. The signal flow: Docker, containerd, Kubernetes
  3. Common failure modes that create stuck or zombie containers
  4. Diagnosing a stuck container (host and inside-container)
  5. Fixing zombie processes: PID 1, init systems, and reaping
  6. Fixing containers that ignore SIGTERM
  7. Fixing containers stuck in Stopping or unkillable (D state)
  8. Kubernetes specifics: terminationGracePeriodSeconds, preStop, and probes
  9. Practical hardening patterns (Dockerfile, entrypoint, app code)
  10. Incident playbook: step-by-step commands
  11. Prevention checklist


1. What “graceful shutdown” means for containers

A container is not a VM; it’s a set of Linux processes isolated by namespaces and controlled by cgroups. Stopping a container typically means:

  1. Send a “please exit” signal (usually SIGTERM) to the container’s main process (PID 1 inside the container).
  2. Wait for a grace period.
  3. If it hasn’t exited, send SIGKILL (force kill).
  4. Tear down networking, cgroups, mounts, and release resources.
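
The first three steps can be sketched in a few lines of shell. This is a simplified illustration of what a runtime does, not Docker’s actual implementation; `graceful_stop` is a hypothetical helper name.

```shell
# Sketch of the TERM -> wait -> KILL sequence a runtime performs.
# graceful_stop is a hypothetical helper, not a Docker command.
graceful_stop() {
  pid="$1"; grace="$2"
  kill -TERM "$pid" 2>/dev/null || return 0   # 1) polite request
  i=0
  while kill -0 "$pid" 2>/dev/null; do        # 2) wait for the grace period
    if [ "$i" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null           # 3) force kill
      break
    fi
    sleep 1; i=$((i + 1))
  done
}
```

Everything that follows in this tutorial is about what happens when one of these steps silently fails.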

A graceful shutdown is successful when:

  • The stop signal promptly reaches the real application process, not just a wrapper.
  • In-flight work is drained or checkpointed within the grace period.
  • The process exits on its own, before SIGKILL is needed.
  • The runtime fully tears down networking, mounts, and cgroups.

When it fails, you may observe:

  • Containers stuck in “Stopping”, or pods stuck in “Terminating”.
  • Containers shown as “Exited” or “Dead” that still hold ports, mounts, or cgroups.
  • Zombie (STAT=Z) processes accumulating inside the container.
  • Truncated logs and lost or corrupted in-flight work.


2. The signal flow: Docker, containerd, Kubernetes

Docker (classic behavior)

Commands:

docker stop --time 20 myapp
docker kill --signal=SIGTERM myapp
docker kill --signal=SIGKILL myapp

containerd / runc (under the hood)

Docker and Kubernetes ultimately rely on an OCI runtime (commonly runc). The runtime sends signals to the container process and manages cgroups and namespaces. If the runtime can’t signal or can’t reap, you can see “stuck” states.

Kubernetes

Kubernetes termination sequence (simplified):

  1. Pod gets a deletion timestamp.
  2. Endpoints are updated (pod removed from Service endpoints).
  3. If defined, preStop hook runs.
  4. Kubelet asks runtime to stop the container:
    • Sends SIGTERM.
    • Waits terminationGracePeriodSeconds.
    • Sends SIGKILL.

If your app needs 30 seconds to drain connections, but grace is 10 seconds, you’ll see forced kills and potentially corrupted work.


3. Common failure modes that create stuck or zombie containers

A) PID 1 doesn’t forward signals

Inside a container, PID 1 has special semantics: it may ignore some signals by default, and it is responsible for reaping orphaned child processes. If PID 1 is a shell script that doesn’t exec the real app, signals may never reach the app.

Bad pattern:

#!/bin/sh
myserver &   # runs in background
wait         # PID 1 waits, but signal handling is often wrong here

Better pattern:

#!/bin/sh
exec myserver

B) PID 1 doesn’t reap children → zombies

If your app spawns child processes and doesn’t wait() for them, they become zombies (STAT=Z). In a normal Linux system, systemd (PID 1) reaps them. In containers, your app is PID 1 and must reap or you need a minimal init.

C) App ignores SIGTERM or blocks shutdown

Common causes:

  • No SIGTERM handler registered (or only SIGINT is handled, as in many dev setups).
  • Shutdown blocked on in-flight work with no deadline.
  • Cleanup that waits on a remote dependency (database, NFS, message broker) without a timeout.
  • The signal is delivered to a wrapper (shell, supervisor) that never forwards it.

D) Uninterruptible sleep (D state)

If a process is stuck in kernel space (often I/O), SIGKILL won’t kill it. This is not a “container problem”; it’s a host/kernel/storage problem. Symptoms:

  • docker kill reports success, but the process survives.
  • ps shows STAT=D (uninterruptible sleep).
  • dmesg/journalctl shows I/O errors or “hung task” warnings.

E) Runtime / cgroup cleanup issues

Sometimes the process exits but cgroup cleanup hangs due to kernel or runtime issues. You might see containers stuck in “Removing” or “Dead”.


4. Diagnosing a stuck container (host and inside-container)

4.1 Identify the container and state

docker ps -a --no-trunc
docker inspect -f '{{.State.Status}} {{.State.Running}} {{.State.Pid}} {{.State.FinishedAt}}' myapp

If .State.Pid is non-zero, the container still has a running init process on the host.

4.2 Check what PID 1 is doing (from the host)

Get the host PID:

PID=$(docker inspect -f '{{.State.Pid}}' myapp)
echo "$PID"

Inspect process state:

ps -o pid,ppid,stat,etime,cmd -p "$PID"
sed -n '1,40p' /proc/"$PID"/status

If you see State: D (disk sleep) or STAT includes D, you likely have an unkillable process.

Check open files and what it’s waiting on:

sudo ls -l /proc/"$PID"/fd | head
sudo cat /proc/"$PID"/wchan

If wchan shows something like nfs_*, fuse_*, or block I/O wait, suspect storage.

4.3 Enter the container’s namespaces without relying on docker exec

If docker exec hangs (it can if the runtime is unhealthy), use nsenter:

sudo nsenter -t "$PID" -m -u -i -n -p -- bash -lc 'ps auxf'

If the image doesn’t have bash, use sh:

sudo nsenter -t "$PID" -m -u -i -n -p -- sh -lc 'ps -eo pid,ppid,stat,cmd --forest'

4.4 Look for zombies

Inside the container namespace:

ps -eo pid,ppid,stat,cmd | awk '$3 ~ /Z/ {print}'

Or a quick count:

ps -eo stat | grep -c Z

If zombies exist and PID 1 is not reaping, they will accumulate over time.

4.5 Check signal handling quickly

From the host, send SIGTERM and see if it exits:

docker kill --signal=SIGTERM myapp
sleep 2
docker inspect -f '{{.State.Running}}' myapp

If it stays running, either it ignores SIGTERM, is stuck, or PID 1 is not your app.


5. Fixing zombie processes: PID 1, init systems, and reaping

5.1 Why zombies happen in containers

A zombie process is a process that has exited but still has an entry in the process table because its parent hasn’t collected its exit status via wait().
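
You can reproduce one in a couple of lines of shell. The inner shell forks a short-lived child, then execs into `sleep`, which never calls wait(), so the exited child stays a zombie until the parent goes away:

```shell
# Create a demonstrable zombie: the inner shell forks a short-lived
# child, then execs into `sleep`, which never reaps it.
sh -c 'sleep 1 & exec sleep 5' &
parent=$!
sleep 3                               # child has exited; parent (now sleep) won't reap it
ps -o pid,stat,comm --ppid "$parent"  # the child shows STAT=Z
wait "$parent"                        # cleanup: reaping the parent clears the zombie
```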

In a container:

  • Your app is often PID 1, so every orphaned process in the container gets reparented to it.
  • If PID 1 never calls wait(), exited children stay in the process table as zombies.

5.2 Use a minimal init (tini or dumb-init)

tini is a tiny init process that:

  • Runs as PID 1 and starts your app as its only child.
  • Forwards every signal it receives to that child.
  • Reaps zombies by wait()ing on orphaned processes.

Docker run:

docker run --init myimage:latest

Docker’s --init uses tini under the hood on many installations.

Dockerfile approach (explicit):

FROM debian:stable-slim

RUN apt-get update && apt-get install -y --no-install-recommends tini ca-certificates \
  && rm -rf /var/lib/apt/lists/*

ENTRYPOINT ["/usr/bin/tini","--"]
CMD ["./myserver"]

5.3 If you must use a shell entrypoint, exec properly

Bad:

#!/bin/sh
./myserver

This keeps the shell as PID 1; signals go to the shell, not necessarily to myserver.

Good:

#!/bin/sh
exec ./myserver

Now myserver becomes PID 1 and receives signals directly.
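
If you genuinely cannot exec (for example, the wrapper must run cleanup after the child exits), forward signals yourself. A minimal sketch, where the command to run is passed as arguments:

```shell
# run_with_forwarding CMD ARGS...
# Runs CMD as a child, forwards SIGTERM/SIGINT to it, and waits
# until it has really exited. A sketch, not a replacement for tini.
run_with_forwarding() {
  "$@" &
  child=$!
  trap 'kill -TERM "$child" 2>/dev/null' TERM INT
  wait "$child"                   # returns early if a trapped signal fires
  while kill -0 "$child" 2>/dev/null; do
    wait "$child"                 # keep waiting until the child is really gone
  done
  trap - TERM INT
}
```

After the child exits, the wrapper can run its cleanup and then exit itself, so the container still stops cleanly.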

5.4 For apps that spawn children: ensure reaping

If you’re writing the app, implement child reaping or avoid spawning unmanaged children. For example, in Go you typically don’t need to spawn OS processes for concurrency; use goroutines. If you do spawn processes, always call Wait() on them so they are reaped.

If you can’t change the app, use tini or dumb-init.


6. Fixing containers that ignore SIGTERM

6.1 Confirm what signal is sent and what the app receives

Docker sends SIGTERM by default. Some apps only handle SIGINT (Ctrl+C) in dev setups. You can test:

docker kill --signal=SIGINT myapp

If SIGINT works but SIGTERM doesn’t, fix the app to handle SIGTERM correctly.

6.2 Ensure PID 1 is the app (not a wrapper)

Check:

docker exec myapp ps -p 1 -o pid,cmd

If PID 1 is sh, bash, python entrypoint.py, or a supervisor, ensure it forwards signals and exits when the child exits.

6.3 Increase stop timeout (as a mitigation)

If the app is slow but correct:

docker stop --time 60 myapp

For Compose:

docker compose stop -t 60

This is not a “fix” if the app never exits, but it prevents premature SIGKILL for workloads that legitimately need time to drain.

6.4 Application-level shutdown patterns (what “good” looks like)

A robust server shutdown generally does:

  • Stop accepting new connections or work items.
  • Finish or checkpoint in-flight work, bounded by a deadline.
  • Flush logs, metrics, and buffered state.
  • Exit with code 0 before the platform’s grace period expires.

If you run HTTP services behind a load balancer, also consider:

  • Failing the readiness/health check first, so the balancer stops routing new traffic.
  • Waiting a few seconds for endpoint propagation before closing listeners.
  • Closing remaining keep-alive connections explicitly (Connection: close, or GOAWAY for HTTP/2).

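For a shell-based worker loop, a drain-on-SIGTERM sketch looks like this; `process_one` is a placeholder for real work, and the current item is always allowed to finish:

```shell
# Graceful drain for a worker loop: stop taking new work on SIGTERM,
# let the in-flight item finish, then exit cleanly.
worker_loop() {
  shutting_down=0
  trap 'shutting_down=1' TERM INT
  while [ "$shutting_down" -eq 0 ]; do
    process_one                  # current item always runs to completion
  done
  trap - TERM INT
  echo "drained, exiting"
}
```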

7. Fixing containers stuck in Stopping or unkillable (D state)

7.1 First attempt: normal stop, then SIGKILL

docker stop --time 20 myapp
docker kill --signal=SIGKILL myapp

If docker kill returns success but the container remains running, the process may be in D state or the runtime is stuck.

7.2 Inspect host PID and process state

PID=$(docker inspect -f '{{.State.Pid}}' myapp)
ps -o pid,stat,wchan,cmd -p "$PID"

If stat includes D, you cannot kill it from userspace. Your options shift to fixing the underlying kernel wait condition.

7.3 Typical root causes of D state in production

  • Hung NFS or other network filesystem mounts (server unreachable).
  • Failing or overloaded block devices (I/O errors, saturated queues).
  • FUSE filesystems whose userspace daemon has died.
  • Kernel bugs, usually surfacing as “hung task” warnings.

Check kernel logs:

dmesg -T | tail -n 200
journalctl -k --since "30 min ago"

Look for I/O errors, NFS timeouts, or hung task warnings.

7.4 If the container uses NFS or remote volumes

List mounts used by the process:

sudo cat /proc/"$PID"/mountinfo | head -n 50
sudo lsof -p "$PID" | head

If you suspect NFS, see NFS stats:

nfsstat -m 2>/dev/null || true

Mitigations:

  • Restore connectivity to the NFS server or remote storage first; D state usually resolves once the pending I/O completes.
  • Avoid shutdown-critical writes to remote mounts; stage locally and sync asynchronously.
  • Review mount options (hard vs. soft, timeo, retrans) with your storage team; soft mounts trade hangs for possible data corruption.

7.5 When removal is stuck: restart runtime services (last resort)

On a Docker host (systemd-based), restarting Docker can release runtime deadlocks, but it can also disrupt running containers. Use extreme caution.

sudo systemctl status docker
sudo systemctl restart docker

On Kubernetes nodes with containerd:

sudo systemctl status containerd
sudo systemctl restart containerd

If a process is truly unkillable (D state), even restarting the runtime may not help. The process remains until the kernel wait resolves or the host reboots.

7.6 Host reboot decision

If you have confirmed:

  • the process is in D state and SIGKILL has no effect,
  • kernel logs show hung tasks or unrecoverable I/O errors, and
  • the underlying storage or network condition cannot be cleared,

then a controlled node reboot may be the only resolution. In Kubernetes, cordon and drain first when possible:

kubectl cordon <node>
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --grace-period=60 --timeout=10m

If drain cannot complete due to stuck pods, you may need forced deletion (see Kubernetes section), but understand it may leave resources behind until reboot.


8. Kubernetes specifics: terminationGracePeriodSeconds, preStop, and probes

8.1 Understand the termination timeline

The grace-period countdown starts when the pod is marked for deletion and includes the preStop hook. So if your preStop sleeps 20 seconds and your grace period is 30 seconds, your app has at most ~10 seconds to shut down after preStop completes.

8.2 Configure a realistic grace period

Example:

kubectl get pod myapp -o jsonpath='{.spec.terminationGracePeriodSeconds}{"\n"}'

A typical web service might need 30–60 seconds depending on request duration and connection draining.
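
In a Deployment, the value lives in the pod template. A minimal sketch (names are placeholders):

```yaml
# Pod template fragment: give the app 60s to drain before SIGKILL.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: myapp
          image: myimage:latest
```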

8.3 Use preStop to drain, not to “wait and hope”

A useful preStop might call an internal endpoint that starts draining. You can exercise such an endpoint manually first:

kubectl exec deploy/myapp -- curl -sf http://127.0.0.1:8080/drain

In a Pod spec, the hook could be:
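
A sketch, assuming the same hypothetical /drain endpoint:

```yaml
# preStop runs before SIGTERM, and its duration counts against the
# grace period. The /drain endpoint is an application-specific assumption.
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "curl -sf http://127.0.0.1:8080/drain || true"]
```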

Be careful: preStop failures can shorten your effective shutdown time.

8.4 Readiness probes and termination

A strong pattern:

  • On SIGTERM (or in preStop), start failing the readiness probe immediately.
  • Keep the liveness probe passing, so Kubernetes doesn’t restart the pod mid-drain.
  • Keep serving in-flight and briefly arriving requests while the endpoint change propagates.

If readiness stays “ready” during shutdown, traffic may continue to hit the pod until it dies.
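
A readiness probe sketch that supports this (path and port are placeholders; the app is assumed to fail /healthz once it starts draining):

```yaml
# One failed check flips the pod to NotReady and removes it
# from Service endpoints quickly.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 2
  failureThreshold: 1
```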

8.5 Pods stuck in Terminating

Get details:

kubectl get pod -n myns mypod -o wide
kubectl describe pod -n myns mypod
kubectl get pod -n myns mypod -o json | jq '.metadata.finalizers, .status.containerStatuses'

Common causes:

  • Finalizers on the pod that never get removed.
  • Node unreachable or kubelet down, so termination can’t be confirmed.
  • Container process unkillable (D state) on the node.
  • Container runtime hung or deadlocked.

Force delete (dangerous; use when node is unhealthy and you accept cleanup later):

kubectl delete pod -n myns mypod --grace-period=0 --force

If the node is unreachable, Kubernetes will remove the API object, but the process may still run on the node until it recovers or reboots.


9. Practical hardening patterns (Dockerfile, entrypoint, app code)

9.1 Prefer exec-form ENTRYPOINT/CMD

Exec form avoids an extra shell and preserves signal delivery:

ENTRYPOINT ["./myserver"]

If you need arguments:

CMD ["--port=8080","--log-level=info"]

Avoid:

ENTRYPOINT ./myserver --port=8080

That uses a shell and can break signal handling.

9.2 Add an init for reaping

Use Docker --init in runtime config, or bake tini in the image (especially for Kubernetes where --init is not a Pod setting).

9.3 Ensure logs flush on shutdown

If you use buffered logging, flush on SIGTERM. Otherwise you’ll see truncated logs exactly when you need them most.

9.4 Avoid shutdown work that depends on fragile dependencies

Common mistake: on SIGTERM, write final state to an NFS mount or a remote DB and block indefinitely. Use timeouts and fallbacks.

9.5 Add explicit timeouts everywhere

If the app can’t stop within the platform grace period, it will eventually be SIGKILLed.
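
In shell entrypoints, coreutils `timeout` is one concrete way to bound a blocking step; here `sleep 30` stands in for a flush to a slow remote dependency:

```shell
# Cap a piece of shutdown work so a hung dependency can't consume
# the whole grace period. `sleep 30` simulates a blocking remote flush.
if timeout 2 sleep 30; then
  echo "state flushed"
else
  echo "flush timed out, continuing shutdown anyway" >&2
fi
```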


10. Incident playbook: step-by-step commands

This section is a practical sequence you can run during an incident on a Docker host. Adjust names and be mindful of impact.

10.1 Identify the problem container

docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Image}}'
docker ps -a --no-trunc | grep -E 'Stopping|Dead|Exited'

10.2 Attempt graceful stop with longer timeout

docker stop --time 60 myapp

10.3 If still running, inspect PID and state

PID=$(docker inspect -f '{{.State.Pid}}' myapp)
ps -o pid,ppid,stat,etime,wchan,cmd -p "$PID"

10.4 Check container process tree via nsenter

sudo nsenter -t "$PID" -m -u -i -n -p -- sh -lc 'ps -eo pid,ppid,stat,cmd --forest | sed -n "1,200p"'

10.5 Look for zombies

sudo nsenter -t "$PID" -p -- sh -lc 'ps -eo pid,ppid,stat,cmd | awk "\$3 ~ /Z/ {print}"'

If zombies are present, plan a redeploy with tini/proper PID 1 behavior.

10.6 If ignoring SIGTERM, send SIGKILL

docker kill --signal=SIGKILL myapp

10.7 If SIGKILL doesn’t work: check for D state and kernel logs

ps -o pid,stat,wchan,cmd -p "$PID"
dmesg -T | tail -n 100
journalctl -k --since "15 min ago" | tail -n 200

If D state correlates with storage/network issues, engage the storage layer. If the node is wedged, prepare for reboot.

10.8 If Docker metadata is stuck (container “Dead”)

Sometimes Docker shows a container as Dead and it can’t be removed. Try:

docker rm -f myapp

If it hangs, you may need to restart Docker after assessing impact:

sudo systemctl restart docker

On containerd-based systems:

sudo systemctl restart containerd

If the underlying process is unkillable, runtime restarts won’t fix it—only resolution of the kernel wait or reboot will.


11. Prevention checklist

Use this as a pre-production and post-incident checklist.

Container image and entrypoint

  • Exec-form ENTRYPOINT/CMD; no bare shell wrappers.
  • tini (or docker --init) whenever the app spawns child processes.
  • Any shell entrypoint execs the real process or forwards signals explicitly.

Application behavior

  • Handles SIGTERM; drains work with deadlines; exits 0.
  • Flushes logs and buffered state before exiting.
  • Reaps any child processes it spawns.

Platform configuration

  • docker stop timeout / terminationGracePeriodSeconds sized to real drain time.
  • preStop hooks that actively drain, with their duration budgeted inside the grace period.
  • Readiness probes that fail during shutdown.

Storage and kernel realities

  • No shutdown-critical writes to remote mounts without timeouts.
  • NFS/remote volume mount options reviewed for failure behavior.
  • Monitoring for D-state processes and kernel hung-task warnings.


Closing notes

“Zombie containers” in production usually boil down to two root causes:

  1. Bad PID 1 behavior (signals not forwarded, children not reaped) — fixable by using tini, exec-form entrypoints, and correct shutdown code.
  2. Kernel-level unkillable waits (D state) — not fixable by signals; requires resolving underlying I/O issues or rebooting the node.

Work out which of the two you’re facing before reaching for a fix: signal-level problems are solved in the image and the application, while kernel-level waits are solved in the infrastructure—or, ultimately, by a reboot.