Debugging Failing Docker Builds: How to Read and Fix Dockerfile Errors
Docker builds fail for two broad reasons:
- The Dockerfile instructions don’t do what you think (syntax, shell behavior, build context, cache, permissions).
- The environment inside the build container differs from your local machine (different OS, missing packages, network/DNS issues, architecture mismatches).
This tutorial teaches a practical, repeatable workflow to read Docker build logs, isolate the failing layer, reproduce the failure interactively, and fix the underlying cause. It includes real commands and concrete examples you can copy/paste.
Table of Contents
- Prerequisites
- How Docker Builds Actually Work (and Why Errors Look Weird)
- Reading Build Output: Finding the Real Error
- Turn On Better Logs: BuildKit, Plain Progress, and Debug Output
- A Systematic Debugging Checklist
- Common Dockerfile Error Categories (with Fixes)
- 1) Dockerfile parse errors
- 2) COPY / build context problems
- 3) Shell form vs exec form confusion
- 4) Package manager failures (apt/apk/yum)
- 5) “Command not found” and PATH issues
- 6) Permission issues and non-root users
- 7) Network/DNS/TLS problems during build
- 8) Architecture mismatches (amd64 vs arm64)
- 9) Cache-related surprises
- 10) Multi-stage build mistakes
- Reproducing a Failing Layer Interactively
- Using --target and “debug stages”
- Linting and Static Checks (Before You Build)
- Hardening Your Dockerfile to Prevent Future Failures
- A Worked Example: From Failure to Fix
- Quick Reference: Useful Commands
Prerequisites
You should have:
- Docker installed (docker version works)
- Basic familiarity with Dockerfiles
- A project directory with a Dockerfile
Verify Docker is working:
docker version
docker info
How Docker Builds Actually Work (and Why Errors Look Weird)
A Docker build is a sequence of layers created from Dockerfile instructions. Instructions like RUN, COPY, and ADD each create a new filesystem layer, while metadata-only instructions like ENV, WORKDIR, and USER only record configuration.
Key consequences:
- Errors point to a specific step, but the root cause might be earlier (missing file copied, wrong working directory, PATH changed, etc.).
- The build context matters: Docker only sees files in the directory you pass to docker build (and not those excluded by .dockerignore).
- Caching can hide changes: a step may not rerun if Docker thinks it can reuse a cached layer.
- The shell matters: RUN in shell form uses /bin/sh -c by default on Linux images; subtle differences from Bash can break scripts.
When a build fails, always ask:
- Which step failed?
- What exact command was executed?
- What filesystem state existed at that step (what files were copied, what user, what working directory)?
- Was the step cached or actually executed?
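A single throwaway debug instruction, dropped just before the failing step, answers most of these questions at once (remove it again once the build is fixed):

```dockerfile
# Temporary debug line: prints the current user and working directory,
# then lists the files visible at this point in the build.
RUN set -eu; echo "user=$(id -un) pwd=$(pwd)"; ls -la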
Reading Build Output: Finding the Real Error
A typical failure looks like:
Step 6/12 : RUN npm ci
---> Running in 3b2b1c...
npm ERR! code EAI_AGAIN
npm ERR! syscall getaddrinfo
npm ERR! errno EAI_AGAIN
npm ERR! request to https://registry.npmjs.org/... failed
The command '/bin/sh -c npm ci' returned a non-zero code: 1
What matters:
- Step number: Step 6/12
- Instruction: RUN npm ci
- Exact executed command: '/bin/sh -c npm ci'
- Exit code: non-zero means failure
- The real error lines: here it’s DNS/network (EAI_AGAIN)
Sometimes the actual error is above the final line. Scroll up. If you’re using a UI progress mode, switch to plain logs (next section).
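Once you have plain logs in a file, a quick grep pulls the important lines out of the noise. A minimal sketch, assuming the build output was captured with 2>&1 | tee build.log (the sample log below is a stand-in for a real one):

```shell
# Stand-in for a real captured build log.
cat > build.log <<'EOF'
Step 6/12 : RUN npm ci
npm ERR! code EAI_AGAIN
npm ERR! syscall getaddrinfo
The command '/bin/sh -c npm ci' returned a non-zero code: 1
EOF

# Show only the lines that carry the actual error, with line numbers
# so you can jump back into the full log for context.
grep -nE 'ERR!|returned a non-zero' build.log
```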
Turn On Better Logs: BuildKit, Plain Progress, and Debug Output
Use BuildKit (recommended)
BuildKit is the modern builder. It provides better caching, parallelism, and more readable output.
Check whether BuildKit is enabled:
docker buildx version
Use buildx explicitly:
docker buildx build -t myimage:debug .
Force plain progress output (easier to read)
docker buildx build --progress=plain -t myimage:debug .
Or with classic docker build:
DOCKER_BUILDKIT=1 docker build --progress=plain -t myimage:debug .
Disable cache to ensure steps rerun
docker buildx build --no-cache --progress=plain -t myimage:nocache .
Increase verbosity inside RUN
For shell scripts, add flags:
- For bash: set -euxo pipefail
- For POSIX sh: set -eu (pipefail is not available in many /bin/sh implementations)
Example:
RUN set -eu; \
echo "PWD=$(pwd)"; \
ls -la; \
some-command --version
A Systematic Debugging Checklist
When a build fails, follow this order:
- Re-run with plain logs: DOCKER_BUILDKIT=1 docker build --progress=plain -t test .
- Identify the failing step and instruction (RUN, COPY, etc.).
- Confirm the build context (are the needed files actually sent to Docker?).
- Check .dockerignore for accidentally excluded files.
- Reproduce the failing command in an interactive container (same base image).
- Fix the Dockerfile:
- correct paths and working directory
- install missing dependencies
- handle permissions
- pin versions or add retries for flaky networks
- Rebuild with --no-cache to confirm the fix is real.
- Re-enable cache and ensure the Dockerfile is structured for stable caching.
Common Dockerfile Error Categories (with Fixes)
1) Dockerfile parse errors
Symptoms
- Errors like:
  - failed to solve: dockerfile parse error line X: unknown instruction
  - unknown flag: --from
  - invalid reference format
Typical causes
- Misspelled Dockerfile instructions (RNU instead of RUN). Wrong capitalization is usually okay, but typos are not.
- Using COPY --from= in a Docker version that doesn’t support it (rare now).
- Quoting mistakes.
Example
RUN echo "hello
This fails because the quote is not closed.
Fix
Close quotes, or use a heredoc (BuildKit supports heredocs in RUN):
# syntax=docker/dockerfile:1.6
RUN <<'SH'
set -eu
echo "hello"
SH
If you see parse errors, fix them first—no other debugging matters until the Dockerfile parses.
2) COPY / build context problems
Symptoms
- COPY failed: file not found in build context or excluded by .dockerignore
- failed to compute cache key: ... not found
Root cause
Docker can only access files inside the build context directory passed to docker build. If you run:
docker build -t app -f docker/Dockerfile .
The context is . (current directory). But if you run:
docker build -t app -f docker/Dockerfile docker
The context is docker/, and files outside docker/ are invisible.
Debug steps
- Print your build command and confirm the final argument (the context):
docker build ... <context>
- Inspect .dockerignore: cat .dockerignore
Example failure
COPY package.json package-lock.json ./
But .dockerignore contains:
package-lock.json
Fix
Remove the ignore rule, or copy files differently.
Pro tip: show what was sent as context
BuildKit doesn’t directly print the full context, but you can sanity-check by temporarily adding:
RUN ls -la
after a COPY to confirm the files exist.
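Another way to inspect the context without touching your real Dockerfile is a throwaway build fed from stdin; it copies the whole context into the image and lists it (a sketch, assuming BuildKit; busybox is just a small scratch base):

```shell
# Throwaway Dockerfile on stdin; the final argument (.) is still
# the context being inspected, and your real Dockerfile is untouched.
docker build --progress=plain -f - . <<'EOF'
FROM busybox
COPY . /ctx
RUN find /ctx -maxdepth 2
EOF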
3) Shell form vs exec form confusion
Dockerfile instructions like RUN, CMD, and ENTRYPOINT have two forms:
- Shell form: RUN echo $HOME
  - executed as /bin/sh -c "echo $HOME"
  - environment variable expansion happens
  - shell features like &&, pipes work
- Exec form: RUN ["echo", "$HOME"]
  - no shell; $HOME is passed literally (no expansion)
  - safer for JSON-array commands and signal handling
Common error
RUN ["cd", "/app"]
This fails because cd is a shell builtin, not an executable.
Fix
Use shell form:
WORKDIR /app
# or:
RUN cd /app && make build
Another common error
CMD ["npm start"]
This tries to execute a binary literally called npm start.
Fix
Use:
CMD ["npm", "start"]
# or shell form if you need shell features:
CMD npm start
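You can see the expansion difference without Docker at all: invoking a binary directly mimics exec form, while sh -c mimics shell form.

```shell
# Exec form analogue: /bin/echo receives the literal bytes '$HOME'
# as its argument, because no shell ever sees the string.
/bin/echo '$HOME'

# Shell form analogue: a shell expands the variable before echo runs.
sh -c 'echo $HOME'
```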
4) Package manager failures (apt/apk/yum)
Debian/Ubuntu (apt-get) pitfalls
Symptoms
- E: Unable to locate package ...
- Temporary failure resolving ...
- Hash Sum mismatch
- The following signatures couldn't be verified
Best practice pattern
RUN set -eu; \
apt-get update; \
apt-get install -y --no-install-recommends \
ca-certificates curl; \
rm -rf /var/lib/apt/lists/*
Why this pattern matters
- apt-get update must run before apt-get install in the same RUN layer, otherwise cached layers can break installs.
- Cleaning /var/lib/apt/lists/* reduces image size and avoids stale index issues.
Common mistake
RUN apt-get update
RUN apt-get install -y curl
If the apt-get update layer is cached but the package index it produced is stale, the install can fail (often with 404 errors for package versions that no longer exist on the mirrors).
Alpine (apk) pitfalls
Best practice
RUN set -eu; \
apk add --no-cache curl ca-certificates
Common error
- ERROR: unable to select packages: ...
  - package names differ between distros (e.g., libssl1.1 vs openssl)
RHEL/CentOS/Fedora (yum/dnf)
RUN set -eu; \
dnf -y install curl ca-certificates; \
dnf clean all
5) “Command not found” and PATH issues
Symptoms
- /bin/sh: 1: some-tool: not found
- exec: "python": executable file not found in $PATH
Root causes
- The tool isn’t installed
- It’s installed but not in PATH
- You’re using a minimal base image (e.g., alpine, scratch, distroless)
- You installed it in a previous stage but didn’t copy it into the final stage
Debug
Add:
RUN set -eu; which python || true; echo "$PATH"; ls -la /usr/bin | head
Fix
Install the tool in the stage where it’s used:
RUN apt-get update && apt-get install -y --no-install-recommends python3
Or call it by full path if needed:
RUN /usr/bin/python3 --version
6) Permission issues and non-root users
Symptoms
- Permission denied
- EACCES: permission denied
- Cannot write to directories like /root, /usr/local, or app directories
Root causes
- You switched to a non-root user with USER app
- You COPY files owned by root into a directory a non-root user can’t write to
- You’re trying to install packages as non-root
Best practice pattern
- Create the user
- Create directories and chown them as needed
- Switch with USER
Example:
RUN set -eu; \
useradd -m -u 10001 appuser; \
mkdir -p /app; \
chown -R appuser:appuser /app
WORKDIR /app
COPY --chown=appuser:appuser . /app
USER appuser
Debug permissions
RUN set -eu; id; ls -ld /app; ls -la /app | head
7) Network/DNS/TLS problems during build
Symptoms
- Temporary failure resolving
- Could not resolve host
- x509: certificate signed by unknown authority
- TLS handshake timeout
Root causes
- Corporate proxy or MITM TLS interception
- Missing CA certificates in minimal images
- Docker daemon DNS configuration issues
- Flaky network during build
Fix: install CA certificates
Debian/Ubuntu:
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates
Alpine:
RUN apk add --no-cache ca-certificates
Fix: pass proxy build args
docker build \
--build-arg http_proxy=http://proxy.example:3128 \
--build-arg https_proxy=http://proxy.example:3128 \
--build-arg no_proxy=localhost,127.0.0.1 \
-t app .
In Dockerfile (optional):
ARG http_proxy
ARG https_proxy
ARG no_proxy
Debug DNS inside build (interactive approach)
Use an interactive container from the same base image:
docker run --rm -it debian:bookworm-slim sh
cat /etc/resolv.conf
getent hosts registry.npmjs.org || true
apt-get update
8) Architecture mismatches (amd64 vs arm64)
Symptoms
- exec format error
- Installing prebuilt binaries fails
- Running downloaded binaries fails during build
This often happens on Apple Silicon (arm64) building images intended for amd64, or when a Dockerfile downloads an amd64-only binary.
Check your build platform
docker version --format '{{.Server.Os}}/{{.Server.Arch}}'
docker buildx ls
Build for a specific platform
docker buildx build --platform linux/amd64 -t myimage:amd64 .
Fix downloads by selecting arch
In RUN steps, detect architecture:
RUN set -eu; \
arch="$(uname -m)"; \
echo "arch=$arch"; \
case "$arch" in \
x86_64) url="https://example.com/tool-linux-amd64";; \
aarch64) url="https://example.com/tool-linux-arm64";; \
*) echo "unsupported arch: $arch" >&2; exit 1;; \
esac; \
curl -fsSL "$url" -o /usr/local/bin/tool; \
chmod +x /usr/local/bin/tool
9) Cache-related surprises
Symptoms
- You fixed something but the build still fails the same way
- A step doesn’t re-run when you expect it to
- COPY changes don’t trigger rebuilds due to layer ordering
Tools
- Disable cache: docker buildx build --no-cache --progress=plain -t app:nocache .
- Inspect history: docker history app:latest
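One more tool worth knowing: a cache-busting build argument forces re-execution from a chosen point while leaving earlier layers cached. A sketch (CACHE_BUST is an arbitrary name chosen here, not a Docker feature):

```dockerfile
# Layers above this ARG can still come from cache; the RUN below (and
# everything after it) re-runs whenever the argument's value changes.
ARG CACHE_BUST=unset
RUN echo "cache bust: $CACHE_BUST" && apt-get update
```

Trigger it with: docker build --build-arg CACHE_BUST=$(date +%s) -t app .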
Best practice: order layers for stable caching
For Node.js:
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm test
This ensures dependency install is cached unless lockfiles change.
10) Multi-stage build mistakes
Symptoms
- COPY --from=builder ... fails: file not found
- Build succeeds but the runtime container fails because dependencies weren’t copied
Example mistake
FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build
FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
If your build output actually lands in /app/build rather than /app/dist, the COPY fails, or the build succeeds and the container fails at runtime.
Debug
Add in builder stage:
RUN set -eu; ls -la /app; ls -la /app/dist || true; ls -la /app/build || true
Fix
Copy the correct path, and also copy runtime dependencies if needed:
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
Or rebuild dependencies in the final stage (often smaller/cleaner for production).
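A sketch of that alternative final stage, reusing the builder stage from the example above: reinstall only production dependencies instead of copying node_modules across (npm ci --omit=dev skips devDependencies; it requires npm 8+):

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
# Install only what the app needs at runtime; devDependencies
# stay behind in the builder stage.
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]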
Reproducing a Failing Layer Interactively
One of the fastest ways to debug is to run an interactive shell in the same base image and execute the failing commands manually.
Step 1: Identify the base image at the failing step
If the failure happens after FROM python:3.12-slim, use that.
Step 2: Start a container
docker run --rm -it python:3.12-slim bash
If the image doesn’t have bash, use sh:
docker run --rm -it alpine:3.19 sh
Step 3: Mimic the Dockerfile environment
If your Dockerfile sets WORKDIR /app and copies files, replicate:
mkdir -p /app
cd /app
You can mount your project directory for debugging:
docker run --rm -it -v "$PWD":/app -w /app python:3.12-slim bash
Now run the failing command exactly as in the Dockerfile:
pip install -r requirements.txt
pytest -q
This isolates whether the issue is Dockerfile-related (paths, missing files) or environment-related (OS deps, permissions).
Using --target and “debug stages”
Multi-stage builds can be debugged by stopping at an intermediate stage.
Example Dockerfile:
FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci
FROM deps AS build
COPY . .
RUN npm run build
FROM node:20-slim AS runtime
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
Build only up to build:
docker buildx build --target build --progress=plain -t app:buildstage .
Then run a shell in that stage image:
docker run --rm -it app:buildstage sh
ls -la /app
ls -la /app/dist
Add a dedicated debug stage
Sometimes you want tooling (curl, bash, strace) only for debugging:
FROM runtime AS debug
RUN apt-get update && apt-get install -y --no-install-recommends curl bash && rm -rf /var/lib/apt/lists/*
CMD ["bash"]
Build and run:
docker build -t app:debug --target debug .
docker run --rm -it app:debug
Linting and Static Checks (Before You Build)
Hadolint (Dockerfile linter)
Install (example using Docker):
docker run --rm -i hadolint/hadolint < Dockerfile
This catches common issues like:
- using apt-get without cleaning lists
- not pinning versions (optional)
- using ADD when COPY is better
- shellcheck-like warnings for RUN commands
ShellCheck for scripts embedded in RUN
If you have complex shell logic, consider moving it into a script file and running ShellCheck locally, then COPY it in.
Example:
shellcheck scripts/build.sh
Hardening Your Dockerfile to Prevent Future Failures
Use explicit shells and strict mode
If you rely on Bash features, set it:
SHELL ["/bin/bash", "-c"]
RUN set -euxo pipefail; \
echo "Using bash with pipefail"
If you stay with /bin/sh, use:
RUN set -eu; \
do_something
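The difference strict mode makes is easy to demonstrate in plain shell: without set -e a failed command is silently stepped over; with it, the script aborts at the first failure.

```shell
# Without -e: the failing 'false' is ignored and the script continues.
sh -c 'false; echo "kept going"'

# With set -eu: the first failure stops the script immediately, so the
# echo never runs and the overall command exits non-zero.
sh -c 'set -eu; false; echo "unreachable"' || echo "aborted early"
```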
Prefer deterministic installs
- Use lockfiles (package-lock.json, poetry.lock, Pipfile.lock)
- Pin base images (at least to major/minor) to avoid sudden breakage:
  - python:3.12-slim is fairly stable
  - ubuntu:latest is not stable for reproducible builds
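For fully reproducible base images you can go further and pin by digest; a sketch (the digest shown is a placeholder, not a real one — docker images --digests shows the actual value for an image you have pulled):

```dockerfile
# Placeholder digest: substitute the real sha256 for the tag you use.
# A digest pin never moves, unlike a tag.
FROM python:3.12-slim@sha256:PLACEHOLDER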
Reduce network flakiness
- Combine downloads where possible
- Use retries for curl:
RUN set -eu; \
curl -fsSL --retry 5 --retry-delay 2 --retry-all-errors \
https://example.com/file.tar.gz -o /tmp/file.tar.gz
Make failures obvious
Print versions early:
RUN node --version && npm --version
List directories after critical copies:
RUN set -eu; ls -la /app
These lines can be removed later, but during debugging they save time.
A Worked Example: From Failure to Fix
Scenario
You have a Node.js app. Docker build fails at npm ci with:
npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /home/node/.npm
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, mkdir '/home/node/.npm'
Problem Dockerfile
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
USER node
RUN npm ci
COPY . .
CMD ["npm", "start"]
Read the error correctly
- Failing step: RUN npm ci
- Error: cannot create /home/node/.npm
- Cause: permissions or missing home-directory state for the node user (or the npm cache location)
Fix options
Fix A: Use a writable npm cache directory
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
USER node
ENV NPM_CONFIG_CACHE=/tmp/.npm
RUN npm ci
COPY . .
CMD ["npm", "start"]
Fix B: Ensure ownership of working directory and home
If /app is owned by root, npm may try to write there during install. Ensure correct ownership:
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN set -eu; \
chown -R node:node /app
USER node
RUN npm ci
COPY --chown=node:node . .
CMD ["npm", "start"]
Verify with a clean rebuild
docker buildx build --no-cache --progress=plain -t myapp:fixed .
docker run --rm -p 3000:3000 myapp:fixed
If it still fails, reproduce interactively:
docker run --rm -it node:20-slim bash
id
ls -la /app || true
Then iterate.
Quick Reference: Useful Commands
Build with readable logs
DOCKER_BUILDKIT=1 docker build --progress=plain -t app:debug .
Build without cache
docker buildx build --no-cache --progress=plain -t app:nocache .
Stop at a stage
docker buildx build --target build --progress=plain -t app:build .
Run a shell in an image
docker run --rm -it app:debug sh
# or
docker run --rm -it app:debug bash
Mount your project into a container for debugging
docker run --rm -it -v "$PWD":/app -w /app python:3.12-slim bash
Inspect image layers
docker history app:latest
Lint Dockerfile
docker run --rm -i hadolint/hadolint < Dockerfile
Closing Workflow (Use This Every Time)
- Rebuild with plain logs (--progress=plain).
- Identify the failing step and the exact command.
- Confirm files exist in the build context and aren’t ignored.
- Reproduce the failing command in an interactive container from the same base image.
- Fix the Dockerfile (paths, shell form, permissions, package installs, network).
- Rebuild with --no-cache to confirm the fix.
- Re-enable caching and reorder layers for stability.