
Debugging Failing Docker Builds: How to Read and Fix Dockerfile Errors

Tags: docker, dockerfile, debugging, devops, containers, ci-cd, build-errors, troubleshooting


Docker builds fail for two broad reasons:

  1. The Dockerfile instructions don’t do what you think (syntax, shell behavior, build context, cache, permissions).
  2. The environment inside the build container differs from your local machine (different OS, missing packages, network/DNS issues, architecture mismatches).

This tutorial teaches a practical, repeatable workflow to read Docker build logs, isolate the failing layer, reproduce the failure interactively, and fix the underlying cause. It includes real commands and concrete examples you can copy/paste.




Prerequisites

You should have:

  • Docker installed and running (Docker Desktop or Docker Engine)
  • Basic familiarity with the command line and with reading a Dockerfile
  • A project with a Dockerfile — ideally one that is currently failing to build

Verify Docker is working:

docker version
docker info

How Docker Builds Actually Work (and Why Errors Look Weird)

A Docker build is a sequence of layers created from Dockerfile instructions. Each instruction like RUN, COPY, ADD creates a new filesystem layer (with some nuance for metadata-only instructions like ENV, WORKDIR, USER).
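As a sketch (the image contents here are hypothetical), each instruction either adds a filesystem layer or only records metadata:

```dockerfile
FROM debian:bookworm-slim      # pulls the base image's layers
ENV APP_ENV=production         # metadata only: no filesystem layer
WORKDIR /app                   # metadata (plus creating /app if it doesn't exist)
COPY app/ /app/                # new layer: files taken from the build context
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*   # new layer: everything this shell run changed
```

Running docker history on the resulting image lists these layers with their sizes, which is a quick way to see what each instruction actually produced.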

Key consequences:

  • A failing instruction stops the build there: earlier layers survive, later instructions never run.
  • Cached layers are reused when an instruction and its inputs appear unchanged, so a step can "pass" without actually executing.
  • Each RUN starts a fresh shell, so cd, shell variables, and activated environments do not carry over to the next instruction.

When a build fails, always ask:

  1. Which step failed?
  2. What exact command was executed?
  3. What filesystem state existed at that step (what files were copied, what user, what working directory)?
  4. Was the step cached or actually executed?

Reading Build Output: Finding the Real Error

A typical failure looks like:

Step 6/12 : RUN npm ci
 ---> Running in 3b2b1c...
npm ERR! code EAI_AGAIN
npm ERR! syscall getaddrinfo
npm ERR! errno EAI_AGAIN
npm ERR! request to https://registry.npmjs.org/... failed
The command '/bin/sh -c npm ci' returned a non-zero code: 1

What matters:

  • The step line (Step 6/12 : RUN npm ci) identifies the failing instruction.
  • The tool's own output (npm ERR! code EAI_AGAIN — a DNS resolution failure) is the real diagnosis.
  • The final returned a non-zero code: 1 line is only a summary; the cause is always above it.

Sometimes the actual error is above the final line. Scroll up. If you’re using a UI progress mode, switch to plain logs (next section).
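When logs are long, saving them and searching for error markers beats scrolling. A minimal sketch — the build.log written here is a stand-in for your own output from something like docker build . 2>&1 | tee build.log:

```shell
# Stand-in for captured build output (in practice: docker build . 2>&1 | tee build.log)
cat > build.log <<'LOG'
Step 6/12 : RUN npm ci
 ---> Running in 3b2b1c
npm ERR! code EAI_AGAIN
npm ERR! request to https://registry.npmjs.org/ failed
The command '/bin/sh -c npm ci' returned a non-zero code: 1
LOG

# Jump straight to the failing step and the real error lines above the summary:
grep -n -E 'Step [0-9]+/[0-9]+|ERR!|[Ee]rror|returned a non-zero' build.log
```

The numbered matches let you read the failure in context instead of hunting through hundreds of lines by eye.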


Turn On Better Logs: BuildKit, Plain Progress, and Debug Output

BuildKit is the modern builder. It provides better caching, parallelism, and more readable output.

Check that buildx (the BuildKit client) is available:

docker buildx version

Use buildx explicitly:

docker buildx build -t myimage:debug .

Force plain progress output (easier to read)

docker buildx build --progress=plain -t myimage:debug .

Or with classic docker build:

DOCKER_BUILDKIT=1 docker build --progress=plain -t myimage:debug .

Disable cache to ensure steps rerun

docker buildx build --no-cache --progress=plain -t myimage:nocache .

Increase verbosity inside RUN

For shell scripts, add flags:

  • -e — exit immediately when a command fails
  • -u — treat references to unset variables as errors
  • -x — print each command before executing it

Example:

RUN set -eu; \
    echo "PWD=$(pwd)"; \
    ls -la; \
    some-command --version
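You can see what -x tracing buys you locally, with plain sh and no Docker involved:

```shell
# set -x echoes each command (prefixed with +) to stderr before running it,
# so the last traced line in a failed RUN is the culprit.
sh -c 'set -eux; echo step-one; false; echo never-reached' 2>trace.log || true

# The trace shows "+ echo step-one" then "+ false"; with -e, the failing
# `false` stops execution, so "never-reached" is never run or traced.
cat trace.log
```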

A Systematic Debugging Checklist

When a build fails, follow this order:

  1. Re-run with plain logs
    DOCKER_BUILDKIT=1 docker build --progress=plain -t test .
  2. Identify the failing step and instruction (RUN, COPY, etc.).
  3. Confirm build context (are the needed files actually sent to Docker?).
  4. Check .dockerignore for accidentally excluded files.
  5. Reproduce the failing command in an interactive container (same base image).
  6. Fix the Dockerfile:
    • correct paths and working directory
    • install missing dependencies
    • handle permissions
    • pin versions or add retries for flaky networks
  7. Rebuild with --no-cache to confirm the fix is real.
  8. Re-enable cache and ensure the Dockerfile is structured for stable caching.

Common Dockerfile Error Categories (with Fixes)

1) Dockerfile parse errors

Symptoms

  • The build aborts before any step runs, with a message like dockerfile parse error or unknown instruction.

Typical causes

  • Unclosed quotes
  • A trailing backslash continuation followed by a blank line or comment
  • Misspelled or unsupported instructions

Example

RUN echo "hello

This fails because the quote is not closed.

Fix

Close quotes, or use a heredoc (BuildKit supports heredocs in RUN):

# syntax=docker/dockerfile:1.6
RUN <<'SH'
set -eu
echo "hello"
SH

If you see parse errors, fix them first—no other debugging matters until the Dockerfile parses.


2) COPY / build context problems

Symptoms

  • COPY failed: file not found in build context (or "no such file or directory")
  • Files you expect to exist in the image are missing at a later step

Root cause

Docker can only access files inside the build context directory passed to docker build. If you run:

docker build -t app -f docker/Dockerfile .

The context is . (current directory). But if you run:

docker build -t app -f docker/Dockerfile docker

The context is docker/, and files outside docker/ are invisible.

Debug steps

  1. Print your build command and confirm the final argument (the context):
    • docker build ... <context>
  2. Inspect .dockerignore:
    cat .dockerignore

Example failure

COPY package.json package-lock.json ./

But .dockerignore contains:

package-lock.json

Fix

Remove the ignore rule, or copy files differently.

Pro tip: show what was sent as context

BuildKit doesn’t directly print the full context, but you can sanity-check by temporarily adding:

RUN ls -la

after a COPY to confirm the files exist.
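Another way to see exactly what reached the builder (after .dockerignore filtering) is a throwaway inspection image — the filename and tag below are illustrative:

```dockerfile
# context-debug.Dockerfile
FROM busybox
COPY . /ctx
RUN find /ctx -maxdepth 2 | sort
```

Build it with docker build -f context-debug.Dockerfile --no-cache --progress=plain -t ctx-debug . and the RUN output lists every file Docker actually received as context.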


3) Shell form vs exec form confusion

Dockerfile instructions like RUN, CMD, and ENTRYPOINT have two forms:

  • Shell form (RUN npm ci) — the command runs through /bin/sh -c, so shell features like variables, pipes, and && work.
  • Exec form (RUN ["npm", "ci"]) — the executable runs directly with the given arguments; there is no shell, so no expansion and no builtins.

Common error

RUN ["cd", "/app"]

This fails because cd is a shell builtin, not an executable.

Fix

Use shell form:

WORKDIR /app
# or:
RUN cd /app && make build

Another common error

CMD ["npm start"]

This tries to execute a binary literally called npm start.

Fix

Use:

CMD ["npm", "start"]
# or shell form if you need shell features:
CMD npm start
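The expansion difference between the two forms can be demonstrated without Docker, since exec form passes arguments verbatim while shell form goes through sh -c:

```shell
# Exec form: CMD ["echo", "$HOME"] passes the literal string — no shell, no expansion.
printf '%s\n' '$HOME'

# Shell form: CMD echo $HOME becomes /bin/sh -c 'echo $HOME' — the variable expands.
sh -c 'echo "$HOME"'
```

If a container prints a literal $VAR instead of its value, check whether the instruction is in exec form.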

4) Package manager failures (apt/apk/yum)

Debian/Ubuntu (apt-get) pitfalls

Symptoms

  • 404 Not Found or Hash Sum mismatch while fetching packages
  • E: Unable to locate package <name>
  • Installs that worked yesterday failing today

Best practice pattern

RUN set -eu; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
      ca-certificates curl; \
    rm -rf /var/lib/apt/lists/*

Why this pattern matters

  • update and install run in the same layer, so a stale cached package index can never be paired with a fresh install.
  • --no-install-recommends keeps the image smaller.
  • Removing /var/lib/apt/lists/* drops the package index from the layer.

Common mistake

RUN apt-get update
RUN apt-get install -y curl

If the apt-get update layer is cached but the package index is outdated, install can fail.

Alpine (apk) pitfalls

Best practice

RUN set -eu; \
    apk add --no-cache curl ca-certificates

Common error

Running a separate apk update (or omitting --no-cache) leaves the package index baked into a layer, reintroducing the stale-index problem and bloating the image.

RHEL/CentOS/Fedora (yum/dnf)

RUN set -eu; \
    dnf -y install curl ca-certificates; \
    dnf clean all

5) “Command not found” and PATH issues

Symptoms

  • /bin/sh: 1: python: not found, or the step exits with code 127

Root causes

  • The tool is not installed in this base image (or was installed only in an earlier build stage).
  • The binary has a different name (python3 vs python).
  • The install location is not on PATH for the current user.

Debug

Add:

RUN set -eu; which python || true; echo "$PATH"; ls -la /usr/bin | head

Fix

Install the tool in the stage where it’s used:

RUN apt-get update && apt-get install -y --no-install-recommends python3

Or call it by full path if needed:

RUN /usr/bin/python3 --version

6) Permission issues and non-root users

Symptoms

  • EACCES or Permission denied during installs, file writes, or at runtime

Root causes

  • Files created or copied as root, then accessed after USER switches to a non-root user.
  • The working directory or the user's home directory is not writable by that user.

Best practice pattern

  1. Create user
  2. Create directories
  3. chown as needed
  4. Switch USER

Example:

RUN set -eu; \
    useradd -m -u 10001 appuser; \
    mkdir -p /app; \
    chown -R appuser:appuser /app

WORKDIR /app
COPY --chown=appuser:appuser . /app
USER appuser

Debug permissions

RUN set -eu; id; ls -ld /app; ls -la /app | head

7) Network/DNS/TLS problems during build

Symptoms

  • EAI_AGAIN, Temporary failure in name resolution, connection timeouts
  • TLS errors such as certificate verify failed

Root causes

  • DNS not resolving inside the build environment
  • A corporate proxy the build containers don't inherit
  • Missing CA certificates in minimal base images

Fix: install CA certificates

Debian/Ubuntu:

RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates

Alpine:

RUN apk add --no-cache ca-certificates

Fix: pass proxy build args

docker build \
  --build-arg http_proxy=http://proxy.example:3128 \
  --build-arg https_proxy=http://proxy.example:3128 \
  --build-arg no_proxy=localhost,127.0.0.1 \
  -t app .

In Dockerfile (optional):

ARG http_proxy
ARG https_proxy
ARG no_proxy

Debug DNS inside build (interactive approach)
Use an interactive container from the same base image:

docker run --rm -it debian:bookworm-slim sh
cat /etc/resolv.conf
getent hosts registry.npmjs.org || true
apt-get update

8) Architecture mismatches (amd64 vs arm64)

Symptoms

  • exec format error when running an installed binary
  • Downloaded prebuilt binaries that fail with cannot execute binary file

This often happens on Apple Silicon (arm64) building images intended for amd64, or when a Dockerfile downloads an amd64-only binary.

Check your build platform

docker version --format '{{.Server.Os}}/{{.Server.Arch}}'
docker buildx ls

Build for a specific platform

docker buildx build --platform linux/amd64 -t myimage:amd64 .

Fix downloads by selecting arch

In RUN steps, detect architecture:

RUN set -eu; \
    arch="$(uname -m)"; \
    echo "arch=$arch"; \
    case "$arch" in \
      x86_64) url="https://example.com/tool-linux-amd64";; \
      aarch64) url="https://example.com/tool-linux-arm64";; \
      *) echo "unsupported arch: $arch" >&2; exit 1;; \
    esac; \
    curl -fsSL "$url" -o /usr/local/bin/tool; \
    chmod +x /usr/local/bin/tool

9) Cache-related failures and stale layers

Symptoms

  • A fix you just made doesn't seem to take effect, or a step "passes" instantly without really running.
  • The build succeeds locally but fails in CI (or vice versa) with the same Dockerfile.

Tools

  • docker buildx build --no-cache to force every step to execute
  • docker history <image> to inspect the layers of a built image
  • docker builder prune to clear the local build cache

Best practice: order layers for stable caching

For Node.js:

WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm test

This ensures dependency install is cached unless lockfiles change.
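The same ordering applies in other ecosystems — a sketch of the equivalent pattern for Python/pip (file names assumed):

```dockerfile
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt   # cached unless requirements.txt changes
COPY . .                                             # source edits don't bust the install layer
```

The rule of thumb: copy only the dependency manifest first, install, then copy the rest of the source.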


10) Multi-stage build mistakes

Symptoms

  • COPY --from fails with "not found", or the final image is missing files at runtime (Cannot find module, No such file or directory).

Example mistake

FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build

FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]

If your build output is actually in /app/build rather than /app/dist, the COPY fails at build time or the container fails at runtime.

Debug

Add in builder stage:

RUN set -eu; ls -la /app; ls -la /app/dist || true; ls -la /app/build || true

Fix

Copy the correct path, and also copy runtime dependencies if needed:

COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist

Or rebuild dependencies in the final stage (often smaller/cleaner for production).


Reproducing a Failing Layer Interactively

One of the fastest ways to debug is to run an interactive shell in the same base image and execute the failing commands manually.

Step 1: Identify the base image at the failing step

If the failure happens after FROM python:3.12-slim, use that.

Step 2: Start a container

docker run --rm -it python:3.12-slim bash

If the image doesn’t have bash, use sh:

docker run --rm -it alpine:3.19 sh

Step 3: Mimic the Dockerfile environment

If your Dockerfile sets WORKDIR /app and copies files, replicate:

mkdir -p /app
cd /app

You can mount your project directory for debugging:

docker run --rm -it -v "$PWD":/app -w /app python:3.12-slim bash

Now run the failing command exactly as in the Dockerfile:

pip install -r requirements.txt
pytest -q

This isolates whether the issue is Dockerfile-related (paths, missing files) or environment-related (OS deps, permissions).


Using --target and “debug stages”

Multi-stage builds can be debugged by stopping at an intermediate stage.

Example Dockerfile:

FROM node:20 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM deps AS build
COPY . .
RUN npm run build

FROM node:20-slim AS runtime
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]

Build only up to build:

docker buildx build --target build --progress=plain -t app:buildstage .

Then run a shell in that stage image:

docker run --rm -it app:buildstage sh
ls -la /app
ls -la /app/dist

Add a dedicated debug stage

Sometimes you want tooling (curl, bash, strace) only for debugging:

FROM runtime AS debug
RUN apt-get update && apt-get install -y --no-install-recommends curl bash && rm -rf /var/lib/apt/lists/*
CMD ["bash"]

Build and run:

docker build -t app:debug --target debug .
docker run --rm -it app:debug

Linting and Static Checks (Before You Build)

Hadolint (Dockerfile linter)

Install (example using Docker):

docker run --rm -i hadolint/hadolint < Dockerfile

This catches common issues like:

  • Unpinned package versions in apt-get install (DL3008)
  • Using cd in RUN instead of WORKDIR (DL3003)
  • Missing --no-install-recommends (DL3015)
  • FROM images without an explicit tag (DL3006)

ShellCheck for scripts embedded in RUN

If you have complex shell logic, consider moving it into a script file and running ShellCheck locally, then COPY it in.

Example:

shellcheck scripts/build.sh
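The COPY-it-in pattern then looks like this (the script path is illustrative):

```dockerfile
COPY scripts/build.sh /usr/local/bin/build.sh
RUN chmod +x /usr/local/bin/build.sh && /usr/local/bin/build.sh
```

This keeps the shell logic lintable and testable outside the image, while the Dockerfile stays a thin wrapper.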

Hardening Your Dockerfile to Prevent Future Failures

Use explicit shells and strict mode

If you rely on Bash features, set it:

SHELL ["/bin/bash", "-c"]
RUN set -euxo pipefail; \
    echo "Using bash with pipefail"

If you stay with /bin/sh, use:

RUN set -eu; \
    do_something

Prefer deterministic installs

  • npm ci with a committed lockfile instead of npm install
  • pip install -r requirements.txt with pinned versions
  • Base images pinned to explicit tags (or digests) instead of latest

Reduce network flakiness

RUN set -eu; \
    curl -fsSL --retry 5 --retry-delay 2 --retry-all-errors \
      https://example.com/file.tar.gz -o /tmp/file.tar.gz
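For commands without built-in retry flags, a small helper does the same job — a sketch you can inline in a RUN step or ship in a script:

```shell
# retry <max_attempts> <command...>: rerun a flaky command with a short pause between attempts.
retry() {
  n=0
  max="$1"
  shift
  until "$@"; do
    n=$((n + 1))
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    sleep 1
  done
}

retry 3 true                                        # succeeds on the first attempt
retry 2 false || echo "gave up after 2 attempts"    # prints "gave up after 2 attempts"
```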

Make failures obvious

Print versions early:

RUN node --version && npm --version

List directories after critical copies:

RUN set -eu; ls -la /app

These lines can be removed later, but during debugging they save time.


A Worked Example: From Failure to Fix

Scenario

You have a Node.js app. Docker build fails at npm ci with:

npm ERR! code EACCES
npm ERR! syscall mkdir
npm ERR! path /home/node/.npm
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, mkdir '/home/node/.npm'

Problem Dockerfile

FROM node:20-slim
WORKDIR /app
COPY package*.json ./
USER node
RUN npm ci
COPY . .
CMD ["npm", "start"]

Read the error correctly

EACCES: permission denied, mkdir '/home/node/.npm' means npm, now running as the node user, cannot create its cache directory. The trigger is the USER node line before RUN npm ci: npm needs a writable cache location and working directory.

Fix options

Fix A: Use a writable npm cache directory

FROM node:20-slim
WORKDIR /app

COPY package*.json ./

USER node
ENV NPM_CONFIG_CACHE=/tmp/.npm
RUN npm ci

COPY . .
CMD ["npm", "start"]

Fix B: Ensure ownership of working directory and home

If /app is owned by root, npm may try to write there during install. Ensure correct ownership:

FROM node:20-slim
WORKDIR /app

COPY package*.json ./

RUN set -eu; \
    chown -R node:node /app

USER node
RUN npm ci

COPY --chown=node:node . .
CMD ["npm", "start"]

Verify with a clean rebuild

docker buildx build --no-cache --progress=plain -t myapp:fixed .
docker run --rm -p 3000:3000 myapp:fixed

If it still fails, reproduce interactively:

docker run --rm -it node:20-slim bash
id
ls -la /app || true

Then iterate.


Quick Reference: Useful Commands

Build with readable logs

DOCKER_BUILDKIT=1 docker build --progress=plain -t app:debug .

Build without cache

docker buildx build --no-cache --progress=plain -t app:nocache .

Stop at a stage

docker buildx build --target build --progress=plain -t app:build .

Run a shell in an image

docker run --rm -it app:debug sh
# or
docker run --rm -it app:debug bash

Mount your project into a container for debugging

docker run --rm -it -v "$PWD":/app -w /app python:3.12-slim bash

Inspect image layers

docker history app:latest

Lint Dockerfile

docker run --rm -i hadolint/hadolint < Dockerfile

Closing Workflow (Use This Every Time)

  1. Rebuild with plain logs (--progress=plain).
  2. Identify the failing step and the exact command.
  3. Confirm files exist in the build context and aren’t ignored.
  4. Reproduce the failing command in an interactive container from the same base image.
  5. Fix the Dockerfile (paths, shell form, permissions, package installs, network).
  6. Rebuild with --no-cache to confirm the fix.
  7. Re-enable caching and reorder layers for stability.

When you hit a new failure, capture three things — the failing step number, the exact instruction, and the log lines just above the final error summary — and work through the checklist above. Those three facts almost always point to the root cause and a minimal fix.