
Debugging DNS Resolution Problems Inside Docker Containers (Advanced Guide)


DNS issues inside containers are deceptively tricky: the container’s network namespace, Docker’s embedded DNS, the host’s resolver configuration, corporate VPNs, split-horizon DNS, and firewall/NAT rules can all interact in ways that look like “DNS is broken” while the root cause is elsewhere. This guide focuses on systematic, advanced debugging with real commands and deep explanations so you can isolate the failure domain quickly.



1. Mental Model: How DNS Works in Docker

When a process inside a container resolves a name (e.g., api.example.com), it typically follows this chain:

  1. Application calls the system resolver (often via getaddrinfo()).
  2. The resolver consults:
    • /etc/nsswitch.conf (controls whether to use files, DNS, mDNS, etc.)
    • /etc/hosts
    • /etc/resolv.conf (nameservers, search domains, options)
  3. The query is sent to a nameserver IP listed in /etc/resolv.conf.

In Docker, /etc/resolv.conf inside the container is usually generated by Docker. On many Linux setups, you’ll see:

nameserver 127.0.0.11
options ndots:0

Docker’s embedded DNS then forwards queries to upstream resolvers (often derived from the host’s /etc/resolv.conf), and also answers container/service names on the same network (service discovery).

Key implication: A “DNS problem in the container” can be:

  • a resolver configuration issue inside the container (/etc/resolv.conf, /etc/nsswitch.conf, ndots/search),
  • a problem with Docker’s embedded DNS or its forwarding path,
  • an upstream/host resolver problem (systemd-resolved, VPN split DNS), or
  • a network problem that only looks like DNS (routing, firewall, MTU).

2. Quick Triage Checklist (Fast Isolation)

Run these in order to narrow down the problem:

2.1 Confirm the symptom inside the container

docker exec -it <container> sh
# or bash if available

Try:

getent hosts example.com

If getent fails, try raw DNS tools:

nslookup example.com
dig example.com

If those tools aren’t installed, see Section 5 for installing/debugging alternatives.

2.2 Check if it’s only DNS or general connectivity

ip route
ping -c 1 1.1.1.1
ping -c 1 8.8.8.8

If ping is blocked in your environment, try TCP connectivity:

# BusyBox / Alpine often has wget; Debian/Ubuntu often has curl
curl -I https://1.1.1.1 --max-time 5
# or, with BusyBox wget:
wget -q -T 5 -O /dev/null https://1.1.1.1

If you can reach IPs but not names, it’s likely DNS. If you can’t reach IPs, it’s a broader network issue.

2.3 Identify the configured nameserver

cat /etc/resolv.conf

If you see 127.0.0.11, you’re using Docker’s embedded DNS. If you see something like 127.0.0.53, that’s often systemd-resolved on the host, and it may not be reachable from the container unless Docker has copied it intentionally (and even then it can be problematic).


3. Inspect DNS Configuration Inside the Container

3.1 /etc/resolv.conf (nameservers, search, ndots)

Example:

nameserver 127.0.0.11
options ndots:0
search corp.example.com

Important fields:

  • nameserver: the DNS server IP the resolver queries (here, Docker’s embedded DNS at 127.0.0.11).
  • search: domains appended to unqualified names.
  • options ndots:N: how many dots a name needs before it is tried as an absolute name first.

Why ndots matters:
If ndots:5 and you query api.example.com (2 dots), the resolver may try search domains first (e.g., api.example.com.corp.example.com) before trying the absolute name. This can cause delays/timeouts that look like DNS failures.
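
The candidate-name order the resolver derives from ndots and search can be sketched as a small shell function. This is an approximation of glibc-style behavior, and `expand_query` plus the domains shown are illustrative, not a real tool:

```shell
# Approximate the order of names a resolver tries, given
# "options ndots:N" and the "search" list from /etc/resolv.conf.
expand_query() {
  name=$1 ndots=$2; shift 2          # remaining args: search domains
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"            # absolute name tried first
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
  else
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
    printf '%s\n' "$name"            # absolute name tried last
  fi
}

# With ndots:5, even a name that looks fully qualified goes through search first:
expand_query api.example.com 5 corp.example.com
# → api.example.com.corp.example.com
#   api.example.com
```

Each extra candidate is a round trip (and a possible NXDOMAIN wait), which is exactly where the mystery latency comes from.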

3.2 /etc/nsswitch.conf (resolution order)

grep '^hosts:' /etc/nsswitch.conf

Look for the hosts: line. Common examples:

hosts: files dns
hosts: files resolve [!UNAVAIL=return] dns

If dns is missing, your resolver may never query DNS (rare in containers, but possible in minimal images).

3.3 /etc/hosts

cat /etc/hosts

Sometimes a stale entry overrides DNS and causes confusion (e.g., example.com pinned to an old IP).
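
A quick way to spot such overrides is to scan a hosts-format file for the name you’re debugging. `hosts_overrides` is a hypothetical helper name; the IPs are examples:

```shell
# Print any hosts-file entries that would shadow DNS for a given name.
# Usage: hosts_overrides <name> [file]   (file defaults to /etc/hosts)
hosts_overrides() {
  name=$1 file=${2:-/etc/hosts}
  # Strip trailing comments, then match the name among the aliases.
  awk -v n="$name" '{ sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == n) print $1, $i }' "$file"
}

# Example:
#   printf '10.1.2.3 example.com  # pinned\n' > /tmp/hosts.debug
#   hosts_overrides example.com /tmp/hosts.debug   # → 10.1.2.3 example.com
```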


4. Understand Docker’s Embedded DNS (127.0.0.11)

On user-defined bridge networks, Docker injects an internal DNS server at 127.0.0.11 inside each container. It provides:

  • name resolution for containers and services attached to the same network (service discovery, including network aliases), and
  • forwarding of all other queries to upstream resolvers (from daemon config, --dns flags, or the host’s /etc/resolv.conf).

4.1 Confirm the container is on a user-defined network

docker inspect <container> --format '{{json .NetworkSettings.Networks}}' | jq

If you see bridge only (the default docker0 bridge), behavior can differ depending on Docker version and settings. User-defined networks typically have better DNS/service discovery.

4.2 Inspect the network itself

docker network ls
docker network inspect <network_name> | jq '.[0].IPAM, .[0].Options, .[0].Containers'

Look for unusual options, subnets overlapping with VPN routes, or custom gateways.


5. Debug with dig, nslookup, getent, and strace

5.1 Use a dedicated debug container on the same network

If your application image is minimal, don’t pollute it—attach a toolbox container to the same network:

docker run --rm -it --network <network_name> nicolaka/netshoot bash

netshoot includes dig, tcpdump, iproute2, and more.

Alternatively:

docker run --rm -it --network <network_name> alpine:3.20 sh
apk add --no-cache bind-tools drill busybox-extras

5.2 Compare resolver paths: getent vs dig

Run:

getent hosts example.com
dig example.com
dig +search example.com

If dig works but getent fails, suspect:

  • /etc/nsswitch.conf (dns missing, or an earlier source short-circuiting),
  • a stale /etc/hosts entry,
  • libc resolver behavior (search/ndots handling, musl vs glibc differences).

5.3 Query Docker’s embedded DNS explicitly

If /etc/resolv.conf points to 127.0.0.11:

dig @127.0.0.11 example.com
dig @127.0.0.11 tasks.<service>  # in Swarm contexts

If that fails, try querying an upstream resolver directly (if reachable):

dig @1.1.1.1 example.com
dig @8.8.8.8 example.com

If upstream works but 127.0.0.11 fails, the embedded DNS or its forwarding path is broken.

5.4 Use strace to see what the app is doing

If you can reproduce with a small command (e.g., curl), trace DNS-related syscalls:

strace -f -e trace=network,connect,sendto,recvfrom,openat,read,write \
  curl -I https://example.com --max-time 5

Look for:

  • which resolver IP and port the process connects/sends to,
  • whether replies come back (recvfrom) or the calls time out,
  • which config files are opened (/etc/resolv.conf, /etc/nsswitch.conf, /etc/hosts).

This is especially useful when the application has its own DNS behavior.


6. Distinguish DNS Failures from Network Failures

6.1 Check routing and interface state

Inside the container:

ip addr
ip route

On the host, identify the veth pair and bridge:

docker inspect <container> --format '{{.NetworkSettings.SandboxKey}}'
# Example output: /var/run/docker/netns/xxxxxxxx

Then match the container’s eth0 to its host-side veth peer. Inside the container, the interface name encodes the peer’s host ifindex (e.g., eth0@if42):

# Inside the container: note the @ifNN suffix
ip link show eth0
# On the host: find the interface with that index (42 is just an example)
ip link | grep '^42:'

6.2 Test UDP/53 reachability to the resolver

If the resolver is 127.0.0.11, you’re testing connectivity to Docker’s embedded DNS (local inside namespace). If resolver is a real IP (e.g., 10.0.0.2), test:

# netcat may not be present; in netshoot it is
nc -vu -w 2 10.0.0.2 53

For TCP/53:

nc -vz -w 2 10.0.0.2 53

Some DNS servers require TCP for large responses or when UDP is blocked.

6.3 Look for MTU blackholes (DNS can be affected)

Large DNS responses (DNSSEC, many records) can fragment. If fragmentation is blocked, you get timeouts.

Inside container:

ip link show eth0

Try lowering MTU temporarily (in a test container) or test path MTU with tracepath (in netshoot):

tracepath 1.1.1.1

7. Check the Host: systemd-resolved, NetworkManager, and /etc/resolv.conf

Docker typically reads the host’s resolver configuration and propagates it (or uses daemon config). But modern Linux often uses systemd-resolved, which can create a stub resolver at 127.0.0.53 on the host.

7.1 Inspect host /etc/resolv.conf

On the host:

ls -l /etc/resolv.conf
cat /etc/resolv.conf

If it points to 127.0.0.53, Docker might copy that into containers in some setups, which is usually wrong because 127.0.0.53 inside a container refers to the container itself, not the host.

7.2 Check systemd-resolved status (host)

resolvectl status

Look for:

  • global vs per-link DNS servers,
  • DNS domains (search domains and routing domains like ~corp.example.com),
  • whether the 127.0.0.53 stub listener is active.

If your environment uses split DNS (e.g., *.corp.example.com via VPN DNS), Docker’s forwarding may not respect per-link rules unless configured carefully.

7.3 Configure Docker daemon DNS explicitly (host)

If upstream resolvers are flaky or the host uses a stub resolver, set DNS servers in Docker daemon config.

Edit (host):

sudo mkdir -p /etc/docker
sudo nano /etc/docker/daemon.json

Example:

{
  "dns": ["1.1.1.1", "8.8.8.8"],
  "dns-options": ["timeout:2", "attempts:3"],
  "dns-search": []
}

Then restart Docker:

sudo systemctl restart docker

Recreate containers to pick up changes.

Note: If you rely on corporate DNS or split DNS, hardcoding public resolvers may break internal names. In that case, set DNS to your corporate resolvers (reachable from Docker networks) or use a local caching forwarder that understands split DNS.
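
One way to build such a forwarder is dnsmasq with per-domain upstreams. This is a sketch: the domain, resolver IPs, and listen address are placeholders for your environment:

```ini
# /etc/dnsmasq.d/split-dns.conf (hypothetical values)
# Send corporate names to the VPN/corporate resolver...
server=/corp.example.com/10.0.0.2
# ...and everything else to a public resolver.
server=1.1.1.1
# Listen on an address reachable from Docker networks, e.g. the docker0 gateway:
listen-address=172.17.0.1
bind-interfaces
```

Then point Docker at it with "dns": ["172.17.0.1"] in daemon.json, so containers get split DNS without knowing about it.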


8. Common Root Causes and Fixes

8.1 Container has nameserver 127.0.0.53 (host stub leaked into container)

Symptom: DNS fails instantly or times out; dig @127.0.0.53 fails.

Fix options:

  • Set real resolvers in /etc/docker/daemon.json (see Section 7.3), or
  • Override DNS per container:

docker run --rm -it --dns 10.0.0.2 --dns 10.0.0.3 alpine:3.20 sh

8.2 VPN / split DNS not working from containers

Symptom: Host resolves internal.corp, container cannot.

Why: VPN client sets per-interface DNS rules; Docker’s embedded DNS forwards using a simpler upstream list and may not follow split routing rules.

Debug:

  • Compare resolvectl status on the host with dig output from a container.
  • From a container, query the VPN resolver directly: dig @<vpn_dns_ip> internal.corp.

Fix approaches:

  • Point Docker (daemon.json "dns" or per-container --dns) at the corporate resolver, if it is routable from Docker networks.
  • Run a local forwarder with per-domain rules on the host and use it as Docker’s DNS.
  • As a last resort, run the affected workload with network_mode: host (loses service discovery and isolation).

8.3 Subnet overlap between Docker networks and corporate/VPN networks

Symptom: Some domains resolve but connections fail; or DNS servers are “unreachable” from containers.

Why: If Docker uses 172.16.0.0/12 and your VPN also routes parts of that, packets may go the wrong way.

Debug:

ip route
docker network inspect bridge | jq '.[0].IPAM.Config'
docker network ls
docker network inspect <network> | jq '.[0].IPAM.Config'
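
Comparing those subnets by eye is error-prone. A small helper can check two IPv4 CIDR blocks for overlap; `cidr_overlap` is an illustrative sketch in plain shell arithmetic, not a Docker tool:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed (exit 0) if two CIDR blocks overlap, e.g. a Docker network
# vs. a VPN route. The shorter (less specific) prefix decides.
cidr_overlap() {
  p1=${1#*/} p2=${2#*/}
  m=$(( p1 < p2 ? p1 : p2 ))
  mask=$(( m == 0 ? 0 : (0xFFFFFFFF << (32 - m)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "${1%/*}") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

cidr_overlap 172.17.0.0/16 172.16.0.0/12 && echo "overlap"    # → overlap
cidr_overlap 10.200.0.0/24 172.16.0.0/12 || echo "disjoint"   # → disjoint
```

Feed it each Docker subnet from the inspect output above against each VPN/corporate route from ip route.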

Fix: Create Docker networks on non-overlapping subnets:

docker network create --subnet 10.200.0.0/24 mynet

For the default bridge, you can change Docker’s default address pools in daemon.json:

{
  "default-address-pools": [
    {"base":"10.200.0.0/16","size":24}
  ]
}

Restart Docker and recreate networks/containers.

8.4 Firewall blocking UDP/53 (or TCP/53)

Symptom: dig times out; tcpdump shows queries leaving but no replies.

Debug:

tcpdump -ni any port 53

If queries leave but no response returns, check upstream firewall/VPN policies.

Fix: Allow DNS traffic from Docker subnets to DNS servers. On Linux hosts using nftables/iptables, rules vary widely; ensure NAT and forward policies permit it.

8.5 ndots and search domains causing long delays

Symptom: Resolution eventually works but takes seconds; app startup slow.

Debug: Check /etc/resolv.conf:

cat /etc/resolv.conf

If you see options ndots:5 and a search list, try:

time getent hosts example.com
time getent hosts example

Fix options:

  • Use fully qualified names with a trailing dot (e.g., example.com.) where the app allows it.
  • Trim unnecessary search domains.
  • Lower ndots per container:

docker run --rm -it --dns-option ndots:1 alpine:3.20 sh

In Compose, set dns_opt on the service:

services:
  app:
    dns_opt:
      - ndots:1

8.6 Alpine/musl vs Debian/glibc differences

Symptom: Same config works in Debian container but not in Alpine.

Why: musl libc resolver differs from glibc in search/timeout behavior and edge cases.

Debug: Use dig to bypass libc differences:

dig example.com

Fix:

  • Lower ndots and trim search domains (musl queries all configured nameservers in parallel and handles some search/ndots cases differently).
  • Use FQDNs with trailing dots.
  • If a lookup pattern only fails on musl, test on a glibc image (e.g., debian:bookworm-slim) to confirm.


9. Docker Compose and DNS: Service Discovery vs External Resolution

Compose creates a default network (unless configured otherwise), and service names become DNS names.

9.1 Verify service discovery

Assume services web and db on the same Compose network.

From web:

docker compose exec web getent hosts db
docker compose exec web dig db

If db doesn’t resolve:

  • Confirm both services are attached to the same network (see 9.2).
  • Check for mismatches between the service name and any container_name or network aliases in use.
  • Make sure neither service sets network_mode (host/none), which bypasses Compose networking.

9.2 Inspect Compose networks

docker compose ps
docker network ls | grep "$(basename "$PWD")"
docker network inspect <compose_network> | jq '.[0].Containers'

9.3 Beware network_mode: host

If a container uses host networking, it uses the host’s network stack and DNS behavior, not Docker’s embedded DNS. This can “fix” some DNS issues but breaks service discovery and isolation.
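
For illustration, the trade-off in Compose terms (service names and images are hypothetical):

```yaml
services:
  app:
    image: alpine:3.20
    network_mode: host    # host stack + host resolver; no embedded DNS
  db:
    image: postgres:16
    # "app" can no longer reach this as "db" via service discovery;
    # it must use localhost/published ports or the host's addresses.
```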


10. Advanced: Packet Capture and Query Tracing

When you need proof of where the query dies, capture packets.

10.1 Capture inside a debug container

Run netshoot on the same network:

docker run --rm -it --network <network_name> --cap-add NET_ADMIN nicolaka/netshoot bash

Capture DNS:

tcpdump -ni any port 53

In another terminal, trigger resolution:

docker exec -it <container> getent hosts example.com

Interpretation:

  • Query and reply both visible: the DNS path works; suspect client-side config (search/ndots, nsswitch, app behavior).
  • Query visible, no reply: forwarding, upstream, or firewall problem.
  • No query at all: the client never asked DNS (hosts file hit, nsswitch ordering, cache, or the app failed earlier).

10.2 Capture on the host (bridge interface)

Identify the bridge:

docker network inspect <network_name> | jq -r '.[0].Options["com.docker.network.bridge.name"]'

If null, it might be something like br-<id>. List bridges:

ip link show type bridge

Capture:

sudo tcpdump -ni br-xxxxxxxx port 53

This helps confirm whether packets leave the container namespace and reach the host bridge.

10.3 Query tracing with dig +trace

+trace performs iterative resolution itself (root servers, then TLD servers, then the zone’s authoritative servers), largely bypassing your configured resolver:

dig +trace example.com

If dig +trace works but normal dig example.com fails, your configured resolver or forwarding path is the issue, not global DNS.


11. Advanced: IPv6, DNS over TLS/HTTPS, and MTU Edge Cases

11.1 IPv6 inside containers

If your app prefers IPv6 and Docker/network doesn’t support it properly, you can see confusing failures.

Check:

ip -6 addr
getent ahosts example.com

If AAAA records resolve but connectivity fails, you might need to:

  • force IPv4 in the client (curl -4, app settings, or glibc gai.conf preferences),
  • enable proper IPv6 in Docker ("ipv6": true with a "fixed-cidr-v6" in daemon.json), or
  • fix IPv6 routing/firewalling upstream.

Test:

curl -4 -I https://example.com --max-time 5
curl -6 -I https://example.com --max-time 5

11.2 DNS over HTTPS/TLS (DoH/DoT)

Some environments intercept or block UDP/53 but allow HTTPS. If your container uses a DoH client (or a library that does), the “DNS issue” might actually be HTTPS egress restrictions, proxy requirements, or certificate interception.

Debug by verifying:

  • whether any UDP/53 traffic leaves the container at all (tcpdump),
  • HTTPS egress to the DoH endpoint (and any required proxy settings),
  • TLS trust inside the container (are corporate CA certificates installed?).

Use strace or application logs to confirm.

11.3 MTU and fragmentation

DNS responses with DNSSEC can exceed typical UDP sizes. If fragmentation is blocked, you get timeouts.

Debug with dig forcing smaller sizes:

dig example.com +dnssec
dig example.com +bufsize=1232

If +bufsize=1232 works but default fails, suspect PMTU/fragmentation issues.


12. Hardening and Best Practices

12.1 Use a predictable DNS strategy

Options:

  • Rely on Docker’s embedded DNS with explicit upstreams in daemon.json (keeps service discovery; the usual default).
  • Set --dns per container for special cases.
  • Run a local caching forwarder on the host for split DNS and point Docker at it.

12.2 Keep Docker networks non-overlapping

Plan subnets to avoid VPN/corporate overlaps. Use default-address-pools to prevent surprises when new networks are created.

12.3 Add a standard “debug toolbox” workflow

Instead of modifying production images, keep a known debug container:

docker run --rm -it --network <network> nicolaka/netshoot bash

Common commands to memorize:

cat /etc/resolv.conf
getent hosts name
dig @127.0.0.11 name
dig @<upstream_dns> name
tcpdump -ni any port 53
ip route

12.4 Explicitly set resolver options for latency-sensitive apps

If search domains are unnecessary, reduce them. Consider:

  • ndots:1 when the app only resolves fully qualified names,
  • shorter timeout and bounded attempts,
  • explicit nameservers instead of inherited ones.

Example:

docker run --rm -it \
  --dns 10.0.0.2 \
  --dns-option ndots:1 \
  --dns-option timeout:2 \
  alpine:3.20 sh

12.5 Validate from the same network namespace as the app

Always test from:

  • the app container itself, and
  • a debug container attached to the same Docker network.

Testing from the host alone can mislead you because host DNS and routing may differ substantially.


Practical Debug Session (Putting It All Together)

Assume: curl https://example.com fails inside container with “Could not resolve host”.

  1. Check resolver config:

    docker exec -it app cat /etc/resolv.conf
    docker exec -it app cat /etc/nsswitch.conf
  2. Test system resolver and direct DNS:

    docker exec -it app getent hosts example.com
    docker exec -it app sh -lc 'if command -v dig >/dev/null; then dig example.com; else echo "dig not installed"; fi'
  3. Attach netshoot to same network and test:

    NET=$(docker inspect app --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}')
    docker run --rm -it --network "$NET" nicolaka/netshoot bash
    dig @127.0.0.11 example.com
    dig @1.1.1.1 example.com
    tcpdump -ni any port 53
  4. If upstream works but 127.0.0.11 fails:

    • Inspect Docker daemon DNS settings
    • Check host firewall rules and Docker logs:
      sudo journalctl -u docker --since "1 hour ago"
  5. If 127.0.0.11 works but getent/app fails:

    • Inspect search/ndots
    • Consider libc differences
    • Use strace on the failing command (the container needs ptrace permission, e.g. --cap-add SYS_PTRACE):
      docker exec -it app strace -f -e trace=network,openat,read,write curl -I https://example.com --max-time 5

This workflow reliably tells you whether the failure is:

  • client-side (resolv.conf, nsswitch, ndots/search, libc, or the app),
  • in Docker’s embedded DNS or its forwarding path,
  • in the upstream/host resolver policy, or
  • raw network reachability (routing, firewall, MTU).
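
The decision logic of steps 3-5 can be condensed into a tiny helper that turns the two probe results into a verdict. A sketch; `classify_dns_failure` and its labels are ours, not Docker’s:

```shell
# Classify a DNS failure from two facts gathered in the session above:
#   $1 = did "dig @127.0.0.11 name" succeed inside the network? (yes/no)
#   $2 = did "dig @1.1.1.1 name"   succeed inside the network? (yes/no)
classify_dns_failure() {
  case "$1,$2" in
    yes,*)  echo "client-side: search/ndots, nsswitch, libc, or the app" ;;
    no,yes) echo "embedded DNS or its forwarding path is broken" ;;
    no,no)  echo "network egress: routing, firewall, or MTU" ;;
  esac
}

classify_dns_failure no yes   # → embedded DNS or its forwarding path is broken
```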


Closing Notes

DNS inside Docker is not “just DNS”; it’s DNS + namespaces + forwarding + host resolver policy. The fastest path to a fix is to avoid guessing and instead:

  • test from the same network namespace as the app,
  • separate “can I resolve names” from “can I reach IPs”, and
  • capture packets when the answer isn’t obvious.

If you share (1) /etc/resolv.conf from the container, (2) resolvectl status from the host, and (3) the output of dig @127.0.0.11 example.com, you can usually pinpoint the root cause with high confidence.