
Debugging DNS Resolution Problems Inside Docker Containers (Advanced Guide)


DNS issues inside containers are deceptively tricky: the container’s network namespace, Docker’s embedded DNS, the host’s resolver configuration, corporate VPNs, split-horizon DNS, and firewall/NAT rules can all interact in ways that look like “DNS is broken” while the root cause is elsewhere. This guide focuses on systematic, advanced debugging with real commands and deep explanations so you can isolate the failure domain quickly.



1. Mental Model: How DNS Works in Docker

When a process inside a container resolves a name (e.g., api.example.com), it typically follows this chain:

  1. Application calls the system resolver (often via getaddrinfo()).
  2. The resolver consults:
    • /etc/nsswitch.conf (controls whether to use files, DNS, mDNS, etc.)
    • /etc/hosts
    • /etc/resolv.conf (nameservers, search domains, options)
  3. The query is sent to a nameserver IP listed in /etc/resolv.conf.

In Docker, /etc/resolv.conf inside the container is usually generated by Docker. On many Linux setups, you’ll see:

nameserver 127.0.0.11
options ndots:0

Docker’s embedded DNS then forwards queries to upstream resolvers (often derived from the host’s /etc/resolv.conf), and also answers container/service names on the same network (service discovery).

Key implication: A “DNS problem in the container” can be:

  • a resolver configuration issue inside the container (/etc/resolv.conf, /etc/nsswitch.conf, ndots/search),
  • a problem with Docker’s embedded DNS or its forwarding path,
  • an upstream/host resolver problem (systemd-resolved, VPN split DNS), or
  • a network problem that only looks like DNS (routing, firewall, MTU).

2. Quick Triage Checklist (Fast Isolation)

Run these in order to narrow down the problem:

2.1 Confirm the symptom inside the container

docker exec -it <container> sh
# or bash if available

Try:

getent hosts example.com

If getent fails, try raw DNS tools:

nslookup example.com
dig example.com

If those tools aren’t installed, see Section 5 for installing/debugging alternatives.

2.2 Check if it’s only DNS or general connectivity

ip route
ping -c 1 1.1.1.1
ping -c 1 8.8.8.8

If ping is blocked in your environment, try TCP connectivity:

# BusyBox / Alpine often has wget; Debian/Ubuntu often has curl
curl -I https://1.1.1.1 --max-time 5
# or, with BusyBox wget:
wget -q -T 5 -O /dev/null https://1.1.1.1

If you can reach IPs but not names, it’s likely DNS. If you can’t reach IPs, it’s a broader network issue.

2.3 Identify the configured nameserver

cat /etc/resolv.conf

If you see 127.0.0.11, you’re using Docker’s embedded DNS. If you see something like 127.0.0.53, that’s often systemd-resolved on the host, and it may not be reachable from the container unless Docker has copied it intentionally (and even then it can be problematic).


3. Inspect DNS Configuration Inside the Container

3.1 /etc/resolv.conf (nameservers, search, ndots)

Example:

nameserver 127.0.0.11
options ndots:0
search corp.example.com

Important fields:

  • nameserver: the DNS server IP the resolver queries (here, Docker’s embedded DNS at 127.0.0.11).
  • search: domains appended to unqualified names.
  • options ndots:N: how many dots a name needs before it is tried as an absolute name first.

Why ndots matters:
If ndots:5 and you query api.example.com (2 dots), the resolver may try search domains first (e.g., api.example.com.corp.example.com) before trying the absolute name. This can cause delays/timeouts that look like DNS failures.
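
The candidate-name order the resolver derives from ndots and search can be sketched as a small shell function. This is an approximation of glibc-style behavior, and `expand_query` plus the domains shown are illustrative, not a real tool:

```shell
# Approximate the order of names a resolver tries, given
# "options ndots:N" and the "search" list from /etc/resolv.conf.
expand_query() {
  name=$1 ndots=$2; shift 2          # remaining args: search domains
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -ge "$ndots" ]; then
    printf '%s\n' "$name"            # absolute name tried first
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
  else
    for d in "$@"; do printf '%s.%s\n' "$name" "$d"; done
    printf '%s\n' "$name"            # absolute name tried last
  fi
}

# With ndots:5, even a name that looks fully qualified goes through search first:
expand_query api.example.com 5 corp.example.com
# → api.example.com.corp.example.com
#   api.example.com
```

Each extra candidate is a round trip (and a possible NXDOMAIN wait), which is exactly where the mystery latency comes from.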

3.2 /etc/nsswitch.conf (resolution order)

grep '^hosts:' /etc/nsswitch.conf

Look for the hosts: line. Common examples:

hosts: files dns
hosts: files resolve [!UNAVAIL=return] dns

If dns is missing, your resolver may never query DNS (rare in containers, but possible in minimal images).

3.3 /etc/hosts

cat /etc/hosts

Sometimes a stale entry overrides DNS and causes confusion (e.g., example.com pinned to an old IP).
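
A quick way to spot such overrides is to scan a hosts-format file for the name you’re debugging. `hosts_overrides` is a hypothetical helper name; the IPs are examples:

```shell
# Print any hosts-file entries that would shadow DNS for a given name.
# Usage: hosts_overrides <name> [file]   (file defaults to /etc/hosts)
hosts_overrides() {
  name=$1 file=${2:-/etc/hosts}
  # Strip trailing comments, then match the name among the aliases.
  awk -v n="$name" '{ sub(/#.*/, "") }
    { for (i = 2; i <= NF; i++) if ($i == n) print $1, $i }' "$file"
}

# Example:
#   printf '10.1.2.3 example.com  # pinned\n' > /tmp/hosts.debug
#   hosts_overrides example.com /tmp/hosts.debug   # → 10.1.2.3 example.com
```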


4. Understand Docker’s Embedded DNS (127.0.0.11)

On user-defined bridge networks, Docker injects an internal DNS server at 127.0.0.11 inside each container. It provides:

  • name resolution for containers and services attached to the same network (service discovery, including network aliases), and
  • forwarding of all other queries to upstream resolvers (from daemon config, --dns flags, or the host’s /etc/resolv.conf).

4.1 Confirm the container is on a user-defined network

docker inspect <container> --format '{{json .NetworkSettings.Networks}}' | jq

If you see bridge only (the default docker0 bridge), behavior can differ depending on Docker version and settings. User-defined networks typically have better DNS/service discovery.

4.2 Inspect the network itself

docker network ls
docker network inspect <network_name> | jq '.[0].IPAM, .[0].Options, .[0].Containers'

Look for unusual options, subnets overlapping with VPN routes, or custom gateways.


5. Debug with dig, nslookup, getent, and strace

5.1 Use a dedicated debug container on the same network

If your application image is minimal, don’t pollute it—attach a toolbox container to the same network:

docker run --rm -it --network <network_name> nicolaka/netshoot bash

netshoot includes dig, tcpdump, iproute2, and more.

Alternatively:

docker run --rm -it --network <network_name> alpine:3.20 sh
apk add --no-cache bind-tools drill busybox-extras

5.2 Compare resolver paths: getent vs dig

Run:

getent hosts example.com
dig example.com
dig +search example.com

If dig works but getent fails, suspect:

  • /etc/nsswitch.conf (dns missing, or an earlier source short-circuiting),
  • a stale /etc/hosts entry,
  • libc resolver behavior (search/ndots handling, musl vs glibc differences).

5.3 Query Docker’s embedded DNS explicitly

If /etc/resolv.conf points to 127.0.0.11:

dig @127.0.0.11 example.com
dig @127.0.0.11 tasks.<service>  # in Swarm contexts

If that fails, try querying an upstream resolver directly (if reachable):

dig @1.1.1.1 example.com
dig @8.8.8.8 example.com

If upstream works but 127.0.0.11 fails, the embedded DNS or its forwarding path is broken.

5.4 Use strace to see what the app is doing

If you can reproduce with a small command (e.g., curl), trace DNS-related syscalls:

strace -f -e trace=network,connect,sendto,recvfrom,openat,read,write \
  curl -I https://example.com --max-time 5

Look for:

  • which resolver IP and port the process connects/sends to,
  • whether replies come back (recvfrom) or the calls time out,
  • which config files are opened (/etc/resolv.conf, /etc/nsswitch.conf, /etc/hosts).

This is especially useful when the application has its own DNS behavior.


6. Distinguish DNS Failures from Network Failures

6.1 Check routing and interface state

Inside the container:

ip addr
ip route

On the host, identify the veth pair and bridge:

docker inspect <container> --format '{{.NetworkSettings.SandboxKey}}'
# Example output: /var/run/docker/netns/xxxxxxxx

Then match the container’s eth0 to its host-side veth peer. Inside the container, the interface name encodes the peer’s host ifindex (e.g., eth0@if42):

# Inside the container: note the @ifNN suffix
ip link show eth0
# On the host: find the interface with that index (42 is just an example)
ip link | grep '^42:'

6.2 Test UDP/53 reachability to the resolver

If the resolver is 127.0.0.11, you’re testing connectivity to Docker’s embedded DNS (local inside namespace). If resolver is a real IP (e.g., 10.0.0.2), test:

# netcat may not be present; in netshoot it is
nc -vu -w 2 10.0.0.2 53

For TCP/53:

nc -vz -w 2 10.0.0.2 53

Some DNS servers require TCP for large responses or when UDP is blocked.

6.3 Look for MTU blackholes (DNS can be affected)

Large DNS responses (DNSSEC, many records) can fragment. If fragmentation is blocked, you get timeouts.

Inside container:

ip link show eth0

Try lowering MTU temporarily (in a test container) or test path MTU with tracepath (in netshoot):

tracepath 1.1.1.1

7. Check the Host: systemd-resolved, NetworkManager, and /etc/resolv.conf

Docker typically reads the host’s resolver configuration and propagates it (or uses daemon config). But modern Linux often uses systemd-resolved, which can create a stub resolver at 127.0.0.53 on the host.

7.1 Inspect host /etc/resolv.conf

On the host:

ls -l /etc/resolv.conf
cat /etc/resolv.conf

If it points to 127.0.0.53, Docker might copy that into containers in some setups, which is usually wrong because 127.0.0.53 inside a container refers to the container itself, not the host.

7.2 Check systemd-resolved status (host)

resolvectl status

Look for:

  • global vs per-link DNS servers,
  • DNS domains (search domains and routing domains like ~corp.example.com),
  • whether the 127.0.0.53 stub listener is active.

If your environment uses split DNS (e.g., *.corp.example.com via VPN DNS), Docker’s forwarding may not respect per-link rules unless configured carefully.

7.3 Configure Docker daemon DNS explicitly (host)

If upstream resolvers are flaky or the host uses a stub resolver, set DNS servers in Docker daemon config.

Edit (host):

sudo mkdir -p /etc/docker
sudo nano /etc/docker/daemon.json

Example:

{
  "dns": ["1.1.1.1", "8.8.8.8"],
  "dns-options": ["timeout:2", "attempts:3"],
  "dns-search": []
}

Then restart Docker:

sudo systemctl restart docker

Recreate containers to pick up changes.

Note: If you rely on corporate DNS or split DNS, hardcoding public resolvers may break internal names. In that case, set DNS to your corporate resolvers (reachable from Docker networks) or use a local caching forwarder that understands split DNS.
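
One way to build such a forwarder is dnsmasq with per-domain upstreams. This is a sketch: the domain, resolver IPs, and listen address are placeholders for your environment:

```ini
# /etc/dnsmasq.d/split-dns.conf (hypothetical values)
# Send corporate names to the VPN/corporate resolver...
server=/corp.example.com/10.0.0.2
# ...and everything else to a public resolver.
server=1.1.1.1
# Listen on an address reachable from Docker networks, e.g. the docker0 gateway:
listen-address=172.17.0.1
bind-interfaces
```

Then point Docker at it with "dns": ["172.17.0.1"] in daemon.json, so containers get split DNS without knowing about it.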


8. Common Root Causes and Fixes

8.1 Container has nameserver 127.0.0.53 (host stub leaked into container)

Symptom: DNS fails instantly or times out; dig @127.0.0.53 fails.

Fix options:

  • Set real resolvers in /etc/docker/daemon.json (see Section 7.3), or
  • Override DNS per container:

docker run --rm -it --dns 10.0.0.2 --dns 10.0.0.3 alpine:3.20 sh

8.2 VPN / split DNS not working from containers

Symptom: Host resolves internal.corp, container cannot.

Why: VPN client sets per-interface DNS rules; Docker’s embedded DNS forwards using a simpler upstream list and may not follow split routing rules.

Debug:

  • Compare resolvectl status on the host with dig output from a container.
  • From a container, query the VPN resolver directly: dig @<vpn_dns_ip> internal.corp.

Fix approaches:

  • Point Docker (daemon.json "dns" or per-container --dns) at the corporate resolver, if it is routable from Docker networks.
  • Run a local forwarder with per-domain rules on the host and use it as Docker’s DNS.
  • As a last resort, run the affected workload with network_mode: host (loses service discovery and isolation).

8.3 Subnet overlap between Docker networks and corporate/VPN networks

Symptom: Some domains resolve but connections fail; or DNS servers are “unreachable” from containers.

Why: If Docker uses 172.16.0.0/12 and your VPN also routes parts of that, packets may go the wrong way.

Debug:

ip route
docker network inspect bridge | jq '.[0].IPAM.Config'
docker network ls
docker network inspect <network> | jq '.[0].IPAM.Config'
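
Comparing those subnets by eye is error-prone. A small helper can check two IPv4 CIDR blocks for overlap; `cidr_overlap` is an illustrative sketch in plain shell arithmetic, not a Docker tool:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Succeed (exit 0) if two CIDR blocks overlap, e.g. a Docker network
# vs. a VPN route. The shorter (less specific) prefix decides.
cidr_overlap() {
  p1=${1#*/} p2=${2#*/}
  m=$(( p1 < p2 ? p1 : p2 ))
  mask=$(( m == 0 ? 0 : (0xFFFFFFFF << (32 - m)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "${1%/*}") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
}

cidr_overlap 172.17.0.0/16 172.16.0.0/12 && echo "overlap"    # → overlap
cidr_overlap 10.200.0.0/24 172.16.0.0/12 || echo "disjoint"   # → disjoint
```

Feed it each Docker subnet from the inspect output above against each VPN/corporate route from ip route.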

Fix: Create Docker networks on non-overlapping subnets:

docker network create --subnet 10.200.0.0/24 mynet

For the default bridge, you can change Docker’s default address pools in daemon.json:

{
  "default-address-pools": [
    {"base":"10.200.0.0/16","size":24}
  ]
}

Restart Docker and recreate networks/containers.

8.4 Firewall blocking UDP/53 (or TCP/53)

Symptom: dig times out; tcpdump shows queries leaving but no replies.

Debug:

tcpdump -ni any port 53

If queries leave but no response returns, check upstream firewall/VPN policies.

Fix: Allow DNS traffic from Docker subnets to DNS servers. On Linux hosts using nftables/iptables, rules vary widely; ensure NAT and forward policies permit it.

8.5 ndots and search domains causing long delays

Symptom: Resolution eventually works but takes seconds; app startup slow.

Debug: Check /etc/resolv.conf:

cat /etc/resolv.conf

If you see options ndots:5 and a search list, try:

time getent hosts example.com
time getent hosts example

Fix options:

  • Use fully qualified names with a trailing dot (e.g., example.com.) where the app allows it.
  • Trim unnecessary search domains.
  • Lower ndots per container:

docker run --rm -it --dns-option ndots:1 alpine:3.20 sh

In Compose, set dns_opt on the service:

services:
  app:
    dns_opt:
      - ndots:1

8.6 Alpine/musl vs Debian/glibc differences

Symptom: Same config works in Debian container but not in Alpine.

Why: musl libc resolver differs from glibc in search/timeout behavior and edge cases.

Debug: Use dig to bypass libc differences:

dig example.com

Fix:

  • Lower ndots and trim search domains (musl queries all configured nameservers in parallel and handles some search/ndots cases differently).
  • Use FQDNs with trailing dots.
  • If a lookup pattern only fails on musl, test on a glibc image (e.g., debian:bookworm-slim) to confirm.


9. Docker Compose and DNS: Service Discovery vs External Resolution

Compose creates a default network (unless configured otherwise), and service names become DNS names.

9.1 Verify service discovery

Assume services web and db on the same Compose network.

From web:

docker compose exec web getent hosts db
docker compose exec web dig db

If db doesn’t resolve:

  • Confirm both services are attached to the same network (see 9.2).
  • Check for mismatches between the service name and any container_name or network aliases in use.
  • Make sure neither service sets network_mode (host/none), which bypasses Compose networking.

9.2 Inspect Compose networks

docker compose ps
docker network ls | grep "$(basename "$PWD")"
docker network inspect <compose_network> | jq '.[0].Containers'

9.3 Beware network_mode: host

If a container uses host networking, it uses the host’s network stack and DNS behavior, not Docker’s embedded DNS. This can “fix” some DNS issues but breaks service discovery and isolation.
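
For illustration, the trade-off in Compose terms (service names and images are hypothetical):

```yaml
services:
  app:
    image: alpine:3.20
    network_mode: host    # host stack + host resolver; no embedded DNS
  db:
    image: postgres:16
    # "app" can no longer reach this as "db" via service discovery;
    # it must use localhost/published ports or the host's addresses.
```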


10. Advanced: Packet Capture and Query Tracing

When you need proof of where the query dies, capture packets.

10.1 Capture inside a debug container

Run netshoot on the same network:

docker run --rm -it --network <network_name> --cap-add NET_ADMIN nicolaka/netshoot bash

Capture DNS:

tcpdump -ni any port 53

In another terminal, trigger resolution:

docker exec -it <container> getent hosts example.com

Interpretation:

  • Query and reply both visible: the DNS path works; suspect client-side config (search/ndots, nsswitch, app behavior).
  • Query visible, no reply: forwarding, upstream, or firewall problem.
  • No query at all: the client never asked DNS (hosts file hit, nsswitch ordering, cache, or the app failed earlier).

10.2 Capture on the host (bridge interface)

Identify the bridge:

docker network inspect <network_name> | jq -r '.[0].Options["com.docker.network.bridge.name"]'

If null, it might be something like br-<id>. List bridges:

ip link show type bridge

Capture:

sudo tcpdump -ni br-xxxxxxxx port 53

This helps confirm whether packets leave the container namespace and reach the host bridge.

10.3 Query tracing with dig +trace

+trace performs iterative resolution itself (root servers, then TLD servers, then the zone’s authoritative servers), largely bypassing your configured resolver:

dig +trace example.com

If dig +trace works but normal dig example.com fails, your configured resolver or forwarding path is the issue, not global DNS.


11. Advanced: IPv6, DNS over TLS/HTTPS, and MTU Edge Cases

11.1 IPv6 inside containers

If your app prefers IPv6 and Docker/network doesn’t support it properly, you can see confusing failures.

Check:

ip -6 addr
getent ahosts example.com

If AAAA records resolve but connectivity fails, you might need to:

  • force IPv4 in the client (curl -4, app settings, or glibc gai.conf preferences),
  • enable proper IPv6 in Docker ("ipv6": true with a "fixed-cidr-v6" in daemon.json), or
  • fix IPv6 routing/firewalling upstream.

Test:

curl -4 -I https://example.com --max-time 5
curl -6 -I https://example.com --max-time 5

11.2 DNS over HTTPS/TLS (DoH/DoT)

Some environments intercept or block UDP/53 but allow HTTPS. If your container uses a DoH client (or a library that does), the “DNS issue” might actually be HTTPS egress restrictions, proxy requirements, or certificate interception.

Debug by verifying:

  • whether any UDP/53 traffic leaves the container at all (tcpdump),
  • HTTPS egress to the DoH endpoint (and any required proxy settings),
  • TLS trust inside the container (are corporate CA certificates installed?).

Use strace or application logs to confirm.

11.3 MTU and fragmentation

DNS responses with DNSSEC can exceed typical UDP sizes. If fragmentation is blocked, you get timeouts.

Debug with dig forcing smaller sizes:

dig example.com +dnssec
dig example.com +bufsize=1232

If +bufsize=1232 works but default fails, suspect PMTU/fragmentation issues.


12. Hardening and Best Practices

12.1 Use a predictable DNS strategy

Options:

  • Rely on Docker’s embedded DNS with explicit upstreams in daemon.json (keeps service discovery; the usual default).
  • Set --dns per container for special cases.
  • Run a local caching forwarder on the host for split DNS and point Docker at it.

12.2 Keep Docker networks non-overlapping

Plan subnets to avoid VPN/corporate overlaps. Use default-address-pools to prevent surprises when new networks are created.

12.3 Add a standard “debug toolbox” workflow

Instead of modifying production images, keep a known debug container:

docker run --rm -it --network <network> nicolaka/netshoot bash

Common commands to memorize:

cat /etc/resolv.conf
getent hosts name
dig @127.0.0.11 name
dig @<upstream_dns> name
tcpdump -ni any port 53
ip route

12.4 Explicitly set resolver options for latency-sensitive apps

If search domains are unnecessary, reduce them. Consider:

  • ndots:1 when the app only resolves fully qualified names,
  • shorter timeout and bounded attempts,
  • explicit nameservers instead of inherited ones.

Example:

docker run --rm -it \
  --dns 10.0.0.2 \
  --dns-option ndots:1 \
  --dns-option timeout:2 \
  alpine:3.20 sh

12.5 Validate from the same network namespace as the app

Always test from:

  • the app container itself, and
  • a debug container attached to the same Docker network.

Testing from the host alone can mislead you because host DNS and routing may differ substantially.


Practical Debug Session (Putting It All Together)

Assume: curl https://example.com fails inside container with “Could not resolve host”.

  1. Check resolver config:

    docker exec -it app cat /etc/resolv.conf
    docker exec -it app cat /etc/nsswitch.conf
  2. Test system resolver and direct DNS:

    docker exec -it app getent hosts example.com
    docker exec -it app sh -lc 'if command -v dig >/dev/null; then dig example.com; else echo "dig not installed"; fi'
  3. Attach netshoot to same network and test:

    NET=$(docker inspect app --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}')
    docker run --rm -it --network "$NET" nicolaka/netshoot bash
    dig @127.0.0.11 example.com
    dig @1.1.1.1 example.com
    tcpdump -ni any port 53
  4. If upstream works but 127.0.0.11 fails:

    • Inspect Docker daemon DNS settings
    • Check host firewall rules and Docker logs:
      sudo journalctl -u docker --since "1 hour ago"
  5. If 127.0.0.11 works but getent/app fails:

    • Inspect search/ndots
    • Consider libc differences
    • Use strace on the failing command (the container needs ptrace permission, e.g. --cap-add SYS_PTRACE):
      docker exec -it app strace -f -e trace=network,openat,read,write curl -I https://example.com --max-time 5

This workflow reliably tells you whether the failure is:

  • client-side (resolv.conf, nsswitch, ndots/search, libc, or the app),
  • in Docker’s embedded DNS or its forwarding path,
  • in the upstream/host resolver policy, or
  • raw network reachability (routing, firewall, MTU).
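
The decision logic of steps 3-5 can be condensed into a tiny helper that turns the two probe results into a verdict. A sketch; `classify_dns_failure` and its labels are ours, not Docker’s:

```shell
# Classify a DNS failure from two facts gathered in the session above:
#   $1 = did "dig @127.0.0.11 name" succeed inside the network? (yes/no)
#   $2 = did "dig @1.1.1.1 name"   succeed inside the network? (yes/no)
classify_dns_failure() {
  case "$1,$2" in
    yes,*)  echo "client-side: search/ndots, nsswitch, libc, or the app" ;;
    no,yes) echo "embedded DNS or its forwarding path is broken" ;;
    no,no)  echo "network egress: routing, firewall, or MTU" ;;
  esac
}

classify_dns_failure no yes   # → embedded DNS or its forwarding path is broken
```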


Closing Notes

DNS inside Docker is not “just DNS”; it’s DNS + namespaces + forwarding + host resolver policy. The fastest path to a fix is to avoid guessing and instead:

  • test from the same network namespace as the app,
  • separate “can I resolve names” from “can I reach IPs”, and
  • capture packets when the answer isn’t obvious.

If you share (1) /etc/resolv.conf from the container, (2) resolvectl status from the host, and (3) the output of dig @127.0.0.11 example.com, you can usually pinpoint the root cause with high confidence.