Debugging DNS Resolution Problems Inside Docker Containers (Advanced Guide)
DNS issues inside containers are deceptively tricky: the container’s network namespace, Docker’s embedded DNS, the host’s resolver configuration, corporate VPNs, split-horizon DNS, and firewall/NAT rules can all interact in ways that look like “DNS is broken” while the root cause is elsewhere. This guide focuses on systematic, advanced debugging with real commands and deep explanations so you can isolate the failure domain quickly.
Table of Contents
- 1. Mental Model: How DNS Works in Docker
- 2. Quick Triage Checklist (Fast Isolation)
- 3. Inspect DNS Configuration Inside the Container
- 4. Understand Docker’s Embedded DNS (127.0.0.11)
- 5. Debug with dig, nslookup, getent, and strace
- 6. Distinguish DNS Failures from Network Failures
- 7. Check the Host: systemd-resolved, NetworkManager, and /etc/resolv.conf
- 8. Common Root Causes and Fixes
- 9. Docker Compose and DNS: Service Discovery vs External Resolution
- 10. Advanced: Packet Capture and Query Tracing
- 11. Advanced: IPv6, DNS over TLS/HTTPS, and MTU Edge Cases
- 12. Hardening and Best Practices
1. Mental Model: How DNS Works in Docker
When a process inside a container resolves a name (e.g., api.example.com), it typically follows this chain:
- The application calls the system resolver (often via getaddrinfo()).
- The resolver consults:
  - /etc/nsswitch.conf (controls whether to use files, DNS, mDNS, etc.)
  - /etc/hosts
  - /etc/resolv.conf (nameservers, search domains, options)
- The query is sent to a nameserver IP listed in /etc/resolv.conf.
In Docker, /etc/resolv.conf inside the container is usually generated by Docker. On many Linux setups, you’ll see:
- nameserver 127.0.0.11 (Docker’s embedded DNS for user-defined bridge networks)
- plus options ndots:0 or ndots:5, depending on version/config
- optional search ... domains
Docker’s embedded DNS then forwards queries to upstream resolvers (often derived from the host’s /etc/resolv.conf), and also answers container/service names on the same network (service discovery).
Key implication: A “DNS problem in the container” can be:
- The container can’t reach Docker’s DNS (iptables, network namespace issues)
- Docker’s DNS can’t reach upstream resolvers (host/VPN/firewall)
- Upstream resolvers return different answers than expected (split DNS)
- The resolver library behavior differs (glibc vs musl; ndots/search behavior)
- It’s not DNS at all (routing, MTU, TCP fallback, or blocked UDP/53)
2. Quick Triage Checklist (Fast Isolation)
Run these in order to narrow down the problem:
2.1 Confirm the symptom inside the container
docker exec -it <container> sh
# or bash if available
Try:
getent hosts example.com
If getent fails, try raw DNS tools:
nslookup example.com
dig example.com
If those tools aren’t installed, see Section 5 for installing/debugging alternatives.
2.2 Check if it’s only DNS or general connectivity
ip route
ping -c 1 1.1.1.1
ping -c 1 8.8.8.8
If ping is blocked in your environment, try TCP connectivity:
# BusyBox / Alpine often has wget; Debian/Ubuntu often has curl
curl -I https://1.1.1.1 --max-time 5
If you can reach IPs but not names, it’s likely DNS. If you can’t reach IPs, it’s a broader network issue.
2.3 Identify the configured nameserver
cat /etc/resolv.conf
If you see 127.0.0.11, you’re using Docker’s embedded DNS. If you see something like 127.0.0.53, that’s often systemd-resolved on the host, and it may not be reachable from the container unless Docker has copied it intentionally (and even then it can be problematic).
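This check can be scripted. The sketch below is a convenience helper written for this guide (the classification labels are informal, not Docker terminology); the path is a parameter so you can run it against a copied file:

```shell
# Classify the first nameserver in a resolv.conf (sketch; labels are informal).
# Usage: classify_resolv /etc/resolv.conf
classify_resolv() {
  local ns
  ns=$(awk '$1 == "nameserver" { print $2; exit }' "$1")
  case "$ns" in
    127.0.0.11) echo "docker-embedded" ;;   # Docker's DNS on user-defined networks
    127.0.0.53) echo "resolved-stub" ;;     # host's systemd-resolved stub leaked in
    "")         echo "no-nameserver" ;;     # libc will fall back to localhost
    *)          echo "direct: $ns" ;;       # external or host-provided resolver
  esac
}
```

To run it against a container without installing anything inside it: `docker exec <container> cat /etc/resolv.conf > /tmp/r.conf && classify_resolv /tmp/r.conf`.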
3. Inspect DNS Configuration Inside the Container
3.1 /etc/resolv.conf (nameservers, search, ndots)
Example:
nameserver 127.0.0.11
options ndots:0
search corp.example.com
Important fields:
- nameserver: where DNS queries go
- search: suffixes appended to short names (e.g., db → db.corp.example.com)
- options ndots:N: controls when a name is treated as “absolute” vs “relative”
Why ndots matters:
If ndots:5 and you query api.example.com (only 2 dots, fewer than 5), the resolver tries the search domains first (e.g., api.example.com.corp.example.com) before trying the absolute name. This can cause delays/timeouts that look like DNS failures.
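The candidate ordering can be simulated in a few lines of shell. This is a simplified sketch of glibc-style behavior (real resolvers have more edge cases, and musl differs), with a made-up function name:

```shell
# Print the order of names a glibc-style resolver would try (simplified sketch).
# Usage: resolve_order <name> <ndots> [search-domain...]
resolve_order() {
  local name=$1 ndots=$2 d dots
  shift 2
  dots=${name//[!.]/}                 # strip everything except the dots
  if (( ${#dots} >= ndots )); then
    echo "$name"                      # enough dots: try the name as-is first
    for d in "$@"; do echo "$name.$d"; done
  else
    for d in "$@"; do echo "$name.$d"; done
    echo "$name"                      # too few dots: tried after the search list
  fi
}
```

`resolve_order api.example.com 5 corp.example.com` puts the search-domain candidate first — exactly the extra round trip (and potential NXDOMAIN wait) described above.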
3.2 /etc/nsswitch.conf (resolution order)
sed -n '1,120p' /etc/nsswitch.conf
Look for the hosts: line. Common examples:
- Debian/Ubuntu: hosts: files mdns4_minimal [NOTFOUND=return] dns
- Alpine (musl-based) may differ, and its behavior can be simpler.
If dns is missing, your resolver may never query DNS (rare in containers, but possible in minimal images).
3.3 /etc/hosts
cat /etc/hosts
Sometimes a stale entry overrides DNS and causes confusion (e.g., example.com pinned to an old IP).
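A quick way to scan a hosts file for pins on a specific name (the helper name is made up for this guide; the path is a parameter so it also works on a copied file):

```shell
# Print any hosts-file entries that pin the given name (comment lines ignored).
# Usage: hosts_pins example.com [hosts_file]
hosts_pins() {
  awk -v n="$1" '
    !/^[[:space:]]*#/ {
      for (i = 2; i <= NF; i++)   # field 1 is the IP; fields 2..NF are names
        if ($i == n) print $1, $i
    }' "${2:-/etc/hosts}"
}
```

Any output means the name never reaches DNS at all (assuming the usual `hosts: files ... dns` ordering).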
4. Understand Docker’s Embedded DNS (127.0.0.11)
On user-defined bridge networks, Docker injects an internal DNS server at 127.0.0.11 inside each container. It provides:
- Service discovery: container names and Compose service names resolve to container IPs
- Forwarding: external names are forwarded to upstream resolvers (derived from host config or daemon config)
4.1 Confirm the container is on a user-defined network
docker inspect <container> --format '{{json .NetworkSettings.Networks}}' | jq
If you see bridge only (the default docker0 bridge), behavior can differ depending on Docker version and settings. User-defined networks typically have better DNS/service discovery.
4.2 Inspect the network itself
docker network ls
docker network inspect <network_name> | jq '.[0].IPAM, .[0].Options, .[0].Containers'
Look for unusual options, subnets overlapping with VPN routes, or custom gateways.
5. Debug with dig, nslookup, getent, and strace
5.1 Use a dedicated debug container on the same network
If your application image is minimal, don’t pollute it—attach a toolbox container to the same network:
docker run --rm -it --network <network_name> nicolaka/netshoot bash
netshoot includes dig, tcpdump, iproute2, and more.
Alternatively:
docker run --rm -it --network <network_name> alpine:3.20 sh
apk add --no-cache bind-tools drill busybox-extras
5.2 Compare resolver paths: getent vs dig
- getent hosts example.com uses the system’s NSS configuration and resolver settings.
- dig example.com queries DNS directly and bypasses NSS ordering entirely.
Run:
getent hosts example.com
dig example.com
dig +search example.com
If dig works but getent fails, suspect:
- nsswitch.conf ordering
- search/ndots behavior causing timeouts
- the application using a different resolver path (e.g., Go’s net resolver modes)
5.3 Query Docker’s embedded DNS explicitly
If /etc/resolv.conf points to 127.0.0.11:
dig @127.0.0.11 example.com
dig @127.0.0.11 tasks.<service> # in Swarm contexts
If that fails, try querying an upstream resolver directly (if reachable):
dig @1.1.1.1 example.com
dig @8.8.8.8 example.com
If upstream works but 127.0.0.11 fails, the embedded DNS or its forwarding path is broken.
5.4 Use strace to see what the app is doing
If you can reproduce with a small command (e.g., curl), trace DNS-related syscalls:
strace -f -e trace=network,connect,sendto,recvfrom,openat,read,write \
curl -I https://example.com --max-time 5
Look for:
- Reads of /etc/resolv.conf, /etc/nsswitch.conf, /etc/hosts
- UDP packets to port 53 (often to 127.0.0.11)
- Timeouts or ECONNREFUSED
This is especially useful when the application has its own DNS behavior.
6. Distinguish DNS Failures from Network Failures
6.1 Check routing and interface state
Inside the container:
ip addr
ip route
On the host, identify the veth pair and bridge:
docker inspect <container> --format '{{.NetworkSettings.SandboxKey}}'
# Example output: /var/run/docker/netns/xxxxxxxx
Then:
# List interfaces on host
ip link
6.2 Test UDP/53 reachability to the resolver
If the resolver is 127.0.0.11, you’re testing connectivity to Docker’s embedded DNS (local inside namespace). If resolver is a real IP (e.g., 10.0.0.2), test:
# netcat may not be present; in netshoot it is
nc -vu -w 2 10.0.0.2 53
For TCP/53:
nc -vz -w 2 10.0.0.2 53
Some DNS servers require TCP for large responses or when UDP is blocked.
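If netcat isn’t available but bash is, its /dev/tcp pseudo-device can stand in for the TCP check. This is a bash-only feature (not POSIX sh, so it won’t work in plain BusyBox ash), and it cannot meaningfully probe UDP, so treat it strictly as a TCP/53 test:

```shell
# TCP reachability check using bash's /dev/tcp (no netcat required).
# Usage: tcp_check <host> [port] [timeout_seconds]
tcp_check() {
  local host=$1 port=${2:-53} secs=${3:-2}
  if timeout "$secs" bash -c ">/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "closed/filtered: $host:$port"
  fi
}
```

`tcp_check 10.0.0.2 53` is the /dev/tcp equivalent of the nc -vz test above.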
6.3 Look for MTU blackholes (DNS can be affected)
Large DNS responses (DNSSEC, many records) can fragment. If fragmentation is blocked, you get timeouts.
Inside container:
ip link show eth0
Try lowering MTU temporarily (in a test container) or test path MTU with tracepath (in netshoot):
tracepath 1.1.1.1
7. Check the Host: systemd-resolved, NetworkManager, and /etc/resolv.conf
Docker typically reads the host’s resolver configuration and propagates it (or uses daemon config). But modern Linux often uses systemd-resolved, which can create a stub resolver at 127.0.0.53 on the host.
7.1 Inspect host /etc/resolv.conf
On the host:
ls -l /etc/resolv.conf
cat /etc/resolv.conf
If it points to 127.0.0.53, Docker might copy that into containers in some setups, which is usually wrong because 127.0.0.53 inside a container refers to the container itself, not the host.
7.2 Check systemd-resolved status (host)
resolvectl status
Look for:
- DNS Servers
- DNS Domain (search domains)
- Per-link DNS settings (VPN interfaces often set these)
If your environment uses split DNS (e.g., *.corp.example.com via VPN DNS), Docker’s forwarding may not respect per-link rules unless configured carefully.
7.3 Configure Docker daemon DNS explicitly (host)
If upstream resolvers are flaky or the host uses a stub resolver, set DNS servers in Docker daemon config.
Edit (host):
sudo mkdir -p /etc/docker
sudo nano /etc/docker/daemon.json
Example:
{
"dns": ["1.1.1.1", "8.8.8.8"],
"dns-options": ["timeout:2", "attempts:3"],
"dns-search": []
}
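A syntax error in daemon.json prevents dockerd from starting at all, so it is worth validating the file before restarting. A minimal sketch, assuming python3 is on the host (jq -e . works just as well if installed):

```shell
# Validate a daemon.json before restarting Docker -- a JSON syntax error
# here stops dockerd from starting.
check_daemon_json() {
  local f=${1:-/etc/docker/daemon.json}
  if python3 -m json.tool "$f" >/dev/null 2>&1; then
    echo "valid JSON: $f"
  else
    echo "INVALID JSON: $f" >&2
    return 1
  fi
}
```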
Then restart Docker:
sudo systemctl restart docker
Recreate containers to pick up changes.
Note: If you rely on corporate DNS or split DNS, hardcoding public resolvers may break internal names. In that case, set DNS to your corporate resolvers (reachable from Docker networks) or use a local caching forwarder that understands split DNS.
8. Common Root Causes and Fixes
8.1 Container has nameserver 127.0.0.53 (host stub leaked into container)
Symptom: DNS fails instantly or times out; dig @127.0.0.53 fails.
Fix options:
- Configure the Docker daemon’s "dns": [...] as above
- Or run the container with explicit DNS:
docker run --rm -it --dns 10.0.0.2 --dns 10.0.0.3 alpine:3.20 sh
8.2 VPN / split DNS not working from containers
Symptom: Host resolves internal.corp, container cannot.
Why: VPN client sets per-interface DNS rules; Docker’s embedded DNS forwards using a simpler upstream list and may not follow split routing rules.
Debug:
- On the host: resolvectl status to see which interface provides which DNS
- In the container: dig internal.corp and dig @<corp_dns> internal.corp
Fix approaches:
- Use corporate DNS servers directly for Docker (daemon.json)
- Ensure Docker subnets do not overlap with VPN routes (see next section)
- Run a local DNS forwarder on the host (e.g., dnsmasq or unbound) that implements split DNS, and point Docker to it (at a host IP reachable from containers)
8.3 Subnet overlap between Docker networks and corporate/VPN networks
Symptom: Some domains resolve but connections fail; or DNS servers are “unreachable” from containers.
Why: If Docker uses 172.16.0.0/12 and your VPN also routes parts of that, packets may go the wrong way.
Debug:
- Host routes:
ip route
- Docker network subnets:
docker network inspect bridge | jq '.[0].IPAM.Config'
docker network ls
docker network inspect <network> | jq '.[0].IPAM.Config'
Fix: Create Docker networks on non-overlapping subnets:
docker network create --subnet 10.200.0.0/24 mynet
For the default bridge, you can change Docker’s default address pools in daemon.json:
{
"default-address-pools": [
{"base":"10.200.0.0/16","size":24}
]
}
Restart Docker and recreate networks/containers.
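Checking overlap by eye is error-prone; the arithmetic is just “do the two networks agree under the shorter prefix’s mask”. A rough IPv4-only sketch (no input validation):

```shell
# Return success if two IPv4 CIDR blocks overlap (IPv4 only, rough sketch).
ip2int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}
cidr_overlap() {
  local p1=${1#*/} p2=${2#*/} i1 i2 short mask
  i1=$(ip2int "${1%/*}")
  i2=$(ip2int "${2%/*}")
  short=$(( p1 < p2 ? p1 : p2 ))   # overlap is decided by the shorter prefix
  mask=$(( short == 0 ? 0 : (0xFFFFFFFF << (32 - short)) & 0xFFFFFFFF ))
  (( (i1 & mask) == (i2 & mask) ))
}
```

For example, `cidr_overlap 172.17.0.0/16 172.16.0.0/12 && echo overlap` flags the classic collision between Docker’s default bridge subnet and a corporate 172.16.0.0/12 range.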
8.4 Firewall blocking UDP/53 (or TCP/53)
Symptom: dig times out; tcpdump shows queries leaving but no replies.
Debug:
- On host: check firewall rules (iptables/nftables)
- In netshoot container: capture traffic:
tcpdump -ni any port 53
If queries leave but no response returns, check upstream firewall/VPN policies.
Fix: Allow DNS traffic from Docker subnets to DNS servers. On Linux hosts using nftables/iptables, rules vary widely; ensure NAT and forward policies permit it.
8.5 ndots and search domains causing long delays
Symptom: Resolution eventually works but takes seconds; app startup slow.
Debug:
Check /etc/resolv.conf:
cat /etc/resolv.conf
If you see options ndots:5 and a search list, try:
time getent hosts example.com
time getent hosts example
Fix options:
- Reduce search domains
- Set ndots:1 or ndots:0 for containers that mostly use FQDNs:
docker run --rm -it --dns-option ndots:1 alpine:3.20 sh
In Compose, docker compose run does not expose a --dns-option flag; set dns_opt on the service instead:
services:
  <service>:
    dns_opt:
      - ndots:1
8.6 Alpine/musl vs Debian/glibc differences
Symptom: Same config works in Debian container but not in Alpine.
Why: musl libc resolver differs from glibc in search/timeout behavior and edge cases.
Debug:
Use dig to bypass libc differences:
dig example.com
Fix:
- Prefer consistent base images for network-sensitive apps
- Explicitly configure resolver options
- Consider glibc-based images if you hit musl-specific resolver limitations
9. Docker Compose and DNS: Service Discovery vs External Resolution
Compose creates a default network (unless configured otherwise), and service names become DNS names.
9.1 Verify service discovery
Assume services web and db on the same Compose network.
From web:
docker compose exec web getent hosts db
docker compose exec web dig db
If db doesn’t resolve:
- Ensure both services share the same network
- Ensure you didn’t set network_mode: host for one service and not the other
- Check for multiple networks and which one is the default
9.2 Inspect Compose networks
docker compose ps
docker network ls | grep "$(basename "$PWD")"
docker network inspect <compose_network> | jq '.[0].Containers'
9.3 Beware network_mode: host
If a container uses host networking, it uses the host’s network stack and DNS behavior, not Docker’s embedded DNS. This can “fix” some DNS issues but breaks service discovery and isolation.
10. Advanced: Packet Capture and Query Tracing
When you need proof of where the query dies, capture packets.
10.1 Capture inside a debug container
Run netshoot on the same network:
docker run --rm -it --network <network_name> --cap-add NET_ADMIN nicolaka/netshoot bash
Capture DNS:
tcpdump -ni any port 53
In another terminal, trigger resolution:
docker exec -it <container> getent hosts example.com
Interpretation:
- You should see a query from container IP to 127.0.0.11 (if embedded DNS)
- Then a forwarded query from Docker (on the host side) to upstream resolvers
- If you only see the first hop, forwarding is failing
- If you see forwarded queries but no replies, upstream path is failing
10.2 Capture on the host (bridge interface)
Identify the bridge:
docker network inspect <network_name> | jq -r '.[0].Options["com.docker.network.bridge.name"]'
If null, it might be something like br-<id>. List bridges:
ip link show type bridge
Capture:
sudo tcpdump -ni br-xxxxxxxx port 53
This helps confirm whether packets leave the container namespace and reach the host bridge.
10.3 Query tracing with dig +trace
+trace walks the DNS hierarchy and bypasses your configured resolver (it queries root servers, then TLD, etc.):
dig +trace example.com
If dig +trace works but normal dig example.com fails, your configured resolver or forwarding path is the issue, not global DNS.
11. Advanced: IPv6, DNS over TLS/HTTPS, and MTU Edge Cases
11.1 IPv6 inside containers
If your app prefers IPv6 and Docker/network doesn’t support it properly, you can see confusing failures.
Check:
ip -6 addr
getent ahosts example.com
If AAAA records resolve but connectivity fails, you might need to:
- Enable IPv6 in Docker daemon
- Or force IPv4 in the application (e.g., curl -4 ...)
Test:
curl -4 -I https://example.com --max-time 5
curl -6 -I https://example.com --max-time 5
11.2 DNS over HTTPS/TLS (DoH/DoT)
Some environments intercept or block UDP/53 but allow HTTPS. If your container uses a DoH client (or a library that does), the “DNS issue” might actually be HTTPS egress restrictions, proxy requirements, or certificate interception.
Debug by verifying:
- Whether the app is using system DNS at all
- Whether it connects to known DoH endpoints
Use strace or application logs to confirm.
11.3 MTU and fragmentation
DNS responses with DNSSEC can exceed typical UDP sizes. If fragmentation is blocked, you get timeouts.
Debug with dig forcing smaller sizes:
dig example.com +dnssec
dig example.com +bufsize=1232
If +bufsize=1232 works but default fails, suspect PMTU/fragmentation issues.
12. Hardening and Best Practices
12.1 Use a predictable DNS strategy
Options:
- Use Docker embedded DNS (default on user-defined networks) and ensure upstream resolvers are correct.
- Pin DNS servers at the daemon level (/etc/docker/daemon.json) for consistent behavior.
- Run a local caching resolver (dnsmasq/unbound) reachable from containers for better performance and control (timeouts, split DNS).
12.2 Keep Docker networks non-overlapping
Plan subnets to avoid VPN/corporate overlaps. Use default-address-pools to prevent surprises when new networks are created.
12.3 Add a standard “debug toolbox” workflow
Instead of modifying production images, keep a known debug container:
docker run --rm -it --network <network> nicolaka/netshoot bash
Common commands to memorize:
cat /etc/resolv.conf
getent hosts name
dig @127.0.0.11 name
dig @<upstream_dns> name
tcpdump -ni any port 53
ip route
12.4 Explicitly set resolver options for latency-sensitive apps
If search domains are unnecessary, reduce them. Consider:
- --dns-option ndots:1
- --dns-option timeout:2
- --dns-option attempts:2
Example:
docker run --rm -it \
--dns 10.0.0.2 \
--dns-option ndots:1 \
--dns-option timeout:2 \
alpine:3.20 sh
12.5 Validate from the same network namespace as the app
Always test from:
- the same container, or
- a debug container attached to the same Docker network
Testing from the host alone can mislead you because host DNS and routing may differ substantially.
Practical Debug Session (Putting It All Together)
Assume: curl https://example.com fails inside container with “Could not resolve host”.
1. Check resolver config:
docker exec -it app cat /etc/resolv.conf
docker exec -it app cat /etc/nsswitch.conf
2. Test the system resolver and direct DNS:
docker exec -it app getent hosts example.com
docker exec -it app sh -lc 'command -v dig && dig example.com || echo "dig not installed"'
3. Attach netshoot to the same network and test:
NET=$(docker inspect app --format '{{range $k,$v := .NetworkSettings.Networks}}{{$k}}{{end}}')
docker run --rm -it --network "$NET" nicolaka/netshoot bash
dig @127.0.0.11 example.com
dig @1.1.1.1 example.com
tcpdump -ni any port 53
4. If upstream works but 127.0.0.11 fails:
- Inspect Docker daemon DNS settings
- Check host firewall rules and Docker logs:
sudo journalctl -u docker --since "1 hour ago"
5. If 127.0.0.11 works but getent/the app fails:
- Inspect search/ndots
- Consider libc differences
- Use strace on the failing command:
docker exec -it app strace -f -e trace=network,openat,read,write curl -I https://example.com --max-time 5
This workflow reliably tells you whether the failure is:
- application resolver behavior
- container resolver config
- Docker embedded DNS forwarding
- upstream DNS reachability
- routing/firewall/VPN/MTU issues
Closing Notes
DNS inside Docker is not “just DNS”; it’s DNS + namespaces + forwarding + host resolver policy. The fastest path to a fix is to avoid guessing and instead:
- inspect /etc/resolv.conf and nsswitch.conf
- compare getent vs dig
- query @127.0.0.11 vs upstream resolvers
- capture traffic on port 53 when needed
- validate host resolver and VPN split-DNS behavior
If you share (1) /etc/resolv.conf from the container, (2) resolvectl status from the host, and (3) the output of dig @127.0.0.11 example.com, you can usually pinpoint the root cause with high confidence.