In Kubernetes operations, container isolation is both a core advantage and a debugging challenge. Minimal container images often lack tools such as tcpdump and telnet, which hinders network troubleshooting, and namespace isolation makes performance bottlenecks hard to locate. This article combines a lightweight debugging container with the nsenter command, which breaks through container isolation, into a practical method that covers both network debugging and system performance analysis.
Tool Introduction

In K8S operations, network connectivity issues (such as DNS resolution failures and service access timeouts) and system performance bottlenecks (such as CPU spikes and I/O blocking) account for over 60% of problems. busybox-dig and nsenter together cover both: the former supplies the tools missing inside containers, and the latter breaks isolation so that performance can be analyzed from the host.
- busybox-dig
A lightweight network debugging image that combines a slimmed-down busybox with the dig command. It is only a few MB in size and can be run as a temporary container, which avoids polluting the environment by installing additional tools in business containers.
- nsenter
A tool for breaking container isolation: a namespace entry tool that ships with most Linux distributions. Given a container's PID, it can enter that container's network, mount, UTS, and other namespaces, so resources inside the container can be operated on from the host. In K8S, containers are essentially processes on the host, so nsenter makes it possible to apply host performance analysis tools such as top, sar, and iostat, which solves the problem of performance troubleshooting when those tools are missing inside the container.

Operational Method

When the backend service times out, first use busybox-dig to troubleshoot the network, then use nsenter to locate performance bottlenecks.

First, check network connectivity with busybox-dig. When a service named backend-svc times out, verify DNS resolution and port connectivity from a debugging container.

Deploying the Debugging Container:
kubectl run busyboxdig -it --image=datica/busybox-dig --restart=Never --rm sh
Parameter Explanation:
-it # Enable interactive mode to meet command line operation needs;
--image=datica/busybox-dig # Specify the image containing the dig tool;
--restart=Never # Run as a one-time container so it is not restarted after it exits or fails;
--rm # Automatically delete the container after exit to avoid leftover resources.

This command quickly deploys a temporary debugging container, which addresses the pain point of business containers lacking network debugging tools. After executing the command, you enter the container's interactive terminal. If the business container runs in a specific namespace, add -n <namespace> so that the debugging Pod is created in the same namespace and is subject to the same DNS search path and network policies.

DNS Resolution Verification:
dig backend-svc.default.svc.cluster.local
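When the lookup needs to run from a script rather than be read by eye, dig's +short option prints only the answer data, which makes a pass/fail test simple. The sketch below assumes the answer is an IPv4 address; has_ipv4 is a hypothetical helper name, not part of dig or busybox.

```shell
# Hypothetical helper: succeeds if stdin contains at least one line
# that looks like an IPv4 address (as printed by `dig +short`).
has_ipv4() {
  grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Usage inside the debugging container (not executed here):
#   dig +short backend-svc.default.svc.cluster.local | has_ipv4 \
#     && echo "DNS OK" || echo "DNS FAILED"
```

The pattern only checks the shape of the line, not that each octet is in range, which is enough to tell an answer from an empty result.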
If the ANSWER SECTION of the dig output does not contain an IP address for the backend, CoreDNS resolution is abnormal, and you need to check the status of the CoreDNS Pods and the Service configuration. If resolution is normal, proceed to the next step.

Port Connectivity Test:
telnet 10.244.10.125 80
nc -zv 10.244.10.125 80
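To test several ports or to run the check from a script, the nc form can be wrapped in a small function. This is a minimal sketch: check_port is a hypothetical name, and it assumes the nc build in the debugging image supports -z (scan without sending data) and -w (connect timeout in seconds).

```shell
# Hypothetical helper: report whether host:port accepts a TCP connection.
# Assumes an nc that understands -z and -w.
check_port() {
  host=$1
  port=$2
  if nc -z -w 3 "$host" "$port" 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port unreachable"
    return 1
  fi
}

# Usage inside the debugging container (not executed here):
#   check_port 10.244.10.125 80
```

The non-zero return code on failure lets the function be chained with && or used in an if statement.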
telnet tests the backend IP and port interactively, while nc -zv only checks whether the port is open. If the connection fails, investigate network policies, firewall rules, and whether the backend container is actually listening on the port. If DNS resolution is normal but the port is still unreachable, further analysis from inside the container's namespaces is required, and this is where nsenter comes in.

Using nsenter to Locate Container Performance Bottlenecks

Get the Container PID:
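kubectl get pod backend-pod-7f98d7c6b4-2xqzk -o jsonpath='{.status.containerStatuses[0].containerID}'
crictl inspect --output go-template --template '{{.info.pid}}' <container-id>
The first command returns the container ID in the form <runtime>://<container-id>; the Pod status itself does not carry a PID. The second command is run on the node hosting the Pod and asks the container runtime for the host PID (with Docker as the runtime, the equivalent is docker inspect -f '{{.State.Pid}}' <container-id>). In this example the PID obtained is 31538. This is the process ID of the container's main process on the host, and nsenter can use it to enter the container's namespaces for analysis.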
nsenter -t 31538 -n -m -p sh
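A variant that is often handy: entering only the network namespace (-n, without -m) leaves the host's filesystem in place, so tools installed on the node can be run against the container's network stack. The sketch below is illustrative only; in_netns is a hypothetical wrapper name, the ss command in the usage line is just one example of a host tool, and the call needs root on the node.

```shell
# Hypothetical wrapper: run a host command inside the network namespace
# of the process with the given PID, keeping the host's filesystem.
in_netns() {
  pid=$1
  shift
  nsenter -t "$pid" -n -- "$@"
}

# Usage on the node (requires root; not executed here):
#   in_netns 31538 ss -lnt   # listening TCP sockets as the container sees them
```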
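In the nsenter command, -n enters the network namespace, -m enters the mount namespace, and -p enters the PID namespace. The shell then behaves as if it were running inside the container. Note that with -m the shell sees the container's filesystem, so only binaries present in the container image can be run; omitting -m keeps the host's filesystem, and with it host tools such as top, sar, and iostat, available.

CPU Analysis: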
- Execute top to view per-process CPU usage. If a process stays at 100% CPU, correlate it with the business logs to find out what it is doing;
- Use sar -u 1 5 to check the overall CPU load and determine if there is resource contention.
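For a non-interactive version of the top check, the output of ps can be filtered by a CPU threshold. This is a minimal sketch: hot_procs is a hypothetical helper, and it assumes a procps-style ps whose pcpu column reports CPU usage averaged over the process lifetime, which is coarser than what top shows.

```shell
# Hypothetical helper: read "PID %CPU COMMAND" lines on stdin and print
# the PID and command of every process at or above the given threshold.
hot_procs() {
  awk -v t="$1" '$2 + 0 >= t + 0 { print $1, $3 }'
}

# Usage (not executed here):
#   ps -eo pid=,pcpu=,comm= | hot_procs 90
```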
Memory Analysis:
- Check memory usage with free -h;
- Use vmstat 1 5 to observe memory swapping. If the si and so columns stay well above zero, memory is insufficient, and you need to adjust the container's resource limits.
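The swap check can be reduced to a single number by counting the vmstat samples that show swap traffic. The sketch below assumes the default procps vmstat layout, with two header lines and si and so as the 7th and 8th columns; swap_samples is a hypothetical helper name.

```shell
# Hypothetical helper: count vmstat samples with non-zero swap-in (si)
# or swap-out (so). Assumes two header lines and si/so in columns 7 and 8.
swap_samples() {
  awk 'NR > 2 && ($7 + 0 > 0 || $8 + 0 > 0) { n++ } END { print n + 0 }'
}

# Usage (not executed here):
#   vmstat 1 5 | swap_samples
```

A result of 0 means no sample showed swapping during the observation window.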
I/O Analysis:
- Execute iostat -x 1 5 to check disk I/O usage. If %util is close to 100%, it indicates disk I/O saturation, and you need to check if the business has a large number of read/write operations or if the storage volume performance is insufficient.
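To pick out saturated devices automatically, the iostat output can be filtered on its %util value. This sketch assumes %util is the last column of each device line, as in the sysstat versions this was written against, and that device names start with a letter; busy_disks is a hypothetical helper name.

```shell
# Hypothetical helper: print device name and %util for every device line
# whose last column (%util, by assumption) is at or above the threshold.
busy_disks() {
  awk -v t="$1" '$1 ~ /^[A-Za-z]/ && $NF + 0 >= t + 0 { print $1, $NF }'
}

# Usage (not executed here):
#   iostat -x 1 5 | busy_disks 90
```

Header lines and the avg-cpu values line are skipped because their first field is not a device name or their last field is not numeric.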
If the analysis shows that the backend container is being restarted frequently because a memory leak triggers the OOM Killer, resolve the issue by adjusting the memory resource limits and fixing the leak in the code.
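Whether the last restart was caused by the OOM Killer can be confirmed from the Pod status, which records the reason for the previous termination of each container. The sketch below is one way to read it; last_exit_reason is a hypothetical helper, and it assumes the Pod is in the current namespace and that the first container is the one of interest.

```shell
# Hypothetical helper: print why the first container of a Pod last
# terminated (for example "OOMKilled"), as recorded in the Pod status.
last_exit_reason() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
}

# Usage (not executed here):
#   [ "$(last_exit_reason backend-pod-7f98d7c6b4-2xqzk)" = "OOMKilled" ] \
#     && echo "container was killed for exceeding its memory limit"
```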