Troubleshooting networking issues in a Kubernetes cluster can be a challenging task, especially when it comes to capturing TCP packets. From network connectivity problems to latency issues, there are several factors that can impact the performance and stability of your cluster.

In this article, we provide a step-by-step guide to capturing TCP packets with tcpdump and inspecting connection states with netstat. These tools let you examine the TCP traffic within your Azure Kubernetes Service (AKS) cluster, helping you identify and resolve networking issues efficiently.

Traditionally, tools like tcpdump and netstat are interactive, requiring constant monitoring and interaction with the command line. However, we will explore ways to run these tools in the background as part of a script, enabling you to gather data from all the nodes in your AKS cluster effortlessly.

Whether you’re a seasoned Kubernetes expert or just starting your journey, this article will equip you with the knowledge and techniques needed to troubleshoot networking problems in your Azure Kubernetes Service cluster. With that in mind, let’s see how tcpdump and netstat can help unravel networking intricacies in your Kubernetes cluster.

TCP Captures

In this section, we’ll focus on gathering the tcp captures. These can be viewed and interpreted by tools like Wireshark to discover tcp flow issues, among other things.

Create debug namespace

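To keep things clean and organized, we create a debug namespace to hold all the debug pods that perform the tcp captures. The $duration and $my_cluster variables used below come from the script’s command-line options, shown in the full script at the end of the article.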

debug_namespace=node-debugger-ns
sleepTime=$(($duration + 10)) # 10 extra seconds, to allow container initialization to take place
kubectl config use-context $my_cluster
kubectl create namespace $debug_namespace
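
Because sleepTime is computed with shell arithmetic, an empty or non-numeric $duration would fail or produce a misleading value. A minimal sketch of a guard, assuming the duration is given in whole seconds; compute_sleep_time is a hypothetical helper and not part of the original script:

```shell
# Hypothetical helper (not in the original script): validate the capture
# duration and add the 10-second allowance for container initialization.
compute_sleep_time() {
    local duration="$1"
    case $duration in
        # reject empty values, non-digits, and leading zeros
        # (a leading zero would be read as octal by shell arithmetic)
        ''|*[!0-9]*|0?*)
            echo "duration must be a whole number of seconds without leading zeros" >&2
            return 1
            ;;
    esac
    echo $((duration + 10))
}

compute_sleep_time 60   # prints: 70
```

In the script it could replace the plain assignment, for example as sleepTime=$(compute_sleep_time "$duration") || exit 1.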

Get cluster’s external ip

Next, we get the cluster’s external IP, since we need it later in the script:

clusterIp=$(kubectl get services \
    --namespace $my_namespace \
    $my_namespace-nginx-ingress-controller \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}')

echo "clusterIp: $clusterIp"
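
While the load balancer is still being provisioned, the jsonpath query can return an empty string, so it may be worth checking the value before starting a capture. A minimal sketch, assuming an IPv4 address is expected; is_ipv4 is a hypothetical helper, not part of the original script:

```shell
# Hypothetical helper: succeed only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
    local ip="$1" octet
    # four groups of 1-3 digits separated by dots
    printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
    # every octet must be in the range 0-255
    for octet in $(printf '%s' "$ip" | tr '.' ' '); do
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

is_ipv4 "10.0.0.1" && echo "looks like an IPv4 address"
```

It could be called right after the kubectl query, for example as is_ipv4 "$clusterIp" || exit 1.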

Find nodes with Nginx

Now that we have the cluster IP, we need to find the Kubernetes nodes that have Nginx pods deployed on them. Why Nginx? Because it is the ingress controller for our cluster, so external traffic hits the Nginx pods, and therefore the nodes they run on.

Let’s find those nodes:

for node in $(kubectl get nodes -o name);
    do
        nodename=${node##*/}
        for pod in $(kubectl get pods --namespace=$my_namespace --field-selector spec.nodeName=$nodename -o=name | sed "s/^.\{4\}//")
            do
                if [[ "$pod" == *"ingress"* ]]; then
                    echo "ingress pod: $pod"
                    tcp_capture $node $clusterIp & # for each node we execute the tcp_capture function in the background
                    break # only 1 tcp_capture run per node is needed
                fi
            done
    done

One thing to note is that while a node can have multiple nginx pods, we only need one ingress pod per node, so we need to break the loop after we find the first one.
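
The loop relies on two small string manipulations: ${node##*/} removes everything up to the last slash of a node/<name> reference, and the sed expression drops the first four characters (pod/) from each pod reference. A short sketch of both, wrapped in hypothetical helper functions and using made-up resource names:

```shell
# Print the name part of a "type/name" reference such as "node/<name>".
resource_name() {
    local ref="$1"
    echo "${ref##*/}"   # remove the longest prefix that ends in "/"
}

# Print a pod reference without its first four characters ("pod/"),
# using the same sed expression as the loop.
pod_name() {
    echo "$1" | sed "s/^.\{4\}//"
}

resource_name "node/example-node-1"          # prints: example-node-1
pod_name "pod/example-ingress-controller-x"  # prints: example-ingress-controller-x
```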

Once we have all the nodes, we can start the tcp capture process. We’ll present some fragments of the tcp_capture function here, and the complete code at the end of the article. Remember that our tcp_capture function is launched for all nodes in parallel.
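
Because each capture runs as a background job, the script has to wait for all of them before it cleans up; the complete script calls wait for this reason. A minimal sketch of the pattern, with the placeholder function fake_capture standing in for tcp_capture and made-up node names:

```shell
# Placeholder standing in for tcp_capture; it only prints its argument.
fake_capture() {
    echo "captured $1"
}

# Start one background job per argument, then wait for each of them.
run_all() {
    local pids="" failed=0 node pid
    for node in "$@"; do
        fake_capture "$node" &      # runs in the background, like tcp_capture
        pids="$pids $!"             # remember the job's process id
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))   # collect each job's exit status
    done
    echo "failed jobs: $failed"
}

run_all node-a node-b
```

Waiting on each process id separately, rather than calling a bare wait, makes it possible to notice when a single capture failed.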

Capture the packets

Let’s start by creating a debug pod, inside our debug namespace, that will be used to install and run the tcpdump utility. For this we use the kubectl command-line tool.

# capture only the hosts we are interested in
captureHosts="host 1.2.3.4 or host 2.3.4.5" # this could be turned into a parameter
kubectl debug ${node} --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 --namespace=$debug_namespace --stdin << EOF
    apt-get update && apt-get install tcpdump procps -y
    nohup tcpdump -W 1 --snapshot-length=0 -vvv -S $captureHosts -w /tmp/capture.pcap > /dev/null 2>&1 &
EOF
debugpod=$(kubectl --namespace=$debug_namespace get pods --no-headers -o custom-columns=":metadata.name" | grep "$nodename")
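
The comment on captureHosts suggests turning the host list into a parameter. One way this could look, as a sketch: build_capture_filter is a hypothetical helper that joins its arguments into the same "host A or host B" expression used above.

```shell
# Hypothetical helper: turn a list of addresses into a tcpdump host filter.
build_capture_filter() {
    local filter="" host
    for host in "$@"; do
        if [ -z "$filter" ]; then
            filter="host $host"
        else
            filter="$filter or host $host"
        fi
    done
    echo "$filter"
}

captureHosts=$(build_capture_filter 1.2.3.4 2.3.4.5)
echo "$captureHosts"   # prints: host 1.2.3.4 or host 2.3.4.5
```

With no arguments the helper returns an empty string, in which case tcpdump would run without a host filter.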

Once we have the debug pod with the tcpdump utility installed, we need to wait for the capture to finish:

echo "Sleeping for $sleepTime seconds to allow tcpdump to complete..."
sleep $sleepTime

When the sleepTime duration expires, the script resumes execution. At this point we need to stop the tcpdump process running in our debug pod, so that we can download the capture file to our local machine:

kubectl --namespace=$debug_namespace exec $debugpod -- /bin/bash -c "pkill tcpdump" #not nice, but efficient
echo "Stopped tcpdump on pod: $debugpod!"

With the tcpdump process stopped, it’s time to copy the tcp capture file to our local machine, providing a name of our choosing. This will make it easier to organize all the tcp capture files from all the kubernetes nodes:

now=$(date +"%d_%m_%Y_%H_%M_%S")
captureName=$now-$nodename-capture.pcap

kubectl --namespace=$debug_namespace cp $debugpod:/tmp/capture.pcap $captureName
echo "Copied capture $captureName to local!"

As the command shows, the tcp capture is copied to the local folder from which you run the script.
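
Before opening a capture in Wireshark, it can be useful to check that the copied file actually contains packets. A minimal sketch, assuming tcpdump wrote the classic pcap format, whose file header is 24 bytes long; has_packets is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the file is larger than the 24-byte
# pcap file header, i.e. it holds more than just the header.
has_packets() {
    local file="$1" size
    [ -f "$file" ] || return 1
    # arithmetic expansion strips any padding that wc adds around the number
    size=$(($(wc -c < "$file")))
    [ "$size" -gt 24 ]
}
```

It could be called after the copy, for example as has_packets "$captureName" || echo "No packets captured on $nodename".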

Cleanup

Once we have the tcp captures on our machine, we have to delete our debug pod. Otherwise, this pod will use additional resources from our cluster:

kubectl --namespace=$debug_namespace delete pod $debugpod
echo "Deleted pod $debugpod!"

The last step in this tcp capture guide is to delete the debug namespace from our cluster. This will leave the cluster as we initially found it, before starting this guide.

kubectl delete namespace $debug_namespace
echo "Deleted namespace: $debug_namespace!"
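
If the script is interrupted or a step fails, the debug namespace would be left behind. One option, sketched here under the assumption that the cleanup can be expressed as a single command string, is to run the main work in a subshell with an EXIT trap. with_cleanup is a hypothetical helper, and the echo commands are stand-ins for the real work and cleanup:

```shell
# Hypothetical helper: run a command and always run the cleanup afterwards,
# whether the command succeeds or fails.
with_cleanup() {
    local cleanup_cmd="$1"
    shift
    (
        trap "$cleanup_cmd" EXIT   # fires when this subshell exits
        "$@"
        exit $?                    # keep the command's exit status
    )
}

# stand-ins for the capture steps and the namespace deletion
with_cleanup "echo 'namespace deleted'" echo "capturing"
```

In the capture script, the cleanup string would be the kubectl delete namespace command, and the wrapped command a function containing the capture steps.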

Netstat

Sometimes tcp captures are not enough to troubleshoot an issue. We can use the netstat utility if we need to dig a bit deeper and see whether a connection is in a certain state, such as ESTABLISHED, TIME_WAIT or FIN_WAIT1. However, netstat alone does not give us enough information to follow a connection’s state over time. Let’s see what the initial netstat output looks like and how we can enhance it to get an even better overview of the status of our connections.

Netstat without timestamps

The output of netstat gives us valuable information about the status of a connection:

bash-5.1$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 localhost:10246         localhost:38264         TIME_WAIT   
tcp        0      0 localhost:10246         localhost:59606         TIME_WAIT   
tcp        0      0 localhost:10246         localhost:56120         TIME_WAIT   
tcp        0      0 localhost:10246         localhost:37724         TIME_WAIT

However, we have to execute the netstat command manually every time we want to see the details of our connections. Let’s enhance the output and automate this process so that it helps us even further in our networking troubleshooting.
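
For a single snapshot, a per-state summary is often easier to read than the raw list. A small sketch that counts tcp connections per state from netstat output on standard input; count_states is a hypothetical helper, and the sample input is made up in the same format as the output above:

```shell
# Hypothetical helper: read netstat output on stdin and print the number
# of tcp connections in each state, sorted by state name.
count_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}

count_states << 'SAMPLE'
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost:10246         localhost:38264         TIME_WAIT
tcp        0      0 localhost:10246         localhost:59606         TIME_WAIT
tcp        0      0 10.0.123.4:41694        10.0.0.1:443            ESTABLISHED
SAMPLE
```

On a node it would be used as netstat -nat | count_states.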

Netstat with timestamps

Since we want to automate everything we can, let’s create a script that will execute the netstat command and output the result to a file. In this script, before each execution of netstat we will also show a timestamp. This will give us the ability to view when a netstat command was executed.

# empty the file
: > $OUTFILE
for i in $(seq 1 $TIMES_TO_EXECUTE);
    do
        now="$(date)"
        echo "==========$now" >> $OUTFILE
        netstat -nat >> $OUTFILE
        sleep $SLEEP_DURATION
    done

Great! We have a script that runs the netstat command multiple times and saves the output to a file, writing a timestamp before each run.

So, if we examine the output file, we will see a timestamp followed by the netstat command output.

==========Wed May 24 14:50:38 UTC 2023
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       
tcp        0      0 0.0.0.0:443             0.0.0.0:38264               LISTEN      
tcp        0      0 10.0.123.4:41694        10.0.0.1:443                ESTABLISHED

As we scroll down, we’ll encounter another timestamp and its corresponding netstat output, and so on. This gives us a good overview. However, if we want to search for a specific port number using grep, we can find it, but it might be challenging to determine when that output was generated and the connection’s status at that time.

To solve this problem, we need an additional script. The purpose of this new script is to take the output file generated by the previous script and prefix each line with its corresponding timestamp:

lastDate=""
while IFS="" read -r line || [ -n "$line" ]
do
    # remember the most recent timestamp header line
    if [[ "$line" == *"=========="* ]]; then
        lastDate=$line
    fi
    echo "$lastDate==========$line" >> "$OUTPUT_FILE"
done < "$INPUT_FILE"

If we execute the above script and inspect the output file, we can see something like below:

bash-5.1$ cat timed-netstat-out.txt | grep 443
==========Wed May 24 14:50:38 UTC 2023==========tcp        0      0 0.0.0.0:443             0.0.0.0:38264               LISTEN      
==========Wed May 24 14:50:38 UTC 2023==========tcp        0      0 10.0.123.4:41694        10.0.0.1:443                ESTABLISHED

This output format is much better: we can search the fixed file and see how the state of a connection changes over time. It helps a lot in determining whether connections remain for a long time in a state such as FIN_WAIT1 (aka an orphan connection), so that you can take appropriate action to fix the issue.
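
With every line carrying its timestamp, a connection stuck in one state can be followed over time. As a sketch, the hypothetical helper below prints the timestamp, local address and foreign address of every line in a given state, assuming the ========== separator and the column layout shown above. The second sample line is made up for illustration:

```shell
# Hypothetical helper: read the timestamped file on stdin and print
# "timestamp | local | foreign" for every line whose state equals $1.
connections_in_state() {
    awk -F'==========' -v state="$1" '
        {
            # $2 is the timestamp, $3 the original netstat line
            n = split($3, col, " ")
            if (n >= 6 && col[6] == state)
                print $2 " | " col[4] " | " col[5]
        }'
}

connections_in_state FIN_WAIT1 << 'SAMPLE'
==========Wed May 24 14:50:38 UTC 2023==========tcp        0      0 10.0.123.4:41694        10.0.0.1:443                ESTABLISHED
==========Wed May 24 14:50:40 UTC 2023==========tcp        0      0 10.0.123.4:41694        10.0.0.1:443                FIN_WAIT1
SAMPLE
```

Against a real file it would be used as connections_in_state FIN_WAIT1 < timed-netstat-out.txt; the same address pair appearing under many consecutive timestamps points to a connection that is not progressing.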

Conclusion

Troubleshooting networking issues in a Kubernetes cluster can be a challenging task, even for experienced users. However, with the aid of tools like tcpdump and netstat, along with some gluing scripts, the process of resolving these issues can be made easier and more efficient.

Below you can find all three scripts: the one that captures tcp packets, the one that runs netstat and writes the result to a file, and the fixing script that adds timestamps to every row of the netstat output file.

Script that runs the tcp capture on all nodes:

        #!/bin/bash
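        # print usage information
        helper() {
            echo "Usage: $0 -c <cluster> -d <duration in seconds> -n <namespace> [-h]"
        }
        
        while getopts ":c:d:n:h" option; do
            case $option in
                c)
                    my_cluster=${OPTARG}
                    ;;
                d)
                    duration=${OPTARG}
                    ;;
                n)
                    my_namespace=${OPTARG}
                    ;;
                h) # display Help
                    helper
                    exit;;
                :) # option given without its argument
                    echo "Error: option -$OPTARG requires an argument"
                    exit 1;;
                \?) # incorrect option
                    echo "Error: Invalid option, valid options are -c -d -n -h"
                    exit 1;;
            esac
        done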
        
        start=$(date +%s)
        startDateTime="$(date)"
        printf "Start date and time %s\n" "$startDateTime"
        debug_namespace=node-debugger-ns
        sleepTime=$(($duration + 10))
        kubectl config use-context $my_cluster
        kubectl create namespace $debug_namespace
        
        clusterIp=$(kubectl get services \
            --namespace $my_namespace \
            $my_namespace-nginx-ingress-controller \
            --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
        echo "clusterIp: $clusterIp"
        
        tcp_capture() {
            local node=$1 # node in type/name form, as printed by kubectl get nodes -o name
            local nodename=${node##*/}
            echo "       Cluster: $my_cluster"
            echo "     Namespace: $debug_namespace"
            echo "     Node Name: $nodename"
            echo "Type/Node Name: ${node}"
            echo "clusterIp: $clusterIp"
            
            # capture only the hosts we are interested in
            captureHosts="host 1.2.3.4 or host 2.3.4.5" # this could be turned into a parameter and used only if needed
            echo "captureHosts: $captureHosts"
            
            # the heredoc terminator must start at the beginning of its line, so this block is not indented
            kubectl debug ${node} --image=mcr.microsoft.com/dotnet/runtime-deps:6.0 --namespace=$debug_namespace --stdin << EOF
        apt-get update && apt-get install tcpdump procps -y
        nohup tcpdump -W 1 --snapshot-length=0 -vvv -S $captureHosts -w /tmp/capture.pcap > /dev/null 2>&1 &
        EOF
            
            debugpod=$(kubectl --namespace=$debug_namespace get pods --no-headers -o custom-columns=":metadata.name" | grep "$nodename")
            
            echo "Sleeping for $sleepTime seconds to allow tcpdump to complete..."
            sleep $sleepTime
            
            kubectl --namespace=$debug_namespace exec $debugpod -- /bin/bash -c "pkill tcpdump"
            echo "Stopped tcpdump on pod: $debugpod!"
            
            now=$(date +"%d_%m_%Y_%H_%M_%S")
            captureName=$now-$nodename-capture.pcap
            
            kubectl --namespace=$debug_namespace cp $debugpod:/tmp/capture.pcap $captureName
            echo "Copied capture $captureName to local!"
            
            kubectl --namespace=$debug_namespace delete pod $debugpod
            echo "Deleted pod $debugpod!"
        }
        
        for node in $(kubectl get nodes -o name);
            do
                nodename=${node##*/}
                for pod in $(kubectl get pods --namespace=$my_namespace --field-selector spec.nodeName=$nodename -o=name | sed "s/^.\{4\}//")
                    do
                        if [[ "$pod" == *"ingress"* ]]; then
                            echo "ingress pod: $pod"
                            tcp_capture $node $clusterIp &
                            break # only 1 tcp_capture per node is needed
                        fi
                    done
            done
        
        wait
        
        kubectl delete namespace $debug_namespace
        echo "Deleted namespace: $debug_namespace!"
        
        endDateTime="$(date)"
        printf "Start date and time %s\n" "$startDateTime"
        printf "End date and time %s\n" "$endDateTime"
        
        end=$(date +%s)
        runtime=$((end-start))
        echo "Script runtime in seconds: $runtime"
        
        echo "All done."

Script that runs netstat for a duration of time:

        #!/bin/bash
        # example: ./netstat-dump.sh 10 netstat-out.txt - will run for 20 seconds, every 2 sec will run a netstat cmd
        TIMES_TO_EXECUTE=$1
        OUTFILE=$2
        SLEEP_DURATION=2 # in seconds
        # empty the file
        : > $OUTFILE
        
        for i in $(seq 1 $TIMES_TO_EXECUTE);
        do
            now="$(date)"
            echo "==========$now" >> $OUTFILE
            netstat -nat >> $OUTFILE
            sleep $SLEEP_DURATION
        done

Script that adds the timestamps to all lines of the netstat output:

        #!/bin/bash
        #example: ./fix-netstat-dump.sh netstat-out.txt timed-netstat-out.txt
        
        INPUT_FILE=$1
        OUTPUT_FILE=$2
        
        lastDate=""
        while IFS="" read -r line || [ -n "$line" ]
        do
            # remember the most recent timestamp header line
            if [[ "$line" == *"=========="* ]]; then
                lastDate=$line
            fi
            echo "$lastDate==========$line" >> "$OUTPUT_FILE"
        done < "$INPUT_FILE"

Enhanced Network Troubleshooting with tcpdump and netstat in Azure Kubernetes Service