Troubleshooting Pods in Kubernetes (K8s) with kubectl


If you are new to Kubernetes and kubectl, I suggest going through my older post first.

Kubectl describe

Describe shows the details of the resource you're looking at. The most common use case is describing a pod or node to check whether there's an error in the events, or whether there aren't enough resources to deploy a pod/container.

Resources you can describe include:

  • Nodes
  • Pods
  • Services
  • Deployments
  • Replica sets
  • Cronjobs
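
For example, describing a node or a deployment works the same way (the node and deployment names here are placeholders, not from my cluster):

kubectl describe node <node-name>
kubectl describe deployment <deployment-name> -n <namespace>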

In this post, I will concentrate exclusively on the pod.

I have deployed pods in a namespace called nam, so you will notice in the following examples that I am passing the namespace with all the commands.

First, I will list all the pods running in namespace (ns) nam:

kubectl get pods -n nam

[Screenshot: output of kubectl get pods -n nam]

You will notice that 2 pods are not healthy. How did I find that?

Look at 2 columns (READY and STATUS) in the output of the above command. If READY says 0/1, it means this pod has 1 container and it is not running. In a working scenario, you will see READY say 1/1 or 2/2. This number (1/1 or 2/2) is based on the containers running inside a pod; for example, if there are 2 containers inside a pod and one of them is not up, READY will say 1/2.

If STATUS says Pending, it means the container is not up. In a working scenario, you will see STATUS as Running.
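
To illustrate, a listing with one healthy and one unhealthy pod looks something like this (an illustrative rendering based on the two pods examined later in this post, not a verbatim capture):

NAME                      READY   STATUS    RESTARTS   AGE
access-manager-am-idp-3   1/1     Running   0          28h
access-manager-am-idp-4   0/1     Pending   0          28h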

Let's run the describe command to find out more details.

kubectl describe pod <pod-name> -n <namespace>
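
Example: kubectl describe pod access-manager-am-idp-4 -n nam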

Here is a small snippet of the output.

[Screenshot: Events section of the describe output, showing a FailedScheduling warning]

To understand this error better, let's look at the number of nodes in the K8s cluster:
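
kubectl get nodes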

[Screenshot: output of kubectl get nodes, showing 1 master node and 4 worker nodes]

You can see that my Kubernetes cluster has a master node and 4 worker nodes. I am trying to deploy 5 IDPs (Identity Server) and 5 AGs (Access Gateway). As the message in the output of the describe command shows, the scheduler is not able to find a node on which to deploy the 5th instance of the IDP and AG pods.

The describe pods command gives a considerable amount of information, so I am pasting snippets of the kubectl describe output for both a working and a failed pod, highlighting the differences in some of the fields.

kubectl describe: working pod

namk8s@namk8s-master:~$ kubectl describe pod access-manager-am-idp-3 -n nam
Name:          access-manager-am-idp-3
Namespace:     nam
Priority:      0
Node:          namk8s-w4/10.71.130.235
Start Time:    Fri, 21 Aug 2020 16:15:36 +0530
Labels:        app.kubernetes.io/instance=access-manager
               statefulset.kubernetes.io/pod-name=access-manager-am-idp-3
Annotations:   <none>
Status:        Running
IP:            10.0.0.10
IPs:
  IP:          10.0.0.10
Controlled By: StatefulSet/access-manager-am-idp
Containers:
  am-idp:
    Container ID:  docker://0c5e9ca0abecfe2680e70eb0ed93101d5a3c6b0ba24c67c731d9dede1f7e988b
    Image:         security-accessmanager-docker.demo/am-idp:5.0.0.0-473
    Image ID:      docker-pullable://security-accessmanager-docker.demo/am-idp@sha256:092aca5d4d7b0c6eb67ac9ffbcc81e72d9875bee73f1179c7093c02880661d85
    Port:          <none>
    Host Port:     <none>
    State:         Running
      Started:     Fri, 21 Aug 2020 16:16:12 +0530
    Ready:         True
    Restart Count: 0
    Environment:
    Mounts:
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  idp-storage:
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>

kubectl describe: failed pod

namk8s@namk8s-master:~$ kubectl describe pod access-manager-am-idp-4 -n nam
Name:          access-manager-am-idp-4
Namespace:     nam
Priority:      0
Node:          <none>
Labels:        app.kubernetes.io/instance=access-manager
               statefulset.kubernetes.io/pod-name=access-manager-am-idp-4
Annotations:   <none>
Status:        Pending
IP:
IPs:           <none>
Controlled By: StatefulSet/access-manager-am-idp
Containers:
  am-idp:
    Image:      security-accessmanager-docker.demo/am-idp:5.0.0.0-473
    Port:       <none>
    Host Port:  <none>
    Environment:
    Mounts:
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  idp-storage:
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  98s (x1135 over 28h)  default-scheduler  0/5 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match pod affinity/anti-affinity, 4 node(s) didn't satisfy existing pods anti-affinity rules.
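
For context, a FailedScheduling message like this usually comes from a required podAntiAffinity rule in the pod spec. A minimal sketch of what such a rule looks like (an illustration, not the actual Access Manager chart; the label is taken from the pod's Labels shown above):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/instance: access-manager   # label assumed from the pod's Labels above
      topologyKey: kubernetes.io/hostname              # at most one matching pod per node

With a rule like this, no two pods carrying that label can land on the same node. Since the master node is tainted and the 4 worker nodes each already run one such pod, there is no node left for a 5th replica.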

Kubectl logs

While the describe command shows you the events Kubernetes records about a pod, logs offer comprehensive insight into what's happening inside the application running in the pod.

If you want to tail live logs:

kubectl logs -f pod/<pod-name> -n <namespace>

Example: kubectl logs -f pod/access-manager-am-idp-3 -n nam

If you just want to dump the logs collected so far:

kubectl logs pod/<pod-name> -n <namespace>

Example: kubectl logs pod/access-manager-am-idp-3 -n nam
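
A few other variations that are often useful (these are standard kubectl flags):

kubectl logs <pod-name> -n <namespace> --tail=100     # only the last 100 lines
kubectl logs <pod-name> -n <namespace> --previous     # logs of the previous (crashed) container instance
kubectl logs <pod-name> -c <container-name> -n <namespace>     # a specific container in a multi-container pod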

Kubectl exec

Much like the docker exec command, you can exec into a container to troubleshoot an application directly. This is useful when the pod's logs haven't explained the issue you are debugging.

kubectl exec -it <pod-name> -n <namespace> -- sh

Example: kubectl exec -it access-manager-am-idp-3 -n nam -- sh
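
If you only need to run a single command, you can skip the interactive shell; in a multi-container pod you can also pick a container with -c (the container name am-idp comes from the describe output above):

kubectl exec <pod-name> -n <namespace> -- <command>

Example: kubectl exec access-manager-am-idp-3 -c am-idp -n nam -- ls /opt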

This is just one example of a failure scenario, but hopefully it will help you get started with debugging.

 
