Understanding and debugging Kubernetes (K8s) Probes


Kubernetes provides two basic mechanisms for checking the health of a container:

Readiness:

Readiness probes let Kubernetes know when the app is ready to serve traffic. Kubernetes makes sure the readiness probe passes before allowing a Service to forward traffic to the pod, and it doesn't route work to a container with a failing readiness probe.

A readiness probe can fail if a service is busy, hasn't finished initializing, is overloaded, or is unable to process requests.

Liveness: 

Liveness probes let Kubernetes know if the app is alive or dead. If your app is alive, then Kubernetes leaves it alone. Kubernetes stops and restarts a container with a failing liveness probe to ensure that Pods in a defunct state are terminated and replaced.

A liveness probe fails if the service is in an unrecoverable state, for example, when an out-of-memory condition occurs.

Both probes can be implemented by running a command, checking a TCP endpoint for connectivity, or performing an HTTP request:

  1. HTTP GET request - For success, a request to a particular HTTP endpoint must return a response with a status code between 200 and 399.
  2. Execute a command - For success, the execution of a command within the container must result in a return code of 0.
  3. TCP socket check - For success, a specific TCP socket must be successfully opened on the container.
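The three mechanisms above map to the following probe definitions. This is a sketch only; the /healthz path, port 8080, and the cat command are illustrative placeholders, not values from any real deployment:

```yaml
# 1. HTTP GET: success if the endpoint returns a status code from 200 to 399
livenessProbe:
  httpGet:
    path: /healthz   # placeholder endpoint
    port: 8080
---
# 2. Execute a command: success if the command exits with return code 0
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # placeholder command
---
# 3. TCP socket check: success if the socket on the given port can be opened
livenessProbe:
  tcpSocket:
    port: 8080
```

The same three mechanisms are available for readinessProbe (and startupProbe) with identical syntax.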

Both readiness and liveness probes run in parallel throughout the life of a container, but they impact the system in different ways.

When a container starts, the readiness state is initially negative, and it enters a positive state only after the container is working correctly. A liveness check starts in a positive state, and it enters a negative state only when the process becomes inoperative.

 Both probes use the same configuration parameters:

  • initialDelaySeconds: number of seconds to wait before initiating liveness or readiness probes.
  • periodSeconds: how frequently to check the probe.
  • timeoutSeconds: number of seconds before marking the probe as timing out (failing the health check).
  • successThreshold: minimum number of consecutive successful checks for the probe to pass.
  • failureThreshold: number of retries before marking the probe as failed. 

Let us understand the configuration of the probes with an example:

livenessProbe:
  httpGet:
    path: /livenessProbe
    port: http
  initialDelaySeconds: 180
  periodSeconds: 60
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 5

readinessProbe:
  httpGet:
    path: /readinessProbe
    port: http
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 25

Explanation of the values set in the above configuration:

  • Because of the value set in initialDelaySeconds, livenessProbe waits 180 seconds whereas readinessProbe waits 20 seconds before performing the first probe.
  • Because of the value set in periodSeconds, livenessProbe performs a probe every 60 seconds whereas readinessProbe performs one every 10 seconds.
  • Given the values set in failureThreshold and periodSeconds, livenessProbe waits at least (60*5) 300 seconds before the container is restarted, and this timer starts only once initialDelaySeconds is over.
  • Given the values set in failureThreshold and periodSeconds, readinessProbe waits at least (10*25) 250 seconds before traffic to the pod is stopped, and this timer starts only once initialDelaySeconds is over.
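The timing arithmetic above can be sketched in a few lines. This is a rough model that ignores timeoutSeconds and probe jitter; the parameter values mirror the example configuration:

```python
def worst_case_detection_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Minimum time, after initialDelaySeconds, before a probe is declared failed:
    the kubelet must observe failureThreshold consecutive failures, one per period."""
    return period_seconds * failure_threshold

# Liveness: 60s period x 5 failures = 300s before the container is restarted
liveness_wait = worst_case_detection_seconds(60, 5)

# Readiness: 10s period x 25 failures = 250s before traffic is stopped
readiness_wait = worst_case_detection_seconds(10, 25)

print(liveness_wait, readiness_wait)  # 300 250
```

In practice the kubelet also adds the probe's own latency (bounded by timeoutSeconds) to each cycle, so treat these numbers as lower bounds.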

Before we look at the debugging section, let me touch upon one more probe in addition to liveness and readiness probes: the startup probe. A startup probe indicates whether the application within the container has started. If this probe is configured, it disables liveness and readiness checks until it succeeds, making sure those probes don't conflict with the application's startup.
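A startup probe for a slow-starting container might look like the following sketch; the path and thresholds are illustrative, not from any real deployment. With failureThreshold: 30 and periodSeconds: 10, the application gets up to 300 seconds to start before the container is killed and restarted:

```yaml
startupProbe:
  httpGet:
    path: /healthz   # illustrative endpoint
    port: http
  periodSeconds: 10
  failureThreshold: 30   # up to 30 * 10 = 300 seconds allowed for startup
```

Once the startup probe succeeds, the liveness and readiness probes take over with their own, typically tighter, thresholds.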

Debugging

We can use the following kubectl command to check the probe status for a pod:

kubectl describe pod <pod-name> 

If you are wondering how to get the pod name, use the kubectl get pods command.

I have taken the following errors from a NetIQ Access Manager Docker deployment for better understanding, but they are for illustration only, as the Docker support is under heavy development.

If there are no errors in the probes, you won't see any probe failures under "Events" after you run the kubectl describe command.

1. The following example shows probe failures on the Identity Server (IDP), captured with the kubectl describe command:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 7s kubelet, namk8s-w3 Liveness probe failed: IDP Health Check: Waiting to establish connection
Warning Unhealthy 4s (x11 over 19h) kubelet, namk8s-w3 Readiness probe failed: IDP Health Check: Waiting to establish connection

2. The following example shows probe failures on the Access Gateway (AG), captured with the kubectl describe command:

 Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 115s kubelet, namk8s-w1 Readiness probe failed: AG Tomcat Health Check: Waiting to establish connection
AG JCC Health Check: Waiting to establish connection
Warning Unhealthy 20s (x2 over 80s) kubelet, namk8s-w1 Liveness probe failed: AG Tomcat Health Check: Waiting to establish connection
AG HTTPD Health Check: Waiting to establish connection
AG Activemq Health Check: Waiting to establish connection
AG JCC Health Check: Waiting to establish connection
Warning Unhealthy 5s (x35 over 19h) kubelet, namk8s-w1 Readiness probe failed: AG Tomcat Health Check: Waiting to establish connection
AG HTTPD Health Check: Waiting to establish connection
AG Activemq Health Check: Waiting to establish connection
AG JCC Health Check: Waiting to establish connection

3. The following example shows the Identity Server (IDP) pod being restarted because of a livenessProbe failure:

Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4s (x3 over 2m4s) kubelet, namk8s-w3 Liveness probe failed: IDP Health Check: Waiting to establish connection
Normal Killing 4s kubelet, namk8s-w3 Container am-idp failed liveness probe, will be restarted
Warning Unhealthy 1s (x23 over 19h) kubelet, namk8s-w3 Readiness probe failed: IDP Health Check: Waiting to establish connection

To summarise 

Both liveness and readiness probes are used to check the health of an application. A failing liveness probe restarts the container, whereas a failing readiness probe stops the pod from receiving traffic.

Here is the Kubernetes documentation for reference.
