Troubleshooting Kubernetes

This section covers troubleshooting issues in Kubernetes itself.
Besides using the MicroK8s troubleshooting documentation, use the following information to troubleshoot kernel issues.
Troubleshooting the Kernel
Problem: The following error is displayed in the logs of the Kubelet service (by default, the snap.microk8s.daemon-kubelite service):
open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory

Cause: On occasion, the conntrack kernel module is not loaded. Verify whether the module is loaded by running:
lsmod | grep conntrack
If the module is not loaded, the command produces no output.

Solution: Load the conntrack module manually:
sudo modprobe nf_conntrack
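The check and fix above can be sketched as a small script. This is a minimal sketch, assuming a Linux host where lsmod is available; the modules-load.d hint in the comment applies to systemd-based distributions.

```shell
#!/bin/sh
# Sketch: report whether the conntrack module is loaded, mirroring
# "lsmod | grep conntrack" from the table above.
conntrack_status() {
  if lsmod 2>/dev/null | grep -q 'nf_conntrack'; then
    echo "loaded"
  else
    echo "missing"
  fi
}

if [ "$(conntrack_status)" = "missing" ]; then
  echo "nf_conntrack not loaded; load it with: sudo modprobe nf_conntrack"
  # To load the module automatically at boot on systemd-based systems:
  #   echo nf_conntrack | sudo tee /etc/modules-load.d/conntrack.conf
fi
```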
Use the following information to troubleshoot general Kubernetes issues.
Troubleshooting Kubernetes
Problem: When the Kubernetes cluster is under disk pressure, you might see large quantities of pods stuck in the Error, Evicted, or ContainerStatusUnknown states.

Cause: The kubelet is unable to create new pods due to the lack of disk space. Verify that the error is caused by disk pressure by checking the node's DiskPressure condition:
kubectl get node -o 'jsonpath={.status.conditions[?(@.type=="DiskPressure")].status}'
This returns True if the node is under disk pressure, and False otherwise.

Solution: Free up disk space on the node (for example, delete unused images), or increase the disk space available to the node. Then run the following command to clean up all failed pods:
kubectl delete pod --all-namespaces --field-selector=status.phase=Failed
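The verification and cleanup steps can be combined into one sketch. This assumes kubectl is configured for the cluster; when it is not, the script reports "unknown" and skips the cleanup rather than failing.

```shell
#!/bin/sh
# Sketch: check the nodes' DiskPressure condition, then sweep failed pods.
disk_pressure_status() {
  out="$(kubectl get node -o 'jsonpath={.status.conditions[?(@.type=="DiskPressure")].status}' 2>/dev/null)" || out=""
  echo "${out:-unknown}"
}

status="$(disk_pressure_status)"
echo "DiskPressure: $status"

# Only clean up failed pods once disk space has been freed.
if [ "$status" != "unknown" ]; then
  kubectl delete pod --all-namespaces --field-selector=status.phase=Failed
fi
```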
Problem: A node has joined the cluster but has communication issues, and might be in the NotReady state. When the node attempts to rejoin, the following error occurs:
Connection failed. The joining node (IP address of failed remote node) is already known to dqlite (504).

Solution: Remove the failed node by running:
sudo -E /snap/bin/microk8s remove-node --force <NODE NAME>
To re-add the node, follow the instructions in Adding a Node, performing only the steps for the node that you are re-adding. Run the --finalize command from the primary node at the end.
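The remove/re-add sequence can be summarized as a short checklist; this is a sketch in which the node name "node-2" is a placeholder, and the join token step assumes the standard MicroK8s add-node/join flow.

```shell
#!/bin/sh
# Sketch: print the remove/re-add sequence for a failed node ("node-2" is
# a placeholder name). The steps themselves must be run on a cluster node.
rejoin_steps() {
  cat <<'EOF'
# 1. On the primary node, remove the failed node:
sudo -E /snap/bin/microk8s remove-node --force node-2
# 2. On the primary node, generate a fresh join token:
sudo microk8s add-node
# 3. On the failed node, run the "microk8s join ..." command printed in step 2.
EOF
}
rejoin_steps
```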
Problem: You have an incorrect FQDN, or you are unable to log in to the application.

Cause: If you install with an incorrect FQDN, you won’t be able to log in to the application.

Solution: Uninstall Resonate RFID Reader Management with the command:
sudo -E su - trif-user ./uninstall.sh
The script is in the same folder as install.sh. Then install Resonate RFID Reader Management with the command:
./install.sh -h <RESONATE SERVER FQDN> -m <MODE> --smtp-server <SMTP SERVER FQDN/IP> --admin-email <EMAIL ADDRESS>
Problem: Containers fail to download or start; the pod STATUS is ErrImagePull.

Cause: A container running in a pod fails to pull the required image from a container registry. This error can occur for a variety of reasons, including network connectivity issues, an incorrect image name or tag, missing credentials, or insufficient permissions.

Solution: Check network connectivity, verify the image name and tag, ensure that credentials are correct and up to date, and ensure that the appropriate permissions are in place. Once the issue is resolved, recreate the pod; the container image should then be pulled successfully.
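The root cause is usually visible in the pod's events. A minimal sketch, assuming kubectl is configured and using "my-pod" as a placeholder pod name:

```shell
#!/bin/sh
# Sketch: show the tail of the pod description, where image-pull events
# (wrong tag, auth failure, network error) are reported.
pull_events() {
  kubectl describe pod my-pod 2>/dev/null | grep -A2 'Events:' \
    || echo "kubectl unavailable; run this on a cluster node"
}
pull_events
# After fixing the root cause, recreate the pod, e.g.:
#   kubectl delete pod my-pod   # its Deployment/ReplicaSet recreates it
```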
Problem: Containers fail to download or start; the pod STATUS is ImagePullBackOff.

Cause: This is the result of repeated ErrImagePull errors.

Solution: Resolve the underlying ErrImagePull error as described above.
Problem: Containers fail to download or start; the pod STATUS is CreateContainerConfigError.

Cause: Kubernetes cannot create the container based on the supplied configuration.

Solution: If the pod is stuck in this state, contact Zebra for additional support.
Problem: Containers fail to download or start; the pod STATUS is CreateContainerError.

Cause: Kubernetes cannot create the container, for a reason other than its configuration.

Solution: Review the container logs and fix the reported issue.
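Reviewing the logs can be sketched as below; "my-pod" is a placeholder name, and the script assumes kubectl is configured, falling back to a notice otherwise.

```shell
#!/bin/sh
# Sketch: show logs from the failing container's previous attempt; if no
# previous logs exist, fall back to the pod description and its events.
container_diag() {
  kubectl logs my-pod --previous 2>/dev/null \
    || kubectl describe pod my-pod 2>/dev/null \
    || echo "kubectl unavailable; run this on a cluster node"
}
container_diag
```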
Problem: Pods are stuck in the Pending state.

Cause: This is often due to a storage issue.

Solution: Check the state of your PersistentVolumeClaims (PVCs). Refer to Resolving Storage Failures.
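Checking PVC state can be sketched as follows; this assumes kubectl is configured for the cluster and prints a notice when it is not.

```shell
#!/bin/sh
# Sketch: list PVCs in all namespaces; any PVC not in the "Bound" state
# is a likely cause of pods stuck in Pending.
pvc_overview() {
  kubectl get pvc --all-namespaces 2>/dev/null \
    || echo "kubectl unavailable; run this on a cluster node"
}
pvc_overview
# Inspect a PVC stuck in "Pending" for details:
#   kubectl describe pvc <PVC NAME> -n <NAMESPACE>
```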