Troubleshooting Kubernetes

This section covers troubleshooting issues in Kubernetes itself.
Besides using the MicroK8s troubleshooting documentation, use the following information to troubleshoot kernel issues.
Troubleshooting the Kernel
Problem: The following error is displayed in the logs of the Kubelet service (by default, the snap.microk8s.daemon-kubelite service):
open /proc/sys/net/netfilter/nf_conntrack_max: no such file or directory

Cause: On occasion, the conntrack kernel module is not loaded. Verify whether the module is loaded by running:
lsmod | grep conntrack
If the module is not loaded, the command produces no output.

Solution: Load the conntrack module manually:
sudo modprobe nf_conntrack
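The check and fix above can be sketched as a small script. This is a minimal sketch, assuming a Linux host where lsmod is available; the modules-load.d hint in the comment applies to systemd-based distributions.

```shell
#!/bin/sh
# Sketch: report whether the conntrack module is loaded, mirroring
# "lsmod | grep conntrack" from the table above.
conntrack_status() {
  if lsmod 2>/dev/null | grep -q 'nf_conntrack'; then
    echo "loaded"
  else
    echo "missing"
  fi
}

if [ "$(conntrack_status)" = "missing" ]; then
  echo "nf_conntrack not loaded; load it with: sudo modprobe nf_conntrack"
  # To load the module automatically at boot on systemd-based systems:
  #   echo nf_conntrack | sudo tee /etc/modules-load.d/conntrack.conf
fi
```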
Use the following information to troubleshoot general Kubernetes issues.
Troubleshooting Kubernetes
Problem: When the Kubernetes cluster is under disk pressure, you might see large quantities of pods stuck in the Error, Evicted, or ContainerStatusUnknown states.

Cause: The kubelet is unable to create new pods due to the lack of disk space. Verify that the error is caused by disk pressure by checking the node's DiskPressure condition:
kubectl get node -o 'jsonpath={.status.conditions[?(@.type=="DiskPressure")].status}'
This returns True if the node is under disk pressure, and False otherwise.

Solution: Free up disk space on the node (for example, delete unused images), or increase the disk space available to the node. Then run the following command to clean up all failed pods:
kubectl delete pod --all-namespaces --field-selector=status.phase=Failed
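The verification and cleanup steps can be combined into one sketch. This assumes kubectl is configured for the cluster; when it is not, the script reports "unknown" and skips the cleanup rather than failing.

```shell
#!/bin/sh
# Sketch: check the nodes' DiskPressure condition, then sweep failed pods.
disk_pressure_status() {
  out="$(kubectl get node -o 'jsonpath={.status.conditions[?(@.type=="DiskPressure")].status}' 2>/dev/null)" || out=""
  echo "${out:-unknown}"
}

status="$(disk_pressure_status)"
echo "DiskPressure: $status"

# Only clean up failed pods once disk space has been freed.
if [ "$status" != "unknown" ]; then
  kubectl delete pod --all-namespaces --field-selector=status.phase=Failed
fi
```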
Problem: A node has joined the cluster but has communication issues, and might be in the NotReady state. When the node attempts to rejoin, the following error occurs:
Connection failed. The joining node (IP address of failed remote node) is already known to dqlite (504).

Solution: Remove the failed node by running:
sudo -E /snap/bin/microk8s remove-node --force <NODE NAME>
To re-add the node, follow the instructions in Adding a Node, performing only the steps for the node that you are re-adding. Run the --finalize command from the primary node at the end.
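The remove/re-add sequence can be summarized as a short checklist; this is a sketch in which the node name "node-2" is a placeholder, and the join token step assumes the standard MicroK8s add-node/join flow.

```shell
#!/bin/sh
# Sketch: print the remove/re-add sequence for a failed node ("node-2" is
# a placeholder name). The steps themselves must be run on a cluster node.
rejoin_steps() {
  cat <<'EOF'
# 1. On the primary node, remove the failed node:
sudo -E /snap/bin/microk8s remove-node --force node-2
# 2. On the primary node, generate a fresh join token:
sudo microk8s add-node
# 3. On the failed node, run the "microk8s join ..." command printed in step 2.
EOF
}
rejoin_steps
```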
Problem: You have an incorrect FQDN, or you are unable to log in to the application.

Cause: If you install with an incorrect FQDN, you won’t be able to log in to the application.

Solution: Uninstall Resonate RFID Reader Management with the command:
sudo -E su - trif-user ./uninstall.sh
The script is in the same folder as install.sh. Then install Resonate RFID Reader Management with the command:
./install.sh -h <RESONATE SERVER FQDN> -m <MODE> --smtp-server <SMTP SERVER FQDN/IP> --admin-email <EMAIL ADDRESS>
Problem: Containers fail to download or start; the pod STATUS is ErrImagePull.

Cause: A container running in a pod fails to pull the required image from a container registry. This error can occur for a variety of reasons, including network connectivity issues, an incorrect image name or tag, missing credentials, or insufficient permissions.

Solution: Check network connectivity, verify the image name and tag, ensure that credentials are correct and up to date, and ensure that the appropriate permissions are in place. Once the issue is resolved, recreate the pod; the container image should then be pulled successfully.
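The root cause is usually visible in the pod's events. A minimal sketch, assuming kubectl is configured and using "my-pod" as a placeholder pod name:

```shell
#!/bin/sh
# Sketch: show the tail of the pod description, where image-pull events
# (wrong tag, auth failure, network error) are reported.
pull_events() {
  kubectl describe pod my-pod 2>/dev/null | grep -A2 'Events:' \
    || echo "kubectl unavailable; run this on a cluster node"
}
pull_events
# After fixing the root cause, recreate the pod, e.g.:
#   kubectl delete pod my-pod   # its Deployment/ReplicaSet recreates it
```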
Problem: Containers fail to download or start; the pod STATUS is ImagePullBackOff.

Cause: This is the result of repeated ErrImagePull errors.

Solution: Resolve the underlying ErrImagePull error as described above.
Problem: Containers fail to download or start; the pod STATUS is CreateContainerConfigError.

Cause: Kubernetes cannot create the container based on the supplied configuration.

Solution: If the pod is stuck in this state, contact Zebra for additional support.
Problem: Containers fail to download or start; the pod STATUS is CreateContainerError.

Cause: Kubernetes cannot create the container, for a reason other than its configuration.

Solution: Review the container logs and fix the reported issue.
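Reviewing the logs can be sketched as below; "my-pod" is a placeholder name, and the script assumes kubectl is configured, falling back to a notice otherwise.

```shell
#!/bin/sh
# Sketch: show logs from the failing container's previous attempt; if no
# previous logs exist, fall back to the pod description and its events.
container_diag() {
  kubectl logs my-pod --previous 2>/dev/null \
    || kubectl describe pod my-pod 2>/dev/null \
    || echo "kubectl unavailable; run this on a cluster node"
}
container_diag
```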
Problem: Pods are stuck in the Pending state.

Cause: This is often due to a storage issue.

Solution: Check the state of your PersistentVolumeClaims (PVCs). Refer to Resolving Storage Failures.
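Checking PVC state can be sketched as follows; this assumes kubectl is configured for the cluster and prints a notice when it is not.

```shell
#!/bin/sh
# Sketch: list PVCs in all namespaces; any PVC not in the "Bound" state
# is a likely cause of pods stuck in Pending.
pvc_overview() {
  kubectl get pvc --all-namespaces 2>/dev/null \
    || echo "kubectl unavailable; run this on a cluster node"
}
pvc_overview
# Inspect a PVC stuck in "Pending" for details:
#   kubectl describe pvc <PVC NAME> -n <NAMESPACE>
```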