This document is a troubleshooting guide for Kafka.
References: https://kafka.apache.org/intro and https://zookeeper.apache.org/
kubectl is the command-line tool used to connect to the Kubernetes cluster from your machine.
Install Visual Studio Code for better code/configuration editing capabilities.
Git
Use the command below to list the Kafka broker pods and their status:
kubectl get pods -n kafka-cluster
If the Kafka brokers are in CrashLoopBackOff or Error status:
Describe the broker pods and look for errors:
kubectl describe pod kafka-v2-0 -n kafka-cluster
kubectl describe pod kafka-v2-1 -n kafka-cluster
kubectl describe pod kafka-v2-2 -n kafka-cluster
Check the Kafka brokers' logs for errors:
kubectl logs -f kafka-v2-0 -n kafka-cluster
kubectl logs -f kafka-v2-1 -n kafka-cluster
kubectl logs -f kafka-v2-2 -n kafka-cluster
If the brokers are in CrashLoopBackOff due to disk space issues, follow the log cleanup steps later in this document.
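To confirm whether the data volume is filling up, you can check disk usage on each broker, for example (assuming the data directory used in the cleanup section below):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- df -h /opt/kafka-data
kubectl exec -it kafka-v2-1 -n kafka-cluster -- df -h /opt/kafka-data
kubectl exec -it kafka-v2-2 -n kafka-cluster -- df -h /opt/kafka-data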
Ensure the Zookeeper pods are running without any errors; the Kafka brokers depend on Zookeeper to run smoothly.
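You can list the Zookeeper pods and their status with:
kubectl get pods -n zookeeper-cluster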
If the Zookeeper pods are in CrashLoopBackOff or Error status, use the commands below to check the error.
Describe the Zookeeper pods and look for errors:
kubectl describe pod zookeeper-v2-0 -n zookeeper-cluster
kubectl describe pod zookeeper-v2-1 -n zookeeper-cluster
kubectl describe pod zookeeper-v2-2 -n zookeeper-cluster
Check the Zookeeper pods' logs for errors:
kubectl logs -f zookeeper-v2-0 -n zookeeper-cluster
kubectl logs -f zookeeper-v2-1 -n zookeeper-cluster
kubectl logs -f zookeeper-v2-2 -n zookeeper-cluster
In this tutorial, we go through the step-by-step process to reset the offset of a Kafka consumer group.
Consumer offset is used to track the messages that are consumed by consumers in a consumer group. A topic can be consumed by many consumer groups and each consumer group will have many consumers. A topic is divided into multiple partitions.
Within a consumer group, each partition is assigned to exactly one consumer, while a single consumer can be assigned multiple partitions.
Consumer offset is managed at the partition level per consumer group.
Why reset the consumer offset?
In some scenarios, consumers may fail while processing messages from a Kafka partition, leaving consumption incomplete. In such cases of consumption failures, you may need to re-consume messages that were previously consumed, which requires resetting the consumer offset to an earlier offset.
Follow the steps below if consumers stop consuming data from consumer group topics for any reason.
Get a Shell to a Kafka broker
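For example, to open a shell in one of the broker pods (assuming bash is available in the broker image):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- bash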
Find the current consumer offset
Use the kafka-consumer-groups tool with the consumer group id and the --describe option.
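A minimal example, assuming the Kafka CLI scripts are on the PATH inside the broker pod and the broker listens on localhost:9092 (replace <group-id> with your consumer group):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-id> --describe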
You will see two offset-related columns for each partition of the topic in that consumer group: CURRENT-OFFSET and LOG-END-OFFSET. CURRENT-OFFSET is the last committed offset for the partition in the consumer group, LOG-END-OFFSET is the latest offset written to the partition, and the difference between them is reported as LAG.
If you find topic lags that are not getting cleared, use the following steps to reset the consumer offset.
Scale down the respective consumer group service (e.g. for egov-infra-persist you have to scale down the egov-persister service).
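For example, assuming the persister runs as a Kubernetes Deployment (adjust the name and namespace to your setup):
kubectl scale deployment egov-persister --replicas=0 -n <namespace>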
Reset the consumer offset
Use kafka-consumer-groups to change or reset the offset. You have to specify the topic, the consumer group, and the --reset-offsets flag.
For example, you can reset offsets to the offset from a datetime using --to-datetime. Format: 'YYYY-MM-DDTHH:mm:SS.sss'
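A sketch of the reset command, assuming the same bootstrap server as above (replace the group, topic, and datetime with your values; omit --execute to preview the result first):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-id> --topic <topic-name> --reset-offsets --to-datetime 2024-01-01T00:00:00.000 --execute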
Scale up the respective consumer group service
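For example (again assuming a Deployment; adjust the name and namespace to your setup):
kubectl scale deployment egov-persister --replicas=1 -n <namespace>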
The following steps illustrate how to clean up Kafka logs.
For any logs that appear to be overflowing and consuming disk space, use the following steps to clean them up on the Kafka brokers.
Note: Make sure the team is informed before doing this activity, as it deletes Kafka topic data.
Back up the list of log file names and their disk consumption data (optional):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_0.logs
kubectl exec -it kafka-v2-1 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_1.logs
kubectl exec -it kafka-v2-2 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_2.logs
Clean up the logs:
kubectl exec -it kafka-v2-0 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
kubectl exec -it kafka-v2-1 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
kubectl exec -it kafka-v2-2 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
3. If the pod is in CrashLoopBackOff state and the storage is full, use the following workaround:
Make a copy of the StatefulSet manifest
kubectl get statefulsets kafka-v2 -n kafka-cluster -oyaml > manifest.yaml
Scale down the Kafka statefulset replica count to zero
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=0
Make the following changes to the copy of the statefulsets manifest file
Modify the command line from:
To
Apply this StatefulSet manifest and scale the replica count back up to 3 (see the example commands below). The pods should now be in a running state; then follow [step 2] above to clean up the logs.
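For example, assuming the edited copy is the manifest.yaml created above:
kubectl apply -f manifest.yaml -n kafka-cluster
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=3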
Scale the Kafka StatefulSet replica count back down to zero
kubectl scale statefulsets kafka-v2 --replicas=0 -n kafka-cluster
Make the following changes to the copy of the statefulsets manifest file
Modify the command line from:
To
Apply this StatefulSet manifest and scale the replica count back up to 3.
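For example:
kubectl apply -f manifest.yaml -n kafka-cluster
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=3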