This document is a troubleshooting guide for Kafka.
References: https://kafka.apache.org/intro and https://zookeeper.apache.org/
kubectl is the command-line tool used to connect to the Kubernetes cluster from your machine.
Install Visual Studio Code for better code/configuration editing capabilities.
Git
Use the command below to list the Kafka broker pods and their status:
kubectl get pods -n kafka-cluster
If the Kafka brokers are in CrashLoopBackOff or Error status:
Describe the broker pods and look for errors:
kubectl describe pod kafka-v2-0 -n kafka-cluster
kubectl describe pod kafka-v2-1 -n kafka-cluster
kubectl describe pod kafka-v2-2 -n kafka-cluster
Check the Kafka brokers' logs for errors:
kubectl logs -f kafka-v2-0 -n kafka-cluster
kubectl logs -f kafka-v2-1 -n kafka-cluster
kubectl logs -f kafka-v2-2 -n kafka-cluster
If the brokers are in CrashLoopBackOff due to disk space issues, follow the log cleanup steps later in this document.
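To confirm whether the data volume is filling up, you can check disk usage on each broker, for example (assuming the data directory used in the cleanup section below):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- df -h /opt/kafka-data
kubectl exec -it kafka-v2-1 -n kafka-cluster -- df -h /opt/kafka-data
kubectl exec -it kafka-v2-2 -n kafka-cluster -- df -h /opt/kafka-data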
Ensure the Zookeeper pods are running without any errors; the Kafka brokers depend on Zookeeper to run smoothly.
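You can list the Zookeeper pods and their status with:
kubectl get pods -n zookeeper-cluster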
If the Zookeeper pods are in CrashLoopBackOff or Error status, use the commands below to check the error.
Describe the Zookeeper pods and look for errors:
kubectl describe pod zookeeper-v2-0 -n zookeeper-cluster
kubectl describe pod zookeeper-v2-1 -n zookeeper-cluster
kubectl describe pod zookeeper-v2-2 -n zookeeper-cluster
Check the Zookeeper pods' logs for errors:
kubectl logs -f zookeeper-v2-0 -n zookeeper-cluster
kubectl logs -f zookeeper-v2-1 -n zookeeper-cluster
kubectl logs -f zookeeper-v2-2 -n zookeeper-cluster
In this tutorial, we go through the step-by-step process to reset the offset of a Kafka consumer group.
Consumer offset is used to track the messages that are consumed by consumers in a consumer group. A topic can be consumed by many consumer groups and each consumer group will have many consumers. A topic is divided into multiple partitions.
Within a consumer group, each partition is assigned to exactly one consumer, while a single consumer can be assigned multiple partitions.
Consumer offset is managed at the partition level per consumer group.
Why reset the consumer offset?
In some scenarios, consumers may fail while processing messages from a Kafka partition, leaving consumption incomplete. In such cases of consumption failures, you may need to re-consume messages that were previously consumed, which requires resetting the consumer offset to an earlier offset.
Follow the steps below if consumers stop consuming data from consumer group topics for any reason.
Get a Shell to a Kafka broker
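For example, to open a shell in one of the broker pods (assuming bash is available in the broker image):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- bash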
Find the current consumer offset
Use the kafka-consumer-groups tool with the consumer group id and the --describe option.
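A minimal example, assuming the Kafka CLI scripts are on the PATH inside the broker pod and the broker listens on localhost:9092 (replace <group-id> with your consumer group):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-id> --describe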
You will see two offset-related columns for each partition of the topic in that consumer group: CURRENT-OFFSET and LOG-END-OFFSET. CURRENT-OFFSET is the last committed offset for the partition in the consumer group, LOG-END-OFFSET is the latest offset written to the partition, and the difference between them is reported as LAG.
If you find topic lags that are not getting cleared, use the following steps to reset the consumer offset.
Scale down the respective consumer group service (e.g. for egov-infra-persist you have to scale down the egov-persister service).
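For example, assuming the persister runs as a Kubernetes Deployment (adjust the name and namespace to your setup):
kubectl scale deployment egov-persister --replicas=0 -n <namespace>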
Reset the consumer offset
Use kafka-consumer-groups to change or reset the offset. You have to specify the topic, the consumer group, and the --reset-offsets flag.
For example, you can reset offsets to the offset from a datetime using --to-datetime. Format: 'YYYY-MM-DDTHH:mm:SS.sss'
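A sketch of the reset command, assuming the same bootstrap server as above (replace the group, topic, and datetime with your values; omit --execute to preview the result first):
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-id> --topic <topic-name> --reset-offsets --to-datetime 2024-01-01T00:00:00.000 --execute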
Scale up the respective consumer group service
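For example (again assuming a Deployment; adjust the name and namespace to your setup):
kubectl scale deployment egov-persister --replicas=1 -n <namespace>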
The following steps illustrate how to clean up Kafka logs.
For any logs that appear to be overflowing and consuming disk space, use the following steps to clean them up on the Kafka brokers.
Note: Make sure the team is informed before doing this activity, as it deletes Kafka topic data.
Back up the list of log file names and their disk consumption data (optional):
kubectl exec -it kafka-v2-0 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_0.logs
kubectl exec -it kafka-v2-1 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_1.logs
kubectl exec -it kafka-v2-2 -n kafka-cluster -- du -h /opt/kafka-data/logs | tee backup_2.logs
Clean up the logs:
kubectl exec -it kafka-v2-0 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
kubectl exec -it kafka-v2-1 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
kubectl exec -it kafka-v2-2 -n kafka-cluster -- sh -c 'rm -rf /opt/kafka-data/logs/*'
3. If the pod is in CrashLoopBackOff state and the storage is full, use the following workaround:
Make a copy of the StatefulSet manifest
kubectl get statefulsets kafka-v2 -n kafka-cluster -oyaml > manifest.yaml
Scale down the Kafka statefulset replica count to zero
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=0
Make the following changes to the copy of the statefulsets manifest file
Modify the command line from:
To
Apply this StatefulSet manifest and scale the replica count back up to 3 (see the example commands below). The pods should now be in a running state; then follow [step 2] above to clean up the logs.
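For example, assuming the edited copy is the manifest.yaml created above:
kubectl apply -f manifest.yaml -n kafka-cluster
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=3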
Scale the Kafka StatefulSet replica count back down to zero
kubectl scale statefulsets kafka-v2 --replicas=0 -n kafka-cluster
Make the following changes to the copy of the statefulsets manifest file
Modify the command line from:
To
Apply this StatefulSet manifest and scale the replica count back up to 3.
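For example:
kubectl apply -f manifest.yaml -n kafka-cluster
kubectl scale statefulsets kafka-v2 -n kafka-cluster --replicas=3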