Welcome to the Apache Kafka tutorial at Learning Journal. In the previous video, we started a multi-node cluster on a single machine. We have also seen some configuration parameters like broker.id, port, and log.dirs.
In this session, we will look at some more configurations.
Apache Kafka is a highly configurable system. It provides many configurable parameters, but most of them have a reasonable default, so you don't need to worry about all of them. In this session, I will cover some key broker configurations. We can't cover everything, so I have selected some critical parameters for our discussion. But I recommend that you check the documentation for the complete list of Kafka configuration parameters. Reading them at least once will give you an overview of the available options and help you customize Kafka for your use case. Here is the list of parameters that we will discuss in this session.
I am skipping the first three because we have already covered them in the previous session.
zookeeper.connect
This parameter takes the ZooKeeper connection string. The connection string is a hostname with a port number, or a comma-separated list of them for a ZooKeeper ensemble. We already know that Kafka uses ZooKeeper for various coordination purposes, so it is critical that every broker knows the ZooKeeper address. This parameter is also necessary to form a cluster.
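As a sketch, the entry in server.properties might look like this (the hostnames are placeholders for illustration):

```properties
# ZooKeeper connection string: host:port pairs, comma-separated for an ensemble.
# The hostnames below are placeholders, not real servers.
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

You can also append an optional chroot path (for example, `/kafka`) to the connection string so that multiple Kafka clusters can share a single ZooKeeper ensemble.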
What do I mean by forming a cluster?
Well, all the brokers are running on different systems, so how do they know about each other? If they don't know each other, they are not part of the cluster. ZooKeeper is the connecting link that allows all the brokers to form a cluster.
delete.topic.enable
If you want to delete a topic, you can use the topic management tool. But by default, deleting a topic is not allowed. You can't remove a topic because the default value for this parameter is false. That is a reasonable protection for production environments. But in a development or testing environment, you may want to delete topics. So, if you want Kafka to allow topic deletion, you need to set this parameter to true.
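In server.properties, this is a one-line change; a minimal sketch for a dev setup:

```properties
# Allow the topic management tool to actually delete topics (default: false).
delete.topic.enable=true
```

With this set to true, deleting a topic through the topic management tool actually removes it; otherwise the topic is only marked for deletion and never goes away.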
auto.create.topics.enable
We have already discussed the auto-create topic feature. If a producer starts sending messages to a non-existent topic, Kafka will create the topic automatically and accept the data. This behaviour is suitable for dev environments. But in a production environment, you may want to implement a more controlled approach. You can set this parameter to false, and Kafka will stop creating topics automatically. You can create topics manually using the topic management tool, and no one will be able to send data to a non-existent topic.
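In server.properties, the controlled approach described above is again a single line; a minimal sketch:

```properties
# Disable automatic topic creation; producers can no longer create topics implicitly.
auto.create.topics.enable=false
```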
default.replication.factor and num.partitions
These two parameters are quite straightforward. The default value for both of them is one, and they are effective when you have automatic topic creation enabled. So, if Kafka is creating your topic automatically, the new topic will have only one partition and a single copy. If you want some other values, you can change the default settings accordingly.
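For example, a sketch in server.properties that gives every auto-created topic three partitions and three replicas (the values here are illustrative, not recommendations):

```properties
# Defaults applied to automatically created topics.
num.partitions=3
default.replication.factor=3
```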
log.retention.ms and log.retention.bytes
These two are critical and not obvious. Whatever data you send to Kafka is not retained forever. Kafka is not a database. You don't send data to Kafka for storage so that you can query it later. It is a message broker. It should deliver the data to the consumer and then clean it up. There is no reason to retain messages for longer than needed.
Kafka gives you two options to configure the retention period. The default option is retention by time, and the default retention period is seven days. So, in this case, Kafka will clean up all the messages older than seven days. If you want to change the duration, you can specify your value for log.retention.ms configuration.
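Since the value is specified in milliseconds, a quick conversion helps when setting it. Here is a minimal sketch (the helper name is my own, not a Kafka API):

```python
def days_to_ms(days: int) -> int:
    """Convert a retention period in days to the milliseconds
    expected by the log.retention.ms broker setting."""
    return days * 24 * 60 * 60 * 1000

# The default seven-day retention expressed in milliseconds:
print(days_to_ms(7))  # 604800000
```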
Kafka gives you another option to define the retention period: you can specify it by size. That's where the second parameter, log.retention.bytes, is applicable. But note that this size applies per partition. So, if you set log.retention.bytes to 1 GB, Kafka will trigger a clean-up activity when a partition reaches 1 GB. Remember that it is not the topic size; it is the partition size.
If you specify both configurations, the clean-up will start as soon as either criterion is met.
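Putting both settings together in server.properties might look like this sketch (the values are illustrative):

```properties
# Delete log segments older than three days (3 * 86,400,000 ms)...
log.retention.ms=259200000
# ...or once a partition exceeds 1 GiB (1,073,741,824 bytes), whichever comes first.
log.retention.bytes=1073741824
```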
That's it for this session. I believe you are ready to start writing some code, so in the next session, we will create a custom producer and send some data to Kafka.