Welcome to the Apache Kafka tutorial at Learning journal. In the previous session, we created a Kafka producer. In this session, we will look at the internals of a Kafka producer and see what is going on under the hood. We will try to understand how a message goes from a client application to a broker. So, let's get started.
Producer Configurations
The first step is to create a Java properties object and package all the producer configurations that we want to set. These settings include three mandatory configurations that we learned in the previous session.
- bootstrap.servers
- key.serializer
- value.serializer
You can also set some additional properties or even custom configs. In the example that we created earlier, we used only the three basic configs, but I will create some more examples in the next session with other properties, including custom configs.
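As a rough illustration of this configuration step, the sketch below packages the three mandatory settings into a Java Properties object. The broker address localhost:9092 and the class name ProducerConfigSketch are assumptions made for this example only. The serializer values are the string serializer class that ships with the Kafka client library.

```java
import java.util.Properties;

class ProducerConfigSketch {

    // Builds the three mandatory producer settings.
    // "localhost:9092" is a placeholder broker address (an assumption).
    static Properties buildConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildConfig();
        // Print each setting so the packaged configuration is visible.
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```

Any additional or custom setting would be added to the same object with another put call before the producer is created.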
Producer Record
On the other side, we create a producer record and package five things in a ProducerRecord object. Those five things are listed below.
- Topic Name
- Partition Number
- Timestamp
- Key
- Value
The partition number, timestamp, and key are optional depending upon your use case. The ProducerRecord object is, in fact, the message that we want to send to Kafka Broker.
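To make the five parts of a record concrete, here is a minimal stand-in written in plain Java. SketchRecord is a hypothetical class invented for this illustration. It is not Kafka's ProducerRecord, it only mirrors the same five fields and shows which of them may be left unset.

```java
class ProducerRecordSketch {

    // Hypothetical model of a record; NOT the Kafka ProducerRecord class.
    static final class SketchRecord {
        final String topic;       // required
        final Integer partition;  // optional: null leaves the choice to the partitioner
        final Long timestamp;     // optional: null means the caller did not set one
        final String key;         // optional
        final String value;       // the payload

        SketchRecord(String topic, Integer partition, Long timestamp,
                     String key, String value) {
            if (topic == null) {
                throw new IllegalArgumentException("topic is required");
            }
            this.topic = topic;
            this.partition = partition;
            this.timestamp = timestamp;
            this.key = key;
            this.value = value;
        }

        // Convenience factory for the common case: topic, key and value only.
        static SketchRecord of(String topic, String key, String value) {
            return new SketchRecord(topic, null, null, key, value);
        }
    }

    public static void main(String[] args) {
        // All five parts supplied explicitly ("demo-topic" is a made-up name).
        SketchRecord full = new SketchRecord(
                "demo-topic", 0, System.currentTimeMillis(), "k1", "hello");
        // Only topic, key and value; partition and timestamp stay unset.
        SketchRecord minimal = SketchRecord.of("demo-topic", "k1", "hello");

        System.out.println("full partition: " + full.partition);
        System.out.println("minimal partition: " + minimal.partition);
    }
}
```

In a real application you would use the ProducerRecord class from the Kafka client library rather than a hand-written class like this one.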
Once we have the Properties and the ProducerRecord definition, we instantiate a Producer object using the Properties object. Then we send the ProducerRecord to the producer object. That's it. The message is handed over to the producer.
When the message is handed over to the producer, the following things happen.
Serialization
The Producer will apply the serializer to serialize your Key and Value. That's the first thing. You already know that serialization is converting your Key and Value objects into a byte array, and the producer will use the serializer class that we specified to accomplish this.
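The idea of turning an object into a byte array can be sketched in a few lines. The example below assumes the key and value are strings and encodes them as UTF-8, which is roughly what a string serializer does. The class and method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SerializationSketch {

    // Converts a string into a byte array using UTF-8.
    // A null input stays null instead of failing.
    static byte[] serialize(String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] keyBytes = serialize("customer-42");
        byte[] valueBytes = serialize("order placed");

        // The record now consists of raw bytes rather than Java objects.
        System.out.println(Arrays.toString(keyBytes));
        System.out.println(valueBytes.length + " value bytes");
    }
}
```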
Partitioning
Then, it will send the record to the partitioner. The partitioner will choose a partition for the message. We discussed earlier that the default partitioner uses your message key to determine an appropriate partition.
If a message key is specified, Kafka will hash the key to get a partition number. If you specify the same key for multiple messages, all of them will go to the same partition.
If a message key is not specified, the default partitioner will try to distribute the messages evenly across all available partitions of the topic. It uses a round-robin algorithm, so a few messages go to the first partition, then some go to the second, and so on.
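The two cases described above can be sketched as follows. This is a simplified model, not Kafka's partitioner: Arrays.hashCode is used as a stand-in hash function, and the class and method names are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionerSketch {

    // Counter used to rotate through partitions when there is no key.
    private final AtomicInteger counter = new AtomicInteger(0);

    // Picks a partition for a serialized key.
    // Arrays.hashCode is a stand-in; it is not the hash function Kafka uses.
    int choosePartition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            // No key: move to the next partition in round-robin order.
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Key present: clear the sign bit, then map the hash onto a partition.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionerSketch partitioner = new PartitionerSketch();
        byte[] key = "customer-42".getBytes(StandardCharsets.UTF_8);

        // The same key lands on the same partition every time.
        System.out.println(partitioner.choosePartition(key, 3));
        System.out.println(partitioner.choosePartition(key, 3));

        // Records without a key are spread across the partitions in turn.
        for (int i = 0; i < 4; i++) {
            System.out.println(partitioner.choosePartition(null, 3));
        }
    }
}
```

Because the partition is the hash taken modulo the partition count, the same key keeps mapping to the same partition only while the number of partitions stays the same.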
Partition Buffer
Once we have a partition number, the producer is ready to send the record to the broker. But instead of sending the message immediately, it keeps the message in a partition buffer. The producer maintains an in-memory buffer for each partition and sends the records in batches. You might be wondering what the size of a batch is, and how long the producer will linger waiting for more messages to arrive. We can configure all of those things by adding appropriate configuration parameters to the properties object. I will cover them in the upcoming sessions.
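The buffering idea can be modeled with one list per partition. The sketch below is a deliberate simplification: it closes a batch after a fixed number of records and ignores time entirely, whereas the real producer measures a batch in bytes (the batch.size setting) and also sends after a waiting period (the linger.ms setting). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionBufferSketch {

    // Simplification: the limit counts records, not bytes.
    private final int maxRecordsPerBatch;
    // One in-memory buffer per partition number.
    private final Map<Integer, List<String>> buffers = new HashMap<>();

    PartitionBufferSketch(int maxRecordsPerBatch) {
        this.maxRecordsPerBatch = maxRecordsPerBatch;
    }

    // Adds a record to the buffer of its partition.
    // Returns the completed batch once the limit is reached, otherwise null.
    List<String> append(int partition, String record) {
        List<String> batch =
                buffers.computeIfAbsent(partition, p -> new ArrayList<>());
        batch.add(record);
        if (batch.size() >= maxRecordsPerBatch) {
            buffers.remove(partition); // the next append starts a fresh batch
            return batch;
        }
        return null;
    }

    public static void main(String[] args) {
        PartitionBufferSketch buffer = new PartitionBufferSketch(2);
        System.out.println(buffer.append(0, "a")); // partition 0 is still filling
        System.out.println(buffer.append(1, "b")); // partition 1 is still filling
        System.out.println(buffer.append(0, "c")); // partition 0 batch is complete
    }
}
```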
Record Metadata and Retries
Finally, the producer will send a batch of records to the broker. If the broker can receive and save the messages, it will send an acknowledgment in the form of a RecordMetadata object. If anything goes wrong, the producer receives an error. Some errors may be recoverable with a retry. For example, suppose the leader of the partition was down. If we retry sending the batch after a few milliseconds, a new leader may have been elected by that time. So, in the case of a recoverable error, the producer will retry sending the batch before it throws an exception. We can configure the number of retries and the time between two retries. The producer will not attempt a retry if the error is not recoverable.
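The retry behaviour described above can be sketched as a small loop. This is not the producer's actual implementation. RecoverableException, sendWithRetry, and the parameter names are hypothetical. In the real producer the corresponding settings are retries and retry.backoff.ms.

```java
import java.util.concurrent.Callable;

class RetrySketch {

    // Hypothetical marker for errors that may succeed on a later attempt.
    static class RecoverableException extends Exception {
        RecoverableException(String message) {
            super(message);
        }
    }

    // Runs the send action, retrying recoverable failures up to maxRetries times.
    // Any other exception is treated as non-recoverable and propagates at once.
    static <T> T sendWithRetry(Callable<T> sendAction, int maxRetries, long backoffMs)
            throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return sendAction.call();
            } catch (RecoverableException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the error to the caller
                }
                attempt++;
                Thread.sleep(backoffMs); // wait before trying again
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated send: fails twice (as if no leader is available), then succeeds.
        String ack = sendWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RecoverableException("leader not available");
            }
            return "acknowledged after " + calls[0] + " attempts";
        }, 5, 10);
        System.out.println(ack);
    }
}
```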
Great, the workflow of a producer is quite simple, and we can configure almost everything using the producer configuration parameters.
That's it for this session. In the next session, I will cover some more details of the Kafka producer APIs. See you again.