Apache Kafka Foundation Course - Producer Workflow


Welcome to Apache Kafka tutorial at Learning journal. In the previous session, we created a Kafka producer. In this session, we will look at the internals of a Kafka producer. We will look at what is going on under the hood. We will try to understand that how a message goes from a client application to a Broker. So, let's get started.

Kafka Producer Workflow
Fig.1-Kafka Producer Workflow

Producer Configurations

The first step is to create a Java properties object and package all the producer configurations that we want to set. These settings include three mandatory configurations that we learned in the previous session.

  1. bootstrap.servers
  2. key.serializer
  3. value.serializer

You can also set some additional properties or even custom configs. In the example that we created earlier, we used only three basic configs, but I will create some more examples in next session with other properties including custom configs.


Producer Record

On the other side, we create a producer record and package five things in a ProducerRecord object. Those five things are listed below.

  1. Topic Name
  2. Partition Number
  3. Timestamp
  4. Key
  5. Value

The partition number, timestamp, and key are optional depending upon your use case. The ProducerRecord object is, in fact, the message that we want to send to Kafka Broker.
Once we have the Properties and the ProducerRecord definition, we instantiate a Producer object using the Properties object. Then we send the ProducerRecord to the producer object. That’s it. The message is handed over to the producer. When the message is handed over to the producer, following things happen.

Serialization

The Producer will apply the serializer to serialize your Key and Value. That's the first thing. You already know that serialization is converting your Key and Value objects into a byte array, and the producer will use the serializer class that we specified to accomplish this.

Partitioning

Then, it will send the record to the partitioner. The partitioner will choose a partition for the message. We already discussed earlier that the default partitioner would use your message key to determine an appropriate partition. If a message key is specified, Kafka will hash the key for getting a partition number. If you specify the same key to multiple messages, all of them will go to the same partition.
If message key is not specified, the default partitioner will try to evenly distribute the messages to all available partitions for the topic. It uses a round robin algorithm, so few messages go to the first partition, then some of them goes to second and so on.


Partition Buffer

Once we have a partition number, the partitioner is ready to send it to Broker. But instead of sending the message immediately, the partitioner will keep the message into a partition buffer. The producer maintains an in-memory buffers for each partition and sends the records in batches. You might be wondering that what is the size of the batch? How much time the producer will linger waiting for more messages to arrive? We can configure all those things by adding appropriate configuration parameters to the properties object. I will cover them in the upcoming sessions.

Record Metadata and Retires

Finally, the producer will send a batch of records to the broker. If the broker can receive and save the message, it will send an acknowledgment in the form of RecordMetadata object. If anything goes wrong, the producer receives an error. Some errors may be recoverable with a retry, for example, suppose the leader of the partition was down, if we retry sending the batch in few milliseconds, we may have a new leader elected by that time. So, In the case of a recoverable error, the producer will retry sending the batch before it throws and exception.
We can configure the number of retries and time between two retires. The producer will not attempt for a retry if the error is not a recoverable error.
Great, the workflow of a producer is quite simple, and we can configure almost everything using the producer configuration parameters.
That's it for this session. In next session, I will cover some more details of Kafka producer APIs, See you again.


You will also like:


Pure Function benefits

Pure Functions are used heavily in functional programming. Learn Why?

Learning Journal

Apache Spark Introduction

What is Apache Spark and how it works? Learn Spark Architecture.

Learning Journal

Referential Transparency

Referential Transparency is an easy method to verify the purity of a function.

Learning Journal

Scala Variable length arguments

How do you create a variable length argument in Scala? Why would you need it?

Learning Journal

Free virtual machines

Get upto six free VMs in Google Cloud and learn Bigdata.

Learning Journal