Kafka Streams: Real-time Stream Processing!

Author: Prashant Kumar Pandey
Publisher: Learning Journal
Release Date: January 2019
Pages: 300 (approx.)


The book Kafka Streams: Real-time Stream Processing! helps you understand stream processing in general and apply that skill to Kafka Streams programming. It focuses mainly on the new generation of the Kafka Streams library available in Apache Kafka 2.0.
While the primary focus of this book is Kafka Streams, it also covers the other Kafka capabilities and concepts that are necessary to grasp Kafka Streams programming.
Apache Kafka is currently one of the most popular and widely used components of Big Data solutions. Initially conceived as a message queue, Kafka has quickly evolved into a full-fledged streaming platform. As a streaming platform, Kafka provides a highly scalable, fault-tolerant, publish-subscribe pipeline for streams of events. With the addition of the Kafka Streams library, Kafka can now process these event streams in real time, with millisecond-level latency, to support a variety of business use cases. This book gives you a solid foundation for writing modern real-time stream processing applications using Apache Kafka and the Kafka Streams library.

Who should read this book

Kafka Streams: Real-time Stream Processing! is written for software engineers who want to develop stream processing applications using the Kafka Streams library. It is also written for data architects and data engineers who are responsible for designing and building an organization's data-centric infrastructure. A third audience is managers and architects who do not work directly on the Kafka implementation but work with the people who implement Kafka Streams on the ground.

What should you already know

This book assumes that the reader is familiar with the basics of the Java programming language. The source code and examples in this book use Java 8, including Java 8 lambda syntax, so experience with lambdas will be helpful.
Kafka Streams is a library that runs on top of Kafka, so a good fundamental knowledge of Kafka is essential to get the most out of Kafka Streams. For readers who are new to Kafka, I cover the essential Kafka concepts, so you should still be able to learn Kafka Streams, which is the main subject of the book.
The book also assumes that you have some familiarity and experience with running and working on the Linux operating system.

Kafka and source code version

This book is based on the Kafka Streams library available in Apache Kafka 2.0. All the source code and examples in this book are tested on the Apache Kafka 2.0 open source distribution.
Some chapters of this book also use the Confluent Open Source platform to teach and demonstrate functionality that is available only in the Confluent Platform, such as prebuilt connectors, KSQL, and Schema Registry.

Table of Contents

About the Author
About the Book
  Who should read this book
  What should you already know
  Kafka and source code version
References and Resources

Chapter 1 - Dawn of Big Data

Chapter 2 - Real-time Streams
    Conception of Events
    Event Streams
    Stream Processing
    Stream Processing Use Cases
      Incremental ETL
      Real-time Reporting
      Real-time Alerting
      Real-time Decision Making and ML
      Online Machine Learning and AI
    Real-time Data Preparation at Scale
    Advantages of Stream Processing

Chapter 3 - Streaming Concepts
    Detaching Fallacies
      Stream Processing vs Analytics
      Stream Processing vs ETL
      Stream Processing vs Cluster Computing
      Stream Processing vs Batch Processing
    Attaching Difficulties
      Time Domain
      Event time vs Processing time
      Time Window
        Tumbling Window
        Sliding Window
      Watermarks & Triggers
      Other Window Approaches
      Stateful Streams
      Stream Table Duality
      Exactly Once Processing

Chapter 4 - Why Apache Kafka
    Realizing the Problem
      Summarizing the Problem
    Solving the Problem
      Databases as an Alternative
      Challenges with Databases
      Database Log Files
      Enter Kafka

Chapter 5 - Why Messaging System
    Data Integration Problem
      Decision Criteria
      Integration Alternatives
    Enter the Messaging System
    Kafka in the Data Ecosystem

Chapter 6 - How Kafka Works
    Kafka Broker
    Kafka Message Log
      Replication Factor
      Offset Index
    Kafka Cluster
    Work Distribution
      Partition Allocation
      Leader, Follower and ISR
        ISR - In-Sync Replica
          Committed vs Uncommitted
          Minimum ISR Size

Chapter 7 - Streaming into Kafka
    Retail Events - Example
    Streaming Alternatives
      Producer API
      Other Data Integration Tools
    Kafka Producer
      Language Choices
      Hello Producer
      Producer Scalability
      Producer Multithreading
    Producer Internals
      Message Serializers
    Advanced Producers
      Idempotent Producer
      Transactional Producer
    Exception Handling

Installing a Kafka Cluster
    Preparing VMs
    Configuring Zookeeper
    Configuring Brokers
    Testing Kafka Cluster
      List of Registered Brokers
      Create and List Topics
      Console Producer
      Console Consumer

Installing Kafka on Windows
    Installing JDK 1.8
    Configure and Test Kafka
      Configure Kafka for Windows 10
      Starting and Testing Kafka

Configuring IntelliJ IDEA for Kafka
    Installing Maven 3
    Installing IntelliJ IDEA
      Creating First IntelliJ IDEA Project
      Define Project Dependencies
      Configure Log4J2
      Create and Execute "Hello Kafka!"
      Integrate Kafka localhost in the IDE