Kafka Streams : Real-time Stream Processing!

Author: Prashant Kumar Pandey
Publisher: Learning Journal
Release Date: March 2019
Pages: 350 (approx.)


The book Kafka Streams: Real-time Stream Processing! helps you understand stream processing in general and apply that skill to Kafka Streams programming. The book focuses mainly on the new generation of the Kafka Streams library available in Apache Kafka 2.0.
The primary focus of this book is on Kafka Streams. However, the book also touches on the other Kafka capabilities and concepts that are necessary to grasp the Kafka Streams programming.
Apache Kafka is currently one of the most popular and necessary components of any Big Data solution. Initially conceived as a messaging queue, Kafka has quickly evolved into a full-fledged streaming platform. As a streaming platform, Kafka provides a highly scalable, fault-tolerant, publish-subscribe pipeline for streams of events. With the addition of the Kafka Streams library, it can now process streams of events in real time with millisecond responses to support a variety of business use cases. This book gives you a solid foundation for writing modern real-time stream processing applications using Apache Kafka and the Kafka Streams library.

Who should read this book

Kafka Streams: Real-time Stream Processing! is written for software engineers who want to develop stream processing applications using the Kafka Streams library. I am also writing this book for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure. A third group is the managers and architects who do not work directly with the Kafka implementation but work with the people who implement Kafka Streams at the ground level.

What should you already know

This book assumes that the reader is familiar with the basics of the Java programming language. The source code and examples in this book use Java 8, and I make extensive use of Java 8 lambda syntax, so experience with lambdas will be helpful.
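If Java 8 lambdas are new to you, the following standalone sketch (my illustration, not taken from the book) shows the lambda style in question, using only the standard java.util.function and java.util.stream APIs rather than the Kafka Streams DSL:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LambdaPrimer {
    public static void main(String[] args) {
        // A lambda assigned to a functional interface
        Function<String, String> toUpper = s -> s.toUpperCase();

        // Lambdas passed inline as arguments -- the same style the
        // Kafka Streams DSL uses for filter(), map(), and friends
        List<String> events = Arrays.asList("click", "view", "purchase");
        List<String> shouted = events.stream()
                .filter(e -> e.length() > 4)   // keep longer event names
                .map(toUpper)                  // transform each element
                .collect(Collectors.toList());

        System.out.println(shouted); // [CLICK, PURCHASE]
    }
}
```

The same pattern of passing short anonymous functions into a fluent chain of operators is what the Kafka Streams DSL builds on.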
Kafka Streams is a library that runs on Kafka. A good fundamental knowledge of Kafka is essential to get the most out of Kafka Streams. I cover the essential Kafka concepts for those who are new to Kafka, so you should be able to learn Kafka Streams, which is the main subject of the book.
The book also assumes that you have some familiarity and experience with the Linux operating system.

Kafka and source code version

This book is based on the Kafka Streams library available in Apache Kafka 2.0. All the source code and examples in this book are tested against the Apache Kafka 2.0 open source distribution.
Some chapters of this book also make use of the Confluent Open Source platform to teach and demonstrate functionality that is only available in the Confluent Platform, such as prebuilt connectors, KSQL, and the Schema Registry.

Table of Contents

About the Author
About the Book
  Who should read this book
  What should you already know
  Kafka and source code version
References and Resources

Chapter 1 - Dawn of Big Data

Chapter 2 - Real-time Streams
    Conception of Events
    Event Streams
    Stream Processing
    Stream Processing Use Cases
      Incremental ETL
      Real-time Reporting
      Real-time Alerting
      Real-time Decision Making and ML
      Online Machine Learning and AI
    Real-time Data Preparation at Scale
    Advantages of Stream Processing

Chapter 3 - Streaming Concepts
    Detaching Fallacies
      Stream Processing vs Analytics
      Stream Processing vs ETL
      Stream Processing vs Cluster Computing
      Stream Processing vs Batch Processing
    Attaching Difficulties
      Time Domain
      Event Time vs Processing Time
      Time Window
        Tumbling Window
        Sliding Window
      Watermarks & Triggers
      Other Window Approaches
      Stateful Streams
      Stream Table Duality
      Exactly Once Processing

Chapter 4 - Why Apache Kafka
    Realizing the Problem
      Summarizing the Problem
    Solving the Problem
      Databases as an Alternative
      Challenges with Database
      Database Log Files
      Enter the Kafka

Chapter 5 - Why Messaging System
    Data Integration Problem
      Decision Criteria
      Integration Alternatives
    Enter the Messaging System
    Kafka in the Data Ecosystem

Chapter 6 - How Kafka Works
     Kafka Broker
     Kafka Message Log
       Replication Factor
       Offset Index
     Kafka Cluster
     Work Distribution
       Partition Allocation
       Leader, Follower and ISR
       Committed vs Uncommitted
       Minimum ISR size

Chapter 7 - Streaming into Kafka
     Retail Events - Example
     Streaming Alternatives
       Producer API
       Other Data Integration Tools
     Kafka Producer
       Hello Producer
       Producer Scalability
       Producer Multithreading
       Stock Data Provider - Example
     Producer Internals
       Message Serializer/Deserializer
       Message Timestamp
       Topic Buffer
       Retries and Acks
     Advanced Producers
       Idempotent Producer
       Transactional Producer

Chapter 8 - Producer Examples
     Custom Serializer
     Schema Registry
     Custom Partitioner
     Synchronous send API
     Producer Callback
     Micro Project

Chapter 9 - Kafka Consumers
     Kafka Consumer - Avro
     Kafka Consumer Groups
     Kafka Offsets and Consumer Positions
     More on Kafka Consumers

Chapter 10 - Kafka Streams API
    Kafka Streams Library
      Streams DSL
      Processor API
    Hello Streams
    Streams Application in a Nutshell

Chapter 11 - Creating Topology
    What is Topology
    Source Processor
    KStream Class
    POS Example
      Shipment Service
      Loyalty Management Service
      Trend Analytics
      Final Step
    Summing up
    Other Available Processors

Chapter 12 - Types and Serialization
    Reference Examples
    Working with Types
      JSON Schema to POJO
      Avro Schema to POJO
    Working with SerDes
      Json SerDes
        Generic Json Serializer
        Generic Json Deserializer
        Specific Json SerDes
        Challenges with Json SerDes
      Avro SerDes
        Avro Serializer/Deserializer
        Specific Avro SerDes
        Advantage of using Avro
    Schema Evolution

Chapter 13 - States and Stores
    What is State
    What is State Store
    Stateful vs Stateless
    State Store Example
      Creating Topology
      Creating Store Builder
      Creating Value Transformer
      Creating Stateful Transformation
    State Store Changelog
    Caution with State Store
    Repartitioning KStream

Chapter 14 - Aggregates and Tables
    Streaming Aggregate - Example
      Creating KTable
      KTable Caching
      KTable Emit Rate
      Common Concepts
      KStream Aggregation
      Aggregation Challenges
      KTable Aggregation

Chapter 15 - Joins and Windows
    Timestamps in Kafka
      Event Timestamp
      Ingestion Timestamp
      Processing Timestamp
    Timestamp Extractors
      Custom TimestampExtractor
      Using TimestampExtractor

Chapter 16 - Streams Architecture

Chapter 17 - Unit Testing Kafka Streams

Chapter 18 - Interactive Queries

Chapter 19 - Processor API

Chapter 20 - Introduction to KSQL

Installing Kafka Cluster
    Preparing VMs
    Configuring Zookeeper
    Configuring Brokers
    Testing Kafka Cluster
      List of Registered Brokers
      Create and List Topics
      Console Producer
      Console Consumer

Installing Kafka on Windows
    Installing JDK 1.8
    Configure and Test Kafka
      Configure Kafka for Windows 10
      Starting and Testing Kafka

Configuring IntelliJ IDEA for Kafka
    Installing Maven 3
    Installing IntelliJ IDEA
      Creating First IntelliJ IDEA Project
      Define Project Dependencies
      Configure Log4J2
      Create and Execute "Hello Kafka!"
      Integrate Kafka localhost in the IDE

Kafka Connect Basic Concepts

JDBC Connector Demo