Apache Kafka is a great tool. It is an event streaming platform capable of publishing and subscribing to streams of events, storing streams of events, and processing streams of events. Basically, it’s all about “streams of events,” which makes Kafka a strong candidate for use cases such as building real-time streaming data pipelines. But what if I only need near-real-time? What if I would like to work with micro-batches instead of a continuous stream? While Spark Streaming is literally built on micro-batches, with Kafka we have to find a workaround.
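To make the idea concrete, here is a minimal sketch of micro-batching in plain Python, independent of any Kafka client library: records from a stream are grouped into batches that are emitted either when a size limit is reached or when a time budget runs out. The function name and parameters are illustrative, not part of any Kafka API.

```python
import time
from typing import Iterable, Iterator, List

def micro_batches(records: Iterable, max_size: int = 100,
                  max_wait_s: float = 1.0) -> Iterator[List]:
    """Group an incoming record stream into micro-batches.

    A batch is emitted when it reaches max_size records or when
    max_wait_s seconds have elapsed since the batch was started,
    whichever comes first.
    """
    batch: List = []
    deadline = time.monotonic() + max_wait_s
    for record in records:
        batch.append(record)
        if len(batch) >= max_size or time.monotonic() >= deadline:
            yield batch
            batch = []
            deadline = time.monotonic() + max_wait_s
    if batch:  # flush the final partial batch
        yield batch

# Example: ten records batched in groups of up to four
print(list(micro_batches(range(10), max_size=4, max_wait_s=5.0)))
# -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With an actual Kafka consumer, the same size-or-time trade-off shows up in the poll loop: each `poll()` call returns whatever records are available within a timeout, which is effectively a micro-batch.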

Someone may ask, why would I need it? Well, there may be…

Alex Rosenblatt

Big Data and programming fan, but tries to write on a wide range of topics.
