August 30, 2022

Capacity Planning Your Apache Kafka Cluster

1 hour 1 minute

How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance?

When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data), begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself—determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, as well as potential cost.

Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.

Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).

Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered, for example if the compression on the producer's side doesn't match compression on the consumer's side, it will increase.

Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.

Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective.

EPISODE LINKS

Capacity Planning and Sizing for Kafka Streams
Tales from the Frontline of Apache Kafka DevOps
Watch the video version of this podcast
Kris Jenkins’ Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 to get $100 of free Cloud usage (details)

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

...more

View all episodes

By Confluent

4.8

4343 ratings