Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Top 6 Worst Apache Kafka JIRA Bugs


Listen Later

Entomophiliac, Anna McDonald (Principal Customer Success Technical Architect, Confluent) has seen her fair share of Apache Kafka® bugs. For her annual holiday roundup of the most noteworthy Kafka bugs, Anna tells Kris Jenkins about some of the scariest, most surprising, and most enlightening corner cases that make you ask, “Ah, so that’s how it really works?”

She shares a lot of interesting details about how batching works, the replication protocol, how Kafka’s networking stack dances with Linux’s one, and which is the most important Scala class to read, if you’re only going to read one.

In particular, Anna gives Kris details about a bug that he’s been thinking about lately – sticky partitioner (KAFKA-10888). When a Kafka producer sends several records to the same partition at around the same time, the partition can get overloaded. As a result, if too many records get processed at once, they can get stuck causing an unbalanced workload. Anna goes on to explain that the fix required keeping track of the number of offsets/messages written to each partition, and then batching to force more balanced distributions.

She found another bug that occurs when Kafka server triggers TCP Congestion Control in some conditions (KAFKA-9648). Anna explains that when Kafka server restarts and then executes the preferred replica leader, lots of replica leaders trigger cluster metadata updates. Then, all clients establish a server connection at the same time that lots TCP requests are waiting in the TCP sync queue.

The third bug she talks about (KAFKA-9211), may cause TCP delays after upgrading…. Oh, that’s a nasty one. She goes on to tell Kris about a rare bug (KAFKA-12686) in Partition.scala where there’s a race condition between the handling of an AlterIsrResponse and a LeaderAndIsrRequest. This rare scenario involves the delay of AlterIsrResponse when lots of ISR and leadership changes occur due to broker restarts.

Bugs five (KAFKA-12964) and six (KAFKA-14334) are no better, but you’ll have to plug in your headphones and listen in to explore the ghoulish adventures of Anna McDonald as she gives a nightmarish peek into her world of JIRA bugs. It’s just what you might need this holiday season!

EPISODE LINKS

  • KAFKA-10888: Sticky partition leads to uneven product msg, resulting in abnormal delays in some partitions
  • KAFKA-9648: Add configuration to adjust listen backlog size for Acceptor
  • KAFKA-9211: Kafka upgrade 2.3.0 may cause tcp delay ack(Congestion Control)
  • KAFKA-12686: Race condition in AlterIsr response handling
  • KAFKA-12964: Corrupt segment recovery can delete new producer state snapshots
  • KAFKA-14334: DelayedFetch purgatory not completed when appending as follower

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

  • 🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
  • ▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
  • 👍 If you enjoyed this, please leave us a rating.
  • 🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.
...more
View all episodesView all episodes
Download on the App Store

Confluent Developer ft. Tim Berglund, Adi Polak & Viktor GamovBy Confluent

  • 4.8
  • 4.8
  • 4.8
  • 4.8
  • 4.8

4.8

43 ratings


More shows like Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

View all
Software Engineering Radio by se-radio@computer.org

Software Engineering Radio

271 Listeners

Hanselminutes with Scott Hanselman by Scott Hanselman

Hanselminutes with Scott Hanselman

383 Listeners

The Changelog: Software Development, Open Source by Changelog Media

The Changelog: Software Development, Open Source

289 Listeners

Software Engineering Daily by Software Engineering Daily

Software Engineering Daily

626 Listeners

Talk Python To Me by Michael Kennedy

Talk Python To Me

585 Listeners

Soft Skills Engineering by Jamison Dance and Dave Smith

Soft Skills Engineering

288 Listeners

Thoughtworks Technology Podcast by Thoughtworks

Thoughtworks Technology Podcast

43 Listeners

Python Bytes by Michael Kennedy and Brian Okken

Python Bytes

215 Listeners

Practical AI by Practical AI LLC

Practical AI

209 Listeners

AWS Podcast by Amazon Web Services

AWS Podcast

203 Listeners

The Real Python Podcast by Real Python

The Real Python Podcast

142 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

503 Listeners

Big Technology Podcast by Alex Kantrowitz

Big Technology Podcast

493 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

608 Listeners

Life Is But A Stream by Confluent

Life Is But A Stream

6 Listeners