The InfoQ Podcast

Gunnar Morling on Change Data Capture and Debezium

01.17.2020 - By InfoQPlay

Download our free app to listen on your phone

Download on the App StoreGet it on Google Play

Today, on The InfoQ Podcast, Wes Reisz talks with Gunnar Morling. Gunnar is a software engineer at RedHat and leads the Debezium project. Debezium is an open-source distributed platform for change data capture (CDC). On the show, the two discuss the project and many of its use cases. Additionally, topics covered on the podcast include bootstrapping, configuration, challenges, debugging, and operational modes. The show wraps with long term strategic goals for the project.

Why listen to this podcast:

- CDC is a set of software design patterns used to react to changing data in a data store. Used for things like internal changelogs, integrations, replication, and event streaming, CDC can be implemented leveraging queries or against the DB transaction log. Debezium leverages the transaction log to implement CDC and is extremely performant.

- Debezium has mature source and sink connectors for MySQL, SQL Server, and MongoDB. In addition, there are Incumbating connectors for Cassandra, Oracle, and DB2. Community sink connectors have been created for ElasticSearch.

- In a standard deployment, Debezium leverages a Kafka cluster by deploying connectors into Kafka Connect. The connectors establish a connection to the source database and then write changes to a Kafka topic.

- Debezium can be run in embedded mode. Embedded mode imports Java library into your own project and leverages callbacks for change events. The library approach allows Debezium implementations against other tools like AWS Kinesis or Azure's Event Hub. Going forward, there are plans to make a ready-made Debezium runtime.

- Out of the box, Debezium has a one-to-one mapping between tables and Kafka topic queues. The default approach exposes the internal table structure to the outside. One approach to address exposing DB internals is to leverage the Outbox Pattern. The Outbox Pattern uses a separate outbox table as a source. Inserts into your normal business logic tables also make writes to the outbox. Change events are then published to Kafka from the outbox source table.

More on this: Quick scan our curated show notes on InfoQ https://bit.ly/3737GZB

You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq

Subscribe: www.youtube.com/infoq

Like InfoQ on Facebook: bit.ly/2jmlyG8

Follow on Twitter: twitter.com/InfoQ

Follow on LinkedIn: www.linkedin.com/company/infoq

Check the landing page on InfoQ: https://bit.ly/3737GZB

More episodes from The InfoQ Podcast