
Conviva operates at massive scale, delivering outcome-based intelligence for digital businesses through real-time and batch data processing. As new use cases emerged, the team needed a way to extend its streaming-first architecture without rebuilding core systems.
In this episode, Han Zhang joins us to explain how Conviva uses Apache Airflow as the orchestration backbone for its batch workloads, how its control plane is designed and which trade-offs shaped its platform decisions.
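For listeners newer to Airflow, the sketch below shows the general shape of a batch workload expressed as a DAG. It is a minimal, hypothetical example assuming Airflow 2.4 or later; the DAG id, schedule and task names are illustrative, not taken from Conviva's platform.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def rollup_daily_metrics():
    # Hypothetical batch step: e.g. a daily aggregate that the
    # streaming path does not compute on its own.
    print("running daily batch rollup")


# dag_id and schedule are assumptions for illustration only.
with DAG(
    dag_id="batch_metrics_rollup",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # the "schedule" parameter requires Airflow 2.4+
    catchup=False,
) as dag:
    PythonOperator(
        task_id="rollup_daily_metrics",
        python_callable=rollup_daily_metrics,
    )

Once a DAG like this is deployed, Airflow's scheduler handles the runs, and its UI exposes the built-in run history and logs that come up later in the conversation.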
Key Takeaways:
00:00 Introduction.
01:17 Large-scale data platforms require low-latency processing capabilities.
02:08 Batch workloads can complement streaming pipelines for additional use cases.
03:45 An orchestration framework can act as the core coordination layer.
06:12 Batch processing enables workloads that streaming alone cannot support.
08:50 Ecosystem maturity and observability are key orchestration considerations.
10:15 Built-in run history and logs make failures easier to diagnose.
14:20 Platform users can monitor workflows without managing orchestration logic.
17:08 Identity, secrets and scheduling present ongoing optimization challenges.
19:59 Configuration history and change visibility improve operational reliability.
Resources Mentioned:
Han Zhang
https://www.linkedin.com/in/zhanghan177
Conviva | Website
http://www.conviva.com
Apache Airflow
https://airflow.apache.org/
Celery
https://docs.celeryq.dev/
Temporal
https://temporal.io/
Kubernetes
https://kubernetes.io/
LDAP
https://ldap.com/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow