Data platforms are moving from batch-first pipelines to near real-time systems where orchestration, observability, scalability and governance all have to work together.
In this episode, Arun Karthik, Director, Data Solutions Engineering at Condé Nast Technology Lab, joins us to share how data engineering evolves from relational databases and ETL into distributed processing, modern orchestration with Apache Airflow and managed Airflow with Astronomer.
Key Takeaways:
00:00 Introduction.
02:13 Early data systems rely heavily on relational databases and batch-oriented processing models.
07:01 Scheduling requirements evolve beyond fixed time windows as dependencies increase.
10:14 Ease of use and developer experience influence adoption of orchestration frameworks.
13:22 Operating open source orchestration tools requires ongoing engineering effort.
14:45 Managed services help teams reduce infrastructure and maintenance responsibilities.
17:27 Observability improves confidence in pipeline execution and system health.
19:12 Governance considerations grow in importance as data platforms mature.
20:46 Building data systems requires balancing speed, reliability and long-term sustainability.
Resources Mentioned:
Arun Karthik
https://www.linkedin.com/in/earunkarthik/
Condé Nast Technology Lab | LinkedIn
https://www.linkedin.com/company/conde-nast-technology-lab/
Condé Nast Technology Lab | Website
https://www.condenast.com/
Apache Airflow
https://airflow.apache.org/
Astronomer
https://www.astronomer.io/
Apache Spark
https://spark.apache.org/
Apache Hadoop
https://hadoop.apache.org/
Jenkins
https://www.jenkins.io/
dbt Labs
https://www.getdbt.com/product/what-is-dbt
Amazon Web Services
https://aws.amazon.com/free/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow