
Sign up to save your podcasts
Or
In this episode of "How AI is Built", data architect Anjan Banerjee provides an in-depth look at the world of data architecture and building complex AI and data systems. Anjan breaks down the basics using simple analogies, explaining how data architecture involves sorting, cleaning, and painting a picture with data, much like organizing Lego bricks to build a structure.
Summary by Section
Introduction
Sources and Tools
Airflow and Orchestration
AI and Data Processing
Data Lakes and Storage
Data Quality and Standardization
Hot Takes and Wishes
Anjan Banerjee:
Nicolay Gerold:
00:00 Understanding Data Architecture
12:36 Choosing the Right Tools
20:36 The Benefits of Serverless Functions
21:34 Integrating AI in Data Acquisition
24:31 The Trend Towards Single Node Engines
26:51 Choosing the Right Database Management System and Storage
29:45 Adding Additional Storage Components
32:35 Reducing Human Errors for Better Data Quality
39:07 Overhyped and Underutilized Tools
Data architecture, AI, data systems, data sources, data extraction, data storage, multi-modal storage engines, data orchestration, Airflow, edge computing, batch processing, data lakes, Delta Lake, Iceberg, data quality, standardization, poka-yoke, compliance, entity resolution
In this episode of "How AI is Built", data architect Anjan Banerjee provides an in-depth look at the world of data architecture and building complex AI and data systems. Anjan breaks down the basics using simple analogies, explaining how data architecture involves sorting, cleaning, and painting a picture with data, much like organizing Lego bricks to build a structure.
Summary by Section
Introduction
Sources and Tools
Airflow and Orchestration
AI and Data Processing
Data Lakes and Storage
Data Quality and Standardization
Hot Takes and Wishes
Anjan Banerjee:
Nicolay Gerold:
00:00 Understanding Data Architecture
12:36 Choosing the Right Tools
20:36 The Benefits of Serverless Functions
21:34 Integrating AI in Data Acquisition
24:31 The Trend Towards Single Node Engines
26:51 Choosing the Right Database Management System and Storage
29:45 Adding Additional Storage Components
32:35 Reducing Human Errors for Better Data Quality
39:07 Overhyped and Underutilized Tools
Data architecture, AI, data systems, data sources, data extraction, data storage, multi-modal storage engines, data orchestration, Airflow, edge computing, batch processing, data lakes, Delta Lake, Iceberg, data quality, standardization, poka-yoke, compliance, entity resolution