Querying hundreds of petabytes of data demands fast query performance, especially as data accumulates over time. Keeping queries efficient is a challenge because, over time, tables tend to collect many small files and the data may no longer be optimally organized.
In this video, Dipankar will cover:
Apache Iceberg table format
Problems in the data lake: small files, unorganized files
Techniques such as partitioning, compaction, and metrics filtering
The overlapping metrics problem
Solving it with sorting and Z-order clustering (see the sketch after this list)
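
To give a flavor of what these techniques look like in practice, here is a minimal sketch using PySpark with Iceberg's Spark SQL extensions. It assumes the Iceberg Spark runtime jar is on the classpath; the catalog name (demo), warehouse path, and table/column names (db.events, event_ts, account_id, region) are hypothetical placeholders, not from the episode.

```python
from pyspark.sql import SparkSession

# Sketch only: a local Hadoop-type Iceberg catalog named "demo".
spark = (
    SparkSession.builder
    .appName("iceberg-table-optimization")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Partitioning: lay data out by a transform of the timestamp column so
# time-filtered queries only scan matching partitions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_ts   TIMESTAMP,
        account_id BIGINT,
        region     STRING,
        payload    STRING
    )
    USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# Compaction: rewrite many small data files into fewer, larger ones
# (bin-pack is the default strategy of rewrite_data_files).
spark.sql("""
    CALL demo.system.rewrite_data_files(table => 'db.events')
""")

# Sorting / Z-order clustering: co-locate related values so per-file
# min/max metrics overlap less and metrics filtering can prune more files.
spark.sql("""
    CALL demo.system.rewrite_data_files(
        table => 'db.events',
        strategy => 'sort',
        sort_order => 'zorder(account_id, region)'
    )
""")
```

The Z-ordered rewrite is what addresses the overlapping metrics problem discussed in the episode: once values of the clustered columns are grouped within files, each file's column statistics cover narrower, less overlapping ranges, so the query engine can skip far more files.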
See all upcoming episodes: https://www.dremio.com/gnarly-data-wa...
Connect with us!
Twitter: https://bit.ly/30pcpE1
LinkedIn: https://bit.ly/2PoqsDq
Facebook: https://bit.ly/2BV881V
Community Forum: https://bit.ly/2ELXT0W
Github: https://bit.ly/3go4dcM
Blog: https://bit.ly/2DgyR9B
Questions?: https://bit.ly/30oi8tX
Website: https://bit.ly/2XmtEnN
#datalakehouse #analytics #datawarehouse #datalake #opendatalakehouse #gnarlydatawaves #apacheiceberg #dremio #dremioarctic #datamesh #metadata #modernization #datasharing #datagovernance #ETL #datasilos #datagrowth #selfservice #compliance #arctic #dataascode #branches #tags #optimized #automates #datamovement #zorder #clustering #metrics #filtering #partitioning #sorting #tableformat