Talking Data

Big Data Normalization


This episode discusses a case study describing a novel technique for efficiently storing and querying large amounts of data in massively parallel processing (MPP) databases. The technique, known as Anchor Modeling, is implemented in the HP Vertica database at Avito, a Russian e-commerce platform, where it is used to process terabytes of data for real-time analytics. The paper argues that traditional normalization techniques are inadequate for Big Data scenarios and highlights the benefits of Anchor Modeling in terms of scalability, query performance, and ease of data maintenance. The authors provide theoretical estimates and verify them experimentally by comparing the performance of Anchor Modeling against a traditional 3NF model, demonstrating its effectiveness on complex ad-hoc queries.
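As a rough illustration of the idea (not the paper's actual schema; all table and column names below are invented), this sketch contrasts a wide 3NF record with an anchor-modeled decomposition, where the entity is reduced to an identity-only anchor, each attribute is stored in its own narrow structure, and changes are appended as new historized rows rather than updated in place.

```python
from dataclasses import dataclass
from datetime import datetime

# --- Traditional 3NF: one wide row per item; a change overwrites in place. ---
@dataclass
class Item3NF:
    item_id: int
    title: str
    price: float
    category: str

# --- Anchor Modeling (hypothetical names): the entity becomes an anchor
# holding identity only, plus one narrow table per attribute. Historized
# attributes carry a validity timestamp, so a change is a new row -- loads
# stay insert-only and full history is preserved. ---
@dataclass
class ItemAnchor:          # anchor: surrogate identity only
    item_id: int

@dataclass
class ItemTitle:           # one attribute per structure
    item_id: int
    title: str
    valid_from: datetime   # historization: each version is a new row

@dataclass
class ItemPrice:
    item_id: int
    price: float
    valid_from: datetime

# A price change in 3NF would mutate the row; here it appends a version:
prices = [ItemPrice(1, 100.0, datetime(2014, 1, 1))]
prices.append(ItemPrice(1, 90.0, datetime(2014, 6, 1)))  # history kept

# The latest price for an item is the row with the greatest valid_from:
latest = max((p for p in prices if p.item_id == 1), key=lambda p: p.valid_from)
print(latest.price)  # 90.0
```

In a real deployment these become separate database tables joined on the anchor's surrogate key, which is what allows a columnar MPP engine such as Vertica to scan only the attribute tables an ad-hoc query actually needs.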


Talking Data, by Lars Rönnbäck