O'Reilly Data Show Podcast

In the age of AI, fundamental value resides in data

01.03.2019 - By O'Reilly Media

In this episode of the Data Show, I spoke with Haoyuan Li, CEO and founder of Alluxio, a startup commercializing the open source project with the same name (full disclosure: I’m an advisor to Alluxio). Our discussion focuses on the state of Alluxio (the open source project that has roots in UC Berkeley’s AMPLab), specifically emerging use cases here and in China. Given the large-scale use in China, I also wanted to get Li’s take on the state of data and AI technologies in Beijing and other parts of China.

Here are some highlights from our conversation:

A much needed layer between compute and storage in a world with disparate storage systems

This new layer, which we call a virtual distributed file system, sits in the middle between the compute and storage layers. This new layer virtualizes data from different storage systems and presents a unified API with a global namespace for the data-driven applications to interact with all of the data in the enterprise environment.
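To make the idea of a unified API over a global namespace concrete, here is a minimal toy sketch in Python. It is not Alluxio's API: the class names (InMemoryStore, UnifiedNamespace), the mount paths, and the longest-prefix resolution rule are all assumptions made for illustration. It only shows how one namespace could route paths to different backing storage systems.

```python
class InMemoryStore:
    """Hypothetical stand-in for one backing storage system."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}

    def read(self, key):
        return self.blobs[key]

    def write(self, key, data):
        self.blobs[key] = data


class UnifiedNamespace:
    """Toy global namespace: maps path prefixes to backing stores."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, store):
        # Normalize so "/logs/" and "/logs" refer to the same mount point.
        self._mounts[prefix.rstrip("/")] = store

    def _resolve(self, path):
        # Pick the longest mounted prefix that contains the path.
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise FileNotFoundError(path)
        return self._mounts[best], path[len(best):].lstrip("/")

    def read(self, path):
        store, key = self._resolve(path)
        return store.read(key)

    def write(self, path, data):
        store, key = self._resolve(path)
        store.write(key, data)


# Two different (simulated) storage systems behind one namespace.
object_store = InMemoryStore("object-store")
hdfs_like = InMemoryStore("hdfs-like")

ns = UnifiedNamespace()
ns.mount("/warehouse", object_store)
ns.mount("/logs", hdfs_like)

ns.write("/warehouse/sales/2019.csv", b"id,amount\n1,10\n")
ns.write("/logs/app/today.log", b"started\n")
```

An application in this sketch only ever sees paths such as /warehouse/sales/2019.csv and never needs to know which storage system holds the bytes.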

AI and machine learning applications

One key reason people use an object store is that it is cheap. Per gigabyte or per terabyte, it’s cheaper than other solutions in a market, … but performance is not as good. And from that perspective, by putting open source Alluxio on top of that, that improves performance from Alluxio’s caching functionality. On top of that, in many cases, machine learning libraries cannot directly talk with object stores, and Alluxio can also serve as a translation layer.
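The two roles described here, caching in front of a slow object store and translating object access into something a library can consume, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not how Alluxio is implemented: SlowObjectStore, ReadThroughCache, the LRU eviction policy, and the capacity of two objects are all hypothetical choices.

```python
import io
from collections import OrderedDict


class SlowObjectStore:
    """Hypothetical object store; counts fetches to make cache hits visible."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.fetch_count = 0

    def get(self, key):
        self.fetch_count += 1
        return self._objects[key]


class ReadThroughCache:
    """Toy read-through cache with LRU eviction in front of a backing store."""

    def __init__(self, backing, capacity=2):
        self._backing = backing
        self._capacity = capacity
        self._cache = OrderedDict()

    def read(self, key):
        if key in self._cache:
            # Cache hit: mark as most recently used, skip the slow store.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: fetch from the backing store and remember the result.
        data = self._backing.get(key)
        self._cache[key] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return data

    def open(self, key):
        # "Translation" step: expose the object as a file-like handle,
        # which is what many libraries expect instead of an object-store API.
        return io.BytesIO(self.read(key))


store = SlowObjectStore({"a": b"alpha", "b": b"beta", "c": b"gamma"})
cache = ReadThroughCache(store, capacity=2)

cache.read("a")
cache.read("a")  # served from the cache, no second fetch
fetches_after_repeat = store.fetch_count
```

Repeated reads of the same object reach the slow store only once in this sketch, and the open method hands back an ordinary file-like object for code that cannot talk to an object store directly.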

Adoption in China

Things are moving very fast in that region. People are eager to adopt new technology, particularly for AI and big data. Some users we know very quickly boosted their Alluxio deployments to hundreds of nodes or even thousands of nodes. It’s amazing to see how fast they can adapt.

… Of the top 10 internet companies in China, nine are using open source Alluxio in production today. All nine of them have big data and AI use cases for Alluxio. … I also travel back and forth between these two regions quite often, and every time I go there, I see more use cases, more applications, and more innovation.

Related resources:

Michael Franklin on the lasting legacy of AMPLab

Jason Dai on why “Companies in China are moving quickly to embrace AI technologies”

Kai-Fu Lee on “China: AI superpower”

Andrew Feldman on why “Specialized hardware for deep learning will unleash innovation”

Greg Diamos on “How big compute is powering the deep learning rocket ship”

Tim Kraska on “How machine learning will accelerate data management systems”
