
Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads including sparse workloads can run faster on Cerebras hardware.
[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google's RigL (see the sketch after the references)
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?
References:
Rigging the Lottery: Making All Tickets Winners [RigL]
https://arxiv.org/pdf/1911.11134.pdf
[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html
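At [00:47:57] the conversation turns to starting sparse with RigL. As a rough illustration of the idea (not code from the episode or the paper's reference implementation, and all names here are my own), here is a minimal NumPy sketch of one RigL-style prune-and-regrow step: drop the lowest-magnitude active weights and regrow the same number of connections where the dense gradient magnitude is largest, so overall sparsity stays constant.

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, drop_fraction=0.3):
    """One RigL-style prune/regrow step on a single weight matrix.

    Drops the lowest-magnitude active weights and regrows the same
    number of connections where the dense gradient magnitude is
    largest, keeping the total number of active weights constant.
    """
    n_active = int(mask.sum())
    n_swap = int(drop_fraction * n_active)

    # Drop: among active weights, pick the smallest magnitudes.
    active_magnitude = np.where(mask, np.abs(weights), np.inf)
    drop_idx = np.argsort(active_magnitude, axis=None)[:n_swap]

    # Grow: among inactive positions, pick the largest gradient magnitudes.
    inactive_grad = np.where(mask, -np.inf, np.abs(dense_grad))
    grow_idx = np.argsort(inactive_grad, axis=None)[-n_swap:]

    new_mask = mask.copy().ravel()
    new_mask[drop_idx] = False
    new_mask[grow_idx] = True
    new_mask = new_mask.reshape(mask.shape)

    # Dropped weights are zeroed; newly grown weights start at zero.
    new_weights = np.where(new_mask, weights, 0.0)
    new_weights[new_mask & ~mask] = 0.0
    return new_weights, new_mask

# Example: a 10%-dense 64x64 layer, one update with a random gradient.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
m = rng.random((64, 64)) < 0.10
w = np.where(m, w, 0.0)
g = rng.normal(size=(64, 64))
w, m = rigl_update(w, m, g)
print(m.mean())  # density is unchanged (~0.10)
```

Because the drop candidates come only from active positions and the grow candidates only from inactive ones, the active-weight count is preserved, which is the property that lets RigL train at a fixed sparsity level from the start.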