
Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how deep learning workloads, including sparse workloads, can run faster on Cerebras hardware.
[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google RIGL
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?
References:
Rigging the Lottery: Making All Tickets Winners [RIGL]
https://arxiv.org/pdf/1911.11134.pdf
[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html
