AWS re:Invent 2019

CMP304-R1: AWS infrastructure for large-scale training at Facebook AI



In this session, the Facebook AI team discusses its major machine learning models and workloads and the infrastructure challenges it faced with large-scale distributed training. The team shares details of how it tested these ML workloads on AWS infrastructure and the results of this benchmarking. Then we discuss how the depth and breadth of AWS infrastructure for ML workloads in compute, networking, and storage helps address large-scale ML challenges. Specifically, we dive deep into the AWS machine learning stack to show how to choose the right Amazon EC2 platform for your ML workload while leveraging 100 Gbps networking and high-performance file systems to scale efficiently from a single GPU to hundreds or thousands of GPUs.

By AWS

