Datacast

Episode 58: Deep Learning Meets Distributed Systems with Jim Dowling


Show Notes
  • (1:56) Jim went over his education at Trinity College Dublin in the late 90s/early 2000s, where he got early exposure to academic research in distributed systems.
  • (4:26) Jim discussed his research focused on dynamic software architecture, particularly the K-Component model that enables individual components to adapt to a changing environment.
  • (5:37) Jim explained his research on collaborative reinforcement learning that enables groups of reinforcement learning agents to solve online optimization problems in dynamic systems.
  • (9:03) Jim recalled his time as a Senior Consultant for MySQL.
  • (9:52) Jim shared the initiatives at the RISE Research Institute of Sweden, where he has been a researcher since 2007.
  • (13:16) Jim dissected his peer-to-peer systems research at RISE, including theoretical results on search algorithms and walk topologies.
  • (15:30) Jim went over the challenges of building peer-to-peer live streaming systems at RISE, such as gradienTv and GLive.
  • (18:18) Jim provided an overview of research activities at the Division of Software and Computer Systems at the School of Electrical Engineering and Computer Science at KTH Royal Institute of Technology.
  • (19:04) Jim has taught courses on Distributed Systems and Deep Learning on Big Data at KTH Royal Institute of Technology.
  • (22:20) Jim unpacked his O’Reilly article in 2017 called “Distributed TensorFlow,” which includes the deep learning hierarchy of scale.
  • (29:47) Jim discussed the development of HopsFS, a next-generation distribution of the Hadoop Distributed File System (HDFS) that replaces its single-node in-memory metadata service with a distributed metadata service built on a NewSQL database.
  • (34:17) Jim explained the rationale for commercializing HopsFS and building Hopsworks, a user-friendly data science platform for Hops.
  • (36:56) Jim explored the relative merits of public research funding versus VC funding.
  • (41:48) Jim unpacked the key ideas in his post “Feature Store: The Missing Data Layer in ML Pipelines.”
  • (47:31) Jim dissected the critical design that enables the Hopsworks feature store to refactor a monolithic end-to-end ML pipeline into separate feature engineering and model training pipelines (a minimal sketch of this idea follows the list below).
  • (52:49) Jim explained why data warehouses are insufficient for machine learning pipelines and why a feature store is needed instead.
  • (57:59) Jim discussed prioritizing the product roadmap for the Hopsworks platform.
  • (01:00:25) Jim hinted at what’s on the 2021 roadmap for Hopsworks.
  • (01:03:22) Jim recalled the challenges of getting early customers for Hopsworks.
  • (01:04:30) Jim reflected on the differences and similarities between being a professor and being a founder.
  • (01:07:00) Jim discussed worrying trends in the European Tech ecosystem and the role that Logical Clocks will play in the long run.
  • (01:13:37) Closing segment.
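
To make the feature-store discussion (47:31 and 52:49) concrete, here is a minimal sketch of the core idea: a feature engineering pipeline writes named, versioned features once, and a separate training pipeline reads them back without re-running the transformations. The names used here (FeatureStore, write_features, read_features) are hypothetical stand-ins for illustration, not the Hopsworks API.

```python
# Minimal sketch of the feature-store idea discussed in the episode.
# All names (FeatureStore, write_features, read_features) are
# hypothetical; this is an illustration, not the Hopsworks API.
import pandas as pd


class FeatureStore:
    """Toy in-memory stand-in for a real feature store."""

    def __init__(self):
        self._groups = {}  # (name, version) -> DataFrame of features

    def write_features(self, name, version, df):
        # Feature engineering pipeline: persist computed features once.
        self._groups[(name, version)] = df

    def read_features(self, name, version):
        # Training pipeline: fetch precomputed features by name and
        # version, with no knowledge of how they were engineered.
        return self._groups[(name, version)]


store = FeatureStore()

# --- Feature engineering pipeline (runs on its own schedule) ---
raw = pd.DataFrame({"user_id": [1, 2, 3], "n_clicks": [10, 3, 7]})
features = raw.assign(clicks_norm=raw["n_clicks"] / raw["n_clicks"].max())
store.write_features("user_activity", version=1, df=features)

# --- Model training pipeline (a separate, independent consumer) ---
train_df = store.read_features("user_activity", version=1)
print(train_df)
```

The point of the decoupling is that the two pipelines share only the named, versioned feature group, so either side can be rerun, rescheduled, or reused by other models independently.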
Jim’s Contact Info
  • Logical Clocks
  • Twitter
  • LinkedIn
  • Google Scholar
  • Medium
  • ACM Profile
  • GitHub
Mentioned Content

Research Papers

  • “The K-Component Architecture Meta-Model for Self-Adaptive Software” (2001)
  • “Dynamic Software Evolution and The K-Component Model” (2001)
  • “Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing” (2005)
  • “Building Autonomic Systems Using Collaborative Reinforcement Learning” (2006)
  • “Improving ICE Service Selection in a P2P System using the Gradient Topology” (2007)
  • “gradienTv: Market-Based P2P Live Media Streaming on the Gradient Overlay” (2010)
  • “GLive: The Gradient Overlay as a Market Maker for Mesh-Based P2P Live Streaming” (2011)
  • “HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases” (2016)
  • “Scaling HDFS to More Than 1 Million Operations Per Second with HopsFS” (2017)
  • “Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata” (2017)
  • “Implicit Provenance for Machine Learning Artifacts” (2020)
  • “Time Travel and Provenance for Machine Learning Pipelines” (2020)
  • “Maggy: Scalable Asynchronous Parallel Hyperparameter Search” (2020)

Articles

  • “Distributed TensorFlow” (2017)
  • “Reflections on AWS’s S3 Architectural Flaws” (2017)
  • “Meet Michelangelo: Uber’s Machine Learning Platform” (2017)
  • “Feature Store: The Missing Data Layer in ML Pipelines” (2018)
  • “What Is Wrong With European Tech Companies?” (2019)
  • “ROI of Feature Stores” (2020)
  • “MLOps With A Feature Store” (2020)
  • “ML Engineer Guide: Feature Store vs. Data Warehouse” (2020)
  • “Unifying Single-Host and Distributed Machine Learning with Maggy” (2020)
  • “How We Secure Your Data With Hopsworks” (2020)
  • “One Function Is All You Need For ML Experiments” (2020)
  • “Hopsworks: World’s Only Cloud-Native Feature Store, now available on AWS and Azure” (2020)
  • “Hopsworks 2.0: The Next Generation Platform for Data-Intensive AI with a Feature Store” (2020)
  • “Hopsworks Feature Store API 2.0, a new paradigm” (2020)
  • “Swedish startup Logical Clocks takes a crack at scaling MySQL backend for live recommendations” (2021)

Projects

  • Apache Hudi (by Uber)
  • Delta Lake (by Databricks)
  • Apache Iceberg (by Netflix)
  • MLflow (by Databricks)
  • Apache Flink (by The Apache Foundation)

People

  • Leslie Lamport (The Father of Distributed Computing)
  • Jeff Dean (Creator of MapReduce and TensorFlow, Lead of Google AI)
  • Richard Sutton (The Father of Reinforcement Learning — who wrote “The Bitter Lesson”)

Programming Books

  • The “Effective C++” book series (by Scott Meyers)
  • “Effective Java” (by Joshua Bloch)
  • “Programming Erlang” (by Joe Armstrong)
  • “Concepts, Techniques, and Models of Computer Programming” (by Peter Van Roy and Seif Haridi)


Datacast, by James Le