By Machine Learning Street Talk (MLST)
The podcast currently has 173 episodes available.
Ashley Edwards, who co-authored the Genie paper while at DeepMind and is now at Runway, covers key aspects of the Genie AI system and its applications in video generation, robotics, and game creation.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Genie's approach to learning interactive environments, balancing compression and fidelity.
The use of latent action models and VQ-VAE models for video processing and tokenization.
Challenges in maintaining action consistency across frames and integrating text-to-image models.
Evaluation metrics for AI-generated content, such as FID and PS&R diff metrics.
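The VQ-VAE-style tokenization discussed above boils down to mapping each continuous latent to its nearest vector in a learned codebook. A minimal sketch of that lookup (toy numpy arrays standing in for real encoder outputs and a trained codebook — not Genie's actual implementation):

```python
import numpy as np

def vq_tokenize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (n, d) array of encoder outputs
    codebook: (k, d) array of learned code vectors
    Returns (n,) integer token ids and the corresponding quantized vectors.
    """
    # Squared Euclidean distance from every latent to every code vector
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = dists.argmin(axis=1)          # one discrete token per latent
    return ids, codebook[ids]           # tokens and their embeddings

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))      # toy codebook: 8 codes of dim 4
latents = rng.normal(size=(5, 4))       # toy "frame" latents
ids, quantized = vq_tokenize(latents, codebook)
```

The discrete ids are what a transformer is trained over; the quantized vectors are what the decoder reconstructs frames from.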
The discussion also explored broader implications and applications:
The potential impact of AI video generation on content creation jobs.
Applications of Genie in game generation and robotics.
The use of foundation models in robotics and the differences between internet video data and specialized robotics data.
Challenges in mapping AI-generated actions to real-world robotic actions.
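The FID metric mentioned in the evaluation discussion compares Gaussian fits to two sets of image features. A self-contained sketch (random features stand in for real Inception activations; numpy-only, using the eigenvalues of the covariance product in place of an explicit matrix square root):

```python
import numpy as np

def fid(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # trace of sqrtm(cov_a @ cov_b), computed via its eigenvalues
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b) - 2 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in for "real" features
fake = rng.normal(0.5, 1.0, size=(500, 8))   # stand-in for generated features
score = fid(real, fake)                      # larger = distributions further apart
```

Identical feature sets score (numerically) zero; a shifted distribution scores higher, which is why FID is used as a proxy for generation fidelity.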
Ashley Edwards: https://ashedwards.github.io/
TOC (*) are best bits
00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations *
00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing *
00:05:06 3. Genie's Constraints: Frame consistency & image model integration
00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods
00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects
00:11:39 6. Model Scaling: Training data impact & computational trade-offs
00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges *
00:16:16 8. Robotics Foundation Models: Action space & data considerations *
00:19:18 9. Mask-GPT & Video Frames: Real-time optimization, RL from videos
00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety
00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies
Refs:
1. Genie (learning interactive environments from videos) / Ashley Edwards and DeepMind colleagues [00:01]
https://arxiv.org/abs/2402.15391
2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43]
https://arxiv.org/abs/1711.00937
3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37]
https://arxiv.org/abs/1706.08500
4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02]
https://arxiv.org/abs/1806.00035
5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14]
https://arxiv.org/abs/2010.11929
6. Genie (robotics foundation models) / Google DeepMind [17:34]
https://deepmind.google/research/publications/60474/
7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38]
https://ai.stanford.edu/~cbfinn/
8. Imitation from observation in reinforcement learning / YuXuan Liu [20:58]
https://arxiv.org/abs/1707.03374
9. Waymo's autonomous driving technology / Waymo [22:38]
https://waymo.com/
10. Gen3 model release by Runway / Runway [23:48]
https://runwayml.com/
11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43]
https://arxiv.org/abs/2207.12598
Saurabh Baji discusses Cohere's approach to developing and deploying large language models (LLMs) for enterprise use.
* Cohere focuses on pragmatic, efficient models tailored for business applications rather than pursuing the largest possible models.
* They offer flexible deployment options, from cloud services to on-premises installations, to meet diverse enterprise needs.
* Retrieval-augmented generation (RAG) is highlighted as a critical capability, allowing models to leverage enterprise data securely.
* Cohere emphasizes model customization, fine-tuning, and tools like reranking to optimize performance for specific use cases.
* The company has seen significant growth, transitioning from developer-focused to enterprise-oriented services.
* Major customers like Oracle, Fujitsu, and TD Bank are using Cohere's models across various applications, from HR to finance.
* Baji predicts a surge in enterprise AI adoption over the next 12-18 months as more companies move from experimentation to production.
* He emphasizes the importance of trust, security, and verifiability in enterprise AI applications.
The interview provides insights into Cohere's strategy, technology, and vision for the future of enterprise AI adoption.
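The RAG pattern highlighted above — retrieve the most relevant enterprise documents, then ground the model's answer in them — can be sketched in a few lines. This is a toy illustration (bag-of-words similarity standing in for a real embedding model; the documents and query are made up), not Cohere's implementation:

```python
import numpy as np

def embed(text, vocab):
    """Toy bag-of-words embedding; real systems use a learned embedding model."""
    v = np.array([text.lower().split().count(w) for w in vocab], float)
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, docs, vocab, k=2):
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query, vocab)
    scores = [float(q @ embed(d, vocab)) for d in docs]
    top = sorted(range(len(docs)), key=lambda i: -scores[i])[:k]
    return [docs[i] for i in top]

docs = [
    "refund policy: refunds are issued within 30 days",
    "shipping policy: orders ship within 2 business days",
    "security: all data is encrypted at rest",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
context = retrieve("when are refunds issued", docs, vocab, k=1)
# The retrieved context is prepended to the prompt so the LLM answers from it
prompt = "Answer using only this context:\n" + "\n".join(context) + "\nQ: when are refunds issued"
```

Because the generation step only sees retrieved passages, enterprise data stays in the retrieval index rather than in model weights — the security property emphasized in the interview.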
https://www.linkedin.com/in/saurabhbaji/
https://x.com/sbaji
https://cohere.com/
https://cohere.com/business
TOC (*) are best bits
00:00:00 1. Introduction and Background
00:04:24 2. Cloud Infrastructure and LLM Optimization
00:06:43 2.1 Model deployment and fine-tuning strategies *
00:09:37 3. Enterprise AI Deployment Strategies
00:11:10 3.1 Retrieval-augmented generation in enterprise environments *
00:13:40 3.2 Standardization vs. customization in cloud services *
00:18:20 4. AI Model Evaluation and Deployment
00:18:20 4.1 Comprehensive evaluation frameworks *
00:21:20 4.2 Key components of AI model stacks *
00:25:50 5. Retrieval Augmented Generation (RAG) in Enterprise
00:32:10 5.1 Pragmatic approach to RAG implementation *
00:33:45 6. AI Agents and Tool Integration
00:33:45 6.1 Leveraging tools for AI insights *
00:35:30 6.2 Agent-based AI systems and diagnostics *
00:42:55 7. AI Transparency and Reasoning Capabilities
00:49:10 8. AI Model Training and Customization
00:57:10 9. Enterprise AI Model Management
01:02:10 9.1 Managing AI model versions for enterprise customers *
01:04:30 9.2 Future of language model programming *
01:06:10 10. AI-Driven Software Development
01:06:10 10.1 AI bridging human expression and task achievement *
01:08:00 10.2 AI-driven virtual app fabrics in enterprise *
01:13:33 11. Future of AI and Enterprise Applications
01:21:55 12. Cohere's Customers and Use Cases
01:21:55 12.1 Cohere's growth and enterprise partnerships *
01:27:14 12.2 Diverse customers using generative AI *
01:27:50 12.3 Industry adaptation to generative AI *
01:29:00 13. Technical Advantages of Cohere Models
01:29:00 13.1 Handling large context windows *
01:29:40 13.2 Low latency impact on developer productivity *
Disclaimer: This is the fifth video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. Filmed in Seattle in Aug 2024.
David Hanson, CEO of Hanson Robotics and creator of the humanoid robot Sophia, explores the intersection of artificial intelligence, ethics, and human potential. In this thought-provoking interview, Hanson discusses his vision for developing AI systems that embody the best aspects of humanity while pushing beyond our current limitations, aiming to achieve what he calls "super wisdom."
YT version: https://youtu.be/LFCIEhlsozU
The interview with David Hanson covers:
The importance of incorporating biological drives and compassion into AI systems
Hanson's concept of "existential pattern ethics" as a basis for AI morality
The potential for AI to enhance human intelligence and wisdom
Challenges in developing artificial general intelligence (AGI)
The need to democratize AI technologies globally
Potential future advancements in human-AI integration and their societal impacts
Concerns about technological augmentation exacerbating inequality
The role of ethics in guiding AI development and deployment
Hanson advocates for creating AI systems that embody the best aspects of humanity while surpassing current human limitations, aiming for "super wisdom" rather than just artificial super intelligence.
David Hanson:
https://www.hansonrobotics.com/david-hanson/
https://www.youtube.com/watch?v=9u1O954cMmE
TOC
1. Introduction and Background [00:00:00]
1.1. David Hanson's interdisciplinary background [0:01:49]
1.2. Introduction to Sophia, the realistic robot [0:03:27]
2. Human Cognition and AI [0:03:50]
2.1. Importance of social interaction in cognition [0:03:50]
2.2. Compassion as distinguishing factor [0:05:55]
2.3. AI augmenting human intelligence [0:09:54]
3. Developing Human-like AI [0:13:17]
3.1. Incorporating biological drives in AI [0:13:17]
3.2. Creating AI with agency [0:20:34]
3.3. Implementing flexible desires in AI [0:23:23]
4. Ethics and Morality in AI [0:27:53]
4.1. Enhancing humanity through AI [0:27:53]
4.2. Existential pattern ethics [0:30:14]
4.3. Expanding morality beyond restrictions [0:35:35]
5. Societal Impact of AI [0:38:07]
5.1. AI adoption and integration [0:38:07]
5.2. Democratizing AI technologies [0:38:32]
5.3. Human-AI integration and identity [0:43:37]
6. Future Considerations [0:50:03]
6.1. Technological augmentation and inequality [0:50:03]
6.2. Emerging technologies for mental health [0:50:32]
6.3. Corporate ethics in AI development [0:52:26]
This was filmed at AGI-24
David Spivak, a mathematician known for his work in category theory, discusses a wide range of topics related to intelligence, creativity, and the nature of knowledge. He explains category theory in simple terms and explores how it relates to understanding complex systems and relationships.
We discuss abstract concepts like collective intelligence, the importance of embodiment in understanding the world, and how we acquire and process knowledge. Spivak shares his thoughts on creativity, discussing where it comes from and how it might be modeled mathematically.
A significant portion of the discussion focuses on the impact of artificial intelligence on human thinking and its potential role in the evolution of intelligence. Spivak also touches on the importance of language, particularly written language, in transmitting knowledge and shaping our understanding of the world.
David Spivak
http://www.dspivak.net/
TOC:
00:00:00 Introduction to category theory and functors
00:04:40 Collective intelligence and sense-making
00:09:54 Embodiment and physical concepts in knowledge acquisition
00:16:23 Creativity, open-endedness, and AI's impact on thinking
00:25:46 Modeling creativity and the evolution of intelligence
00:36:04 Evolution, optimization, and the significance of AI
00:44:14 Written language and its impact on knowledge transmission
REFS:
Mike Levin's work
https://scholar.google.com/citations?user=luouyakAAAAJ&hl=en
Eric Smith's videos on complexity and early life
https://www.youtube.com/watch?v=SpJZw-68QyE
Richard Dawkins' book "The Selfish Gene"
https://amzn.to/3X73X8w
Carl Sagan's statement about the cosmos knowing itself
https://amzn.to/3XhPruK
Herbert Simon's concept of "satisficing"
https://plato.stanford.edu/entries/bounded-rationality/
DeepMind paper on open-ended systems
https://arxiv.org/abs/2406.04268
Karl Friston's work on active inference
https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind
MIT category theory lectures by David Spivak (available on the Topos Institute channel)
https://www.youtube.com/watch?v=UusLtx9fIjs
Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.
YT version: https://youtu.be/DP454c1K_vQ
TOC
00:00:00 Intro
00:03:38 Reasoning
00:13:09 Potential AI Breakthroughs Reducing Computation Needs
00:20:39 Memorization vs. Generalization in AI
00:25:19 Approach to the ARC Challenge
00:29:10 Perceptions of Chat GPT and AGI
00:58:45 Abstract Principles of Jürgen's Approach
01:04:17 Analogical Reasoning and Compression
01:05:48 Breakthroughs in 1991: the P, the G, and the T in ChatGPT and Generative AI
01:15:50 Use of LSTM in Language Models by Tech Giants
01:21:08 Neural Network Aspect Ratio Theory
01:26:53 Reinforcement Learning Without Explicit Teachers
Refs:
★ "Annotated History of Modern AI and Deep Learning" (2022 survey by Schmidhuber):
★ Chain Rule For Backward Credit Assignment (Leibniz, 1676)
★ First Neural Net / Linear Regression / Shallow Learning (Gauss & Legendre, circa 1800)
★ First 20th Century Pioneer of Practical AI (Quevedo, 1914)
★ First Recurrent NN (RNN) Architecture (Lenz, Ising, 1920-1925)
★ AI Theory: Fundamental Limitations of Computation and Computation-Based AI (Gödel, 1931-34)
★ Unpublished ideas about evolving RNNs (Turing, 1948)
★ Multilayer Feedforward NN Without Deep Learning (Rosenblatt, 1958)
★ First Published Learning RNNs (Amari and others, ~1972)
★ First Deep Learning (Ivakhnenko & Lapa, 1965)
★ Deep Learning by Stochastic Gradient Descent (Amari, 1967-68)
★ ReLUs (Fukushima, 1969)
★ Backpropagation (Linnainmaa, 1970); precursor (Kelley, 1960)
★ Backpropagation for NNs (Werbos, 1982)
★ First Deep Convolutional NN (Fukushima, 1979); later combined with Backprop (Waibel 1987, Zhang 1988).
★ Metalearning or Learning to Learn (Schmidhuber, 1987)
★ Generative Adversarial Networks / Artificial Curiosity / NN Online Planners (Schmidhuber, Feb 1990; see the G in Generative AI and ChatGPT)
★ NNs Learn to Generate Subgoals and Work on Command (Schmidhuber, April 1990)
★ NNs Learn to Program NNs: Unnormalized Linear Transformer (Schmidhuber, March 1991; see the T in ChatGPT)
★ Deep Learning by Self-Supervised Pre-Training. Distilling NNs (Schmidhuber, April 1991; see the P in ChatGPT)
★ Experiments with Pre-Training; Analysis of Vanishing/Exploding Gradients, Roots of Long Short-Term Memory / Highway Nets / ResNets (Hochreiter, June 1991, further developed 1999-2015 with other students of Schmidhuber)
★ LSTM journal paper (1997, most cited AI paper of the 20th century)
★ xLSTM (Hochreiter, 2024)
★ Reinforcement Learning Prompt Engineer for Abstract Reasoning and Planning (Schmidhuber 2015)
★ Mindstorms in Natural Language-Based Societies of Mind (2023 paper by Schmidhuber's team)
https://arxiv.org/abs/2305.17066
★ Bremermann's physical limit of computation (1982)
EXTERNAL LINKS
CogX 2018 - Professor Juergen Schmidhuber
https://www.youtube.com/watch?v=17shdT9-wuA
Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability (Neural Networks, 1997)
https://sferics.idsia.ch/pub/juergen/loconet.pdf
The paradox at the heart of mathematics: Gödel's Incompleteness Theorem - Marcus du Sautoy
https://www.youtube.com/watch?v=I4pQbo5MQOs
(Refs truncated; full list in the YT video description)
Professor Pedro Domingos is an AI researcher and professor of computer science. He expresses skepticism about current AI regulation efforts and argues for faster AI development rather than slowing it down. He also discusses the need for new innovations to fulfil the promises of current AI techniques.
Show notes:
* Domingos' views on AI regulation and why he believes it's misguided
* His thoughts on the current state of AI technology and its limitations
* Discussion of his novel "2040", a satirical take on AI and tech culture
* Explanation of his work on "tensor logic", which aims to unify neural networks and symbolic AI
* Critiques of other approaches in AI, including those of OpenAI and Gary Marcus
* Thoughts on the AI "bubble" and potential future developments in the field
Prof. Pedro Domingos:
https://x.com/pmddomingos
2040: A Silicon Valley Satire [Pedro's new book]
https://amzn.to/3T51ISd
TOC:
00:00:00 Intro
00:06:31 Bio
00:08:40 Filmmaking skit
00:10:35 AI and the wisdom of crowds
00:19:49 Social Media
00:27:48 Master algorithm
00:30:48 Neurosymbolic AI / abstraction
00:39:01 Language
00:45:38 Chomsky
01:00:49 2040 Book
01:18:03 Satire as a shield for criticism?
01:29:12 AI Regulation
01:35:15 Gary Marcus
01:52:37 Copyright
01:56:11 Stochastic parrots come home to roost
02:00:03 Privacy
02:01:55 LLM ecosystem
02:05:06 Tensor logic
Refs:
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World [Pedro Domingos]
https://amzn.to/3MiWs9B
Rebooting AI: Building Artificial Intelligence We Can Trust [Gary Marcus]
https://amzn.to/3AAywvL
Flash Boys [Michael Lewis]
https://amzn.to/4dUGm1M
Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss data modeling and how datasets influence model predictions, adversarial examples in machine learning and why they occur, robustness in machine learning models, black-box attacks on machine learning systems, biases in data collection and dataset creation (particularly in ImageNet), and self-selection bias in data and methods to address it.
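As background for the adversarial-examples discussion: the classic way to construct one is to nudge the input in the direction that increases the model's loss. A minimal white-box sketch (FGSM on a toy logistic-regression "model" with made-up weights — purely illustrative, not from the papers discussed):

```python
import numpy as np

def fgsm(x, w, b, y, eps):
    """Fast Gradient Sign Method against a logistic-regression model (w, b).

    Perturbs input x by eps per coordinate in the direction that
    increases the cross-entropy loss for true label y (0 or 1).
    """
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))        # predicted probability of class 1
    grad_x = (p - y) * w                # d(cross-entropy)/dx for logistic loss
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.3, -0.2, 0.1])          # originally classified as class 1 (w@x+b > 0)
x_adv = fgsm(x, w, b, y=1, eps=0.5)     # small perturbation flips the prediction
```

The "not bugs, features" argument is precisely about why such tiny, structured perturbations work: the gradient direction exploits features the model genuinely relies on.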
Andrew's site:
https://andrewilyas.com/
https://x.com/andrew_ilyas
TOC:
00:00:00 - Introduction and Andrew's background
00:03:52 - Overview of the machine learning pipeline
00:06:31 - Data modeling paper discussion
00:26:28 - TRAK: Evolution of data modeling work
00:43:58 - Discussion on abstraction, reasoning, and neural networks
00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper
01:03:24 - Types of features learned by neural networks
01:10:51 - Black box attacks paper
01:15:39 - Work on data collection and bias
01:25:48 - Future research plans and closing thoughts
References:
Adversarial Examples Are Not Bugs, They Are Features
https://arxiv.org/pdf/1905.02175
TRAK: Attributing Model Behavior at Scale
https://arxiv.org/pdf/2303.14186
Datamodels: Predicting Predictions from Training Data
https://arxiv.org/pdf/2202.00622
ImageNet-Trained CNNs Are Biased Towards Texture
https://arxiv.org/pdf/1811.12231
ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks
https://arxiv.org/pdf/1708.03999
A Spline Theory of Deep Networks
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
Scaling Monosemanticity
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Adversarial Examples Are Not Bugs, They Are Features
https://gradientscience.org/adv/
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
https://proceedings.mlr.press/v235/bartoldson24a.html
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
https://arxiv.org/abs/1807.07978
Estimation of Standard Auction Models
https://arxiv.org/abs/2205.02060
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
https://arxiv.org/abs/2005.11295
What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
https://arxiv.org/abs/2205.03246
Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek]
https://arxiv.org/pdf/2205.11482
Dr. Joscha Bach introduces a surprising idea called "cyber animism" in his AGI-24 talk - the notion that nature might be full of self-organizing software agents, similar to the spirits in ancient belief systems. Bach suggests that consciousness could be a kind of software running on our brains, and wonders if similar "programs" might exist in plants or even entire ecosystems.
Joscha takes us on a tour de force through history, philosophy, and cutting-edge computer science, teasing us to rethink what we know about minds, machines, and the world around us. Joscha believes we should blur the lines between human, artificial, and natural intelligence, and argues that consciousness might be more widespread and interconnected than we ever thought possible.
Dr. Joscha Bach
https://x.com/Plinz
This is video 2/9 from our coverage of AGI-24 in Seattle https://agi-conf.org/2024/
Watch the official MLST interview with Joscha which we did right after this talk on our Patreon now on early access - https://www.patreon.com/posts/joscha-bach-110199676 (you also get access to our private discord and biweekly calls)
TOC:
00:00:00 Introduction: AGI and Cyberanimism
00:03:57 The Nature of Consciousness
00:08:46 Aristotle's Concepts of Mind and Consciousness
00:13:23 The Hard Problem of Consciousness
00:16:17 Functional Definition of Consciousness
00:20:24 Comparing LLMs and Human Consciousness
00:26:52 Testing for Consciousness in AI Systems
00:30:00 Animism and Software Agents in Nature
00:37:02 Plant Consciousness and Ecosystem Intelligence
00:40:36 The California Institute for Machine Consciousness
00:44:52 Ethics of Conscious AI and Suffering
00:46:29 Philosophical Perspectives on Consciousness
00:49:55 Q&A: Formalisms for Conscious Systems
00:53:27 Coherence, Self-Organization, and Compute Resources
YT version (very high quality, filmed by us live)
https://youtu.be/34VOI_oo-qM
Refs:
Aristotle's work on the soul and consciousness
Richard Dawkins' work on genes and evolution
Gerald Edelman's concept of Neural Darwinism
Thomas Metzinger's book "Being No One"
Yoshua Bengio's concept of the "consciousness prior"
Stuart Hameroff's theories on microtubules and consciousness
Christof Koch's work on consciousness
Daniel Dennett's "Cartesian Theater" concept
Giulio Tononi's Integrated Information Theory
Mike Levin's work on organismal intelligence
The concept of animism in various cultures
Freud's model of the mind
Buddhist perspectives on consciousness and meditation
The Genesis creation narrative (for its metaphorical interpretation)
California Institute for Machine Consciousness
Prof Gary Marcus revisited his keynote from AGI-21, noting that many of the issues he highlighted then are still relevant today despite significant advances in AI.
Gary Marcus criticized current large language models (LLMs) and generative AI for their unreliability, tendency to hallucinate, and inability to truly understand concepts.
Marcus argued that the AI field is experiencing diminishing returns with current approaches, particularly the "scaling hypothesis" that simply adding more data and compute will lead to AGI.
He advocated for a hybrid approach to AI that combines deep learning with symbolic AI, emphasizing the need for systems with deeper conceptual understanding.
Marcus highlighted the importance of developing AI with innate understanding of concepts like space, time, and causality.
He expressed concern about the moral decline in Silicon Valley and the rush to deploy potentially harmful AI technologies without adequate safeguards.
Marcus predicted a possible upcoming "AI winter" due to inflated valuations, lack of profitability, and overhyped promises in the industry.
He stressed the need for better regulation of AI, including transparency in training data, full disclosure of testing, and independent auditing of AI systems.
Marcus proposed the creation of national and global AI agencies to oversee the development and deployment of AI technologies.
He concluded by emphasizing the importance of interdisciplinary collaboration, focusing on robust AI with deep understanding, and implementing smart, agile governance for AI and AGI.
YT Version (very high quality, filmed by us)
https://youtu.be/91SK90SahHc
Pre-order Gary's new book here:
Taming Silicon Valley: How We Can Ensure That AI Works for Us
https://amzn.to/4fO46pY
Filmed at the AGI-24 conference:
https://agi-conf.org/2024/
TOC:
00:00:00 Introduction
00:02:34 Introduction by Ben G
00:05:17 Gary Marcus begins talk
00:07:38 Critiquing current state of AI
00:12:21 Lack of progress on key AI challenges
00:16:05 Continued reliability issues with AI
00:19:54 Economic challenges for AI industry
00:25:11 Need for hybrid AI approaches
00:29:58 Moral decline in Silicon Valley
00:34:59 Risks of current generative AI
00:40:43 Need for AI regulation and governance
00:49:21 Concluding thoughts
00:54:38 Q&A: Cycles of AI hype and winters
01:00:10 Predicting a potential AI winter
01:02:46 Discussion on interdisciplinary approach
01:05:46 Question on regulating AI
01:07:27 Ben G's perspective on AI winter
DeepMind Research Scientist / MIT scholar Dr. Timothy Nguyen discusses his recent paper on understanding transformers through n-gram statistics. Nguyen explains his approach to analyzing transformer behavior using a kind of "template matching" (N-grams), providing insights into how these models process and predict language.
Key points covered include:
A method for describing transformer predictions using n-gram statistics without relying on internal mechanisms.
The discovery of a technique to detect overfitting in large language models without using holdout sets.
Observations on curriculum learning, showing how transformers progress from simpler to more complex rules during training.
Discussion of distance measures used in the analysis, particularly the variational distance.
Exploration of model sizes, training dynamics, and their impact on the results.
We also touch on philosophical aspects of describing versus explaining AI behavior, and the challenges in understanding the abstractions formed by neural networks. Nguyen concludes by discussing potential future research directions, including attempts to convert descriptions of transformer behavior into explanations of internal mechanisms.
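The "template matching" idea above amounts to building a table of n-gram statistics from the training stream and predicting the most frequent continuation for each context. A toy sketch (tiny made-up corpus; the paper's actual rule sets are far richer):

```python
from collections import Counter, defaultdict

def ngram_table(tokens, n):
    """Map each (n-1)-token context to a Counter of observed continuations."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ctx, nxt = tuple(tokens[i : i + n - 1]), tokens[i + n - 1]
        table[ctx][nxt] += 1
    return table

def predict(table, context):
    """Most frequent continuation for a context, or None if unseen."""
    counts = table.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

corpus = "the cat sat on the mat the cat ran on the mat".split()
table = ngram_table(corpus, n=3)        # trigram statistics: 2-token contexts
nxt = predict(table, ["on", "the"])     # both occurrences continue with "mat"
```

Comparing such table-based predictions against a transformer's outputs is what lets the paper *describe* model behavior without opening up its internal mechanisms.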
Timothy Nguyen earned his B.S. and Ph.D. in mathematics from Caltech and MIT, respectively. He held positions as Research Assistant Professor at the Simons Center for Geometry and Physics (2011-2014) and Visiting Assistant Professor at Michigan State University (2014-2017). During this time, his research expanded into high-energy physics, focusing on mathematical problems in quantum field theory. His work notably provided a simplified and corrected formulation of perturbative path integrals.
Since 2017, Nguyen has been working in industry, applying his expertise to machine learning. He is currently at DeepMind, where he contributes to both fundamental research and practical applications of deep learning to solve real-world problems.
Refs:
The Cartesian Cafe
https://www.youtube.com/@TimothyNguyen
Understanding Transformers via N-Gram Statistics
https://www.researchgate.net/publication/382204056_Understanding_Transformers_via_N-Gram_Statistics
TOC
00:00:00 Timothy Nguyen's background
00:02:50 Paper overview: transformers and n-gram statistics
00:04:55 Template matching and hash table approach
00:08:55 Comparing templates to transformer predictions
00:12:01 Describing vs explaining transformer behavior
00:15:36 Detecting overfitting without holdout sets
00:22:47 Curriculum learning in training
00:26:32 Distance measures in analysis
00:28:58 Model sizes and training dynamics
00:30:39 Future research directions
00:32:06 Conclusion and future topics