


InferenceX, formerly InferenceMAX: https://inferencex.com/
Article (latest): https://newsletter.semianalysis.com/p/inferencex-v2-nvidia-blackwell-vs
GitHub: https://github.com/SemiAnalysisAI/InferenceX
Article (original): https://newsletter.semianalysis.com/p/inferencemax-open-source-inference
00:00 Introduction to InferenceX
02:52 Evolution from InferenceMAX to InferenceX
06:06 Benchmarking and Performance Insights
08:43 The Scale of Benchmarking Work
11:39 Collaboration with AMD and Nvidia
14:52 The Evolution of Inference Benchmarking
17:34 Optimizations and Their Impact
20:47 Challenges in Composability
23:51 Multi-Token Prediction Explained
26:52 Cost Implications of Optimizations
31:06 Understanding Inference Workloads and Benchmarks
33:44 Future Plans for Inference Optimization
37:16 Roadmap for New Models and Data Sets
39:03 Challenges in Benchmarking Multi-Turn and Multi-Modal Data
42:44 Experiences with AI Models and Their Limitations
48:43 Skepticism About Future AI Improvements
By Jordan Nanos, Doug O'Laughlin