Inference Time Tactics

The Thinking Algorithm Leaderboard: Why No Single Model Wins



In this episode of Inference Time Tactics, Cooper and Byron break down NeuroMetric's Thinking Algorithm Leaderboard and what it reveals about building production-ready AI agents. They explain why prompt engineering a single model won't cut it for enterprise use cases, explore the impact of inference-time compute strategies, and discuss what they learned from testing 10 models on real CRM tasks, from surprising token inefficiency to catastrophic failures in SQL generation.

 

We talked about:

 

  • Why NeuroMetric built the first leaderboard combining models with inference-time compute strategies. 
  • How Salesforce's CRMArena-Pro reflects real multi-step business tasks better than pure reasoning benchmarks. 
  • The jagged frontier: no single model or technique dominates across all tasks. 
  • Why GPT 20B was surprisingly token inefficient, taking twice as long as GPT 120B to reach similar accuracy.
  • How GPT-5 nano's conversational style broke SQL generation tasks completely. 
  • Trading accuracy for speed: two-model versus five-model ensembles, and saving 20+ seconds per task.
  • Throughput constraints as a hidden bottleneck when scaling to production volumes. 
  • Future directions: LLM-guided search, task clustering, and compression to specialized small models.


  • Resources Mentioned:

    CRMArena-Pro from Salesforce:

    https://www.salesforce.com/blog/crmarena-pro/

    Thinking Algorithm Leaderboard: 

    https://leaderboard.neurometric.ai/ 



    Connect with NeuroMetric:

    Website: https://www.neurometric.ai/ 

    Substack: https://neurometric.substack.com/ 

    X: https://x.com/neurometric/ 

    Bluesky: https://bsky.app/profile/neurometric.bsky.social

     

    Hosts:

    Calvin Cooper

    https://x.com/cooper_nyc_ 

    https://www.linkedin.com/in/coopernyc

     

    Guest:

    Byron Galbraith

    https://x.com/bgalbraith 

    https://www.linkedin.com/in/byrongalbraith 


    Inference Time Tactics, by NeuroMetric AI