
Nicolay here,
I think by now we are done marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few points.
If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems.
If your LLM-powered app can’t survive a malformed emoji, you’re shipping liability, not software.
Today, I sat down with Vaibhav (co-founder of Boundary) to dissect BAML—a DSL that treats every LLM call as a typed function.
It’s like swapping duct-taped Python scripts for a purpose-built compiler.
Vaibhav advocates for building primitives from first principles.
One principle stood out: LLMs are just functions; build like that from day 1. Wrap them, test them, and loop in a human only where it counts.
Once you adopt that frame, reliability patterns fall into place: fallback heuristics, model swaps, classifiers—same playbook we already use for flaky APIs.
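To make that frame concrete, here is a minimal Python sketch (my own illustration, not BAML code): an LLM call wrapped as a typed function whose output must validate against a return type, with a cheap heuristic fallback when it doesn't. The `Sentiment` type, the `fake_model` stub, and the keyword heuristic are all hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Sentiment:
    label: str   # "positive" | "negative" | "neutral"
    score: float

def parse_sentiment(raw: str) -> Sentiment:
    # Treat model output as untrusted data that must validate
    # against the declared return type, like any typed function.
    label, _, score = raw.partition(":")
    label = label.strip().lower()
    if label not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected label: {label!r}")
    return Sentiment(label=label, score=float(score))

def classify(text: str, call_model) -> Sentiment:
    # Same playbook as a flaky API: try the model, and fall back
    # to a deterministic heuristic if the output fails validation.
    try:
        return parse_sentiment(call_model(text))
    except (ValueError, TypeError):
        label = "positive" if "great" in text.lower() else "neutral"
        return Sentiment(label=label, score=0.0)

# Hypothetical stub standing in for a real LLM call.
def fake_model(text: str) -> str:
    return "positive: 0.92"
```

Because the call site only sees a typed `classify()` function, swapping models, adding retries, or routing to a classifier all happen behind the signature, exactly the kind of boundary the episode argues for.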
We also cover:
💡 Core Concepts
📶 Connect with Vaibhav:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
🔮 What's Next
Next week, we continue digging into getting generative AI into production, talking to Paul Iusztin.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave them in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at [email protected].
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ Here's the deal: I'm committed to bringing you detailed, practical insights about AI development and implementation. In return, I have two simple requests:
That's our agreement - I deliver actionable AI insights, you help grow this. ♻️