Life on Hard Mode with Pratik Karki

Life on Hard Mode #1 - Pilot



Why did I start a podcast?

I started this podcast because I wanted a place to speak honestly about the work I am doing, the ideas I care about, and the challenges that come with building Anthromind. My inbox has been full of questions about careers, AI, visas, startups, and life choices. I cannot give everyone long personal replies (trust me, I tried), but I can share what I am learning in a format that scales.

A weekly podcast forces me to slow down and think. It lets me break down research papers, explain how I build and sell products, talk about what I see in the AI world, and tell the truth about what it feels like to live life on hard mode. It is also a way for me to stay accountable. Every Friday, I have to show up, speak clearly, and share what I worked on. It’s a no-BS way of sharing and building in public, as much as I am able to.

I want this to be useful to people who are trying to carve their own path. If someone can learn from my mistakes or get a clearer picture of how to build something in AI today, then this will be worth it :) The podcast is my way of opening the door a little wider and letting people see how all of this actually works. Please continue to send me your questions, thoughts, and concerns. I’ll tackle them every week. Also, I’m open to ideas on areas I should focus on.

Topic of the week: Post-training for LLMs explained

Post-training is the set of steps that happen after an LLM finishes its main training run. The base model (think of the pretrained models underneath GPT-4 or DeepSeek-R1) learns general patterns from huge amounts of text, but that alone does not make it safe, helpful, or aligned with what people want. Post-training is where we shape the model into something usable. This is specifically why I built Anthromind.

There are three major parts worth understanding.

1. Preference training

This is where humans show the model examples of good and bad behavior. We rank outputs, score them, and teach the model what we prefer. This shifts the model toward clearer reasoning, safer answers, and more predictable behavior. For example, raters might score each output for bias or trustworthiness on a scale of zero to ten. Recent work, like step-wise preference training, focuses on rewarding the thinking process rather than just the final answer.
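To make the ranking idea concrete, here is a minimal sketch of how zero-to-ten ratings could become training pairs, and how a pairwise (Bradley-Terry style) loss penalizes a model that scores the rejected answer higher than the preferred one. The function names and the example ratings are made up for illustration; real pipelines apply this kind of loss to model outputs at a much larger scale.

```python
import math


def pairs_from_ratings(ratings: dict[str, float]) -> list[tuple[str, str]]:
    """Turn human ratings (e.g. 0-10) into (preferred, rejected) pairs."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    return [
        (better, worse)
        for i, better in enumerate(ranked)
        for worse in ranked[i + 1:]
        if ratings[better] > ratings[worse]  # skip ties
    ]


def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(score_chosen - score_rejected))."""
    margin = score_chosen - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical trustworthiness ratings for three candidate answers.
ratings = {"answer_a": 9, "answer_b": 3, "answer_c": 6}
pairs = pairs_from_ratings(ratings)

# The loss is small when the model already prefers the chosen answer
# and large when it prefers the rejected one.
loss_agrees = preference_loss(score_chosen=2.0, score_rejected=-1.0)
loss_disagrees = preference_loss(score_chosen=-1.0, score_rejected=2.0)
```

Minimizing this loss over many pairs is what nudges the model toward the behavior raters preferred.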

2. Reinforcement learning and policy shaping

Here we adjust the model using reward signals. The idea is simple. If the model gives answers we like, we push it to produce more of that behavior. If it produces something we do not want, we correct it. Newer techniques aim to be faster and more stable than the original RLHF. One example is reinforcement learning with verifiable rewards (RLVR), where the reward comes from an automatic check, such as a correct math answer or a passing test. The industry is also moving toward training models to spell out their reasoning chains, which makes their decisions easier to inspect and steer.
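As a rough sketch of what a verifiable reward can look like: sample several answers to the same question, check each one automatically, and compare every reward to the group average. The answer checker below (take the last number in the text) and the sample completions are assumptions for illustration, not any specific lab's recipe.

```python
import re
from statistics import mean


def verifiable_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last number in the completion equals the expected answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == expected else 0.0


def advantages(rewards: list[float]) -> list[float]:
    """Subtract the group mean so above-average samples get a positive signal."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical samples from the model for the prompt "What is 2 + 2?"
samples = ["2 + 2 = 4", "I think the answer is 5", "The answer is 4"]
rewards = [verifiable_reward(s, expected="4") for s in samples]
signal = advantages(rewards)
```

Samples with a positive value in `signal` are the ones training would make more likely; the wrong answer gets a negative value and is pushed down.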

3. Tool use and structured behaviors

Large models today do more than predict text. They plan, call APIs, write code, and follow rules. Post-training is where we teach the model how to use tools, how to follow steps, and how to interact with real systems. This is also where we reduce errors, hallucinations, and unsafe actions. In a sense, post-training is most important here, and organizations adopting AI cannot survive without it.
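One way to picture structured behavior: the model emits a tool call as JSON, and the surrounding system validates it before anything runs. The sketch below assumes a made-up call format (`tool` and `arguments` keys) and a stub `get_weather` tool; real APIs each define their own schema.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool for illustration; a real one would call a weather service."""
    return f"(stub) weather for {city}"


# Hypothetical registry: tool name -> (function, required argument names).
TOOLS = {"get_weather": (get_weather, {"city"})}


def run_tool_call(model_output: str) -> dict:
    """Parse and validate a model-emitted tool call before executing it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"ok": False, "error": "output is not valid JSON"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}

    name = call.get("tool")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}

    func, required = TOOLS[name]
    if not isinstance(args, dict) or set(args) != required:
        return {"ok": False, "error": "arguments do not match the tool schema"}
    return {"ok": True, "result": func(**args)}


good = run_tool_call('{"tool": "get_weather", "arguments": {"city": "Kathmandu"}}')
bad = run_tool_call('{"tool": "delete_db", "arguments": {}}')
```

A pass or fail signal like `ok` is the kind of thing that could be fed back as training data or reward, so the model learns to produce well-formed calls and to avoid actions it is not allowed to take.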

The short version is that post-training is the bridge between raw capability and real-world usefulness. It is how we take a powerful yet chaotic system and shape it into something that can help people, answer questions safely, and work seamlessly within products. Without post-training, even the strongest base model is not something you can ship.

In conclusion, AI adoption will not be possible without post-training techniques. A recent MIT report found that 95% of generative AI pilots failed to deliver measurable returns, and I see the lack of post-training techniques and awareness as a major reason.

Sources:

* https://alphaxiv.org/abs/2502.21321

* https://x.com/tszzl/status/1948907851508056495

* https://alphaxiv.org/abs/2509.25300

* https://alphaxiv.org/abs/2509.17866

Questions and answers!

I received 24 questions across LinkedIn, X, FB, Insta, and Substack. I’ve compiled and answered quite a few in the episode! I’ll get to more in future episodes. Please keep them coming.

P.S. What have I been reading/watching/playing?

* [Book] The Hard Thing About Hard Things: To be honest, I had heard about this book for the longest time and didn’t really see a point in reading it, but I’m thankful that I did. Ben Horowitz goes into detail about why founding a company looks like it’s all rainbows and whiskers, but really is not. It is brutally tough, man. However, it is also inspiring because he walks us through how to navigate difficult scenarios in a very pragmatic manner.

* [Book] The Fall of Hyperion: I read the first Hyperion book by Dan Simmons, and immediately I was like, “I need to read the sequel.” And the sequel does not disappoint. It’s a different storytelling structure than the first one, but it has a very complex story and touches upon topics such as AIs and systems so complex that humans cannot comprehend them. It is truly ahead of its time.

* [Movie] Guillermo del Toro’s Frankenstein: I’ve been a fan of Guillermo del Toro since Pan’s Labyrinth and the Hellboy days. Frankenstein was released this year, sadly, only on Netflix, with a limited theatrical release. It’s mind-blowingly good and deserves to be watched in a theater, which is how I saw it. The less said, the better, but you truly feel for the monster by the end. Probably my favorite movie of the year so far.

* [Game] Hotline Miami 1 & 2: I played the first Hotline Miami game when I was a kid, and it was really fun. The music was good, but I didn’t really understand the story. I finally played the second one, which serves as a proper conclusion and includes a lot of backstory for the characters. The story is so compelling; it explores themes such as violence, escalation, and legacy. I’ve been a massive fan ever since I played it again, and I have been listening to the soundtrack on repeat.



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit pratikkarki.substack.com

By Pratik Karki