


As AI moves from its centralised, expensive early phase into mass diffusion, I see enterprises facing a structural reckoning: processing millions of inference calls against frontier large language models is no longer just a technology choice — it is a capital allocation decision with material consequences for margins and business model sustainability. I argue that Small Language Models are the efficient market response. A model fine-tuned on a narrow domain will consistently outperform a generalist model on that specific task while cutting inference costs by 80–95%, improving latency, satisfying data residency requirements, and eliminating vendor concentration risk. The key insight I draw on is that comparative advantage belongs not to the broadest capability set, but to the system most precisely matched to the task — the same principle that explains why specialisation creates value throughout economic history.
The theoretical gains of SLMs, however, only materialise through what I call "harness engineering" — the surrounding infrastructure of evaluation pipelines, automated testing, production monitoring, and deployment tooling that converts a model's potential into reliable business output. Without it, SLMs fail not because the models are inadequate, but because the organisational systems governing them are. More importantly, I find that this discipline generates compounding returns over time: because SLMs are lightweight and fast to retrain, production signal feeds directly back into improved models, with each iteration enriching the evaluation dataset and refining the deployment playbook. Organisations that build this stack are not merely reducing AI costs — they are accumulating proprietary cognitive infrastructure that appreciates with use, insulated from frontier model pricing volatility, and positioned to treat intelligence as an owned organisational capability rather than a vendor relationship.
By Smriti Kirubanandan
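The harness-engineering loop described above can be sketched in a few dozen lines. The following is a minimal, hypothetical illustration (all class and function names are invented for this sketch, not taken from any real library): an evaluation gate that decides whether a retrained SLM may ship, plus a feedback step that folds production failures back into the evaluation set so that each retraining cycle is tested against them.

```python
# Hypothetical sketch of a "harness engineering" loop: an evaluation gate
# for candidate SLMs plus a production-feedback step. Names are illustrative.

from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str

@dataclass
class Harness:
    eval_set: list[EvalCase]
    threshold: float = 0.9  # minimum pass rate required to deploy

    def score(self, model: Callable[[str], str]) -> float:
        """Pass rate of a candidate model over the evaluation set."""
        passed = sum(1 for c in self.eval_set if model(c.prompt) == c.expected)
        return passed / len(self.eval_set)

    def gate(self, model: Callable[[str], str]) -> bool:
        """Deployment gate: ship only if the candidate clears the threshold."""
        return self.score(model) >= self.threshold

    def record_failure(self, prompt: str, expected: str) -> None:
        """Fold a production miss back into the eval set, so the next
        retraining cycle is measured against it -- the compounding loop."""
        self.eval_set.append(EvalCase(prompt, expected))

# Usage with a toy "model" that uppercases its input:
harness = Harness(eval_set=[EvalCase("ship it", "SHIP IT")])
toy_model = lambda p: p.upper()
deployable = harness.gate(toy_model)            # True: 1/1 cases pass
harness.record_failure("edge case", "EDGE CASE")  # eval set grows to 2 cases
```

The point of the sketch is structural, not the toy model: the evaluation dataset is a growing, proprietary asset, and the gate makes deployment a measured decision rather than a judgment call, which is what converts fast SLM retraining into the compounding returns described above.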