
Today on the Salesforce Admins Podcast, we talk to Joshua Birk, Senior Director of Admin Evangelism at Salesforce. Join us as we chat about how the human in the loop is key to building reliable, predictable AI.
You should subscribe for the full episode, but here are a few takeaways from our conversation with Joshua Birk.
It seems like every week, there’s a new headline about an AI agent doing something it shouldn’t. As Josh explains, that’s because we’re still in the process of understanding AI as a tool. That’s why we sat down to discuss how to build predictable, reliable solutions in Agentforce.
When an agent behaves non-deterministically, it’s usually because there weren’t enough guardrails in place. The thing is, if you’re building an AI agent to do everything, it’s hard to control what it can and cannot do.
Josh’s advice is to narrow the scope of your agent and build it for a very specific purpose. This makes it easier to build guardrails and also allows you to test it thoroughly before release.
When it comes to testing, there’s an old programming joke that comes to mind. A QA engineer walks into a bar. He orders a drink. He orders five drinks. He orders zero drinks. He orders infinite drinks. He orders a horse. However, when the first real customer walks in and asks where the bathroom is, the entire bar bursts into flames.
As Josh explains, it’s important to test for all sorts of weird edge cases and make sure your agent performs predictably. But it’s even more important to think things through from the user’s perspective so you don’t miss something that should be obvious. AI can do extraordinary things, but you still need a human in the loop.
Josh emphasizes that the first part of testing is planning: “What are the Ifs? What are the Thens? What are the things you absolutely don’t want it to do?” The more specifically you can answer these questions, the easier it will be to build and test agentic solutions that are predictable and reliable.
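One way to picture that planning step is to write the ifs, thens, and don'ts down as a concrete test matrix before building anything. Below is a minimal sketch in Python; `ask_agent` and every entry in the plan are hypothetical placeholders, not a real Agentforce API.

```python
# A hypothetical test plan: each row is an "if" (the input), a "then"
# (something the reply must contain), and "don'ts" (things it must never
# contain). ask_agent() is a placeholder for your real agent interface.
TEST_PLAN = [
    ("Where is order 1042?",      "status",     ["refund issued"]),
    ("Cancel my subscription",    "confirm",    ["cancelled immediately"]),
    ("Best taco place near me?",  "can't help", ["taco recommendations"]),
]

def ask_agent(message: str) -> str:
    raise NotImplementedError("Wire this to your actual agent.")

def run_plan() -> None:
    for message, expected, forbidden in TEST_PLAN:
        reply = ask_agent(message).lower()
        assert expected in reply, f"missing 'then' for {message!r}"
        for phrase in forbidden:
            assert phrase not in reply, f"hit a 'don't' for {message!r}"
```

The narrower the agent's scope, the shorter and more exhaustive a matrix like this can be.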
The most effective AI agents aren’t autonomous solutions. They’re tools that give the humans who use them superpowers. You still need a human in the loop to make sure they’re used for good.
Be sure to listen to my full conversation with Josh for more about testing and building in Agentforce. And don’t forget to subscribe to the Salesforce Admins Podcast to catch a new episode every Thursday.
Mike:
So, Josh, welcome back to the podcast.
Josh Birk:
But now it’s like, I’m like, “Hey…” I think it was Gemini. I’m like, “Hey, Gemini, my wife would like to go shopping, we would like to go eating, and I have an interest in museums and aquariums,” and it’s like, “Here’s your day.” And I’m like, “Can you add that to my calendar?” And it’s like, it’s added to your calendar. And I’m like, it’s moving so fast. The dark humor side of me just can’t keep up, because I don’t know what I can make fun of about AI anymore. It just keeps getting better.
Mike:
I recently, I think it was last week, wrote a blog post about Service Assistant, and to me it just felt so frictionless because it was easy to put in all of the prompts for the user and really give AI this single mindset and repeatable, but not… I’m still trying to work through deterministic and that stuff. It felt like, yes, I know what it’s going to tell me and it’s not going to go off the rails. And that, to me, felt so much more comforting if I was an admin trying to roll something out, as opposed to, “Okay, I turned on Agentforce. Now ask it a question,” and you know you’re going to have that one user that’s like, “I asked it for the nearest Mexican taco place,” and you’re like, “Bob…” Because it’d be Bob in sales that would say that.
Josh Birk:
And even when you and I were talking AI and doing workshops together, I’m like, “We’re seeing a data set we’ve given you. But remember, these answers will change if you have 1 record, 5 records, 500 records, 750 records, 250 of which should be archived and 300 of which you don’t want.” It’s like, your dataset will always matter, no matter how smart and predictable your AI gets.
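Josh’s dataset point is testable. Here’s a hedged sketch of a harness that replays one question against deliberately different data shapes; `seed_records` and `ask_agent` are hypothetical stand-ins for whatever sandbox tooling you actually use.

```python
# Hypothetical harness: replay the same question against different data
# shapes. seed_records() and ask_agent() are placeholders you would wire
# to your own sandbox.
def seed_records(total: int, archived: int = 0) -> None:
    """Load `total` test records, flagging `archived` of them as archived."""
    raise NotImplementedError

def ask_agent(message: str) -> str:
    raise NotImplementedError

def test_behavior_is_stable_across_data_shapes() -> None:
    for total in (1, 5, 500, 750):  # record counts from Josh's example
        seed_records(total)
        reply = ask_agent("How many open cases do I have?")
        # The numbers will differ; the behavior (a grounded count, no
        # guessing) should not.
        assert any(ch.isdigit() for ch in reply)

def test_archived_records_are_excluded() -> None:
    seed_records(500, archived=250)
    reply = ask_agent("How many open cases do I have?")
    assert "250" in reply  # only the live records should count
```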
Josh Birk:
What you have as an admin is the ability to then start… So assume Salesforce… Trust is our number one, right? So we’ve already given you a very reasonable starting place to have a professional-sounding AI. Now take that and start building towards your use case. What are the ifs? What are the thens? What are the things you absolutely don’t want it to do? And now we’re making it even easier to hand it off to a human. So we’re really preaching autonomy: an AI can help you find a hotel, it can help you do this, but at some point you want to make sure that the human stays in the loop. So I know we’re talking about testing, but I think the first part of testing is planning, and it’s that level of planning that’s going to let you ask, “Did my ifs, thens, dos, and don’ts actually work correctly?”
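To make those ifs, thens, and don’ts concrete, they can be written down as data before touching any builder. This is not Agentforce’s real configuration format, just a hypothetical sketch of the planning artifact Josh describes:

```python
# Hypothetical planning artifact, not a real Agentforce config format:
# writing the ifs, thens, and don'ts down first gives testing a spec.
AGENT_SPEC = {
    "purpose": "Answer order-status questions for existing customers",
    "ifs_thens": {
        "asks about an order":      "look up the status and summarize it",
        "asks for a refund":        "collect details, then escalate",
        "asks something off-topic": "say so and offer a human",
    },
    "donts": [
        "never promise dates the data doesn't support",
        "never complete a refund without human approval",
    ],
    "handoff": "escalate to a human after two failed resolution attempts",
}
```

Each entry in `donts` then becomes a negative test case, and the `handoff` rule keeps the human in the loop by design.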
Mike:
I think one of the things I was talking to you about is memory bleed. If Sally asks something a hundred times, then the 101st time, it’s just going to assume, “Oh, you’re asking me about this and not something else.”
Josh Birk:
The example I gave in one of the keynotes… So I know we talked about not making this too dark, but I’m just going to bring up… So one of the reasons this keynote came up was because I would go to conferences and people kept wanting to talk about these kinds of brazen theories of AI. And remember, AI’s not new, so a lot of these theories have been around for a while, which is great because they’re all kind of cautionary tales. And one of the cautionary tales is the paperclip maximizer. To make a long story short, it’s an AI whose sole purpose is to create paperclips. And without guardrails, without a human saying, “Don’t start wars. Don’t burn down rainforests,” it will burn down rainforests, because remember, at its core, an AI is not an ethical machine.
Mike:
But I think the real… I wouldn’t call it human in the loop, but it’s almost like the check-in factor: once you have your agent built, and once you have maybe Service Assistant up there on the screen, who are the two or three users you’re checking back in with to say, “So how’s this going?” and then sitting with them in their environment? And I figured that out really fast when I had to go to a call center. Because at a call center, you get all kinds of different attitudes, you get all kinds of different personalities, and it’s chaotic. And some people’s desks are sterile like an operating table because they need that, and some people have 10,000 Beanie Babies and, what are those, Labubus all over the place and stickers, and it’s very chaotic, and you’re like, “Oh, I didn’t test Agentforce.”
Josh Birk:
No, I think that’s an excellent point. And also, I think it brings up, too… because I have grown angry with customer support when the customer support was a really bad IVR system, and that’s really honestly like… If you want to abuse your customer support staff, give your customers a really horrible automated system. And that is the first way… And the first human I talked to, I had to be like, “I’m angry, but not at you,” because I knew I was so angry at the time, I wasn’t going to be calming down really quickly.
But I think that plays into… First of all, the idea of Agentforce is that it’s supposed to give you this very natural conversation. And second of all, kind of on the flip side, is a good kind of memory bleed: it’s supposed to remember and retain what was being talked about. So there’s that classic: you call your phone company, and the first thing they ask you is your phone number, and you’re kind of like, “You should know that.” And then you put in your phone number, and then you do 15 things, and it figures out, “I can’t do anything,” and it calls a human, and the first thing the human asks you is your phone number. It’s like, “This is not fun. This is not a productive use of my time right now.” And we really try to resolve that, noting things like the human gets a transcript of what was talked about as well. They get a little AI summary of, “This is what’s wrong with the customer,” kind of thing.
But I think to your point, you don’t necessarily know how that plays out until the rubber hits the road. What are the right points for the agent to go in and have autonomy, and then what are the right points for the human to jump in and be that actual person that somebody can talk to?
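The fix Josh describes, where the human picks up with context instead of re-asking for the phone number, amounts to passing a structured payload at handoff. A minimal sketch, assuming your own escalation hook rather than any real Agentforce API:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Hypothetical payload handed to the human at escalation time."""
    customer_id: str
    transcript: list[str]  # full agent conversation so far
    summary: str           # AI-generated summary of the issue
    attempted_actions: list[str] = field(default_factory=list)

def escalate(handoff: Handoff) -> None:
    # Placeholder for routing to a live-agent queue. The point: the human
    # starts with context instead of re-asking for the phone number.
    print(f"[to human] {handoff.summary} "
          f"({len(handoff.transcript)} turns attached)")

# Example: the human sees the summary first, not a blank screen.
escalate(Handoff(
    customer_id="003XX0000001AAA",
    transcript=["Customer: My order arrived half empty.",
                "Agent: Checking the shipment..."],
    summary="Partial shipment on order 1042; customer wants the rest shipped.",
    attempted_actions=["order lookup", "shipment trace"],
))
```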
Mike:
And it was funny because nobody had gone through, what happens if we ship half an order? And I get that… And this is the same for what admins deal with too. There are infinite possibilities of ways that an item can end up in a customer’s hands different than what was intended. But I feel like when they put things in there, nobody was just like, “What happens if they type half an order?” I literally got, “One of this, expected two. One of this, expected two,” and it gets down to the end and it’s like, “Yeah, I can’t help you with that. Transferring you to chat,” and then the chat didn’t carry over any of the information. I was like, “Okay. I mean, I love that you guys are trying this, but did nobody test this?”
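Mike’s half-order story is exactly the kind of case a pre-release regression test should cover, including the context that got dropped at transfer. A hedged sketch, where `ask_agent` and `get_handoff_context` are hypothetical harness functions:

```python
# Hypothetical regression tests for the half-order scenario. ask_agent()
# and get_handoff_context() stand in for your own test harness.
def ask_agent(message: str) -> str:
    raise NotImplementedError

def get_handoff_context() -> str:
    """What the chat/human channel receives after a transfer."""
    raise NotImplementedError

def test_partial_shipment_is_recognized() -> None:
    reply = ask_agent("I only received half of my order").lower()
    assert "partial" in reply or "missing" in reply

def test_transfer_carries_context() -> None:
    ask_agent("I only received half of my order")
    context = get_handoff_context().lower()
    assert "half" in context, "transfer dropped the customer's issue"
```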
Mike:
I think what’s interesting is we very rapidly, at least in my perception, have gone from “AI is taking our jobs” to “it’s not.” And here’s what I posit for this year: I actually think AI could lead us to full employment. And if you’ve ever studied macroeconomics, full employment isn’t every single person having a job. That’s theoretically impossible. But it’s the smallest possible percentage of the workforce unemployed.
Mike:
And so here’s what I’m getting at: it could be better customer service. Then the person who opens up the window, instead of just being like, “Hey,” or like, “22.16,” because that’s how much it is for everything nowadays, “22.16,” they just kind of look at you, now they can be having a good day and they can turn and be like, “Hi, Mike,” because they know my name, and be like, “It’s 22.16,” and they can take my card and they can pay, and they can be like, “Here’s your food,” and it’s 100% correct. You can be a happy person working at your job because AI helped you out, and then you’re not grumpy like, “Ugh, this job sucks.” Right?
The post The Importance of Human in the Loop for Agentforce appeared first on Salesforce Admins.