May 28, 2026

Why Traditional Testing Fails for AI Systems - Dušanka Lečić

24 minutes

From prompt failures to hallucinations: what breaks in AI testing

"For the same input we have a lot of different outputs, some of them can be similar, but yeah still non-determinism is completely there." - Dušanka Lečić

This time I talk with Dušanka Lečić about why testing chatbots breaks everything we know about traditional QA. She explains how chatbot bugs are invisible – they hide not only in code but also in prompts, retrieval logic, and chunks – and why the same input can produce dozens of valid outputs. Dušanka shares her framework for testing context retention, hallucination control, and accuracy, and reveals why stress testing a chatbot means checking for typos and user frustration, not system load.

Dušanka Lečić is a dynamic leader and technical expert with nearly a decade of experience steering software testing initiatives across international teams. As a Test Lead and Department Manager at Levi9, she specializes in performance testing, agile methodologies, and engineering excellence. Holding a Ph.D. in Technical Sciences, Dušanka blends academic insight with real-world execution, and is a frequent contributor to industry conferences, mentoring programs, and expert communities. Her sessions offer a rich perspective on quality assurance, innovation, and leadership in fast-paced development environments.

Highlights:

Chatbot bugs are invisible in the traditional sense because they live not only in code, but in prompts, retrieval logic, and response generation, requiring a different debugging approach entirely.

Non-determinism in chatbot responses means multiple valid outputs exist for the same input, which breaks the classical pass/fail model and demands a wider definition of what counts as a correct test result.

Traceability in chatbot testing must cover chunks, retrieval results, and queries, not just the final response, because without that full log, root-cause analysis of a wrong answer is nearly impossible.

The CHAT framework structures chatbot testing around four concerns: context retention, hallucination control, accuracy and relevance, and a testing workflow that includes tracing, fixing, and retesting with similar queries.

Stress testing for chatbots means checking responses to misspellings, ambiguous terms, and bad wording that frustrated users produce, not measuring system performance under load.
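
The highlights above can be illustrated with a small sketch. It is not taken from the episode: `fake_chatbot`, the chunk id, the return-policy answers, and the required and forbidden phrases are all hypothetical stand-ins. The sketch assumes that a response counts as correct when it contains the required facts and none of the forbidden claims, so several differently worded answers can pass. It also sends misspelled variants of the query and keeps a trace of query, chunks, and answer for each run.

```python
import random


def fake_chatbot(query: str, rng: random.Random) -> dict:
    """Hypothetical stand-in for a real chatbot call.

    Returns one of several differently worded answers, together with
    the query it received and the chunks it "retrieved".
    """
    answers = [
        "You can return items within 30 days of delivery.",
        "Returns are accepted for 30 days after your order arrives.",
        "Our policy allows returns within 30 days.",
    ]
    return {
        "query": query,
        "chunks": ["returns-policy.md#window"],  # made-up chunk id
        "answer": rng.choice(answers),
    }


def introduce_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters to mimic a hurried user's misspelling."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def is_acceptable(answer: str, required_facts: list, forbidden_claims: list) -> bool:
    """Pass if every required fact appears and no forbidden claim does."""
    lowered = answer.lower()
    has_facts = all(fact.lower() in lowered for fact in required_facts)
    has_forbidden = any(claim.lower() in lowered for claim in forbidden_claims)
    return has_facts and not has_forbidden


def run_checks(query: str, runs: int, rng: random.Random) -> list:
    """Ask the same question repeatedly and keep a full trace per run."""
    traces = []
    for attempt in range(runs):
        # Alternate between the clean query and a misspelled variant.
        sent = query if attempt % 2 == 0 else introduce_typo(query, rng)
        trace = fake_chatbot(sent, rng)
        trace["passed"] = is_acceptable(trace["answer"], ["30 days"], ["60 days"])
        traces.append(trace)
    return traces


rng = random.Random(7)  # fixed seed so the sketch is repeatable
traces = run_checks("What is your return policy?", runs=6, rng=rng)
failed = [t for t in traces if not t["passed"]]
print(f"{len(traces) - len(failed)} of {len(traces)} runs passed")
for t in failed:
    # The stored query and chunks are what make root-cause analysis possible.
    print(t["query"], t["chunks"], t["answer"])
```

A substring check is only one possible acceptance rule. A real suite might compare meaning rather than wording, but the shape stays the same: many runs, perturbed inputs, and a trace for every answer.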

By Richard Seidl | Software Development & Testing Expert
