Software Testing Unleashed - QA, DevEx & Quality Engineering

Why Traditional Testing Fails for AI Systems - Dušanka Lečić



From prompt failures to hallucinations: what breaks in AI testing


"For the same input we have a lot of different outputs, some of them can be similar, but yeah still non-determinism is completely there." - Dušanka Lečić

This time I talk with Dušanka Lečić about why testing chatbots breaks everything we know about traditional QA. She explains how chatbot bugs are invisible – they hide in prompts, retrieval logic, and chunks, not in code – and why the same input can produce dozens of valid outputs. Dušanka shares her framework for testing context retention, hallucination control, and accuracy, and reveals why stress testing a chatbot means checking for typos and user frustration, not system load.

Dušanka Lečić is a dynamic leader and technical expert with nearly a decade of experience steering software testing initiatives across international teams. As a Test Lead and Department Manager at Levi9, she specializes in performance testing, agile methodologies, and engineering excellence. Holding a Ph.D. in Technical Sciences, Dušanka blends academic insight with real-world execution, and is a frequent contributor to industry conferences, mentoring programs, and expert communities. Her sessions offer a rich perspective on quality assurance, innovation, and leadership in fast-paced development environments.

Highlights:

  • Chatbot bugs are invisible in the traditional sense because they live not only in code, but in prompts, retrieval logic, and response generation, requiring a different debugging approach entirely.
  • Non-determinism in chatbot responses means multiple valid outputs exist for the same input, which breaks the classical pass/fail model and demands a wider definition of what counts as a correct test result.
  • Traceability in chatbot testing must cover chunks, retrieval results, and queries, not just the final response, because without that full log, root-cause analysis of a wrong answer is nearly impossible.
  • The CHAT framework structures chatbot testing around four concerns: context retention, hallucination control, accuracy and relevance, and a testing workflow that includes tracing, fixing, and retesting with similar queries.
  • Stress testing for chatbots means checking responses to misspellings, ambiguous terms, and bad wording that frustrated users produce, not measuring system performance under load.
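The highlights on non-determinism, traceability, and stress testing can be made concrete with a small sketch. It is not the CHAT framework from the episode, only one possible reading of those points. Everything named here is hypothetical: the `Trace` record, the `fake_chatbot` stand-in (which cycles through canned answers to imitate varying outputs), the substring-based acceptance check, and the `misspell` helper. A real project would likely replace the substring check with something more robust, such as semantic similarity or a reviewer rubric.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical log record: query, retrieved chunks, and final response."""
    query: str
    retrieved_chunks: list[str]
    response: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so wording differences matter less."""
    return " ".join(text.lower().split())


def response_is_acceptable(trace: Trace,
                           required_facts: list[str],
                           forbidden_claims: list[str]) -> bool:
    """Accept any wording that contains every required fact and no forbidden claim.

    This replaces an exact-match assertion, which cannot work when the
    same input legitimately produces several different outputs.
    """
    answer = normalize(trace.response)
    has_facts = all(normalize(fact) in answer for fact in required_facts)
    has_forbidden = any(normalize(claim) in answer for claim in forbidden_claims)
    return has_facts and not has_forbidden


def misspell(query: str) -> str:
    """Swap the first two letters of every word longer than three letters,
    imitating the typos a hurried or frustrated user might produce."""
    words = []
    for word in query.split():
        if len(word) > 3:
            word = word[1] + word[0] + word[2:]
        words.append(word)
    return " ".join(words)


# Stand-in for a real chatbot (assumption): same input, varying output.
_ANSWERS = itertools.cycle([
    "You can return items within 30 days of delivery.",
    "Returns are accepted within 30 days after delivery.",
    "Our policy allows returns within 30 days.",
])


def fake_chatbot(query: str) -> Trace:
    """Return a full trace, not just the answer, so failures can be investigated."""
    chunks = ["Return policy: items may be returned within 30 days of delivery."]
    return Trace(query=query, retrieved_chunks=chunks, response=next(_ANSWERS))


def run_case(query: str, runs: int = 3) -> list[Trace]:
    """Ask the same question several times and collect traces of failing runs."""
    failures = []
    for _ in range(runs):
        trace = fake_chatbot(query)
        if not response_is_acceptable(trace, ["30 days"], ["60 days"]):
            failures.append(trace)
    return failures
```

Calling `run_case("What is the return policy?")` and then `run_case(misspell("What is the return policy?"))` exercises both the repeated-run check and the typo variant. Each failing run comes back as a complete `Trace`, so the query and retrieved chunks are available when investigating a wrong answer.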

Software Testing Unleashed - QA, DevEx & Quality Engineering
By Richard Seidl | Software Development & Testing Expert