The Real Python Podcast

Detecting Outliers in Your Data With Python


How do you find the most interesting or suspicious points within your data? What libraries and techniques can you use to detect these anomalies with Python? This week on the show, we speak with author Brett Kennedy about his book “Outlier Detection in Python.”

Brett describes initially getting involved with detecting outliers in financial data. He discusses various applications and techniques in security, manufacturing, quality assurance, and fraud. We also dig into the concept of explainable AI and the differences between supervised and unsupervised learning.

This episode is sponsored by APILayer.

Course Spotlight: Using k-Nearest Neighbors (kNN) in Python

In this video course, you’ll learn all about the k-nearest neighbors (kNN) algorithm in Python, including how to implement kNN from scratch. Once you understand how kNN works, you’ll use scikit-learn to facilitate your coding process.

Topics:

  • 00:00:00 – Introduction
  • 00:01:56 – Describing the book
  • 00:03:22 – How did you get involved in outlier detection?
  • 00:06:50 – Initially looking at the data to spot errors
  • 00:08:22 – Amount of fraud and financial errors
  • 00:09:50 – Understanding the nature of the outliers
  • 00:12:15 – Industries that would be interested in detection
  • 00:18:21 – Sponsor: APILayer.com
  • 00:19:15 – Who is the intended audience for the book?
  • 00:22:16 – Differences between supervised and unsupervised learning
  • 00:25:48 – Autonomous vehicles detecting anomalous imagery
  • 00:29:08 – What is explainable AI?
  • 00:36:21 – Video Course Spotlight
  • 00:37:43 – Detecting an outlier across multiple columns
  • 00:44:32 – Detection of LLM and bot activity
  • 00:49:49 – The "prove you are human" checkbox
  • 00:52:25 – What are Python libraries for outlier detection?
  • 00:53:57 – Creating synthetic data to work through examples
  • 00:57:10 – Tools developed and described in the book
  • 01:01:29 – How to find the book
  • 01:02:27 – What are you excited about in the world of Python?
  • 01:04:55 – What do you want to learn next?
  • 01:05:52 – How can people follow your work online?
  • 01:06:16 – Thanks and goodbye

Show Links:

  • Outlier Detection in Python
  • Episode #169: Improving Classification Models With XGBoost – The Real Python Podcast
  • XGBoost Documentation — xgboost 1.7.6 documentation
  • SHAP (SHapley Additive exPlanations) Documentation
  • I’m a teacher and this is the simple way I can tell if students have used AI to cheat in their essays - Daily Mail Online
  • pyod: A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)
  • DeepOD: Deep learning-based outlier/anomaly detection
  • scikit-learn: machine learning in Python — scikit-learn 1.5.0 documentation
  • DataConsistencyChecker: A Python tool to examine datasets for consistency
  • Brett Kennedy - LinkedIn
  • Brett-Kennedy - GitHub

Level up your Python skills with our expert-led courses:

  • Data Cleaning With pandas and NumPy
  • Using k-Nearest Neighbors (kNN) in Python
  • Starting With Linear Regression in Python

Support the podcast & join our community of Pythonistas
