
Sign up to save your podcasts
Or


How do you process and classify text documents in Python? What are the fundamental techniques and building blocks for Natural Language Processing (NLP)? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, talks about how machine learning (ML) models understand text.
Jodie explains how ML models require data in a structured format, which involves transforming text documents into columns and rows. She covers the most straightforward approach, called binary vectorization. We discuss the bag-of-words method and the tools of stemming, lemmatization, and count vectorization.
We jump into word embedding models next. Jodie talks about WordNet, Natural Language Toolkit (NLTK), word2vec, and Gensim. Our conversation lays a foundation for starting with text classification, implementing sentiment analysis, and building projects using these tools. Jodie also shares multiple resources to help you continue exploring NLP and modeling.
Course Spotlight: Learn Text Classification With Python and Keras
In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.
Topics:
Show Links:
Level up your Python skills with our expert-led courses:
Support the podcast & join our community of Pythonistas
By Real Python4.7
139139 ratings
How do you process and classify text documents in Python? What are the fundamental techniques and building blocks for Natural Language Processing (NLP)? This week on the show, Jodie Burchell, developer advocate for data science at JetBrains, talks about how machine learning (ML) models understand text.
Jodie explains how ML models require data in a structured format, which involves transforming text documents into columns and rows. She covers the most straightforward approach, called binary vectorization. We discuss the bag-of-words method and the tools of stemming, lemmatization, and count vectorization.
We jump into word embedding models next. Jodie talks about WordNet, Natural Language Toolkit (NLTK), word2vec, and Gensim. Our conversation lays a foundation for starting with text classification, implementing sentiment analysis, and building projects using these tools. Jodie also shares multiple resources to help you continue exploring NLP and modeling.
Course Spotlight: Learn Text Classification With Python and Keras
In this course, you’ll learn about Python text classification with Keras, working your way from a bag-of-words model with logistic regression to more advanced methods, such as convolutional neural networks. You’ll see how you can use pretrained word embeddings, and you’ll squeeze more performance out of your model through hyperparameter optimization.
Topics:
Show Links:
Level up your Python skills with our expert-led courses:
Support the podcast & join our community of Pythonistas

288 Listeners

625 Listeners

579 Listeners

289 Listeners

302 Listeners

213 Listeners

988 Listeners

8,088 Listeners

969 Listeners

200 Listeners

207 Listeners

75 Listeners

310 Listeners

100 Listeners

70 Listeners