September 08, 2022

Reddit Sentiment Analysis with Apache Kafka-Based Microservices

35 minutes

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey.

Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog—Data Enrichment in Existing Data Pipelines Using Confluent Cloud.

The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. The project consists of four microservices:

A user input service that collects requests in a Kafka topic, which consist of the desired subreddit, along with the dates between which data should be collected
An API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topic
A sentiment analysis service that analyzes the appended topic from the API polling service using the Python library NLTK; it calculates averages with ksqlDB
A results-displaying service that consumes from a topic with the calculations

Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it.

EPISODE LINKS

Data Enrichment in Existing Data Pipelines Using Confluent Cloud
Watch the video version of this podcast
Kris Jenkins’ Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

...more

View all episodes

By Confluent

4.8

4343 ratings