The Python Podcast.__init__

Polyglot: Multi-Lingual Natural Language Processing with Rami Al-Rfou



Summary

Using computers to analyze text can produce useful and inspirational insights. However, when working with multiple languages, the capabilities of existing models are severely limited. To help overcome this limitation, Rami Al-Rfou built Polyglot. In this episode he explains his motivation for creating a natural language processing library with support for a vast array of languages, how it works, and how you can start using it for your own projects. He also discusses current research on multi-lingual text analytics, how he plans to improve Polyglot in the future, and how it fits in the Python ecosystem.

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected].
  • To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Rami Al-Rfou about Polyglot, a natural language pipeline with support for an impressive number of languages.

Interview
  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Polyglot is and your reasons for starting the project?
  • What are the types of use cases that Polyglot enables which would be impractical with something such as NLTK or SpaCy?
  • A majority of NLP libraries have a limited set of languages that they support. What is involved in adding support for a given language to a natural language tool?
    • What is involved in adding a new language to Polyglot?
    • Which families of languages are the most challenging to support?
  • What types of operations are supported and how consistently are they supported across languages?
  • How is Polyglot implemented?
  • Is there any capacity for integrating Polyglot with other tools such as SpaCy or Gensim?
  • How much domain knowledge is required to be able to effectively use Polyglot within an application?
  • What are some of the most interesting or unique uses of Polyglot that you have seen?
  • What have been some of the most complex or challenging aspects of building Polyglot?
  • What do you have planned for the future of Polyglot?
  • What are some areas of NLP research that you are excited for?

Keep In Touch

Picks
  • Tobias
    • Duolingo
  • Rami
    • The Wizard and the Prophet: Two Remarkable Scientists and Their Dueling Visions to Shape Tomorrow’s World by Charles C. Mann

Links
  • Polyglot
  • Polyglot-NER
  • Jordan
  • NLP (Natural Language Processing)
  • Stony Brook University
  • Arabic
  • Sentiment Analysis
  • Assembly Language
  • C
  • .NET
  • Stack Overflow
  • Deep Learning
  • Word Embedding
  • Wikipedia
  • Word2Vec
  • NLTK (Python Natural Language Toolkit)
  • SpaCy
    • Podcast Episode
  • Gensim
    • Podcast Episode
  • Morphology
  • Morpheme
  • Transfer Learning
  • Read The Docs
  • BERT (Bidirectional Encoder Representations from Transformers)
  • FastText
  • data.world
    • Data Engineering Podcast Episode
  • Quilt package management for data
    • Data Engineering Podcast Episode

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

The Python Podcast.__init__ by Tobias Macey
