The Python Podcast.__init__

Synthetic Data Generation Using Mimesis with Nikita Sobolev



Summary

Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • Your host as usual is Tobias Macey and today I’m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data.

Interview
  • Introductions
  • How did you get introduced to Python?
  • What is mimesis and how does it compare to other projects such as faker and factory_boy?
    • What was the motivation for creating it?
  • One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?
  • What are the built in mechanisms for generating data?
    • What options do users have for customizing the types of data that can get generated?
  • What are some of the most complicated providers to write and maintain?
  • What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial?
    • How would you use Mimesis to anonymize data from a production environment to be used for testing?
  • What are the most challenging aspects of maintaining the Mimesis project?
  • What are some of the plans that you have for the future of Mimesis?

Keep In Touch
  • sobolevn on GitHub
  • @sobolevn on Twitter
  • Email

Picks
  • Tobias
    • Coco
  • Nikita
    • I Am A Mediocre Developer

Links
  • Mimesis
  • Django
  • Faker
  • Factory Boy
  • Internationalization (I18N)
  • Unicode
  • Enum
  • Pipfile
  • GeoJSON
  • Mimesis Cloud
  • Sanic
  • GraphQL
  • Impostor Syndrome
  • Imposter Syndrome Disclaimer: Add this to all of your projects!
  • Jacob Kaplan-Moss PyCon Keynote

The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA

The Python Podcast.__init__ by Tobias Macey
