The Python Podcast.__init__

Synthetic Data Generation Using Mimesis with Nikita Sobolev


Summary

Most applications require data to operate on in order to function, but sometimes that data is hard to come by, so why not just make it up? Mimesis is a library for randomly generating data of different types, such as names, addresses, and credit card numbers, so that you can use it for testing, anonymizing real data, or for placeholders. This week Nikita Sobolev discusses how the project got started, the challenges that it has posed, and how you can use it in your applications.
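As a rough illustration of the idea in the summary, here is a minimal standard-library sketch of seeded synthetic data generation. It is not Mimesis's API: the sample pools, `fake_record`, `luhn_check_digit`, and `luhn_valid` are hypothetical names invented for this example, and a real library ships far larger, locale-aware datasets.

```python
import random

# Hypothetical sample pools, made up for this sketch.
FIRST_NAMES = ["Alice", "Boris", "Chen", "Dana"]
LAST_NAMES = ["Ivanov", "Smith", "Garcia", "Tanaka"]
STREETS = ["Main St", "Oak Ave", "Lake Rd"]


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a complete number against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_record(rng: random.Random) -> dict:
    """Build one placeholder record with a name, an address, and a card number."""
    # 15 random digits (with an arbitrary leading 4) plus a computed check digit.
    partial = "4" + "".join(str(rng.randint(0, 9)) for _ in range(14))
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "address": f"{rng.randint(1, 999)} {rng.choice(STREETS)}",
        "card": partial + luhn_check_digit(partial),
    }


if __name__ == "__main__":
    # A fixed seed makes the generated records reproducible across test runs.
    print(fake_record(random.Random(42)))
```

Passing an explicit `random.Random` instance rather than using the module-level functions keeps the output repeatable for tests while still allowing unseeded generation elsewhere.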

Preface
  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to scale up. Go to podcastinit.com/linode to get a $20 credit and launch a new server in under a minute.
  • To get worry-free releases, download GoCD, the open source continuous delivery server built by ThoughtWorks. You can use their pipeline modeling and value stream map to build, control, and monitor every step from commit to deployment in one place. Go to podcastinit.com/gocd to learn more about their professional support services and enterprise add-ons.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email [email protected]
  • Your host as usual is Tobias Macey, and today I’m interviewing Nikita Sobolev about Mimesis, a library for quickly generating synthetic data.

Interview
  • Introductions
  • How did you get introduced to Python?
  • What is Mimesis and how does it compare to other projects such as faker and factory_boy?
    • What was the motivation for creating it?
  • One of the features that is advertised is the speed of Mimesis. What techniques are used to ensure that the data is generated quickly?
  • What are the built-in mechanisms for generating data?
    • What options do users have for customizing the types of data that can get generated?
  • What are some of the most complicated providers to write and maintain?
  • What are some of the use cases outside of unit or integration tests where Mimesis could be beneficial?
    • How would you use Mimesis to anonymize data from a production environment to be used for testing?
  • What are the most challenging aspects of maintaining the Mimesis project?
  • What are some of the plans that you have for the future of Mimesis?

Keep In Touch
  • sobolevn on GitHub
  • @sobolevn on Twitter
  • Email

Picks
  • Tobias
    • Coco
  • Nikita
    • I Am A Mediocre Developer

Links
  • Mimesis
  • Django
  • Faker
  • Factory Boy
  • Internationalization (I18N)
  • Unicode
  • Enum
  • Pipfile
  • GeoJSON
  • Mimesis Cloud
  • Sanic
  • GraphQL
  • Impostor Syndrome
  • Imposter Syndrome Disclaimer: Add this to all of your projects!
  • Jacob Kaplan-Moss PyCon Keynote

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
