The Real Python Podcast

What Is Data Engineering and Researching 10 Million Jupyter Notebooks



Are you familiar with the role data engineers play in the modern landscape of data science and Python? Data engineering is a sub-discipline that focuses on the transportation, transformation, and storage of data. This week on the show, David Amos is back, and he’s brought another batch of PyCoder’s Weekly articles and projects.

Along with the Real Python article on data engineering, we talk about a project where researchers downloaded 10 million Jupyter notebooks from GitHub to gather insights about the current state of data science technology.

We also discuss an article about validating data in Python with the Cerberus package, which led to a conversation about the coding challenges from Advent of Code.

We also cover several other articles and projects from the Python community, including a reflection on building a chess engine from scratch, a visual guide to NumPy, a free and open-source alternative to SAP, a library for working with STL files and 3D objects, and a look at whether Python is really a bottleneck.

Course Spotlight: Building With Django REST Framework

This course will get you ready to build with Django REST Framework. Django REST Framework (DRF) is a toolkit built on top of the Django web framework that reduces the amount of code you need to write to create REST interfaces.

Topics:

  • 00:00:00 – Introduction
  • 00:01:51 – What Is Data Engineering and Is It Right for You?
  • 00:12:07 – Building My Own Chess Engine
  • 00:17:52 – We Downloaded 10,000,000 Jupyter Notebooks From Github: This Is What We Learned
  • 00:28:12 – Video Course Spotlight
  • 00:29:20 – Is Python Really a Bottleneck?
  • 00:34:01 – Validating Data in Python With Cerberus
  • 00:39:04 – NumPy Illustrated: The Visual Guide to NumPy
  • 00:42:54 – erpnext: Free and Open Source Alternative to SAP
  • 00:48:49 – numpy-stl: Library for Working With STL Files and 3D Objects
  • 00:54:54 – Thanks and goodbye
Show Links:

    What Is Data Engineering and Is It Right for You? — In this article, you’ll get an overview of the discipline of data engineering. You’ll learn what is and isn’t part of a data engineer’s job, who data engineers work with, and why data engineers play a crucial role in many industries.

    Building My Own Chess Engine — Writing your own chess engine is a great way to explore computational complexity and combinatorial aspects of programming. Not to mention it’s pretty fun! Follow along with this reflection on how one coder created his own chess engine from scratch.

    We Downloaded 10,000,000 Jupyter Notebooks From Github: This Is What We Learned — The JetBrains Datalore team downloaded ten million Jupyter Notebooks and analyzed them to determine things like which languages were the most popular, what kinds of content are in notebook cells, and how consistently notebooks can be reproduced. It’s a fascinating look into trends in data science technology!

    Is Python Really a Bottleneck? — Python is slow. From one perspective, that is. But what are the true bottlenecks in the data engineering/data processing space, and how does Python compare to other technologies when those factors are considered?

    Validating Data in Python With Cerberus — Thanks to an Advent of Code challenge, author Hector Castro discovered the Cerberus Python package for data validation. Get a brief introduction to Cerberus and see Hector’s solution to an Advent of Code challenge in this short but informative read.

    NumPy Illustrated: The Visual Guide to NumPy — This illustrated guide to NumPy is a great way to learn NumPy or brush up on the package. Full of great visual aids, this tutorial covers all the basics and more!
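Cerberus validates a document (a plain dict) against a schema of per-field rules. As a rough illustration of that idea for readers who haven't seen schema-based validation before, here is a hypothetical, standard-library-only sketch. It is not the Cerberus API and not Hector's solution: the `validate` function, its error messages, and the `TYPES` table are invented for this example, and only the rule names (`type`, `required`, `min`, `max`, `allowed`, `regex`) are chosen to resemble Cerberus's.

```python
import re
from typing import Any

# Hypothetical sketch: rule names loosely resemble Cerberus schemas,
# but this function is NOT the Cerberus API.
TYPES = {"string": str, "integer": int}


def validate(
    document: dict[str, Any], schema: dict[str, dict[str, Any]]
) -> dict[str, list[str]]:
    """Check `document` against `schema`; return {field: [messages]}, empty if valid."""
    errors: dict[str, list[str]] = {}

    def fail(field: str, message: str) -> None:
        errors.setdefault(field, []).append(message)

    for field, rules in schema.items():
        if field not in document:
            if rules.get("required", False):
                fail(field, "required field")
            continue
        value = document[field]
        expected = rules.get("type")
        if expected is not None:
            # bool is a subclass of int, so exclude it from "integer" explicitly.
            wrong_type = not isinstance(value, TYPES[expected]) or (
                expected == "integer" and isinstance(value, bool)
            )
            if wrong_type:
                fail(field, f"must be of {expected} type")
                continue  # the remaining rules assume the declared type
        if "min" in rules and value < rules["min"]:
            fail(field, f"min value is {rules['min']}")
        if "max" in rules and value > rules["max"]:
            fail(field, f"max value is {rules['max']}")
        if "allowed" in rules and value not in rules["allowed"]:
            fail(field, f"unallowed value {value!r}")
        pattern = rules.get("regex")
        if pattern and isinstance(value, str) and not re.fullmatch(pattern, value):
            fail(field, f"value does not match {pattern!r}")

    # This sketch reports any field the schema does not mention.
    for field in sorted(document.keys() - schema.keys()):
        fail(field, "unknown field")
    return errors


schema = {
    "name": {"type": "string", "required": True, "regex": r"[A-Z][a-z]+"},
    "age": {"type": "integer", "min": 0, "max": 150},
    "role": {"type": "string", "allowed": ["admin", "user"]},
}

print(validate({"name": "Ada", "age": 36, "role": "admin"}, schema))  # {}
print(validate({"age": -1, "role": "root"}, schema))
```

Collecting every failure into a field-keyed dict, instead of raising on the first problem, lets a caller report all issues in a record at once. Cerberus takes a similar approach and exposes its findings through `Validator.errors`.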

    Projects:

    • erpnext: Free and Open Source Alternative to SAP
    • numpy-stl: Library for Working With STL Files and 3D Objects
    Additional Links:

      • Range: Why Generalists Triumph in a Specialized World, by David Epstein
      • Shannon number: Wikipedia article
      • Apple’s open source chess engine minimum response times: Twitter thread
      • Advent of Code
      • cerberus: Lightweight and Extensible Data Validation Library for Python
      • Cerberus - Greek Mythology: Wikipedia article
      • A Visual Intro to NumPy and Data Representation
      • Generating STL Models With Python
    Level up your Python skills with our expert-led courses:

      • Getting Started With Django: Building a Portfolio App
      • Building HTTP APIs With Django REST Framework
      • Histogram Plotting in Python: NumPy, Matplotlib, Pandas & Seaborn

    Support the podcast & join our community of Pythonistas
