DataTalks.Club

Build Your Own Data Pipeline - Andreas Kretz


We talked about:

  • Andreas’s background
  • Why data engineering is becoming more popular
  • Who to hire first – a data engineer or a data scientist?
  • How can I, as a data scientist, learn to build pipelines?
  • Don’t use too many tools
  • What is a data pipeline and why do we need it?
  • What is ingestion?
  • Can just one person build a data pipeline?
  • Approaches to building data pipelines for data scientists
  • Processing frameworks
  • Common setup for data pipelines — car price prediction
  • Productionizing the model with the help of a data pipeline
  • Scheduling
  • Orchestration
  • Start simple
  • Learning DevOps to implement data pipelines
  • How to choose the right tool
  • Are Hadoop, Docker, Cloud necessary for a first job/internship?
  • Is Hadoop still relevant or necessary?
  • Data engineering academy
  • How to pick up Cloud skills
  • Avoid huge datasets when learning
  • Convincing your employer to do data science
  • How to find Andreas

  • Links:

    • LinkedIn: https://www.linkedin.com/in/andreas-kretz
    • Data engineering cookbook: https://cookbook.learndataengineering.com/
    • Course: https://learndataengineering.com/

    • Join DataTalks.Club: https://datatalks.club/slack.html

      Our events: https://datatalks.club/events.html


