Data Engineering Podcast

Discover And De-Clutter Your Unstructured Data With Aparavi



Summary

Unstructured data takes many forms in an organization. From a data engineering perspective, that often means things like JSON files, audio or video recordings, and images. Another category of unstructured data that every business deals with is PDFs, Word documents, workstation backups, and countless other types of information. Aparavi was created to tame the sprawl of information across machines, datacenters, and clouds so that you can reduce the amount of duplicate data and save time and money on managing your data assets. In this episode Rod Christensen shares the story behind Aparavi and how you can use it to cut costs and gain value from the long tail of your unstructured data.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
  • This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl
  • RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
  • Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
  • Your host is Tobias Macey and today I’m interviewing Rod Christensen about Aparavi, a platform designed to find and unlock the value of data, no matter where it lives

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Aparavi is and the story behind it?
  • Who are the target customers for Aparavi and how does that inform your product roadmap and messaging?
  • What are some of the insights that you are able to provide about an organization’s data?
    • Once you have generated those insights, what are some of the actions that they typically catalyze?
  • What are the types of storage and data systems that you integrate with?
  • Can you describe how the Aparavi platform is implemented?
    • How do the trends in cloud storage and data systems influence the ways that you evolve the system?
  • Can you describe a typical workflow for an organization using Aparavi?
  • What are the mechanisms that you use for categorizing data assets?
    • What are the interfaces that you provide for data owners and operators to provide heuristics to customize classification/cataloging of data?
  • How can teams integrate with Aparavi to expose its insights to other tools for uses such as automation or data catalogs?
  • What are the most interesting, innovative, or unexpected ways that you have seen Aparavi used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Aparavi?
  • When is Aparavi the wrong choice?
  • What do you have planned for the future of Aparavi?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
  • To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links
  • Aparavi
  • SHA-512

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data Engineering Podcast, by Tobias Macey
