Data Engineering Podcast

Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov



Summary
One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to run application-oriented transactional workloads and high-volume analytical workloads concurrently on the same hardware. In this episode the CEO of MemSQL describes how the company and database got started, how it is architected for scale and speed, and how it is being used in production. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise-grade data management.
Preamble
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
  • And the team at Metis Machine has shipped a proof-of-concept integration between the Skafos machine learning platform and the Tableau business intelligence tool, meaning that your BI team can now run the machine learning models custom built by your data science team. If you think that sounds awesome (and it is) then join the free webinar with Metis Machine on October 11th at 2 PM ET (11 AM PT). Metis Machine will walk through the architecture of the extension, demonstrate its capabilities in real time, and illustrate the use case for empowering your BI team to modify and run machine learning models directly from Tableau. Go to metismachine.com/webinars now to register.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Nikita Shamgunov about MemSQL, a NewSQL database built for simultaneous transactional and analytical workloads
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing what MemSQL is and how the product and business first got started?
  • What are the typical use cases for customers running MemSQL?
  • What are the benefits of integrating the ingestion pipeline with the database engine? 
    • What are some typical ways that the ingest capability is leveraged by customers?
  • How is MemSQL architected and how has the internal design evolved from when you first started working on it?
    • Where does it fall on the axes of the CAP theorem?
    • How much processing overhead is involved in the conversion from the column oriented data stored on disk to the row oriented data stored in memory?
    • Can you describe the lifecycle of a write transaction?
  • Can you discuss the techniques that are used in MemSQL to optimize for speed and overall system performance?
    • How do you mitigate the impact of network latency throughout the cluster during query planning and execution?
  • How much of the implementation of MemSQL is using custom built code vs. open source projects?
  • What are some of the common difficulties that your customers encounter when building on top of or migrating to MemSQL?
  • What have been some of the most challenging aspects of building and growing the technical and business implementation of MemSQL?
  • When is MemSQL the wrong choice for a data platform?
  • What do you have planned for the future of MemSQL?

Contact Info
  • @nikitashamgunov on Twitter
  • LinkedIn
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
  • MemSQL
  • NewSQL
  • Microsoft SQL Server
  • St. Petersburg University of Fine Mechanics And Optics
  • C
  • C++
  • In-Memory Database
  • RAM (Random Access Memory)
  • Flash Storage
  • Oracle DB
  • PostgreSQL
    • Podcast Episode
  • Kafka
  • Kinesis
  • Wealth Management
  • Data Warehouse
  • ODBC
  • S3
  • HDFS
  • Avro
  • Parquet
  • Data Serialization Podcast Episode
  • Broadcast Join
  • Shuffle Join
  • CAP Theorem
  • Apache Arrow
  • LZ4
  • S2 Geospatial Library
  • Sybase
  • SAP Hana
  • Kubernetes

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA