Data on Kubernetes Community

DoK Talks #111 - Scheduled Scaling with Dask and Argo Workflows



https://go.dok.community/slack

https://dok.community/

ABSTRACT OF THE TALK

Complex computational workloads in Python are a common sight these days, especially when processing large and complex datasets. Battle-hardened modules such as NumPy, pandas, and scikit-learn can perform the low-level tasks, while tools like Dask make it easy to parallelize these workloads across distributed computational environments. Meanwhile, Argo Workflows offers a Kubernetes-native solution for provisioning cloud resources and triggering workflows on a regular schedule. Being Kubernetes-native, Argo Workflows also meshes nicely with other Kubernetes tools. This talk discusses the combination of these two worlds by showcasing a setup for Argo-managed workflows that schedule and automatically scale out Dask-powered data pipelines in Python.

BIO

Former academic in the field of renewable energy simulation and energy systems analysis. Currently responsible for architecting and maintaining the cloud and data strategy at ACCURE Battery Intelligence.

KEY TAKE-AWAYS FROM THE TALK

Argo Workflows + Dask is a nice combination for data-processing pipelines. There are a few "gotchas" to look out for, but it is nevertheless a generally applicable and powerful combination.

https://github.com/sevberg
