@leriomaggio
Last active September 4, 2019 13:53
Kubeflow Kale: from Jupyter Notebook to Complex Pipelines

Abstract

In this talk I will present a new solution to automatically scale Jupyter notebooks to complex, reproducible pipelines based on Kubernetes and Kubeflow.

Description

Nowadays, most High Performance Computing (HPC) tasks are carried out in the Cloud, and this holds as much in industry as in research.

The main advantages of adopting Cloud services include (a) constantly up-to-date hardware resources; (b) automated infrastructure setup; (c) simplified resource management. Accordingly, new solutions have recently been released to the community (e.g. Kubernetes by Google), providing custom integrations that specifically support the migration of existing Machine/Deep Learning pipelines to the Cloud.

However, a shift towards a complete Cloud-based computational paradigm imposes new challenges in terms of data and model reproducibility, privacy, accountability, and (efficient) resource configuration and monitoring. Moreover, adopting these technologies still imposes an additional workload that requires significant software and system engineering expertise (e.g. setup of containerised environments, storage volumes, cluster nodes).

In this talk, I will present Kale (/ˈkeɪliː/), a new Python solution to ease and support ML workloads for HPC in the Cloud.

Kale leverages the combination of Jupyter notebooks and Kubernetes/Kubeflow Pipelines (KFP) as core components in order to:

  • (R1) automate the setup and deployment procedures by automatically creating (distributed) computation environments in the Cloud;

  • (R2) democratise the execution of machine learning models at scale through instrumented and reusable environments;

  • (R3) provide a simple interface (UI and SDK) to enable researchers to deploy ML models without requiring extensive engineering expertise.
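
To make the notebook-to-pipeline idea concrete, here is a minimal sketch in plain Python. It is not Kale's API and does not call Kubernetes or KFP: the `CELLS` structure, the step names, and the helpers `execution_order` and `run_pipeline` are all hypothetical, introduced only for illustration. It assumes each notebook cell is annotated with a step name and the steps it depends on, and shows how such annotations could be resolved into an ordered pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical annotated cells: step name -> (dependencies, cell source).
# Neither the names nor this structure are taken from Kale itself.
CELLS = {
    "load": ((), "data = [1, 2, 3, 4]"),
    "preprocess": (("load",), "scaled = [x / max(data) for x in data]"),
    "train": (("preprocess",), "model = sum(scaled) / len(scaled)"),
    "report": (
        ("train", "load"),
        "summary = f'{len(data)} samples, mean={model:.3f}'",
    ),
}


def execution_order(cells):
    """Return the step names in an order that respects the dependencies."""
    graph = {name: set(deps) for name, (deps, _) in cells.items()}
    return list(TopologicalSorter(graph).static_order())


def run_pipeline(cells):
    """Run each step's source in one shared namespace, in dependency order.

    Assumption: in a real deployment each step would run in its own
    container and exchange data through storage volumes; the shared
    dictionary here only stands in for that mechanism.
    """
    namespace = {}
    for name in execution_order(cells):
        exec(cells[name][1], namespace)
    return namespace


if __name__ == "__main__":
    print(execution_order(CELLS))
    print(run_pipeline(CELLS)["summary"])
```

The sketch runs everything in a single process for brevity. The point is only that a dependency graph derived from the notebook is enough to decide what runs when, which is the part a tool can automate on the researcher's behalf.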

Technical features of Kale, as well as open challenges and future developments, will be presented, along with working examples that integrate Kale with complete ML/DL workflows for pipeline reproducibility.

Domains:

  • Jupyter
  • Machine Learning
  • DevOps
  • Parallel Computing/HPC

GitHub:

https://github.com/orgs/kubeflow-kale
