@smurching
smurching / Train.py
Last active July 18, 2022 18:09
Databricks model training and registration with MLflow pipelines
# Databricks notebook source
##################################################################################
# Model Training Notebook
##
# This notebook runs the MLflow Regression Pipeline to train and register an MLflow model in the model registry.
#
# It's run as part of CI (to integration-test model training logic) and by an automated model training job
# defined under ``databricks-config``
#
# NOTE: In general, we recommend that you do not modify this notebook directly, and instead update data-loading
@smurching
smurching / README.md
Last active June 14, 2022 01:36
Databricks run and await job: shell scripts

This gist contains some example bash scripts for triggering and awaiting a one-time job run using existing Databricks CLI APIs.

Rough edges include:

  1. Parameter substitution into job JSON (need to implement this ourselves)
  2. Writing logic to trigger and await job status
  3. Updatability of shell script logic. Customers that rely on this script would need to update it themselves, whereas updates (e.g. a new default job polling interval) could easily be pushed to an existing databricks runs submit CLI command with a --wait option. However, since the script uses the Databricks CLI for all API requests, any security/auth patches can be picked up by updating the version of the CLI used in the script.

#2 can be addressed through a --wait option to databricks runs submit. #1 requires implementing parameter substitution and so may be more work, but it is also less complex: there is no branching logic to test, only that parameters are passed through properly.
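
The first two rough edges (parameter substitution and trigger-and-await logic) can be sketched in Python. This is a minimal illustration, not the gist's actual shell scripts: `render_job_json`, `await_run`, and the `get_state` callable are hypothetical names, and the set of terminal states is an assumption about the run life cycle states reported by the Jobs API. In practice `get_state` would wrap a call to the Databricks CLI or REST API.

```python
import json
import time
from string import Template

# Assumed terminal life cycle states for a job run (an assumption, see lead-in).
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def render_job_json(template_text, params):
    """Parse a job JSON template and fill $name placeholders in its string values.

    Substituting after parsing avoids JSON-escaping problems in parameter values.
    A placeholder with no matching parameter raises KeyError.
    """
    def fill(node):
        if isinstance(node, str):
            return Template(node).substitute(params)
        if isinstance(node, list):
            return [fill(item) for item in node]
        if isinstance(node, dict):
            return {key: fill(value) for key, value in node.items()}
        return node

    return fill(json.loads(template_text))


def await_run(get_state, poll_interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or the timeout elapses.

    get_state is a hypothetical zero-argument callable returning the run's
    current life cycle state as a string.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"run still in state {state!r} after {waited:.0f}s")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing `sleep` as a parameter keeps the polling loop testable without real delays, which matters given that the await logic is the part with branching to test.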

@smurching
smurching / create-react-app.sh
Created February 25, 2022 20:21
Create react app getting started
~ ❯ npx create-react-app my-app
Need to install the following packages:
create-react-app
Ok to proceed? (y) y
npm WARN deprecated [email protected]: This version of tar is no longer supported, and will not receive security updates. Please upgrade asap.
Creating a new React app in /Users/sid.murching/my-app.
Installing packages. This might take a couple of minutes.
@smurching
smurching / parent-and-child-runs.py
Last active February 29, 2024 13:30
creating-child-runs-in-mlflow
import mlflow
# There are two ways to create parent/child runs in MLflow.
# (1) The most common way is to use the fluent
# mlflow.start_run API, passing nested=True:
with mlflow.start_run():
    num_trials = 10
    mlflow.log_param("num_trials", num_trials)
    best_loss = 1e100
@smurching
smurching / databricks_run_context_provider.py
Last active May 7, 2020 21:02
OSS MLflow post-run-creation hook
from mlflow.tracking.context.abstract_context import RunContextProvider
from mlflow.utils import databricks_utils
from mlflow.entities import SourceType
from mlflow.utils.mlflow_tags import (
    MLFLOW_SOURCE_TYPE,
    MLFLOW_SOURCE_NAME,
    MLFLOW_DATABRICKS_WEBAPP_URL,
    MLFLOW_DATABRICKS_NOTEBOOK_PATH,
    MLFLOW_DATABRICKS_NOTEBOOK_ID,
)
@smurching
smurching / project-backend.py
Last active April 10, 2020 22:01
MLflow project backend execution flows
### Two different flows
"""
Flow 1: ensure there's a run accessible to the local machine prior to running
the project.
Pros:
1. The user is guaranteed access to the run created for the project from the machine that triggered
project execution.
@smurching
smurching / 0_api.py
Last active June 1, 2020 05:50
Model deployment API example
"""
Core deployment plugin API methods.
These methods are declared under ``mlflow.deployments``, with an analogous ``mlflow deployments`` CLI.
"""
from abc import ABC
class BaseDeploymentClient(ABC):
    """
    Base class exposing Python model deployment APIs. Plugin implementors should define target-specific
@smurching
smurching / 0_description.md
Last active April 4, 2019 18:18
Migrating the MLflow Python RunData interface to expose metrics, params, tags as dicts

Background

With the next MLflow release, we'd like to update the Python RunData interface to better support querying metrics, params, and tags. In particular, the current RunData interface, which exposes flat lists of Metric, Param, and RunTag instances, falls short in that it:

  1. requires a complicated O(n) scan to find a metric/param/tag by key, and
  2. forces users to then make an additional field-access within a Metric/Param/RunTag instance to get at the actual metric/param/tag value.
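
To make these two shortcomings concrete, here is a minimal sketch contrasting list-based and dict-based access. `Metric` below is a hypothetical stand-in with only a key and a value, not MLflow's actual entity class, and `find_metric_value` is an illustrative helper rather than part of any API.

```python
from collections import namedtuple

# Hypothetical stand-in for an entity that carries a key and a value.
Metric = namedtuple("Metric", ["key", "value"])

metrics_list = [Metric("rmse", 0.42), Metric("mae", 0.31)]


def find_metric_value(metrics, key):
    """List-based access: an O(n) scan, then a field access to reach the value."""
    for metric in metrics:
        if metric.key == key:
            return metric.value
    raise KeyError(key)


# Dict-based access: a single constant-time lookup that yields the value directly.
metrics_dict = {metric.key: metric.value for metric in metrics_list}
```

With the list, every caller needs a helper like `find_metric_value`; with the dict, `metrics_dict["mae"]` is enough.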

Design Decisions & Proposed API

We can address point 1 by migrating the metrics, params, tags fields of

# Copyright 2018 Uber Technologies, Inc. All Rights Reserved.
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software