@Ben-Epstein
Created July 17, 2020 17:32
{"cells":[{"metadata":{},"cell_type":"markdown","source":"# Splice + MLflow: What you need to know\n## Just like the last notebook, maybe an image here of the Splice Logo with a <3 and the MLflow Logo\n<blockquote><p class='quotation'><span style='font-size:15px'>MLflow allows you to track experiments and share results with teammates easily.<br>At Splice Machine, MLflow is embedded directly into your database (MLManager). This means that all of the configuration is taken care of for you, and <b>everything</b> you track in MLflow is persisted to the database.<br><br>\n MLflow requires the NSDS (or ExtNSDS) as a parameter to connect to the database. If you are unfamiliar with our NSDS, check out the <a href=\"./7.1 Splice and Spark.ipynb\">previous notebook</a> on using Splice Machine and Spark.<footer>Splice Machine</footer>\n</blockquote>\n\n#### Let's start our Spark Session"},{"metadata":{"trusted":true},"cell_type":"code","source":"# Setup\nfrom pyspark.sql import SparkSession\nfrom splicemachine.spark import PySpliceContext\n\nspark = SparkSession.builder.getOrCreate()\nsplice = PySpliceContext(spark)\n","execution_count":1,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Importing MLflow Support\n<blockquote><p class='quotation'><span style='font-size:15px'>Using MLflow on Splice is as easy as a single import. After importing, you immediately have access to the <code>mlflow</code> module. <br>You will have access to all of the functions in the standard MLflow API as well as some extra ones that are custom to Splice Machine.<br> You can check out our full <a href='https://pysplice.readthedocs.io/en/dbaas-4100/splicemachine.mlflow_support.html'>documentation</a> for everything available and our <a href=\"https://www.github.com/splicemachine/pysplice\">GitHub</a> repo to raise issues and ask questions. 
<br>After importing, you can register your Splice Context for access to even more functions.<br><br><footer>Splice Machine</footer>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"# MLFlow Setup\nfrom splicemachine.mlflow_support import *\nmlflow.register_splice_context(splice)","execution_count":3,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Step 0: The MLflow UI\n<blockquote> You can access the MLflow UI in 2 ways:\n <ul>\n <li>From the URL at <a href=/mlflow>/mlflow</a></li>\n <li>From the Notebook as an IFrame using the <code>get_mlflow_ui</code> function. You can also pass in an optional experiment ID and/or run ID to open the IFrame directly to your experiment/run.</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"from splicemachine.notebook import get_mlflow_ui\nget_mlflow_ui()","execution_count":4,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/0>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d86fa0c50>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/0\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"## MLflow concepts\n<blockquote>MLflow Tracking is organized around the concepts of <code>experiments</code> and <code>runs</code>:<br> \n <ul>\n <li>Experiments can be thought of as the problem you are trying to track or solve (e.g., performance testing TPC-C)</li>\n <li>Runs are single executions of some piece of code (e.g., a single full execution of TPC-C with some database configuration). 
Experiments have multiple runs (1-to-many).</li>\n </ul>\n</blockquote>"},{"metadata":{},"cell_type":"markdown","source":"### Setting an Experiment\n<blockquote>To start an Experiment, you can call <code>mlflow.set_experiment('EXP_NAME')</code> and pass in an experiment name.<br> \n If the Experiment exists, it will be set to the <code>active</code> experiment. Otherwise, mlflow will create the Experiment for you and set it to active.\n\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.set_experiment)","execution_count":5,"outputs":[{"output_type":"stream","text":"Help on function set_experiment in module mlflow.tracking.fluent:\n\nset_experiment(experiment_name)\n Set given experiment as active experiment. If experiment does not exist, create an experiment\n with provided name.\n \n :param experiment_name: Name of experiment to be activated.\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.set_experiment('mlflow_api_demo')","execution_count":6,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"#### View Your [Experiment](/mlflow)"},{"metadata":{"trusted":true},"cell_type":"code","source":"exp_id = mlflow.client.get_experiment_by_name('mlflow_api_demo').experiment_id\nget_mlflow_ui(exp_id)","execution_count":7,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":7,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6dee735b10>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### Starting a run\n<blockquote>Once you have an Experiment, you can start your run by calling 
<code>mlflow.start_run(run_name='RUN_NAME')</code> and passing in a run name. You can also pass in the optional <code>tags</code> parameter as a dictionary to store key-value pairs associated with the run.<br> \nWhen you start a run, MLflow (MLManager) automatically logs some information for you:\n <ul>\n <li>Start Date</li>\n <li>Current User</li>\n <li>Run ID</li>\n <li>DB Transaction ID</li>\n <li>Source (where the run came from)</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.start_run)","execution_count":8,"outputs":[{"output_type":"stream","text":"Help on function _start_run in module splicemachine.mlflow_support.mlflow_support:\n\n_start_run(run_id=None, tags=None, experiment_id=None, run_name=None, nested=False)\n Start a new run\n :param tags: a dictionary containing metadata about the current run.\n For example:\n {\n 'team': 'pd',\n 'purpose': 'r&d'\n }\n :param run_name: an optional name for the run to show up in the MLFlow UI\n :param run_id: if you want to reincarnate an existing run, pass in the run id\n :param experiment_id: if you would like to create an experiment/use one for this run\n :param nested: Controls whether run is nested in parent run. True creates a nest run\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.start_run(run_name='First_pass_default_settings', tags={'team': 'pd', 'purpose':'performance testing'})","execution_count":16,"outputs":[{"output_type":"execute_result","execution_count":16,"data":{"text/plain":"<ActiveRun: >"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### Tracking Concepts\n<blockquote>There are 4 main concepts when tracking a run:<br>\n <ul>\n <li><b>Tags</b>: Any key-value pair that likely won't be used for comparison between runs (non-measurable items). 
Only tags can be overwritten.</li>\n <li><b>Parameters</b>: Configuration options that were set before starting the run and that may have a measurable effect on the outcome</li>\n <li><b>Metrics</b>: The measured outcomes that can be compared between runs. Metrics have an optional <code>step</code> parameter if you want to track metrics over time for a specific run</li>\n <li><b>Artifacts</b>: Objects (files, images, notebooks, etc.) to be associated with a run</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.set_tag)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.lp)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.lm)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.log_artifact)","execution_count":17,"outputs":[{"output_type":"stream","text":"Help on function set_tag in module mlflow.tracking.fluent:\n\nset_tag(key, value)\n Set a tag under the current run. 
If no run is active, this method will create a\n new active run.\n \n :param key: Tag name (string)\n :param value: Tag value (string, but will be string-ified if not)\n\n---------------------------------------------------------------------------------\nHelp on function _lp in module splicemachine.mlflow_support.mlflow_support:\n\n_lp(key, value)\n Add a shortcut for logging parameters in MLFlow.\n Accessible from mlflow.lp\n :param key: key for the parameter\n :param value: value for the parameter\n\n---------------------------------------------------------------------------------\nHelp on function _lm in module splicemachine.mlflow_support.mlflow_support:\n\n_lm(key, value, step=None)\n Add a shortcut for logging metrics in MLFlow.\n Accessible from mlflow.lm\n :param key: key for the parameter\n :param value: value for the parameter\n\n---------------------------------------------------------------------------------\nHelp on function _log_artifact in module splicemachine.mlflow_support.mlflow_support:\n\n_log_artifact(file_name, name, run_uuid=None)\n Log an artifact for the active run\n :param file_name: (str) the name of the file name to log\n :param name: (str) the name of the run relative name to store the model under\n :param run_uuid: the run uuid of a previous run, if none, defaults to current run\n NOTE: We do not currently support logging directories. 
If you would like to log a directory, please zip it first\n and log the zip file\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.set_tag('teammates', 'carol, daniel')","execution_count":18,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.lp('spark executors', '5')","execution_count":19,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.lm('execution time sec', 25)","execution_count":20,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Setting metrics over \"steps\"\nfor i in range(10):\n mlflow.lm('Build time', i*3, step=i)","execution_count":21,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"get_mlflow_ui(mlflow.current_exp_id(), mlflow.current_run_id())","execution_count":22,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/8b0159a70a5c>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":22,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d8638c2d0>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/8b0159a70a5c\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### End Run\n<blockquote>When you finish a run, you call <code>mlflow.end_run()</code>.<br> You know a run is ended in the MLFlow UI because there is a green check mark next to it</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.end_run()","execution_count":23,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Artifacts\n<blockquote>Artifacts can be any file type. The artifact is serialized as a BLOB and stored in the database. 
When storing artifacts in the database, files with file extensions such as <code>.txt</code>, <code>.pdf</code>, <code>.yaml</code>, <code>.jpeg</code> etc. will be available for preview in the MLflow UI. <br>We can use some neat Jupyter tricks like <code>writefile</code> to make artifacts even more useful.\n</blockquote>\n\n#### Write a yaml file"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%writefile my_env.yaml\n\nname: datatest \nchannels:\n- defaults\n- conda-forge\n- ericmjl\ndependencies:\n- python=3.6\n- colorama=0.3.9\n- jupyter=1.0.0\n- ipykernel=4.6.1\n- jupyterlab=0.25.2\n- pytest=3.1.3\n- pytest-cov=2.5.1\n- tinydb=3.3.1\n- pyyaml=3.12\n- pandas-summary=0.0.41\n- environment_kernels=1.1\n- missingno=0.3.7\n","execution_count":24,"outputs":[{"output_type":"stream","text":"Overwriting my_env.yaml\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Write a code snippet"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%writefile harm_mean.py\ndef harm_mean(nums, rnd=4):\n \"\"\"\n Calculates the harmonic mean of n numbers rounded to rnd decimal places\n :param nums: List of numbers\n :param rnd: Number of decimal places to round the result\n \"\"\"\n return round(len(nums)/sum([1/i for i in nums]),rnd)","execution_count":25,"outputs":[{"output_type":"stream","text":"Overwriting harm_mean.py\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"## Put it together\n#### Start a run, log our artifacts, view the results"},{"metadata":{"trusted":true},"cell_type":"code","source":"!jupyter nbconvert --to html '7.2 Splice MLflow Support.ipynb'\nwith mlflow.start_run(run_name='environment_requirements'):\n run_id = mlflow.current_run_id()\n exp_id = mlflow.current_exp_id()\n mlflow.log_artifact('my_env.yaml', name='my_env.yaml')\n mlflow.log_artifact('harm_mean.py', name='harm_mean.py')\n mlflow.log_artifact('7.2 Splice MLflow Support.ipynb', name='training_notebook.ipynb')\n 
mlflow.log_artifact('7.2 Splice MLflow Support.html', name='training_notebook.html')","execution_count":26,"outputs":[{"output_type":"stream","text":"[NbConvertApp] Converting notebook 7.2 Splice MLflow Support.ipynb to html\n[NbConvertApp] Writing 358977 bytes to 7.2 Splice MLflow Support.html\nSaving artifact of size: 0.349 KB to Splice Machine DB\nSaving artifact of size: 0.328 KB to Splice Machine DB\nSaving artifact of size: 53.573 KB to Splice Machine DB\nSaving artifact of size: 359.035 KB to Splice Machine DB\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Click on one of your artifacts to render the results!"},{"metadata":{"trusted":true,"scrolled":false},"cell_type":"code","source":"get_mlflow_ui(exp_id, run_id)","execution_count":27,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/26342e253ff0>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":27,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d86390290>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/26342e253ff0\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"#### Another Artifact Example"},{"metadata":{"trusted":true},"cell_type":"code","source":"import matplotlib.pyplot as plt\nfrom random import random\nwith mlflow.start_run(run_name='my_plot'):\n plt.rcParams.update({\n \"pgf.texsystem\": \"pdflatex\",\n \"pgf.preamble\": [\n r\"\\usepackage[utf8x]{inputenc}\",\n r\"\\usepackage[T1]{fontenc}\",\n r\"\\usepackage{cmbright}\",\n ]\n })\n\n plt.figure(figsize=(4.5, 2.5))\n plt.plot([random()*19 for _ in range(10)])\n plt.text(0.5, 3., \"serif\", family=\"serif\")\n plt.text(0.5, 2., \"monospace\", family=\"monospace\")\n plt.text(2.5, 2., \"sans-serif\", 
family=\"sans-serif\")\n plt.xlabel(r\"µ is not $\\mu$\")\n plt.tight_layout(.5)\n\n plt.savefig(\"pgf_texsystem.png\")\n mlflow.log_artifact('pgf_texsystem.png', 'results.png')\n display(get_mlflow_ui(mlflow.current_exp_id(), mlflow.current_run_id()))","execution_count":28,"outputs":[{"output_type":"stream","text":"/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py:57: DeprecationWarning: PILLOW_VERSION is deprecated and will be removed in a future release. Use __version__ instead.\n from PIL import PILLOW_VERSION\n","name":"stderr"},{"output_type":"stream","text":"Saving artifact of size: 11.711 KB to Splice Machine DB\n","name":"stdout"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/7c10c68d11b3>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d85bab550>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/7c10c68d11b3\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 324x180 with 1 Axes>","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"### Context Managers in Runs\n<blockquote>There are 2 Context Managers in MLManager/MLflow. <code>start_run</code> and <code>timer</code>.<br>\nContext managers enable some autologging and cleanup functions for you. 
To use a Context Manager, prefix the call with the <code>with</code> keyword, append a <code>:</code> after the call, and indent the lines that belong to the block.<br>\nAnother benefit: if the run fails for some reason, MLflow will track that for you.</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"with mlflow.start_run(run_name='run with context manager'):\n mlflow.lp('foo','bar')\n mlflow.lm('score', 92)","execution_count":29,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with mlflow.start_run(run_name='a run that failed'):\n raise Exception","execution_count":30,"outputs":[{"output_type":"error","ename":"Exception","evalue":"","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mException\u001b[0m Traceback (most recent call last)","\u001b[0;32m<ipython-input-30-583755d4e51d>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mmlflow\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstart_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'a run that failed'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mException\u001b[0m: "]}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from time import sleep\n# Multiple context managers\nwith mlflow.start_run(run_name='using the timer too'):\n with mlflow.timer('run time'):\n sleep(2)\n print('done!')","execution_count":31,"outputs":[{"output_type":"stream","text":"Starting Code Block run time... 
Done.\nCode Block run time:\nRan in 2.001 secs\nRan in 0.033 mins\ndone!\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Timers are stored as parameters by default, but can also be stored as metrics"},{"metadata":{"trusted":true},"cell_type":"code","source":"from time import sleep\n# Multiple context managers\nwith mlflow.start_run(run_name='using the timer as a metric'):\n with mlflow.timer('run time', param=False):\n sleep(2)\n print('done!')","execution_count":32,"outputs":[{"output_type":"stream","text":"Starting Code Block run time... Done.\nCode Block run time:\nRan in 2.002 secs\nRan in 0.033 mins\ndone!\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"### Nested Runs\n<blockquote>MLflow supports the concept of <code>nested</code> runs. A nested run is a run that occurs underneath a parent run. In machine learning, this could be used for hyperparameter tuning (like choosing K in a k-means clustering algorithm), but it can be used for anything you find useful.<br> To use it, simply pass <code>nested=True</code> to the <code>start_run</code> function.</blockquote>"},{"metadata":{"trusted":true,"scrolled":false},"cell_type":"code","source":"from random import randint, sample\nfrom time import sleep\nfrom tqdm.notebook import tqdm\nexec_time = [1,3,5,2]\nnum_execs = []\nwith mlflow.start_run(run_name='parent run'):\n for i in tqdm(range(4)):\n with mlflow.start_run(run_name=f'child {i+1}', nested=True):\n with mlflow.timer('run time', param=False):\n sleep(exec_time[i])\n mlflow.set_tag('child', 'yes')\n mlflow.lp('num_executors', i+1)\n num_execs.append(i+1)\n # Plot results\n plt.figure(figsize=(4.5, 2.5))\n plt.plot(num_execs, exec_time)\n\n plt.ylabel('exec time')\n plt.xlabel('num executors')\n plt.tight_layout(.5)\n plt.savefig(\"spark_results.png\")\n mlflow.log_artifact('spark_results.png','spark_results.png')\n 
display(get_mlflow_ui(mlflow.current_exp_id()))","execution_count":33,"outputs":[{"output_type":"display_data","data":{"text/plain":"HBox(children=(FloatProgress(value=0.0, max=4.0), HTML(value='')))","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b6847259d96749388c62fea2fd98f271"}},"metadata":{}},{"output_type":"stream","text":"Starting Code Block run time... Done.\nCode Block run time:\nRan in 1.001 secs\nRan in 0.017 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 3.002 secs\nRan in 0.05 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 5.004 secs\nRan in 0.083 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 2.002 secs\nRan in 0.033 mins\n\nSaving artifact of size: 8.37 KB to Splice Machine DB\n","name":"stdout"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d856c2c10>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 324x180 with 1 Axes>","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"## Storing ML Models\n<blockquote><p class='quotation'><span style='font-size:15px'>Just like everything else we've tracked so far, tracking ML Models is easy with Splice Machine's MLManager. The <code>log_model</code> and <code>load_model</code> functions are all you need. 
\n <footer>Splice Machine</footer> \n</blockquote>\n\n#### Let's try it out"},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn import svm\nfrom sklearn import datasets\nfrom sklearn.metrics import accuracy_score\n\n# Start a run\nwith mlflow.start_run(run_name='my first model'):\n # Load some sklearn data\n digits = datasets.load_digits()\n\n # Build a simple model\n clf = svm.SVC(gamma=0.001, C=100.)\n # Log parameters to mlflow\n mlflow.lp('gamma', 0.001)\n mlflow.lp('C', 100.0)\n\n # Train the model\n with mlflow.timer('train_time'):\n clf.fit(digits.data[:-1], digits.target[:-1])\n\n # Predict with some data\n preds = clf.predict(digits.data[:-1])\n\n # Measure accuracy\n acc = accuracy_score(digits.target[:-1], preds)\n print('Accuracy:',acc)\n # Log metric to mlflow\n mlflow.lm('accuracy', acc)\n \n # Save model\n mlflow.log_model(clf, 'clf_model')\n rid = mlflow.current_run_id()","execution_count":45,"outputs":[{"output_type":"stream","text":"Starting Code Block train_time... 
Done.\nCode Block train_time:\nRan in 0.156 secs\nRan in 0.003 mins\nAccuracy: 1.0\nSaving artifact of size: 473.58 KB to Splice Machine DB\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Load our model back and make new predictions"},{"metadata":{"trusted":true,"scrolled":true},"cell_type":"code","source":"loaded_model = mlflow.load_model(run_id=rid, name='clf_model')\ndisplay(loaded_model)\n# Make a new prediction\nnew_data = [ \n 0., 0., 12., 10., 0., 0., 0., 0., 0., 0., 14., 16., 16.,\n 14., 0., 0., 0., 0., 13., 16., 15., 10., 1., 0., 0., 0.,\n 11., 16., 16., 7., 0., 0., 0., 0., 0., 4., 7., 16., 7.,\n 0., 0., 0., 0., 0., 4., 16., 9., 0., 0., 0., 5., 4.,\n 12., 16., 4., 0., 0., 0., 9., 16., 16., 10., 0., 0.\n]\nprint('Prediction on new data:', loaded_model.predict([new_data])[0])","execution_count":56,"outputs":[{"output_type":"display_data","data":{"text/plain":"SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,\n decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',\n max_iter=-1, probability=False, random_state=None, shrinking=True,\n tol=0.001, verbose=False)"},"metadata":{}},{"output_type":"stream","text":"Prediction on new data: 5\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"markdown","source":"# Fantastic!\n<blockquote> \nNow you have all of the tools necessary to start accessing and manipulating your Big Data with Spark and Splice Machine. 
Again, feel free to check out our <a href=\"https://pysplice.readthedocs.io/en/dbaas-4100/splicemachine.mlflow_support.html\">documentation</a>!<br><br>\n Next Up: <a href='./7.3 Data Exploration.ipynb'>Using MLManager to explore and analyze your data</a>\n<footer>Splice Machine</footer>\n</blockquote>"}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.7.6","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":true,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{"height":"calc(100% - 180px)","width":"242px","left":"10px","top":"150px"},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":4}