Created
July 17, 2020 17:32
{"cells":[{"metadata":{},"cell_type":"markdown","source":"# Splice + MLflow: What you need to know\n## Just like the last notebook, maybe an image here of the Splice Logo with a <3 and the MLflow Logo\n<blockquote><p class='quotation'><span style='font-size:15px'>MLflow allows you to track experiments and share results with teammates easily.<br>At Splice Machine, MLflow is embedded directly into your database (MLManager). This means that all of the configuration is taken care of for you, and <b>everything</b> you track in MLflow is persisted to the database.<br><br>\n MLflow requires the NSDS (or ExtNSDS) as a parameter to connect to the database. If you are unfamiliar with our NSDS, check out the <a href=\"./7.1 Splice and Spark.ipynb\">previous notebook</a> on using Splice Machine and Spark.<footer>Splice Machine</footer>\n</blockquote>\n\n#### Let's start our Spark Session"},{"metadata":{"trusted":true},"cell_type":"code","source":"# Setup\nfrom pyspark.sql import SparkSession\nfrom splicemachine.spark import PySpliceContext\n\nspark = SparkSession.builder.getOrCreate()\nsplice = PySpliceContext(spark)\n","execution_count":1,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Importing MLflow Support\n<blockquote><p class='quotation'><span style='font-size:15px'>Using MLflow on Splice is as easy as a single import. After importing, you immediately have access to the <code>mlflow</code> module. <br>You will have access to all of the functions in the standard MLflow API as well as some extra ones that are custom to Splice Machine.<br> You can check out our full <a href='https://pysplice.readthedocs.io/en/dbaas-4100/splicemachine.mlflow_support.html'>documentation</a> for everything available and our <a href=\"https://www.github.com/splicemachine/pysplice\">GitHub</a> repo to raise issues and ask questions. 
<br>After importing, you can register your Splice Context for access to even more functions.<br><br><footer>Splice Machine</footer>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"# MLFlow Setup\nfrom splicemachine.mlflow_support import *\nmlflow.register_splice_context(splice)","execution_count":3,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Step 0: The MLflow UI\n<blockquote> You can access the MLflow UI in 2 ways:\n <ul>\n <li>From the URL at <a href=/mlflow>/mlflow</a></li>\n <li>From the Notebook as an IFrame using the <code>get_mlflow_ui</code> function. You can also pass in an optional experiment ID and/or run ID to open the IFrame directly to your experiment/run.</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"from splicemachine.notebook import get_mlflow_ui\nget_mlflow_ui()","execution_count":4,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/0>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":4,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d86fa0c50>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/0\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"## MLflow concepts\n<blockquote>MLflow Tracking is organized around the concepts of <code>experiments</code> and <code>runs</code>:<br> \n <ul>\n <li>Experiments can be thought of as the problem you are trying to track or solve (e.g., performance testing TPC-C)</li>\n <li>Runs are single executions of some piece of code (e.g., a single full execution of TPC-C with some database configuration). 
Experiments have multiple runs (1-to-many).</li>\n </ul>\n</blockquote>"},{"metadata":{},"cell_type":"markdown","source":"### Setting an Experiment\n<blockquote>To start an Experiment, you can call <code>mlflow.set_experiment('EXP_NAME')</code> and pass in an experiment name.<br> \n If the Experiment exists, it will be set to the <code>active</code> experiment. Otherwise, mlflow will create the Experiment for you and set it to active.\n\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.set_experiment)","execution_count":5,"outputs":[{"output_type":"stream","text":"Help on function set_experiment in module mlflow.tracking.fluent:\n\nset_experiment(experiment_name)\n Set given experiment as active experiment. If experiment does not exist, create an experiment\n with provided name.\n \n :param experiment_name: Name of experiment to be activated.\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.set_experiment('mlflow_api_demo')","execution_count":6,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"#### View Your [Experiment](/mlflow)"},{"metadata":{"trusted":true},"cell_type":"code","source":"exp_id = mlflow.client.get_experiment_by_name('mlflow_api_demo').experiment_id\nget_mlflow_ui(exp_id)","execution_count":7,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":7,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6dee735b10>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### Starting a run\n<blockquote>Once you have an Experiment, you can start your run by calling 
<code>mlflow.start_run(run_name='RUN_NAME')</code> and passing in a run name. You can also pass in the optional <code>tags</code> parameter as a dictionary to store key-value pairs associated with the run.<br> \nWhen you start a run, MLflow (MLManager) automatically logs some information for you:\n <ul>\n <li>Start Date</li>\n <li>Current User</li>\n <li>Run ID</li>\n <li>DB Transaction ID</li>\n <li>Source (where the run came from)</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.start_run)","execution_count":8,"outputs":[{"output_type":"stream","text":"Help on function _start_run in module splicemachine.mlflow_support.mlflow_support:\n\n_start_run(run_id=None, tags=None, experiment_id=None, run_name=None, nested=False)\n Start a new run\n :param tags: a dictionary containing metadata about the current run.\n For example:\n {\n 'team': 'pd',\n 'purpose': 'r&d'\n }\n :param run_name: an optional name for the run to show up in the MLFlow UI\n :param run_id: if you want to reincarnate an existing run, pass in the run id\n :param experiment_id: if you would like to create an experiment/use one for this run\n :param nested: Controls whether run is nested in parent run. True creates a nest run\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.start_run(run_name='First_pass_default_settings', tags={'team': 'pd', 'purpose':'performance testing'})","execution_count":16,"outputs":[{"output_type":"execute_result","execution_count":16,"data":{"text/plain":"<ActiveRun: >"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### Tracking Concepts\n<blockquote>There are 4 main concepts when tracking a run:<br>\n <ul>\n <li><b>Tags</b>: Any key-value pair that likely won't be used for comparison between runs (non-measurable items). 
Only tags can be overwritten</li>\n <li><b>Parameters</b>: Configuration options set before starting the run that may have a measurable effect on the outcome</li>\n <li><b>Metrics</b>: The measured outcomes between runs that can be compared. Metrics have an optional <code>step</code> parameter if you want to track metrics over time for a specific run</li>\n <li><b>Artifacts</b>: Objects (files, images, notebooks, etc.) to be associated with a run</li>\n </ul>\n</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"help(mlflow.set_tag)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.lp)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.lm)\nprint('---------------------------------------------------------------------------------')\nhelp(mlflow.log_artifact)","execution_count":17,"outputs":[{"output_type":"stream","text":"Help on function set_tag in module mlflow.tracking.fluent:\n\nset_tag(key, value)\n Set a tag under the current run. 
If no run is active, this method will create a\n new active run.\n \n :param key: Tag name (string)\n :param value: Tag value (string, but will be string-ified if not)\n\n---------------------------------------------------------------------------------\nHelp on function _lp in module splicemachine.mlflow_support.mlflow_support:\n\n_lp(key, value)\n Add a shortcut for logging parameters in MLFlow.\n Accessible from mlflow.lp\n :param key: key for the parameter\n :param value: value for the parameter\n\n---------------------------------------------------------------------------------\nHelp on function _lm in module splicemachine.mlflow_support.mlflow_support:\n\n_lm(key, value, step=None)\n Add a shortcut for logging metrics in MLFlow.\n Accessible from mlflow.lm\n :param key: key for the parameter\n :param value: value for the parameter\n\n---------------------------------------------------------------------------------\nHelp on function _log_artifact in module splicemachine.mlflow_support.mlflow_support:\n\n_log_artifact(file_name, name, run_uuid=None)\n Log an artifact for the active run\n :param file_name: (str) the name of the file name to log\n :param name: (str) the name of the run relative name to store the model under\n :param run_uuid: the run uuid of a previous run, if none, defaults to current run\n NOTE: We do not currently support logging directories. 
If you would like to log a directory, please zip it first\n and log the zip file\n\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.set_tag('teammates', 'carol, daniel')","execution_count":18,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.lp('spark executors', '5')","execution_count":19,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.lm('execution time sec', 25)","execution_count":20,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Setting metrics over \"steps\"\nfor i in range(10):\n mlflow.lm('Build time', i*3, step=i)","execution_count":21,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"get_mlflow_ui(mlflow.current_exp_id(), mlflow.current_run_id())","execution_count":22,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/8b0159a70a5c>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":22,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d8638c2d0>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/8b0159a70a5c\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"### End Run\n<blockquote>When you finish a run, call <code>mlflow.end_run()</code>.<br> In the MLflow UI, a run that has ended is shown with a green check mark next to it</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"mlflow.end_run()","execution_count":23,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Artifacts\n<blockquote>Artifacts can be any file type. The artifact is serialized as a BLOB and stored in the database. 
When storing artifacts in the database, files with file extensions such as <code>.txt</code>, <code>.pdf</code>, <code>.yaml</code>, <code>.jpeg</code> etc. will be available for preview in the MLflow UI.<br>We can use some neat Jupyter tricks like <code>writefile</code> to make artifacts even more useful.\n</blockquote>\n\n#### Write a YAML file"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%writefile my_env.yaml\n\nname: datatest \nchannels:\n- defaults\n- conda-forge\n- ericmjl\ndependencies:\n- python=3.6\n- colorama=0.3.9\n- jupyter=1.0.0\n- ipykernel=4.6.1\n- jupyterlab=0.25.2\n- pytest=3.1.3\n- pytest-cov=2.5.1\n- tinydb=3.3.1\n- pyyaml=3.12\n- pandas-summary=0.0.41\n- environment_kernels=1.1\n- missingno=0.3.7\n","execution_count":24,"outputs":[{"output_type":"stream","text":"Overwriting my_env.yaml\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Write a code snippet"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%writefile harm_mean.py\ndef harm_mean(nums, rnd=4):\n \"\"\"\n Calculates the harmonic mean of n numbers rounded to rnd decimal places\n :param nums: List of numbers\n :param rnd: Number of decimal places to round the result\n \"\"\"\n return round(len(nums)/sum([1/i for i in nums]),rnd)","execution_count":25,"outputs":[{"output_type":"stream","text":"Overwriting harm_mean.py\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"## Put it together\n#### Start a run, log our artifacts, view the results"},{"metadata":{"trusted":true},"cell_type":"code","source":"!jupyter nbconvert --to html '7.2 Splice MLflow Support.ipynb'\nwith mlflow.start_run(run_name='environment_requirements'):\n run_id = mlflow.current_run_id()\n exp_id = mlflow.current_exp_id()\n mlflow.log_artifact('my_env.yaml', name='my_env.yaml')\n mlflow.log_artifact('harm_mean.py', name='harm_mean.py')\n mlflow.log_artifact('7.2 Splice MLflow Support.ipynb', name='training_notebook.ipynb')\n 
mlflow.log_artifact('7.2 Splice MLflow Support.html', name='training_notebook.html')","execution_count":26,"outputs":[{"output_type":"stream","text":"[NbConvertApp] Converting notebook 7.2 Splice MLflow Support.ipynb to html\n[NbConvertApp] Writing 358977 bytes to 7.2 Splice MLflow Support.html\nSaving artifact of size: 0.349 KB to Splice Machine DB\nSaving artifact of size: 0.328 KB to Splice Machine DB\nSaving artifact of size: 53.573 KB to Splice Machine DB\nSaving artifact of size: 359.035 KB to Splice Machine DB\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Click on one of your artifacts to render the results!"},{"metadata":{"trusted":true,"scrolled":false},"cell_type":"code","source":"get_mlflow_ui(exp_id, run_id)","execution_count":27,"outputs":[{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/26342e253ff0>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"execute_result","execution_count":27,"data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d86390290>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/26342e253ff0\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"#### Another Artifact Example"},{"metadata":{"trusted":true},"cell_type":"code","source":"import matplotlib.pyplot as plt\nfrom random import random\nwith mlflow.start_run(run_name='my_plot'):\n plt.rcParams.update({\n \"pgf.texsystem\": \"pdflatex\",\n \"pgf.preamble\": [\n r\"\\usepackage[utf8x]{inputenc}\",\n r\"\\usepackage[T1]{fontenc}\",\n r\"\\usepackage{cmbright}\",\n ]\n })\n\n plt.figure(figsize=(4.5, 2.5))\n plt.plot([random()*19 for _ in range(10)])\n plt.text(0.5, 3., \"serif\", family=\"serif\")\n plt.text(0.5, 2., \"monospace\", family=\"monospace\")\n plt.text(2.5, 2., \"sans-serif\", 
family=\"sans-serif\")\n plt.xlabel(r\"µ is not $\\mu$\")\n plt.tight_layout(pad=.5)\n\n plt.savefig(\"pgf_texsystem.png\")\n mlflow.log_artifact('pgf_texsystem.png', 'results.png')\n display(get_mlflow_ui(mlflow.current_exp_id(), mlflow.current_run_id()))","execution_count":28,"outputs":[{"output_type":"stream","text":"/opt/conda/lib/python3.7/site-packages/matplotlib/backend_bases.py:57: DeprecationWarning: PILLOW_VERSION is deprecated and will be removed in a future release. Use __version__ instead.\n from PIL import PILLOW_VERSION\n","name":"stderr"},{"output_type":"stream","text":"Saving artifact of size: 11.711 KB to Splice Machine DB\n","name":"stdout"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26/runs/7c10c68d11b3>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d85bab550>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26/runs/7c10c68d11b3\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 324x180 with 1 Axes>","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"### Context Managers in Runs\n<blockquote>There are 2 context managers in MLManager/MLflow: <code>start_run</code> and <code>timer</code>.<br>\nContext managers handle some autologging and cleanup for you. 
To use a context manager, prefix the call with the <code>with</code> keyword, append a <code>:</code> after the call, and indent the lines that belong to the block.<br>\nAnother great feature: if the run fails for some reason, MLflow will track that for you</blockquote>"},{"metadata":{"trusted":true},"cell_type":"code","source":"with mlflow.start_run(run_name='run with context manager'):\n mlflow.lp('foo','bar')\n mlflow.lm('score', 92)","execution_count":29,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"with mlflow.start_run(run_name='a run that failed'):\n raise Exception","execution_count":30,"outputs":[{"output_type":"error","ename":"Exception","evalue":"","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mException\u001b[0m Traceback (most recent call last)","\u001b[0;32m<ipython-input-30-583755d4e51d>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mmlflow\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstart_run\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrun_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'a run that failed'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mException\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;31mException\u001b[0m: "]}]},{"metadata":{"trusted":true},"cell_type":"code","source":"from time import sleep\n# Multiple context managers\nwith mlflow.start_run(run_name='using the timer too'):\n with mlflow.timer('run time'):\n sleep(2)\n print('done!')","execution_count":31,"outputs":[{"output_type":"stream","text":"Starting Code Block run time... 
Done.\nCode Block run time:\nRan in 2.001 secs\nRan in 0.033 mins\ndone!\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Timers are stored as parameters by default, but can also be stored as metrics"},{"metadata":{"trusted":true},"cell_type":"code","source":"from time import sleep\n# Multiple context managers\nwith mlflow.start_run(run_name='using the timer as a metric'):\n with mlflow.timer('run time', param=False):\n sleep(2)\n print('done!')","execution_count":32,"outputs":[{"output_type":"stream","text":"Starting Code Block run time... Done.\nCode Block run time:\nRan in 2.002 secs\nRan in 0.033 mins\ndone!\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"### Nested Runs\n<blockquote>MLflow supports the concept of <code>nested</code> runs. A nested run is a run that occurs underneath a parent run. In machine learning, this could be used for hyperparameter tuning (like choosing K in a k-means clustering algorithm), but it can be used for anything you find useful.<br> To use it, simply pass <code>nested=True</code> to the <code>start_run</code> function</blockquote>"},{"metadata":{"trusted":true,"scrolled":false},"cell_type":"code","source":"from random import randint, sample\nfrom time import sleep\nfrom tqdm.notebook import tqdm\nexec_time = [1,3,5,2]\nnum_execs = []\nwith mlflow.start_run(run_name='parent run'):\n for i in tqdm(range(4)):\n with mlflow.start_run(run_name=f'child {i+1}', nested=True):\n with mlflow.timer('run time', param=False):\n sleep(exec_time[i])\n mlflow.set_tag('child', 'yes')\n mlflow.lp('num_executors', i+1)\n num_execs.append(i+1)\n # Plot results\n plt.figure(figsize=(4.5, 2.5))\n plt.plot(num_execs, exec_time)\n\n plt.ylabel('exec time')\n plt.xlabel('num executors')\n plt.tight_layout(pad=.5)\n plt.savefig(\"spark_results.png\")\n mlflow.log_artifact('spark_results.png','spark_results.png')\n 
display(get_mlflow_ui(mlflow.current_exp_id()))","execution_count":33,"outputs":[{"output_type":"display_data","data":{"text/plain":"HBox(children=(FloatProgress(value=0.0, max=4.0), HTML(value='')))","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"b6847259d96749388c62fea2fd98f271"}},"metadata":{}},{"output_type":"stream","text":"Starting Code Block run time... Done.\nCode Block run time:\nRan in 1.001 secs\nRan in 0.017 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 3.002 secs\nRan in 0.05 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 5.004 secs\nRan in 0.083 mins\nStarting Code Block run time... Done.\nCode Block run time:\nRan in 2.002 secs\nRan in 0.033 mins\n\nSaving artifact of size: 8.37 KB to Splice Machine DB\n","name":"stdout"},{"output_type":"display_data","data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\"><a target=\"_blank\" href=/mlflow/#/experiments/26>MLFlow UI</a></font>"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<IPython.lib.display.IFrame at 0x7f6d856c2c10>","text/html":"\n <iframe\n width=\"100%\"\n height=\"500px\"\n src=\"/mlflow/#/experiments/26\"\n frameborder=\"0\"\n allowfullscreen\n ></iframe>\n "},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 324x180 with 1 Axes>","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"metadata":{},"cell_type":"markdown","source":"## Storing ML Models\n<blockquote><p class='quotation'><span style='font-size:15px'>Just like everything else we've tracked so far, tracking ML Models is easy with Splice Machine's MLManager. The <code>log_model</code> and <code>load_model</code> functions are all you need. 
\n <footer>Splice Machine</footer> \n</blockquote>\n\n#### Let's try it out"},{"metadata":{"trusted":true},"cell_type":"code","source":"from sklearn import svm\nfrom sklearn import datasets\nfrom sklearn.metrics import accuracy_score\n\n# Start a run\nwith mlflow.start_run(run_name='my first model'):\n # Load some sklearn data\n digits = datasets.load_digits()\n\n # Build a simple model\n clf = svm.SVC(gamma=0.001, C=100.)\n # Log parameters to mlflow\n mlflow.lp('gamma', 0.001)\n mlflow.lp('C', 100.0)\n\n # Train the model\n with mlflow.timer('train_time'):\n clf.fit(digits.data[:-1], digits.target[:-1])\n\n # Predict with some data\n preds = clf.predict(digits.data[:-1])\n\n # Measure accuracy\n acc = accuracy_score(digits.target[:-1], preds)\n print('Accuracy:',acc)\n # Log metric to mlflow\n mlflow.lm('accuracy', acc)\n \n # Save model\n mlflow.log_model(clf, 'clf_model')\n rid = mlflow.current_run_id()","execution_count":45,"outputs":[{"output_type":"stream","text":"Starting Code Block train_time... 
Done.\nCode Block train_time:\nRan in 0.156 secs\nRan in 0.003 mins\nAccuracy: 1.0\nSaving artifact of size: 473.58 KB to Splice Machine DB\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"#### Load our model back and make new predictions"},{"metadata":{"trusted":true,"scrolled":true},"cell_type":"code","source":"loaded_model = mlflow.load_model(run_id=rid, name='clf_model')\ndisplay(loaded_model)\n# Make a new prediction\nnew_data = [ \n 0., 0., 12., 10., 0., 0., 0., 0., 0., 0., 14., 16., 16.,\n 14., 0., 0., 0., 0., 13., 16., 15., 10., 1., 0., 0., 0.,\n 11., 16., 16., 7., 0., 0., 0., 0., 0., 4., 7., 16., 7.,\n 0., 0., 0., 0., 0., 4., 16., 9., 0., 0., 0., 5., 4.,\n 12., 16., 4., 0., 0., 0., 9., 16., 16., 10., 0., 0.\n]\nprint('Prediction on new data:', loaded_model.predict([new_data])[0])","execution_count":56,"outputs":[{"output_type":"display_data","data":{"text/plain":"SVC(C=100.0, cache_size=200, class_weight=None, coef0=0.0,\n decision_function_shape='ovr', degree=3, gamma=0.001, kernel='rbf',\n max_iter=-1, probability=False, random_state=None, shrinking=True,\n tol=0.001, verbose=False)"},"metadata":{}},{"output_type":"stream","text":"Prediction on new data: 5\n","name":"stdout"}]},{"metadata":{"trusted":true},"cell_type":"markdown","source":"# Fantastic!\n<blockquote> \nNow you have all of the tools necessary to start accessing and manipulating your Big Data with Spark and Splice Machine. 
Again, feel free to check out our <a href=\"https://pysplice.readthedocs.io/en/dbaas-4100/splicemachine.mlflow_support.html\">documentation</a>!<br><br>\n Next Up: <a href='./7.3 Data Exploration.ipynb'>Using MLManager to explore and analyze your data</a>\n<footer>Splice Machine</footer>\n</blockquote>"}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.7.6","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":true,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{"height":"calc(100% - 180px)","width":"242px","left":"10px","top":"150px"},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":4}