HiPlot is a tool for high-dimensional visualisation that comes in handy, for example, when running hyperparameter optimisation of machine learning models. This readme collects some usage notes for making the most of both experiment tracking on the Faculty Platform and HiPlot.
See the HiPlot documentation for more details on anything HiPlot-related.
This readme has two main sections: using HiPlot in notebooks, and running it as an app.
Here's an example for a Jupyter notebook that plots all the logged parameters and metrics. To use it:
- install hiplot,
- paste this section into a notebook,
- adjust the number in the experiments variable (it can be retrieved from the "export to code" section in the experiments view).
The snippet below plots absolutely everything (params, metrics, tags, etc.). This is most likely more than you need, but it can be a good starting point for exploring:
import uuid
import hiplot as hip
import mlflow
# Change this for your experiment
experiments = [0]
df = mlflow.search_runs(experiments)
exp = hip.Experiment()
for _, row in df.iterrows():
    # One HiPlot datapoint per MLflow run, using every column as a value
    dp = hip.Datapoint(uid=str(uuid.UUID(row["run_id"])), values=row.to_dict())
    exp.datapoints.append(dp)
exp.display(force_full_width=True)
This example is also included in a notebook.
Note that columns can be dragged to the side of the plot to remove them from the display.
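To get an overview of what "everything" contains before choosing columns, it can help to group the DataFrame's column names. MLflow's search_runs prefixes logged items with params., metrics. and tags., next to run metadata such as run_id and start_time. The helper below is a minimal stdlib-only sketch; group_columns is a hypothetical name, not part of MLflow or HiPlot.

```python
from collections import defaultdict


def group_columns(columns):
    """Group MLflow search_runs column names by prefix.

    Names are split on the first dot, so "tags.mlflow.runName" is listed
    under "tags" as "mlflow.runName". Run metadata without one of the
    three prefixes (e.g. "run_id") is collected under "other".
    """
    groups = defaultdict(list)
    for column in columns:
        prefix, sep, name = column.partition(".")
        if sep and prefix in ("params", "metrics", "tags"):
            groups[prefix].append(name)
        else:
            groups["other"].append(column)
    return dict(groups)


# Usage with the DataFrame from the snippet above (hypothetical):
# print(group_columns(df.columns))
```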
The snippet below only plots the logged parameters and metrics (ignoring all tags, etc.), which might be a better starting point:
import uuid
import hiplot as hip
import mlflow
# Change the numerical value for your experiment
experiments = [0]
df = mlflow.search_runs(experiments)
exp = hip.Experiment()
for _, row in df.iterrows():
    values = {}
    # Add parameters first, with the "params." prefix stripped
    params = [p.replace("params.", "") for p in row.keys() if p.startswith("params.")]
    for p in params:
        values[p] = row[f"params.{p}"]
    # Add metrics next, with the "metrics." prefix stripped
    metrics = [
        m.replace("metrics.", "") for m in row.keys() if m.startswith("metrics.")
    ]
    for m in metrics:
        # Only keep metric values below 10 (adjust for your metrics); larger
        # values, and NaN (any comparison with NaN is False), are left out
        if row[f"metrics.{m}"] < 10:
            values[m] = row[f"metrics.{m}"]
    dp = hip.Datapoint(uid=str(uuid.UUID(row["run_id"])), values=values)
    exp.datapoints.append(dp)
exp.display(force_full_width=True)
This example is also included in a notebook.
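The params/metrics loop above can also be factored into a small function that works on a plain dict, which makes it easier to test and reuse. This is a sketch rather than part of HiPlot or MLflow: row_to_values and its max_metric argument are hypothetical names, with max_metric playing the role of the hard-coded `< 10` check. Unlike the loop above, it also skips missing (None or NaN) parameters explicitly.

```python
import math


def _is_missing(value):
    """True for None and NaN (pandas stores missing values as either)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def row_to_values(row, max_metric=None):
    """Build a HiPlot values dict from one MLflow run row (a plain mapping).

    Keeps only "params." and "metrics." columns, with the prefix stripped,
    and skips missing values. If max_metric is set, metric values at or
    above it are skipped as well.
    """
    values = {}
    # Parameters first, then metrics, as in the loop above
    for prefix in ("params.", "metrics."):
        for key, value in row.items():
            if not key.startswith(prefix) or _is_missing(value):
                continue
            if prefix == "metrics." and max_metric is not None and value >= max_metric:
                continue
            values[key[len(prefix):]] = value
    return values
```

Inside the loop above, this would be used as `hip.Datapoint(uid=str(uuid.UUID(row["run_id"])), values=row_to_values(row.to_dict(), max_metric=10))`.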
After removing some columns, reorganising the rest, and setting the colouring to the "test_rmse" column (test set root-mean-squared error for our dataset):
Filtering by one of the columns:
Here's an example of setting up exactly which entries to plot:
import hiplot as hip
import mlflow
import pandas as pd
# Change this for your experiment
experiments = [0]
df = mlflow.search_runs(experiments)
exp = hip.Experiment()
# Exactly the columns to plot
variables = [
    "params.learning_rate",
    "params.momentum",
    "metrics.train_rmse",
    "metrics.test_rmse",
    "metrics.val_rmse",
]
for _, row in df.iterrows():
    values = {}
    for v in variables:
        # Skip columns that are absent or missing (None/NaN) for this run;
        # a plain truthiness check would also drop legitimate zeros
        if v in row and pd.notna(row[v]):
            values[v] = float(row[v])
    dp = hip.Datapoint(uid=str(row["run_id"]), values=values)
    exp.datapoints.append(dp)
exp.display(force_full_width=True)
This example is also included in a notebook.
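If shorter names are preferred on the plot axes, the fixed list of variables can become a mapping from MLflow column to display label. The sketch below is a hypothetical helper (select_values is not a HiPlot or MLflow function); like the snippet above, it converts values to float and skips absent or NaN entries while keeping genuine zeros.

```python
import math


def select_values(row, columns):
    """Pick a fixed set of columns from one MLflow run row.

    columns maps an MLflow column name to the label shown in HiPlot, e.g.
    {"params.learning_rate": "learning_rate"}. row can be a dict or a
    pandas Series (both support .get).
    """
    values = {}
    for column, label in columns.items():
        value = row.get(column)
        if value is None:
            continue  # column absent for this run
        value = float(value)  # params are logged as strings
        if math.isnan(value):
            continue  # missing value
        values[label] = value
    return values
```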
Many of the settings can also be fixed in code (e.g. the order of the columns on the parallel plot); see this part of the docs.
To run HiPlot as an app:
- Save faculty_hiplot_fetcher.py and start-hiplot-server.sh from this gist into the same folder in your workspace.
- Create an environment that installs hiplot.
- Set up the app to use that environment, and run the start-hiplot-server.sh script.
Once you navigate to the app's interface, you can load experiments directly by using a single experiment's ID number with the faculty:// prefix:
faculty://2
or its name:
faculty://Training
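The gist's fetcher code is not reproduced here. As a hypothetical illustration of how a faculty:// URI could be told apart as an ID or a name, the helper below assumes that an all-digit remainder is an experiment ID and anything else is an experiment name. In an actual HiPlot fetcher, a URI with another prefix would raise hip.ExperimentFetcherDoesntApply instead of returning None, and a name could then be resolved with mlflow.get_experiment_by_name(name).experiment_id.

```python
def parse_faculty_uri(uri, prefix="faculty://"):
    """Split a faculty:// URI into ("id", value) or ("name", value).

    Assumption: an all-digit remainder is an experiment ID, anything else
    is an experiment name. Returns None for URIs with a different prefix.
    """
    if not uri.startswith(prefix):
        return None
    target = uri[len(prefix):]
    if target.isdigit():
        return ("id", target)
    return ("name", target)
```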
It can also be used in a multi-experiment setting:
multi://{
"new model": "faculty://3",
"old model": "faculty://Training"
}
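The multi:// prefix is followed by a JSON object that maps a label to another experiment URI. When comparing many experiments, that string can be generated instead of typed by hand; multi_uri below is a hypothetical helper, a minimal sketch using only the standard library.

```python
import json


def multi_uri(experiments):
    """Build a HiPlot multi:// URI from a {label: experiment URI} mapping."""
    return "multi://" + json.dumps(experiments)
```

For example, `multi_uri({f"experiment {i}": f"faculty://{i}" for i in [2, 3]})` produces a string that can be pasted into the app.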
After removing some of the columns (by dragging them to the left or right side) and reorganising the rest, it's easy to compare the two experiments (the "exp" column is added automatically to distinguish the two sets of experiments).
Check the HiPlot docs regarding multiple experiments.