Created
August 1, 2022 01:25
Okay, so the goal is to have access to tidymodels from Databricks. My answer is a more general approach to package access in Databricks. The trade-off is a slightly slower cluster spin-up time.
The idea is to have persistent storage in the form of an ADLS blob storage container that packages are installed to. Then, when you spin up a cluster, install any required system deps and point your `.libPaths()` at the package library in the ADLS container.
You can mount the container using one of these two approaches:

- [directly to the workspace](https://docs.microsoft.com/en-us/azure/databricks/data/mounts)
- [using blobfuse](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux)

If using blobfuse, it needs to be mounted in an init script.

Then, in a Databricks notebook with the storage container mounted, install packages like so:
```r
install.packages(
  pkgs = c("tidymodels", "other", "pkgs"),
  repos = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest",
  lib = "/mnt/blob/container/pack"
)
```
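Because the library persists between clusters, it can be worth checking what is already installed there before reinstalling everything. A minimal sketch, assuming the same library path as in the install call; `missing_from_lib` is a hypothetical helper name, not part of any package, and the `dir.exists()` guard is an addition so nothing is installed when the container is not mounted:

```r
# Hypothetical helper: return the packages from `pkgs` not yet installed in `lib`.
missing_from_lib <- function(pkgs, lib) {
  found <- vapply(
    pkgs,
    function(p) length(find.package(p, lib.loc = lib, quiet = TRUE)) > 0,
    logical(1)
  )
  pkgs[!found]
}

lib <- "/mnt/blob/container/pack"
to_install <- missing_from_lib(c("tidymodels", "other", "pkgs"), lib)

# Only hit the package repository when something is actually missing
if (length(to_install) > 0 && dir.exists(lib)) {
  install.packages(
    to_install,
    repos = "https://packagemanager.rstudio.com/cran/__linux__/focal/latest",
    lib = lib
  )
}
```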
Then, you will need to ensure that your cluster has the required system dependencies on startup. I personally use an `install-system-requirements.sh` script, which I created using {pak}. Find the system requirements for the desired packages with pak like so:
```r
pak::pkg_system_requirements("tidymodels", "ubuntu", "20.04")
```
If you have more than one package, iterate over them.
```r
# Each call may return several commands, so flatten instead of vapply(..., character(1))
installs <- unique(unlist(lapply(
  c("tidymodels", "stringr"),
  pak::pkg_system_requirements,
  os = "ubuntu", os_release = "20.04"
)))
```
Then write the results to a shell script with `writeLines(c("#!/bin/bash", installs, ""), "install-system-requirements.sh")` and make that one of your cluster's init scripts.
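To make the shape of that script concrete, here is a small sketch of the write step. The two `apt-get` lines are placeholders standing in for pak's output (the real commands depend on the packages you ask about), and the script is written to a temporary directory only for illustration. Marking it executable with `Sys.chmod()` is an optional extra, not something the steps above require.

```r
# Placeholder commands standing in for pak's output; the real ones depend on
# the packages you ask about.
installs <- c(
  "apt-get install -y libcurl4-openssl-dev",
  "apt-get install -y libssl-dev"
)

# Written to a temporary directory here purely for illustration
script <- file.path(tempdir(), "install-system-requirements.sh")
writeLines(c("#!/bin/bash", unique(installs), ""), script)

# Mark the script as executable and inspect what was written
Sys.chmod(script, mode = "0755")
readLines(script)
```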
Additionally, you'll need to change your `.libPaths()` in an R profile, either `.Rprofile` or `Rprofile.site` (what I use), so that R finds the packages installed in the container.
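A minimal sketch of what that profile entry might look like, assuming the library lives at the `/mnt/blob/container/pack` path used in the install step. The helper name `use_persistent_lib` is made up for illustration, and the `dir.exists()` check is an addition: `.libPaths()` quietly drops directories that do not exist, so the check makes a missing mount visible.

```r
# Hypothetical helper for an R profile: put the mounted library first on the
# library search path. The default path is the one used in the install step.
use_persistent_lib <- function(lib = "/mnt/blob/container/pack") {
  if (dir.exists(lib)) {
    .libPaths(c(lib, .libPaths()))
  } else {
    # .libPaths() silently drops directories that do not exist, so say so
    message("Package library not found at ", lib, "; is the container mounted?")
  }
  invisible(.libPaths())
}

use_persistent_lib()
```

Putting the container library first means packages installed there take precedence over any copies in the cluster's default libraries.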