@jacobtomlinson
Last active January 15, 2025 16:01
Using Coiled to run cudf.pandas workloads on the cloud

cudf.pandas Coiled Demo

The RAPIDS cudf.pandas accelerator allows you to leverage the power of NVIDIA GPU acceleration in your pandas workflows.

Scripts that use pandas can be run via the cudf.pandas module to accelerate your code with zero code changes.

python my_code.py  # Uses the CPU
python -m cudf.pandas my_code.py  # Same pandas code uses the GPU

But what if you don't have a GPU? That's where Coiled comes in. With the coiled run tool you can execute scripts from your local machine on a cloud VM with whatever hardware you choose. You will be billed only for what you use and the VM will shut down again when the script completes.

coiled run python my_code.py  # Boots a VM on the cloud, runs the script, then shuts down again

We can tie these two tools together to GPU-accelerate your code on the cloud without having to change your code or where it lives.

To demonstrate this, let's run the attached pandas code to load some data and perform some standard dataframe operations including value_counts, groupby, and sort_values.
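To get a flavour of the kind of operations involved, here is a minimal sketch on a tiny synthetic dataframe (the column names mirror the attached script; the data itself is made up):

```python
import pandas as pd

# Tiny synthetic stand-in for the NYC parking violations data (made up).
df = pd.DataFrame({
    "Registration State": ["NY", "NY", "NJ", "NJ", "NY"],
    "Violation Description": ["A", "A", "B", "A", "B"],
})

# Most common violation per state: count (state, violation) pairs,
# then keep the top row within each state.
top = (df[["Registration State", "Violation Description"]]
       .value_counts()
       .groupby("Registration State")
       .head(1)
       .sort_index()
       .reset_index())
print(top)
```

The same chain appears in the attached script, just over ~10 million real rows instead of five synthetic ones.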

$ coiled run --gpu --name rapids-demo --keepalive 5m --container nvcr.io/nvidia/rapidsai/base:24.10-cuda12.5-py3.12 -- python cudf_pandas_coiled_demo.py
╭────────────────── Running python cudf_pandas_coiled_demo.py ─────────────────╮
│                                                                              │
│ Details: https://cloud.coiled.io/clusters/xxxxxx?account=xxxxxxxx            │
│                                                                              │
│ Ready  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━          │
│                                                                              │
│ Environment:                                                                 │
│ base_24_10-cuda12_5-py3_12-x86_64-xxxxxx                                     │
│ Region:   us-east-1                 Uptime:                           3m 19s │
│ VM Type:  g4dn.xlarge               Approx cloud cost:              $0.53/hr │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯

Output
------

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.download.nvidia.com/licenses/NVIDIA_Deep_Learning_Container_License.pdf

Calculate violations by state took: 25.128 seconds
Calculate violations by vehicle type took: 7.279 seconds
Calculate violations by day of week took: 22.253 seconds

In our coiled run command we specified --gpu to select a GPU VM type; this chose a g4dn.xlarge on AWS, but we could also have specified a VM type manually. We set --keepalive 5m to tell Coiled to keep our VM around for five minutes after the script completes, which makes it easy to reuse the VM by running another script. We also explicitly specified the latest RAPIDS container with --container nvcr.io/nvidia/rapidsai/base:24.10-cuda12.5-py3.12. By default Coiled syncs your local software environment to the remote machine, but in this case we explicitly want a GPU software environment rather than our local one.

The first time we run this script it takes a couple of minutes to boot the VM, but after that our pandas computations run in just under a minute. We can run the script as many times as we like within the next five minutes and it will reuse the same VM.

Next let's run the script via python -m cudf.pandas to tell pandas to use the GPU.

$ coiled run --gpu --name rapids-demo --keepalive 5m --container nvcr.io/nvidia/rapidsai/base:24.10-cuda12.5-py3.12 -- python -m cudf.pandas cudf_pandas_coiled_demo.py
╭────────── Running python -m cudf.pandas cudf_pandas_coiled_demo.py ──────────╮
│                                                                              │
│ Details: https://cloud.coiled.io/clusters/xxxxxx?account=xxxxxxxx            │
│                                                                              │
│ Ready  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━          │
│                                                                              │
│ Environment:                                                                 │
│ base_24_10-cuda12_5-py3_12-x86_64-xxxxxx                                     │
│ Region:   us-east-1                 Uptime:                           8m 55s │
│ VM Type:  g4dn.xlarge               Approx cloud cost:              $0.53/hr │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯

Output
------

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.download.nvidia.com/licenses/NVIDIA_Deep_Learning_Container_License.pdf

Calculate violations by state took: 3.470 seconds
Calculate violations by vehicle type took: 0.145 seconds
Calculate violations by day of week took: 1.238 seconds

This time we can see that our code took less than 5 seconds to run!

coiled run ... -- python cudf_pandas_coiled_demo.py  # 60 seconds of computation
coiled run --gpu ... -- python -m cudf.pandas cudf_pandas_coiled_demo.py  # 5 seconds of computation
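For reference, a quick back-of-the-envelope comparison of the per-operation timings reported in the two runs above:

```python
# Timings copied from the two runs above (seconds).
cpu = {"state": 25.128, "vehicle type": 7.279, "day of week": 22.253}
gpu = {"state": 3.470, "vehicle type": 0.145, "day of week": 1.238}

for op in cpu:
    speedup = cpu[op] / gpu[op]
    print(f"violations by {op}: {speedup:.1f}x faster on GPU")
```

The groupby-count over vehicle body types sees the largest speedup, at roughly 50x.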

cudf_pandas_coiled_demo.py
--------------------------

import pandas as pd
from time import perf_counter
from contextlib import contextmanager


@contextmanager
def timeit(name):
    """Utility context manager to print out how long things take."""
    start = perf_counter()
    yield lambda: perf_counter() - start
    print(f"{name} took: {perf_counter() - start:.3f} seconds")


# Read in the NYC parking violations dataset for 2022
df = pd.read_parquet(
    "s3://rapidsai-data/datasets/nyc_parking/nyc_parking_violations_2022.parquet",
    columns=["Registration State", "Violation Description", "Vehicle Body Type", "Issue Date", "Summons Number"],
)

with timeit("Calculate violations by state"):
    for _ in range(10):
        (df[["Registration State", "Violation Description"]]  # get only these two columns
         .value_counts()  # get the count of offences per state and per type of offence
         .groupby("Registration State")  # group by state
         .head(1)  # get the first row in each group (the type of offence with the largest count)
         .sort_index()  # sort by state name
         .reset_index()
         )

with timeit("Calculate violations by vehicle type"):
    for _ in range(10):
        (df
         .groupby(["Vehicle Body Type"])
         .agg({"Summons Number": "count"})
         .rename(columns={"Summons Number": "Count"})
         .sort_values(["Count"], ascending=False)
         )

with timeit("Calculate violations by day of week"):
    for _ in range(10):
        weekday_names = {
            0: "Monday",
            1: "Tuesday",
            2: "Wednesday",
            3: "Thursday",
            4: "Friday",
            5: "Saturday",
            6: "Sunday",
        }
        df["Issue Date"] = df["Issue Date"].astype("datetime64[ms]")
        df["issue_weekday"] = df["Issue Date"].dt.weekday.map(weekday_names)
        df.groupby(["issue_weekday"])["Summons Number"].count().sort_values()
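The timeit helper in the script also yields a callable for reading the elapsed time mid-block, which the demo itself doesn't use. A minimal standalone sketch of that pattern (stdlib only):

```python
from time import perf_counter, sleep
from contextlib import contextmanager


@contextmanager
def timeit(name):
    """Print how long the enclosed block took."""
    start = perf_counter()
    yield lambda: perf_counter() - start  # callable for mid-block elapsed time
    print(f"{name} took: {perf_counter() - start:.3f} seconds")


with timeit("sleep demo") as elapsed:
    sleep(0.1)
    mid = elapsed()  # elapsed time so far, in seconds
```

Because the context manager prints on exit, any block wrapped in `with timeit(...)` reports its wall-clock duration automatically, which is how the per-operation timings above were produced.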