This file contains additional guidance for AI agents and other AI editors.
These principles reduce common LLM coding mistakes. Apply them to every task.
| import numpy as np | |
| from sklearn.ensemble import RandomForestRegressor | |
| from sklearn.model_selection import cross_validate | |
| from skrub import tabular_pipeline | |
| from skrub.datasets import fetch_employee_salaries | |
| def main(): | |
| print("Loading employee salaries dataset ...") | |
| data = fetch_employee_salaries() |
| #!/usr/bin/env python3 | |
| """ | |
| Employee Salary Prediction | |
| ========================== | |
| Predict current annual salary for Montgomery County employees | |
| using a scikit-learn pipeline with mixed feature types. | |
| """ | |
| try: | |
| import cuml.accel |
| """ | |
| Pipeline(StandardScaler, PCA, RandomForest) with GridSearchCV | |
| ============================================================= | |
| GridSearchCV over an all-proxy Pipeline on the full Forest Cover Type | |
| dataset (581K samples, 54 features, 7 classes). | |
| Pipeline: StandardScaler -> PCA -> RandomForestClassifier | |
| All three steps are cuml.accel proxies, so the GridSearchCV patch |
| """ | |
| Benchmark: numpy-to-cupy (CPU-to-GPU) transfer times on this machine. | |
| Target: GPU 1 (NVIDIA RTX A6000, 48 GB, PCIe Gen4 x16 slot) | |
| """ | |
| import time | |
| import statistics | |
| import numpy as np | |
| import cupy as cp |
| import tarfile | |
| import time | |
| import urllib.request | |
| from collections import OrderedDict | |
| from pathlib import Path | |
| import numpy as np | |
| import scipy.io | |
| import scipy.sparse | |
| from sklearn.decomposition import TruncatedSVD |
| """ | |
| Test: Validate that cuml native estimators can be converted to ONNX | |
| via as_sklearn() -> skl2onnx -> onnxruntime. | |
| Unlike cuml.accel proxies (which skl2onnx recognizes directly), native cuml | |
| estimators must first be converted to sklearn via as_sklearn() before | |
| skl2onnx.convert_sklearn() will accept them. | |
| Run without cuml.accel: | |
| python test_onnx_as_sklearn.py |
Created: 2026-01-07 Last Updated: 2026-01-07
Scikit-learn's Array API support enables estimators and functions to work with arrays from different libraries (NumPy, CuPy, PyTorch) without modification. This allows computations to run on GPUs when using GPU-backed array libraries.
The implementation follows the Array API Standard, a specification that defines a common API for array manipulation libraries.
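The namespace idea behind the Array API Standard can be sketched in a few lines. This is only an illustration under stated assumptions, not scikit-learn's actual implementation: `namespace_of` and `standardize` are hypothetical helpers introduced here, and the fallback to NumPy for objects without `__array_namespace__` is a simplification.

```python
import numpy as np


def namespace_of(x):
    """Return the array namespace of x (hypothetical helper).

    Arrays that implement the Array API Standard expose
    __array_namespace__(); anything else falls back to NumPy here,
    which is an assumption made for this sketch.
    """
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def standardize(x):
    """Center and scale each column using only standard-defined functions."""
    xp = namespace_of(x)
    mean = xp.mean(x, axis=0)  # per-column mean
    std = xp.std(x, axis=0)    # per-column (population) standard deviation
    return (x - mean) / std


# The same function body would serve any array type whose namespace
# provides mean and std; NumPy is used here so the example runs on CPU.
X = np.asarray([[1.0, 2.0], [3.0, 4.0]])
Z = standardize(X)
```

Because `standardize` calls only functions looked up on `xp`, it never names a specific array library. In scikit-learn itself, this kind of dispatch is opt-in through the `array_api_dispatch` configuration option, and which estimators support it depends on the installed version.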
| #!/usr/bin/env python3 | |
| """ | |
| Ray + RandomForestClassifier with max_calls=1 | |
| Demonstrates the impact of max_calls=1 on Ray task execution when using | |
| scikit-learn's RandomForestClassifier. | |
| """ | |
| import time | |
| import ray | |
| from sklearn.datasets import make_classification |