Tim Head betatim

@betatim
betatim / README.md
Created November 6, 2025 15:06
Show recent Pull Request activity for a user.

GitHub PR Activity Tracker

A Python script that tracks Pull Request activity for a specific user over a configurable time period using the PyGithub library.

Features

  • Tracks PRs where the user:
    • Created the PR
    • Added comments
    • Submitted reviews
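
The three kinds of activity listed above correspond to GitHub's search qualifiers `author:`, `commenter:`, and `reviewed-by:`. The following is a minimal sketch under that assumption, not the gist's actual code (which is not shown in this preview). The helper name `build_pr_queries` and its parameters are hypothetical. It only builds query strings and makes no network calls.

```python
from datetime import date, timedelta


def build_pr_queries(username, days=7, today=None):
    """Return one GitHub search query per kind of PR activity.

    Hypothetical helper: `days` is the look-back window and `today`
    can be fixed to make the output reproducible.
    """
    today = today or date.today()
    since = (today - timedelta(days=days)).isoformat()
    base = f"type:pr updated:>={since}"
    return {
        "created": f"{base} author:{username}",
        "commented": f"{base} commenter:{username}",
        "reviewed": f"{base} reviewed-by:{username}",
    }


queries = build_pr_queries("betatim", days=7, today=date(2025, 11, 6))
for kind, query in queries.items():
    print(f"{kind}: {query}")
```

Each query string could then be passed to PyGithub's `Github.search_issues`, which returns matching issues and pull requests.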

AI-Assisted PR Review Checklist for Scikit-learn

Purpose: This checklist is optimized for AI assistants (like Cursor) to perform automated PR reviews. It separates automatable checks from those requiring human judgment, provides specific patterns to detect, and includes commands to run.


How to Use This Checklist

For AI Agents:

  1. Run all AUTOMATED checks first and report findings with severity levels
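
One way to picture "report findings with severity levels" is a small record type plus a sort. This is only an illustrative sketch: the `Finding` class, the `report` function, and the level names `error`, `warning`, and `info` are assumptions, since the checklist's own levels are not visible in this preview.

```python
from dataclasses import dataclass

# Assumed severity names, most severe first (not taken from the checklist).
SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


@dataclass
class Finding:
    check: str      # name of the automated check that produced the finding
    severity: str   # one of the keys in SEVERITY_ORDER
    message: str    # human-readable description


def report(findings):
    """Return one formatted line per finding, most severe first."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    return [f"[{f.severity.upper()}] {f.check}: {f.message}" for f in ordered]


lines = report([
    Finding("docstring", "info", "parameter description could be clearer"),
    Finding("tests", "error", "new public function has no test"),
])
print("\n".join(lines))
```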

Summary of Issues

  • Classification Metrics Sparse Support Bug (Issue #32036): Classification metrics in scikit-learn claim sparse matrix support in their docstrings but raise an error when given sparse inputs. The report includes code to reproduce the problem, the expected behavior (sparse support) versus the actual behavior (a TypeError), and environment details in the traceback. No major elements are missing.

  • RandomizedSearchCV Feature Request (Issue #32032): A proposal to add weights for controlling the probability of selecting items in a list of parameter distributions, useful for complex pipelines with interdependent hyperparameters. This is a feature enhancement, not a bug, and includes clear examples and rationale.

  • CI Failure on Linux Build (Issue #32022): Reported CI failure on a specific build configuration, with a reference to logs but no detailed steps to rep
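
The idea behind the RandomizedSearchCV proposal in the list above can be sketched with the standard library alone. This is an assumption-laden illustration, not scikit-learn code: the function `sample_param_distribution` and its `weights` argument are hypothetical, and the two search spaces are made up for the example.

```python
import random


def sample_param_distribution(param_distributions, weights, rng=None):
    """Pick one parameter-distribution dict with probability proportional to its weight."""
    if len(param_distributions) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    rng = rng or random.Random()
    return rng.choices(param_distributions, weights=weights, k=1)[0]


# Two hypothetical search spaces for a pipeline with interdependent parameters.
spaces = [
    {"clf": ["svc"], "clf__C": [0.1, 1, 10]},
    {"clf": ["tree"], "clf__max_depth": [2, 4, 8]},
]

# With weights [3, 1] the first space is drawn about three times as often.
rng = random.Random(0)
draws = [sample_param_distribution(spaces, [3, 1], rng) for _ in range(1000)]
print(sum(d is spaces[0] for d in draws), "of 1000 draws used the first space")
```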

betatim / log-test.py
Created July 17, 2025 08:17
Associate log output with the (top-level) line of a script. Makes it easier to know which log messages were output while a particular line was running. Try it with `python -m log_tracer -o traced_output.txt log-test.py`
import numpy as np
import logging
import time
# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('calculations.log'),
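
The `log_tracer` module itself is not shown in this preview, so the following is only a sketch of one way to associate log messages with top-level lines. It assumes the script is parsed with `ast` and executed one top-level statement at a time while a custom handler remembers the current line. The names `LineTaggingHandler` and `run_traced` are hypothetical.

```python
import ast
import logging


class LineTaggingHandler(logging.Handler):
    """Collect (top-level line number, message) pairs."""

    def __init__(self):
        super().__init__()
        self.current_line = None
        self.records = []

    def emit(self, record):
        self.records.append((self.current_line, record.getMessage()))


def run_traced(source, handler):
    """Execute `source` one top-level statement at a time, updating the handler."""
    namespace = {"__name__": "__main__"}
    for node in ast.parse(source).body:
        handler.current_line = node.lineno
        module = ast.Module(body=[node], type_ignores=[])
        exec(compile(module, "<traced>", "exec"), namespace)


SCRIPT = '''import logging
log = logging.getLogger("demo")
log.warning("setup done")
for i in range(2):
    log.warning("step %d", i)
'''

handler = LineTaggingHandler()
logging.getLogger().addHandler(handler)
try:
    run_traced(SCRIPT, handler)
finally:
    logging.getLogger().removeHandler(handler)

for line, message in handler.records:
    print(f"line {line}: {message}")
```

Messages logged inside the loop are attributed to the line where the `for` statement starts, because only top-level statements are tracked.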
import sklearn
import numpy as np
import torch
sklearn.set_config(array_api_dispatch=True)
def my_code(X, cdist=False):
    if cdist:
        dist = torch.cdist(X, X, p=2)
[215/275] Linking CXX shared library libcuml++.so
FAILED: libcuml++.so
: && /datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/bin/x86_64-conda-linux-gnu-c++ -fPIC -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/include -I/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/include -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/lib -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/targets/x86_64-linux/lib/stubs -O3 -DNDEBUG -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -Wl,-rpath-link,/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -L/datasets/thead/mambaforge/envs/cuml-dev-23.12-dgx15/lib -L/datasets/thead/mamb
betatim / mamba.log
Created April 8, 2024 12:27
Result of running `mamba env create -n sklearn-min-docs -f .build_tools/circle/doc_min_dependencies_environment.yml`
conda-forge/osx-arm64 Using cache
conda-forge/noarch Using cache
Looking for: ['python=3.9', 'numpy=1.19.5', 'blas', 'scipy=1.6.0', 'cython=3.0.10', 'joblib', 'threadpoolctl', 'matplotlib=3.3.4', 'pandas=1.1.5', 'pyamg', "pytest[version='<8']", 'pytest-xdist', 'pillow', 'pip', 'ninja', 'meson-python', 'scikit-image=0.17.2', 'seaborn', 'memory_profiler', 'compilers', 'sphinx=6.0.0', 'sphinx-gallery=0.15.0', 'sphinx-copybutton=0.5.2', 'numpydoc=1.2.0', 'sphinx-prompt=1.3.0', 'plotly=5.14.0', 'polars=0.19.12', 'pooch', 'pip']
Could not solve for environment specs
The following packages are incompatible
├─ numpy 1.19.5** is installable with the potential options

Building for bare metal

Check out https://github.com/rapidsai/gpu-xb-ai:

git clone https://github.com/rapidsai/gpu-xb-ai

Create a conda environment from conda/environments/gpu-xb-ai-legate-all.yaml:

conda env create -f conda/environments/gpu-xb-ai-legate-all.yaml
==================================================================================== short test summary info ====================================================================================
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-cupy.array_api-None-None] - ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy-None-None] - ValueError: kind can only be None or 'stable'
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy.array_api-None-None] - ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-cupy.array_api-None-None] - ValueError: The truth value of an array with
betatim / myscript.py
Last active September 22, 2023 07:35
Working out how to `mpirun` dask with cuda
from dask_mpi import initialize
from dask import distributed
def dask_info():
    distributed.print("woah i'm running!")
    distributed.print("ncores:", client.ncores())
    distributed.print()
    distributed.print(client.scheduler_info())