
Russell Jurney rjurney

@rjurney
rjurney / groupage.md
Last active October 2, 2025 22:24
Claude Code command to group data, count the size of the groups, look at and display high / low superkeys, and sample and display grouped records
allowed-tools: pyspark-mcp, WebFetch, Web Search, Bash(python:*), Bash(poetry:*), Bash(pyspark:*)
description: Groupage command is used to group data, count the group size, plot a histogram and display both the keys of the largest groups and those of groups in a middle range. Arguments include the column to group by,

Groupage Command

Description

Realize that the unique values of fields of real world datasets often have long-tail, log scale distributions. This creates 'superkeys' that can cause problems in downstream code. The groupage command is used to identify and mitigate these superkeys.
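
The idea of counting group sizes and spotting superkeys can be sketched in plain Python. This is only an illustration, not the gist's actual command: the helper names (`group_sizes`, `log_histogram`, `superkeys`), the `domain` column, the sample records, and the threshold of 100 are all assumptions made for the example. A real run would more likely use a PySpark `groupBy(...).count()` on the full dataset.

```python
from collections import Counter

def group_sizes(records, key):
    """Count how many records share each value of `key` (hypothetical helper)."""
    return Counter(r[key] for r in records)

def log_histogram(sizes):
    """Bucket group sizes by order of magnitude.

    Bucket b holds groups whose size n satisfies 10**b <= n < 10**(b + 1).
    """
    hist = Counter()
    for n in sizes.values():
        # For a positive integer, digit count minus one equals floor(log10(n)).
        hist[len(str(n)) - 1] += 1
    return dict(sorted(hist.items()))

def superkeys(sizes, threshold):
    """Return (key, size) pairs at or above `threshold`, largest first."""
    return [(k, n) for k, n in sizes.most_common() if n >= threshold]

# Made-up sample data with a long tail: two huge groups, fifty singletons.
records = (
    [{"domain": "gmail.com"}] * 1000
    + [{"domain": "yahoo.com"}] * 120
    + [{"domain": f"corp{i}.example"} for i in range(50)]
)

sizes = group_sizes(records, "domain")
print(log_histogram(sizes))   # group counts per order-of-magnitude bucket
print(superkeys(sizes, 100))  # keys whose groups could skew downstream joins
```

Bucketing on a log scale keeps the handful of very large groups visible next to the many tiny ones, which a linear histogram would flatten.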

@rjurney
rjurney / Pynvml_Rich_GPU_Monitor.sql
Created October 2, 2025 17:51
Pynvml / Rich GPU Monitor with min/max
┏━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ GPU ID ┃ GPU Util % ┃ Mem Used (MB) ┃ Mem Total (MB) ┃ Mem % ┃ Min Mem (MB) ┃ Max Mem (MB) ┃
┡━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ 0 │ 100.0 │ 9856.0 │ 12288.0 │ 80.2 │ 9816.0 │ 9856.0 │
│ 1 │ 75.0 │ 4534.4 │ 12288.0 │ 36.9 │ 4534.4 │ 4534.4 │
└────────┴────────────┴───────────────┴────────────────┴───────┴──────────────┴──────────────┘
{
"hooks": {
"PostToolUse": [
{
"matcher": "*",
"hooks": [
{
"type": "command",
"command": "if command -v osascript >/dev/null 2>&1; then osascript -e 'beep 1'; elif command -v notify-send >/dev/null 2>&1; then notify-send 'Claude Code' \"Tool: $CLAUDE_TOOL_NAME completed\"; fi"
}
@rjurney
rjurney / README.md
Created August 29, 2025 21:00
Graphlet AI Claude Code PySpark Guide - customized Palantir-PySpark-Guide for effective PySpark in Claude Code

Note: this style guide is an edit of the Palantir Style guide, for which I am very grateful! You may use this one or edit theirs as a starting point for your own agent-based PySpark code.

Palantir PySpark Style Guide

PySpark Style Guide

PySpark is a wrapper language that allows users to interface with an Apache Spark backend to quickly process data. Spark can operate on massive datasets across a distributed network of servers, providing major performance and reliability benefits when utilized correctly. It presents challenges, even for experienced Python developers, as the PySpark syntax draws on the JVM heritage of Spark and therefore implements code patterns that may be unfamiliar.

This opinionated guide to PySpark code style presents common situations we've encountered and the associated best practices based on the most frequent recurring topics across PySpark repos.

@rjurney
rjurney / merged.baml
Created August 16, 2025 06:28
Merged record includes full corporate name, as determined by the 'name' field @description :)
{
"name": "Nvidia Corporation",
"ticker": {
"symbol": "NVDA",
"exchange": "NASDAQ"
},
"description": "An American technology company, founded in 1993, specializing in GPUs (e.g., Blackwell), SoCs, and full-stack AI computing platforms like DGX Cloud. A dominant player in the AI, gaming, and data center markets, it is led by CEO Jensen Huang and headquartered in Santa Clara, California.",
"website_url": "null",
"headquarters_location": "Santa Clara, California, USA",
"revenue_usd": 10918000000,
@rjurney
rjurney / before.json
Created August 16, 2025 06:27
Company records to be merged with field description as metadata for guidance...
{
"name": "Nvidia Corporation",
"ticker": {
"symbol": "NVDA",
"exchange": "NASDAQ"
},
"description": "An American technology company, founded in 1993, specializing in GPUs (e.g., Blackwell), SoCs, and full-stack AI computing platforms like DGX Cloud. A dominant player in the AI, gaming, and data center markets, it is led by CEO Jensen Huang and headquartered in Santa Clara, California.",
"website_url": "null",
"headquarters_location": "Santa Clara, California, USA",
"revenue_usd": 10918000000,
@rjurney
rjurney / company.baml
Created August 16, 2025 06:23
BAML field annotations guide extraction, matching and merging!
class Company {
name string
@description("Formal name of the company with corporate suffix")
...
}
@rjurney
rjurney / settings.json
Created August 10, 2025 07:53
Claude Settings for my Project - do not actually result in autonomously running these commands...
{
"permissions": {
"allow": [
"Bash(git status:*)",
"Bash(git diff:*)",
"Bash(git log:*)",
"Bash(git add:*)",
"Bash(git grep:*)",
"Bash(poetry update:*)",
"Bash(pip show:*)",
@rjurney
rjurney / clients.baml
Created August 9, 2025 04:51
LLM Entity Matching + Merging Proof-of-Concept with BAML + Gemini 2.5 Pro
client<llm> Gemini25Pro {
provider google-ai
retry_policy ThreeRetries
options {
model "gemini-2.5-pro"
api_key env.GEMINI_API_KEY
}
}
retry_policy ThreeRetries {
@rjurney
rjurney / clients.baml
Last active August 6, 2025 12:54
BAML for basic information extraction of company records
client<llm> Gemini25Pro {
provider google-ai
retry_policy ThreeRetries
options {
model "gemini-2.5-pro"
api_key env.GEMINI_API_KEY
}
}
retry_policy ThreeRetries {