grahama1970’s gists

grahama1970 / rag_classifer_unified.py

Created January 31, 2025 16:19

RAG-based classifier for determining sentence complexity (proof of concept only)

	#!/usr/bin/env python3

	import os
	import time
	from typing import List, Dict, Any
	from functools import partial
	from concurrent.futures import ThreadPoolExecutor, as_completed

	import torch
	import torch.nn.functional as F
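
The preview above stops at the imports, so the gist's actual method is not visible here. As a rough illustration of what a retrieval-based classifier can look like, the sketch below retrieves the most similar labeled examples by cosine similarity and takes a majority vote. Everything in it (the function names, the toy two-dimensional vectors, the `simple`/`complex` labels, and `k`) is an assumption made for illustration, not the author's code.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_retrieval(
    query_vec: Sequence[float],
    labeled: List[Tuple[Sequence[float], str]],
    k: int = 3,
) -> str:
    """Retrieve the k most similar labeled vectors and return their majority label."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(query_vec, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy embeddings; a real system would use a sentence-embedding model.
examples = [
    ([1.0, 0.0], "simple"),
    ([0.9, 0.1], "simple"),
    ([0.0, 1.0], "complex"),
    ([0.1, 0.9], "complex"),
    ([0.2, 0.8], "complex"),
]
print(classify_by_retrieval([0.95, 0.05], examples, k=3))
```

With real embeddings the same vote could be weighted by similarity, which tends to matter when the retrieved neighbors are far from the query.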

grahama1970 / 01_inference.py

Last active January 31, 2025 13:35

Training a DistilBERT model to determine question complexity before the question is sent to a smolagent

	from complexity.file_utils import get_project_root, load_env_file
	import torch
	from transformers import (
	    DistilBertTokenizerFast,
	    DistilBertForSequenceClassification
	)
	from loguru import logger
	import os
	import time # Add this at the top with other imports

grahama1970 / ast_output.json

Last active January 26, 2025 22:24

DeepSeek structured vs. (hacked) unstructured outputs: This project evaluates DeepSeek's string-based (Markdown) and JSON-based outputs to determine which approach is better suited for structured storage and processing. By converting the Markdown output into an Abstract Syntax Tree (AST) and storing the results in ArangoDB, we test whether the ad…

	[
	  {
	    "type": "Heading",
	    "children": [
	      {
	        "type": "RawText",
	        "children": [],
	        "content": "Paris: The Enchanting Capital of France"
	      }
	    ],
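
To make the Markdown-to-AST step concrete, here is a minimal sketch that turns headings and plain paragraphs into nodes shaped like the preview above (`type`, `children`, and a `RawText` leaf with `content`). It is a hand-rolled parser written for illustration only; the gist presumably uses a Markdown library, and the `Paragraph` node type and `level` field are assumptions that do not appear in the preview.

```python
import json
import re
from typing import Any, Dict, List


def markdown_to_ast(markdown: str) -> List[Dict[str, Any]]:
    """Convert ATX headings and plain text lines into nested AST nodes."""
    nodes: List[Dict[str, Any]] = []
    for line in markdown.splitlines():
        text = line.strip()
        if not text:
            continue  # blank lines only separate blocks
        raw = {"type": "RawText", "children": [], "content": text}
        heading = re.match(r"^(#{1,6})\s+(.*)$", text)
        if heading:
            raw["content"] = heading.group(2)
            # "level" is an assumed field; the preview is cut off before any such key.
            nodes.append(
                {"type": "Heading", "children": [raw], "level": len(heading.group(1))}
            )
        else:
            # "Paragraph" is an assumed node type for non-heading lines.
            nodes.append({"type": "Paragraph", "children": [raw]})
    return nodes


ast = markdown_to_ast(
    "# Paris: The Enchanting Capital of France\n\nParis is the capital."
)
print(json.dumps(ast, indent=2))
```

Each node is a plain dictionary, so the resulting list can be serialized to JSON and stored as documents without further conversion.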

grahama1970 / ask-obsidian-result.tsx

Last active January 23, 2025 17:20

Debug for a Raycast extension that tries to access Obsidian's Smart Chat Conversations (within Raycast) to ask questions of local documents

	import { Action, Detail, LaunchProps } from "@raycast/api";
	import { getConfig } from "./utils/preferences";

	interface Preferences {
	  obsidianVaultPath: string;
	}

	export default function ResultView(props: LaunchProps<{ context: { answer: string } }>) {
	  const { obsidianVaultPath } = getConfig();

grahama1970 / globi_tab.applescript

Created January 22, 2025 21:12

GlobiTab is a Raycast script that intelligently manages Chrome tabs using quicklinks (like 'gh' for GitHub). Unlike Raycast's built-in quicklinks that always create new tabs, GlobiTab first checks if the target URL already exists in any window. If found, it switches to that tab instead of creating a duplicate.

	#!/usr/bin/osascript

	# Required parameters:
	# @raycast.schemaVersion 1
	# @raycast.title GlobiTab
	# @raycast.mode silent
	# @raycast.icon 🔍
	# @raycast.packageName GlobiTab
	# @raycast.argument1 { "type": "text", "placeholder": "Tab Name/URL/Keyword", "optional": false }
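
The switch-or-open behavior described for GlobiTab can be sketched in Python with browser windows modeled as plain lists of URLs. This is only an illustration of the decision logic, not the AppleScript itself: the `QUICKLINKS` table (beyond the `gh` example from the description), the prefix-based URL match, and the function names are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical quicklink table; only the 'gh' -> GitHub pairing comes from the description.
QUICKLINKS: Dict[str, str] = {"gh": "https://github.com"}


def find_tab(windows: List[List[str]], target_url: str) -> Optional[Tuple[int, int]]:
    """Return (window_index, tab_index) of the first tab whose URL starts with target_url."""
    for w, tabs in enumerate(windows):
        for t, url in enumerate(tabs):
            if url.startswith(target_url):
                return (w, t)
    return None


def open_or_switch(windows: List[List[str]], keyword: str) -> Tuple[str, int, int]:
    """Switch to an existing tab for the keyword, or open a new tab if none exists."""
    target = QUICKLINKS.get(keyword, keyword)  # unknown keywords are treated as URLs
    hit = find_tab(windows, target)
    if hit is not None:
        return ("switch", hit[0], hit[1])
    if not windows:
        windows.append([])  # no window open yet: create one
    windows[0].append(target)
    return ("open", 0, len(windows[0]) - 1)


windows = [
    ["https://example.com"],
    ["https://news.ycombinator.com", "https://github.com/grahama1970"],
]
print(open_or_switch(windows, "gh"))
print(open_or_switch(windows, "https://arangodb.com"))
```

Searching every window before opening anything is what prevents the duplicate tabs that the description attributes to the built-in quicklinks.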

grahama1970 / get_env_value.sh

Last active January 22, 2025 01:10

Efficient .env Key Retrieval for Raycast: A Raycast Script Command to search and copy environment variable values from a .env file. Supports exact matching, abbreviation shortcuts, and fuzzy search with fzf. Outputs the value to the terminal and clipboard for seamless workflows.

	#!/bin/bash

	# Description:
	# A Raycast script for quickly finding environment variables in .env files.
	# Matches keys in three ways:
	# 1. Abbreviations: "aak" → "AWS_ACCESS_KEY", "gpt" → "GITHUB_PAT_TOKEN"
	# 2. Partial matches: "shap" → "SHAPE", "aws" → "AWS_KEY"
	# 3. Fuzzy finding: "ath" → "AUTH_TOKEN"

	# Matched values are printed to the terminal and copied to the clipboard with pbcopy (preinstalled on macOS)
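
The three matching modes listed in the comments above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a port of the shell script: abbreviations are taken to be the initials of the underscore-separated parts, the fuzzy step is a simple subsequence test standing in for fzf, and the order in which the modes are tried is assumed.

```python
from typing import List


def is_abbreviation(query: str, key: str) -> bool:
    """True if query equals the initials of the key's underscore-separated parts."""
    initials = "".join(part[0] for part in key.split("_") if part)
    return query.lower() == initials.lower()


def is_partial(query: str, key: str) -> bool:
    """True if query is a case-insensitive substring of key."""
    return query.lower() in key.lower()


def is_subsequence(query: str, key: str) -> bool:
    """True if the query's characters appear in key in order (a stand-in for fzf)."""
    chars = iter(key.lower())
    return all(ch in chars for ch in query.lower())


def match_keys(query: str, keys: List[str]) -> List[str]:
    """Try each mode in turn and return the hits from the first mode that matches."""
    for matcher in (is_abbreviation, is_partial, is_subsequence):
        hits = [k for k in keys if matcher(query, k)]
        if hits:
            return hits
    return []


keys = ["AWS_ACCESS_KEY", "GITHUB_PAT_TOKEN", "SHAPE", "AUTH_TOKEN"]
for q in ("aak", "gpt", "shap", "ath"):
    print(q, "->", match_keys(q, keys))
```

Trying the strictest mode first keeps short queries such as `aak` from also pulling in every key that merely contains those letters in order.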

grahama1970 / bm25_embedding_keyword_combined.aql

Last active January 16, 2025 14:03

ArangoDB hybrid search implementation combining BM25 text search, embedding similarity (using sentence-transformers), and keyword matching. Includes Python utilities and an AQL query for intelligent document retrieval with configurable thresholds and scoring. RapidFuzz may be added later for post-processing.

	LET results = (
	  // Get embedding results
	  LET embedding_results = (
	    FOR doc IN glossary_view
	      LET similarity = COSINE_SIMILARITY(doc.embedding, @embedding_search)
	      FILTER similarity >= @embedding_similarity_threshold
	      SORT similarity DESC
	      LIMIT @top_n
	      RETURN {
	        doc: doc,
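
The query preview is cut off before the point where the three result sets are merged, so the gist's scoring formula is not visible. One plausible way to combine them, sketched in Python, is to threshold the embedding similarities, scale BM25 scores by their maximum, add a fixed bonus for keyword hits, and sort by the sum. The function name, the equal weighting, and the default threshold are assumptions for illustration.

```python
from typing import Dict, List, Set


def hybrid_rank(
    embedding_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    keyword_hits: Set[str],
    embedding_threshold: float = 0.5,
    top_n: int = 3,
) -> List[str]:
    """Rank document ids by the sum of three signals (equal weights are assumed)."""
    max_bm25 = max(bm25_scores.values(), default=0.0)
    combined: Dict[str, float] = {}

    # Embedding similarity counts only when it clears the threshold.
    for doc_id, sim in embedding_scores.items():
        if sim >= embedding_threshold:
            combined[doc_id] = combined.get(doc_id, 0.0) + sim

    # BM25 scores are unbounded, so scale them into [0, 1] by the maximum.
    if max_bm25 > 0:
        for doc_id, score in bm25_scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + score / max_bm25

    # An exact keyword hit adds a fixed bonus.
    for doc_id in keyword_hits:
        combined[doc_id] = combined.get(doc_id, 0.0) + 1.0

    # Highest combined score first; ties are broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))[:top_n]


ranking = hybrid_rank(
    embedding_scores={"a": 0.9, "b": 0.4, "c": 0.7},
    bm25_scores={"b": 4.0, "c": 2.0},
    keyword_hits={"c"},
)
print(ranking)
```

Scaling BM25 before summing matters because raw BM25 values would otherwise dominate cosine similarities, which are bounded by 1.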

grahama1970 / aql_utils.py

Last active January 11, 2025 14:59

This script implements a hybrid search system using ArangoDB that combines: (1) vector similarity search using COSINE_SIMILARITY, (2) BM25 text search with a custom text analyzer, and (3) fuzzy string matching using Levenshtein distance.

	import os
	from loguru import logger
	from typing import List, Dict

	def load_aql_query(filename: str) -> str:
	    """
	    Load an AQL query from a file.
	    """
	    try:
	        file_path = os.path.join("app/backend/vllm/beta/utils/aql", filename)
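
For the fuzzy string matching step named in the description, the following is a minimal, self-contained sketch of Levenshtein distance and a distance-based filter. It illustrates the technique in plain Python and is not taken from the gist; the `fuzzy_filter` helper and its `max_distance` default are hypothetical.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        previous = current
    return previous[-1]


def fuzzy_filter(query: str, candidates: List[str], max_distance: int = 2) -> List[str]:
    """Keep candidates within max_distance edits of the query, ignoring case."""
    q = query.lower()
    return [c for c in candidates if levenshtein(q, c.lower()) <= max_distance]


print(levenshtein("kitten", "sitting"))
print(fuzzy_filter("glosary", ["glossary", "glacier", "index"]))
```

Keeping only two rows of the dynamic-programming table holds memory proportional to the shorter string, which is useful when filtering many candidates.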

grahama1970 / hf_only_inference_sanity_check.py.py

Last active December 27, 2024 21:07

For dynamic adapter loading and inference, Unsloth inference works fine; using Hugging Face does not work (outputs are garbled)

	# Doesn't Work. Outputs are garbled
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	from loguru import logger

	# Configuration
	BASE_MODEL_NAME = "unsloth/Phi-3.5-mini-instruct"
	ADAPTER_PATH = "/home/grahama/dev/vllm_lora/training_output/Phi-3.5-mini-instruct_touch-rugby-rules_adapter/final_model"

grahama1970 / tinyllama_custom_adaptor.py

Last active December 21, 2024 22:26

tinyllama_model_merge_wip: well, I thought I could make this work... maybe leave it for another time

	import os
	import torch
	import gc
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
	from peft import PeftModel
	from huggingface_hub import snapshot_download
	from transformers import (
	    AutoModelForCausalLM,
	    AutoTokenizer,
	    pipeline,

Graham grahama1970