@grahama1970
grahama1970 / rag_classifer_unified.py
Created January 31, 2025 16:19
RAG-based classifier for determining sentence complexity (proof of concept only)
#!/usr/bin/env python3
import os
import time
from typing import List, Dict, Any
from functools import partial
from concurrent.futures import ThreadPoolExecutor, as_completed
import torch
import torch.nn.functional as F
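A retrieval-based classifier of this kind can be sketched as nearest-neighbour voting: embed the query, retrieve the most similar labelled examples, and take the majority label. The sketch below assumes embeddings are already computed and uses a hypothetical `classify_by_retrieval` helper with toy 2-D vectors; it illustrates the idea and is not the gist's actual implementation.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_retrieval(
    query_embedding: List[float],
    examples: List[Tuple[List[float], str]],
    k: int = 3,
) -> str:
    """Hypothetical helper: majority label among the k most similar examples."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_embedding, ex[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(top_labels).most_common(1)[0][0]


# Toy "embeddings"; real ones would come from a sentence encoder
examples = [
    ([1.0, 0.1], "simple"),
    ([0.9, 0.2], "simple"),
    ([0.1, 1.0], "complex"),
    ([0.2, 0.9], "complex"),
    ([0.15, 0.95], "complex"),
]
print(classify_by_retrieval([0.1, 0.9], examples, k=3))  # complex
```

Majority voting keeps the sketch transparent; weighting each vote by its similarity score is a common refinement.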
grahama1970 / 01_inference.py
Last active January 31, 2025 13:35
Training a DistilBERT model to determine question complexity before the question is sent to a smolagent
from complexity.file_utils import get_project_root, load_env_file
import torch
from transformers import (
DistilBertTokenizerFast,
DistilBertForSequenceClassification
)
from loguru import logger
import os
import time
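After a `DistilBertForSequenceClassification` forward pass, the model returns one raw logit per class; turning those into a complexity label is a softmax followed by an argmax. A dependency-free sketch, assuming a hypothetical two-label scheme (`simple`, `complex`) that may not match the labels the gist actually trains:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical label mapping; the real model's id2label config may differ
ID2LABEL: Dict[int, str] = {0: "simple", 1: "complex"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = logits_to_label([-1.2, 2.3])
print(label, round(confidence, 3))  # complex 0.971
```

With torch, the same step is `torch.softmax(outputs.logits, dim=-1)` followed by `argmax`.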
grahama1970 / ast_output.json
Last active January 26, 2025 22:24
DeepSeek structured vs. (hacked) unstructured outputs: This project evaluates DeepSeek's string-based (Markdown) and JSON-based outputs to determine which approach is better suited to structured storage and processing. By converting the Markdown output into an Abstract Syntax Tree (AST) and storing the results in ArangoDB, we test whether the ad…
[
{
"type": "Heading",
"children": [
{
"type": "RawText",
"children": [],
"content": "Paris: The Enchanting Capital of France"
}
],
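To store a Markdown AST like the one above in a document database, one option is to flatten it into one record per top-level node holding its concatenated text. A minimal sketch, assuming nodes follow the shape shown (`type`, `children`, and `content` on leaf `RawText` nodes); `flatten_ast` is a hypothetical helper, and the `Paragraph`/`Strong` nodes in the sample are invented for illustration.

```python
import json
from typing import Any, Dict, List


def node_text(node: Dict[str, Any]) -> str:
    """Recursively concatenate the text content of a node and its children."""
    text = node.get("content", "")
    for child in node.get("children", []):
        text += node_text(child)
    return text


def flatten_ast(ast: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Hypothetical helper: one record per top-level node, ready for insertion."""
    return [{"type": node["type"], "text": node_text(node)} for node in ast]


sample = json.loads("""
[
  {"type": "Heading",
   "children": [{"type": "RawText", "children": [],
                 "content": "Paris: The Enchanting Capital of France"}]},
  {"type": "Paragraph",
   "children": [{"type": "RawText", "children": [], "content": "Paris is "},
                {"type": "Strong", "children": [
                    {"type": "RawText", "children": [], "content": "lovely"}]}]}
]
""")
for record in flatten_ast(sample):
    print(record)
```

Flattening loses inline structure such as emphasis, so a real pipeline might store the nested AST alongside the flat text.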
grahama1970 / ask-obsidian-result.tsx
Last active January 23, 2025 17:20
Debugging a Raycast extension that tries to access Obsidian's Smart Chat conversations (from within Raycast) to ask questions of local documents
import { Action, Detail, LaunchProps } from "@raycast/api";
import { getConfig } from "./utils/preferences";
interface Preferences {
obsidianVaultPath: string;
}
export default function ResultView(props: LaunchProps<{ context: { answer: string } }>) {
const { obsidianVaultPath } = getConfig();
grahama1970 / globi_tab.applescript
Created January 22, 2025 21:12
GlobiTab is a Raycast script that intelligently manages Chrome tabs using quicklinks (like 'gh' for GitHub). Unlike Raycast's built-in quicklinks that always create new tabs, GlobiTab first checks if the target URL already exists in any window. If found, it switches to that tab instead of creating a duplicate.
#!/usr/bin/osascript
# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title GlobiTab
# @raycast.mode silent
# @raycast.icon 🔍
# @raycast.packageName GlobiTab
# @raycast.argument1 { "type": "text", "placeholder": "Tab Name/URL/Keyword", "optional": false }
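The core GlobiTab decision (reuse an existing tab when one matches, open a new one only otherwise) can be sketched in Python over a plain data model. This illustrates the logic only, not the AppleScript itself; the nested list of tab URLs and the `find_tab`/`open_or_switch` helpers are hypothetical stand-ins for Chrome's scripting objects, and indices here are 0-based.

```python
from typing import List, Optional, Tuple


def find_tab(windows: List[List[str]], target: str) -> Optional[Tuple[int, int]]:
    """Return (window_index, tab_index) of the first tab whose URL contains target."""
    needle = target.lower()
    for w_idx, tabs in enumerate(windows):
        for t_idx, url in enumerate(tabs):
            if needle in url.lower():
                return w_idx, t_idx
    return None


def open_or_switch(windows: List[List[str]], target_url: str) -> str:
    """Switch to an existing tab if found; otherwise append a new tab to window 0."""
    hit = find_tab(windows, target_url)
    if hit is not None:
        return f"switch to window {hit[0]}, tab {hit[1]}"
    if not windows:
        windows.append([])  # no windows open: create one first
    windows[0].append(target_url)
    return "opened new tab"


windows = [
    ["https://mail.google.com/"],
    ["https://news.ycombinator.com/", "https://github.com/grahama1970"],
]
print(open_or_switch(windows, "https://github.com"))  # switch to window 1, tab 1
```

Scanning every window before creating anything is what avoids the duplicate tabs that plain quicklinks produce.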
grahama1970 / get_env_value.sh
Last active January 22, 2025 01:10
Efficient .env Key Retrieval for Raycast: a Raycast Script Command to search and copy environment variable values from a .env file. Supports exact matching, abbreviation shortcuts, and fuzzy search with fzf. Outputs the value to the terminal and clipboard for seamless workflows.
#!/bin/bash
# Description:
# A Raycast script for quickly finding environment variables in .env files.
# Matches keys in three ways:
# 1. Abbreviations: "aak" → "AWS_ACCESS_KEY", "gpt" → "GITHUB_PAT_TOKEN"
# 2. Partial matches: "shap" → "SHAPE", "aws" → "AWS_KEY"
# 3. Fuzzy finding: "ath" → "AUTH_TOKEN"
# Matched values are copied to the clipboard with pbcopy and printed to the terminal (pbcopy ships with macOS; other platforms need an equivalent such as xclip)
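The three matching modes can be sketched in Python (the original is Bash plus fzf). Assumptions: modes are tried in the listed order, the first hit wins, and keys are scanned in file order; the simple in-order subsequence test stands in for fzf, whose scoring is far more sophisticated. `match_key` and its helpers are hypothetical.

```python
from typing import List, Optional


def is_abbreviation(query: str, key: str) -> bool:
    """True if query spells the initials of key's underscore-separated parts."""
    initials = "".join(part[0] for part in key.split("_") if part)
    return query.lower() == initials.lower()


def is_subsequence(query: str, key: str) -> bool:
    """Simple fuzzy test: query chars appear in key in order (stand-in for fzf)."""
    it = iter(key.lower())
    # `ch in it` advances the iterator, so order is enforced
    return all(ch in it for ch in query.lower())


def match_key(query: str, keys: List[str]) -> Optional[str]:
    """Try abbreviation, then partial (substring), then fuzzy matching."""
    q = query.lower()
    for key in keys:
        if is_abbreviation(q, key):
            return key
    for key in keys:
        if q in key.lower():
            return key
    for key in keys:
        if is_subsequence(q, key):
            return key
    return None


keys = ["AWS_KEY", "AWS_ACCESS_KEY", "GITHUB_PAT_TOKEN", "SHAPE", "AUTH_TOKEN"]
for q in ["aak", "gpt", "shap", "aws", "ath"]:
    print(q, "->", match_key(q, keys))
```

Because partial matching returns the first substring hit, key order matters: "aws" resolves to AWS_KEY here only because it precedes AWS_ACCESS_KEY.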
grahama1970 / bm25_embedding_keyword_combined.aql
Last active January 16, 2025 14:03
ArangoDB hybrid search implementation combining BM25 text search, embedding similarity (using sentence-transformers), and keyword matching. Includes Python utilities and an AQL query for document retrieval with configurable thresholds and scoring. RapidFuzz may be added later for post-processing.
LET results = (
// Get embedding results
LET embedding_results = (
FOR doc IN glossary_view
LET similarity = COSINE_SIMILARITY(doc.embedding, @embedding_search)
FILTER similarity >= @embedding_similarity_threshold
SORT similarity DESC
LIMIT @top_n
RETURN {
doc: doc,
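The embedding branch of the query (cosine similarity, threshold filter, descending sort, top-N limit) can be mirrored in plain Python, which helps when sanity-checking scores outside ArangoDB. A sketch assuming each document carries an `embedding` list; the parameter names follow the AQL bind variables, while the Python function itself and its `{"doc", "similarity"}` return shape are hypothetical, since the AQL `RETURN` above is truncated.

```python
import math
from typing import Any, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embedding_results(
    docs: List[Dict[str, Any]],
    embedding_search: List[float],
    embedding_similarity_threshold: float,
    top_n: int,
) -> List[Dict[str, Any]]:
    """Mirror of: FILTER similarity >= threshold, SORT similarity DESC, LIMIT top_n."""
    scored = [
        {"doc": doc, "similarity": cosine_similarity(doc["embedding"], embedding_search)}
        for doc in docs
    ]
    kept = [r for r in scored if r["similarity"] >= embedding_similarity_threshold]
    kept.sort(key=lambda r: r["similarity"], reverse=True)
    return kept[:top_n]
```

Running the same documents through both this function and the AQL query is a quick way to confirm the threshold behaves as intended.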
grahama1970 / aql_utils.py
Last active January 11, 2025 14:59
This script implements a hybrid search system using ArangoDB that combines: 1. Vector similarity search using COSINE_SIMILARITY 2. BM25 text search with a custom text analyzer 3. Fuzzy string matching using Levenshtein distance
import os
from loguru import logger
from typing import List, Dict
def load_aql_query(filename: str) -> str:
"""
Load an AQL query from a file.
"""
try:
file_path = os.path.join("app/backend/vllm/beta/utils/aql", filename)
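The fuzzy-matching leg relies on Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. AQL exposes this as `LEVENSHTEIN_DISTANCE()`; for checking results client-side, a standard two-row dynamic-programming version looks like this (a sketch, not the gist's code):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of the DP table (O(min(len)) memory)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A normalized score such as `1 - d / max(len(a), len(b))` (guarding against two empty strings) is one common way to turn the distance into a threshold-friendly similarity.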
grahama1970 / hf_only_inference_sanity_check.py.py
Last active December 27, 2024 21:07
For dynamic adapter loading and inference, Unsloth inference works fine; plain Hugging Face inference does not (outputs are garbled)
# Doesn't Work. Outputs are garbled
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from loguru import logger
# Configuration
BASE_MODEL_NAME = "unsloth/Phi-3.5-mini-instruct"
ADAPTER_PATH = "/home/grahama/dev/vllm_lora/training_output/Phi-3.5-mini-instruct_touch-rugby-rules_adapter/final_model"
grahama1970 / tinyllama_custom_adaptor.py
Last active December 21, 2024 22:26
tinyllama_model_merge_wip: Well, I thought I could make this work... maybe leave it for another time
import os
import torch
import gc
from peft import PeftModel
from huggingface_hub import snapshot_download
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
pipeline,