
@grahama1970
grahama1970 / bm25_embedding_keyword_combined.aql
Last active January 16, 2025 14:03
ArangoDB hybrid search implementation combining BM25 text search, embedding similarity (using sentence-transformers), and keyword matching. Includes Python utilities and an AQL query for document retrieval with configurable thresholds and scoring. RapidFuzz may be added later for post-processing.
LET results = (
// Get embedding results
LET embedding_results = (
FOR doc IN glossary_view
LET similarity = COSINE_SIMILARITY(doc.embedding, @embedding_search)
FILTER similarity >= @embedding_similarity_threshold
SORT similarity DESC
LIMIT @top_n
RETURN {
doc: doc,
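The embedding branch of the query above keeps documents whose cosine similarity meets `@embedding_similarity_threshold`, sorts them in descending order, and limits the result to `@top_n`. The Python sketch below mirrors that step in memory and then fuses it with BM25 and keyword signals. It is an illustration under assumptions, not the gist's scoring: plain float lists stand in for sentence-transformers embeddings, and `merge_scores` (its weights and the divide-by-max normalization of BM25) is a hypothetical fusion rule.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_results(docs: Dict[str, Sequence[float]], query: Sequence[float],
                      threshold: float, top_n: int) -> List[Tuple[str, float]]:
    """In-memory analogue of the AQL subquery: FILTER >= threshold, SORT DESC, LIMIT."""
    scored = [(key, cosine_similarity(vec, query)) for key, vec in docs.items()]
    kept = [(key, sim) for key, sim in scored if sim >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]


def merge_scores(bm25: Dict[str, float], embedding: Dict[str, float],
                 keyword_hits: Set[str], w_bm25: float = 0.4,
                 w_embed: float = 0.4, w_keyword: float = 0.2) -> List[Tuple[str, float]]:
    """Hypothetical weighted-sum fusion; BM25 is divided by its max to land in [0, 1]."""
    max_bm25 = max(bm25.values(), default=0.0)
    combined: Dict[str, float] = {}
    for key in set(bm25) | set(embedding) | keyword_hits:
        bm25_norm = bm25.get(key, 0.0) / max_bm25 if max_bm25 > 0 else 0.0
        combined[key] = (w_bm25 * bm25_norm
                         + w_embed * embedding.get(key, 0.0)
                         + w_keyword * (1.0 if key in keyword_hits else 0.0))
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [3.0, 4.0], "c": [0.0, 1.0]}
    hits = embedding_results(docs, [1.0, 0.0], threshold=0.5, top_n=2)
    print(hits)  # → [('a', 1.0), ('b', 0.6)]
    print(merge_scores({"a": 4.0, "c": 2.0}, dict(hits), {"c"}))
```

Reciprocal rank fusion is a common alternative when BM25 and cosine scores are on very different scales; the weighted sum is used here only because it maps directly onto "configurable thresholds and scoring".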
grahama1970 / get_env_value.sh
Last active January 22, 2025 01:10
Efficient .env key retrieval for Raycast: a Raycast Script Command that searches a .env file and copies environment variable values. It supports exact matching, abbreviation shortcuts, and fuzzy search with fzf, and outputs the value to both the terminal and the clipboard.
#!/bin/bash
# Description:
# A Raycast script for quickly finding environment variables in .env files.
# Matches keys in three ways:
# 1. Abbreviations: "aak" → "AWS_ACCESS_KEY", "gpt" → "GITHUB_PAT_TOKEN"
# 2. Partial matches: "shap" → "SHAPE", "aws" → "AWS_KEY"
# 3. Fuzzy finding: "ath" → "AUTH_TOKEN"
# Matched values are printed to the terminal and copied to the clipboard with pbcopy (built into macOS; use xclip or similar elsewhere)
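The three matching strategies in the header comment can be sketched in Python as follows. This is not the script's code: `parse_env`, `abbreviation`, `is_subsequence`, and `find_key` are hypothetical names, the abbreviation rule (first letter of each underscore-separated part) is inferred from the examples, and a plain in-order subsequence test stands in for fzf's fuzzy ranking.

```python
from typing import Dict, List, Optional


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks/comments and stripping matching quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def abbreviation(key: str) -> str:
    """First letter of each underscore-separated part: AWS_ACCESS_KEY -> 'aak'."""
    return "".join(part[0] for part in key.lower().split("_") if part)


def is_subsequence(query: str, key: str) -> bool:
    """True if the characters of query appear in key in order ('ath' in 'auth_token')."""
    remaining = iter(key)
    return all(ch in remaining for ch in query)


def find_key(query: str, keys: List[str]) -> Optional[str]:
    """Try abbreviation, then substring, then subsequence matching (case-insensitive)."""
    q = query.lower()
    for matches in (
        lambda k: abbreviation(k) == q,          # 1. abbreviation
        lambda k: q in k.lower(),                # 2. partial match
        lambda k: is_subsequence(q, k.lower()),  # 3. fuzzy stand-in for fzf
    ):
        for key in keys:
            if matches(key):
                return key
    return None


if __name__ == "__main__":
    env = parse_env('AWS_ACCESS_KEY=abc\nGITHUB_PAT_TOKEN="ghp_x"\nAUTH_TOKEN=t0k\nSHAPE=circle\n')
    for query in ("aak", "gpt", "shap", "ath"):
        print(query, "->", find_key(query, list(env)))
```

On macOS the matched value could then be put on the clipboard with `subprocess.run(["pbcopy"], input=value, text=True)`.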
grahama1970 / globi_tab.applescript
Created January 22, 2025 21:12
GlobiTab is a Raycast script that manages Chrome tabs through quicklinks (like 'gh' for GitHub). Unlike Raycast's built-in quicklinks, which always open a new tab, GlobiTab first checks whether the target URL is already open in any window. If it is, GlobiTab switches to that tab instead of creating a duplicate.
#!/usr/bin/osascript
# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title GlobiTab
# @raycast.mode silent
# @raycast.icon 🔍
# @raycast.packageName GlobiTab
# @raycast.argument1 { "type": "text", "placeholder": "Tab Name/URL/Keyword", "optional": false }
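The switch-or-open decision described above can be modeled in Python, independently of Chrome. This sketch assumes each window is a list of tab URLs, that quicklink keywords resolve through a dictionary, and that a tab "matches" when it has the same host and its path starts with the target's path; `find_or_open` and that matching rule are hypothetical, since the actual script drives Chrome through AppleScript.

```python
from typing import Dict, List, Tuple, Union
from urllib.parse import urlparse

# ("switch", window_index, tab_index) or ("open", url)
TabAction = Union[Tuple[str, int, int], Tuple[str, str]]


def resolve(query: str, quicklinks: Dict[str, str]) -> str:
    """Map a quicklink keyword such as 'gh' to its URL; otherwise use the input as-is."""
    return quicklinks.get(query, query)


def tab_matches(tab_url: str, target: str) -> bool:
    """Assumed rule: same host, and the tab's path starts with the target's path."""
    tab, want = urlparse(tab_url), urlparse(target)
    return tab.netloc == want.netloc and tab.path.startswith(want.path.rstrip("/"))


def find_or_open(query: str, quicklinks: Dict[str, str],
                 windows: List[List[str]]) -> TabAction:
    """Return the first existing tab for the target URL, or ask to open a new one."""
    target = resolve(query, quicklinks)
    for w_index, tabs in enumerate(windows):
        for t_index, url in enumerate(tabs):
            if tab_matches(url, target):
                return ("switch", w_index, t_index)
    return ("open", target)


if __name__ == "__main__":
    quicklinks = {"gh": "https://github.com/"}
    windows = [["https://news.ycombinator.com/"],
               ["https://mail.google.com/", "https://github.com/grahama1970"]]
    print(find_or_open("gh", quicklinks, windows))  # → ('switch', 1, 1)
    print(find_or_open("https://example.com/", quicklinks, windows))  # → ('open', 'https://example.com/')
```

The argument placeholder also mentions tab names; matching on tab titles would need a second check that this sketch leaves out.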
grahama1970 / ask-obsidian-result.tsx
Last active January 23, 2025 17:20
Debugging code for a Raycast extension that tries to access Obsidian's Smart Chat conversations from within Raycast to ask questions about local documents
import { Action, Detail, LaunchProps } from "@raycast/api";
import { getConfig } from "./utils/preferences";
interface Preferences {
obsidianVaultPath: string;
}
export default function ResultView(props: LaunchProps<{ context: { answer: string } }>) {
const { obsidianVaultPath } = getConfig();
grahama1970 / ast_output.json
Last active January 26, 2025 22:24
DeepSeek structured vs. (hacked) unstructured outputs: This project evaluates DeepSeek's string-based (Markdown) and JSON-based outputs to determine which approach is better suited to structured storage and processing. By converting the Markdown output into an Abstract Syntax Tree (AST) and storing the results in ArangoDB, we test whether the ad…
[
{
"type": "Heading",
"children": [
{
"type": "RawText",
"children": [],
"content": "Paris: The Enchanting Capital of France"
}
],
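One way to store an AST like the one above in ArangoDB is to flatten the nested nodes into vertex documents plus parent-to-child edge documents. The sketch below assumes that layout; the `ast_nodes` collection name, the sequential `_key` scheme, and the `order` field on edges are hypothetical choices, and the sample input reuses the heading node shown above.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten_ast(nodes: List[Dict[str, Any]], vertex_coll: str = "ast_nodes"
                ) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Depth-first walk: one vertex per node, one edge per parent->child link."""
    vertices: List[Dict[str, Any]] = []
    edges: List[Dict[str, Any]] = []

    def visit(node: Dict[str, Any], parent_key: Optional[str], order: int) -> None:
        key = str(len(vertices))  # sequential keys: "0", "1", ...
        vertices.append({"_key": key, "type": node["type"],
                         "content": node.get("content")})
        if parent_key is not None:
            edges.append({"_from": f"{vertex_coll}/{parent_key}",
                          "_to": f"{vertex_coll}/{key}",
                          "order": order})  # preserves sibling order
        for index, child in enumerate(node.get("children", [])):
            visit(child, key, index)

    for index, root in enumerate(nodes):
        visit(root, None, index)
    return vertices, edges


if __name__ == "__main__":
    ast = [{"type": "Heading", "children": [
        {"type": "RawText", "children": [],
         "content": "Paris: The Enchanting Capital of France"}]}]
    vertices, edges = flatten_ast(ast)
    print(vertices)
    print(edges)  # → [{'_from': 'ast_nodes/0', '_to': 'ast_nodes/1', 'order': 0}]
```

The two lists could then be loaded with python-arango's `insert_many`, into a document collection and an edge collection respectively, after which AQL graph traversals can rebuild the tree.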
grahama1970 / 01_inference.py
Last active January 31, 2025 13:35
Training a DistilBERT model to classify question complexity before the question is sent to a smolagent
from complexity.file_utils import get_project_root, load_env_file
import torch
from transformers import (
DistilBertTokenizerFast,
DistilBertForSequenceClassification
)
from loguru import logger
import os
import time
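Once the classifier has produced logits, they still have to become a routing decision before a question reaches the agent. A minimal sketch of that step in plain Python, assuming a two-class head where index 0 is "simple" and index 1 is "complex" (a hypothetical label order) and a tunable threshold. With transformers, the logits for one question would come from something like `model(**tokenizer(question, return_tensors="pt")).logits[0].tolist()`.

```python
import math
from typing import List, Tuple

LABELS = ("simple", "complex")  # hypothetical label order of the classification head


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: List[float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return (label, P(complex)); label is 'complex' when P(complex) >= threshold."""
    p_complex = softmax(logits)[1]
    label = LABELS[1] if p_complex >= threshold else LABELS[0]
    return label, p_complex


if __name__ == "__main__":
    print(route([2.0, -1.0]))               # confidently "simple"
    print(route([0.2, 1.5], threshold=0.7))
```

Raising `threshold` sends fewer questions down the expensive path; the right value depends on the cost of misrouting in a given setup.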
grahama1970 / rag_classifer_unified.py
Created January 31, 2025 16:19
RAG-based classifier for determining sentence complexity (proof of concept only)
#!/usr/bin/env python3
import os
import time
from typing import List, Dict, Any
from functools import partial
from concurrent.futures import ThreadPoolExecutor, as_completed
import torch
import torch.nn.functional as F
from arango import ArangoClient
from loguru import logger
def validate_embeddings(db, collection_name, dimension):
"""Validate embeddings with boolean AQL result."""
try:
# Collection name interpolation (must be sanitized)
query = f"""
RETURN COUNT(
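The comment above notes that the interpolated collection name must be sanitized. AQL bind parameters avoid the problem: `@@name` binds a collection name and `@name` binds a value, so nothing caller-supplied is spliced into the query text. A sketch under assumptions: because the original query is truncated, the filter below ("every document has an `embedding` array of the expected length") is a hypothetical stand-in, and `build_validate_query` is an illustrative name.

```python
from typing import Any, Dict, Tuple

# Hypothetical validation query: true when no document has a missing or
# wrongly sized embedding. @@collection binds a collection, @dimension a value.
VALIDATE_EMBEDDINGS_AQL = """
RETURN COUNT(
    FOR doc IN @@collection
        FILTER (NOT IS_ARRAY(doc.embedding)) OR LENGTH(doc.embedding) != @dimension
        LIMIT 1
        RETURN 1
) == 0
"""


def build_validate_query(collection_name: str, dimension: int) -> Tuple[str, Dict[str, Any]]:
    """Return (query, bind_vars) instead of formatting the name into the string."""
    if isinstance(dimension, bool) or not isinstance(dimension, int) or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    # Collection bind parameters are passed with a single "@" prefix in bind_vars.
    bind_vars = {"@collection": collection_name, "dimension": dimension}
    return VALIDATE_EMBEDDINGS_AQL, bind_vars
```

With python-arango, `next(db.aql.execute(query, bind_vars=bind_vars))` would then yield the boolean. `LIMIT 1` lets the subquery stop at the first bad document instead of scanning the whole collection.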
grahama1970 / 01_docker-compose-sglang.yml
Last active February 5, 2025 21:19
Trying to get Qwen2.5-VL-7B to work with CUDA 12.8.
services:
# ---------------------------
# SGLang Service
# ---------------------------
sglang-service:
# image: lmsysorg/sglang:latest
build:
context: .
dockerfile: Dockerfile_v2.sglang
container_name: sglang-service