Z CoffeeVampir3

@CoffeeVampir3
CoffeeVampir3 / sadadjnsauhd897uashd98sahd9a897udshq.ts
Created April 29, 2025 03:45
wefjuejwuiofwjeuifchwie9ofh8i0923hf89023hf892h89fh298fh289f3h892hf.ts
import { StreamingWebSocket, CancellationError } from "./streamingWebSocket.ts";
/**
 * Example of client-side streaming inference implementation
 * Following the exact flow described in the specification
 */
async function streamInference(
  socket: StreamingWebSocket,
  onContent: (content: string) => void
): Promise<void> {
[TOOL_CALLS][{"name": "add", "arguments": {"a": 5, "b": 3}}][/TOOL_CALLS]
import express from "npm:express";
import { randomUUID } from "node:crypto";
import { McpServer } from "npm:@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "npm:@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "npm:zod";
const PORT = 3000;
const MCP_ENDPOINT = "/mcp";
const app = express();
@CoffeeVampir3
CoffeeVampir3 / TverskyBench.py
Created September 27, 2025 04:01
Fast VS Slow Tversky Multihead Bench
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
def tversky_multihead_similarity_vectorized_simple(x, features, prototypes, theta, alpha, beta, n_heads):
    batch_size, total_dim = x.shape
    d_head = total_dim // n_heads
    x = rearrange(x, 'b (h d) -> b h d', h=n_heads)
@CoffeeVampir3
CoffeeVampir3 / v.py
Last active September 29, 2025 21:49
Model Versioning
from safetensors.torch import save_model, load_model
from safetensors import safe_open
import json
config_dict = {'version': '1.0'}
metadata = {k: json.dumps(v) if not isinstance(v, str) else v for k, v in config_dict.items()}
save_model(model, "model.safetensors", metadata=metadata)
# ...
@CoffeeVampir3
CoffeeVampir3 / CLAUDE.md
Created October 24, 2025 00:58
AMX CLAUDE.md

AMX (Advanced Matrix Extensions) Programming Guide

Overview

AMX is an x86-64 instruction set extension that provides hardware-accelerated matrix operations on two-dimensional tile registers. It is designed for high-performance matrix multiplication in AI/ML workloads.

Key Features:

  • 8 tile registers (TMM0-TMM7)
  • Each tile: up to 16 rows × 64 bytes
  • Native support for INT8 and BF16 operations, plus FP16 on processors that implement the AMX-FP16 extension
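
The tile dimensions listed above fix how much data the tile register file can hold. The following is a minimal sketch of that arithmetic in Python, using only the figures from the list (8 tiles, 16 rows, 64 bytes per row). The helper names (`tile_bytes`, `elements_per_row`, `total_tile_bytes`) are hypothetical and chosen for illustration; the code does not execute any AMX instructions.

```python
# Back-of-the-envelope AMX tile arithmetic (illustrative only).
# All constants come from the feature list above; helper names are hypothetical.

TILE_COUNT = 8        # TMM0-TMM7
MAX_ROWS = 16         # maximum rows per tile
MAX_ROW_BYTES = 64    # maximum bytes per row

# Size in bytes of one element for each data type named in the feature list.
ELEMENT_BYTES = {"int8": 1, "bf16": 2, "fp16": 2}


def tile_bytes(rows: int = MAX_ROWS, row_bytes: int = MAX_ROW_BYTES) -> int:
    """Return the storage used by one tile configured as rows x row_bytes."""
    if not (1 <= rows <= MAX_ROWS):
        raise ValueError(f"rows must be in 1..{MAX_ROWS}")
    if not (1 <= row_bytes <= MAX_ROW_BYTES):
        raise ValueError(f"row_bytes must be in 1..{MAX_ROW_BYTES}")
    return rows * row_bytes


def elements_per_row(dtype: str, row_bytes: int = MAX_ROW_BYTES) -> int:
    """Return how many elements of the given type fit in one tile row."""
    return row_bytes // ELEMENT_BYTES[dtype]


def total_tile_bytes() -> int:
    """Return the capacity of all tiles at their maximum configuration."""
    return TILE_COUNT * tile_bytes()


if __name__ == "__main__":
    print("bytes per full tile:", tile_bytes())
    print("bytes across all tiles:", total_tile_bytes())
    for dtype in ELEMENT_BYTES:
        print(f"{dtype} elements per row:", elements_per_row(dtype))
```

A fully configured tile holds 16 × 64 = 1024 bytes, so the eight tiles together hold 8 KiB. A 64-byte row fits 64 INT8 values but only 32 two-byte (BF16 or FP16) values, which is worth keeping in mind when choosing matrix block sizes.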