🦥
Just hanging around

Avi Fenesh avifenesh

@avifenesh
avifenesh / qwen_gpu_fp16_oom_repro.mojo
Created May 21, 2026 23:00
Mojo 1.0.0b1 OOM repro: qwen_gpu_fp16_oom_repro.mojo (md5 ecfae96bc418257d4b151c1709a244fd)
# Phase 4a — FP16 storage / FP32 accumulate GPU kernels for Qwen.
#
# Mix: existing qwen_gpu_primitives tile/block_reduce shape (proven 15/15
# FP32 pass) + handover FP16-V1 contract + EXL2 split matvec/matmul (decode
# vs prefill) + selective fused projections (Phase 2 arena groups gate/up,
# q/k/v contiguous so kernels see one weight pointer per fused group).
#
# Storage = DType.float16. Accumulate = Float32 (numerical stability through
# K reduction on attention/FFN dims of 4096-12288).
@avifenesh
avifenesh / README.md
Created May 18, 2026 23:00
llama.cpp MoE CPU-offload prompt benchmark harness

This is the small llama-server harness I used to compare MoE CPU-offload behavior across baseline and patched llama.cpp builds.

The benchmark is intentionally narrow: it measures sequential chat-completion latency and token throughput while exercising the CPU-MoE host-to-device expert staging path. It is not a model-quality eval, and the prompt strings are synthetic systems-engineering workload data used only to reproduce timing.
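The measurement described above can be sketched roughly as follows. This is an illustrative outline, not the harness itself: `run_sequential`, `summarize`, `RequestTiming`, and the `send` callable are hypothetical names. In a real run, `send` would presumably issue one chat-completion request to llama-server and return the number of completion tokens reported in the response; here it is left as a parameter so the timing logic stands alone.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class RequestTiming:
    """Timing record for one chat-completion request (hypothetical type)."""
    prompt: str
    latency_s: float
    completion_tokens: int

    @property
    def tokens_per_s(self) -> float:
        # Guard against a zero-length interval from a coarse clock.
        return self.completion_tokens / self.latency_s if self.latency_s > 0 else 0.0


def run_sequential(
    prompts: Iterable[str],
    send: Callable[[str], int],
    clock: Callable[[], float] = time.perf_counter,
) -> List[RequestTiming]:
    """Send prompts one at a time and record wall-clock latency for each.

    `send` is an assumed callable that performs one request and returns the
    number of completion tokens generated.
    """
    timings = []
    for prompt in prompts:
        start = clock()
        tokens = send(prompt)  # blocks until the full completion arrives
        timings.append(RequestTiming(prompt, clock() - start, tokens))
    return timings


def summarize(timings: List[RequestTiming]) -> Dict[str, float]:
    """Aggregate mean latency and overall token throughput."""
    total_s = sum(t.latency_s for t in timings)
    total_tokens = sum(t.completion_tokens for t in timings)
    return {
        "requests": len(timings),
        "mean_latency_s": total_s / len(timings) if timings else 0.0,
        "tokens_per_s": total_tokens / total_s if total_s > 0 else 0.0,
    }
```

Running the same prompt list through `run_sequential` against a baseline build and a patched build, then comparing the two `summarize` results, gives the kind of side-by-side latency and throughput numbers this benchmark is after. Requests are deliberately not overlapped, so each latency reflects a single request in flight.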

@avifenesh
avifenesh / mcp-sanitized.json
Created July 23, 2025 06:36
MCP (Model Context Protocol) server configuration template
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": [
"-y",
"@modelcontextprotocol/server-filesystem",
"/home/ubuntu"
],
"disabled": false,
@avifenesh
avifenesh / copilot-instructions.md
Last active July 5, 2025 13:10
Valkey GLIDE AI Coding Assistant Instructions - Comprehensive guide for AI agents working on the multi-language Valkey/Redis client library

Valkey GLIDE AI Coding Assistant Instructions

Project Overview

Multi-language Valkey/Redis client with Rust core (glide-core) and language wrappers. Uses protobuf for cross-language communication and FFI/UDS for performance.

Architecture

Core Components

  • glide-core/: Rust core (protocol, connections, business logic)
@avifenesh
avifenesh / switch_valkey_version.sh
Last active March 17, 2025 16:36
A script for easily switching between Valkey versions
#!/bin/bash
# Uncomment the next line to print each command as it runs.
#set -x
set -e
valkey_version=
should_clone_repo=false
valkey_clone_git=https://github.com/valkey-io/valkey.git