@peppergrayxyz
peppergrayxyz / qemu-vulkan-virtio.md
Last active March 31, 2025 21:19
QEMU with VirtIO GPU Vulkan Support

With its latest release, QEMU added the Venus patches, so virtio-gpu now supports Venus encapsulation for Vulkan. This is one more piece of the puzzle towards full Vulkan support.

A now-outdated blog post from Collabora described in 2021 how to enable 3D acceleration of Vulkan applications in QEMU through Venus, the experimental Vulkan driver for VirtIO-GPU, using a local development environment. Following up on that write-up, this is how it's done today.
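In practice (a sketch along the lines of recent QEMU release notes, not taken from the original post; exact flags can vary by build), Venus is switched on via the venus=true property of the GL-enabled virtio-gpu device, together with a memfd memory backend for blob resources:

qemu-system-x86_64 \
  -object memory-backend-memfd,id=mem1,size=8G \
  -machine q35,memory-backend=mem1 \
  -device virtio-vga-gl,hostmem=8G,blob=true,venus=true \
  -display sdl,gl=on \
  ...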

Definitions

Let's start with a brief description of the projects mentioned in that post, and extend the list:

@dvessel
dvessel / 06476883eab5710f7a0bede57a90cc050a86f10f.asset--AssetData--metadata.json
Last active November 4, 2024 09:43
Metadata for AppleIntelligence (Sequoia 15.1)
{
  "model_type": "mlm",
  "tamm_id": "afm-text-30b-instruct-v5-astc-6x6-20240709",
  "checkpoint": "model.mlm",
  "tokenizer": "afm-text-instruct-multilingual-100k-20240701",
  "original_checkpoint": "bolttorchmodel://x5bhyxgsn7/440",
  "export_date": "07/22/2024-11:36:33",
  "mlm_config": {
    "model_name": "ajax",
    "backend": "metal",
@fxkamd
fxkamd / TinyGrad-notes.md
Last active March 20, 2025 13:37
Observations about HSA and KFD backends in TinyGrad

This is Felix Kuehling, long time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create user-mode AQL queues for GPU kernel dispatch, issue async SDMA copies, and do signal-based synchronization with barrier packets.
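As a rough illustration of that signal-based synchronization (my sketch in the spirit of TinyGrad's ctypes approach, not code from the repo), this is what driving the ROCr signal API from Python looks like; names follow the public HSA runtime headers:

import ctypes

hsa = ctypes.CDLL("libhsa-runtime64.so")  # ROCr runtime

class hsa_signal_t(ctypes.Structure):
    _fields_ = [("handle", ctypes.c_uint64)]

assert hsa.hsa_init() == 0  # HSA_STATUS_SUCCESS

# Completion signal: a dispatch or barrier packet would carry this handle
# and decrement the value when the packet retires.
sig = hsa_signal_t()
assert hsa.hsa_signal_create(ctypes.c_int64(1), 0, None, ctypes.byref(sig)) == 0

# Stand-in for GPU work so the example terminates: drop the value from the host.
hsa.hsa_signal_store_screlease(sig, ctypes.c_int64(0))

# Block until the value is < 1, i.e. the "work" completed.
HSA_SIGNAL_CONDITION_LT, HSA_WAIT_STATE_BLOCKED = 2, 0
hsa.hsa_signal_wait_scacquire(sig, HSA_SIGNAL_CONDITION_LT, ctypes.c_int64(1),
                              ctypes.c_uint64(2**64 - 1), HSA_WAIT_STATE_BLOCKED)

hsa.hsa_signal_destroy(sig)
hsa.hsa_shut_down()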

@bjacob
bjacob / README.md
Last active August 9, 2024 08:30
Exploring IREE CPU microkernels on a simple matmul example

Basic setup, command lines

Source file: matmul.mlir:

func.func @matmul_dynamic(%lhs: tensor<?x?xf32>, %rhs: tensor<?x?xf32>, %acc: tensor<?x?xf32>) -> tensor<?x?xf32> {
  %result = linalg.matmul ins(%lhs, %rhs: tensor<?x?xf32>, tensor<?x?xf32>) outs(%acc: tensor<?x?xf32>) -> tensor<?x?xf32>
  return %result: tensor<?x?xf32>
}
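The command lines themselves are cut off in the preview; a baseline compile-and-run sequence for this file looks roughly like the following (my sketch — the microkernel flag in particular has been renamed across IREE releases, e.g. --iree-llvmcpu-enable-microkernels in older builds):

iree-compile matmul.mlir -o matmul.vmfb \
  --iree-hal-target-backends=llvm-cpu \
  --iree-llvmcpu-enable-ukernels=all

iree-run-module --module=matmul.vmfb --function=matmul_dynamic \
  --input="2x2xf32=[[1 2][3 4]]" \
  --input="2x2xf32=[[5 6][7 8]]" \
  --input="2x2xf32=[[0 0][0 0]]"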
@yoavg
yoavg / LLMs.md
Last active February 6, 2025 02:39

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of chatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts on this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We

@Chillee
Chillee / 1-pw_op_fusion.py
Last active February 4, 2025 17:56
PT 2.0 Benchmarks
import torch
import torch._inductor.config
import time
torch._inductor.config.triton.cudagraphs = False
torch.set_float32_matmul_precision('high')
def bench(f, name=None, iters=100, warmup=5, display=True, profile=False):
    for _ in range(warmup):
        f()
    # preview cuts off here; a timed loop along these lines presumably follows
    torch.cuda.synchronize()
    begin = time.time()
    for _ in range(iters):
        f()
    torch.cuda.synchronize()
    if display:
        print(f"{name}: {(time.time() - begin) * 1e6 / iters:.2f}us / iter")
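Typical usage for the point-wise fusion case the filename refers to would look something like this (my sketch, not necessarily the file's exact benchmark):

def pointwise(x):
    return torch.nn.functional.gelu(x).relu() * x

x = torch.randn(2**24, device="cuda")
bench(lambda: pointwise(x), name="eager")
opt = torch.compile(pointwise)  # Inductor can fuse the point-wise chain into one kernel
bench(lambda: opt(x), name="compiled")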
@Narsil
Narsil / pure_torch.py
Created November 10, 2022 15:06
Loading a safetensors file with pure torch only
import mmap
import torch
import json
import os
from huggingface_hub import hf_hub_download
def load_file(filename, device):
    with open(filename, mode="r", encoding="utf8") as file_obj:
        with mmap.mmap(file_obj.fileno(), length=0, access=mmap.ACCESS_READ) as m:
            # safetensors layout: 8 bytes of little-endian header size, then a
            # JSON header mapping tensor names to dtype/shape/data offsets
            header_size = int.from_bytes(m.read(8), "little")
            header = json.loads(m.read(header_size))
            # preview cuts off here; the remaining bytes are the raw buffer
            # that the per-tensor offsets in the header index into
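Presumably hf_hub_download is what the (truncated) rest of the file uses to fetch a checkpoint; usage would look something like this (repo and filename are illustrative):

path = hf_hub_download("gpt2", filename="model.safetensors")
weights = load_file(path, device="cpu")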
@res0nat0r
res0nat0r / colima.md
Last active September 24, 2024 02:08
Set proxy in Colima Docker container

abiosoft/colima#294 (comment)


Note: this assumes Colima v0.4.0 or newer.

SSH into the VM: `colima ssh`

Edit the docker init script: `sudo vi /etc/init.d/docker`
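The preview stops here; per the linked comment, the point is to make the proxy visible to the Docker daemon, e.g. by exporting it near the top of that init script (addresses below are placeholders):

export HTTP_PROXY=http://proxy.example.com:3128
export HTTPS_PROXY=http://proxy.example.com:3128
export NO_PROXY=localhost,127.0.0.1

Then restart the daemon inside the VM with `sudo service docker restart`; `docker info` should now report the proxy settings.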

@b01
b01 / download-vs-code-server.sh
Last active March 5, 2025 18:12
Linux script to download latest VS Code Server, good for Docker (tested in Alpine).
#!/bin/sh
# Copyright 2023 Khalifah K. Shabazz
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the “Software”),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
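Only the license header survives in the preview; the core of a script like this is resolving the latest stable commit and unpacking the matching server build, roughly as follows (my sketch of the commonly used update.code.visualstudio.com endpoints, not necessarily this script's exact code):

#!/bin/sh
PLATFORM="linux-x64"
# First entry of the JSON array returned by the update API = latest stable commit
COMMIT=$(wget -qO- "https://update.code.visualstudio.com/api/commits/stable/server-${PLATFORM}" \
  | cut -d'"' -f2)
# Download and unpack the matching VS Code Server tarball
mkdir -p "$HOME/.vscode-server/bin/${COMMIT}"
wget -qO- "https://update.code.visualstudio.com/commit:${COMMIT}/server-${PLATFORM}/stable" \
  | tar -xz -C "$HOME/.vscode-server/bin/${COMMIT}" --strip-components=1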
@x0nu11byt3
x0nu11byt3 / elf_format_cheatsheet.md
Created February 27, 2021 05:26
ELF Format Cheatsheet

Introduction

The Executable and Linkable Format (ELF) is the default binary format on Linux-based systems.
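A quick way to see it in practice: compile anything and inspect the result (the `file` output shown is abbreviated and machine-dependent):

gcc -o hello hello.c    # produces an ELF executable
readelf -h hello        # dump the ELF file header (magic starts 7f 45 4c 46)
file hello              # e.g. "ELF 64-bit LSB pie executable, x86-64, ..."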

ELF

Compilation