Luna LunNova

fxkamd / bert-tiny-amd.md
Created October 1, 2024 19:06
Solutions to problems with BERT training with tinygrad on AMD GPUs

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation leverages a custom runtime, it requires this workaround in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
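For readers writing their own PM4 dispatch path, here is a minimal C sketch of the workaround's decision logic. It assumes PRIV is bit 20 of COMPUTE_PGM_RSRC1, as listed in LLVM's AMDGPUUsage documentation. The helper name `pm4_compute_pgm_rsrc1` is hypothetical, and emitting the register write into the PM4 stream is not shown. The KFDTest commit above remains the reference implementation.

```c
#include <stdint.h>

/* Assumption: PRIV is bit 20 of COMPUTE_PGM_RSRC1 (per LLVM AMDGPUUsage).
   Verify against the register headers for your ASIC. */
#define COMPUTE_PGM_RSRC1_PRIV (UINT32_C(1) << 20)

/* Hypothetical helper for a PM4-based runtime: given the RSRC1 value from
   the kernel descriptor and the GFX IP major version, return the value to
   program into COMPUTE_PGM_RSRC1 for this dispatch. */
uint32_t pm4_compute_pgm_rsrc1(uint32_t rsrc1, unsigned gfx_major)
{
    if (gfx_major == 11)
        return rsrc1 | COMPUTE_PGM_RSRC1_PRIV; /* GFXv11-only CWSR workaround */
    return rsrc1; /* all other generations: leave the descriptor value alone */
}
```

A GFXv11 dispatch would program `pm4_compute_pgm_rsrc1(desc_rsrc1, 11)`, while any other generation passes the descriptor value through unchanged, matching the note that other GPUs should not use the workaround.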

While inv

fxkamd / TinyGrad-notes.md
Last active November 14, 2024 08:25
Observations about HSA and KFD backends in TinyGrad

This is Felix Kuehling, long-time KFD driver architect. I started looking into the TinyGrad source code yesterday, focusing on ops_kfd.py, ops_hsa.py and driver/hsa.py, to understand how TinyGrad talks to our HW and help with the ongoing debugging effort from the top down. This analysis is based on this commit: https://github.com/tinygrad/tinygrad/tree/3de855ea50d72238deac14fc05cda2a611497778

I'm intrigued by the use of Python for low-level programming. I think I can learn something from your use of ctypes and clang2py for fast prototyping and test development. I want to share some observations based on my initial review.

ops_kfd looks pretty new, and I see many problems with it based on my long experience working on KFD. I think it's interesting, but probably not relevant for the most pressing problems at hand, so I'll cover that last.

ops_hsa uses ROCr APIs to manage GPU memory, create a user-mode AQL queue for GPU kernel dispatch, issue async SDMA copies, and perform signal-based synchronization with barrier packets
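As a hedged illustration of what ROCr builds for ops_hsa, the following C sketch models an AQL kernel dispatch packet and its 16-bit header on the HSA specification (`hsa_kernel_dispatch_packet_t` and the header bit layout). The names `aql_header` and `aql_dispatch_packet` are made up for this example; real code should use the definitions in `hsa.h`.

```c
#include <stdint.h>

/* Numeric values as in hsa.h (hsa_packet_type_t, hsa_fence_scope_t). */
enum { AQL_TYPE_KERNEL_DISPATCH = 2, AQL_TYPE_BARRIER_AND = 3 };
enum { AQL_SCOPE_NONE = 0, AQL_SCOPE_AGENT = 1, AQL_SCOPE_SYSTEM = 2 };

/* Build the 16-bit header: type in bits 0-7, barrier in bit 8,
   acquire fence scope in bits 9-10, release fence scope in bits 11-12. */
uint16_t aql_header(unsigned type, unsigned barrier,
                    unsigned acquire_scope, unsigned release_scope)
{
    return (uint16_t)((type & 0xFFu)
                      | ((barrier & 1u) << 8)
                      | ((acquire_scope & 3u) << 9)
                      | ((release_scope & 3u) << 11));
}

/* Simplified mirror of hsa_kernel_dispatch_packet_t: the kernarg pointer and
   the completion signal handle are stored as plain 64-bit integers. */
typedef struct {
    uint16_t header;
    uint16_t setup;               /* bits 0-1: number of grid dimensions */
    uint16_t workgroup_size_x, workgroup_size_y, workgroup_size_z;
    uint16_t reserved0;
    uint32_t grid_size_x, grid_size_y, grid_size_z;
    uint32_t private_segment_size;
    uint32_t group_segment_size;
    uint64_t kernel_object;       /* address of the kernel code descriptor */
    uint64_t kernarg_address;
    uint64_t reserved2;
    uint64_t completion_signal;   /* hsa_signal_t handle, 0 for none */
} aql_dispatch_packet;

/* Every AQL packet occupies one 64-byte queue slot. */
_Static_assert(sizeof(aql_dispatch_packet) == 64, "AQL packets are 64 bytes");
```

A producer normally fills in the rest of the slot first and writes the header last with a release store, then rings the queue's doorbell, so the packet processor never observes a half-written packet.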

Mnkai / README.md
Last active November 4, 2024 11:37
TDP and turbo parameter modification with MSR on non-overclockable Intel CPU (such as Intel i7-8550U)

TDP and Turbo Parameter Modification with MSR on Non-Overclockable CPUs

Disclaimer

  • Modifying MSRs may void your CPU's (or system board's) warranty. Proceed with caution. I am not responsible for any damage caused by following this article.
  • MSR addresses vary significantly between CPUs. Check the MSR addresses for your CPU in Intel's documentation.
  • This has only been tested on the Intel i7-8550U (Kaby Lake R).
  • This article is a translation of an article originally written in Korean. If you can read Korean, I recommend reading the original instead.
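With the disclaimers above in mind, here is a minimal C sketch of reading and decoding the package power limits through Linux's msr driver (`/dev/cpu/N/msr`, where the file offset is the MSR address). It assumes the commonly documented Intel RAPL addresses 0x606 (MSR_RAPL_POWER_UNIT) and 0x610 (MSR_PKG_POWER_LIMIT) and their field layouts; confirm both in Intel's documentation for your CPU. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed addresses; verify for your CPU. */
#define MSR_RAPL_POWER_UNIT 0x606u
#define MSR_PKG_POWER_LIMIT 0x610u

/* Read one 64-bit MSR via the msr driver (needs root and `modprobe msr`).
   Returns 0 on success, -1 on error. */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg); /* offset = MSR address */
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Power unit is 1 / 2^PU watts, where PU is bits 3:0 of MSR_RAPL_POWER_UNIT. */
static double rapl_power_unit_watts(uint64_t unit_msr)
{
    return 1.0 / (double)(1u << (unit_msr & 0xFu));
}

/* PL1 (long-term limit) is bits 14:0 of MSR_PKG_POWER_LIMIT. */
double decode_pl1_watts(uint64_t limit_msr, uint64_t unit_msr)
{
    return (double)(limit_msr & 0x7FFFu) * rapl_power_unit_watts(unit_msr);
}

/* PL2 (short-term limit) is bits 46:32 of MSR_PKG_POWER_LIMIT. */
double decode_pl2_watts(uint64_t limit_msr, uint64_t unit_msr)
{
    return (double)((limit_msr >> 32) & 0x7FFFu) * rapl_power_unit_watts(unit_msr);
}
```

Typical use is `read_msr(0, MSR_RAPL_POWER_UNIT, &unit)` and `read_msr(0, MSR_PKG_POWER_LIMIT, &limit)` followed by the decode functions. Writing a modified value back goes through `pwrite` on the same file, which is exactly the risky step the disclaimer warns about.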

Introduction

braian87b / mwan3-notes.md
Last active October 22, 2023 12:33
How to get MWAN3 Working Properly on OpenWRT / LEDE

In my experience, to get a properly working multi-WAN configuration with mwan3, starting from scratch, you should:

Important: this works well on OpenWRT 15.05.1. Newer versions introduced some breaking changes; for example, the WAN interfaces now have IPv6 capability and are named with letters ("wan, wanb, wanc" instead of "wan, wan2, wan3"), so wanb6 means the 2nd WAN's IPv6 interface: https://github.com/openwrt/packages/blob/master/net/mwan3/files/etc/config/mwan3

I recommend reading the official documentation first, since it seems very detailed and up to date (https://openwrt.org/docs/guide-user/network/wan/multiwan/mwan3), but also take a look at my config file below, since my approach to policies is very nice.

First of all, activate conntrack. The docs say it is important and necessary for MWAN3 to work properly, and a reboot is required:

searls / octohooks.rb
Last active May 13, 2017 14:57
Use Octokit to add a particular webhook to all of your repos (handy for things like chat integration)
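# This is just a scratchpad after I hacked what I needed in an irb session
require 'octokit'
Octokit.configure do |c|
  # GitHub's API no longer accepts account passwords; authenticate with a personal access token instead.
  c.access_token = ENV.fetch('GITHUB_TOKEN')
  c.auto_paginate = true # return all of your repos, not just the first page
end
client = Octokit::Client.new
repos = client.repos # Note: for an org's repos, see `client.orgs.first.rels[:repos].get.data`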
mariusGundersen / gist:6925246
Last active May 8, 2022 20:38
Programmer collective nouns
spikegrobstein / nginx.conf
Last active August 9, 2024 13:42
nginx config for proxying requests for plex over a hostname-based virtualhost.
upstream plex-upstream {
  # change plex-server.example.com:32400 to the hostname:port of your plex server.
  # this can be "localhost:32400", for instance, if Plex is running on the same server as nginx.
  server plex-server.example.com:32400;
}
server {
  listen 80;
  # server names for this server.
jlong / uri.js
Created April 20, 2012 13:29
URI Parsing with Javascript
var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";
parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port; // => "3000"
parser.pathname; // => "/pathname/"
parser.search; // => "?search=test"
parser.hash; // => "#hash"
parser.host; // => "example.com:3000"
rofl0r / gist:1073739
Created July 9, 2011 16:53 — forked from angavrilov/gist:926972
mmap injection on linux (emulation of VirtualAllocEx)
/* Support for executing system calls in the context of the game process. */
enum { injection_size = 4 }; /* compile-time constant, valid as an array size in both C and C++ */
static const unsigned char nop_code_bytes[injection_size] = {
    /* This is the byte pattern used to pad function
       addresses to multiples of 16 bytes. It consists
       of RET and a sequence of NOPs. The NOPs are not
       supposed to be used, so they can be overwritten. */
    0xC3, 0x90, 0x90, 0x90
};
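The comment above explains where injected code can go: the RET + NOP padding between functions. Here is a hedged C sketch of locating such a site in a copy of the target's code. `find_injection_site` and the requirement that the pattern end on a 16-byte boundary are this example's own assumptions based on that comment, not code from the gist.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same RET + 3x NOP pattern as nop_code_bytes. */
static const unsigned char padding_pattern[4] = { 0xC3, 0x90, 0x90, 0x90 };

/* Hypothetical helper: scan text[0..len), a copy of code loaded at `base`,
   for the padding pattern ending on a 16-byte boundary (i.e. just before the
   next aligned function). Returns the address of the RET byte, or 0 if no
   such site exists. */
uint64_t find_injection_site(const unsigned char *text, size_t len, uint64_t base)
{
    for (size_t i = 0; i + sizeof padding_pattern <= len; i++) {
        uint64_t end = base + i + sizeof padding_pattern;
        if (end % 16 == 0 &&
            memcmp(text + i, padding_pattern, sizeof padding_pattern) == 0)
            return base + i;
    }
    return 0;
}
```

Checking alignment as well as the bytes reduces the chance of matching a RET/NOP sequence that happens to occur inside a function's body.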