Luna LunNova

🌐
worldbuilding in progress
View GitHub Profile
@LunNova
LunNova / bert-tiny-amd.md
Created October 10, 2024 16:47 — forked from fxkamd/bert-tiny-amd.md
Solutions to problems with BERT training with tinygrad on AMD GPUs

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround: COMPUTE_PGM_RSRC1.PRIV must be set to 1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation uses a custom runtime, it needs to apply this workaround itself in its PM4-based dispatch. The workaround is specific to GFXv11 GPUs. Other GPUs do not require it and should not use it. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
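As a rough illustration of the workaround described above, the sketch below ORs the PRIV bit into a COMPUTE_PGM_RSRC1 value only when the target is a GFXv11 part. It is not Tinygrad's code and not the KFDTest patch. The helper names (`is_gfx11`, `apply_cwsr_workaround`) are hypothetical, the bit position (bit 20) is an assumption taken from the publicly documented AMDGPU kernel descriptor layout, and matching on the `gfx11` prefix of the target name is likewise an assumption about how a runtime might identify the GPU line.

```python
# Assumption: PRIV occupies bit 20 of COMPUTE_PGM_RSRC1.
COMPUTE_PGM_RSRC1_PRIV_SHIFT = 20
COMPUTE_PGM_RSRC1_PRIV_MASK = 1 << COMPUTE_PGM_RSRC1_PRIV_SHIFT


def is_gfx11(target: str) -> bool:
    """Hypothetical check: treat any target named gfx11xx as GFXv11."""
    return target.startswith("gfx11")


def apply_cwsr_workaround(rsrc1: int, target: str) -> int:
    """Return the COMPUTE_PGM_RSRC1 value to write in a PM4-based dispatch.

    On GFXv11 the PRIV bit is forced to 1; on every other target the
    value is returned unchanged, since other GPUs should not use the
    workaround.
    """
    if is_gfx11(target):
        return rsrc1 | COMPUTE_PGM_RSRC1_PRIV_MASK
    return rsrc1


if __name__ == "__main__":
    original = 0x000002C0  # arbitrary example value, not from a real kernel
    for target in ("gfx1100", "gfx1030", "gfx90a"):
        patched = apply_cwsr_workaround(original, target)
        print(f"{target}: {original:#010x} -> {patched:#010x}")
```

A custom runtime would apply something like this to the register value it takes from the kernel descriptor, just before emitting the PM4 packet that programs COMPUTE_PGM_RSRC1.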

While inv

LunNova / time.js
Last active August 29, 2015 14:05 — forked from anonymous/time.js
<script type="text/javascript">
(function() {
"use strict";
/* From https://github.com/dperini/ContentLoaded/blob/master/src/contentloaded.js
* Author: Diego Perini (diego.perini at gmail.com)
* License: MIT
*/
function contentLoaded(win, fn) {
var done = false, top = true,