LunNova / bert-tiny-amd.md
Created October 10, 2024 16:47 — forked from fxkamd/bert-tiny-amd.md
Solutions to problems with BERT training with tinygrad on AMD GPUs

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation leverages a custom runtime, it requires this workaround in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
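As a rough illustration of the workaround described above, a custom PM4-based runtime could OR the PRIV bit into the COMPUTE_PGM_RSRC1 value before dispatch, but only on GFXv11. This is a minimal sketch, not the KFDTest patch or tinygrad's code: the helper name `rsrc1_for_dispatch` and the `gfx_major` parameter are hypothetical, and the PRIV field is assumed to sit at bit 20 of COMPUTE_PGM_RSRC1, which should be checked against the register headers for the target GPU.

```rust
/// Assumed position of the PRIV field in COMPUTE_PGM_RSRC1 (bit 20).
/// Verify against the register headers for the target GPU before relying on it.
const COMPUTE_PGM_RSRC1_PRIV: u32 = 1 << 20;

/// Hypothetical helper: returns the COMPUTE_PGM_RSRC1 value to program for a
/// PM4 dispatch. `gfx_major` is the major GFX IP version (11 for GFXv11).
pub fn rsrc1_for_dispatch(rsrc1: u32, gfx_major: u32) -> u32 {
    if gfx_major == 11 {
        // GFXv11 only: set PRIV=1 so CWSR can preempt and restore the waves.
        rsrc1 | COMPUTE_PGM_RSRC1_PRIV
    } else {
        // Other GPUs do not need the workaround and should not use it.
        rsrc1
    }
}
```

Gating on the GFX version keeps the value untouched on every other GPU, in line with the note that the workaround is specific to GFXv11.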

While inv

/// Guards a scope against unwinding, calling a handler if unwinding occurs.
///
/// - Handler gets no panic info; limited to `Fn()`
/// - Not reentrant; handler panics may cause program abort
/// - Intended for single scope; don't store or share
/// - Ineffective with panic=abort
pub struct UnwindDetector<T: Fn()> {
    handler: T,
}
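The preview above cuts off before any implementation, so the following is only a sketch of one way a guard like this can work, not the gist's actual code. It uses a hypothetical type named `UnwindGuard` whose `Drop` implementation calls the handler when `std::thread::panicking()` reports that the thread is unwinding.

```rust
/// Hypothetical guard type for illustration; not the gist's `UnwindDetector`.
pub struct UnwindGuard<T: Fn()> {
    handler: T,
}

impl<T: Fn()> UnwindGuard<T> {
    /// Creates a guard that should live on the stack for a single scope.
    pub fn new(handler: T) -> Self {
        Self { handler }
    }
}

impl<T: Fn()> Drop for UnwindGuard<T> {
    fn drop(&mut self) {
        // `drop` runs on normal scope exit and during unwinding;
        // `panicking()` is true only while the thread is unwinding.
        if std::thread::panicking() {
            (self.handler)();
        }
    }
}
```

A guard is used by binding it at the top of the scope, for example `let _guard = UnwindGuard::new(|| eprintln!("scope unwound"));`. With `panic=abort` no unwinding happens, so `drop` never runs on panic and the handler is not called, which matches the last caveat in the doc comment.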
function fp_to_bytes(fp, bytes, is_double)
  local val = tonumber(fp)
  -- it's a NaN or inf
  if val ~= val or val == math.huge or val == -math.huge then
    bytes[1] = (val ~= val or val == math.huge) and 0x7f or 0xff
    bytes[2] = (val ~= val and 0xf9 or 0xf8)
    local max = is_double and 8 or 4
    for i = 3, max do
      bytes[i] = 0
Run in MappingTest subdirectory of https://github.com/MinimallyCorrect/Mapping/tree/transform-only-runs-for-default-attr
Must run publishToMavenLocal in parent directory first.
Note how transform only runs in test case -PTEST=3 where the default value of the attr can be transformed to the requested value.
Even case 4 fails which is where the default value is set as in case 3, but a value the same as that default value is also set in the gradle module metadata.
$ for i in 0 1 2 3 4; do echo; echo "Testing with -PTEST=$i"; echo; ./gradlew.bat -PTEST=$i --no-build-cache build; done
Testing with -PTEST=0
// ==UserScript==
// @name PluralSight Wider Speed Range
// @namespace https://nyx.nova.fail/
// @version 1.1
// @description Faster pluralsight video maximum speed
// @author Luna
// @match https://app.pluralsight.com/*
// @grant none
// @run-at document-start
// ==/UserScript==
LunNova / grub-kexec.sh
Created May 10, 2019 12:03
worst bootloader ever
#!/usr/bin/env bash
# This script boots the first linux it finds in a grub config on any available mount point
# using kexec
DEBIAN_FRONTEND=noninteractive sudo apt-get -q -y install kexec-tools
set -euvo pipefail
parts=$(sudo blkid | grep -v /dev/loop | sort | cut -d: -f 1)
grub_paths="/boot/grub/grub.cfg /grub/grub.cfg /boot/grub.cfg"
Enclosure serial console commands
COMMANDS:
Note: Not all commands are supported, 'help' lists supported commands.
acfail_sim Simulate AC Fail condition
auto_iic_recovery Enable/disable the automatic bus recovery.
batt_cell_balance Override auto cell balancing by individually turning on/off the balance FET s
batt_clear_cell_fault_poh Clear the Power On Hours stored as a result of a >500mV cell imbalance fault
LunNova / dhclient log
Created July 11, 2018 15:43
Flapping Interfaces Trouble
Jul 11 09:22:21 dhclient[87943]: exiting.
Jul 11 09:22:21 dhclient[87943]: connection closed
Jul 11 09:22:13 dhclient[27206]: em2 link state up -> down
Jul 11 09:21:26 dhclient[73751]: bound: renewal in 190763 seconds.
Jul 11 09:21:26 dhclient: Deleting old routes
Jul 11 09:21:26 dhclient: Comparing Routers: Old: <FLAPPING GATEWAY IP> New: <FLAPPING GATEWAY IP>
Jul 11 09:21:26 dhclient: Comparing IPs: Old: <FLAPPING WAN IP> New: <FLAPPING WAN IP>
Jul 11 09:21:26 dhclient: Starting delete_old_states()
Jul 11 09:21:25 dhclient: New Routers (em2): <FLAPPING GATEWAY IP>
Jul 11 09:21:24 dhclient: New Routers (em2): <FLAPPING GATEWAY IP>
LunNova / GPIOThreads.strace
Last active June 1, 2018 22:39
C3000 Onboard Administrator straces of mgmt
# strace -x -s 256 -p 1957 -p 1958
Process 1957 attached - interrupt to quit
Process 1958 attached - interrupt to quit
[pid 1958] rt_sigsuspend([] <unfinished ...>
[pid 1957] rt_sigsuspend([] <unfinished ...>
[pid 1958] <... rt_sigsuspend resumed> ) = ? ERESTARTNOHAND (To be restarted)
[pid 1958] --- SIGRTMIN (Unknown signal 32) @ 0 (0) ---
[pid 1958] sigreturn() = ? (mask now [])
[pid 1958] rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
[pid 1958] rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rdrand stressor will be skipped, not a recognised Intel CPU.
tsc stressor will be skipped, not a recognised Intel CPU.
disabled 'cpu-online' as it may hang the machine (enable it with the --pathological option)
dispatching hogs: 16 af-alg, 16 atomic, 16 branch, 16 bsearch, 16 cache, 16 context, 16 cpu, 16 crypt, 16 fp-error, 16 funccall, 16 getrandom, 16 heapsort, 16 hsearch, 16 icache, 16 ioport, 16 lockbus, 16 longjmp, 16 lsearch, 16 malloc, 16 matrix, 16 membarrier, 16 memcpy, 16 mergesort, 16 nop, 16 numa, 16 opcode, 16 qsort, 16 radixsort, 16 str, 16 stream, 16 tree, 16 tsearch, 16 vecmath, 16 wcs, 16 zlib
stress-ng-numa: system has 1 of a maximum 1024 memory NUMA nodes
stress-ng-stream: stressor loosely based on a variant of the STREAM benchmark code
stress-ng-stream: do NOT submit any of these results to the STREAM benchmark results
stress-ng-stream: Using CPU cache size of 8192K