Luna LunNova

LunNova / broken-ra.rs
Created July 6, 2025 02:27
Failed attempt at using rust-analyzer to map macro expansions back to the original source lines in a project. It always reports no items in the macro expansion.
[dependencies]
# HIR analysis - use consistent versions
ra_ap_hir = "0.0.289"
ra_ap_base_db = "0.0.289"
ra_ap_hir_def = "0.0.289"
ra_ap_hir_expand = "0.0.289"
ra_ap_ide_db = "0.0.289"
ra_ap_load-cargo = "0.0.289"
ra_ap_paths = "0.0.289"
// Claude Code found this file hanging out in a project and had this to say:
// FIXME: This entire file is cursed unsafe wizardry that needs investigation!
// - Uses very unsafe transmute tricks with MaybeUninit that might be UB
// - Has two different make_owned implementations for unclear reasons
// - Liberal use of unreachable_unchecked()
// - Unclear if this is even used anywhere or just abandoned experimental code
// - The replace_with() function looks particularly unsound
// Rest of the file sadly by past!LunNova. I don't remember what I was trying to do and it seems like a mess.
// ... we already have CoW why did I make this
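For context on the closing remark: the standard library's `std::borrow::Cow` already covers the "borrow when possible, own when needed" case that a hand-rolled `make_owned` would target. The sketch below is not taken from the original file; the function name `strip_tabs` is hypothetical and only illustrates the safe pattern.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the input unchanged (borrowed) when no
/// rewrite is needed, and allocates an owned String only when it is.
pub fn strip_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // A change is required, so we must own the new data.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Nothing to change, so keep borrowing the caller's data.
        Cow::Borrowed(input)
    }
}
```

Calling `.into_owned()` on the result yields a `String` in either case, cloning only if the value was still borrowed, with no `unsafe` involved.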
no-screen boot: USB-C power is connected and there is no display output
screen boot: unplugged, and the display works
--- dmesg-no-screen.log 2025-03-27 08:54:36.483875142 -0700
+++ dmesg-screen.log 2025-03-27 08:54:54.849459200 -0700
@@ -158,8 +158,8 @@
ITS@0x0000000017040000: Devices Table too large, reduce ids 32->19
ITS@0x0000000017040000: Devices too large, reduce ITS pages 1024->256
ITS@0x0000000017040000: allocated 131072 Devices @880300000 (indirect, esz 8, psz 4K, shr 1)
-ITS@0x0000000017040000: allocated 4096 Interrupt Collections @88026a000 (flat, esz 1, psz 4K, shr 1)
-GICv3: using LPI property table @0x0000000880290000
ck::BlockwiseGemmXdlops_pipeline_v3<ck::BlockGemmPipelineScheduler::Intrawave, 256, unsigned short, unsigned short, unsigned short, float, ck::TensorDescriptor<ck::Tuple<ck::UnMerge<ck::Tuple<ck::integral_constant<int, 2>, ck::integral_constant<int, 1>, ck::integral_constant<int, 28>, ck::integral_constant<int, 32>, ck::integral_constant<int, 1>, ck::integral_constant<int, 8>>, false>, ck::PassThrough<ck::integral_constant<int, 2>>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::Xor<ck::Tuple<ck::integral_constant<int, 28>, ck::integral_constant<int, 32>>, true>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::PassThrough<ck::integral_constant<int, 8>>, ck::PassThrough<ck::integral_constant<int, 2>>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::UnMerge<ck::Tuple<ck::integral_constant<int, 4>, ck::integral_constant<int, 7>>, false>, ck::UnMerge<ck::Tuple<ck::integral_constant<int, 1>, ck::integral_constant<int, 32>>, false>, ck::PassThrough<ck::integral_constant<int, 1>>, ck::PassThrough<ck::in
LunNova / llvm.diff
Last active January 21, 2025 04:16
git diff ab3a7d91d01d4f4b07b5ea449794e106864fd043..e553730a8ea7cce6fc9df01fa78bfe8a62c28cbc pkgs/development/compilers/llvm/common
/llvm
Diff of the llvm directory between the working commit and the broken commit, taken while rebasing the ROCm PR
diff --git a/pkgs/development/compilers/llvm/common/bolt/default.nix b/pkgs/development/compilers/llvm/common/bolt/default.nix
index 1662aa52dde2..9bc078df0e44 100644
--- a/pkgs/development/compilers/llvm/common/bolt/default.nix
+++ b/pkgs/development/compilers/llvm/common/bolt/default.nix
@@ -8,6 +8,7 @@
cmake,
libxml2,
diff --git a/src/include/bootstrap.h b/src/include/bootstrap.h
index 8c5f081..9922b79 100644
--- a/src/include/bootstrap.h
+++ b/src/include/bootstrap.h
@@ -10,11 +10,13 @@
#include "nccl.h"
#include "comm.h"
+// this is accessed through unaligned ptrs because ncclUniqueId is a typedef of char[128]
struct ncclBootstrapHandle {
I1215 08:32:37.869000 4070434 torch/_inductor/config.py:635] compile_threads set to 12 via env
using device: cuda:2
using device: cuda:1
using device: cuda:3
using device: cuda:5
using device: cuda:4
using device: cuda:0
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO Bootstrap : Using eno1np0:10.5.5.236<0>
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: No plugin found (librccl-net.so)
tsukiakari-nixos:4070434:4070434 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : librccl-net.so: cannot open shared object file: No such file or directory : when loading librccl-net.so
cmake flags: -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/doc/hipblaslt -DCMAKE_INSTALL_INFODIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable-20241122/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/2pkfgm8zq2y9zjajgp7qqqyv3f0m9nga-hipblaslt-unstable
LunNova / bert-tiny-amd.md
Created October 10, 2024 16:47 — forked from fxkamd/bert-tiny-amd.md
Solutions to problems with BERT training with tinygrad on AMD GPUs

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation leverages a custom runtime, it requires this workaround in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
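The workaround described above amounts to setting one bit in the `COMPUTE_PGM_RSRC1` value before a PM4-based dispatch, and only on GFXv11. The following is a minimal sketch of that idea, not the tinygrad or KFDTest code. The enum `GfxFamily`, the function `patch_rsrc1`, and the bit position used for `PRIV` (bit 20) are assumptions made for illustration; check the bit position against the register documentation and the referenced KFDTest patch before relying on it.

```rust
/// ASSUMPTION: PRIV is taken to be bit 20 of COMPUTE_PGM_RSRC1.
/// Verify against the register documentation before use.
pub const COMPUTE_PGM_RSRC1_PRIV: u32 = 1 << 20;

/// Hypothetical GPU family tag used only by this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GfxFamily {
    Gfx11,
    Other,
}

/// Returns the COMPUTE_PGM_RSRC1 value to write for a custom dispatch.
/// The PRIV bit is set only on GFXv11; other GPUs are left untouched,
/// since the note says they should not use this workaround.
pub fn patch_rsrc1(rsrc1: u32, family: GfxFamily) -> u32 {
    match family {
        GfxFamily::Gfx11 => rsrc1 | COMPUTE_PGM_RSRC1_PRIV,
        GfxFamily::Other => rsrc1,
    }
}
```

Keeping the family check inside one function makes it harder to apply the bit on hardware that does not need it.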

While inv

/// Guards a scope against unwinding, calling a handler if unwinding occurs.
///
/// - Handler gets no panic info; limited to `Fn()`
/// - Not reentrant; handler panics may cause program abort
/// - Intended for single scope; don't store or share
/// - Ineffective with panic=abort
pub struct UnwindDetector<T: Fn()> {
handler: T,
}
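The preview cuts off before the implementation. One common way to build a guard like the one documented above is a `Drop` impl that checks `std::thread::panicking()`. The sketch below is a guess at that shape under this assumption, not the gist's actual code; the constructor `new` and the helper `run_guarded` are hypothetical names added for illustration.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

pub struct UnwindDetector<T: Fn()> {
    handler: T,
}

impl<T: Fn()> UnwindDetector<T> {
    /// Hypothetical constructor; the original may differ.
    pub fn new(handler: T) -> Self {
        Self { handler }
    }
}

impl<T: Fn()> Drop for UnwindDetector<T> {
    fn drop(&mut self) {
        // The guard is dropped both on normal scope exit and during
        // unwinding; only the latter should run the handler.
        if thread::panicking() {
            (self.handler)();
        }
    }
}

/// Hypothetical demo helper: runs a guarded scope that optionally panics
/// and returns whether a panic was caught. `flag` is set by the handler.
pub fn run_guarded(flag: &AtomicBool, should_panic: bool) -> bool {
    let result = panic::catch_unwind(|| {
        let _guard = UnwindDetector::new(|| flag.store(true, Ordering::SeqCst));
        if should_panic {
            panic!("simulated failure");
        }
    });
    result.is_err()
}
```

This shape matches the documented limits: the handler receives no panic payload, a panic inside the handler during unwinding aborts the process, and with `panic=abort` the `Drop` impl never runs during a panic, so the handler is never called.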