ubergarm /
Last active February 26, 2025 04:19
Aggregate throughput just over 2 tok/sec on R1 671B with 8 concurrent generations.


You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. The KV cache then takes up some RAM. Most of your system RAM is left available to serve as disk cache for whichever experts/weights are currently most used. I can see the model slow down while the cache dumps and refills when the model switches over to counting words, for example.
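
As a rough illustration of the mmap idea above (not the inference engine's actual loader), here is a minimal Python sketch. The helper name `read_slice` and the tiny demo file are hypothetical; the point is only that a read-only mapping pulls in the pages you touch, and the OS page cache keeps recently used ones in otherwise free RAM.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes starting at `offset`, read through a read-only mmap.

    Only the pages backing the requested range need to be faulted in from disk;
    the rest of the file stays on storage until something touches it.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


# Demo with a small stand-in file (hypothetical layout: padding, a marker, padding).
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "weights.bin")
    with open(demo_path, "wb") as out:
        out.write(b"\x00" * 4096 + b"expert-7" + b"\x00" * 4096)
    chunk = read_slice(demo_path, 4096, 8)

print(chunk)
```

Because the mapping is read-only, touching it generates reads but no writes, which is why this approach is not the same as swapping.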

It is faster on my system using the GPU, but not by much. In theory, it may be faster overall to dedicate the GPU's PCIe lanes to more NVMe storage. Curious if anyone has an array with fast enough read IOPS to try?

Notes and example generations below.

Model Reference

skeeto / triangle.c
Last active April 27, 2024 06:22
Draw a triangle on Windows using OpenGL 1.1
// Draw a triangle on Windows using OpenGL 1.1
// $ gcc -mwindows -o triangle triangle.c -lopengl32
// This is free and unencumbered software released into the public domain.
#include <windows.h>
#include <GL/gl.h>
#define countof(a) (int)(sizeof(a) / (sizeof(*(a))))
static LRESULT CALLBACK handler(HWND h, UINT msg, WPARAM wparam, LPARAM lparam)
thesamesam /
Last active February 26, 2025 01:17
xz-utils backdoor situation (CVE-2024-3094)

FAQ on the xz-utils backdoor (CVE-2024-3094)

This is a living document. Everything in this document is written in good faith and intended to be accurate, but as I just said, we don't yet know everything about what's going on.

Update: I've disabled comments as of 2025-01-26 so that people don't get notifications, a year on, every time someone suggests a correction. Folks are still free to email to suggest corrections, of course.


skeeto / persona.c
Last active March 25, 2024 06:51
Playing around with a little database
// $ cc -o persona persona.c
// $ ./persona <test.txt
// Ref:
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#define assert(c) while (!(c)) *(volatile int *)0 = 0
#define countof(a) (ptrdiff_t)(sizeof(a) / sizeof(*(a)))
#define new(a, t, n) (t *)alloc(a, sizeof(t), _Alignof(t), n)
skeeto / demo.c
Last active March 8, 2024 03:11
Font rendering demo in SDL2
// Font rendering demo
// $ cc -o demo demo.c $(sdl2-config --cflags --libs)
// Ref:
// This is free and unencumbered software released into the public domain.
#include "SDL.h"
#define FONTW 72
#define FONTH 143
#define CHARW 9
liviaerxin /
Last active February 22, 2025 13:49
FastAPI and Uvicorn Logging #python #fastapi #uvicorn #logging

FastAPI and Uvicorn Logging

When running a FastAPI app, all the logs in the console come from Uvicorn, and they lack timestamps and other useful information. Because Uvicorn uses the Python logging module, we can override the Uvicorn logging formatter by applying a new logging configuration.

This also makes it possible to unify your endpoint logging with the Uvicorn logging by configuring both in the config file log_conf.yaml.

Before overriding:

uvicorn main:app --reload
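
The override described above can be sketched with the standard library alone. This is a minimal illustration, not the gist's actual log_conf.yaml: the format string, the `build_log_config` helper, and the `app` logger name are assumptions. The `uvicorn`, `uvicorn.error`, and `uvicorn.access` logger names are the ones Uvicorn itself uses.

```python
import io
import logging
import logging.config

# Assumed format: timestamp, level, logger name, message.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def build_log_config(stream="ext://sys.stderr"):
    """Build a dictConfig-style mapping that timestamps Uvicorn and app logs."""
    console = {"handlers": ["console"], "level": "INFO", "propagate": False}
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {"timestamped": {"format": LOG_FORMAT}},
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "timestamped",
                "stream": stream,
            }
        },
        "loggers": {
            # "uvicorn.error" has no handler of its own and propagates to "uvicorn".
            "uvicorn": dict(console),
            "uvicorn.access": dict(console),
            # Hypothetical logger name for your own endpoint code.
            "app": dict(console),
        },
    }


# Demo: capture output in memory instead of writing to the console.
buffer = io.StringIO()
logging.config.dictConfig(build_log_config(stream=buffer))
logging.getLogger("uvicorn.error").info("server started")
logging.getLogger("app").info("handling request")
output_lines = buffer.getvalue().splitlines()
```

The same mapping, written out as YAML, is the kind of file you could pass to Uvicorn through its `--log-config` option, so that server and endpoint logs share one format.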
yaauie /
Created September 6, 2022 15:30
2022 high-level docs for logstash-to-logstash using the HTTP input/output pair

We have had some success using LS-to-LS over HTTP(S), which supports an HTTP(S) Load Balancer or Proxy in the middle and can be secured with TLS/SSL. It can be made quite performant, but doing so requires some specific tuning.

Upstream (HTTP Output)

The upstream pipeline would contain a single HTTP output plugin aimed either directly at a downstream Logstash or at a Load Balancer, importantly configured with:

  • format => json_batch (for performance; without this one event will be sent at a time) and
  • retry_non_idempotent => true (for resilience; without this, some failures cannot be safely retried).

Depending on whether we are sending directly to another Logstash or through an SSL-terminating Load Balancer or proxy, the output may need to be configured

  • with HTTP Basic credentials (user/password),
raysan5 /
Last active February 22, 2025 05:09
raylib vs SDL - A libraries comparison


Over the last few years I've been asked multiple times how raylib compares with SDL. Unfortunately, my experience with SDL was quite limited, so I couldn't provide a good comparison. In the last two years I've learned SDL and used it to teach at University, so I feel that I can now provide a good comparison of the two.

I hope it helps future users better understand these two libraries' internals and functionality.

Table of Contents

daqi / rebuild-uos.js
Created July 17, 2022 03:39
UOS or deepin packaging process
// UOS or deepin packaging process
// Reference
// sudo apt-get install dh-make
// sudo apt-get install build-essential
const fs = require('fs-extra');
const path = require('path');
const { spawn } = require('child_process');
const globby = require('globby');
niklaskeerl /
Created May 24, 2021 08:50
Notability local webdav backup

Back up your Notability notes on your machine using WebDAV


  1. Prepare a folder where you want your backup to be.

  2. Install rclone for your system

  3. Run the webdav server using rclone