Skip to content

Instantly share code, notes, and snippets.

View craftgear's full-sized avatar
🏠
Working from home

Captain Emo craftgear

🏠
Working from home
View GitHub Profile
@ubergarm
ubergarm / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Last active April 17, 2025 16:55
Run DeepSeek R1 671B unsloth GGUF locally with ktransformers or llama.cpp on high end gaming rig!

tl;dr;

UPDATE Mon Mar 10 10:51:31 AM EDT 2025 Check out the newer ktransformers guide for how to get it running faster! About 3.5 tok/sec on this same gaming rig. Big thanks to Supreeth Koundinya with analyticsindiamag.com for the article!

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No it is not swap and won't kill your SSD's read/write cycle lifetime. No this is not a distill model. It works fairly well despite quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. Then kv cache will take up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most u

@kalomaze
kalomaze / llm_samplers_explained.md
Last active May 5, 2025 04:39
LLM Samplers Explained

LLM Samplers Explained

Everytime a large language model makes predictions, all of the thousands of tokens in the vocabulary are assigned some degree of probability, from almost 0%, to almost 100%. There are different ways you can decide to choose from those predictions. This process is known as "sampling", and there are various strategies you can use which I will cover here.

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits). What this means is that, if you use a lower value than 1.0, the relative distance between the tokens will become larger (more deterministic), and if you use a larger value than 1.0, the relative distance between the tokens becomes smaller (less deterministic).
  • 1.0 Temperature is the original distribution that the model was trained to optimize for, since the scores remain the same.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
@kalomaze
kalomaze / local_llm_glossary.md
Last active April 20, 2025 17:30
Local LLM Glossary v2

Kalomaze's Local LLM Glossary

Not super comprehensive (yet), but I think having up to date documentation like this should be quite helpful for those out of the loop. Things change all the time in local AI circles, and it can be dizzying to catch up from an outsider's perspective, especially if you are new to the more technical aspects of language models in general (and not just locally hosted LLMs).

Available Models

Llama

  • A language model series created by Meta. Llama 1 was originally leaked in February 2023; Llama 2 then officially released later that year with openly available model weights & a permissive license. Kicked off the initial wave of open source developments that have been made when it comes to open source language modeling. The Llama series comes in four distinct sizes: 7b, 13b, 34b (only Code Llama was released for Llama 2 34b), and 70b. As of writing, the hotly anticipated Llama 3 has yet to arrive.

Mistral

  • Mistral AI is a French company that also distributes open weight
@slightfoot
slightfoot / transitions_fun.dart
Created January 8, 2020 21:31
Fun with Route Animations - by Simon Lightfoot - #HumpDayQandA - 8th Janurary 2020
// MIT License
//
// Copyright (c) 2019 Simon Lightfoot
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
@slightfoot
slightfoot / page_turn.dart
Created September 9, 2019 21:28
Page Turn Effect - By Simon Lightfoot. Replicating this behaviour. https://www.youtube.com/watch?v=JqvtZwIJMLo
// MIT License
//
// Copyright (c) 2019 Simon Lightfoot
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
@pingrishabh
pingrishabh / .gitignore
Created March 3, 2019 04:18
Standardised gitignore for your Flutter apps & Dart projects
### Flutter Generated
# Miscellaneous
*.class
*.lock
*.log
*.pyc
*.swp
.DS_Store
.atom/
@busypeoples
busypeoples / Simplified-Redux-Reducers.js
Created January 2, 2018 23:21
Simplified Redux Reducers
import Maybe from 'folktale/maybe';
const Inc = 'Inc';
const Dec = 'Dec';
const IncBy = 'IncBy';
const IncFn = state => state + 1;
const DecFn = state => state - 1;
const IncByFn = (state, action) => state + action.incBy;
@vbsessa
vbsessa / firefox-devtools.md
Last active April 28, 2025 06:23
How to customize Firefox devtools fonts
  1. Open ~/.mozilla/firefox/<your_profile>/chrome/userContent.css (create it if does not exist).

  2. Paste the following content in it.

     @namespace url(http://www.w3.org/1999/xhtml);
     @-moz-document regexp("chrome://browser/content/devtools/**/.*"){
         .devtools-monospace {
             font-family: Consolas, monospace !important;
             font-size: 8pt !important;
         }
    

}

@rrag
rrag / README.md
Last active March 19, 2024 16:11
Yet another tutorial and Cheat sheet to Functional programming

There are many tutorials and articles available online which explain functional programming. Examples show small functions, which are composed into others which again get composed. It is hard to imagine how it would all work, then come the analogies and then the math. While the math is necessary to understand it can be difficult to grasp initially. The analogies on the other hand, (at least for me) are not relatable. Some articles assume the reader knows the different terminologies of FP. Over all I felt it is not inviting to learn.

This introduction is for those who have had a tough time understanding those analogies, taken the plunge to functional programming but still have not been able to swim. This is yet another tutorial on functional programming

Terminology

Functions as first class citizens

Functions are first class means they are just like anyone else, or rather they are not special, they behave the same as say primitives or strings or objects.

@DavidYKay
DavidYKay / simple_cb.py
Last active February 13, 2025 13:54
Simple color balance algorithm using Python 2.7.8 and OpenCV 2.4.10. Ported from: http://www.morethantechnical.com/2015/01/14/simplest-color-balance-with-opencv-wcode/
import cv2
import math
import numpy as np
import sys
def apply_mask(matrix, mask, fill_value):
masked = np.ma.array(matrix, mask=mask, fill_value=fill_value)
return masked.filled()
def apply_threshold(matrix, low_value, high_value):