LiquidityC / Makefile
Last active March 21, 2025 07:48
Generic drop in Makefile
VERSION = "1.0.0"
PREFIX ?= out
INCDIR = include
SRCDIR = src
LANG = c
OBJDIR = .obj
MODULE ?= binary_name
CC ?= gcc
sekstini / Residual_FSQ_Example.ipynb
Last active April 23, 2024 07:41
Residual FSQ MNIST Example
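A hedged sketch of the idea in the notebook's title, using the ResidualFSQ module from lucidrains' vector-quantize-pytorch (finite scalar quantization applied to successive residuals); the dimensions and levels below are illustrative, not taken from the notebook:

import torch
from vector_quantize_pytorch import ResidualFSQ

rfsq = ResidualFSQ(dim=256, num_quantizers=8, levels=[8, 5, 5, 5])
x = torch.randn(1, 1024, 256)    # batch, seq_len, dim
quantized, indices = rfsq(x)     # quantized has the same shape as x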
OhadRubin / combine_txt.md
Created September 28, 2023 13:13
combine_txt_prompt

Instructions

Your task: Combine multiple texts into one detailed document. Include every piece of information from each source. The goal is to be thorough and exhaustive while avoiding repetition.

Essential steps:

  1. Organize structure carefully.
  2. Integrate all details.
  3. Avoid redundancy.

Warnings:

  • Be precise, not general.
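A hypothetical driver for this prompt (the folder name and delimiter are illustrative, not part of the gist): concatenate the source texts and prepend the instructions above before sending the result to a model.

from pathlib import Path

PROMPT = "Your task: Combine multiple texts into one detailed document. ..."
sources = "\n\n---\n\n".join(p.read_text() for p in sorted(Path("texts").glob("*.txt")))  # hypothetical folder
request = f"{PROMPT}\n\n{sources}"  # pass `request` to the model of your choice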
ChrisHayduk / merge_qlora_with_quantized_model.py
Last active April 18, 2025 08:23
Merging QLoRA weights with quantized model
"""
The code below combines approaches published by both @eugene-yh and @jinyongyoo on GitHub.
Thanks for the contributions guys!
"""
import torch
import peft
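A minimal sketch of the merge step, assuming you can afford to reload the base model in fp16 (the gist itself works on the quantized weights directly); the model id and adapter path are placeholders:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id", torch_dtype=torch.float16)  # placeholder id
merged = PeftModel.from_pretrained(base, "adapter-path").merge_and_unload()  # placeholder path
merged.save_pretrained("merged-model")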
KohakuBlueleaf / retention.py
Created July 20, 2023 09:36
A simple implementation of retention (from https://arxiv.org/pdf/2307.08621.pdf)
import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
def parallel_retention(
    q, k, v,           # bsz, heads, seq_len, dim
    decay_mask=None,   # heads, seq_len, seq_len
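The preview cuts off mid-signature. A self-contained sketch of the paper's parallel form, retention(X) = (QKᵀ ⊙ D)V with a causal decay mask D and no softmax (shapes follow the comments above; the scaling factor is an assumption):

import torch

def parallel_retention_sketch(q, k, v, decay_mask):
    # q, k, v: (bsz, heads, seq_len, dim); decay_mask: (heads, seq_len, seq_len)
    scale = q.shape[-1] ** -0.5
    scores = (q * scale) @ k.transpose(-1, -2)  # (bsz, heads, seq_len, seq_len)
    scores = scores * decay_mask                # causal decay in place of softmax
    return scores @ v                           # (bsz, heads, seq_len, dim)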
cloneofsimo / flash.py
Created June 22, 2023 07:51
FlashAttention comparison
import pytest
import torch
import triton
import triton.language as tl
@triton.jit
def _fwd_kernel(
    Q, K, V, sm_scale,
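The Triton kernel preview is truncated. As a rough stand-in for what the gist compares (not its Triton kernel), one can time naive attention against PyTorch's fused scaled_dot_product_attention using triton's do_bench; the shapes are illustrative:

import torch
import torch.nn.functional as F
from triton.testing import do_bench

q, k, v = (torch.randn(4, 16, 2048, 64, device="cuda", dtype=torch.float16) for _ in range(3))

def naive():
    s = (q @ k.transpose(-1, -2)) * (64 ** -0.5)
    return torch.softmax(s, dim=-1) @ v

def fused():
    return F.scaled_dot_product_attention(q, k, v)

print(f"naive: {do_bench(naive):.3f} ms, fused: {do_bench(fused):.3f} ms")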
Birch-san / opencv-cuda.md
Last active June 5, 2024 13:24
Building OpenCV with CUDA acceleration

For CUDA 12, see Installing CUDA 12.1.1 + PyTorch nightly + Python 3.10 on Ubuntu 22.10 for how to install Nvidia driver 530, gcc 12, and the CUDA 12.1.1 libraries.
If you want CUDA 11.8, you can use the latest Nvidia driver from the Production branch, 525, with gcc 11.

Activate your conda environment, if you haven't done so already.

For CUDA 11: make sure gcc 11 is the default gcc for your OS, or select gcc 11 explicitly.
For CUDA 12: make sure gcc 12 is the default gcc for your OS, or select gcc 12 explicitly.

Check that CUDA_DIR below points to the CUDA installation you wish to use.
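Once the build and install finish, a quick Python check confirms the CUDA modules made it in; cv2.cuda.getCudaEnabledDeviceCount() returns 0 if CUDA support is missing or no GPU is visible:

import cv2
print(cv2.getBuildInformation())             # look for "NVIDIA CUDA: YES"
print(cv2.cuda.getCudaEnabledDeviceCount())  # should be >= 1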

Chillee / mfu_compute.py
Last active March 2, 2025 22:10
Compute Flop Utilization in PyTorch
import torch
from torch.utils.flop_counter import FlopCounterMode
from triton.testing import do_bench

def get_flops_achieved(f):
    flop_counter = FlopCounterMode(display=False)
    with flop_counter:
        f()
    total_flops = flop_counter.get_total_flops()
    ms_per_iter = do_bench(f)
    # FLOPs per millisecond divided by 1e9 gives TFLOP/s.
    print(f"{total_flops / ms_per_iter / 1e9:.1f} TFLOP/s")
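Hypothetical usage, measuring the achieved throughput of a single large linear layer (sizes are illustrative):

model = torch.nn.Linear(4096, 4096, device="cuda", dtype=torch.bfloat16)
x = torch.randn(64, 4096, device="cuda", dtype=torch.bfloat16)
get_flops_achieved(lambda: model(x))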
from dataclasses import dataclass
from functools import partial
from itertools import cycle
import logging
import multiprocessing as std_mp
import os
import socket
import warnings

import dill