Skip to content

Instantly share code, notes, and snippets.

View crazyguitar's full-sized avatar
🎯
Focusing

CHANG-NING TSAI crazyguitar

🎯
Focusing
View GitHub Profile
@ruvnet
ruvnet / MoE.py
Last active October 20, 2025 19:50
A PyTorch implementation of a Mixture of Experts (MoE) model resembling the Mixtral 8x7B architecture, with detailed inline comments. This model combines transformer layers with an MoE layer consisting of 8 experts, aiming for high efficiency by activating only 2 experts per token. It's configured with dimensions reflecting the operational effic…
"""
This model integrates the MoE concept within a Transformer architecture. Each token's
representation is processed by a subset of experts, determined by the gating mechanism.
This architecture allows for efficient and specialized handling of different aspects of the
data, aiming for the adaptability and efficiency noted in the Mixtral 8x7B model's design
philosophy. The model activates only a fraction of the available experts for each token,
significantly reducing the computational resources needed compared to activating all experts
for all tokens.
"""
@crazyguitar
crazyguitar / m256.cc
Created October 5, 2021 20:18 — forked from kaityo256/m256.cc
//----------------------------------------------------------------------
#include <stdio.h>
#include <emmintrin.h>
#include <immintrin.h>
//----------------------------------------------------------------------
void
printm256(__m256d r){
double *a = (double*)(&r);
printf("%f %f %f %f\n",a[0],a[1],a[2],a[3]);
}
@ih2502mk
ih2502mk / list.md
Last active October 31, 2025 21:32
Quantopian Lectures Saved
@mcarilli
mcarilli / nsight.sh
Last active October 29, 2025 07:13
Favorite nsight systems profiling commands for Pytorch scripts
# This isn't supposed to run as a bash script, i named it with ".sh" for syntax highlighting.
# https://developer.nvidia.com/nsight-systems
# https://docs.nvidia.com/nsight-systems/profiling/index.html
# My preferred nsys (command line executable used to create profiles) commands
#
# In your script, write
# torch.cuda.nvtx.range_push("region name")
# ...
@crazyguitar
crazyguitar / mapread.c
Created January 17, 2020 16:43 — forked from marcetcheverry/mapread.c
mmap and read/write string to file
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, const char *argv[])
{
@mcarilli
mcarilli / commands.md
Last active October 16, 2025 08:53
Single- and multiprocess profiling workflow with nvprof and NVVP (Nsight Systems coming soon...)

Ordinary launch commands (no profiling):

Single-process:

python main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/

Multi-process:

python -m torch.distributed.launch  --nproc_per_node=2 main_amp.py -a resnet50 --b 224 --deterministic --workers 4 --opt-level O1 ./bare_metal_train_val/
@dideler
dideler / bot.rb
Last active October 23, 2025 16:33
Sending a notification message to Telegram using its HTTP API via cURL
# Use this script to test that your Telegram bot works.
#
# Install the dependency
#
# $ gem install telegram_bot
#
# Run the bot
#
# $ ruby bot.rb
#
@mbinna
mbinna / effective_modern_cmake.md
Last active October 28, 2025 09:48
Effective Modern CMake

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.

After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft

@mdonkers
mdonkers / server.py
Last active October 23, 2025 16:32
Simple Python 3 HTTP server for logging all GET and POST requests
#!/usr/bin/env python3
"""
License: MIT License
Copyright (c) 2023 Miel Donkers
Very simple HTTP server in python for logging requests
Usage::
./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
@kbarbary
kbarbary / simd-vmv.c
Created October 8, 2016 18:29
Vector-matrix-vector multiplication with SIMD (AVX) intrinsics
// Doing the operation:
//
// | a a a a | | y |
// x * A * y = [ x x x x ] | a a a a | | y |
// | a a a a | | y |
// | a a a a | | y |
//
// with SIMD intrinics (specifically AVX).
//
// adapted from https://gist.github.com/rygorous/4172889