Artem Belevich Artem-B

Artem-B / godbolt-templates.json
Created February 25, 2025 18:46
godbolt templates.
{
"t1730931331339": {
"title": "CCCL nvcc+clang_libc++",
"data": "z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIAruiakl9ZATwDKjdAGFUtEywZ7HAGTwNMAHLuAEaYxCAA7ACspAAOqAqEdgwubh568Ym2Ar7%2BQSyh4dGWmNbZDEIETMQEqe6eXCVlyZXVBLmBIWGRMQpVNXXpjX1tHfmFPQCUlqgmxMjsHACkAEwAzH7IblgA1EtrTqbmAPR96McA%2BhdsLCQAnhf72EsaAIIv7%2BtYNP47ACrYIR/C5OADyASB2AAGsoAEogqFQlYaD6rNbfPyYf6A4FgiEAmGw1FvAiYFixAykvYHLZMBQKf5PD59YgmGw7AKoAAimBZqDuqIiACEPjsdgQ7rFMN9/jsAG5iEyYC4SqX7EVvUU7K4IBIEK7ai5YOV4BYGgFAkHgyGEhFIjT/ABUOzEtFQolJECOTBAIDOvsSAC9lQQdgxJlqlsKtWLiJgCHMGDtvQGCOhfa73UwiMR9k4/k8IJMAHSZj2YCDh9WRiJc4mvMU6vUGq7G03Ki7Yy14m1wu3I%2BWoPDoHZYMvZit/Z2xUjJsw%2Bv1pgN4YMqsMRt5iqMahtinZxhPEJMpxfpkDjnN5gtrbBF4tj2hu8sQGfr6ubva1wV1tY7j6k8lKSxPNaXpRkb2ZAhWXZFwTEEHk%2BQFTVow/Js%2BhbI1MBNM1OwtXFrQJOEdkwVQKVNQgdlg%2BDeSg/kID8AhnTQODSWIDddz3HYQGTWZBDCC4vV41j2K3aMv01VCLl1dDO1bLD23NHErXxaE%2BycREBwAPyoggENou4i0/EVxPeD9VWlTAqFlBU3BDSVMHfUzdzQ/VZMw7COy7fCVNtdT7SdF1HyzT0T39P0VxDN8P23GN
Artem-B / monstral.sh
Last active February 15, 2025 07:29
Download monstral model on runpod.
export HF_HUB_ENABLE_HF_TRANSFER=1 &&
apt update &&
apt install -y nvtop &&
pip install hf_transfer &&
export HF_HUB_ENABLE_HF_TRANSFER=1
MODEL=MikeRoz/MarsupialAI_Dumbstral-169B-5.0bpw-h6-exl2
MODEL=BigHuggyD/MarsupialAI_Monstral-123B-v2_exl2_8.0bpw_h8
MODEL=BigHuggyD/TheDrummer_Behemoth-123B-v1.2_exl2_7.0bpw_h8
Artem-B / pow.log
Last active November 7, 2024 22:41
pow test log
CUDA devices found: 2
Device 0: "NVIDIA RTX 5000 Ada Generation", Selected, SM89, 33937293312 [bytes]
Device 1: "Quadro GP100", Unused, SM60, 17064263680 [bytes]
Testing void test_edges() [T = double] on host
Testing void test_edges() [T = __half] on host
Testing void test_edges() [T = __nv_bfloat16] on host
Testing void test_edges() [T = double] on GPU
imag 0:1 i=1.000000e-02:1.000000e-02 j=-1.000000e-02:1.000000e-02 r=1.034025e+00:-5.220046e-02 z=1.034025e+00:-5.220046e-02 diff=0.000000e+00:6.938894e-18
imag 0:3 i=1.000000e-02:1.000000e-02 j=1.000000e-02:-1.000000e-02 r=9.646358e-01:4.869749e-02 z=9.646358e-01:4.869749e-02 diff=0.000000e+00:-6.938894e-18
real 0:5 i=1.000000e-02:1.000000e-02 j=-1.000000e+02:1.000000e-02 r=-8.804310e+184:3.751669e+183 z=-8.804310e+184:3.751669e+183 diff=-6.038340e+169:-1.251069e+171
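The log preview does not document its fields. The sketch below assumes that each `name=re:im` pair is a complex number, that `i` is the base, `j` the exponent, `r` a reference result, and `z` the result under test. Under that assumption it parses the first `imag` line and recomputes `i ** j` with Python's built-in complex arithmetic. The helper `parse_line` is hypothetical and is not part of the original test.

```python
import re

# Assumed layout (not stated in the log): each field is name=real:imag.
FIELD = re.compile(r"(\w+)=([^:\s]+):([^:\s]+)")


def parse_line(line):
    """Return {name: complex} for every name=real:imag field on a log line."""
    return {
        name: complex(float(real), float(imag))
        for name, real, imag in FIELD.findall(line)
    }


# First "imag" line copied verbatim from the log above.
line = (
    "imag 0:1 i=1.000000e-02:1.000000e-02 j=-1.000000e-02:1.000000e-02 "
    "r=1.034025e+00:-5.220046e-02 z=1.034025e+00:-5.220046e-02 "
    "diff=0.000000e+00:6.938894e-18"
)

fields = parse_line(line)

# Assumption: r and z are both pow(i, j); recompute it on the host.
expected = fields["i"] ** fields["j"]
error = abs(expected - fields["r"])
print(f"recomputed={expected:.6e} logged={fields['r']:.6e} abs_error={error:.3e}")
```

The logged values carry only seven significant digits, so any comparison against them should use a tolerance around 1e-6 rather than exact equality.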
Artem-B / runpod-llm.sh
Last active August 29, 2024 07:23
runpod-llm.sh
#! /bin/sh
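# Model to download, taken from the first argument (may be empty).
MODEL=${1:-}
apt update
apt install -y nvtop htop btop tmux
# Double quotes so ${MODEL} is expanded here, before tmux runs the command.
tmux new-session 'nvtop' \; \
split-window "python3 download-model.py --threads 15 ${MODEL}"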
sccache stats: N/A No new compilation requests
+ PRESET=thrust-cpp20
+ test_preset Thrust thrust-cpp20
+ local BUILD_NAME=Thrust
+ local PRESET=thrust-cpp20
+ pushd ..
+ ctest --preset=thrust-cpp20
Test project /usr/local/google/home/tra/work/cccl/build/thrust-cpp20
Start 1: thrust.cpp.cuda.cpp20.test.adjacent_difference
1/362 Test #1: thrust.cpp.cuda.cpp20.test.adjacent_difference ...........................***Failed 0.90 sec
from itertools import product
from string import Template
from itertools import product, count
from string import Template
from absl import app
from typing import Sequence
types = [
"char", "signed char", "char1", "char2", "char4", "unsigned char", "uchar1",
"uchar2", "uchar4", "short", "short1", "short2", "short4", "ushort",
Artem-B / ConvFwd_Add_Add_ReluFwd_eng15_k5=1_k6=0_k7=1_k10=1.log
Created May 11, 2023 17:24
ConvFwd_Add_Add_ReluFwd_eng15_k5=1_k6=0_k7=1_k10=1
Do cudnn execution plan with plan tag: ConvFwd_Add_Add_ReluFwd_eng15_k5=1_k6=0_k7=1_k10=1
Workspace size in bytes: 409856
VariantPack: CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR : has 5 data pointers
I0509 11:27:06.030540 1504356 cuda_dnn.cc:4319]
Tensor_x: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 120 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,7,9 ] Str [ 4032,63,9,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_y: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 121 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,3,4 ] Str [ 768,12,4,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_z: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 122 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,3,4 ] Str [ 768,12,4,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_w: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_
Artem-B / ConvFwd_Add_Add_eng15_k5=1_k6=0_k7=1_k10=1.log
Created May 11, 2023 17:20
ConvFwd_Add_Add_eng15_k5=1_k6=0_k7=1_k10=1 log
1687-Tag: ConvFwd_Add_Add_
1688-
1689-[cudnn_frontend] CUDNN_BACKEND_ENGINE_DESCRIPTOR : ID: 15 Has 4 knobs
1690-[cudnn_frontend] CUDNN_BACKEND_ENGINECFG_DESCRIPTOR : Number of knobs: 4
1691:[cudnn_frontend] CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR : ConvFwd_Add_Add_eng15_k5=1_k6=0_k7=1_k10=1, numeric_notes:[CUDNN_NUMERICAL_NOTE_WINOGRAD,CUDNN_NUMERICAL_NOTE_WINOGRAD_TILE_4x4,] behavior_notes:[] workSpaceSize: 895504
1692-[cudnn_frontend] CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR : has 5 data pointers
1693-[cudnn_frontend] CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_HALF Id: 120 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,96,44,60 ] Str [ 253440,2640,60,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
1694-[cudnn_frontend] CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_HALF Id: 121 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,32,44,60 ] Str [ 84480,2640,60,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
1695-
Artem-B / check_crash.sh
Created February 4, 2023 02:52
glxinfo crash stats collection script.
#! /usr/bin/bash
GOOD=0
BAD=0
N=100
for ((i=0; i<N; i++)); do
glxinfo -B > /dev/null 2>/dev/null |:
if [[ ${PIPESTATUS[0]} = 0 ]]; then
((GOOD++))