Artem Belevich Artem-B
// ABI compatibility shims for CUDA-11.7.
// Patch each affected library with the command below. objcopy takes a single
// input file (a second file name would be treated as the output), so run it
// once for libnvinfer_static.a and once for libcudnn_static.a:
// objcopy \
//   --redefine-sym cudaCreateTextureObject=cudaCreateTextureObject_v115 \
//   --redefine-sym cudaGetTextureObjectTextureDesc=cudaGetTextureObjectTextureDesc_v115 \
//   --redefine-sym cublasGetVersion_v2=cublasGetVersion_v2_v115 \
//   --redefine-sym cublasLtGetVersion=cublasLtGetVersion_v115 \
//   <library>.a
//
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include "cuda_runtime.h"
#include "helper_cuda.h" // from cuda_samples
// Copy of texture descriptor from CUDA-11.7, so we can build the sample
// in a way that simulates compilation with an older CUDA version.
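
The renaming in the shim's header comment follows one pattern: each affected symbol gets a `_v115` suffix. A minimal Python sketch of generating those `objcopy` invocations is shown here. The helper name `objcopy_commands` is hypothetical and not part of the gist. Because `objcopy` accepts one input file plus an optional output file, the sketch emits one in-place command per library.

```python
import shlex

# Symbols and libraries copied from the shim's header comment.
SYMBOLS = [
    "cudaCreateTextureObject",
    "cudaGetTextureObjectTextureDesc",
    "cublasGetVersion_v2",
    "cublasLtGetVersion",
]
LIBRARIES = ["libnvinfer_static.a", "libcudnn_static.a"]


def objcopy_commands(symbols, libraries, suffix="_v115"):
    """Hypothetical helper: build one objcopy argv list per library."""
    renames = []
    for sym in symbols:
        renames += ["--redefine-sym", f"{sym}={sym}{suffix}"]
    # With no output file given, objcopy modifies the input file in place.
    return [["objcopy", *renames, lib] for lib in libraries]


if __name__ == "__main__":
    for argv in objcopy_commands(SYMBOLS, LIBRARIES):
        print(shlex.join(argv))
```

The script only prints the commands, so they can be reviewed before anything is run against the real archives.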
Artem-B / stutter.md
Last active December 11, 2021 07:55
Windows audio/video stutter.md

Now and then, video and sound stutter for a second or two. I do have Hyper-V enabled and I do use WSL2 (and Docker, configured to use it), but in my case the issue does seem to happen even when no WSL2 VMs are running, so it may be more of a Hyper-V issue than a WSL2 one.

I've managed to capture it in an ETW trace. As far as I can tell, while the stutter lasts, everything stalls for about 50 ms, then resumes for about 10 ms, and this cycle repeats.


Absolutely no events get captured during the quiet periods. Here's one example: [image]
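
The stall pattern described above (roughly 50 ms of silence, then about 10 ms of activity) can be looked for mechanically once event timestamps are exported from a trace. Below is a minimal Python sketch under stated assumptions: the timestamps are plain numbers in milliseconds, the 20 ms threshold is an arbitrary choice, and `find_quiet_periods` and `synthetic_trace` are hypothetical helpers. The input is synthetic, not data from the actual trace.

```python
def find_quiet_periods(timestamps_ms, min_gap_ms=20.0):
    """Return (start, end, duration) for each gap between consecutive
    events that lasts at least min_gap_ms."""
    ordered = sorted(timestamps_ms)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap_ms:
            gaps.append((prev, cur, cur - prev))
    return gaps


def synthetic_trace(cycles=3, active_ms=10, quiet_ms=50):
    """Made-up trace: one event per millisecond during each active burst,
    and no events at all during the quiet period that follows."""
    events = []
    start = 0
    for _ in range(cycles):
        events.extend(range(start, start + active_ms + 1))
        start += active_ms + quiet_ms
    return events


if __name__ == "__main__":
    for begin, end, duration in find_quiet_periods(synthetic_trace()):
        print(f"no events from {begin} ms to {end} ms ({duration} ms)")
```

On a real capture, the threshold would need tuning so that ordinary idle time between events is not reported as a stall.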

; Reproducer for a bad performance regression triggered by the switch to the
; new PM. `barney` ended up with its local variables not being optimized away,
; and that had a rather dramatic effect on some GPU code. See
; https://bugs.llvm.org/show_bug.cgi?id=52037 for the gory details.
;
; NOTE that opt -O3 produces different IR.
;
; RUN: opt -mtriple=nvptx64-nvidia-cuda -passes='default<O3>' -S %s -o - \
; RUN: | llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3 -o - \
; RUN: | FileCheck %s
;*** IR Dump After Combine redundant instructions *** (function: _ZN8cuforces12forcesDeviceI13forces_paramsIL10KernelType3EL14SPHFormulation1EL20DensityDiffusionType3EL12BoundaryType4E12FullViscSpecIL12RheologyType0EL15TurbulenceModel1EL26ComputationalViscosityType0EL12ViscousModel0EL15AverageOperator0ELm517ELb0EELm517EL12ParticleType1ELSD_0EL7RunMode1ELb0ELb0ELb0ELb0E5emptyI18xsph_forces_paramsESF_I20volume_forces_paramsESF_I21grenier_forces_paramsESF_I25sa_boundary_forces_paramsESF_I28dummy_boundary_forces_paramsESF_I25water_depth_forces_paramsESF_I18keps_forces_paramsESF_I14tau_tex_paramsESF_I22eulerVel_forces_paramsESF_I29internal_energy_forces_paramsESF_I28effective_visc_forces_paramsEELS2_3ELS3_1ELS4_3ELS5_4ESC_Lm517ELSD_1ELSD_0EEEvT_)
; ModuleID = 'reduced.ll.ll'
source_filename = "<stdin>"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
%struct.char3 = type { i8, i8, i8 }
%"class.cuneibs::neiblist_iterator_core" = type <{ i32*, i16*, %struct.float
;*** IR Dump After Straight line strength reduction (slsr) *** (function: _ZN8cuforces12forcesDeviceI13forces_paramsIL10KernelType3EL14SPHFormulation1EL20DensityDiffusionType3EL12BoundaryType4E12FullViscSpecIL12RheologyType0EL15TurbulenceModel1EL26ComputationalViscosityType0EL12ViscousModel0EL15AverageOperator0ELm517ELb0EELm517EL12ParticleType1ELSD_0EL7RunMode1ELb0ELb0ELb0ELb0E5emptyI18xsph_forces_paramsESF_I20volume_forces_paramsESF_I21grenier_forces_paramsESF_I25sa_boundary_forces_paramsESF_I28dummy_boundary_forces_paramsESF_I25water_depth_forces_paramsESF_I18keps_forces_paramsESF_I14tau_tex_paramsESF_I22eulerVel_forces_paramsESF_I29internal_energy_forces_paramsESF_I28effective_visc_forces_paramsEELS2_3ELS3_1ELS4_3ELS5_4ESC_Lm517ELSD_1ELSD_0EEEvT_)
; ModuleID = 'reduced.ll.ll'
source_filename = "<stdin>"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"
%struct.char3 = type { i8, i8, i8 }
%struct.float4 = type { float, float, float, float }
%struct.float3
; Compile with:
;
; clang "-cc1" "-triple" "nvptx64-nvidia-cuda" "-aux-triple" "x86_64-pc-linux-gnu"
; "-S" "-disable-llvm-verifier" "-discard-value-names" "-main-file-name"
; "DamBreak3D.cu" "-mrelocation-model" "static" "-mframe-pointer=all"
; "-fno-rounding-math" "-fno-verbose-asm" "-no-integrated-as" "-aux-target-cpu"
; "x86-64" "-fcuda-is-device"
; "-target-feature" "+ptx70" "-target-sdk-version=11.0" "-target-cpu" "sm_70"
; "-O3" "-x" "ir"
--- bin/res.clang 2021-10-05 16:31:10.824553505 -0700
+++ bin/res.clang11 2021-10-05 16:56:46.800860833 -0700
@@ -11,39 +11,39 @@
Common:
GLOBAL:402 CONSTANT[3]:2844
Function _ZN13cupostprocess14calcVortDeviceIL10KernelType3EL12BoundaryType4EEEv24neibs_interaction_paramsIXT0_ENSt11conditionalIXcvbeqT0_LS2_2EE18sa_boundary_params5emptyIS5_EE4typeEEP6float3:
- REG:47 STACK:112 SHARED:0 LOCAL:0 CONSTANT[0]:424 CONSTANT[2]:16 TEXTURE:0 SURFACE:0 SAMPLER:0
+ REG:46 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:424 CONSTANT[2]:16 TEXTURE:0 SURFACE:0 SAMPLER:0
Function _ZN13cupostprocess20calcTestpointsDeviceIL10KernelType3EL12BoundaryType4E12FullViscSpecIL12RheologyType0EL15TurbulenceModel1EL26ComputationalViscosityType0EL12ViscousModel0EL15AverageOperator0ELm513ELb0EEEEvNS_17testpoints_paramsIXT0_ET1_XeqsrSB_9turbmodelLS5_3EE24neibs_interaction_paramsIXT0_ENSt11conditionalIXcvbeqT0_LS2_2EE18sa_boundary_params5emptyISE_EE4typeEENSD_IXcvbeqsrSB_9turbmodelLS5_3EE15keps_tex_paramsSF_ISK_EE4typeENSD_IXcvbeqsrSB_9turb
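
The per-function resource lines in the diff above differ in only a few counters, which is easy to miss by eye. A minimal Python sketch for comparing two such lines is shown here. The helper names `parse_usage` and `usage_delta` are hypothetical, and the only format assumption is `NAME:value` pairs, with an optional bank index such as `CONSTANT[0]`. The two sample lines are the `-` (`bin/res.clang`) and `+` (`bin/res.clang11`) lines from the diff.

```python
import re

# Matches fields such as "REG:47" or "CONSTANT[0]:424".
FIELD_RE = re.compile(r"([A-Z]+(?:\[\d+\])?):(\d+)")


def parse_usage(line):
    """Turn one resource-usage line into a {counter: value} dict."""
    return {name: int(value) for name, value in FIELD_RE.findall(line)}


def usage_delta(before, after):
    """Return only the counters that changed, as after minus before."""
    delta = {}
    for key in sorted(set(before) | set(after)):
        diff = after.get(key, 0) - before.get(key, 0)
        if diff != 0:
            delta[key] = diff
    return delta


clang11 = parse_usage(
    "REG:46 STACK:0 SHARED:0 LOCAL:0 CONSTANT[0]:424 CONSTANT[2]:16 "
    "TEXTURE:0 SURFACE:0 SAMPLER:0"
)
clang = parse_usage(
    "REG:47 STACK:112 SHARED:0 LOCAL:0 CONSTANT[0]:424 CONSTANT[2]:16 "
    "TEXTURE:0 SURFACE:0 SAMPLER:0"
)

if __name__ == "__main__":
    print(usage_delta(clang11, clang))
```

For this function, the `bin/res.clang` build uses 112 bytes of stack and one more register than the `bin/res.clang11` build.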
--- bin/res.clang 2021-10-05 16:31:10.824553505 -0700
+++ bin/res.nvcc 2021-10-05 16:31:00.712472386 -0700
@@ -9,219 +9,232 @@
Resource usage:
Common:
- GLOBAL:402 CONSTANT[3]:2844
+ GLOBAL:0
+
+Fatbin elf code:
namespace {
template <int N>
struct __Tag;
# 54 "__clang_cuda_texture_intrinsics.h" 3
template <class>
struct __FT;
template <>