@LunNova
Last active October 14, 2025 02:31

Load Logic Perf

| Subject | Time | Peak Memory | vs Baseline |
| --- | --- | --- | --- |
| [hipblaslt] Use multiprocessing.Pool for TensileCreateLibrary | 18.531s | 879.0 MB | baseline |
| [hipblaslt] CustomKernels: lru_cache for 20x speedup of some logic files | 18.275s | 878.3 MB | 0% faster, 0% less memory |
| [hipblaslt] reduce memory usage during logic load | 15.500s | 873.7 MB | 15% faster, 0% less memory |
| [hipblaslt] Remove unused key arg for getPrimitiveParameterValueAbbreviation | 15.535s | 873.4 MB | 15% faster, 0% less memory |
| [hipblaslt] intern strings to reduce duplicate memory for solution keys | 15.001s | 721.9 MB | 17.5% faster, 17.5% less memory |
| [hipblaslt] tensilelite: gc in parallel worker | 15.167s | 722.1 MB | 17.5% faster, 17.5% less memory |
| [hipblaslt] tensilelite: teach state_key_ordering slots | 14.827s | 722.5 MB | 20% faster, 17.5% less memory |
| [hipblaslt] tensilelite: intern FreeIndex, BatchIndex, BoundIndex and SizeMapping | 14.970s | 704.1 MB | 17.5% faster, 20% less memory |
| [hipblaslt] tensilelite: remove unused targetObjFilename | 14.998s | 702.3 MB | 17.5% faster, 20% less memory |
| [hipblaslt] tensilelite: record code object file index without mutations | 14.448s | 584.1 MB | 22.5% faster, 32.5% less memory |
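
The largest single memory drop in these measurements comes with the commit that interns strings for solution keys. As a rough illustration of why interning helps, here is a minimal sketch using Python's `sys.intern`. It is not the hipBLASLt change itself: the helper names `build_keys` and `intern_keys` and the sample key text are hypothetical, and the identity behavior shown is that of CPython.

```python
import sys


def build_keys(n):
    """Build n equal-valued key strings, each a separate object (hypothetical data)."""
    parts = ["Macro", "Tile", "_", "Work", "Group"]
    # In CPython, joining several parts allocates a fresh string on every call,
    # so the list holds n copies of the same text.
    return ["".join(parts) for _ in range(n)]


def intern_keys(keys):
    """Replace every key with its single canonical interned copy."""
    return [sys.intern(k) for k in keys]


raw = build_keys(1000)
interned = intern_keys(raw)

# Count how many distinct string objects each list references.
distinct_raw = len({id(k) for k in raw})
distinct_interned = len({id(k) for k in interned})
print(distinct_raw, distinct_interned)
```

Once the un-interned list is released, only one copy of the key text stays alive. The saving grows with the number of repeated keys, which is presumably why it shows up when many solutions share the same parameter names.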

Full ISA TensileCreateLibrary

Time Before

Total time (s): 4630.94
Total kernels processed: 196454
Kernels processed per second: 42.42
KernelHelperObjs: 328
# Peak memory: ~34 GB (observed, not logged)

Time After

Total time (s): 3418.73
Total kernels processed: 196454
Kernels processed per second: 57.46
KernelHelperObjs: 328
Peak memory usage (MB): 24,266.2
Current memory usage (MB): 21,930.3
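
The throughput figures above follow directly from the totals, and the same numbers give the overall time reduction for the full ISA build. A small sketch of that arithmetic, with hypothetical helper names; the peak-memory change is left out because the "before" value is only approximate:

```python
def kernels_per_second(kernels, seconds):
    """Throughput as reported in the logs: kernels processed / total time."""
    return kernels / seconds


def percent_reduction(before, after):
    """Relative reduction of 'after' compared with 'before', in percent."""
    return (1 - after / before) * 100


KERNELS = 196454       # Total kernels processed (same in both runs)
BEFORE_S = 4630.94     # Total time (s), before
AFTER_S = 3418.73      # Total time (s), after

rate_before = round(kernels_per_second(KERNELS, BEFORE_S), 2)
rate_after = round(kernels_per_second(KERNELS, AFTER_S), 2)
time_saved_pct = round(percent_reduction(BEFORE_S, AFTER_S), 1)

print(rate_before)      # 42.42
print(rate_after)       # 57.46
print(time_saved_pct)   # 26.2
```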