| Subject | Time | Peak Memory | vs Baseline |
|---|---|---|---|
| [hipblaslt] Use multiprocessing.Pool for TensileCreateLibrary | 18.531s | 879.0 MB | baseline |
| [hipblaslt] CustomKernels: lru_cache for 20x speedup of some logic files | 18.275s | 878.3 MB | 0% faster, 0% less memory |
| [hipblaslt] reduce memory usage during logic load | 15.500s | 873.7 MB | 15% faster, 0% less memory |
| [hipblaslt] Remove unused key arg for getPrimitiveParameterValueAbbreviation | 15.535s | 873.4 MB | 15% faster, 0% less memory |
| [hipblaslt] intern strings to reduce duplicate memory for solution keys | 15.001s | 721.9 MB | 17.5% faster, 17.5% less memory |
| [hipblaslt] tensilelite: gc in parallel worker | 15.167s | 722.1 MB | 17.5% faster, 17.5% less memory |
| [hipblaslt] tensilelite: teach state_key_ordering slots | 14.827s | 722.5 MB | 20% faster, 17.5% less memory |
| [hipblaslt] tensilelite: intern FreeIndex, BatchIndex, BoundIndex and SizeMapping | 14.970s | 704.1 MB | 17.5% faster, 20% less memory |
| [hipblaslt] tensilelite: remove unused targetObjFilename | 14.998s | 702.3 MB | 17.5% faster, 20% less memory |
| [hipblaslt] tensilelite: record code object file index without mutations | 14.448s | 584.1 MB | 22.5% faster, 32.5% less memory |
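
Several of the subjects above name general Python techniques: `functools.lru_cache`, string interning, and `__slots__`. The sketch below illustrates those three techniques in isolation. It is not the hipblaslt/tensilelite code: `SolutionKey`, `abbreviate`, and `build_keys` are hypothetical names invented for this example, and the real changes may apply the techniques differently.

```python
import sys
from functools import lru_cache


class SolutionKey:
    """Hypothetical stand-in for a per-solution record (not a Tensile class)."""

    # __slots__ removes the per-instance __dict__, so each object is smaller.
    __slots__ = ("name", "index_type")

    def __init__(self, name: str, index_type: str) -> None:
        # sys.intern makes equal strings share a single object across instances.
        self.name = sys.intern(name)
        self.index_type = sys.intern(index_type)


@lru_cache(maxsize=None)
def abbreviate(parameter: str) -> str:
    """Hypothetical pure helper: repeated calls with the same argument hit the cache."""
    return "".join(ch for ch in parameter if ch.isupper())


def build_keys(count: int) -> list:
    """Create `count` records from strings built at run time (not literals)."""
    return [
        SolutionKey("".join(["Macro", "Tile"]), "".join(["Free", "Index"]))
        for _ in range(count)
    ]
```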

Time before
Total time (s): 4630.94
Total kernels processed: 196454
Kernels processed per second: 42.42
KernelHelperObjs: 328
Peak memory usage: ~34 GB (not logged)

Time after
Total time (s): 3418.73
Total kernels processed: 196454
Kernels processed per second: 57.46
KernelHelperObjs: 328
Peak memory usage (MB): 24,266.2
Current memory usage (MB): 21,930.3
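
The throughput figures in the two logs follow directly from the totals, and the overall improvement can be derived the same way. The sketch below recomputes them from the logged numbers. The "before" peak memory was not logged, so treating "~34GB" as 34,000 MB is an assumption and the memory reduction it yields is only approximate.

```python
# Figures copied from the "Time before" and "Time after" logs.
KERNELS = 196454
TIME_BEFORE_S = 4630.94
TIME_AFTER_S = 3418.73
PEAK_AFTER_MB = 24266.2
# Assumption: the unlogged "~34GB" peak is taken as 34,000 MB.
PEAK_BEFORE_MB = 34000.0


def throughput(kernels: int, seconds: float) -> float:
    """Kernels processed per second."""
    return kernels / seconds


def percent_reduction(before: float, after: float) -> float:
    """How much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


rate_before = throughput(KERNELS, TIME_BEFORE_S)
rate_after = throughput(KERNELS, TIME_AFTER_S)
time_saved_pct = percent_reduction(TIME_BEFORE_S, TIME_AFTER_S)
throughput_gain_pct = 100.0 * (rate_after / rate_before - 1.0)
memory_saved_pct = percent_reduction(PEAK_BEFORE_MB, PEAK_AFTER_MB)

print(f"Kernels per second: {rate_before:.2f} -> {rate_after:.2f}")
print(f"Total time reduced by {time_saved_pct:.1f}%")
print(f"Throughput increased by {throughput_gain_pct:.1f}%")
print(f"Peak memory reduced by roughly {memory_saved_pct:.0f}% (approximate)")
```

With the logged totals this reproduces the reported 42.42 and 57.46 kernels per second, and works out to about 26% less total time.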