
@rosenrodt
Created March 16, 2020 09:47
transpose lds
GlobalParameters:
  MinimumRequiredVersion: 4.14.0
  PrintLevel: 1
  ForceRedoBenchmarkProblems: True
  ForceRedoLibraryLogic: True
  ForceRedoLibraryClient: True
  CMakeBuildType: Release
  EnqueuesPerSync: 1
  SyncsPerBenchmark: 1
  LibraryPrintDebug: False
  NumElementsToValidate: 65535
  ValidationMaxToPrint: 4
  ValidationPrintValids: False
  ShortNames: False
  MergeFiles: True
  Platform: 0
  Device: 0
  KernelTime: True
  PinClocks: False
  SleepPercent: 200
  PrintSolutionRejectionReason: True
  DataInitTypeA: 3
  DataInitTypeB: 3
  #DataInitTypeC: 0
  #DataInitTypeD: 0
  DataInitTypeBeta: 2
  DataInitTypeAlpha: 1
  PrintTensorA: 0
  PrintTensorB: 0
  PrintTensorD: 0
  NewClient: 2

BenchmarkProblems:
########################################
# TN - standard
########################################
  -
    - # ProblemType
      OperationType: GEMM
      DataType: s
      TransposeA: True
      TransposeB: False
      UseBeta: True
      Batched: True

    - # BenchmarkProblemSizeGroup - Standard
      InitialSolutionParameters:
      BenchmarkCommonParameters:
        - KernelLanguage: ["Assembly"]
        - EdgeType: ["ShiftPtr"]
        #- LoopTail: [True]
        - PrefetchLocalRead: [False]
      ForkParameters:
        - MatrixInstruction:
          - [32, 32, 1, 2]
        - PrefetchGlobalRead: [False]
        - ThreadTile:
          - [ 1, 32 ]
          - [ 2, 32 ]
        - WorkGroup:
          - [ 32, 8, 1 ]
          - [ 64, 4, 1 ]
        - WorkGroupMapping: [1]
        - GlobalSplitU: [1]
        - DepthU: [8, 16, 32]
        - GlobalReadVectorWidth: [4]
        - VectorWidth: [4]
        # - InnerUnroll: [1, 2]
        - TransposeLDS: [1]
        - LdsPadA: [0, 2]
        - LdsPadB: [0, 2]
        #- DisableKernelPieces: [0,1,2,3,4,5,6,7,9]
        - SuppressNoLoadLoop: [False, True]
        - OptNoLoadLoop: [0, 1]
        #- ScheduleLocalWrite: [0]
        #- ScheduleGlobalRead: [0]
        #- ScheduleIterAlg: [0]
      BenchmarkForkParameters:
      JoinParameters:
      BenchmarkJoinParameters:
      BenchmarkFinalParameters:
        - ProblemSizes:
          - Exact: [ 7680, 8192, 1, 8192, 8224, 8224, 8224, 8224 ]
          # - Exact: [ 64, 128, 1, 16 ]
          # - Exact: [ 64, 128, 1, 17 ]
          #- Exact: [ 7680, 8192, 1, 8192, 7712, 7712, 8224, 8224 ]

########################################
LibraryLogic:
  ScheduleName: "arcturus"
  DeviceNames: ["Device 7380", "Device 7388", "Device 738c", "Device 7390"]
  ArchitectureName: "gfx908"
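
Note on the size of the search space: each entry under ForkParameters lists alternative values, and the benchmark explores their combinations. The Python sketch below is not part of Tensile. It copies the uncommented fork values from the config above into a plain dictionary and counts the Cartesian product, assuming every combination is generated before any solution is rejected. The names `FORK_PARAMETERS` and `count_fork_combinations` are hypothetical and exist only for this illustration.

```python
from math import prod

# Uncommented ForkParameters from the config above, copied by hand.
# This dictionary is an illustration only; Tensile reads the YAML itself.
FORK_PARAMETERS = {
    "MatrixInstruction": [[32, 32, 1, 2]],
    "PrefetchGlobalRead": [False],
    "ThreadTile": [[1, 32], [2, 32]],
    "WorkGroup": [[32, 8, 1], [64, 4, 1]],
    "WorkGroupMapping": [1],
    "GlobalSplitU": [1],
    "DepthU": [8, 16, 32],
    "GlobalReadVectorWidth": [4],
    "VectorWidth": [4],
    "TransposeLDS": [1],
    "LdsPadA": [0, 2],
    "LdsPadB": [0, 2],
    "SuppressNoLoadLoop": [False, True],
    "OptNoLoadLoop": [0, 1],
}


def count_fork_combinations(params):
    """Return the size of the Cartesian product of all value lists."""
    return prod(len(values) for values in params.values())


if __name__ == "__main__":
    print(count_fork_combinations(FORK_PARAMETERS))
```

Under that assumption the config yields 192 candidate solutions (2 x 2 x 3 x 2 x 2 x 2 x 2) for the single Exact problem size. Some of them may be rejected before timing, which is what PrintSolutionRejectionReason: True would report.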