Kernel | Porter | Quail |
---|---|---|
Mass | 0.002 | 0.003 |
Laplace | 0.012 | 0.008 |
Hyperelasticity | 0.11 | 0.052 |

Porter: Nvidia Titan X; Quail: Nvidia K40c.

Kernel | NO CSE | WITH CSE |
---|---|---|
Mass | 0.003 | 0.0028 |
Laplace | 0.008 | 0.003 |
Hyperelasticity | 0.052 | 0.0039 |

Interpretation:
- Using `CSE` gives a substantial improvement. The effect is most visible in compute-intensive kernels such as hyperelasticity, where the timing improves by an order of magnitude.
- For hyperelasticity the FLOPs went down from `48936*nel` to `130*nel` (`nel` being the number of elements used).
- The time to assemble a kernel also went down significantly because of the CSEs.
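
To illustrate why CSE cuts the operation count so sharply, here is a minimal sketch of a toy operation counter. It is not how `loopy` implements CSE; the tuple-based expression format, the function names `count_ops` and `count_ops_cse`, and the determinant example are all assumptions made for illustration. The last lines only recompute ratios from the numbers reported above.

```python
# Toy model (assumption): an expression is a nested tuple (op, lhs, rhs);
# leaves are plain variable names.
def count_ops(expr):
    """Operations needed when every subexpression is recomputed."""
    if not isinstance(expr, tuple):
        return 0
    _, *args = expr
    return 1 + sum(count_ops(arg) for arg in args)


def count_ops_cse(expr, seen=None):
    """Operations needed when each distinct subexpression is computed once."""
    seen = set() if seen is None else seen
    if not isinstance(expr, tuple) or expr in seen:
        return 0
    seen.add(expr)
    _, *args = expr
    return 1 + sum(count_ops_cse(arg, seen) for arg in args)


# Hypothetical example: a 2x2 determinant that appears three times.
det = ("sub", ("mul", "a", "d"), ("mul", "b", "c"))
expr = ("add", ("mul", det, det), ("div", "x", det))

naive_ops = count_ops(expr)    # determinant recomputed at every use
cse_ops = count_ops_cse(expr)  # determinant computed once

# Ratios recomputed from the figures reported above.
flop_ratio = 48936 / 130     # per-element FLOP reduction for hyperelasticity
time_ratio = 0.052 / 0.0039  # hyperelasticity timing, NO CSE vs WITH CSE

print(naive_ops, cse_ops, round(flop_ratio), round(time_ratio, 1))
# prints: 12 6 376 13.3
```

In the toy expression the count halves. For the reported hyperelasticity kernel the FLOP count drops by a factor of about 376, while the timing improves by a factor of about 13.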
Kernel | Loopy | PyOP2 | MatFree |
---|---|---|---|
Mass | 0.029 | 0.0013 | 0.0021 |
Laplace | 0.020 | 0.0013 | 0.0023 |
Hyperelasticity | 0.041 | 0.0034 | 0.0088 |

The hardware has 2x 8 Xeon cores.

Interpretation: The Loopy numbers are roughly 12 to 22 times those of PyOP2, so quite a lot still needs to be done on the CPU end.
- No change in the bandwidth numbers.
- A hand calculation of the lower bound on the bandwidth gives a value in the range of 30 GB/s.
- Finding a way to schedule the indexed CSEs (we call it the island problem).
- Making `operators.py` compatible with the newer `kernel._ir`.
- Minor issues with the `Rayleigh Bernard` kernel.
- Making `loopy` compute the `footprint` bandwidth.
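
The hand calculation of a bandwidth lower bound mentioned above might look something like the sketch below. It counts only the data each element must read or write at least once and divides by the measured time. The function name `bandwidth_gb_per_s` and all input values are hypothetical; they are not the numbers behind the 30 GB/s figure, which the notes do not give.

```python
def bandwidth_gb_per_s(nel, doubles_per_element, seconds):
    """Lower bound on achieved bandwidth in GB/s.

    Counts only the minimum number of double-precision values each
    element must read or write, so real traffic can only be higher.
    """
    bytes_moved = nel * doubles_per_element * 8  # 8 bytes per double
    return bytes_moved / seconds / 1e9


# Hypothetical inputs, chosen only to show the arithmetic.
example = bandwidth_gb_per_s(nel=500_000, doubles_per_element=10, seconds=0.002)
print(f"{example:.1f} GB/s")
```

Because indirect accesses and repeated reads are ignored, a footprint-based figure like this underestimates the real memory traffic.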