
@kaushikcfd
Last active February 5, 2018 19:53

Below are timings for the matvec kernels, each using a different strategy to perform the matvec. All times in every table are in seconds.

POCL and AMD are the two OpenCL implementations on which the timings were taken.

Single-core CPU

Kernel            POCL    AMD     MatFree  PyOP2(SpMV)
Mass              0.122   0.136   0.031    0.011
Laplace           0.124   0.125   0.035    0.011
Hyperelasticity   0.268   0.264   0.105    0.027
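To make the gaps in the single-core table easier to compare, the slowdown of each OpenCL implementation relative to MatFree and PyOP2(SpMV) can be computed directly. This is a minimal sketch; the names `timings` and `slowdown` are illustrative, and the numbers are copied from the table.

```python
# Single-core timings in seconds, copied from the table above.
timings = {
    "Mass":            {"POCL": 0.122, "AMD": 0.136, "MatFree": 0.031, "PyOP2": 0.011},
    "Laplace":         {"POCL": 0.124, "AMD": 0.125, "MatFree": 0.035, "PyOP2": 0.011},
    "Hyperelasticity": {"POCL": 0.268, "AMD": 0.264, "MatFree": 0.105, "PyOP2": 0.027},
}

def slowdown(kernel, impl, reference):
    """How many times slower `impl` is than `reference` for `kernel`."""
    t = timings[kernel]
    return t[impl] / t[reference]

for kernel in timings:
    for impl in ("POCL", "AMD"):
        print(f"{kernel:16s} {impl:5s} "
              f"{slowdown(kernel, impl, 'MatFree'):5.1f}x vs MatFree, "
              f"{slowdown(kernel, impl, 'PyOP2'):5.1f}x vs PyOP2(SpMV)")
```

With these numbers, the OpenCL kernels are roughly 2.5x to 4.4x slower than MatFree and 9.8x to 12.4x slower than PyOP2's SpMV.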

Cost of atomics on a single-core CPU

POCL:

Kernel            With atomics   Without atomics
Mass              0.122          0.064
Laplace           0.124          0.041
Hyperelasticity   0.268          0.091

AMD:

Kernel            With atomics   Without atomics
Mass              0.136          0.073
Laplace           0.125          0.057
Hyperelasticity   0.264          0.164
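One way to summarize the two tables is the fraction of the with-atomics runtime that disappears when atomics are removed, (t_with - t_without) / t_with. The sketch below assumes the two variants otherwise do identical work; the function name `atomic_share` is illustrative.

```python
# Single-core timings (seconds) with and without atomics, from the tables above.
with_atomics = {
    "POCL": {"Mass": 0.122, "Laplace": 0.124, "Hyperelasticity": 0.268},
    "AMD":  {"Mass": 0.136, "Laplace": 0.125, "Hyperelasticity": 0.264},
}
without_atomics = {
    "POCL": {"Mass": 0.064, "Laplace": 0.041, "Hyperelasticity": 0.091},
    "AMD":  {"Mass": 0.073, "Laplace": 0.057, "Hyperelasticity": 0.164},
}

def atomic_share(impl, kernel):
    """Fraction of the with-atomics runtime attributable to atomics,
    assuming the no-atomics variant does otherwise identical work."""
    t_with = with_atomics[impl][kernel]
    t_without = without_atomics[impl][kernel]
    return (t_with - t_without) / t_with

for impl in with_atomics:
    for kernel in with_atomics[impl]:
        print(f"{impl:5s} {kernel:16s} {100 * atomic_share(impl, kernel):5.1f}% spent on atomics")
```

With the numbers above, atomics account for roughly 48% to 67% of the POCL runtime and 38% to 54% of the AMD runtime.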

Cost of atomics on a multicore CPU with vectorization

MatFree timings are included for comparison, as the target we are chasing.

Kernel            With atomics   Without atomics   MatFree
Mass              0.029          0.009             0.002
Laplace           0.020          0.010             0.002
Hyperelasticity   0.041          0.019             0.009
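Since MatFree is the target, the remaining gap can be expressed as how many times slower each variant is than MatFree. A minimal sketch with the table's numbers; `rows` and `gap_to_matfree` are illustrative names.

```python
# Multicore, vectorized timings in seconds, copied from the table above:
# kernel -> (with atomics, without atomics, MatFree)
rows = {
    "Mass":            (0.029, 0.009, 0.002),
    "Laplace":         (0.020, 0.010, 0.002),
    "Hyperelasticity": (0.041, 0.019, 0.009),
}

def gap_to_matfree(kernel):
    """Return (with/MatFree, without/MatFree): how many times slower each variant is."""
    t_with, t_without, t_matfree = rows[kernel]
    return t_with / t_matfree, t_without / t_matfree

for kernel in rows:
    g_with, g_without = gap_to_matfree(kernel)
    print(f"{kernel:16s} with atomics: {g_with:4.1f}x, without: {g_without:4.1f}x")
```

With atomics, the kernels are about 4.6x (Hyperelasticity) to 14.5x (Mass) slower than MatFree; without atomics the gap shrinks to about 2.1x to 5.0x.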

New lower bounds on bandwidth (using footprint measurement)

  • Mass: 52 GB/s (previously 1125 GB/s)
  • Laplace: 53 GB/s (previously 270 GB/s)
  • Hyperelasticity: 18 GB/s (previously 206 GB/s)
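A footprint-based bound divides the number of distinct bytes a kernel touches by its runtime. Assuming a cold cache, actual memory traffic is at least the footprint, so the result cannot exceed the bandwidth really achieved. The sketch below also contrasts this with summing bytes over every access, which counts shared entries more than once; whether that is how the earlier estimates were obtained is not stated here. The element-to-DOF map is a hypothetical toy 1D mesh, and all function names are illustrative.

```python
# Hypothetical sketch: footprint-based lower bound on achieved bandwidth.

def footprint_bytes(accessed_indices, itemsize):
    """Distinct bytes touched: each array entry counts once, however often it is accessed."""
    return len(set(accessed_indices)) * itemsize

def total_access_bytes(accessed_indices, itemsize):
    """Bytes summed over every access, counting repeated accesses each time."""
    return len(accessed_indices) * itemsize

def bandwidth_lower_bound_gbps(nbytes, runtime_s):
    """nbytes / runtime in GB/s (1 GB = 1e9 bytes)."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return nbytes / runtime_s / 1e9

# Toy example (hypothetical): 4 linear elements on a 1D mesh; neighboring
# elements share a DOF, so DOFs 1, 2 and 3 are each gathered twice.
elem_to_dof = [(0, 1), (1, 2), (2, 3), (3, 4)]
accesses = [dof for elem in elem_to_dof for dof in elem]
itemsize = 8  # float64

fp = footprint_bytes(accesses, itemsize)         # 5 distinct DOFs -> 40 bytes
total = total_access_bytes(accesses, itemsize)   # 8 accesses -> 64 bytes
print(fp, total)
```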