Gist Bulat-Ziganshin/653e723608c7406b96f8f899cba6fab4, last active April 13, 2023 06:01

https://devtalk.nvidia.com/default/topic/933827/cuda-programming-and-performance/fast-256-bin-histogram/
http://www.cse.uconn.edu/~zshi/course/cse5302/ref/chhugani08sorting.pdf
http://link.springer.com/chapter/10.1007/978-3-642-23397-5_16
http://arxiv.org/abs/1008.2849 Faster Radix Sort via Virtual Memory and Write-Combining (Jan Wassenberg, Peter Sanders)
https://devtalk.nvidia.com/default/topic/378826/cuda-programming-and-performance/my-speedy-sgemm/post/2703033/#2703033
https://devtalk.nvidia.com/default/topic/390366/cuda-programming-and-performance/instruction-latency/post/2768197/#2768197
https://devtalk.nvidia.com/default/topic/913832/cuda-programming-and-performance/sum-reduction-working-in-fermi-kepler-and-maxwell/
https://devtalk.nvidia.com/default/topic/776043/cuda-programming-and-performance/whats-new-in-maxwell-sm_52-gtx-9xx-/1
https://devtalk.nvidia.com/default/topic/690631/cuda-programming-and-performance/so-whats-new-about-maxwell-/post/4305310/#4305310
https://devtalk.nvidia.com/default/topic/878664/cuda-programming-and-performance/custom-memory-allocator-for-cuda-desired/post/4755097/#4755097
https://devtalk.nvidia.com/default/topic/695408/first-impressions-of-cuda-6-managed-memory/
https://devtalk.nvidia.com/default/topic/878455/cuda-programming-and-performance/gtx750ti-and-buffers-gt-1gb-on-win7/post/4672378/#4672378
https://devtalk.nvidia.com/default/topic/638031/ldg-versus-textures/
http://stackoverflow.com/questions/17004557/how-to-avoid-tlb-miss-and-high-global-memory-replay-overhead-in-cuda-gpus
https://devtalk.nvidia.com/default/topic/873995/global-memory-access-bottleneck/
http://www.techenablement.com/inside-nvidias-unified-memory-multi-gpu-limitations-and-the-need-for-a-cudamadvise-api-call/
https://parallel-computing.pro/index.php/9-cuda/43-openmp-4-0-on-nvidia-cuda-gpus
GPGPU-Sim 3.x Manual: http://archive.is/krK9N
https://devtalk.nvidia.com/default/topic/928796/gpu-accelerated-libraries/moderngpu-2-0/
https://devtalk.nvidia.com/default/topic/844924/announcements/cudapad-and-its-source-code-are-now-available-for-download-/
CUB: http://on-demand.gputechconf.com/gtc/2015/video/S5617.html
https://www.microway.com/hpc-tech-tips/cub-action-simple-examples-using-cub-template-library/
http://on-demand.gputechconf.com/gtc-express/2013/videos/understanding-parallel-graph-algorithms.mp4
http://on-demand.gputechconf.com/gtc/2013/webinar/essential-optimization-techniques-for-nvidia-kepler-and-fermi-architecture.mp4
https://devtalk.nvidia.com/default/topic/799429/cuda-programming-and-performance/possible-to-use-the-cuda-math-api-integer-intrinsics-to-find-the-nth-unset-bit-in-a-32-bit-int/post/4407256/#4407256
https://devtalk.nvidia.com/default/topic/804281/cuda-programming-and-performance/maxwell-integer-mul-mad-instruction-counts
https://devtalk.nvidia.com/default/topic/980740/cuda-programming-and-performance/xmad-meaning/
http://stackoverflow.com/questions/35566178/instruction-replay-in-cuda/35593124#35593124
https://devtalk.nvidia.com/default/topic/937736/cuda-programming-and-performance/saturated-16-bit-1-15-float-hack/post/4887190/#4887190
http://stackoverflow.com/questions/37732735/nvprof-option-for-bandwidth
https://devtalk.nvidia.com/default/topic/1006066/cuda-programming-and-performance/pascal-l1-cache
https://devtalk.nvidia.com/default/topic/1009766/cuda-programming-and-performance/single-gpu-core-vs-single-cpu-core
http://www.hardware.fr/articles/948-2/gp104-7-2-milliards-transistors-16-nm.html
http://www.hardware.fr/articles/951-2/polaris-10-5-7-milliards-transistors-14-nm.html

The way I learned SASS:
1. PTX manual: http://docs.nvidia.com/cuda/parallel-thread-execution/
2. CUDA Binary Utilities, instruction set reference: http://docs.nvidia.com/cuda/cuda-binary-utilities/#instruction-set-ref
3. decuda: https://github.com/laanwj/decuda
4. Read the wiki of the asfermi project: https://github.com/hyqneuron/asfermi/wiki
5. Read the Kepler SASS assembler manual: https://hpc.aliyun.com/doc/keplerAssemblerUserGuide
6. There is also maxas, but its documentation doesn't describe the instructions.

Low-level benchmarks:
http://www.cs.berkeley.edu/~volkov/volkov10-GTC.pdf
http://www.stuffedcow.net/research/cudabmk Demystifying GPU Microarchitecture through Microbenchmarking
http://asg.ict.ac.cn/dgemm/microbenchs.tar.gz
http://repository.lib.ncsu.edu/ir/bitstream/1840.16/9585/1/etd.pdf
https://hal.inria.fr/file/index/docid/789958/filename/112_Lai.pdf
http://hgpu.org/?p=14541 Dissecting GPU Memory Hierarchy through Microbenchmarking
http://hgpu.org/?p=16616 Understanding Latency Hiding on GPUs (Vasily Volkov)

A few books with low-level GPU details:
http://www.cudahandbook.com/
Shane Cook, "CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs"
Rob Farber, "CUDA Application Design and Development"
David Kirk, Wen-mei Hwu, "Programming Massively Parallel Processors"

Talks:
http://on-demand-gtc.gputechconf.com/gtc-quicklink/9BNvqKX
http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf
http://on-demand.gputechconf.com/gtc/2016/presentation/s6807-angerer-dynamic-parallelism.pdf

AMD:
https://radeonopencompute.github.io/documentation.html
http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/documentation/
http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf
https://forum.beyond3d.com/posts/1721467/
https://forum.beyond3d.com/threads/amd-southern-islands-7-series-speculation-rumour-thread.50220/page-22#post-1515943
https://github.com/SunsetQuest/Asm4GCN
https://realhet.wordpress.com/
http://x.pgy.hu/~worm/het/hp/GCN_Reference_Card.html
http://www.asmcommunity.net/forums/topic/?id=30544

Intel:
https://software.intel.com/en-us/articles/introduction-to-gen-assembly

ARM:
http://www.anandtech.com/show/10375/arm-unveils-bifrost-and-mali-g71/2