@goatandsheep
Created April 1, 2016 22:51
4F03 notes

Tutorial 2016-03-31

  • Kemal Ahmed

  • example is posted online, mirrors the renderer function
  • should not need to do anything more than add pragmas to the code
  • bottleneck is the renderer's two for loops that iterate through all pixels in the image; this process is highly parallel, with no dependencies between iterations
  • extern function parallelizing:
  • makefile will be posted (use pgCC for all files!)
  • use sshfs to mount the files!
  • make sure the code compiles with pgCC on the server before you add pragmas
  • use the OpenACC API from openacc.org
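
The bottleneck described above can be pictured as the serial loop nest below. This is only a minimal sketch, not the posted example: `render_serial`, `shade`, and the flat one-value-per-pixel buffer are hypothetical names made up for illustration. Each iteration writes only its own element, which is why the loops have no dependencies.

```c
#include <stddef.h>

/* Hypothetical per-pixel function standing in for the real renderer work. */
static float shade(int x, int y)
{
    return (float)(x + y);
}

/* Serial baseline: two nested for loops visit every pixel exactly once.
   Iteration (x, y) writes only out[y * width + x], so no iteration
   depends on the result of another. */
void render_serial(float *out, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            out[(size_t)y * width + x] = shade(x, y);
        }
    }
}
```

Compiling a version like this first matches the advice to check that the code builds before any pragmas are added.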

  1. both loops: #pragma acc parallel
  2. you get an error that the sum function needs routine information: add #pragma acc routine seq above the sum function
  3. still errors because of the extern function: write #pragma acc routine seq above the extern declaration as well
    1. (needs to be above both)
  4. make data copies explicit on #pragma acc parallel
    1. #pragma acc loop
    2. #pragma acc loop (above 2 for loops)
    3. copyin/copyout (any variables in acc region)
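    4. copyin = pcopyin (the p variant, short for present_or_copyin, is faster because it skips the copy when the data is already on the GPU)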
    5. #pragma acc parallel copyout(arr[0:height*width]) create(tmp, arbitrary) copyin(params)
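
The steps above can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the tutorial's actual code: `render_acc`, the body of `sum`, and the `scale` parameter (standing in for `params`) are hypothetical. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
#include <stddef.h>

/* Step 3: the routine pragma goes above the extern declaration... */
#pragma acc routine seq
extern float sum(float a, float b, float c);

/* Step 2: ...and above the definition (normally in another file),
   so the compiler can build a sequential device version of sum. */
#pragma acc routine seq
float sum(float a, float b, float c)
{
    return a + b + c;
}

/* image holds width*height pixels with 3 floats each; scale holds 3 floats. */
void render_acc(float *image, const float *scale, int width, int height)
{
    /* Steps 1 and 4: one parallel region with explicit data clauses.
       image is only written on the device, so copyout is enough;
       scale is only read, so copyin is enough. */
    #pragma acc parallel copyout(image[0:width*height*3]) copyin(scale[0:3])
    {
        #pragma acc loop
        for (int y = 0; y < height; ++y) {
            #pragma acc loop
            for (int x = 0; x < width; ++x) {
                /* Variables declared inside the loop body are per-iteration. */
                size_t i = ((size_t)y * width + x) * 3;
                float s = sum((float)x, (float)y, 0.0f);
                image[i + 0] = s * scale[0];
                image[i + 1] = s * scale[1];
                image[i + 2] = s * scale[2];
            }
        }
    }
}
```

The array sections use the `[start:length]` form, so the length has to count every float in the buffer, not just the number of pixels.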

  • passing the image around as image[0:width*height] only covers 1/3 of the array, since each pixel is stored as [r][g][b]; use image[0:width*height*3]
  • compile with tesla, no ACC, get makefile running
  • compile C++ with pgCC, NOT pgc++
  • copy: copies data to the GPU and back to the CPU
  • copyin: copies data to the GPU, but not back
  • private(): gives each thread its own copy of a variable, for when you don't want the same variable to be changed by multiple threads
  • Pair
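
The data clauses in the notes above might be combined as in the sketch below. The function `brighten`, the `gain` array, and the scratch variable `tmp` are hypothetical and only serve to show `copy`, `copyin`, and `private()` side by side on an interleaved RGB buffer.

```c
/* Scale every channel of an interleaved RGB image in place. */
void brighten(float *image, const float *gain, int width, int height)
{
    int n = width * height * 3;  /* three values (r, g, b) per pixel */
    float tmp;                   /* scratch value shared by name across iterations */

    /* copy:   image goes to the GPU and comes back modified.
       copyin: gain goes to the GPU only and is never copied back. */
    #pragma acc parallel copy(image[0:n]) copyin(gain[0:3])
    {
        /* private: each thread gets its own tmp, so threads
           cannot overwrite each other's scratch value. */
        #pragma acc loop private(tmp)
        for (int i = 0; i < n; ++i) {
            tmp = image[i] * gain[i % 3];
            image[i] = tmp;
        }
    }
}
```

If `n` were computed as `width * height` instead, only the first third of the buffer would be transferred and processed.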