@dappelha
dappelha / .f90
Created November 8, 2024 23:12
OpenACC overlap of GPU work and CPU work
! Transform this:
do step = 1, 100
   !$acc kernels
   ! ... GPU work for this step ...
   !$acc end kernels
   !$acc update host(...)
   call diagnostics_on_cpu()
end do
! Needs to change to:
step = 1
!$acc kernels async(step)
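The preview stops right after the first asynchronous kernel. Below is a minimal C sketch (C rather than Fortran) of the overlap pattern it begins. It assumes each step's GPU work writes a fresh buffer and does not depend on the previous step; a real time-stepping loop would also have to order its kernels, for example with a wait clause. While step s runs on async queue s, the host waits only for queue s - 1 and runs the CPU diagnostics on that step's data. A double buffer keeps the two from touching the same memory. The names run_overlapped, buf, diag, N and NSTEPS, and this version of diagnostics_on_cpu, are hypothetical.

```c
#include <assert.h>

#define N 1024
#define NSTEPS 100

/* Double buffer: the GPU fills one half while the CPU reads the other. */
static double buf[2][N];
/* Hypothetical diagnostic output, one value per step (index 0 unused). */
static double diag[NSTEPS + 1];

/* Hypothetical stand-in for diagnostics_on_cpu(): sum one step's results. */
static void diagnostics_on_cpu(int step, const double *x)
{
    assert(step >= 1 && step <= NSTEPS);
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += x[i];
    diag[step] = s;
}

void run_overlapped(void)
{
    #pragma acc data create(buf)
    {
        for (int step = 1; step <= NSTEPS; step++) {
            int b = step % 2;

            /* Queue this step's GPU work and its copy-back on async queue
               'step'; neither directive blocks the host. */
            #pragma acc parallel loop present(buf) async(step)
            for (int i = 0; i < N; i++)
                buf[b][i] = (double)step + (double)i;
            #pragma acc update self(buf[b:1][0:N]) async(step)

            /* Meanwhile, finish the previous step on the CPU: wait only for
               its queue, then read the other half of the double buffer. */
            if (step > 1) {
                #pragma acc wait(step - 1)
                diagnostics_on_cpu(step - 1, buf[1 - b]);
            }
        }
        /* Drain the last step. */
        #pragma acc wait(NSTEPS)
        diagnostics_on_cpu(NSTEPS, buf[NSTEPS % 2]);
    }
}
```

After run_overlapped() returns, diag[s] holds the sum for step s. When the code is built with OpenACC enabled (for example nvc -acc or gcc -fopenacc), the GPU work of step s can overlap the CPU diagnostics of step s - 1. Without OpenACC the pragmas are ignored and the loop runs serially with the same results.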
dappelha / gist:2101e53893c0abaf6273015870a1ec81
Created December 13, 2023 18:35
3D nested loops on GPUs
for (int l = 0; l < ld; l++)
{
    for (int k = 0; k < kd; k++)
    {
        for (int j = 0; j < jd; j++)
        {
            if (l > 0 && l < (ld - 1) && k > 0 && k < (kd - 1) && j > 0 && j < (jd - 1))
            {
                jp = j + 1;
                jm = j - 1;
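The gist's own GPU version is cut off here. Below is a C sketch of one common restructuring. It moves the interior test into the loop bounds so no thread evaluates the branch, and it uses collapse(3) so the three loops form a single large iteration space for the GPU. The sketch assumes a flat array with j fastest and a hypothetical second-difference body in j that uses jp and jm as above. The function names, the IDX macro and the stencil body are illustrative only. Using copy(out) rather than copyout(out) preserves the boundary values that the loops never write.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index into an ld x kd x jd array, j fastest (contiguous in memory).
   Uses the kd and jd in scope where it is expanded. */
#define IDX(l, k, j) (((size_t)(l) * (size_t)kd + (size_t)(k)) * (size_t)jd + (size_t)(j))

/* Gist-style nest: visit every point and branch to skip the boundary. */
void stencil_branch(const double *in, double *out, int ld, int kd, int jd)
{
    assert(in != out); /* the stencil must not run in place */
    for (int l = 0; l < ld; l++)
        for (int k = 0; k < kd; k++)
            for (int j = 0; j < jd; j++)
                if (l > 0 && l < ld - 1 && k > 0 && k < kd - 1 && j > 0 && j < jd - 1) {
                    int jp = j + 1, jm = j - 1;
                    out[IDX(l, k, j)] = in[IDX(l, k, jp)] - 2.0 * in[IDX(l, k, j)] + in[IDX(l, k, jm)];
                }
}

/* Same interior points, with the boundary test folded into the loop bounds. */
void stencil_interior(const double *in, double *out, int ld, int kd, int jd)
{
    assert(in != out);
    size_t n = (size_t)ld * (size_t)kd * (size_t)jd;
    /* copy(out) keeps the boundary values that these loops never write. */
    #pragma acc parallel loop collapse(3) copyin(in[0:n]) copy(out[0:n])
    for (int l = 1; l < ld - 1; l++)
        for (int k = 1; k < kd - 1; k++)
            for (int j = 1; j < jd - 1; j++) {
                int jp = j + 1, jm = j - 1;
                out[IDX(l, k, j)] = in[IDX(l, k, jp)] - 2.0 * in[IDX(l, k, j)] + in[IDX(l, k, jm)];
            }
}
```

Both functions write the same interior values and leave the boundary untouched. The second form also gives each GPU thread the same amount of work.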
dappelha / loop_unrolling.F90
Created October 25, 2020 03:02
How to trick the compiler into unrolling loops for you without unrolling them by hand.
! Modern compilers at -O3 usually unroll loops whose start and stop bounds are known
! at compile time. Here is an example where I add a secondary loop with fixed bounds, which
! unrolls by the amount specified in the parameter nunroll. This keeps the routine general:
! only a change to nunroll (and a recompile) is needed to unroll by a different amount.
program loop_unrolling
  implicit none
  integer :: i, ii, iend, istart
  integer, parameter :: nunroll=2
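The Fortran body is truncated above. Here is a hedged C rendering of the same trick. The hypothetical routine axpy_unrolled has an inner ii loop that always runs exactly NUNROLL times, mirroring nunroll=2, followed by a remainder loop for the leftover iterations. The axpy body is an assumption, since the gist's loop body is not shown. Whether a given compiler actually unrolls the inner loop depends on the compiler and optimization level, so inspect the generated assembly to confirm. Unlike Fortran's inclusive do bounds, iend is exclusive here.

```c
#include <assert.h>

/* Plays the role of the gist's nunroll parameter: edit and recompile to
   unroll by a different amount. */
#define NUNROLL 2

/* y[i] += a * x[i] for istart <= i < iend (iend exclusive, C style). */
void axpy_unrolled(double a, const double *x, double *y, int istart, int iend)
{
    assert(NUNROLL > 0 && istart <= iend);
    int i = istart;
    /* Main loop advances NUNROLL iterations at a time. The inner loop's trip
       count is a compile-time constant, so the optimizer can unroll it fully. */
    for (; iend - i >= NUNROLL; i += NUNROLL) {
        for (int ii = 0; ii < NUNROLL; ii++)
            y[i + ii] += a * x[i + ii];
    }
    /* Remainder loop for the last (iend - istart) % NUNROLL iterations. */
    for (; i < iend; i++)
        y[i] += a * x[i];
}
```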
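#!/bin/bash
# Launch wrapper: profile one MPI rank with nvprof and run every other rank normally.
world_rank=$PMIX_RANK
let local_size=$RANKS_PER_SOCKET
# Give each rank its own CUDA JIT cache directory in /dev/shm.
export CUDA_CACHE_PATH=/dev/shm/$USER/nvcache_$PMIX_RANK
executable=$1
shift
if [ "$world_rank" = "$PROFILE_RANK" ]; then
  nvprof -f -o "$PROFILE_PATH" "$executable" "$@"
else
  "$executable" "$@"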
dappelha / nvtx_mod.F90
Last active October 9, 2024 02:51
Fortran module that provides an interface to the NVIDIA Tools Extension (NVTX) library. This version works with XLF, which requires valid arguments to c_loc.
! Tested Oct 2024 with gfortran.
module nvtx_mod
  use iso_c_binding
  implicit none
  integer, private :: col(7) = [ int(Z'0000ff00'), int(Z'000000ff'), int(Z'00ffff00'), int(Z'00ff00ff'), &
                                 int(Z'0000ffff'), int(Z'00ff0000'), int(Z'00ffffff') ]
  !character(len=256), private :: tempName
  character, private, target :: tempName(256)
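The module is cut off after its declarations. The C sketch below shows two jobs those declarations suggest, and both are assumptions about the rest of the module. The first is picking one of the seven col values by cycling an id. The second is copying a blank-padded Fortran name into a fixed 256-character buffer with a trailing NUL, which appears to be the role of tempName. C expects NUL-terminated strings, and XLF requires valid arguments to c_loc, so an addressable buffer is needed rather than a temporary expression. The names pick_color and to_c_name are hypothetical, and the sketch does not call the NVTX library itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The same seven 32-bit values as the module's col(7) table. */
static const uint32_t nvtx_colors[7] = {
    0x0000ff00u, 0x000000ffu, 0x00ffff00u, 0x00ff00ffu,
    0x0000ffffu, 0x00ff0000u, 0x00ffffffu
};

/* Hypothetical: cycle through the table so successive ids get different colors. */
uint32_t pick_color(int id)
{
    int n = (int)(sizeof nvtx_colors / sizeof nvtx_colors[0]);
    int k = id % n;
    if (k < 0)
        k += n; /* C's % keeps the sign of id; map negatives into 0..n-1 */
    return nvtx_colors[k];
}

/* Hypothetical: copy a Fortran CHARACTER(len=len) value, which is blank-padded
   and has no terminating NUL, into a 256-byte buffer (like tempName(256)),
   drop trailing blanks, and terminate it. Returns the resulting length. */
size_t to_c_name(const char *fname, size_t len, char out[256])
{
    assert(out != NULL);
    while (len > 0 && fname[len - 1] == ' ')
        len--;
    if (len > 255)
        len = 255; /* leave room for the terminating NUL */
    for (size_t i = 0; i < len; i++)
        out[i] = fname[i];
    out[len] = '\0';
    return len;
}
```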