@Hermann-SW
Last active July 10, 2025 19:38
OpenCL example notes
see comments
@Hermann-SW

Hermann-SW commented Jul 5, 2025

https://github.com/michel-meneses/great-opencl-examples

hermann@RX-Vega56:~$ clinfo | egrep "(Board|compute)"
  Board name:					 Radeon RX Vega
  Max compute units:				 56
hermann@RX-Vega56:~$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-02151b5f2eca39a4               
hermann@RX-Vega56:~$ 
hermann@RX-Vega56:~/great-opencl-examples/print_info$ ls
print_info.cpp
hermann@RX-Vega56:~/great-opencl-examples/print_info$ g++ print_info.cpp -L/opt/rocm/lib -lamdocl64
In file included from /usr/include/CL/cl.h:20,
                 from /usr/include/CL/opencl.h:26,
                 from /usr/include/CL/cl.hpp:167,
                 from print_info.cpp:1:
/usr/include/CL/cl_version.h:21:104: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
   21 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
      |                                                                                                        ^
hermann@RX-Vega56:~/great-opencl-examples/print_info$ ./a.out 
OpenCL Device Info:
Name: gfx900:xnack-
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.0 
Max size of work-items: (1024,1024,1024)
Max size of work-groups: 256
Number of compute units: 56
Global memory size (bytes): 8573157376
Local memory size per compute unit (bytes): 1170
hermann@RX-Vega56:~/great-opencl-examples/print_info$ 
hermann@Radeons:~/great-opencl-examples/print_info$ clinfo | egrep "(Board|compute)"
  Board name:					 Radeon RX Vega
  Max compute units:				 64
  Board name:					 Radeon RX Vega
  Max compute units:				 64
hermann@Radeons:~/great-opencl-examples/print_info$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-021521a4d1243124               
  Uuid:                    GPU-02151dfe505629a4               
hermann@Radeons:~/great-opencl-examples/print_info$ ./a.out 
OpenCL Device Info:
Name: gfx900:xnack-
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.0 
Max size of work-items: (1024,1024,1024)
Max size of work-groups: 256
Number of compute units: 64
Global memory size (bytes): 8573157376
Local memory size per compute unit (bytes): 1024
hermann@Radeons:~/great-opencl-examples/print_info$ 

hermann@Radeon-vii:~/great-opencl-examples/print_info$ clinfo | egrep "(Board|comput)"
  Board name:					 AMD Radeon VII
  Max compute units:				 60
hermann@Radeon-vii:~/great-opencl-examples/print_info$ rocminfo | grep Uuid
  Uuid:                    CPU-XX                             
  Uuid:                    GPU-3f52314172fc1a63               
hermann@Radeon-vii:~/great-opencl-examples/print_info$ ./a.out 
OpenCL Device Info:
Name: gfx906:sramecc+:xnack-
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.0 
Max size of work-items: (1024,1024,1024)
Max size of work-groups: 256
Number of compute units: 60
Global memory size (bytes): 17163091968
Local memory size per compute unit (bytes): 1092
hermann@Radeon-vii:~/great-opencl-examples/print_info$ 
hermann@Radeon-pro-vii:~/great-opencl-examples/print_info$ clinfo | egrep "(Board|compute)"
  Board name:					 AMD Radeon (TM) Pro VII
  Max compute units:				 60
hermann@Radeon-pro-vii:~/great-opencl-examples/print_info$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-bf1478a17337ecdb               
hermann@Radeon-pro-vii:~/great-opencl-examples/print_info$ ./a.out 
OpenCL Device Info:
Name: gfx906:sramecc+:xnack-
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.0 
Max size of work-items: (1024,1024,1024)
Max size of work-groups: 256
Number of compute units: 60
Global memory size (bytes): 17163091968
Local memory size per compute unit (bytes): 1092
hermann@Radeon-pro-vii:~/great-opencl-examples/print_info$ 
hermann@7950x:~/great-opencl-examples/print_info$ clinfo | egrep "(Board|compute)"
  Board name:					 AMD Instinct MI50/MI60
  Max compute units:				 60
  Board name:					 AMD Instinct MI50/MI60
  Max compute units:				 60
  Board name:					 AMD Instinct MI50/MI60
  Max compute units:				 60
  Board name:					 AMD Radeon Graphics
  Max compute units:				 1
hermann@7950x:~/great-opencl-examples/print_info$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-c49e19417337ece3               
  Uuid:                    GPU-f890794172e62691               
  Uuid:                    GPU-13c24061732c730c               
  Uuid:                    GPU-XX                             
hermann@7950x:~/great-opencl-examples/print_info$ ./a.out 
OpenCL Device Info:
Name: gfx906:sramecc+:xnack-
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.0 
Max size of work-items: (1024,1024,1024)
Max size of work-groups: 256
Number of compute units: 60
Global memory size (bytes): 17163091968
Local memory size per compute unit (bytes): 1092
hermann@7950x:~/great-opencl-examples/print_info$ 

@Hermann-SW

https://github.com/rsnemmen/OpenCL-examples

hermann@RX-Vega56:~/OpenCL-examples/add_numbers$ ls
add_numbers.c  add_numbers.cl  Makefile  README.md
hermann@RX-Vega56:~/OpenCL-examples/add_numbers$ git diff
diff --git a/Hello_World/hello.c b/Hello_World/hello.c
index 49350af..7fa57c3 100644
--- a/Hello_World/hello.c
+++ b/Hello_World/hello.c
@@ -58,7 +58,7 @@
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/stat.h>
-#include <OpenCL/opencl.h>
+#include <CL/opencl.h>
 
 ////////////////////////////////////////////////////////////////////////////////
 
hermann@RX-Vega56:~/OpenCL-examples/add_numbers$ make
gcc -std=c99 -Wall -DUNIX -g -DDEBUG -m64 -o add_numbers add_numbers.c   -lOpenCL
In file included from /usr/include/CL/cl.h:20,
                 from add_numbers.c:14:
/usr/include/CL/cl_version.h:21:9: note: ‘#pragma message: cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)’
   21 | #pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
      |         ^~~~~~~
add_numbers.c: In function ‘main’:
add_numbers.c:197:4: warning: ‘clCreateCommandQueue’ is deprecated [-Wdeprecated-declarations]
  197 |    queue = clCreateCommandQueue(context, device, 0, &err);
      |    ^~~~~
/usr/include/CL/cl.h:1801:1: note: declared here
 1801 | clCreateCommandQueue(cl_context                     context,
      | ^~~~~~~~~~~~~~~~~~~~
hermann@RX-Vega56:~/OpenCL-examples/add_numbers$ ./add_numbers
Computed sum = 2016.0.
Check passed.
hermann@RX-Vega56:~/OpenCL-examples/add_numbers$ 

@Hermann-SW

Hermann-SW commented Jul 5, 2025

The latest AMD ROCm version is currently 6.4.1, but it no longer supports AMD Instinct MI50 GPUs.
For example, rocHPL (High Performance Linpack for AMD HPC accelerators) does not compile with ROCm 6.4.1 for the Instinct MI50:
https://www.mersenneforum.org/node/1070049?p=1080817#post1080817

But ROCm 6.3.3 (released 2/2025) works fine, and the Instinct MI50 (released 11/2018) reaches 4.7196 TFLOPS FP64 with rocHPL (see previous posting). The theoretical FP64 (double) performance is 6.705 TFLOPS (see Instinct MI50 link).
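
To put the two figures above in relation, here is a minimal sketch that computes the fraction of theoretical peak reached by a measured result. The function name `fraction_of_peak` is made up for this illustration; only the two TFLOPS values come from the text.

```cpp
#include <cassert>

// Fraction of the theoretical peak that a measured result reaches.
// Both arguments must use the same unit (here: TFLOPS, FP64).
double fraction_of_peak(double measured_tflops, double peak_tflops) {
    assert(peak_tflops > 0.0);
    return measured_tflops / peak_tflops;
}
```

Calling `fraction_of_peak(4.7196, 6.705)` gives roughly 0.70, so the rocHPL run reaches about 70% of the theoretical FP64 peak quoted above.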

"Introduction to the HIP programming model" from docs:
https://rocm.docs.amd.com/projects/HIP/en/docs-6.3.3/understand/programming_model.html

"ROCm Revisited: Getting Started with HIP" on rocm blogs:
https://rocm.blogs.amd.com/ecosystems-and-partners/rocm-revisited-hip/README.html

rocm examples github repo discussed in that blog:
https://github.com/ROCm/rocm-examples

The discussed hello world example hangs on a Vega56 (with rocm 6.3.1), though:

hermann@RX-Vega56:~/rocm-examples/HIP-Basic/hello_world$ make
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common  -o hip_hello_world main.hip 
hermann@RX-Vega56:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!

It completes on the Vega64 (with ROCm 6.3.3), but the GPU thread output is missing:

hermann@Radeons:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
hermann@Radeons:~/rocm-examples/HIP-Basic/hello_world$ 

On the Radeon Pro VII (with ROCm 6.3.3) the output differs from the one discussed in the blog article:

hermann@Radeon-pro-vii:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device kernel block 1 thread 0!
Hello world from device kernel block 1 thread 1!
Hello world from device kernel block 0 thread 0!
Hello world from device kernel block 0 thread 1!
hermann@Radeon-pro-vii:~/rocm-examples/HIP-Basic/hello_world$ 

Instinct MI50 (with rocm 6.3.3) behaves like RX Vega64:

hermann@7950x:~/rocm-examples/HIP-Basic/hello_world$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-c49e19417337ece3               
  Uuid:                    GPU-f890794172e62691               
  Uuid:                    GPU-13c24061732c730c               
hermann@7950x:~/rocm-examples/HIP-Basic/hello_world$ export ROCR_VISIBLE_DEVICES="0,GPU-c49e19417337ece3"
hermann@7950x:~/rocm-examples/HIP-Basic/hello_world$ rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-c49e19417337ece3               
hermann@7950x:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
hermann@7950x:~/rocm-examples/HIP-Basic/hello_world$

@Hermann-SW

Wait, the blog's gpuHelloWorld example is different from the repo example above:

hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ cat gpuHelloWorld.cpp 
#include <hip/hip_runtime.h>

__global__ void gpuHelloWorld(int N) 
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    if (gid < N) 
    {
        printf("Hello world from block %d and thread %d \n", blockIdx.x, gid); 
    }
}

int main() 
{
    int gridSize = 4; //Four blocks
    int blockSize = 4; //Threads per block
    int N = gridSize * blockSize; //Size of problem

    gpuHelloWorld<<<gridSize, blockSize>>>(N); //Launch kernel

    return 0;
}
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ 

It does not work on the Instinct MI50, but on the Radeon Pro VII it works as discussed in the blog:

hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ hipcc gpuHelloWorld.cpp -o gpuHelloWorld
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld 
Hello world from block 0 and thread 0 
Hello world from block 0 and thread 1 
Hello world from block 0 and thread 2 
Hello world from block 0 and thread 3 
Hello world from block 1 and thread 4 
Hello world from block 1 and thread 5 
Hello world from block 1 and thread 6 
Hello world from block 1 and thread 7 
Hello world from block 2 and thread 8 
Hello world from block 2 and thread 9 
Hello world from block 2 and thread 10 
Hello world from block 2 and thread 11 
Hello world from block 3 and thread 12 
Hello world from block 3 and thread 13 
Hello world from block 3 and thread 14 
Hello world from block 3 and thread 15 
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld 
Hello world from block 2 and thread 8 
Hello world from block 2 and thread 9 
Hello world from block 2 and thread 10 
Hello world from block 2 and thread 11 
Hello world from block 3 and thread 12 
Hello world from block 3 and thread 13 
Hello world from block 3 and thread 14 
Hello world from block 3 and thread 15 
Hello world from block 0 and thread 0 
Hello world from block 0 and thread 1 
Hello world from block 0 and thread 2 
Hello world from block 0 and thread 3 
Hello world from block 1 and thread 4 
Hello world from block 1 and thread 5 
Hello world from block 1 and thread 6 
Hello world from block 1 and thread 7 
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ 

@Hermann-SW

Nice, my Radeon Pro VII GPU has 3,840 cores and, according to the demo in the first comment (as well as the clinfo output shown there), 60 CUs. AMD GPUs (at least the ones I have shown) have #CUs*64==#cores. So I set gridSize to the number of CUs (60) and blockSize to 64, and the GPU responds with 3,840 messages, from blocks 0..59 and with thread numbers 0..3839 ...

hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ diff gpuHelloWorld.cpp.orig gpuHelloWorld.cpp
14,15c14,15
<     int gridSize = 4; //Four blocks
<     int blockSize = 4; //Threads per block
---
>     int gridSize = 60; //Four blocks
>     int blockSize = 64; //Threads per block
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ hipcc gpuHelloWorld.cpp -o gpuHelloWorld
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld > out
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ sort -u out | wc --lines
3840
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ sort -k8 -n out | head -3
Hello world from block 0 and thread 0 
Hello world from block 0 and thread 1 
Hello world from block 0 and thread 2 
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ sort -k8 -n out | tail -3
Hello world from block 59 and thread 3837 
Hello world from block 59 and thread 3838 
Hello world from block 59 and thread 3839 
hermann@Radeon-pro-vii:~/rocm-examples/blog/gpuHelloWorld$ 
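
The arithmetic behind this experiment can be replayed on the host without a GPU. The following is only a sketch: `kCoresPerCU`, `total_cores`, `global_thread_id` and `enumerate_global_ids` are names invented for this illustration, the value 64 is the #CUs*64==#cores rule stated above (not something queried from the driver), and the index formula is the one from the blog kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption from the comment above: 64 cores per compute unit.
constexpr int kCoresPerCU = 64;

constexpr int total_cores(int compute_units) {
    return compute_units * kCoresPerCU;
}

// Same formula as in the blog kernel:
//   gid = blockIdx.x * blockDim.x + threadIdx.x
constexpr int global_thread_id(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Host-side list of all ids that a <<<grid_size, block_size>>> launch
// of that kernel would print (in sorted order, unlike the GPU output).
std::vector<int> enumerate_global_ids(int grid_size, int block_size) {
    assert(grid_size > 0 && block_size > 0);
    std::vector<int> ids;
    ids.reserve(static_cast<std::size_t>(grid_size) * block_size);
    for (int b = 0; b < grid_size; ++b) {
        for (int t = 0; t < block_size; ++t) {
            ids.push_back(global_thread_id(b, block_size, t));
        }
    }
    return ids;
}
```

For the values used above, `total_cores(60)` is 3840, and `enumerate_global_ids(60, 64)` holds 3840 distinct ids from 0 to 3839, which matches the `sort -u out | wc --lines` count and the head/tail lines of the sorted output. By the same rule, the 56 and 64 CUs reported by clinfo for the two Vega cards correspond to 3584 and 4096 cores.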

@Hermann-SW

Hermann-SW commented Jul 5, 2025

I looked into the hello_world example source, and the output shown for Radeon pro vii is correct (dimensions 2x2):

hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ tail -18 main.hip 
}

int main()
{
    print_hello_host();

    print_hello();

    // Launch the kernel.
    helloworld_kernel<<<dim3(2), // 3D grid specifying number of blocks to launch: (2, 1, 1)
                        dim3(2), // 3D grid specifying number of threads to launch: (2, 1, 1)
                        0, // number of bytes of additional shared memory to allocate
                        hipStreamDefault // stream where the kernel should execute: default stream
                        >>>();

    // Wait on all active streams on the current device.
    HIP_CHECK(hipDeviceSynchronize());
}
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ 

The Radeon Pro VII is the only GPU in its PC, so I tried the Radeon VII, which is the only GPU in its PC as well, and it works:

hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device kernel block 0 thread 0!
Hello world from device kernel block 0 thread 1!
Hello world from device kernel block 1 thread 0!
Hello world from device kernel block 1 thread 1!
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ 

Not sure yet what the other GPU problems are (Vega56 is single GPU in its PC, "Radeons" PC has ̶t̶w̶o̶ (now) single Vega64 and 7950x PC has three MI50s).

@Hermann-SW

Hermann-SW commented Jul 5, 2025

Running that demo on all 3,840 cores of the Radeon VII works as well. In this demo the threads are numbered 0..63 within their block, not 0..3839 as in the output of the blog demo:

hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ git diff
diff --git a/HIP-Basic/hello_world/main.hip b/HIP-Basic/hello_world/main.hip
index 584f4190..44e74211 100644
--- a/HIP-Basic/hello_world/main.hip
+++ b/HIP-Basic/hello_world/main.hip
@@ -67,8 +67,8 @@ int main()
     print_hello();
 
     // Launch the kernel.
-    helloworld_kernel<<<dim3(2), // 3D grid specifying number of blocks to launch: (2, 1, 1)
-                        dim3(2), // 3D grid specifying number of threads to launch: (2, 1, 1)
+    helloworld_kernel<<<dim3(60), // 3D grid specifying number of blocks to launch: (2, 1, 1)
+                        dim3(64), // 3D grid specifying number of threads to launch: (2, 1, 1)
                         0, // number of bytes of additional shared memory to allocate
                         hipStreamDefault // stream where the kernel should execute: default stream
                         >>>();
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ make
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common  -o hip_hello_world main.hip 
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world > out
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ grep "device kernel" out | wc --lines
3840
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ grep "device kernel" out | sort -u | wc --lines
3840
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ grep "device kernel" out | sort -k7 -k9 -n| head -3
Hello world from device kernel block 0 thread 0!
Hello world from device kernel block 0 thread 1!
Hello world from device kernel block 0 thread 2!
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ grep "device kernel" out | sort -k7 -k9 -n| tail -3
Hello world from device kernel block 59 thread 61!
Hello world from device kernel block 59 thread 62!
Hello world from device kernel block 59 thread 63!
hermann@Radeon-vii:~/rocm-examples/HIP-Basic/hello_world$ 
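
The two numbering schemes are related by a simple division. As a sketch (the struct `BlockThread` and the function `split_global_id` are hypothetical names, not part of either demo), a global id as printed by the blog demo can be split into the block and per-block thread number printed by the repo demo:

```cpp
#include <cassert>

// Block index and thread index within that block.
struct BlockThread {
    int block;
    int thread;
};

// Inverse of gid = block * block_dim + thread, for 0 <= thread < block_dim.
BlockThread split_global_id(int gid, int block_dim) {
    assert(gid >= 0 && block_dim > 0);
    return {gid / block_dim, gid % block_dim};
}
```

With a block size of 64, `split_global_id(3839, 64)` yields block 59 and thread 63, so the last line of the blog demo output ("block 59 and thread 3839") and the last line here ("block 59 thread 63") refer to the same position in the grid.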

@Hermann-SW

Hermann-SW commented Jul 6, 2025

> Not sure yet what the other GPU problems are (Vega56 is single GPU in its PC,
> "Radeons" PC has ̶t̶w̶o̶ (now) single Vega64 and 7950x PC has three MI50s).

Found it — at least for Instinct MI50 GPUs.
I remembered that on the AMD 7950X CPU PC I had tried to disable the internal graphics card in the BIOS, but had not succeeded before moving the PC to the basement. The internal "GPU-XX" seems to cause the problems:

hermann@7950x:~$ !rocm
rocminfo | grep Uuid.*GPU
  Uuid:                    GPU-c49e19417337ece3               
  Uuid:                    GPU-f890794172e62691               
  Uuid:                    GPU-13c24061732c730c               
  Uuid:                    GPU-XX                             
hermann@7950x:~$ 

My AMD 7600X CPU PC has four Instinct MI50s, and its internal graphics card is disabled in the BIOS (no GPU-XX):

hermann@7600x:~$ rocminfo | grep Uuid
  Uuid:                    CPU-XX                             
  Uuid:                    GPU-d64a58a17330f0ed               
  Uuid:                    GPU-6e56508172dc76b6               
  Uuid:                    GPU-6a0e7961732c730d               
  Uuid:                    GPU-304c70e172dc768c               
hermann@7600x:~$ 

And here the blog hello world works (not the very first time, but the two times after that):

hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ hipcc gpuHelloWorld.cpp -o gpuHelloWorld
hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld 
hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld 
Hello world from block 1 and thread 4 
Hello world from block 1 and thread 5 
Hello world from block 1 and thread 6 
Hello world from block 1 and thread 7 
Hello world from block 2 and thread 8 
Hello world from block 2 and thread 9 
Hello world from block 2 and thread 10 
Hello world from block 2 and thread 11 
Hello world from block 3 and thread 12 
Hello world from block 3 and thread 13 
Hello world from block 3 and thread 14 
Hello world from block 3 and thread 15 
Hello world from block 0 and thread 0 
Hello world from block 0 and thread 1 
Hello world from block 0 and thread 2 
Hello world from block 0 and thread 3 
hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ ./gpuHelloWorld 
Hello world from block 0 and thread 0 
Hello world from block 0 and thread 1 
Hello world from block 0 and thread 2 
Hello world from block 0 and thread 3 
Hello world from block 3 and thread 12 
Hello world from block 3 and thread 13 
Hello world from block 3 and thread 14 
Hello world from block 3 and thread 15 
Hello world from block 1 and thread 4 
Hello world from block 1 and thread 5 
Hello world from block 1 and thread 6 
Hello world from block 1 and thread 7 
Hello world from block 2 and thread 8 
Hello world from block 2 and thread 9 
Hello world from block 2 and thread 10 
Hello world from block 2 and thread 11 
hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ 

So the Instinct MI50 GPUs, which matter most to me, now work (I still have to correctly disable the internal graphics card on the "7950x" PC).
That is no problem, because both PCs (with an 1850W PSU each) are operated headless in the basement:
[image: 50pc]

Next I tried the rocm-examples repo hello world, and it works as well.
AND it showed the reason why the first time calling gpuHelloWorld above showed no output:
"no ROCm-capable device is detected"

hermann@7600x:~/rocm-examples$ cd HIP-Basic/hello_world
hermann@7600x:~/rocm-examples/HIP-Basic/hello_world$ make
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common  -o hip_hello_world main.hip 
hermann@7600x:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
An error encountered: "no ROCm-capable device is detected" at main.hip:77
hermann@7600x:~/rocm-examples/HIP-Basic/hello_world$ ./hip_hello_world 
Hello world from host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device or host!
Hello world from device kernel block 0 thread 0!
Hello world from device kernel block 0 thread 1!
Hello world from device kernel block 1 thread 0!
Hello world from device kernel block 1 thread 1!
hermann@7600x:~/rocm-examples/HIP-Basic/hello_world$ 

The rocm-examples repo hello world has this at the end of its code:

$ tail -3 ../../HIP-Basic/hello_world/main.hip 
    // Wait on all active streams on the current device.
    HIP_CHECK(hipDeviceSynchronize());
}
$

I added this simple equivalent to the blog hello world:

hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ tail -4 gpuHelloWorld.cpp
    assert(hipDeviceSynchronize() == hipSuccess);

    return 0;
}
hermann@7600x:~/rocm-examples/blog/gpuHelloWorld$ 
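
One caveat with the assert variant: when a program is compiled with NDEBUG defined (typical for release builds), the standard assert macro discards its argument, so the synchronize call inside it would not run at all. The sketch below demonstrates this on the host only. `Status`, `fake_device_synchronize` and `CHECK_STATUS` are stand-ins invented for this illustration; they are not HIP APIs and not the repo's HIP_CHECK definition.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for hipError_t and hipDeviceSynchronize().
enum class Status { Success, NoDevice };

static int sync_calls = 0; // counts how often the stand-in was called

Status fake_device_synchronize() {
    ++sync_calls;
    return Status::Success;
}

// Always evaluates expr, independent of NDEBUG.
#define CHECK_STATUS(expr) \
    do { \
        const Status status_ = (expr); \
        if (status_ != Status::Success) { \
            std::fprintf(stderr, "error %d at %s:%d\n", \
                         static_cast<int>(status_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE); \
        } \
    } while (0)

// Simulate a release build for the next function only.
#define NDEBUG
#include <cassert>
int calls_made_by_assert() {
    sync_calls = 0;
    assert(fake_device_synchronize() == Status::Success); // compiled out
    return sync_calls;
}
#undef NDEBUG
#include <cassert> // re-including restores the active assert

int calls_made_by_check() {
    sync_calls = 0;
    CHECK_STATUS(fake_device_synchronize());
    return sync_calls;
}
```

Here `calls_made_by_assert()` returns 0 and `calls_made_by_check()` returns 1. For a quick test like the one above the assert is fine as long as NDEBUG is not defined; a macro that always evaluates its argument keeps the synchronization in every build.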
