This demo shows how to load a function compiled to fatbin format using the CUDA Driver API.
This is convenient because, given the right compiler flags, the fatbin packages compiled code for every targeted device architecture at compile time, so we don't need the JIT linker/compiler at run time.
make
./hello # test regular kernel call
./dyload # test dynamically loaded function call