Koboldcpp is a hybrid LLM interface built on llama.cpp and GGML. It can split a model between the CPU and GPU, or load it entirely onto the GPU. Out of the box, it is typically accelerated with OpenBLAS (on the CPU) or CLBlast, which relies on the OpenCL framework.
However, OpenCL can be slow, and Nvidia GPU owners may prefer Nvidia's own framework. Nvidia provides cuBLAS, a GPU-accelerated BLAS library that runs on CUDA, and it can be used with koboldcpp! This guide focuses on users with an Nvidia GPU that can run CUDA on Windows.
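As a point of reference, a cuBLAS-enabled build is launched with the `--usecublas` flag instead of `--useclblast`. A minimal sketch of such a launch follows; the model filename and layer count here are placeholders you would adjust for your own setup, and you should check `--help` on your build for the exact flags it supports.

```shell
# Launch a cuBLAS-enabled koboldcpp build (run from the build directory).
# "mymodel.gguf" and the layer count are example values, not real defaults.
python koboldcpp.py mymodel.gguf --usecublas --gpulayers 32

# --usecublas   use Nvidia's cuBLAS instead of CLBlast/OpenBLAS
# --gpulayers   number of model layers to offload to the GPU;
#               raise it until you run out of VRAM
```

If the build was successful, the startup log should mention CUDA rather than OpenCL when the model loads.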
Please do not annoy the koboldcpp developers for help! Sometimes the CMake files can break or things might go wrong, but the devs are NOT responsible for cuBLAS issues, since they state outright that support for it is limited. The same rules apply to this guide. Build at your own peril.