Koboldcpp is a hybrid LLM interface built on llama.cpp and GGML. It can split a model between the CPU and GPU, or load it entirely onto the GPU. Out of the box, it is typically accelerated with OpenBLAS (on the CPU) or CLBlast, which relies on the OpenCL framework.
However, OpenCL can be slow, and Nvidia GPU owners may prefer Nvidia's own framework. Nvidia provides cuBLAS, a GPU-accelerated BLAS library that runs on CUDA, and it can be used with koboldcpp! This guide focuses on users with an Nvidia GPU that can run CUDA on Windows.
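As a point of reference, a cuBLAS-enabled build is launched with the `--usecublas` flag instead of `--useclblast`. A minimal sketch of such a launch follows; the model filename and layer count here are placeholders you would adjust for your own setup, and you should check `--help` on your build for the exact flags it supports.

```shell
# Launch a cuBLAS-enabled koboldcpp build (run from the build directory).
# "mymodel.gguf" and the layer count are example values, not real defaults.
python koboldcpp.py mymodel.gguf --usecublas --gpulayers 32

# --usecublas   use Nvidia's cuBLAS instead of CLBlast/OpenBLAS
# --gpulayers   number of model layers to offload to the GPU;
#               raise it until you run out of VRAM
```

If the build was successful, the startup log should mention CUDA rather than OpenCL when the model loads.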
Please do not annoy the koboldcpp developers for help! Sometimes the CMake files can break or things might go wrong, but the devs are NOT responsible for cuBLAS issues, since they state outright that support for it is limited. The same rules apply to this guide. Build at your own peril.