@virattt
virattt / hotpotqa-vector_db-metrics.ipynb
Created February 24, 2024 19:55
hotpotqa-vector_db-metrics.ipynb
@jrknox1977
jrknox1977 / multi_ollama_containers.md
Last active April 8, 2025 03:51
Running multiple Ollama containers on a single host.

Multiple Ollama Containers on a single host (with multiple GPUs)

I don't want model RELOAD

  • I have a large machine with 2 GPUs and a considerable amount of RAM.
  • I was trying to use Ollama to serve llava and mistral, but it would reload the models every time I switched between them.
  • The solution that appears to be working: multiple containers, each serving a different model on its own port (a Python sketch of querying both follows below).

Ollama model working dir:

  • I have many models already downloaded on my machine, so I mount the host Ollama working dir into the containers.
  • Linux (at least on my machine): /usr/share/ollama/.ollama
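
Not from the gist itself, but as a hedged sketch of the resulting setup: assuming two containers were started with one GPU each and host ports 11434 and 11435 published, requests go straight to the right model with no reload:

import requests

# Assumed mapping: mistral container on host port 11434, llava on 11435.
def generate(port: int, model: str, prompt: str) -> str:
    resp = requests.post(
        f"http://localhost:{port}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate(11434, "mistral", "Say hi."))        # container 1 / GPU 0
print(generate(11435, "llava", "Describe a cat."))  # container 2 / GPU 1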
"""
git clone https://github.com/argilla-io/distilabel.git
pip install -e ".[hf-inference-endpoints]"
"""
import asyncio
import os
import pandas as pd
from llm_swarm import LLMSwarm, LLMSwarmConfig
from huggingface_hub import AsyncInferenceClient
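
The preview ends at the imports. A hedged sketch of how they typically fit together; `instances`, `inference_engine`, and the `endpoint` attribute are assumptions based on the llm-swarm README, and a Slurm cluster is required to actually run it:

async def main():
    prompts = ["What is 2 + 2?", "Name a prime number."]
    # Spin up 2 TGI instances behind a load balancer; tear them down on exit.
    with LLMSwarm(LLMSwarmConfig(instances=2, inference_engine="tgi")) as swarm:
        client = AsyncInferenceClient(model=swarm.endpoint)
        completions = await asyncio.gather(
            *(client.text_generation(p, max_new_tokens=32) for p in prompts)
        )
        print(pd.DataFrame({"prompt": prompts, "completion": completions}))

# asyncio.run(main())  # requires a Slurm cluster llm-swarm can schedule on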
@kalomaze
kalomaze / llm_samplers_explained.md
Last active June 10, 2025 12:13
LLM Samplers Explained

LLM Samplers Explained

Every time a large language model makes a prediction, each of the thousands of tokens in its vocabulary is assigned some probability, from almost 0% to almost 100%. There are different ways to choose a token from those predictions. This process is known as "sampling", and there are various strategies you can use, which I will cover here.
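
As a concrete illustration (mine, not the gist's): given a vector of logits over a toy vocabulary, the sampling step converts them to probabilities and draws one token id:

import numpy as np

def sample_token(logits: np.ndarray) -> int:
    """Softmax the logits, then draw one token id from the distribution."""
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy 4-token vocabulary
print(sample_token(logits))  # usually 0, sometimes 1 or 2, rarely 3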

OpenAI Samplers

Temperature

  • Temperature is a way to control the overall confidence of the model's scores (the logits). With a value lower than 1.0, the gaps between token scores widen, so the distribution sharpens (more deterministic); with a value higher than 1.0, the gaps shrink, so the distribution flattens (less deterministic).
  • A temperature of 1.0 leaves the scores unchanged, so it reproduces the original distribution the model was trained to optimize for.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
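
A quick numerical sketch of that effect (mine, not the gist's): dividing the same toy logits by T before the softmax sharpens the distribution for T < 1 and flattens it for T > 1:

import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
for T in (0.5, 1.0, 2.0):
    print(T, softmax(logits / T).round(3))
# T=0.5 concentrates mass on the top token; T=1.0 is the trained
# distribution; T=2.0 spreads mass more evenly across tokens.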
@virattt
virattt / query_expansion-cohere.ipynb
Created February 17, 2024 19:49
query_expansion-cohere.ipynb
@thomwolf
thomwolf / fast_speech_text_speech.py
Last active January 14, 2025 12:13
speech to text to speech
""" To use: install LLM studio (or Ollama), clone OpenVoice, run this script in the OpenVoice directory
git clone https://github.com/myshell-ai/OpenVoice
cd OpenVoice
git clone https://huggingface.co/myshell-ai/OpenVoice
cp -r OpenVoice/* .
pip install whisper pynput pyaudio
"""
from openai import OpenAI
import time
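
The preview cuts off after the imports. A minimal assumed continuation for pointing the OpenAI client at a local server (LM Studio's default base URL; the model name is a placeholder):

# Assumed setup, not from the gist: LM Studio's local server speaks the OpenAI API.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio routes to whatever model is loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)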
@jrknox1977
jrknox1977 / ollama_dspy.py
Created February 9, 2024 18:06
ollama+DSPy using OpenAI APIs.
# Install DSPy: pip install dspy
import dspy

# Ollama is now compatible with OpenAI APIs.
#
# To get this to work you must include `model_type='chat'` in the `dspy.OpenAI` call;
# if you do not include it you will get an error.
#
# I have also found that `stop='\n\n'` is required to get the model to stop
# generating text after the answer is complete, at least with mistral.
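
The preview ends before the configuration itself; a sketch of what the comments describe could look like this (the endpoint, key, and model name are assumptions):

ollama_mistral = dspy.OpenAI(
    model="mistral",
    api_base="http://localhost:11434/v1/",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",      # Ollama ignores the key, but the client requires one
    model_type="chat",     # required, per the note above
    stop="\n\n",           # keeps mistral from generating past the answer
)
dspy.settings.configure(lm=ollama_mistral)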
from collections import defaultdict
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from datasets import load_dataset
from rich.console import Console
from rich.table import Table
from transformers import (
    AutoConfig,
@dan-palmer
dan-palmer / prompt.json
Created February 2, 2024 13:49
Arc Search Browse for Me Prompt
{
"messages": [
{
"content": "You are an advanced, reliable, candid AI system that takes user search queries, converts them into questions, and answers them, using specific facts and details sourced from webpages to prove your answer. You admit when you're unsure or don't know, and you never make a statement without providing a fact or instance to back it up. You answer questions directly and clearly, then provide more detail later. You follow the JSON schema exactly.",
"role": "system"
},
{
"content": "# CONTEXT\nCurrent date: #{DATE_TIME}.\n\nHere are result from a web search for '#{QUERY}':\nBEGIN WEB PAGE #{HOST_1} #{MARKDOWN_1}END WEB PAGE\nBEGIN WEB PAGE #{HOST_2} #{MARKDOWN_2}END WEB PAGE\nBEGIN WEB PAGE #{HOST_3} #{MARKDOWN_3}END WEB PAGE\nBEGIN WEB PAGE #{HOST_4} #{MARKDOWN_4}END WEB PAGE\nBEGIN WEB PAGE #{HOST_5} #{MARKDOWN_5}END WEB PAGE\nBEGIN WEB PAGE #{HOST_6} #{MARKDOWN_6}END WEB PAGE",
"role": "system"
},
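
The JSON preview is truncated here. As an illustration only (the #{...} placeholder names come from the prompt above; the substitution code itself is assumed, not part of the gist), filling the template before sending it could look like:

from datetime import datetime, timezone

def fill(template: str, query: str, pages: list[tuple[str, str]]) -> str:
    """Substitute the #{...} placeholders used in the Arc prompt above."""
    out = template.replace("#{DATE_TIME}", datetime.now(timezone.utc).isoformat())
    out = out.replace("#{QUERY}", query)
    for i, (host, markdown) in enumerate(pages, start=1):
        out = out.replace(f"#{{HOST_{i}}}", host)
        out = out.replace(f"#{{MARKDOWN_{i}}}", markdown)
    return out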
@Mistobaan
Mistobaan / langchain_llamacpp_natural_functions.ipynb
Last active March 19, 2024 22:08
langchain_llamacpp_natural_functions.ipynb