@tos-kamiya
Created September 14, 2024 16:04
A script to immediately free up VRAM used by ollama services.
#!/bin/bash

# The default ollama server address
OLLAMA_URL="http://localhost:11434"

# Help function
show_help() {
    echo "Usage: $0 [OPTIONS] [URL]"
    echo "Unload all loaded models in ollama."
    echo ""
    echo "Options:"
    echo "  -h, --help    Show this help message and exit."
    echo ""
    echo "Arguments:"
    echo "  URL           Optional. The URL of the ollama server. Defaults to 'http://localhost:11434'."
}

# Parse arguments
while [[ $# -gt 0 ]]; do
    case "$1" in
        -h|--help)
            show_help
            exit 0
            ;;
        *)
            OLLAMA_URL=$1
            shift
            ;;
    esac
done

# Identify the currently loaded models
loaded_models=$(curl -s "${OLLAMA_URL}/api/ps" | jq -r '.models[].name')

# Request unloading for each of the models
for model in $loaded_models; do
    curl -s -X POST "${OLLAMA_URL}/api/generate" -d "{\"model\": \"${model}\", \"keep_alive\": 0}" > /dev/null 2>&1
    echo "Unloading model: ${model}" >&2
done
# reference: https://www.reddit.com/r/ollama/comments/1cnxnrv/how_to_set_keepalive_1_on_ollama_linux/
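As a sketch of what the two API calls exchange: `/api/ps` returns a JSON object with a `models` array, from which the script pulls one name per line, and each unload request posts a payload with `keep_alive` set to 0. The sample below runs without a server; the model names are made up for illustration.

```shell
#!/bin/bash

# Illustrative /api/ps response (field names follow the ollama API; model names are made up)
sample='{"models":[{"name":"llama3:8b"},{"name":"mistral:7b"}]}'

# The same jq filter the script uses extracts one model name per line
echo "$sample" | jq -r '.models[].name'

# Build the unload payload the script POSTs to /api/generate for one model
model="llama3:8b"
printf '{"model": "%s", "keep_alive": 0}\n' "$model" | jq -c .
```

Setting `keep_alive` to 0 in a generate request tells the server to evict the model from memory immediately instead of keeping it resident for the default idle period.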