Created March 28, 2026 15:30
Gist: rotty3000/151e4b197f4be99c1b5e0c67ac7bbb5b
Ollama Environment Variables
# Ollama Environment Variables
# Search: https://deepwiki.com/ollama/ollama
# Query: How do I find out whether I should set the environment variable <env_var>, and to what value?
CUDA_VISIBLE_DEVICES= # (comma-separated numeric IDs or GPU UUIDs) You should set CUDA_VISIBLE_DEVICES when you have multiple NVIDIA GPUs in your system and want to control which specific GPUs Ollama uses for inference. UUIDs are more reliable than numeric IDs, whose ordering may vary.
GGML_VK_VISIBLE_DEVICES= # (comma-separated numeric IDs) You should set GGML_VK_VISIBLE_DEVICES when you have multiple Vulkan-compatible GPUs and want to control which specific GPUs Ollama uses for inference, or when you want to disable Vulkan GPU support entirely.
GPU_DEVICE_ORDINAL= # (comma-separated numeric IDs) You should set GPU_DEVICE_ORDINAL when you have multiple AMD GPUs in your system and want to control which specific AMD GPUs are visible to Ollama, similar to the other AMD GPU visibility variables.
HIP_VISIBLE_DEVICES= # (comma-separated numeric IDs) You should set HIP_VISIBLE_DEVICES when you have multiple AMD GPUs in your system and want to control which specific AMD GPUs are visible to Ollama on Windows.
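HSA_OVERRIDE_GFX_VERSION= # (string, x.y.z; per-GPU form HSA_OVERRIDE_GFX_VERSION_<n>, where <n> is the numeric device number) You should set HSA_OVERRIDE_GFX_VERSION when you have an AMD GPU that is not natively supported by ROCm and want to force it to use a similar, supported LLVM target, e.g. 10.3.0 to run a gfx1034 card as gfx1030.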
OLLAMA_CONTEXT_LENGTH=8192 # (integer) You should set OLLAMA_CONTEXT_LENGTH when you want to override Ollama's automatic context length defaults that are based on your available VRAM, or when you need a specific context size for particular tasks.
OLLAMA_DEBUG=INFO # (string) You should set OLLAMA_DEBUG when you need to troubleshoot issues with Ollama or want to see more detailed logging information about its operations.
OLLAMA_DEBUG_LOG_REQUESTS=false # ?
OLLAMA_EDITOR= # (string) You should set OLLAMA_EDITOR when you want to specify a particular text editor to use when editing prompts interactively in Ollama's CLI mode (triggered with Ctrl+G).
OLLAMA_FLASH_ATTENTION=true # (bool) You should set OLLAMA_FLASH_ATTENTION when you want to enable Flash Attention, a memory optimization feature that reduces VRAM usage during attention computation, especially beneficial for larger context sizes.
OLLAMA_GPU_OVERHEAD=1073741824 # (integer; bytes; 1073741824 = 1 GiB) You should set OLLAMA_GPU_OVERHEAD when you want to reserve a specific amount of VRAM on each GPU to account for system overhead and prevent out-of-memory errors when loading models.
OLLAMA_HOST=http://0.0.0.0:11434 # (string; [scheme://]host[:port], default 127.0.0.1:11434) You should set OLLAMA_HOST when you want to change the network address and port that Ollama binds to, for example to expose it on your network or to use a custom port. The CLI also reads it to connect to a remote Ollama server.
OLLAMA_KEEP_ALIVE=1h30m0s # (duration such as 10m or 24h, or an integer number of seconds; a negative value keeps models loaded indefinitely and 0 unloads them immediately; default 5m) You should set OLLAMA_KEEP_ALIVE when you want to control how long models remain loaded in memory after requests complete, either to keep them available longer for faster responses or to free up memory more quickly.
OLLAMA_KV_CACHE_TYPE=q8_0 # (string; f16 (default), q8_0, q4_0) You should set OLLAMA_KV_CACHE_TYPE when you want to reduce VRAM usage by quantizing the Key/Value cache. It requires Flash Attention to be enabled and is especially beneficial for larger context sizes; q8_0 uses roughly half the memory of f16, and q4_0 roughly a quarter.
OLLAMA_LLM_LIBRARY= # (string) You should set OLLAMA_LLM_LIBRARY when Ollama's automatic LLM library detection is having problems or when you want to force a specific LLM library for compatibility or performance reasons.
OLLAMA_LOAD_TIMEOUT=5m0s # (string; duration syntax) The OLLAMA_LOAD_TIMEOUT environment variable controls how long Ollama waits for a model to load before giving up due to stalled progress. The default value is 5 minutes.
OLLAMA_MAX_LOADED_MODELS=6 # (integer) The OLLAMA_MAX_LOADED_MODELS environment variable controls the maximum number of models that can be loaded concurrently in memory, provided they fit in available VRAM. The default value is 0, which automatically sets the limit to 3 models per GPU (or 3 for CPU-only systems).
OLLAMA_MAX_QUEUE=512 # (integer) The OLLAMA_MAX_QUEUE environment variable controls the maximum number of requests that Ollama will queue when all models are busy before rejecting additional requests with a 503 error. The default value is 512 requests.
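OLLAMA_MODELS=/home/ollama/.ollama/models # (string; path) The OLLAMA_MODELS environment variable specifies the directory where Ollama stores downloaded models. The default is $HOME/.ollama/models on macOS and Linux (for the Linux systemd service this is the ollama user's home, /usr/share/ollama with the standard installer) and C:\Users\%username%\.ollama\models on Windows. For the Linux service, the ollama user needs read and write access to the directory you choose.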
OLLAMA_MULTIUSER_CACHE=false # (bool) The OLLAMA_MULTIUSER_CACHE environment variable optimizes prompt caching for multi-user scenarios by changing cache eviction behavior to better handle concurrent access patterns when multiple users are accessing the same model.
OLLAMA_NEW_ENGINE=true # (bool) The OLLAMA_NEW_ENGINE environment variable enables Ollama's new native engine that uses a custom tokenizer implementation instead of the llama.cpp backend. The default is false (disabled), but it's automatically enabled for certain model architectures.
OLLAMA_NOHISTORY=false # (bool) The OLLAMA_NOHISTORY environment variable controls whether Ollama preserves readline history in interactive mode. The default is false (history is preserved).
OLLAMA_NOPRUNE=false # (bool) The OLLAMA_NOPRUNE environment variable controls whether Ollama prunes (cleans up) unused model blobs on startup. The default is false (pruning is enabled).
OLLAMA_NO_CLOUD=false # (bool) The OLLAMA_NO_CLOUD environment variable disables Ollama's cloud features, including remote inference and web search capabilities. The default is false (cloud features enabled).
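OLLAMA_NUM_PARALLEL=5 # (integer) The OLLAMA_NUM_PARALLEL environment variable controls the maximum number of parallel requests each model can process simultaneously. The default value is 1. Parallel processing multiplies the allocated context by this value, so OLLAMA_CONTEXT_LENGTH=8192 with OLLAMA_NUM_PARALLEL=5 allocates 40960 tokens of context, with matching memory use.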
OLLAMA_ORIGINS=[* http://localhost ...] # (comma-separated) The OLLAMA_ORIGINS environment variable controls which web origins are allowed to make cross-origin requests to the Ollama server. This is used for CORS (Cross-Origin Resource Sharing) configuration.
OLLAMA_REMOTES=[ollama.com] # (comma-separated) The OLLAMA_REMOTES environment variable controls which remote hosts are allowed for downloading and accessing remote models. The default value is ollama.com.
OLLAMA_SCHED_SPREAD=false # (bool) The OLLAMA_SCHED_SPREAD environment variable controls whether Ollama spreads model loading across all available GPUs. The default is false (disabled), which means Ollama first tries to fit each model on a single GPU before spreading across multiple GPUs.
OLLAMA_VULKAN=false # (bool) The OLLAMA_VULKAN environment variable enables experimental Vulkan GPU support for Ollama. The default is false (disabled).
ROCR_VISIBLE_DEVICES= # (comma-separated) The ROCR_VISIBLE_DEVICES environment variable controls which AMD GPUs are visible to Ollama when using the ROCm library on Linux systems. It's used to select specific GPUs for model execution.
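The device-selection variables above are easiest to check with a one-off foreground server before putting them into the service configuration. The commands below are a sketch that assumes an NVIDIA machine with at least two GPUs for the CUDA lines and, for the last line, an AMD card with a gfx1034 target (such as a Radeon RX 5400) that ROCm does not support directly. The invalid-ID trick for forcing CPU inference and the x.y.z override syntax come from Ollama's GPU documentation; confirm them against the docs for your Ollama version. Run one `ollama serve` line at a time (Ctrl+C stops it).

```shell
# List NVIDIA GPUs with their UUIDs (more stable than numeric IDs)
nvidia-smi -L

# Foreground server restricted to the first two NVIDIA GPUs
CUDA_VISIBLE_DEVICES=0,1 ollama serve

# Force CPU-only inference by passing an invalid GPU ID
CUDA_VISIBLE_DEVICES=-1 ollama serve

# List AMD GPUs visible to ROCm, then restrict Ollama to the first one (Linux)
rocminfo
ROCR_VISIBLE_DEVICES=0 ollama serve

# Assumed hardware: a gfx1034 card, run as the supported gfx1030 target
HSA_OVERRIDE_GFX_VERSION=10.3.0 ollama serve
```

If the systemd service is already running on the default address, stop it first with `sudo systemctl stop ollama`; otherwise the foreground server cannot bind port 11434.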
HTTPS_PROXY= # (url) You should set HTTPS_PROXY when your network requires a proxy server to access the internet, particularly when Ollama needs to pull models from external registries.
HTTP_PROXY= # (url) You should generally NOT set HTTP_PROXY for Ollama: model pulls use only HTTPS, so it is not needed, and it may interrupt client connections to the Ollama server.
NO_PROXY= # (comma-separated) You should set NO_PROXY when you have a proxy configured but want to exclude certain destinations from going through the proxy, such as local services or specific domains.
http_proxy= # lowercase form of HTTP_PROXY; the same advice applies
https_proxy= # lowercase form of HTTPS_PROXY
no_proxy= # lowercase form of NO_PROXY
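OLLAMA_CONTEXT_LENGTH, OLLAMA_NUM_PARALLEL and OLLAMA_KV_CACHE_TYPE all feed into the size of the K/V cache. The function below is a rough, back-of-the-envelope estimate using the common formula 2 (keys and values) × layers × context tokens × parallel slots × KV heads × head dimension × bytes per element, where the parallel multiplier follows Ollama's note that parallel requests multiply the allocated context. The model shape used here (32 layers, 8 KV heads, head dimension 128) is hypothetical; substitute the values from your model's metadata. Per-element sizes follow GGML's block layouts (q8_0 stores 32 values in 34 bytes, q4_0 stores 32 values in 18 bytes). Real usage differs because of padding, sliding-window layers and compute buffers, so treat the result as an order-of-magnitude guide.

```shell
#!/usr/bin/env bash
# Rough K/V cache size estimate in bytes, integer arithmetic only.
# Usage: kv_cache_bytes LAYERS CONTEXT KV_HEADS HEAD_DIM CACHE_TYPE [NUM_PARALLEL]
# Ignores padding, sliding-window layers and compute buffers.
kv_cache_bytes() {
  local layers=$1 ctx=$2 kv_heads=$3 head_dim=$4 cache_type=$5
  local num_parallel=${6:-1}
  local bytes_per_32   # bytes needed to store 32 cache elements
  case "$cache_type" in
    f16)  bytes_per_32=64 ;;  # 2 bytes per element
    q8_0) bytes_per_32=34 ;;  # 32 x int8 plus one fp16 scale
    q4_0) bytes_per_32=18 ;;  # 32 x 4-bit plus one fp16 scale
    *) echo "unsupported cache type: $cache_type" >&2; return 1 ;;
  esac
  # Factor 2: one tensor for keys and one for values in every layer
  local elements=$(( 2 * layers * ctx * num_parallel * kv_heads * head_dim ))
  echo $(( elements * bytes_per_32 / 32 ))
}

# Hypothetical model: 32 layers, 8 KV heads, head dimension 128,
# with OLLAMA_CONTEXT_LENGTH=8192 and a single parallel slot.
for cache in f16 q8_0 q4_0; do
  bytes=$(kv_cache_bytes 32 8192 8 128 "$cache")
  echo "$cache: $bytes bytes ($(( bytes / 1024 / 1024 )) MiB)"
done
# f16: 1073741824 bytes (1024 MiB)
# q8_0: 570425344 bytes (544 MiB)
# q4_0: 301989888 bytes (288 MiB)
```

With the values from this file (OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_PARALLEL=5), the same hypothetical model gives `kv_cache_bytes 32 8192 8 128 q8_0 5` = 2852126720 bytes (2720 MiB), which shows why raising OLLAMA_NUM_PARALLEL can cost as much memory as raising the context length.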
# Set these for the systemd service by creating an override with
#
# $ sudo systemctl edit ollama.service
#
# and adding, for example:
#
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
# Environment="OLLAMA_ORIGINS=*"
# Environment="OLLAMA_CONTEXT_LENGTH=8192"
# Environment="OLLAMA_MAX_LOADED_MODELS=6"
# Environment="OLLAMA_NUM_PARALLEL=5"
# Environment="OLLAMA_KEEP_ALIVE=90m"
#
# Then reload systemd and restart Ollama to apply the changes:
#
# $ sudo systemctl daemon-reload
# $ sudo systemctl restart ollama
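The variable list shows OLLAMA_KEEP_ALIVE=1h30m0s while the service example uses 90m; both are Go-style durations for the same 90 minutes, and OLLAMA_LOAD_TIMEOUT=5m0s uses the same syntax. The helper below is a hypothetical convenience for sanity-checking such values, not part of Ollama: it converts the integer h/m/s subset of Go's duration syntax to seconds and, like the keep-alive setting, treats a bare integer as seconds. It deliberately rejects fractions, ms/us/ns units and negative values, all of which Go's `time.ParseDuration` does accept.

```shell
#!/usr/bin/env bash
# Convert an integer-only Go-style duration (units h, m, s in any combination,
# e.g. "1h30m0s", "90m", "5m0s") to whole seconds. A bare integer is taken as
# seconds. Fractions, ms/us/ns units and signs are rejected.
duration_to_seconds() {
  local rest=$1 total=0 n unit
  if [[ -z $rest ]]; then
    echo "empty duration" >&2; return 1
  fi
  if [[ $rest =~ ^[0-9]+$ ]]; then
    echo $(( 10#$rest )); return 0
  fi
  while [[ $rest =~ ^([0-9]+)([hms])(.*)$ ]]; do
    n=$(( 10#${BASH_REMATCH[1]} ))   # force base 10 so "08" is not read as octal
    unit=${BASH_REMATCH[2]}
    rest=${BASH_REMATCH[3]}
    case $unit in
      h) total=$(( total + n * 3600 )) ;;
      m) total=$(( total + n * 60 )) ;;
      s) total=$(( total + n )) ;;
    esac
  done
  # Anything left over (e.g. the "s" of "5ms", or "1.5h") is unsupported
  if [[ -n $rest ]]; then
    echo "unsupported duration: $1" >&2; return 1
  fi
  echo "$total"
}

# Both spellings used in this file describe the same keep-alive period
duration_to_seconds 1h30m0s   # prints 5400
duration_to_seconds 90m       # prints 5400
```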