
@rajivmehtaflex
Created May 28, 2024 05:05
LLM Execution server (.args file contents):
--model
/workspace/codesandbox-template-blank/llamafile/AutoCoder_S_6.gguf
--server
--host
0.0.0.0
-ngl
100
...
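The .args file lists one argument per line; llamafile reads it as the default command line, and the trailing `...` line allows extra arguments passed at runtime to be appended. A minimal sketch that generates the file shown above (the model path is the one from this gist; adjust for your setup):

```python
# Write the .args file for the llamafile server.
# One argument per line; "..." lets runtime CLI args pass through.
ARGS = """\
--model
/workspace/codesandbox-template-blank/llamafile/AutoCoder_S_6.gguf
--server
--host
0.0.0.0
-ngl
100
...
"""

with open(".args", "w") as f:
    f.write(ARGS)
```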
Test gguf file
- Download llamafile
- Download/Create gguf file
./llamafile -m ./AutoCoder_S_6.gguf --server --port 8889 --temp 0.3
Create standalone llm server
- Create .args file
- Compile to server
cp llamafile AutoCoder.llamafile
./zipalign -j0 AutoCoder.llamafile AutoCoder_S_6.gguf .args
./AutoCoder.llamafile
./AutoCoder.llamafile --port 8890
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" -d '{
    "model": "/workspace/codesandbox-template-blank/llamafile/AutoCoder_S_6.gguf",
    "stream": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a poetic assistant."
      },
      {
        "role": "user",
        "content": "Compose a poem that explains FORTRAN."
      }
    ]
  }'
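The same OpenAI-compatible endpoint can be called from Python. A minimal sketch using only the standard library, assuming the server is on the default port 8080 (change `host` to match your `--port`); it sends a non-streaming request for simplicity, unlike the curl example's `"stream": true`:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Build the same request body as the curl example above."""
    return {
        "model": "/workspace/codesandbox-template-blank/llamafile/AutoCoder_S_6.gguf",
        "stream": False,  # non-streaming for a simple one-shot reply
        "messages": [
            {"role": "system", "content": "You are a poetic assistant."},
            {"role": "user", "content": prompt},
        ],
    }

def chat(prompt: str, host: str = "http://localhost:8080") -> str:
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Compose a poem that explains FORTRAN."))
```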