Lambda Docs
https://docs.lambdalabs.com/
The Lambda Chat Completions API enables you to use the Llama 3.1 405B Instruct large language model (LLM) and fine-tuned versions such as Nous Research's Hermes 3, without the need to set up your own vLLM API server on an on-demand instance or 1-Click Cluster (1CC).
Since the Lambda Chat Completions API is compatible with the OpenAI API, you can use it as a drop-in replacement for applications currently using the OpenAI API.
The Lambda Chat Completions API implements endpoints for:

- Creating chat completions (`/chat/completions`)
- Creating completions (`/completions`)
- Listing models (`/models`)
Currently, two models are available:

- `hermes-3-llama-3.1-405b-fp8` (18K context length)
- `hermes-3-llama-3.1-405b-fp8-128k` (128K context length)

If a request using the `hermes-3-llama-3.1-405b-fp8` model is made with a context length greater than 18K, the request will fall back to using the `hermes-3-llama-3.1-405b-fp8-128k` model.
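Because the fallback is driven by context length, you might prefer to select the 128K model up front for long inputs. A minimal sketch, assuming a rough four-characters-per-token estimate (the `choose_model` helper and its heuristic are illustrative, not part of the API):

```python
# Hypothetical helper (not part of the Lambda API): pick a model by a rough
# prompt-length estimate so long inputs go straight to the 128K model.
# The ~4 characters-per-token ratio is a rule of thumb, not an exact count.
def choose_model(prompt: str, threshold_tokens: int = 18_000) -> str:
    estimated_tokens = len(prompt) // 4
    if estimated_tokens > threshold_tokens:
        return "hermes-3-llama-3.1-405b-fp8-128k"
    return "hermes-3-llama-3.1-405b-fp8"

print(choose_model("Computers are"))   # short prompt -> 18K model
print(choose_model("x" * 100_000))     # long prompt -> 128K model
```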
To use the Lambda Chat Completions API, first generate a Cloud API key from the dashboard. You can also use a Cloud API key that you've already generated.
In the examples below, replace `MODEL` with one of the models listed above and `API-KEY` with your actual Cloud API key.
First, create a Python virtual environment. Then, install the OpenAI Python API library.
Run, for example:
```python
from openai import OpenAI

openai_api_key = "API-KEY"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "MODEL"

chat_completion = client.chat.completions.create(
    messages=[{
        "role": "system",
        "content": "You are a helpful assistant named Hermes, made by Nous"
    }, {
        "role": "user",
        "content": "Who won the world series in 2020?"
    }, {
        "role": "assistant",
        "content": "The Los Angeles Dodgers won the World Series in 2020."
    }, {
        "role": "user",
        "content": "Where was it played?"
    }],
    model=model,
)

print(chat_completion)
```
You should see output similar to:

```
ChatCompletion(id='chat-e489d950acaa41deb02cb794d1ecfe6b', choices=[Choice(...])
```
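To pull out just the assistant's reply, read `choices[0].message.content` on the response. The sketch below uses a `SimpleNamespace` stand-in shaped like the `ChatCompletion` object so it runs without an API key; with a live response from the code above, the same attribute path applies:

```python
from types import SimpleNamespace

# Stand-in mirroring the shape of the ChatCompletion object printed above;
# a live chat_completion from client.chat.completions.create() is accessed
# the same way.
chat_completion = SimpleNamespace(
    id="chat-e489d950acaa41deb02cb794d1ecfe6b",
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(
                role="assistant",
                content="The 2020 World Series was played at Globe Life Field in Arlington, Texas.",
            )
        )
    ],
)

# The assistant's reply text lives at choices[0].message.content.
reply = chat_completion.choices[0].message.content
print(reply)
```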
The `/completions` endpoint takes a single text string (a prompt) as input, then outputs a response. In comparison, the `/chat/completions` endpoint takes a list of messages as input.
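To make the difference concrete, here is a rough sketch that flattens a `/chat/completions`-style message list into a single `/completions`-style prompt string. The actual chat template is applied server-side and is model-specific, so this formatting is illustrative only:

```python
# Illustrative only: the real chat template applied by the server is
# model-specific; this just shows how the two endpoint inputs relate.
def messages_to_prompt(messages: list[dict]) -> str:
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # leave room for the model's reply
    return "\n".join(lines)

prompt = messages_to_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
])
print(prompt)
```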
To use the `/completions` endpoint:
First, create a Python virtual environment. Then, install the OpenAI Python API library.
Run, for example:
```python
from openai import OpenAI

openai_api_key = "API-KEY"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

model = "MODEL"

response = client.completions.create(
    prompt="Computers are",
    temperature=0,
    model=model,
)

print(response)
```
You should see output similar to:

```
Completion(id='cmpl-bed15d67c6894588bc0292c1cc5ed28d', choices=[Completion(...])
```
To use the `/models` endpoint:
First, create a Python virtual environment. Then, install the OpenAI Python API library.
Run:
```python
from openai import OpenAI

openai_api_key = "API-KEY"
openai_api_base = "https://api.lambdalabs.com/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

# Print the list of available models. (In a REPL, the bare expression
# echoes its value; in a script, the print is needed to see output.)
print(client.models.list())
```
You should see output similar to:

```
SyncPage[Model](data=[Model(id='hermes-3-llama-3.1-405b-fp8', created=1677, ...)])
```
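The page object's `data` attribute holds the individual model entries. A minimal sketch of pulling out the model ids, using a `SimpleNamespace` stand-in shaped like the `SyncPage[Model]` result so it runs without an API key (with a live client, read `client.models.list().data` the same way):

```python
from types import SimpleNamespace

# Stand-in mirroring the SyncPage[Model] shape shown above; with a live
# client, client.models.list().data is the equivalent list of Model objects.
models_page = SimpleNamespace(
    data=[
        SimpleNamespace(id="hermes-3-llama-3.1-405b-fp8"),
        SimpleNamespace(id="hermes-3-llama-3.1-405b-fp8-128k"),
    ]
)

model_ids = [m.id for m in models_page.data]
print(model_ids)
```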