Hamel Husain (hamelsmu)

💻
Always learning.
@hamelsmu
hamelsmu / SmartLLMChain_logs.txt
Created February 13, 2024 22:58
The logs for SmartLLMChain for this blog post: https://hamel.dev/blog/posts/prompt/
> Entering new SmartLLMChain chain...
Prompt after formatting:
I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?
Idea 1:
1. Fill the 12 liter jug completely.
2. Pour the contents of the 12 liter jug into the 6 liter jug. This will leave you with 6 liters in the 12 liter jug.
3. Empty the 6 liter jug.
4. Pour the remaining 6 liters from the 12 liter jug into the now empty 6 liter jug.
5. You now have 6 liters in the 6 liter jug.
Idea 2:
{"conversations":[{"from":"system","value":"Honeycomb is an observability platform that allows you to write queries to inspect trace data. You are an assistant that takes a natural language query (NLQ) and a list of valid columns and produce a Honeycomb query."},{"from":"human","value":"\n\nNLQ: \"How many error messages have occurred?\"\n\nColumns: ['error', 'error_count', 'error.type', 'error.msg', 'error.cause_chain', 'email', 'target', 'code.lineno', 'latency', 'id', 'name', 'http.url', 'transaction.id', 'client_reference', 'code.namespace', 'type', 'duration_ms', 'provider', 'account.id', 'internal.payment.missing_in_adapter', 'service.name', 'http.method', 'operation', 'busy_ns', 'idle_ns', 'status_code', 'library.name', 'http.host', 'classification', 'invitation_id', 'thread.id', 'entity_type', 'thread.name', 'library.version', 'code.filepath', 'customer.payment.outbound.missing_in_internal', 'level', 'claim_id', 'parent_name', 'adapter.payment.missing_in_internal', 'net.host.port', 'service.version',
@hamelsmu
hamelsmu / big_inference.py
Created August 6, 2023 18:08
Tensor-parallel inference on Modal
# # Fast inference with vLLM (Llama 2 13B)
#
# In this example, we show how to run basic inference, using [`vLLM`](https://github.com/vllm-project/vllm)
# to take advantage of PagedAttention, which speeds up sequential inferences with optimized key-value caching.
#
# `vLLM` also supports a use case as a FastAPI server which we will explore in a future guide. This example
# walks through setting up an environment that works with `vLLM` for basic inference.
#
# We are running the Llama 2 13B model here, and you can expect 30 second cold starts and well over 100 tokens/second.
# The larger the batch of prompts, the higher the throughput. For example, with the 60 prompts below,
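# The preview cuts off here. As a rough sketch of the kind of basic batched inference the comments describe, using vLLM's `LLM` and `SamplingParams` API (the prompts, model name, and sampling settings below are illustrative assumptions, not taken from the gist):

from vllm import LLM, SamplingParams

# Illustrative prompts; the gist batches ~60 of them to drive up throughput.
prompts = [
    "Explain PagedAttention in one sentence.",
    "Why does batching prompts increase throughput?",
]

# Sampling settings are assumptions for this sketch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Assumes the Llama 2 13B weights are available locally or via Hugging Face.
llm = LLM(model="meta-llama/Llama-2-13b-hf")

outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)

# The gist presumably wraps a call like this in a Modal GPU function; the throughput it quotes comes from passing many prompts to a single generate call.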
@hamelsmu
hamelsmu / torchserve.md
Last active August 24, 2024 04:07
Why `torchserve` is great in the age of LLMs

Why torchserve is great in the age of LLMs

I previously wrote about TorchServe as a nice way to serve models.

There is a plethora of ways to optimize LLMs for inference, such as quantization, PagedAttention, kernel fusion, and other compilation techniques, and the list keeps expanding. There is a groundswell of demand for serving OSS LLMs, because many organizations have tried OpenAI/Anthropic/PaLM but want a solution they can control.


This is why TorchServe is nice: you can use any framework you want in the custom handler, such as GPTQ, transformers, vLLM, etc. (as long as there is a Python client). With TorchServe, there is enough f
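To make the custom-handler point concrete, here is a minimal sketch built on TorchServe's ts.torch_handler.base_handler.BaseHandler. It loads a model with Hugging Face transformers, but the class name, model loading, and generation settings are assumptions for illustration; the same handler shape could wrap vLLM or a GPTQ runtime instead.

from ts.torch_handler.base_handler import BaseHandler
from transformers import AutoModelForCausalLM, AutoTokenizer


class LLMHandler(BaseHandler):
    # Hypothetical handler: any Python inference framework could run inside these methods.
    def initialize(self, context):
        # TorchServe passes the extracted model archive directory via system_properties.
        model_dir = context.system_properties.get("model_dir")
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model = AutoModelForCausalLM.from_pretrained(model_dir)
        self.initialized = True

    def preprocess(self, requests):
        # TorchServe delivers a batch of requests; each payload may be bytes or str.
        texts = []
        for req in requests:
            data = req.get("data") or req.get("body")
            texts.append(data.decode("utf-8") if isinstance(data, (bytes, bytearray)) else str(data))
        return texts

    def inference(self, texts):
        outputs = []
        for text in texts:
            ids = self.tokenizer(text, return_tensors="pt").input_ids
            generated = self.model.generate(ids, max_new_tokens=64)
            outputs.append(self.tokenizer.decode(generated[0], skip_special_tokens=True))
        return outputs

    def postprocess(self, outputs):
        # Return one response element per incoming request.
        return outputs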

@hamelsmu
hamelsmu / app.ipynb
Last active November 14, 2022 18:16
A demo of creating a gradio app with nbdev
@hamelsmu
hamelsmu / app.ipynb
Created November 7, 2022 20:20
A demo of creating a gradio app with nbdev
@hamelsmu
hamelsmu / app.ipynb
Created November 7, 2022 18:21
A demo of creating a gradio app with nbdev
@hamelsmu
hamelsmu / app.ipynb
Created November 7, 2022 18:10
A demo of creating a gradio app with nbdev
@hamelsmu
hamelsmu / app.ipynb
Created November 7, 2022 18:07
A demo of creating a gradio app with nbdev
@hamelsmu
hamelsmu / app.ipynb
Created November 7, 2022 17:54
A demo of creating a gradio app with nbdev