Goals: add links that are reasonable, well-written explanations of how stuff works. No hype, and no vendor content if possible. Practical first-hand accounts of running models in production are eagerly sought.
```bash
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

# Build it
make clean
LLAMA_METAL=1 make

# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
```
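The snippet ends at setting `$MODEL` and does not show the download command itself. Once the model file is actually in the repository directory, a quick smoke test might look like this (the prompt and token count are arbitrary choices, not from the original instructions):

```bash
# Generate up to 128 tokens from a short prompt with the freshly built binary.
./main -m ./$MODEL -n 128 -p "The first man on the moon was"
```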
Each day at our company, developers are required to document their activities, painstakingly jotting down their daily work and future plans. It is a monotonous chore that I really dislike.
So now there's a scribe for that:
Sam Altman: "I think if you have a smart person who has learned to do good research and has the right sort of mindset, it only takes about six months to make them, you know, take a smart physics researcher and make them into a productive AI researcher. So we don't have enough talent in the field yet, but it's coming soon. We have a program at OpenAI that does exactly this. And I'm astonished how well it works."
This worked on 14/May/23. The instructions will probably require updating in the future.
LLaMA is a text-prediction model similar to GPT-2 and to the version of GPT-3 that has not yet been fine-tuned. It should also be possible to run fine-tuned versions with this (such as Alpaca or Vicuna; those versions are more focused on answering questions).
Note: I have been told that this does not support multiple GPUs; it can only use a single GPU.
It is now possible to run LLaMA 13B with a 6 GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low-VRAM setups.
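For illustration, layer offloading is controlled with the `-ngl` (`--n-gpu-layers`) flag. A minimal sketch on a CUDA build, where the model path and the layer count are placeholders you tune to your VRAM:

```bash
# Offload 18 transformer layers to the GPU and keep the rest on the CPU.
./main -m ./models/llama-13b.ggmlv3.q4_0.bin -ngl 18 \
  -p "Building a website can be done in 10 simple steps:"
```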
Everyone is now racing to create open-source alternatives to compete with GPT-3.5/GPT-4. A common shortcut some teams use to bootstrap their effort is to fine-tune their model on ChatGPT output. I used to think this was a good idea and totally fair play. Actually, I still think it's fair play: OpenAI effectively distilled the entire web into its models, and they say themselves that they are using (mostly) publicly accessible information. So distilling their model is, in effect, distilling the public open web, and Terms of Service details aside, I don't see major ethical problems with that. Right? Well, it's not entirely true, and I realize now that, even ignoring the ethical considerations, using their output is a really bad idea.
First of all, from a purely technical point of view, as @yoavgo explains beautifully in his recent post, there is no way to align LLMs correctly without the RLHF component. I encourage you to read it.
```python
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import asyncio
from typing import Any

import openai
```
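The excerpt stops at the imports. A minimal sketch of the async batching helper those imports build toward, assuming the pre-1.0 `openai` client (the function name and signature are modeled on the linked Zeno Build implementation rather than copied from it, and `OPENAI_API_KEY` is assumed to be set in the environment):

```python
async def dispatch_openai_requests(
    messages_list: list[list[dict[str, Any]]],
    model: str = "gpt-3.5-turbo",
) -> list[Any]:
    """Fire off one ChatCompletion request per conversation, concurrently."""
    tasks = [
        # acreate is the async variant of create in openai<1.0.
        openai.ChatCompletion.acreate(model=model, messages=messages)
        for messages in messages_list
    ]
    return await asyncio.gather(*tasks)


# Example: run three prompts concurrently.
responses = asyncio.run(
    dispatch_openai_requests(
        [[{"role": "user", "content": f"What is {n} squared?"}] for n in (2, 3, 4)]
    )
)
```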
```python
import openai
import pinecone
from sentence_transformers import SentenceTransformer


class GPTConversationManager:
    def __init__(self, api_key, pinecone_api_key, index_name):
        self.api_key = api_key
        openai.api_key = self.api_key
        self.conversation_history = []
        self.pinecone_api_key = pinecone_api_key
        # The excerpt ends here; the lines below are an assumed completion:
        # connect to Pinecone (the environment name is a placeholder) and load
        # a sentence-transformers model to embed conversation turns.
        pinecone.init(api_key=self.pinecone_api_key, environment="us-west1-gcp")
        self.index = pinecone.Index(index_name)
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
```
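Hypothetical usage, with both keys and the index name as placeholders:

```python
manager = GPTConversationManager(
    api_key="sk-...",          # OpenAI API key (placeholder)
    pinecone_api_key="...",    # Pinecone API key (placeholder)
    index_name="conversation-memory",
)
```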