Problem: Vector Addition
In this post we will cover how to solve the simplest CUDA problem: adding two arrays. I'll explain the code, step by step
Here is the initial template that we are given:
Problem: Vector Addition
In this post we will cover how to solve the simplest CUDA problem: adding two arrays. I'll explain the code, step by step
Here is the initial template that we are given:
This aims to be factual information about the size of large language models. None of this document was written by AI. I do not include any information from leaks or rumors. The focus of this document is on base models (the raw text continuation engines, not 'helpful chatbot/assistants'). This is a view from a few years ago to today of one very tiny fraction of the larger LLM story that's happening.
This is a tool that sorts images into folders using "AI". You create the folders you want the images to be put into.
I got 'claude' to write with a couple prompts. Make venv and install deps with pip.
This is a guide on how to set up a way to loom with a local LLM on your computer. Loom is the name for a way of making use of an LLM base model, such as GPT-2, in order to read, write and explore generated text.
To loom you can use the following software stack:
The LLamafile project doesn't make sense.
The claim is that it is "bringing LLMs to the people", but you could already run an LLM - which is a large binary file containing lots of floating point numbers - by using llama.cpp.
Llamafile joins a compiled binary program to run LLMs with a weights binary into a single file. This isn't a useful goal. you could simply distribute a zip containing an .exe and a weights file together. Or better still: Decouple the program that runs these chatbots from the chatbot weights.
Imagine if PNG files were also an executable that could pop open a window that displays a PNG on your computer. There is a reason we don't do this: It's not good engineering.
This spec seems to have gotten in thanks to work by apignotti, https://hn.algolia.com/?q=WebAssembly+tail+calls
There is also a very interesting project for generalized effect handlers that may build on top of this platform https://wasmfx.dev/community/
Great news for schemers with web browsers.
This is my research report. I've included a lot of the code and chat interactions for people to read through if interested. I worked on this crossword https://www.theguardian.com/crosswords/quick/16553
I had a vision for a GPT powered crossword solver. My idea is that it would do a tree search over GPT generated guesses that would include the knowns so far, like:
I didn't end up doing that because ChatGPT and GPT-4 are terrible at questions involving the length of words, or guessing words that contain specific letters at specific locations. It can sometimes do them but usually fails. I think this is because it's token based. I am curious whether a character based LLM would be better at such tasks.
This worked on 14/May/23. The instructions will probably require updating in the future.
llama is a text prediction model similar to GPT-2, and the version of GPT-3 that has not been fine tuned yet. It is also possible to run fine tuned versions (like alpaca or vicuna with this. I think. Those versions are more focused on answering questions)
Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.
It is possible to run LLama 13B with a 6GB graphics card now! (e.g. a RTX 2060). Thanks to the amazing work involved in llama.cpp. The latest change is CUDA/cuBLAS which allows you pick an arbitrary number of the transformer layers to be run on the GPU. This is perfect for low VRAM.
08737ef720f0510c7ec2aa84d7f70c691073c35d.Executive summary: If you use AutoGPT, you need to be aware of prompt injection. This is a serious problem that can cause your AutoGPT agent to perform unexpected and unwanted tasks. Unfortunately, there isn't a perfect solution to this problem available yet.
If you set up an AutoGPT agent to perform task A, a prompt injection could 'derail' it into performing task B instead. Task B could be anything. Even something unwanted like deleting your personal files or sending all your bitcoins to some crooks wallet.
Docker helps limit the file system access that agents have. Measures like this are extremely useful. It's important to note that the agent can still be derailed.
This is a report on my experience pair programming with Bard on a neural network task that challenged it to its current limits.
Bard now has the ability to program, or put another way Google has removed the gating that blocked it from trying.
All the code in this article is basically 99% produced by Bard. I either prompted it to refactor things or I just tweaked one line or two lines of every 100.
Note: I used gpt-4 a little bit too, for the training part, but this is mostly Bard.