ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?
I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.
I have had working knowledge of GPUs for a very long time now. I remember playing around with an NVIDIA RIVA 128 or something similar with DirectX, back when they were still called 3D graphics accelerators. I have also tried to keep up with the times, doing some basic shader programming on contemporary NVIDIA and AMD GPUs.
Today's GPUs are necessary for another reason, however: the explosion of AI workloads, including large language models (LLMs). From a GPU perspective, AI workloads are just massive applications of tensor operations such as matrix addition and multiplication. But how does a modern GPU execute these operations so much more efficiently than a CPU?
Consider CUDA, NVIDIA's programming language that extends C to exploit data parallelism on GPUs. In CUDA, you write code for the CPU (host code) or for the GPU (device code). Host code is mostly plain C, but CUDA extends the language in two ways: it lets you define functions that run on the GPU (kernels), and it provides syntax for launching those kernels from host code.
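To make the host/device split concrete, here is a minimal sketch of a CUDA program. The vector-add kernel is my own illustration (not from the original notes), but it follows the standard pattern: the host allocates device memory, copies the inputs over, launches the kernel with the `<<<blocks, threads>>>` syntax, and copies the result back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: the kernel. Each GPU thread handles one array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // guard: the grid may be slightly larger than n
        c[i] = a[i] + b[i];
}

// Host code: plain C plus CUDA runtime calls and the kernel launch syntax.
int main(void) {
    const int n = 1 << 20;                 // one million elements
    const size_t bytes = n * sizeof(float);

    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocks, threadsPerBlock>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);          // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Each thread computes exactly one output element, which is the same data-parallel shape as the tensor operations above: identical arithmetic applied independently across millions of elements.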