Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
a simple script that reads tweets inside a json file, uses openai to compute embeddings and creates two files, metadata.tsv and output.tsv, which cam be used to visualise the tweets and their embeddings in TensorFlow Projector (https://projector.tensorflow.org/) | |
""" | |
# obtain tweets.json from https://gist.github.com/gd3kr/948296cf675469f5028911f8eb276dbc | |
import pandas as pd | |
import json | |
from openai import OpenAI |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
the twitter api is stupid. it is stupid and bad and expensive. hence, this. | |
Literally just paste this in the JS console on the bookmarks tab and the script will automatically scroll to the bottom of your bookmarks and keep a track of them as it goes. | |
When finished, it downloads a JSON file containing the raw text content of every bookmark. | |
for now it stores just the text inside the tweet itself, but if you're reading this why don't you go ahead and try to also store other information (author, tweetLink, pictures, everything). come on. do it. please? | |
*/ |
ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?
I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.