Christopher Carroll Smith chriscarrollsmith

Embedding your downloaded Twitter archive data for semantic search and RAG with `llm`

Getting your tweets

Twitter allows you to download your archive of tweets. You can do this by going to your account settings and requesting your archive. Once you receive the email with the download link, you can download the zip file.

Exploring the data

Top-level keys of twitter_archive.json (via jq 'keys' twitter_archive.json):

	#!/usr/bin/env bun
	// @bun-dependencies: puppeteer@latest puppeteer-extra@latest puppeteer-extra-plugin-stealth@latest

	import puppeteer from 'puppeteer';

	async function scrapeXTweet(url, metadata) {
	// Launch browser with stealth options
	const browser = await puppeteer.launch({
	headless: true, // or 'new' for new headless mode
	executablePath: '/usr/bin/google-chrome',

	import os
	import sys
	from typing import Any, Dict, List

	from openai import OpenAI


	def ensure_api_key() -> None:
	if not os.getenv("OPENAI_API_KEY"):
	print("ERROR: Please set OPENAI_API_KEY in your environment.")

Christopher Carroll Smith chriscarrollsmith

Embedding your downloaded Twitter archive data for semantic search and RAG with llm

Getting your tweets

Exploring the data

Embedding your downloaded Twitter archive data for semantic search and RAG with `llm`