Christopher Carroll Smith chriscarrollsmith

@chriscarrollsmith
chriscarrollsmith / scrape_tweet.js
Last active August 10, 2025 14:46
Scrape the text of a tweet by URL with Puppeteer. Self-installs dependencies when run with bun. Add arbitrary metadata with --metadata-json flag.
#!/usr/bin/env bun
// @bun-dependencies: puppeteer@latest puppeteer-extra@latest puppeteer-extra-plugin-stealth@latest
import puppeteer from 'puppeteer';
async function scrapeXTweet(url, metadata) {
  // Launch browser with stealth options
  const browser = await puppeteer.launch({
    headless: true, // or 'new' for new headless mode
    executablePath: '/usr/bin/google-chrome',
chriscarrollsmith / embed_tweets_with_llm.md
Last active August 12, 2025 13:03
Workflows for using Simon Willison's 'llm' CLI tool to embed your downloadable Twitter archive for semantic search and RAG

Embedding your downloaded Twitter archive data for semantic search and RAG with llm

Getting your tweets

Twitter allows you to download your archive of tweets. You can do this by going to your account settings and requesting your archive. Once you receive the email with the download link, you can download the zip file.

Exploring the data

Top-level keys of twitter_archive.json (via jq 'keys' twitter_archive.json):
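
For readers without jq installed, roughly the same inspection can be sketched in Python. This is a minimal sketch, not part of the original gist: the helper name `top_level_keys` is hypothetical, and it assumes `twitter_archive.json` holds a single JSON object at the top level. Like `jq 'keys'`, it returns the keys in sorted order.

```python
import json
import sys
from pathlib import Path


def top_level_keys(path):
    """Return the sorted top-level keys of the JSON object stored at `path`.

    Hypothetical helper for illustration; mirrors `jq 'keys' <file>`.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # jq's `keys` also works on arrays (it returns indices); this sketch
    # only handles the object case and fails loudly otherwise.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data.keys())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_level_keys.py twitter_archive.json
    for key in top_level_keys(sys.argv[1]):
        print(key)
```

Note that `json.load` reads the whole file into memory, which may be slow for a very large archive; a streaming parser would be the alternative in that case.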

chriscarrollsmith / openai_vector_search_example.py
Last active September 8, 2025 01:02
Quick script for inspecting file_search tool results returned from OpenAI embeddings vector store
import os
import sys
from typing import Any, Dict, List
from openai import OpenAI
def ensure_api_key() -> None:
    if not os.getenv("OPENAI_API_KEY"):
        print("ERROR: Please set OPENAI_API_KEY in your environment.")