Skip to content

Instantly share code, notes, and snippets.

View veekaybee's full-sized avatar
💫
in the latent space

Vicki Boykis veekaybee

💫
in the latent space
View GitHub Profile
@veekaybee
veekaybee / README.md
Last active December 6, 2024 19:08
whisper.ipynb

Using Whisper to transcribe audio

This episode of Recsperts was transcribed with Whisper from OpenAI, an open-source neural net trained on almost 700 hours of audio. The model includes an encoder-decoder architecture by tokenizing audio into 30-second chunks, normalizing audio samples to the log-Mel scale, and passing the data into an encoder. A decoder is trained to predict the captioned text matching the encoder, and the model includes transcription, as well as timestamp-aligned transcription, and multilingual translation.

Screen Shot 2023-01-29 at 11 09 57 PM

The transcription process outputs a single string file, so it's up to the end-user to parse out individual speakers, or run the model [through a sec

@veekaybee
veekaybee / searchrecs.md
Last active February 3, 2025 14:37
Understanding search and recommendations

How are search and recommendations the same, and how are they different?

TL;DR:

  • The design of both search and recommendations is to find and filter information
  • Search is a "recommendation with a null query"
  • Search is "I want this", recommendations is "you might like this"
@veekaybee
veekaybee / chatgpt.md
Last active June 1, 2025 16:08
Everything I understand about chatgpt

ChatGPT Resources

Context

ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?

I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.

Model Architecture

@veekaybee
veekaybee / pyscript.html
Created November 26, 2022 16:21
Testing out Pyscript
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Some plotting</title>
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
<py-env>

To run: dot -Tpng trie.dot -o trie.png

import com.twitter.scalding._
class WordCountJob(args: Args) extends Job(args) {
val lines = TypedPipe.from(TextLine("posts.txt"))
lines.flatMap { line => tokenize(line) }
.groupBy { word => word }
.size
.groupAll
"com.lihaoyi" %% "os-lib" % "0.7.8"
// Clone my static site repo, loop through posts and get all files as a single file
val wd = os.pwd / "_posts"
val sd = os.Path("/Users/vicki/IdeaProjects/scalding/scalding-repl")
// Concatentates all the files
os.write.over(
wd / "posts.md",
@veekaybee
veekaybee / distance.md
Last active December 30, 2021 15:41
Different Distance Measures

Jaccard Similarity

import numpy as numpy
import typing
 
a = [1,2,3,4,5,11,12]
b = [2,3,4,5,6,8,9]

cats = ["calico", "tabby", "tom"]

Keybase proof

I hereby claim:

  • I am veekaybee on github.
  • I am veekaybee (https://keybase.io/veekaybee) on keybase.
  • I have a public key ASC1BmRUMCaXHMnJ2DzEnxIyypbZqJmYGJIbCxhhrrSZKgo

To claim this, I am signing this object: