Robin rdoume

gd3kr / script.js
Created February 15, 2024 06:30
Download a JSON list of Twitter bookmarks
/*
the Twitter API is stupid. it is stupid and bad and expensive. hence, this.
Literally just paste this into the JS console on the bookmarks tab, and the script will automatically scroll to the bottom of your bookmarks and keep track of them as it goes.
When finished, it downloads a JSON file containing the raw text content of every bookmark.
for now it stores just the text inside the tweet itself, but if you're reading this, why don't you go ahead and try to also store other information (author, tweetLink, pictures, everything). come on. do it. please?
*/
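The script itself is browser JavaScript and isn't shown in this preview. The comment describes some bookkeeping: grab the tweets visible after each scroll, skip ones already recorded, and dump the result as a JSON list at the end. That bookkeeping can be sketched in Python as below. This is an illustration, not the gist's code. The `snapshots` input is a hypothetical stand-in for what the page shows after each scroll step.

```python
import json


def collect_bookmarks(snapshots):
    """Merge successive scroll snapshots into one ordered, de-duplicated list.

    `snapshots` is hypothetical: each inner list holds the tweet texts visible
    after one scroll step. Consecutive snapshots overlap, so texts that were
    already recorded are skipped while first-seen order is preserved.
    """
    seen = set()
    bookmarks = []
    for visible in snapshots:
        for text in visible:
            if text not in seen:
                seen.add(text)
                bookmarks.append(text)
    return bookmarks


def to_json(bookmarks):
    # Same shape as the script's download: a JSON array of raw tweet texts.
    return json.dumps(bookmarks, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    snapshots = [["tweet A", "tweet B"], ["tweet B", "tweet C"], ["tweet C", "tweet D"]]
    print(to_json(collect_bookmarks(snapshots)))
```

Keying on the text means two different bookmarks with identical text collapse into one. Storing a tweet link alongside the text, as the comment suggests, would give a more reliable key.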
veekaybee / normcore-llm.md
Last active November 18, 2024 15:27
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts


Pre-Transformer Models

sayle-doit / bq_job_editions_cost_comparison_with_autoscaler.sql
Last active June 3, 2024 14:46
Compare BigQuery job costs when running a job on either BigQuery Editions with the autoscaler or on-demand with both new and old pricing models.
/*
* This query looks at the past 30 days of job history and estimates what those jobs
* would cost under BigQuery Editions using the newly introduced autoscaler.
* It does this for both the PAYG (Pay As You Go) and commitment models.
* It also compares this against running the same jobs under the on-demand model.
*
* Note that this query models several behaviors of the BigQuery autoscaler:
* a "slot scale up time" of up to 10 seconds, a minimum "slot scale down time"
* of 60 seconds, and scaling up and down in multiples of 100 slots for each job.
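The autoscaler behaviors listed above can be turned into a small per-job cost model. The Python sketch below is a simplification under stated assumptions, not the query's actual math:

* It rounds each job's average slot usage up to the next multiple of 100.
* It bills at least 60 seconds.
* It ignores the up-to-10-second scale-up delay.
* It takes prices as caller-supplied parameters rather than quoting current list prices.

A job's average slots could be derived from `total_slot_ms` in `INFORMATION_SCHEMA.JOBS`, divided by the job's runtime in milliseconds.

```python
import math

SLOT_INCREMENT = 100     # autoscaler scales in multiples of 100 slots (assumed per job)
MIN_BILLED_SECONDS = 60  # minimum 60-second scale-down window


def autoscaled_slot_seconds(avg_slots, duration_s):
    """Slot-seconds billed for one job under this simplified autoscaler model."""
    slots = math.ceil(avg_slots / SLOT_INCREMENT) * SLOT_INCREMENT
    return slots * max(duration_s, MIN_BILLED_SECONDS)


def editions_cost(avg_slots, duration_s, price_per_slot_hour):
    """Editions cost: billed slot-seconds converted to slot-hours times a price."""
    return autoscaled_slot_seconds(avg_slots, duration_s) / 3600 * price_per_slot_hour


def on_demand_cost(bytes_billed, price_per_tib):
    """On-demand cost: bytes billed converted to TiB times a price."""
    return bytes_billed / 2**40 * price_per_tib


if __name__ == "__main__":
    # Placeholder prices for illustration only, not real BigQuery rates.
    slot_hour_price, tib_price = 0.05, 6.0
    print(editions_cost(avg_slots=150, duration_s=30, price_per_slot_hour=slot_hour_price))
    print(on_demand_cost(bytes_billed=50 * 2**30, price_per_tib=tib_price))
```

The rounding and the 60-second floor are why short, small jobs can look expensive under autoscaling. A 30-second job averaging 150 slots is modelled as 200 slots held for a full minute.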
sayle-doit / bq_storage_across_org.sql
Last active June 16, 2023 17:55
Determine BigQuery Storage Costs Across an Organization for Both Compressed (Physical) and Uncompressed (Logical) Storage
/*
* This query runs across an entire organization, looking at tables in every project,
* and shows how their storage costs compare under compressed and uncompressed storage.
*
* Region Notes:
* This query will only read from a single region or multi-region at a time. It's
* currently not possible to read this data from across all regions at once.
*
* By default this reads from the US multi-region, so this might need to be changed if
* your data lives elsewhere.
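The comparison boils down to pricing each table's bytes under both billing models. The sketch below assumes you already have active and long-term byte counts for each model, for example from `INFORMATION_SCHEMA.TABLE_STORAGE`. It also assumes per-GiB-month prices; the prices used in the test and usage are placeholders, not real rates. Whether time-travel and fail-safe bytes must be added to the physical figure depends on which columns you read, so that is left to the caller.

```python
GIB = 2**30


def monthly_storage_cost(active_bytes, long_term_bytes, active_price_gib, long_term_price_gib):
    """Monthly cost for one billing model, given per-GiB-month prices."""
    return (active_bytes / GIB) * active_price_gib + (long_term_bytes / GIB) * long_term_price_gib


def compare_models(logical, physical, logical_prices, physical_prices):
    """Compare logical (uncompressed) and physical (compressed) billing.

    logical, physical: (active_bytes, long_term_bytes) for each model.
    logical_prices, physical_prices: (active, long_term) price per GiB-month.
    Returns (logical_cost, physical_cost, name_of_cheaper_model).
    """
    logical_cost = monthly_storage_cost(*logical, *logical_prices)
    physical_cost = monthly_storage_cost(*physical, *physical_prices)
    cheaper = "physical" if physical_cost < logical_cost else "logical"
    return logical_cost, physical_cost, cheaper


if __name__ == "__main__":
    # Hypothetical table: 500 GiB logical compressing to 80 GiB physical; placeholder prices.
    print(compare_models((500 * GIB, 0), (80 * GIB, 0), (0.02, 0.01), (0.04, 0.02)))
```

With only active storage, physical billing comes out cheaper exactly when the compression ratio (logical bytes over physical bytes) exceeds the ratio of the physical price to the logical price.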

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

erikbern / kaplan_meier_for_revenue.py
Last active October 14, 2023 19:04
Kaplan-Meier for multiple revenue events
from matplotlib import pyplot
import random
import time

pyplot.style.use("ggplot")
now = time.time()


def generate_user(censor=now):
    # Pick some point in time the user was created
    t_created = t = now - random.random() * 1e7
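A Kaplan-Meier-style estimate matters here because of censoring. Recent signups haven't been around long enough to generate late revenue, so a plain average of revenue per user underestimates it. A minimal sketch of one standard way to handle this is the mean cumulative function for recurrent events, a close relative of Kaplan-Meier. It divides the revenue booked at each time by the number of users still under observation at that time. This is not necessarily what the gist does, and the `users` input format is an assumption.

```python
from collections import defaultdict


def mean_cumulative_revenue(users):
    """Estimate expected cumulative revenue per user vs. time since signup.

    users: list of (observed_for, events). observed_for is how long the user
    has been observed (their censoring time); events is a list of
    (time_since_signup, amount) revenue events with time <= observed_for.
    Returns a list of (t, cumulative_revenue_per_user) steps.
    """
    # Total revenue booked at each distinct event time, across all users.
    revenue_at = defaultdict(float)
    for _, events in users:
        for t, amount in events:
            revenue_at[t] += amount

    curve, total = [], 0.0
    for t in sorted(revenue_at):
        # Users still observed at time t form the "at risk" denominator.
        at_risk = sum(1 for observed_for, _ in users if observed_for >= t)
        total += revenue_at[t] / at_risk
        curve.append((t, total))
    return curve


if __name__ == "__main__":
    users = [(10, [(1, 5.0), (5, 5.0)]), (3, [(1, 5.0)])]
    print(mean_cumulative_revenue(users))
```

Consider one user observed for 10 time units who paid 5.0 at times 1 and 5, and another observed for only 3 who paid 5.0 at time 1. The curve is `[(1, 5.0), (5, 10.0)]`. Naively averaging total revenue over both users would give 7.5, because the second user simply hasn't been observed long enough.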
pmbaumgartner / softie.py
Last active July 12, 2024 13:50
Create a soft label classifier from any scikit-learn regressor object
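from sklearn.base import BaseEstimator, ClassifierMixin
from scipy.special import expit, logit
import numpy as np

class SoftLabelClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, regressor, eps=0.001):
        self.regressor = regressor
        self.eps = eps
    def fit(self, X, y=None):
        # The preview is truncated here; a plausible body (assumed, not the gist's
        # code): clip soft labels into [eps, 1 - eps] so logit() stays finite,
        # then fit the regressor on the logit-transformed targets.
        y = np.clip(np.asarray(y, dtype=float), self.eps, 1 - self.eps)
        self.regressor.fit(X, logit(y))
        return self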
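The imports suggest the underlying idea: regress on the logit of the soft label, then squash predictions back into probabilities with `expit`. Below is a complete, self-contained sketch under that assumption; the gist's real `fit`/`predict_proba` may differ. It wraps any scikit-learn regressor. `LinearRegression` in the usage is just an example choice.

```python
import numpy as np
from scipy.special import expit, logit
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LinearRegression


class SoftLabelClassifier(BaseEstimator, ClassifierMixin):
    """Binary classifier trained on soft labels (probabilities in [0, 1]).

    Assumed design: clip targets into [eps, 1 - eps], fit a copy of the
    wrapped regressor on logit(target), and map predictions back with expit.
    """

    def __init__(self, regressor, eps=0.001):
        self.regressor = regressor
        self.eps = eps

    def fit(self, X, y):
        y = np.clip(np.asarray(y, dtype=float), self.eps, 1 - self.eps)
        self.regressor_ = clone(self.regressor).fit(X, logit(y))
        self.classes_ = np.array([0, 1])
        return self

    def predict_proba(self, X):
        p = expit(self.regressor_.predict(X))
        return np.column_stack([1 - p, p])

    def predict(self, X):
        return (self.predict_proba(X)[:, 1] >= 0.5).astype(int)


if __name__ == "__main__":
    X = np.arange(10, dtype=float).reshape(-1, 1)
    y_soft = expit(X.ravel() - 4.5)  # soft labels that are linear in logit space
    clf = SoftLabelClassifier(LinearRegression()).fit(X, y_soft)
    print(clf.predict_proba(X)[:, 1].round(3))
    print(clf.predict(X))
```

Clipping by `eps` keeps `logit` finite when some labels are exactly 0 or 1. Cloning the regressor inside `fit` follows the scikit-learn convention of leaving constructor arguments untouched.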
33eyes / commit_jupyter_notebooks_code_to_git_and_keep_output_locally.md
Last active November 18, 2024 09:51
How to commit Jupyter notebooks to git without output while keeping the notebooks' outputs intact locally

Commit Jupyter notebook code to git and keep output locally

  1. Add a filter to git config by running the following command in bash inside the repo:

     git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

  2. Create a .gitattributes file inside the directory with the notebooks.

  3. Add the following to that file:

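Git runs a clean filter with the file's contents on stdin and stores whatever the filter writes to stdout. That is why the `nbconvert` command above uses `--stdin --stdout`. The Python sketch below does roughly the same job for nbformat-4 notebooks by clearing each code cell's `outputs` and `execution_count`. It is a simplified stand-in, not a reimplementation of `ClearOutputPreprocessor`, which may do more (such as tidying cell metadata). The file name `strip_outputs.py` is hypothetical.

```python
import json
import sys


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell (nbformat 4 layout)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def main() -> None:
    # As a clean filter: read the notebook git is about to store from stdin and
    # write the stripped version to stdout. The working copy keeps its outputs.
    notebook = json.load(sys.stdin)
    json.dump(strip_outputs(notebook), sys.stdout, indent=1, ensure_ascii=False)
    sys.stdout.write("\n")


if __name__ == "__main__":
    main()
```

It could be registered with `git config filter.strip-notebook-output.clean 'python strip_outputs.py'`. The script path is an assumption; adjust it to wherever the file lives. Either filter only applies to files that `.gitattributes` maps to the `strip-notebook-output` filter.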
koaning / main.py
Created November 16, 2019 19:25
keras grid job
import uuid
import json
import random
import keras
import numpy as np
import tensorflow as tf
import click
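Only the imports survive in this preview, so the actual training code isn't shown. The `random`, `uuid` and `json` imports hint that the job samples one configuration from a hyperparameter grid, tags the run with a `uuid`, and records results as JSON. Assuming that, the bookkeeping can be sketched without Keras as below. `GRID` and `train_fn` are hypothetical. In the real job, `train_fn` would presumably build and fit a Keras model and return a validation score.

```python
import json
import random
import uuid

# Hypothetical search space; the gist's actual grid is not shown in this preview.
GRID = {
    "hidden_units": [16, 32, 64],
    "learning_rate": [1e-3, 1e-2],
    "epochs": [5, 10],
}


def sample_config(grid, rng):
    """Pick one value per hyperparameter at random (random search over the grid)."""
    return {name: rng.choice(values) for name, values in grid.items()}


def run_job(train_fn, grid, seed=None):
    """Run one grid job and return a JSON-serialisable record with a unique id."""
    rng = random.Random(seed)
    config = sample_config(grid, rng)
    record = {"run_id": str(uuid.uuid4()), "config": config}
    record["score"] = train_fn(**config)
    return record


if __name__ == "__main__":
    # Hypothetical stand-in for a Keras training run.
    def fake_train(hidden_units, learning_rate, epochs):
        return hidden_units * learning_rate * epochs

    print(json.dumps(run_job(fake_train, GRID)))
```

Because every record carries its own `run_id`, many such jobs can run in parallel and each write its record to a separate file named after that id, with no coordination between them.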