Dan Ofer ddofer

@yoavg
yoavg / structured-cot.md
Created November 25, 2024 23:13
Structured-chain-of-thought breaks some basic language-use principles

Are OpenAI training models in a way that encourages security risks?

Today's topic is structured outputs: how to produce them, their interplay with chain-of-thought, and a potential security risk this opens up.

Structured Outputs

When using an LLM programmatically as part of a larger system or process, it is useful to have the model produce outputs in a structured format that is easy to parse programmatically. Formatting the output as a JSON structure makes a lot of sense in this regard, and the commercial LLMs are trained to produce JSON outputs according to your specification. So, for example, instead of asking the model to produce a plain list of 10 items, which may be tricky to parse, I could ask it to return the answer as a JSON list of 10 strings.
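As a rough illustration of the parsing difference, here is a minimal Python sketch; call_llm below is a stand-in that returns a canned reply, not any particular vendor's API:

import json

def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM call; returns a canned JSON reply here."""
    return '["cat", "dog", "fox", "owl", "bee", "ant", "elk", "bat", "cod", "ram"]'

# Asking for a JSON list of 10 strings: the reply parses with a single json.loads call,
# with no need to strip numbering, bullets, or surrounding chatter.
reply = call_llm("Return 10 one-word animal names as a JSON list of 10 strings.")
items = json.loads(reply)
assert isinstance(items, list) and len(items) == 10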

@Hellisotherpeople
Hellisotherpeople / blog.md
Last active March 2, 2025 22:14
You probably don't know how to do Prompt Engineering, let me educate you.

You probably don't know how to do Prompt Engineering

(This post could also be titled "Features missing from most LLM front-ends that should exist")

Apologies for the snarky title, but there has been a huge amount of discussion around so-called "Prompt Engineering" these past few months on all kinds of platforms. Much of it comes from individuals who are peddling an awful lot of "Prompting" and very little "Engineering".

Most of these discussions are little more than users finding that writing more creative and complicated prompts can help them solve a task that a simpler prompt could not. I claim this is not Prompt Engineering. This is not to say that crafting good prompts is easy, but it does not involve any kind of sophisticated modification to the general "template" of a prompt.

Others, who I think do deserve to call themselves "Prompt Engineers" (and an awful lot more than that), have been writing about and utilizing the rich new ecosystem

@jackd
jackd / README.md
Last active August 2, 2022 12:41
tensorflow graphics keras port for PR #155

Get my forked tensorflow graphics repo and switch to the appropriate branch:

git clone https://github.com/jackd/graphics.git
cd graphics
git checkout sparse-feastnet
pip install -e .
cd ..

Get this gist:

@aditya-malte
aditya-malte / smallberta_pretraining.ipynb
Created February 22, 2020 13:41
smallBERTa_Pretraining.ipynb
@dkapitan
dkapitan / describe_robust.py
Last active November 1, 2021 08:54
Monkey-patch for pd.DataFrame.describe() with robust statistics
def describe_robust(self, percentiles=None, include=None, exclude=None, trim=0.2):
"""
Monkey-patch for pd.DataFrame.describe based on robust statistics.
Calculates the trimmed mean and winsorized standard deviation with default trim 0.2.
Uses scipy.stats.mstats (trimmed_mean, winsorize) and numpy.std.
See e.g. http://www.uh.edu/~ttian/ES.pdf for methodological background.
BSD 3-Clause License
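The preview cuts off before the function body; a minimal sketch of how such a monkey-patch could be completed (my assumption, not the gist's actual implementation):

import numpy as np
import pandas as pd
from scipy.stats import mstats

def describe_robust(self, percentiles=None, include=None, exclude=None, trim=0.2):
    """Sketch: describe() extended with a trimmed mean and a winsorized std."""
    desc = self.describe(percentiles=percentiles, include=include, exclude=exclude)
    num = self.select_dtypes(include=np.number).to_numpy()
    desc.loc["trimmed mean"] = mstats.trimmed_mean(num, limits=(trim, trim), axis=0)
    desc.loc["winsorized std"] = np.std(mstats.winsorize(num, limits=(trim, trim), axis=0), axis=0)
    return desc

pd.DataFrame.describe_robust = describe_robust  # monkey-patch onto DataFrame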
@GaelVaroquaux
GaelVaroquaux / impact_encoding.py
Created October 29, 2018 14:19
Target encoding (or impact encoding)
# How to use: df should be the dataframe restricted to the categorical columns to impact-encode,
# and target should be the pd.Series of target values.
# Use fit, transform, etc.
# Three types: binary, multiple, continuous.
# For now m is a param <===== but what should we put here? I guess some function of the total shape,
# i.e. what value of m would we want for 0.5?
import pandas as pd
import numpy as np
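For reference, a minimal sketch of smoothed target encoding with such an m parameter (my own illustration of the idea, not the gist's fit/transform implementation):

import pandas as pd

def target_encode(column: pd.Series, target: pd.Series, m: float = 10.0) -> pd.Series:
    """Replace each category with a smoothed mean of the target.
    m controls how strongly small categories are pulled toward the global mean."""
    global_mean = target.mean()
    stats = target.groupby(column).agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    return column.map(smoothed)

# Toy usage: encode one categorical column against a binary target.
df = pd.DataFrame({"city": ["a", "a", "b", "b", "b"]})
y = pd.Series([1, 0, 1, 1, 0])
df["city_encoded"] = target_encode(df["city"], y, m=2.0)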
@GaelVaroquaux
GaelVaroquaux / deconfound.py
Last active July 18, 2021 12:35
Linear deconfounding in a fit-transform API
"""
A scikit-learn-like transformer to remove a confounding effect on X.
"""
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
import numpy as np
class DeConfounder(BaseEstimator, TransformerMixin):
""" A transformer removing the effect of y on X.
@fomightez
fomightez / useful_FASTA_handling.py
Last active July 3, 2024 21:19
snippets for dealing with FASTA
# This is not presently all-encompassing, as it was started well after my sequence work repo
# at https://github.com/fomightez/sequencework , where much of the related code is.
# For making FASTA files/entries out of dataframes, see 'specific dataframe contents saved as formatted text file example'
# in my useful pandas snippets gist https://gist.github.com/fomightez/ef57387b5d23106fabd4e02dab6819b4
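As a small companion example (the column names 'id' and 'sequence' are my assumption, not taken from the linked snippets), writing FASTA entries straight from a dataframe can look like:

import pandas as pd

def dataframe_to_fasta(df: pd.DataFrame, path: str, line_width: int = 60) -> None:
    """Write each dataframe row as a FASTA entry, wrapping sequences at line_width."""
    with open(path, "w") as handle:
        for _, row in df.iterrows():
            handle.write(f">{row['id']}\n")
            seq = str(row["sequence"])
            for start in range(0, len(seq), line_width):
                handle.write(seq[start:start + line_width] + "\n")

# Toy usage:
df = pd.DataFrame({"id": ["seq1", "seq2"], "sequence": ["ATGCATGCATGC", "GGGCCCAAATTT"]})
dataframe_to_fasta(df, "example.fasta")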
1. Download the model:
if [[ ! -e 'numberbatch-17.06.txt' ]]; then
    wget https://conceptnet.s3.amazonaws.com/downloads/2017/numberbatch/numberbatch-17.06.txt.gz
    gunzip numberbatch-17.06.txt.gz
fi
sudo pip install wordfreq
sudo pip install gensim
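Once downloaded, the Numberbatch vectors are in word2vec text format and can be loaded with gensim; a minimal sketch (in the multilingual file, terms are keyed as '/c/<lang>/<term>'):

from gensim.models import KeyedVectors

# Loading the full multilingual file takes a while and several GB of RAM.
vectors = KeyedVectors.load_word2vec_format("numberbatch-17.06.txt", binary=False)
print(vectors.most_similar("/c/en/cat", topn=5))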