@lewtun
lewtun / codeblock.py
Last active January 9, 2022 15:35
Chapter 7 - page 175 - fix code block
for question_type in ["How", "What", "Is"]:
    for question in (
        dfs["train"][dfs["train"].question.str.startswith(question_type)]
        .sample(n=3, random_state=42)['question']):
        print(question)
from datasets import load_dataset

def validate_datasets(reference_dataset, new_dataset):
    """Validate the column names and rows of the new dataset"""
    splits = list(reference_dataset.keys())
    for split in splits:
        ref_dset = reference_dataset[split]
        new_dset = new_dataset[split]
        # Check column names agree
        ref_cols = set(ref_dset.column_names)
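The preview stops once the reference column set is built, so the rest of the function is not visible here. The following is a minimal sketch of how such a check might continue, not the gist's actual code. It assumes that "rows" means the row count per split. `FakeSplit` and `validate_splits` are hypothetical names; `FakeSplit` only mimics the `column_names` and `num_rows` attributes so the example runs without the `datasets` library.

```python
from dataclasses import dataclass


@dataclass
class FakeSplit:
    """Hypothetical stand-in for one dataset split (not part of the gist)."""
    column_names: list
    num_rows: int


def validate_splits(reference_dataset, new_dataset):
    """Return a list of mismatch messages; an empty list means the datasets agree."""
    problems = []
    for split, ref_dset in reference_dataset.items():
        if split not in new_dataset:
            problems.append(f"{split}: missing split")
            continue
        new_dset = new_dataset[split]
        # Compare column names as sets, so column order is ignored
        if set(ref_dset.column_names) != set(new_dset.column_names):
            problems.append(f"{split}: column names differ")
        # Assumption: "rows" refers to the number of rows in each split
        if ref_dset.num_rows != new_dset.num_rows:
            problems.append(f"{split}: row counts differ")
    return problems


reference = {"train": FakeSplit(["question", "context"], 3)}
candidate = {"train": FakeSplit(["context", "question"], 2)}
print(validate_splits(reference, candidate))
```

Returning messages instead of raising on the first mismatch is a design choice made for this sketch; it reports every disagreeing split in one pass.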
@lewtun
lewtun / subjqa-electronics-test.json
Created March 30, 2021 19:30
SubjQA test set for Electronics domain in SQuADv2 format
{"data": [{"title": "B00001WRSJ", "paragraphs": [{"qas": [{"question": "What is the tonal balance of these headphones?", "id": "d0781d13200014aa25860e44da9d5ea7", "answers": [{"text": "I have been a headphone fanatic for thirty years", "answer_start": 0}], "is_impossible": false}], "context": "I have been a headphone fanatic for thirty years and have owned and used a variety of headphones over those years, to include Stax SR-5, Sennheiser HD-424 and HD-580. The Sony MDRV6 excells as the best value of any headphone that I've ever owned. They are especially good at producing natural-sounding deep bass, and the overall octave-to-octave balance is excellent. The sound quality is all in all comparable to other headphones that cost considerably more.The MDRV6 is especially well-suited for travel due to the collapsible design, and for noisy environments or for quiet environments such as a library where the sound emitted by open-back headphones would distract others.The MDRV6 is not quite as comfortable as some ot
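In this layout each answer carries an `answer_start` character offset into its paragraph's `context`. A rough sketch of how that alignment could be checked is shown here. The sample record is a shortened excerpt of the first entry above, with the context cut to its opening clause, and `misaligned_answers` is a hypothetical helper name, not something shipped with the gist.

```python
import json

# Shortened excerpt of the first record above; the context is cut to its opening clause
raw = json.dumps({
    "data": [{
        "title": "B00001WRSJ",
        "paragraphs": [{
            "qas": [{
                "question": "What is the tonal balance of these headphones?",
                "id": "d0781d13200014aa25860e44da9d5ea7",
                "answers": [{
                    "text": "I have been a headphone fanatic for thirty years",
                    "answer_start": 0,
                }],
                "is_impossible": False,
            }],
            "context": ("I have been a headphone fanatic for thirty years and have "
                        "owned and used a variety of headphones over those years"),
        }],
    }]
})


def misaligned_answers(squad):
    """Return ids of questions whose answer text is not found at answer_start."""
    bad_ids = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Questions with no answers contribute nothing to this loop
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    if context[start:end] != answer["text"]:
                        bad_ids.append(qa["id"])
    return bad_ids


print(misaligned_answers(json.loads(raw)))
```

An empty list means every answer span lines up with its context.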
@lewtun
lewtun / datasets-wikiann.ipynb
Last active December 5, 2020 19:02
Snippets to produce dummy data for WikiANN in HF datasets
@lewtun
lewtun / Makefile
Created August 31, 2020 15:24
Anaconda with make
.PHONY: install
#################################################################################
# GLOBALS #
#################################################################################
SHELL=/bin/bash
CONDA_ACTIVATE=source $$(conda info --base)/etc/profile.d/conda.sh ; conda activate ; conda activate
#################################################################################
# COMMANDS #
@lewtun
lewtun / Makefile
Last active August 13, 2020 20:17
Makefile for nbdev with linting and code formatting
# Copyright 2016 drivendata
# Copyright 2019 fast.ai
# Copyright 2020 Lewis Tunstall
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0