@jakevdp
jakevdp / CategoricalCMAP.ipynb
Last active February 14, 2024 18:00
Example of a categorical color map in matplotlib
@jacquerie
jacquerie / grobid.py
Created October 30, 2015 15:46
Python-driven GROBID retraining
#!/usr/bin/env python
# -*- coding: utf8 -*-
import os
import grobid_core
import grobid_trainer
if __name__ == '__main__':
@Tushar-N
Tushar-N / pad_packed_demo.py
Last active October 27, 2024 15:17
How to use pad_packed_sequence in pytorch<1.1.0
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
@rdapaz
rdapaz / win32com.client.py
Last active November 19, 2024 06:07
Fix for module win32com.gen_py has no attribute 'CLSIDToPackageMap'
# If errors are found, do this
# clear contents of C:\Users\<username>\AppData\Local\Temp\gen_py
# that should fix it, to test it type
import win32com.client
app = win32com.client.gencache.EnsureDispatch('Word.Application')
app.Visible = True
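The manual step in the comments above (emptying the gen_py folder) could also be scripted. The following is only a minimal sketch, assuming the cache sits at `<temp dir>/gen_py` as in the path quoted above; `clear_gen_py_cache` is a hypothetical helper name and is not part of pywin32.

```python
import shutil
import tempfile
from pathlib import Path


def clear_gen_py_cache(cache_dir=None):
    """Delete everything inside the gen_py cache folder and return the removed names.

    Assumption: when no folder is given, the cache is <temp dir>/gen_py,
    matching the path mentioned in the gist comments.
    """
    cache = Path(cache_dir) if cache_dir else Path(tempfile.gettempdir()) / "gen_py"
    removed = []
    if not cache.is_dir():
        # Nothing to clear: the folder does not exist.
        return removed
    for entry in cache.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # generated wrapper packages are sub-folders
        else:
            entry.unlink()        # plain files (or symlinks) in the cache
        removed.append(entry.name)
    return sorted(removed)
```

Calling `clear_gen_py_cache()` with no argument targets the default temp location; passing an explicit folder is safer if your cache lives elsewhere. After clearing, the `EnsureDispatch` call above should regenerate the wrappers.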
using Microsoft.Office.Interop.Word;
using System.IO;
namespace MSWordExample
{
public class LineNumberingKiller
{
static void Main(string[] args)
{
Application word = new Application();
@W4ngatang
W4ngatang / download_glue_data.py
Last active May 4, 2025 12:17
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
@aplz
aplz / fasttext_cv.py
Created September 5, 2018 16:15
sklearn cross-validation for fasttext
import argparse
import os
import fasttext
from sklearn.base import BaseEstimator
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, StratifiedKFold
def read_data(data_dir):
@eviltester
eviltester / gist:11093f0e4c501a41990e227393184eda
Last active February 14, 2025 09:30
uncheck twitter interests
var timer = 100;
document.querySelectorAll("div > input[type='checkbox']:checked").forEach((interest) => {
  setTimeout(function () { interest.click(); }, timer);
  timer += 2000;
});
@asamofal
asamofal / update_bions_lenovo_x1_g6.md
Created April 18, 2020 21:50
Update BIOS on ThinkPad X1 Carbon Gen 6th (Ubuntu 18.04, Legacy mode)

Update BIOS on ThinkPad X1 Carbon Gen 6th (Ubuntu 18.04)

Laptop (ThinkPad) Lenovo X1 Carbon 6th Gen (Type 20KH, 20KG)

If you are using your ThinkPad X1G6 with Linux in "Legacy only" mode, there is only one way to update the BIOS: use the "BIOS Update (Bootable CD)". This is a step-by-step guide to doing that.

Check current BIOS version:

fwupdmgr get-devices

@yoavg
yoavg / LLMs.md
Last active February 6, 2025 02:39

Some remarks on Large Language Models

Yoav Goldberg, January 2023

Audience: I assume you have heard of chatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective of my thoughts on this (and similar) models, and where we stand with respect to language understanding.

Intro

Around 2014-2017, right within the rise of neural-network based methods for NLP, I was giving a semi-academic-semi-popsci lecture, revolving around the story that achieving perfect language modeling is equivalent to being as intelligent as a human. Somewhere around the same time I was also asked in an academic panel "what would you do if you were given infinite compute and no need to worry about labour costs" to which I cockily responded "I would train a really huge language model, just to show that it doesn't solve everything!". We