@sswt
sswt / np_dtypes_sq_eucl.py
Last active December 12, 2021 09:47
Numpy dtypes performance on squared euclidean distance
import time
import numpy as np
nl = np.random.randint(0, 1000, 9*10**6).reshape((3000,3000))
def sq_euclidean(X):
    XX = np.sum(X * X, axis=1)[:, np.newaxis]
    Y = X
    YY = XX.T
    distances = np.dot(X, Y.T)
    distances *= -2
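The preview above stops partway through the function. As a minimal sketch of where it is probably heading, the block below assumes the usual expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2. The names `sq_euclidean_full` and `time_dtypes` are hypothetical and not part of the gist, and the timing loop is only one plausible way to compare dtypes.

```python
import time
import numpy as np


def sq_euclidean_full(X):
    """Pairwise squared Euclidean distances between the rows of X.

    Hypothetical completion of the truncated function, assuming the
    expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2.
    """
    XX = np.sum(X * X, axis=1)[:, np.newaxis]  # squared row norms, shape (n, 1)
    distances = np.dot(X, X.T)                 # Gram matrix, shape (n, n)
    distances *= -2
    distances += XX                            # add ||x||^2 along rows
    distances += XX.T                          # add ||y||^2 along columns
    return distances


def time_dtypes(X, dtypes=(np.int32, np.int64, np.float32, np.float64)):
    """Time sq_euclidean_full once per dtype; returns {dtype name: seconds}."""
    timings = {}
    for dtype in dtypes:
        Xd = X.astype(dtype)
        start = time.perf_counter()
        sq_euclidean_full(Xd)
        timings[np.dtype(dtype).name] = time.perf_counter() - start
    return timings
```

Calling `time_dtypes(nl)` on the 3000 x 3000 matrix would give one timing per dtype. Narrow integer dtypes can overflow on inputs of that size, so the integer results are only meaningful as timings.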
sswt / eval_map
Created January 23, 2017 19:59
Outbrain map@12 evaluation
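def eval_map(yhat, dtrain):
    # Row indices ordered by descending prediction (stable sort).
    inds = (-yhat).argsort(kind='mergesort')
    # Two columns: display_id, label.
    X = np.vstack([dtrain.data['display_id'], dtrain.get_label()]).T
    # Group rows by display_id, keeping the descending-prediction order within each group.
    X = X[inds[X[inds, 0].argsort(kind='mergesort')]]
    # Position of each positive row, and position where each display_id group starts.
    y_ind = np.where(X[:, 1] == 1)[0]
    y_pr = np.unique(X[:, 0], return_index=True)[1]
    # Mean reciprocal rank of the positive row within its group.
    return 'map@12_ob', np.mean(1 / (y_ind - y_pr + 1)), True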
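The same ranking logic can be tried without a training dataset object. The sketch below works on plain NumPy arrays and assumes exactly one positive label per group, which is what the subtraction `y_ind - y_pr` above relies on. The name `mean_reciprocal_rank_by_group` and its parameters are hypothetical, not from the gist.

```python
import numpy as np


def mean_reciprocal_rank_by_group(yhat, group_ids, labels):
    """Mean of 1 / rank of the positive row inside each group.

    Assumes every group has exactly one row with label 1.
    """
    yhat = np.asarray(yhat, dtype=float)
    group_ids = np.asarray(group_ids)
    labels = np.asarray(labels)

    # Sort by descending score, then stably by group so that the
    # score order is preserved inside each group.
    by_score = np.argsort(-yhat, kind='mergesort')
    order = by_score[np.argsort(group_ids[by_score], kind='mergesort')]

    sorted_groups = group_ids[order]
    sorted_labels = labels[order]

    positive_pos = np.flatnonzero(sorted_labels == 1)               # one per group
    group_start = np.unique(sorted_groups, return_index=True)[1]    # first row of each group
    ranks = positive_pos - group_start + 1                          # 1-based rank in group
    return float(np.mean(1.0 / ranks))
```

For example, with groups `[1, 1, 1, 2, 2]`, scores `[0.1, 0.9, 0.5, 0.3, 0.7]` and labels `[0, 1, 0, 1, 0]`, the positive row ranks first in group 1 and second in group 2, so the result is (1 + 0.5) / 2 = 0.75.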
sswt / latency.txt
Created May 10, 2020 08:16 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                 14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us     ~1GB/sec SSD
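
The ratio and unit columns in the table are simple arithmetic on the nanosecond values. A small sketch, using only the rows shown above, makes it easy to recompute them. The names `LATENCY_NS`, `ratio`, and `to_us` are hypothetical helpers for illustration.

```python
# Latencies in nanoseconds, copied from the rows listed above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD*": 150_000,
}


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def to_us(ns):
    """Convert nanoseconds to microseconds."""
    return ns / 1_000


if __name__ == "__main__":
    print(ratio("L2 cache reference", "L1 cache reference"))
    print(ratio("Main memory reference", "L1 cache reference"))
    print(to_us(LATENCY_NS["Read 4K randomly from SSD*"]))
```

Dividing the listed values directly, main memory comes out at roughly 14x the L2 figure, so the "20x L2 cache" annotation is best read as approximate.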
# -*- coding: utf-8 -*-
""" Deletes all tweets below a certain retweet threshold.
"""
import tweepy
from datetime import datetime
# Constants
CONSUMER_KEY = ''