Flavio brdsio

# http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F
# http://wiki.apache.org/solr/UpdateXmlMessages#Updating_a_Data_Record_via_curl

curl "http://index.websolr.com/solr/a0b1c2d3/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
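To run the same delete-all from a Python script instead of curl, here is a minimal sketch using only the standard library's `urllib.request`. It builds an equivalent POST with the same XML body, header and `commit=true` parameter. The core URL `http://localhost:8983/solr/mycore` is a hypothetical placeholder, not the Websolr endpoint above.

```python
from urllib.request import Request

# Same payload as the curl command: delete every document matching *:*
DELETE_ALL_XML = b"<delete><query>*:*</query></delete>"


def build_delete_all_request(core_url: str) -> Request:
    """Build (but do not send) a POST to <core_url>/update?commit=true.

    core_url is an assumption, e.g. the hypothetical local core
    "http://localhost:8983/solr/mycore".
    """
    return Request(
        core_url.rstrip("/") + "/update?commit=true",
        data=DELETE_ALL_XML,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_delete_all_request("http://localhost:8983/solr/mycore")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it and wipe the core
```

The request is only sent when you pass it to `urllib.request.urlopen`. Keeping construction separate makes it easy to inspect before doing anything destructive.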

I'm amused at the traction this little gist is getting on Google! I would be remiss not to point out that six+ years later I'm still helping thousands of companies on a daily basis with their search index management, by providing managed Solr as a service over at Websolr, and hosted Elasticsearch at Bonsai. Check us out if you'd like an expert helping hand at Solr and Elasticsearch hosting, ops and support!

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs
Read 1 MB sequentially from memory ..... 250,000 ns  = 250 µs
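The table is most useful for relative comparisons. This small sketch copies the figures into a dict and derives a few ratios. It also sanity-checks two rows from first principles, assuming 1 Gbps = 1 bit/ns and 1 MB = 10^6 bytes.

```python
# Latencies in nanoseconds, copied from the table above
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "SSD random read": 150_000,
    "Read 1 MB sequentially from memory": 250_000,
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def transfer_ns(num_bytes: int, gbps: float) -> float:
    """Raw serialization time on the wire, ignoring protocol overhead."""
    return num_bytes * 8 / gbps  # 1 Gbit/s == 1 bit/ns, so bits / Gbps == ns


def bandwidth_gb_per_s(num_bytes: int, ns: float) -> float:
    """Throughput implied by moving num_bytes in ns nanoseconds."""
    return num_bytes / ns  # 1 byte/ns == 1 GB/s


if __name__ == "__main__":
    print(ratio("Main memory reference", "L1 cache reference"))   # RAM vs L1
    print(ratio("SSD random read", "Main memory reference"))      # SSD vs RAM
    print(transfer_ns(2048, 1))                                   # 2K bytes at 1 Gbps
    print(bandwidth_gb_per_s(1_000_000, LATENCY_NS["Read 1 MB sequentially from memory"]))
```

The 2K-bytes-at-1-Gbps row works out to about 16 µs of pure serialization. The table's 20 µs is best read as a rounded, order-of-magnitude figure.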

Useful Pandas Snippets

A personal diary of DataFrame munging over the years.

Data Types and Conversion

Convert Series datatype to numeric (will error if column has non-numeric values)
(h/t @makmanalp)
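The conversion described above is `pd.to_numeric`. This is a minimal sketch on made-up Series values. By default (`errors="raise"`) any unparseable value raises `ValueError`, while `errors="coerce"` turns such values into `NaN` instead.

```python
import pandas as pd

s = pd.Series(["1", "2", "3.5"])

# Default errors="raise": every value must parse, otherwise ValueError
numeric = pd.to_numeric(s)

# errors="coerce" replaces unparseable values with NaN instead of raising
mixed = pd.Series(["10", "n/a", "30"])
coerced = pd.to_numeric(mixed, errors="coerce")

try:
    pd.to_numeric(mixed)
except ValueError as exc:
    print("strict conversion failed:", exc)
```

On a DataFrame column the same call applies: `df["col"] = pd.to_numeric(df["col"])`.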

	import nltk

	text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
	computer or the gears of a cycle transmission as he does at the top of a mountain
	or in the petals of a flower. To think otherwise is to demean the Buddha...which is
	to demean oneself."""

	# Used when tokenizing words
	sentence_re = r'''(?x) # set flag to allow verbose regexps
	[A-Z](?:\.[A-Z])+\.? # abbreviations, e.g. U.S.A. (non-capturing groups so tokenizers return whole matches)
	'''
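The preview above stops after the first alternative of the verbose regex. Below is a self-contained sketch in the same spirit, using only `re` so it runs without NLTK. The extra alternatives (hyphenated words, ellipsis, punctuation) are assumptions for illustration, not the original pattern. Non-capturing `(?:...)` groups matter here: with capturing groups, `findall` returns group tuples instead of whole tokens. A pattern like this should also be usable with `nltk.regexp_tokenize(text, pattern)`.

```python
import re
from typing import List

# Verbose pattern; the alternatives after the first are illustrative assumptions
TOKEN_RE = re.compile(r"""(?x)
      [A-Z](?:\.[A-Z])+\.?    # abbreviations, e.g. U.S.A.
    | \w+(?:-\w+)*            # words with optional internal hyphens
    | \.\.\.                  # ellipsis
    | [.,;?():]               # other punctuation as separate tokens
""")


def tokenize(text: str) -> List[str]:
    """Return every non-overlapping token match, left to right."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("To think otherwise is to demean the Buddha...which is to demean oneself."))
```

Alternation order matters: the abbreviation branch is tried first, so "U.S.A." stays one token instead of splitting into letters and periods.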

	##
	# Copyright (C) 2011 - present by OpenGamma Inc. and the OpenGamma group of companies
	#
	# Please see distribution for license.
	##

	# Loads the time-series for yield curve data points to construct a 3D "curve over time" graph.

	# Curve tickers
	tickers <- c ("US00O/N Index", "US0001W Index", "US0002W Index", "US0001M Index", "US0002M Index", "US0003M Index", "USSW2 Curncy", "USSW3 Curncy", "USSW4 Curncy", "USSW5 Curncy", "USSW6 Curncy", "USSW7 Curncy", "USSW8 Curncy", "USSW9 Curncy", "USSW10 Curncy", "USSW15 Curncy", "USSW20 Curncy", "USSW25 Curncy", "USSW30 Curncy")
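To plot these points as a curve, each ticker needs a maturity on the x-axis. The sketch below maps the tickers listed above to tenors in years. It rests on several assumptions: the `US00..W/M` tickers are deposit rates for that many weeks or months, `USSWn` is an n-year swap rate, overnight counts as 1/365 of a year, and a week as 1/52. The helper `tenor_in_years` is hypothetical.

```python
import re

TICKERS = [
    "US00O/N Index", "US0001W Index", "US0002W Index", "US0001M Index",
    "US0002M Index", "US0003M Index", "USSW2 Curncy", "USSW3 Curncy",
    "USSW4 Curncy", "USSW5 Curncy", "USSW6 Curncy", "USSW7 Curncy",
    "USSW8 Curncy", "USSW9 Curncy", "USSW10 Curncy", "USSW15 Curncy",
    "USSW20 Curncy", "USSW25 Curncy", "USSW30 Curncy",
]


def tenor_in_years(ticker: str) -> float:
    """Hypothetical helper: maturity in years under the conventions assumed above."""
    symbol = ticker.split()[0]
    if symbol == "US00O/N":
        return 1 / 365  # overnight, assumed to be one day
    m = re.fullmatch(r"US00(\d\d)([WM])", symbol)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        return n / 52 if unit == "W" else n / 12
    m = re.fullmatch(r"USSW(\d+)", symbol)
    if m:
        return float(m.group(1))  # n-year swap
    raise ValueError(f"unrecognised ticker: {ticker}")


if __name__ == "__main__":
    for t in TICKERS:
        print(f"{t:16s} {tenor_in_years(t):8.4f}")
```

Because the list is already in ascending maturity order, the derived tenors come out sorted. They can serve directly as one axis of the "curve over time" surface, with date as the second axis and rate as the third.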

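	// Wraps an autocomplete source(request, response) so each term is fetched only once
	function cached_completer(completer) {
		// Prototype-free object, so terms like "constructor" are not mistaken for cache hits
		var cache = Object.create(null);
		return function(request, response) {
			if (request.term in cache) {
				response(cache[request.term]);
			} else {
				completer(request, function(resp) {
					cache[request.term] = resp;
					response(resp);
				});
			}
		};
	}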
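The same callback-memoization pattern, sketched in Python for comparison. It assumes the completer takes a plain term string and a `respond` callback, a simplification of the `request.term` object above. A Python dict has no inherited keys, so no prototype workaround is needed.

```python
from typing import Callable, Dict, List

Respond = Callable[[List[str]], None]
Completer = Callable[[str, Respond], None]


def cached_completer(completer: Completer) -> Completer:
    """Wrap `completer` so results for each term are fetched once, then served from cache."""
    cache: Dict[str, List[str]] = {}

    def wrapped(term: str, respond: Respond) -> None:
        if term in cache:
            respond(cache[term])
        else:
            def store_and_respond(resp: List[str]) -> None:
                cache[term] = resp  # remember the result before forwarding it
                respond(resp)
            completer(term, store_and_respond)

    return wrapped


if __name__ == "__main__":
    def slow_lookup(term: str, respond: Respond) -> None:
        print("fetching", term)
        respond([term + "ple", term + "ricot"])

    complete = cached_completer(slow_lookup)
    complete("ap", print)  # fetches
    complete("ap", print)  # served from cache
```

Like the JavaScript version, the cache only fills when the completer calls back, so a lookup that never responds is simply retried next time.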

	This is free and unencumbered software released into the public domain.

	Anyone is free to copy, modify, publish, use, compile, sell, or
	distribute this software, either in source code form or as a compiled
	binary, for any purpose, commercial or non-commercial, and by any
	means.

	In jurisdictions that recognize copyright laws, the author or authors
	of this software dedicate any and all copyright interest in the
	software to the public domain. We make this dedication for the benefit
	of the public at large and to the detriment of our heirs and successors.

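	def next_digit(value, base):
		# Appends one check digit: weighted digit sum mod 11, with 10 mapped to 0
		return value + str(sum(int(a)*b for a,b in zip(value, base))%11%10)

	def make_valid(value, ap2, base):
		# Appends both check digits; the second pass prepends ap2 to the weights
		return next_digit(next_digit(value, base), ap2+base)

	def is_valid_cpf(cpf):
		# cpf: string of 11 digits, no punctuation
		return make_valid(cpf[:9], [0], [1,2,3,4,5,6,7,8,9]) == cpf

	def is_valid_cnpj(cnpj):
		# cnpj: string of 14 digits, no punctuation
		return make_valid(cnpj[:12], [5], [6,7,8,9,2,3,4,5,6,7,8,9]) == cnpj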
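The `%11%10` trick with ascending weights is not obviously the usual CPF rule, which uses descending weights and "11 minus remainder, or 0 if the remainder is below 2". The two agree because each pair of weights sums to 11, so one weighted sum is the negative of the other modulo 11. This is a self-contained sketch that checks the equivalence. The function names are hypothetical, and `529.982.247-25` is used as a sample valid CPF.

```python
def textbook_digit(digits: str) -> int:
    """Usual rule: weights 10..2 (or 11..2 for 10 digits); 0 if remainder < 2, else 11 - remainder."""
    weights = range(len(digits) + 1, 1, -1)
    r = sum(int(d) * w for d, w in zip(digits, weights)) % 11
    return 0 if r < 2 else 11 - r


def compact_digit(digits: str) -> int:
    """The snippet's rule: weights 1..9 (or 0..9 for 10 digits), then % 11 % 10."""
    weights = range(10 - len(digits), 10)
    return sum(int(d) * w for d, w in zip(digits, weights)) % 11 % 10


def cpf_check_digits(first9: str) -> str:
    """Compute both CPF check digits for a 9-digit base."""
    d = first9
    for _ in range(2):
        d += str(textbook_digit(d))
    return d[9:]


if __name__ == "__main__":
    print(cpf_check_digits("529982247"))
```

Textbook weight (10 − i) and compact weight (i + 1) sum to 11, and likewise (11 − i) and i for the second digit. So the compact sum equals minus the textbook sum modulo 11, which is exactly "11 minus remainder". The final `% 10` covers the remainder-1 case, where 11 − 1 = 10 must become 0.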

	'''From Coding Train
	https://youtu.be/BAejnwN4Ccw
	3/2/2017
	Added Genetic Algorithm
	4/27/2017
	'''

	import random

	cities = []
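The preview stops before the genetic algorithm itself. Below is a minimal, self-contained sketch of a GA for the travelling salesperson problem, loosely following the steps described in the Coding Train series: fitness-proportional selection, order crossover and neighbour-swap mutation. The fitness exponent, population size, mutation rate and seed are assumptions chosen for illustration, not values from the original.

```python
import math
import random


def route_length(route, cities):
    """Length of the closed tour that visits cities in `route` order."""
    return sum(
        math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def order_crossover(a, b, rng):
    """Copy a random slice of parent a, then append b's remaining cities in b's order."""
    start = rng.randrange(len(a))
    end = rng.randrange(start, len(a)) + 1
    child = a[start:end]
    child += [city for city in b if city not in child]
    return child


def mutate(route, rate, rng):
    """For each position, with probability `rate`, swap a random city with its neighbour."""
    for _ in range(len(route)):
        if rng.random() < rate:
            i = rng.randrange(len(route))
            j = (i + 1) % len(route)
            route[i], route[j] = route[j], route[i]


def evolve(cities, pop_size=100, generations=200, mutation_rate=0.02, seed=0):
    """Return the shortest route seen in any generation."""
    rng = random.Random(seed)
    n = len(cities)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = min(population, key=lambda r: route_length(r, cities))
    for _ in range(generations):
        # Shorter routes get higher fitness; the exponent 4 is an assumed sharpening factor
        fitness = [1 / (route_length(r, cities) ** 4 + 1) for r in population]
        parents = rng.choices(population, weights=fitness, k=2 * pop_size)
        population = []
        for a, b in zip(parents[::2], parents[1::2]):
            child = order_crossover(a, b, rng)
            mutate(child, mutation_rate, rng)
            population.append(child)
        candidate = min(population, key=lambda r: route_length(r, cities))
        if route_length(candidate, cities) < route_length(best, cities):
            best = candidate
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10)]
    tour = evolve(pts, generations=100)
    print(tour, round(route_length(tour, pts), 2))
```

Order crossover is used because a naive cut-and-splice would duplicate or drop cities. Copying a slice and then filling in the other parent's order always yields a valid permutation.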

	import pandas as pd
	import pandas_datareader.data as web
	import numpy as np
	import datetime
	from scipy.optimize import minimize
	TOLERANCE = 1e-10


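	def _allocation_risk(weights, covariances):
		# Portfolio risk (volatility) of a 1-D weight vector: sqrt(w' * Sigma * w)
		return np.sqrt(weights @ covariances @ weights)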
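A risk-parity allocation targets equal risk contributions from each asset. The truncated snippet appears to solve for these with `scipy.optimize.minimize` and `TOLERANCE`, and that optimizer step is not reproduced here. This NumPy-only sketch shows the quantities such an optimizer would work with. The covariance values in the example are made up.

```python
import numpy as np


def portfolio_risk(weights, cov):
    """Portfolio volatility sqrt(w' Σ w) for a 1-D weight vector."""
    return float(np.sqrt(weights @ cov @ weights))


def risk_contributions(weights, cov):
    """Each asset's share of risk, w_i * (Σ w)_i / σ_p; the shares sum to σ_p."""
    return weights * (cov @ weights) / portfolio_risk(weights, cov)


def inverse_vol_weights(cov):
    """Weights proportional to 1/σ_i; exact risk parity only when cov is diagonal."""
    inv = 1 / np.sqrt(np.diag(cov))
    return inv / inv.sum()


if __name__ == "__main__":
    vols = np.array([0.1, 0.2, 0.4])  # made-up volatilities
    cov = np.diag(vols ** 2)          # uncorrelated assets
    w = inverse_vol_weights(cov)
    print(w, risk_contributions(w, cov))
```

With uncorrelated assets, inverse-volatility weights already equalize the contributions. With correlations they generally do not, which is where a numerical minimizer over the contribution gaps comes in.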