Matt Hall kwinkunks

Answering this question on Cross Validated.

Regressor comparison

I wrote a blog post about this here.

I tweeted about it here and again.

Inspired, of course, by the various wonderful comparisons in the sklearn docs, like this one for classifiers.

	def has_illegal_chars(string: str, illegal: str = ',;"!+=') -> bool:
	    """
	    Detect the presence of illegal characters in a string.
	    By default, illegal characters are: `,;"!+=`

	    Args:
	        string: A string of text of any length.
	        illegal: A sequence of characters that are not allowed.

	    Returns:
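
The preview above is cut off before the return description and the function body. As a minimal sketch only, here is one way a check with this signature could be written. The body is an assumption based on the docstring, not the author's implementation.

```python
def has_illegal_chars(string: str, illegal: str = ',;"!+=') -> bool:
    """
    Detect the presence of illegal characters in a string.

    Assumed behaviour (not shown in the preview): return True if at
    least one character of `string` also appears in `illegal`.
    """
    # any() short-circuits on the first illegal character found.
    return any(char in illegal for char in string)


print(has_illegal_chars("well_name"))    # no illegal characters
print(has_illegal_chars("depth,value"))  # contains a comma
```

Because `illegal` is an ordinary string, a caller can pass a different set of characters, for example `has_illegal_chars(name, illegal='/\\')`.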

	"""
	Given a 4D array of shape (n, h, w, c) representing n images of shape (h, w, c),
	make a single image consisting of a regular grid of smaller images.

	License: MIT No Attribution
	"""
	import numpy as np

	def reshape(arr, rows, cols, pixels=False):
	    """Reshapes a 4D array into a grid of images.
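
The preview stops at the first line of the docstring. As a hedged sketch of the idea described in the module docstring, the tiling can be done with a reshape, a transpose, and another reshape. The name `make_grid` is hypothetical, the `pixels` option is left out because the preview does not show what it does, and the sketch assumes `n == rows * cols`.

```python
import numpy as np


def make_grid(arr, rows, cols):
    """Tile an (n, h, w, c) array into one (rows*h, cols*w, c) image.

    Hypothetical helper; assumes n == rows * cols and that images
    fill the grid row by row.
    """
    n, h, w, c = arr.shape
    if n != rows * cols:
        raise ValueError("rows * cols must equal the number of images")
    # Split the image axis into (rows, cols), then move each image's
    # height axis next to its grid row and its width axis next to its
    # grid column before merging them.
    grid = arr.reshape(rows, cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * h, cols * w, c)


# Six tiny single-channel "images", each filled with its own index.
images = np.stack([np.full((2, 3, 1), i) for i in range(6)])
grid = make_grid(images, rows=2, cols=3)
print(grid.shape)
print(grid[..., 0])
```

The transpose is the important step: reshaping directly from `(n, h, w, c)` to `(rows*h, cols*w, c)` would interleave pixels from different images.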

	# Properties of the scaled standard deviational hyperellipsoid.
	#
	# Author: Matt Hall
	# Copyright: 2022, Matt Hall
	# Licence: Apache 2.0, https://www.apache.org/licenses/LICENSE-2.0
	#
	# These small functions implement n-dimensional lookup of the beta-distribution
	# approximation to this problem. They answer the questions, "What proportion
	# of a multivariate Gaussian distribution is contained by `r` standard
	# deviations?" and "How many standard deviations contain a proportion `p` of
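
The preview is truncated here, and the beta-distribution approximation itself is not shown. As a stand-in sketch, the same two questions can be answered for a Gaussian with known covariance using the chi-squared distribution, because the squared Mahalanobis distance of an n-dimensional Gaussian follows a chi-squared distribution with n degrees of freedom. This is a different technique from the author's, and the function names `proportion_within` and `radius_for` are hypothetical.

```python
import math


def proportion_within(r, n):
    """Proportion of an n-D Gaussian within r standard deviations.

    Uses the chi-squared CDF at r**2 with n degrees of freedom,
    built up with the recurrence
    F(x; k + 2) = F(x; k) - (x/2)**(k/2) * exp(-x/2) / gamma(k/2 + 1).
    """
    x = r * r / 2
    if n % 2:
        p, k = math.erf(r / math.sqrt(2)), 1  # 1-D case
    else:
        p, k = 1 - math.exp(-x), 2            # 2-D case
    while k < n:
        p -= x ** (k / 2) * math.exp(-x) / math.gamma(k / 2 + 1)
        k += 2
    return p


def radius_for(p, n, tol=1e-10):
    """Number of standard deviations containing proportion p (0 <= p < 1)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    lo, hi = 0.0, 1.0
    while proportion_within(hi, n) < p:  # expand until the target is bracketed
        hi *= 2
    while hi - lo > tol:                 # plain bisection
        mid = (lo + hi) / 2
        if proportion_within(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (1, 2, 3):
    print(n, round(proportion_within(1, n), 4), round(radius_for(0.95, n), 4))
```

The loop shows the familiar effect that one standard deviation contains about 68% of a 1-D Gaussian but progressively less as the dimension grows.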

	# We want the unique items in a sequence, kept in the order in which they first appear.
	# There are quite a few solutions here: http://www.peterbe.com/plog/uniqifiers-benchmark
	# Good, up-to-date summary of methods: https://stackoverflow.com/a/17016257/3381305

	import numpy as np
	import pandas as pd

	# Some test data: text...
	tdat = 100 * ['c', 'a', 'c', 'b', 'e', 'd', 'f', 'g', 'h', 'i', 'j', 'j']
	tarr = np.array(tdat)
	tser = pd.Series(tdat)
	# ... and numbers.
	narr = np.random.randint(0, 10, size=1200)
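
The preview ends before any of the methods are shown. As a hedged sketch, here are two common ways to get order-preserving unique items; these are not necessarily the ones the author benchmarks. The function names are hypothetical. The first relies on dictionaries keeping insertion order (guaranteed since Python 3.7), the second on `np.unique(..., return_index=True)`, which reports the index of each value's first occurrence.

```python
import numpy as np


def ordered_unique(seq):
    """Unique items of any sequence of hashables, in first-seen order."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(seq))


def ordered_unique_np(arr):
    """Unique items of a 1-D NumPy array, in first-seen order."""
    # np.unique sorts the values, so recover the original order by
    # sorting the first-occurrence indices instead.
    _, first_idx = np.unique(arr, return_index=True)
    return arr[np.sort(first_idx)]


letters = 100 * ['c', 'a', 'c', 'b', 'e', 'd', 'f', 'g', 'h', 'i', 'j', 'j']
print(ordered_unique(letters))
# → ['c', 'a', 'b', 'e', 'd', 'f', 'g', 'h', 'i', 'j']
print(ordered_unique_np(np.array([3, 1, 3, 2, 1])))
# → [3 1 2]
```

For a pandas Series such as `tser`, the built-in `Series.unique()` also returns values in order of appearance.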