Paulo Haddad paulochf

Disabling GNOME Tracker and Other Info

GNOME's tracker is a CPU and privacy hog. There's a pretty good case as to why it's neither useful nor necessary here: http://lduros.net/posts/tracker-sucks-thanks-tracker/

After discovering it consuming two CPU cores, I decided to disable it.

Directories

Applied Functional Programming with Scala - Notes

1. Mastering Functions

A function is a mapping from one set, called the domain, to another set, called the codomain. A function associates every element in the domain with exactly one element in the codomain. In Scala, both the domain and the codomain are types.

val square : Int => Int = x => x * x

Spark Tips & Tricks

Misc. Tips & Tricks

If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8 relative to 8-byte (64-bit) integers.
Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually produces a DataFrame with a [much] larger number of rows, yet the number of partitions remains the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
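The ~128MB guideline above reduces to simple arithmetic. The following is a minimal sketch, assuming the total size of the DataFrame is already known or estimated; `partitions_for` and `TARGET_PARTITION_BYTES` are hypothetical names chosen for illustration, not part of Spark's API.

```python
import math

# ~128MB, the empirical per-partition target mentioned in the tips above.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def partitions_for(total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Return a partition count that keeps each partition at or below target_bytes.

    Rounding up errs on the side of more partitions.
    """
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_bytes))


# Example: an intermediate result estimated at 10 GiB after a flatMap-style expansion.
n = partitions_for(10 * 1024**3)
print(n)  # 80
```

The resulting count could then be passed to `DataFrame.repartition` before the memory-heavy operation runs.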

	# As of Ubuntu 16.04, systemd is the default init system.
	# It is simpler than https://gist.github.com/Doowon/38910829898a6624ce4ed554f082c4dd

	[Unit]
	Description=Jupyter Notebook

	[Service]
	Type=simple
	PIDFile=/run/jupyter.pid
	ExecStart=/home/phil/Enthought/Canopy_64bit/User/bin/jupyter-notebook --config=/home/phil/.jupyter/jupyter_notebook_config.py

	class VotingClassifier(object):
	    """Stripped-down version of VotingClassifier that uses prefit estimators"""
	    def __init__(self, estimators, voting='hard', weights=None):
	        self.estimators = [e[1] for e in estimators]
	        self.named_estimators = dict(estimators)
	        self.voting = voting
	        self.weights = weights

	    def fit(self, X, y, sample_weight=None):
	        raise NotImplementedError
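The preview above stops before any prediction logic, so the following is only a sketch of what hard voting over prefit estimators generally means: a (optionally weighted) majority vote over the labels each estimator predicts. It is not the author's implementation; `hard_vote` is a hypothetical standalone helper, and it uses only the standard library rather than NumPy so that it runs anywhere.

```python
from collections import defaultdict


def hard_vote(predictions, weights=None):
    """Weighted majority vote over per-estimator label predictions.

    predictions: list of lists, one inner list of labels per estimator,
                 all of the same length (one label per sample).
    weights:     optional list with one weight per estimator.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    voted = []
    for sample_labels in zip(*predictions):  # labels for one sample, across estimators
        totals = defaultdict(float)
        for label, weight in zip(sample_labels, weights):
            totals[label] += weight
        # Highest total wins; ties are broken in favour of the smallest label.
        voted.append(min(totals, key=lambda label: (-totals[label], label)))
    return voted


# Three hypothetical prefit estimators predicting labels for four samples.
preds = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
print(hard_vote(preds))                     # [0, 1, 0, 0]
print(hard_vote(preds, weights=[1, 1, 3]))  # [1, 1, 0, 1]
```

In the class above, the per-estimator predictions would presumably come from calling `predict` on each entry of `self.estimators`, with `self.weights` supplying the weights.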

	import numpy as np

	class Resample(object):
	    def __init__(self, cv, method='under'):
	        self.cv = cv
	        self.method = method

	    def split(self, X, y, **kwargs):
	        for train_idx, test_idx in self.cv.split(X, y, **kwargs):
	            counts = np.bincount(y[train_idx]) # assumes y are from {0, 1..., n_classes-1}
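The preview is cut off before the resampling step. As an illustration of the general idea behind `method='under'`, here is a minimal sketch of random under-sampling applied to training indices, assuming every class is reduced to the size of the smallest one. `undersample_indices` and its `seed` parameter are hypothetical, and the sketch uses the standard library instead of NumPy; it is not a reconstruction of the missing code.

```python
import random
from collections import defaultdict


def undersample_indices(train_idx, y, seed=0):
    """Return a subset of train_idx with every class reduced to the minority-class count."""
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[y[i]].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))  # sample without replacement
    return sorted(kept)


# Four samples of class 0 and two of class 1: class 0 is cut down to two samples.
y = [0, 0, 0, 0, 1, 1]
balanced = undersample_indices(range(len(y)), y)
print([y[i] for i in balanced])  # [0, 0, 1, 1]
```

Only the training indices are resampled here; the test indices of each fold would be left untouched so that evaluation still reflects the original class balance.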