Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112
For older versions of Spark and IPython, please see the previous version of this text.
from sre_parse import Pattern, SubPattern, parse as sre_parse
from sre_compile import compile as sre_compile
from sre_constants import BRANCH, SUBPATTERN

class Scanner(object):
    # Regex-based tokenizer built on the sre internals.
    # NOTE: everything after pat = Pattern() is an assumed completion that follows
    # the standard stdlib re.Scanner recipe; the original snippet is truncated there.
    def __init__(self, tokens, flags=0):
        subpatterns = []
        pat = Pattern()
        pat.flags = flags
        for phrase, action in tokens:  # each (pattern, action) pair gets its own group
            subpatterns.append(SubPattern(pat, [(SUBPATTERN, (len(subpatterns) + 1, sre_parse(phrase, flags)))]))
        pat.groups = len(subpatterns) + 1
        self.scanner = sre_compile(SubPattern(pat, [(BRANCH, (None, subpatterns))]))
The point of this is to use cheap machines with small/slow storage to coordinate client requests while dedicating the machines with the big and fast storage to doing what they do best. I found that request coordination was contributing to about half the CPU usage on our Cassandra nodes, on average. Solid state storage is quite expensive, nearly doubling the cost of typical hardware. It also means that if people have control over hardware placement within the network, they can place proxy nodes closer to the client without impacting their storage footprint or fault tolerance characteristics.
This is accomplished in Cassandra by passing the -Dcassandra.join_ring=false option when the process is started. These nodes will connect to the seeds, cache the gossip data, load the schema, and begin listening for client requests. Messages like "/x.x.x.x is now UP!" will appear on the other nodes.
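For example (a minimal sketch, assuming the stock startup script and cassandra-env.sh; adjust paths for your install), the option can be passed on the command line or set permanently in the environment file:

cassandra -Dcassandra.join_ring=false
# or, equivalently, add this line to cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"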
There are also some more practical benefits to this. Handling client requests caused us to push the NewSize of the heap up
Short version: I strongly recommend against using any of these providers. You are, of course, free to use whatever you like. My TL;DR advice: roll your own and use Algo or Streisand. For messaging & voice, use Signal. For increased anonymity, use Tor for desktop (though recognize that doing so may actually put you at greater risk), and Onion Browser for mobile.
This mini-rant came on the heels of an interesting Twitter discussion: https://twitter.com/kennwhite/status/591074055018582016
2015-01-29 Unofficial Relay FAQ
Compilation of questions and answers about Relay from React.js Conf.
Disclaimer: I work on Relay at Facebook. Relay is a complex system on which we're iterating aggressively. I'll do my best here to provide accurate, useful answers, but the details are subject to change. I may also be wrong. Feedback and additional questions are welcome.
Relay is a new framework from Facebook that provides data-fetching functionality for React applications. It was announced at React.js Conf (January 2015).
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import subprocess
import time

VERBOSE = False


def wait(seconds):
    # Assumed implementation: simple sleep helper used between the script's steps.
    time.sleep(seconds)
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import os

from scrapy.contrib.httpcache import FilesystemCacheStorage

from .dupefilter import splash_request_fingerprint


class SplashAwareFSCacheStorage(FilesystemCacheStorage):
    def _get_request_path(self, spider, request):
        # Same layout as the stock storage, but keyed on the Splash-aware fingerprint
        # so requests that differ only in their Splash arguments get distinct cache entries.
        key = splash_request_fingerprint(request)
        return os.path.join(self.cachedir, spider.name, key[0:2], key)
"""XPath extension functions for lxml, inspired by: | |
https://gist.github.com/shirk3y/458224083ce5464627bc | |
Usage: | |
import xpathfuncs; xpathfuncs.setup() | |
""" | |
import string |
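The module above is truncated. As a rough, hypothetical sketch of how such a setup() typically registers functions, the snippet below uses lxml's real FunctionNamespace API but with an invented lowercase() example rather than the gist's actual functions:

from lxml import etree

def lower_case(context, values):
    # lxml passes string arguments as plain (smart) strings and node-sets as lists;
    # this only handles the common string-ish cases.
    if isinstance(values, list):
        values = values[0] if values else u''
    return values.lower()

def setup():
    # Register in the default (un-prefixed) XPath function namespace, so the
    # function can be called from XPath expressions, e.g. //a[lowercase(@href)].
    ns = etree.FunctionNamespace(None)
    ns['lowercase'] = lower_case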
(by @andrestaltz)
If you prefer to watch video tutorials with live-coding, then check out this series I recorded with the same contents as in this article: Egghead.io - Introduction to Reactive Programming.
Magic words:
psql -U postgres
Some interesting flags (to see all, use -h or --help depending on your psql version):
-E: will describe the underlying queries of the backslash (\) commands (cool for learning!)
-l: psql will list all databases and then exit (useful if the user you connect with doesn't have a default database, like at AWS RDS)
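For example (assuming a local server and the default postgres superuser), start an interactive session with -E and run any backslash command; psql prints the hidden catalog query it executes before showing the result:

psql -E -U postgres
postgres=# \l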