@harv
harv / cross_and_static_compile_shadowsocks-libev.sh
Last active February 18, 2024 12:05
cross & static compile shadowsocks-libev
#!/bin/sh
# cross & static compile shadowsocks-libev
PCRE_VER=8.41
PCRE_FILE="http://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-$PCRE_VER.tar.gz"
MBEDTLS_VER=2.6.0
MBEDTLS_FILE="https://tls.mbed.org/download/mbedtls-$MBEDTLS_VER-gpl.tgz"
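# (preview ends here -- a sketch of the typical next steps: fetch and unpack
#  the pinned sources before configuring them for a static cross build)
wget "$PCRE_FILE" && tar xzf "pcre-$PCRE_VER.tar.gz"
wget "$MBEDTLS_FILE" && tar xzf "mbedtls-$MBEDTLS_VER-gpl.tgz"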
@mariotaku
mariotaku / README.txt
Last active March 2, 2024 10:18
Twitter reverse proxy configuration for Nginx
Settings on Twidere:
API URL Format: https://your-host/[DOMAIN.]twitter.com/
Uncheck "Same OAuth signing URL"
Uncheck "No verion suffix"
Password login recommended.
@rmoff
rmoff / 01_Spark+Streaming+Kafka+Twitter.ipynb
Last active September 17, 2020 17:41
Simple example of processing twitter JSON payload from a Kafka stream with Spark Streaming in Python
@bernhardschaefer
bernhardschaefer / spark-submit-streaming-yarn.sh
Last active March 21, 2022 05:04
spark-submit template for running Spark Streaming on YARN (referenced in https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/)
#!/bin/bash
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
executor_memory=3g
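# (preview ends here -- below is a sketch of how such a template typically
#  invokes spark-submit; the name, class, queue, and jar path are placeholders,
#  not values from the original gist)
spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name "my-streaming-job" \
    --class com.example.StreamingJob \
    --queue default \
    --num-executors ${num_executors} \
    --executor-memory ${executor_memory} \
    --conf spark.streaming.backpressure.enabled=true \
    /path/to/application.jar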

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall where I originally found it, nor was I able to find the author's name, so I can't give appropriate credit.


Effective Engineer - Notes

What's an Effective Engineer?

@stefanfoulis
stefanfoulis / docker_for_mac_disk_default_size.md
Last active June 29, 2023 12:02
How to resize Docker for Mac Disk image and set the default size for new images

Set the default size for new Docker for Mac disk images

UPDATE: The instructions here are no longer necessary! Resizing the disk image is now possible right from the UI since Docker for Mac Version 17.12.0-ce-mac49 (21995).

If you are getting the error: No space left on device

Configuring the qcow2 size cap is possible in the current versions:

# my disk is currently 64GiB
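# (preview cut off here -- the steps below are a sketch of the usual qemu-img
#  approach, assuming the default Docker for Mac image path; stop Docker first)
cd ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux
qemu-img info Docker.qcow2          # inspect the current virtual size cap
qemu-img resize Docker.qcow2 +64G   # raise the cap by 64GiB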
@LeCoupa
LeCoupa / redis_cheatsheet.bash
Last active April 17, 2025 11:09
Redis Cheatsheet - Basic Commands You Must Know --> UPDATED VERSION --> https://github.com/LeCoupa/awesome-cheatsheets
# Redis Cheatsheet
# All the commands you need to know
redis-server /path/redis.conf # start redis with the related configuration file
redis-cli # opens a redis prompt
# Strings.
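# (preview ends at the Strings section -- a few representative commands in the
#  same style, illustrative rather than the gist's full list)
SET mykey "some value"            # store a string value at mykey
GET mykey                         # fetch the value stored at mykey
APPEND mykey " extra"             # append to the string stored at mykey
INCR counter                      # atomically increment an integer string
EXPIRE mykey 120                  # give mykey a 120-second time-to-live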
@wassname
wassname / keras_weighted_categorical_crossentropy.py
Last active October 10, 2024 00:52
Keras weighted categorical_crossentropy (please read comments for updated version)
"""
A weighted version of categorical_crossentropy for keras (2.0.6). This lets you apply a weight to unbalanced classes.
@url: https://gist.github.com/wassname/ce364fddfc8a025bfab4348cf5de852d
@author: wassname
"""
from keras import backend as K
def weighted_categorical_crossentropy(weights):
"""
A weighted version of keras.objectives.categorical_crossentropy
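    @param weights: numpy array of shape (C,), one weight per class
    """
    # (preview truncated above; the body below is a sketch of the standard
    #  weighted cross-entropy pattern for the Keras 2.0.x backend API, not
    #  necessarily the gist verbatim)
    weights = K.variable(weights)
    def loss(y_true, y_pred):
        # scale predictions so each sample's class probabilities sum to 1
        y_pred /= K.sum(y_pred, axis=-1, keepdims=True)
        # clip to avoid log(0) producing NaN or Inf
        y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
        # weighted negative log-likelihood, summed over the class axis
        return -K.sum(y_true * K.log(y_pred) * weights, axis=-1)
    return loss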
@smartnose
smartnose / spark-internals-through-code.md
Last active October 29, 2024 06:03
Spark internal notes

Spark internals through code

Nothing gives you more detail about Spark internals than actually reading its source code. In addition, you get to learn many design techniques and improve your Scala coding skills. These are the random notes I make while reading the Spark code. The best way to follow them is to load the Spark source into an IDE, e.g. IntelliJ, and navigate the code alongside.

Genesis - creation of a Spark cluster

The scripts for creating a Spark cluster are start-master.sh and start-slave.sh. Read them carefully and you will see that the two are nearly identical except for the value of the $CLASS variable. In start-master.sh the value is CLASS="org.apache.spark.deploy.master.Master", while the value for start-slave.sh is shown below with more context.

# NOTE: This exact class name is matched downstream by SparkSubmit.
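# (the preview stops at the comment above; in Spark's start-slave.sh the next
#  line sets the worker class)
CLASS="org.apache.spark.deploy.worker.Worker"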
@samuelsmal
samuelsmal / pyspark_udf_filtering.py
Created October 11, 2016 14:10
PySpark DataFrame filtering using a UDF and Regex
import re  # used by regex_filter below; missing from the preview

from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def regex_filter(x):
    regexs = ['.*ALLYOURBASEBELONGTOUS.*']
    if x and x.strip():
        for r in regexs:
            if re.match(r, x, re.IGNORECASE):
                return True
    return False  # explicit False for empty or non-matching input
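# (preview ends inside the function; usage typically continues along these
#  lines -- `df` and its `text` column are hypothetical stand-ins)
filter_udf = udf(regex_filter, BooleanType())
df_filtered = df.filter(filter_udf(df['text']))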