Maybe treper

In this article, I will share some of my experience installing the NVIDIA driver and CUDA on Linux. I mainly use Ubuntu as the example, and I include notes for CentOS/Fedora wherever I can.

Installing gcc 4.8 and Linuxbrew on CentOS 6

The GCC distributed with CentOS 6 is 4.4.7, which is pretty outdated. I'd like to use gcc 4.8+. Also, when trying to install Linuxbrew you run into a dependency loop where Homebrew's gcc depends on zlib, which depends on gcc. Here's how I solved the problem.

Note: Requires sudo privileges.

Resources:

http://superuser.com/a/676337/88393: Forum response on using CERN's open Scientific Linux distribution of RHEL's developer toolset.
http://linux.web.cern.ch/linux/devtoolset/: CERN's developer toolset installation instructions.

Info

This guide sets up a non-clustered Nutch crawler, which stores its data via HBase. We will not learn how to set up Hadoop et al., only the bare minimum needed to crawl and index websites on a single machine.

Terms

Nutch - the crawler (fetches and parses websites)
HBase - filesystem storage for Nutch (Hadoop component, basically)

#Kafka - Messaging Basics

This assumes you are starting fresh and have no existing Kafka or ZooKeeper data. See http://kafka.apache.org/documentation.html#quickstart for more details.

##Install Java

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
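
One way to picture the rule is a single stage fed through a bounded queue: the producer is held up only while the stage is not ready to accept records, and each processed record is handed on as soon as it is done. The following is a minimal Python sketch of that idea, not Storm or Trident code; the names `run_pipeline`, `SENTINEL`, and the `capacity` value are assumptions made up for illustration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def run_pipeline(records, capacity=4):
    """Push records through one processing stage and return the results in order."""
    inbox = queue.Queue(maxsize=capacity)  # bounded: a full queue blocks the producer
    results = []

    def stage():
        while True:
            item = inbox.get()        # ready for the next record as soon as the last one is done
            if item is SENTINEL:
                break
            results.append(item * 2)  # "process" the record and deliver it immediately

    worker = threading.Thread(target=stage)
    worker.start()
    for record in records:
        inbox.put(record)             # blocks only while the stage has fallen behind
    inbox.put(SENTINEL)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(5)))
```

If the stage is slow, `inbox.put` is where the producer stalls, which is the symptom the first half of the rule tells you to eliminate.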

	1. Check Python3 root
	>>> import sys
	>>> import os
	>>> sys.executable
	'/usr/local/bin/python3'

	OR

	$ which python3
	/usr/local/bin/python3
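
The two checks above can also be combined in one short script. This is a minimal sketch rather than part of the original steps, and the helper name `python3_locations` is hypothetical. It reports the interpreter that is running the script and the first `python3` found on `PATH`; the two can differ when several installations exist.

```python
import shutil
import sys


def python3_locations():
    """Return (path of the running interpreter, first python3 on PATH or None)."""
    return sys.executable, shutil.which("python3")


if __name__ == "__main__":
    running, on_path = python3_locations()
    print("running interpreter:", running)
    print("python3 on PATH:", on_path)
```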

	= Some yum usage for people who know "apt" =

	If you are familiar with the apt package manager on Debian/Ubuntu this page should help you transfer your knowledge to working with yum on Fedora/RHEL/CentOS/etc.

	Note that this page as currently written is by non-apt experts, so there may be some mistakes.

	== General points ==

	* Speed:
	* data/CPU: apt on Debian deals with roughly 37,000 packages[1] and an extra 6,500 "provides"[2]. yum on Fedora deals with roughly 24,000 packages, 143,000 provides and 3,100,000 file provides.

	// generate [0..n-1]
	auto seq = [](size_t n) -> std::vector<size_t> {
		std::vector<size_t> v(n);
		for (size_t i = 0; i < n; ++i) v[i] = i;
		return v;
	};
	auto index = seq(n);

	// n * n distance matrix
	std::vector<D> dists(n * n);

	#!/bin/sh

	# one way (older scala version will be installed)
	# sudo apt-get install scala

	# second way (installs Scala 2.11.4 from the .deb on scala-lang.org)
	sudo apt-get remove scala-library scala
	wget http://www.scala-lang.org/files/archive/scala-2.11.4.deb
	sudo dpkg -i scala-2.11.4.deb
	sudo apt-get update

	package topic

	import spark.broadcast._
	import spark.SparkContext
	import spark.SparkContext._
	import spark.RDD
	import spark.storage.StorageLevel
	import scala.util.Random
	import scala.math.{ sqrt, log, pow, abs, exp, min, max }
	import scala.collection.mutable.HashMap