Christophe Marchal toff63

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.SparkConf
object SimpleStreamingApp {
def main(args:Array[String]) {
val conf = new SparkConf().setMaster("local[2]").setAppName("Simple App")
val ssc = new StreamingContext(conf, Seconds(45)) // Batch interval of 45 seconds.
val lines = ssc.socketTextStream("localhost", 9999)
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
def main(args:Array[String]) {
val logFile = "/mnt/home2/Documents/bin/spark-1.5.1-bin-hadoop2.6/README.md"
val conf = new SparkConf().setAppName("Simple App")
val sparkContext = new SparkContext(conf)
 ssh -t -i KP_bastion ec2-user@bastion \
 ssh -t -i /home/ec2-user/.ssh/KP_server [email protected] \
 "tail -f /var/log/syslog"

ssh -t -i KP_bastion ec2-user@bastion ssh -t -i /home/ec2-user/.ssh/KP_server [email protected]

Client config

eureka.shouldUseDns=true
eureka.eurekaServer.domainName=mydomain.com
archaius.deployment.region=us-west-2

Troubleshoot

Retrieve list of eureka nodes:
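
The command itself is cut off in this preview. As a hedged sketch, assuming the DNS-based setup from the client config above: Eureka's documented convention is a TXT record named `txt.<region>.<domainName>` that lists the zones, plus one `txt.<zone>.<domainName>` record per zone that lists the server hosts. The helper names `eureka_txt_name` and `eureka_txt_lookup` are hypothetical, and `dig` is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and not part of Eureka.

# Build the TXT record name that Eureka clients resolve when
# eureka.shouldUseDns=true: txt.<region-or-zone>.<domainName>
eureka_txt_name() {
  printf 'txt.%s.%s\n' "$1" "$2"
}

# Print the TXT entries for a region or zone (assumes dig is available).
eureka_txt_lookup() {
  dig +short TXT "$(eureka_txt_name "$1" "$2")"
}
```

With the values from the client config, `eureka_txt_lookup us-west-2 mydomain.com` would query `txt.us-west-2.mydomain.com`. Each zone it returns can then be looked up the same way to get that zone's Eureka hosts.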

Spark

A Spark application consists of a driver program that runs the user's main function and executes various parallel operations on a cluster. The main abstraction is the Resilient Distributed Dataset (RDD), a fault-tolerant abstraction for in-memory cluster computing.

RDD

Motivated by two types of applications that traditional MapReduce did not handle efficiently:

  • iterative algorithms, such as PageRank, K-means clustering, and logistic regression
  • interactive data mining tools
toff63 / download_all_gists.js
Last active September 16, 2015 13:42 — forked from diegopacheco/download_all_gists.js
Download All Gists from Github
var request = require('request')
, path = require('path')
, fs = require('fs')
, url = "https://api.github.com/users/diegopacheco/gists"
, savepath = 'D:/tmp/gists';
String.prototype.replaceAll = function(search, replace, ignoreCase) {
if (ignoreCase) {
var result = [];
var _string = this.toLowerCase();
toff63 / vector.md
Last active September 16, 2015 14:03 — forked from diegopacheco/vector.md
How To install and Run Netflix/Vector on Amazon Linux OS ?

Install PCP

sudo yum install -y git gcc perl-CPAN bison flex byacc libmicrohttpd-devel gcc-c++ nodejs npm --enablerepo=epel 
git clone git://git.pcp.io/pcp 
cd pcp/ 
./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-webapi 
sudo groupadd -r pcp 
sudo useradd -c "Performance Co-Pilot" -g pcp -d /var/lib/pcp -M -r -s /usr/sbin/nologin pcp 
make 

sudo make install

NetflixOSS in a nutshell

Based on http://jhohertz.github.io/netflixoss-slides

Cluster management

  • Aminator to build your container (AMI image)
  • Asgard to manage the nodes of your cluster
  • Ice for cost analysis and optimization
  • Cluster state with query language

Keybase proof

I hereby claim:

  • I am toff63 on github.
  • I am christophemarcha (https://keybase.io/christophemarcha) on keybase.
  • I have a public key whose fingerprint is 6E27 B1FB C06C 0920 AF4C D193 142E 7701 646C C913

To claim this, I am signing this object: