paco xander nathan ceteri

@ceteri
ceteri / 00.graphx.scala
Last active September 9, 2021 13:38
Spark GraphX demo
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
case class Peep(name: String, age: Int)
val vertexArray = Array(
(1L, Peep("Kim", 23)),
(2L, Peep("Pat", 31)),
(3L, Peep("Chris", 52)),
(4L, Peep("Kelly", 39)),
import nltk
nltk.download()
## use nltk.download() within a Python prompt to
## download the `punkt` data
## Anaconda is recommended, to pick up NumPy, NLTK, etc.
## http://continuum.io/downloads
## this also requires TextBlob/PerceptronTagger
ceteri / 0.setup.sh
Last active April 24, 2019 11:04
Spark Streaming demo
# concatenate part files 00001 through 00003 to construct "minitweets"
cat rawtweets/part-0000[1-3] > minitweets
# change log4j properties to WARN to reduce noise during demo
mv conf/log4j.properties.template conf/log4j.properties
vim conf/log4j.properties # Change to WARN
# launch Spark shell REPL
./bin/spark-shell
ceteri / vagrant.sh
Created July 14, 2014 19:51
Getting started with Vagrant + IPython notebook for Just Enough Math tutorial
vagrant up
vagrant ssh
cd jem
nbserver
ceteri / log.scala
Last active May 14, 2020 13:12
Intro to Apache Spark: code example for RDD animation
// load error messages from a log into memory
// then interactively search for various patterns
// base RDD
val lines = sc.textFile("log.txt")
// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()
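The same filter-then-map pipeline can be sketched in plain Python to show what each step computes. This is only an illustration, not the Spark API: the sample log lines and the tab-separated "LEVEL, message" layout are assumptions, and the two search terms are hypothetical. Unlike Spark RDDs, which are evaluated lazily, these list comprehensions run immediately.

```python
# Hypothetical sample log; the tab-separated "LEVEL\tmessage" layout is assumed.
sample_log = [
    "ERROR\tmysql connection refused",
    "INFO\tjob started",
    "ERROR\tphp timeout on request",
    "WARN\tdisk usage high",
]

# Counterpart of lines.filter(_.startsWith("ERROR"))
errors = [line for line in sample_log if line.startswith("ERROR")]

# Counterpart of errors.map(_.split("\t")).map(r => r(1)):
# split on tabs and keep the second field (the message text)
messages = [line.split("\t")[1] for line in errors]

# Search the in-memory messages for patterns (hypothetical search terms)
mysql_count = sum(1 for m in messages if "mysql" in m)
php_count = sum(1 for m in messages if "php" in m)

print(messages)
print(mysql_count, php_count)
```

In the Scala version, `messages.cache()` asks Spark to keep the filtered messages in memory so that repeated searches do not re-read the log file; the Python list here is already in memory.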
ceteri / clk.tsv
Last active May 14, 2020 13:13
Intro to Apache Spark: code example for (K,V), join, operator graph
2014-03-04 15dfb8e6cc4111e3a5bb600308919594 11
2014-03-06 81da510acc4111e387f3600308919594 61
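A minimal sketch of how rows like these could be turned into (K, V) pairs and joined, written in plain Python rather than Spark. Several things here are assumptions: the fields are taken to be tab-separated (the file is named `.tsv`), the second column is used as the key, the meaning of the third column is not stated in the preview, and the `reg` lookup table with its values is entirely hypothetical.

```python
# The two rows shown in the preview, assumed to be tab-separated.
clk_lines = [
    "2014-03-04\t15dfb8e6cc4111e3a5bb600308919594\t11",
    "2014-03-06\t81da510acc4111e387f3600308919594\t61",
]

# Hypothetical second dataset keyed by the same ID (values invented for illustration).
reg = {
    "15dfb8e6cc4111e3a5bb600308919594": "user-a",
    "81da510acc4111e387f3600308919594": "user-b",
}

# Build (K, V) pairs: key is the ID column, value is (date, number).
fields = (line.split("\t") for line in clk_lines)
clk = [(f[1], (f[0], int(f[2]))) for f in fields]

# Inner join on the key: keep only IDs present in both datasets.
joined = [(k, (v, reg[k])) for k, v in clk if k in reg]

for key, value in joined:
    print(key, value)
```

The result has the same shape as a pair-wise inner join: each key maps to a tuple holding the value from each side.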
ceteri / 01.repl.txt
Last active April 17, 2022 18:46
Intro to Apache Spark: general code examples
$ ./bin/spark-shell
14/04/18 15:23:49 INFO spark.HttpServer: Starting HTTP Server
14/04/18 15:23:49 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/18 15:23:49 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:49861
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.1
      /_/
ceteri / 0.install_on_mesos_master
Last active December 29, 2015 03:39
Exelixi -- an Apache Mesos example framework in Python for running genetic algorithms at scale. See https://github.com/ceteri/exelixi
bash-3.2$ ssh -A -l ubuntu 54.205.7.177
The authenticity of host '54.205.7.177 (54.205.7.177)' can't be established.
RSA key fingerprint is 60:0e:23:7a:b2:c7:42:50:82:86:57:8e:e3:a2:da:74.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '54.205.7.177' (RSA) to the list of known hosts.
Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-41-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Tue Dec 10 17:22:31 UTC 2013
ceteri / chronos.txt
Created October 23, 2013 22:25
Chronos tutorial
paco@granite:~$ curl http://downloads.mesosphere.io.s3.amazonaws.com/chronos/chronos.tgz -o chronos.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 35.0M  100 35.0M    0     0  3800k      0  0:00:09  0:00:09 --:--:-- 6913k
paco@granite:~$ tar xzf chronos.tgz
paco@granite:~$ cd chronos/
paco@granite:~/chronos$ nohup ./bin/chronos-marathon &
[1] 26210
paco@granite:~/chronos$ nohup: ignoring input and appending output to ‘nohup.out’
ceteri / kmeans.py
Last active December 25, 2015 15:49
scikit-learn examples
print(__doc__)
from time import time
import numpy as np
import pylab as pl
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA