Francisco Lopez fran0x

@fran0x
fran0x / Spark_Jupyter_OS_X.md
Last active January 27, 2018 18:15
Steps to configure Jupyter (IPython Notebook) with a Python (3.5.1) and Spark (1.6.0) kernel on Mac OS X (El Capitan)

Install Python 3, Scala and Apache Spark via Homebrew (http://brew.sh/)

brew update
brew install python3
brew install scala
brew install apache-spark

Set environment variables

fran0x / keybase.md
Created April 11, 2016 18:54
Github identity (Keybase)

Keybase proof

I hereby claim:

  • I am flopezlasanta on github.
  • I am flopezlasanta (https://keybase.io/flopezlasanta) on keybase.
  • I have a public key whose fingerprint is 55A8 3CF8 344E 834A 3E00 ED65 3FD4 E16E 77EA DB72

To claim this, I am signing this object:

fran0x / cheat-sheet-iterm2.md
Last active June 16, 2016 17:44
Cheat Sheet iTerm2

Cheat Sheet iTerm2

To install iTerm2 on OS X, run brew install caskroom/cask/iterm2 (requires the almighty Homebrew to be installed first).

| Action | Command |
| --- | --- |
| Vertical split | Command + d |
| Horizontal split | Command + Shift + d |
| Close the screen | Command + w |
| Move around screens | Command + Alt + (up/down/left/right) |
fran0x / Control.scala
Last active June 16, 2016 06:01
Utility code to auto-close resources (e.g. files)
// Control.using automatically closes any resource that has a close() method
// note: from the book "Beginning Scala" (by David Pollak)
object Control {
  import scala.language.reflectiveCalls

  def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
    try {
      f(param)
    } finally {
      param.close()
    }
}
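A minimal usage sketch for the `using` helper, assuming a made-up resource class. `ControlDemo` and `FakeResource` are hypothetical names introduced only for illustration, and `using` is repeated inside the object so the snippet compiles on its own. It shows that `close()` runs both when the body returns normally and when it throws.

```scala
import scala.language.reflectiveCalls

object ControlDemo {
  // Same shape as Control.using, repeated here so the sketch is self-contained.
  def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
    try f(param) finally param.close()

  // Hypothetical resource (illustration only): remembers whether close() was called.
  final class FakeResource {
    var closed = false
    def read(): String = "data"
    def close(): Unit = { closed = true }
  }

  // Returns the value produced by the body and whether the resource was closed afterwards.
  def demo(): (String, Boolean) = {
    val resource = new FakeResource
    val result = using(resource)(_.read())
    (result, resource.closed)
  }

  // The resource is closed even when the body throws.
  def demoFailure(): Boolean = {
    val resource = new FakeResource
    try using(resource)(_ => throw new IllegalStateException("boom"))
    catch { case _: IllegalStateException => () }
    resource.closed
  }
}
```

Any type with a `close(): Unit` method fits the structural bound, so the same call shape should work for something like `scala.io.Source`.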
fran0x / Measure.scala
Last active June 16, 2016 06:01
Utility code for time measurement
// Measure.time measures how long a block of code takes to complete (in nanoseconds)
// note: this version does not return the result of the block; a separate version is needed for that
object Measure {
  def time(block: => Unit): Long = {
    val start = System.nanoTime
    block
    System.nanoTime - start
  }
}
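The note in Measure.scala says a different version is needed to get the block's result back. One possible sketch of such a version follows. `MeasureWithResult` and `timed` are hypothetical names, not part of the original gist.

```scala
object MeasureWithResult {
  // Runs `block` once and returns its result together with the elapsed time in nanoseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime
    val result = block
    (result, System.nanoTime - start)
  }
}
```

For example, `val (sum, nanos) = MeasureWithResult.timed((1 to 1000).sum)` gives both the sum and the time it took. A tuple keeps the helper small; a dedicated case class would be an alternative if the timing needs a name.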
fran0x / digitalocean-swarm.sh
Last active March 16, 2023 13:22
Script to create a Docker Swarm cluster in Digital Ocean
#!/bin/bash
# Configuration
#export DIGITALOCEAN_ACCESS_TOKEN= # Digital Ocean Token (mandatory to provide)
export DIGITALOCEAN_SIZE=512mb # default
export DIGITALOCEAN_REGION=nyc3 # default
export DIGITALOCEAN_PRIVATE_NETWORKING=true # default=false
#export DIGITALOCEAN_IMAGE="ubuntu-15-04-x64" # default
# For other settings see defaults in https://docs.docker.com/machine/drivers/digital-ocean/
fran0x / spark-jobserver-docker-macos.md
Created July 1, 2016 07:32 — forked from jaceklaskowski/spark-jobserver-docker-macos.md
How to run spark-jobserver on Docker and Mac OS (using docker-machine)
fran0x / jvm-tools.md
Created July 1, 2016 07:32 — forked from jaceklaskowski/jvm-tools.md
I should have known these tools earlier - a story about jps, jstat and jmap

From http://stackoverflow.com/a/32393044/1305344:

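object size extends App {
  // Allocate a million tuples so there is something to observe on the heap
  (1 to 1000000).map(i => ("foo" + i, ()))
  // Block on input to keep the JVM alive while it is being inspected
  val input = scala.io.StdIn.readLine("prompt> ")
}

Run it with sbt 'runMain size' and then use jps (to find the PIDs), jstat -gc pid (to query GC statistics) and jmap (to inspect heap usage) to analyze resource allocation.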

fran0x / spark.md
Last active August 8, 2016 07:20 — forked from jaceklaskowski/spark-intro.md
Introduction to Apache Spark

Introducing Apache Spark

  • What use cases are a good fit for Apache Spark? How do you work with Spark?
    • Create RDDs, transform them, and execute actions to get the result of a computation
    • All computations in memory = "memory is cheap" (we do need enough memory to fit all the data in)
      • the fewer disk operations, the faster (you do know it, don't you?)
    • You develop such computation flows or pipelines using a programming language (Scala, Python or Java); that's where the ability to write code is paramount
    • Data is usually on a distributed file system like Hadoop HDFS or in NoSQL databases like Cassandra
    • Data mining = analysis / insights / analytics
  • log mining
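
The "create, transform, execute actions" flow from the list above can be sketched for the log-mining case. This sketch uses plain Scala collections rather than Spark so that it runs without any dependency. The log lines and the name `LogMiningSketch` are made up for illustration. In Spark, `lines` would be an RDD instead of a `Seq`, and RDDs offer `filter`, `map` and `count` operations with the same meaning.

```scala
object LogMiningSketch {
  // Hypothetical log lines, invented for illustration.
  val lines: Seq[String] = Seq(
    "INFO  service started",
    "ERROR disk full on /dev/sda1",
    "WARN  slow response",
    "ERROR connection refused"
  )

  // "Transformations": a view is lazy, so nothing is computed at this point.
  def errorMessages =
    lines.view
      .filter(_.startsWith("ERROR"))
      .map(_.stripPrefix("ERROR").trim)

  // "Actions": these force the computation and produce a result.
  def errorCount: Int = errorMessages.size
  def errors: List[String] = errorMessages.toList
}
```

The lazy view stands in for the way Spark defers transformations until an action asks for a result. It is only an analogy: nothing here is distributed or cached.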
fran0x / machine-learning.md
Created July 10, 2016 17:22 — forked from jaceklaskowski/machine-learning.md
Machine Learning for the very Impatient

How much of machine learning is statistics and vice versa?

Learning using https://www.coursera.org/learn/machine-learning/home/welcome

  • Machine learning = teaching a computer to learn concepts using data, without being explicitly programmed.
  • Supervised learning = "right answers" given
  • Regression problem
    • continuous-valued output
    • deduce the function for a given data set and predict other values
  • "in regression problems, we are taking input variables and trying to map the output onto a continuous expected result function."
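
To make "deduce the function for a given data set and predict other values" concrete, here is a minimal sketch of one-variable linear regression fitted by ordinary least squares. The choice of a straight-line model, the name `LinearRegressionSketch` and the sample data in the usage note are assumptions made for illustration; the notes themselves do not prescribe a method.

```scala
object LinearRegressionSketch {
  // Fits y = intercept + slope * x by ordinary least squares.
  // Returns the pair (intercept, slope).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two (x, y) pairs")
    val meanX = xs.sum / xs.size
    val meanY = ys.sum / ys.size
    // Sum of products of deviations, and sum of squared deviations of x.
    val covXY = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(varX != 0.0, "x values must not all be equal")
    val slope = covXY / varX
    (meanY - slope * meanX, slope)
  }

  // Predicts a continuous output for a new input using the fitted (intercept, slope).
  def predict(model: (Double, Double), x: Double): Double =
    model._1 + model._2 * x
}
```

With the made-up points (1, 3), (2, 5), (3, 7), (4, 9), `fit` recovers an intercept of 1 and a slope of 2, so `predict` gives 11 for x = 5, up to floating-point rounding.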