Francisco Lopez fran0x

fran0x / digitalocean-swarm.sh
Last active March 16, 2023 13:22
Script to create a Docker Swarm cluster in DigitalOcean
#!/bin/bash
# Configuration
#export DIGITALOCEAN_ACCESS_TOKEN= # Digital Ocean Token (mandatory to provide)
export DIGITALOCEAN_SIZE=512mb # default
export DIGITALOCEAN_REGION=nyc3 # default
export DIGITALOCEAN_PRIVATE_NETWORKING=true # default=false
#export DIGITALOCEAN_IMAGE="ubuntu-15-04-x64" # default
# For other settings see defaults in https://docs.docker.com/machine/drivers/digital-ocean/
fran0x / Measure.scala
Last active June 16, 2016 06:01
Utility code for time measurement
// Measure.time measures how long a block of code takes to complete (in nanoseconds)
// note: this version does not return the result of the block; a different version would be needed for that
object Measure {
  def time(block: => Unit): Long = {
    val s = System.nanoTime
    block
    System.nanoTime - s
  }
}
fran0x / Control.scala
Last active June 16, 2016 06:01
Utility code to auto-close resources (e.g. files)
// Control.using automatically closes any resource that has a close method
// note: from the book "Beginning Scala" (by David Pollak)
object Control {
  import scala.language.reflectiveCalls
  def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
    try {
      f(param)
    } finally {
      param.close()
    }
}
fran0x / cheat-sheet-iterm2.md
Last active June 16, 2016 17:44
Cheat Sheet iTerm2


To install iTerm2 on OS X, run brew install caskroom/cask/iterm2 (requires the almighty Homebrew to be installed first).

| Action | Command |
| --- | --- |
| Vertical split | Command + d |
| Horizontal split | Command + Shift + d |
| Close the screen | Command + w |
| Move around screens | Command + Alt + (up/down/left/right) |
fran0x / keybase.md
Created April 11, 2016 18:54
Github identity (Keybase)

Keybase proof

I hereby claim:

  • I am flopezlasanta on github.
  • I am flopezlasanta (https://keybase.io/flopezlasanta) on keybase.
  • I have a public key whose fingerprint is 55A8 3CF8 344E 834A 3E00 ED65 3FD4 E16E 77EA DB72

To claim this, I am signing this object:

fran0x / Spark_Jupyter_OS_X.md
Last active January 27, 2018 18:15
Steps to configure Jupyter (IPython Notebook) with Python (3.5.1) and Spark (1.6.0) kernel on Mac OS X (El Capitan)

Install Python3, Scala and Apache Spark via Brew (http://brew.sh/)

brew update
brew install python3
brew install scala
brew install apache-spark

Set environment variables

# load the "orders" table from Hive into a DataFrame
orders_df=sqlCtx.sql("select * from orders")
orders_df.printSchema()
# 1) calculate number of orders in SUSPECTED_FRAUD status
sqlCtx.sql("select count(order_id) from orders where order_status='SUSPECTED_FRAUD'").show(5)
# load the "order_items" table from Hive into a DataFrame
order_items_df=sqlCtx.sql("select * from order_items")
order_items_df.printSchema()
# copy the Hive configuration file hive-site.xml to the spark configuration folder
# sudo cp /etc/hive/conf.dist/hive-site.xml /usr/lib/spark/conf/
# launch pyspark with the spark-csv package (note: version 1.2.0 has some issues thus better use 1.3.0)
# PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0
# check dataframes are working
sqlCtx.createDataFrame([("somekey", 1)])
# load yelp dataset
fran0x / SimpleWordTokenizer.py
Last active August 29, 2015 14:23
Simple word tokenizer that returns a list of non-empty words in lowercase
import re

# assumption: split_regex is not defined in this excerpt; this pattern splits on runs of non-word characters
split_regex = r'\W+'

def simpleWordTokenizer(string):
    """ A simple (list-comprehension) implementation of input string tokenization
    Args:
        string (str): input string
    Returns:
        list: a list of tokens in lowercase and no empty strings
    """
    return [x for x in re.split(split_regex, string.lower()) if x]

starWarsDarkSide = 'Only at the end do you realize the power of the Dark Side.'
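A quick usage check might look like the following. The excerpt does not show how split_regex is defined, so the pattern r'\W+' used here is an assumption; the function is repeated only to keep the block self-contained.

```python
import re

# Assumption: split on runs of non-word characters; the original pattern is not shown.
split_regex = r'\W+'

def simpleWordTokenizer(string):
    """Return the non-empty lowercase tokens of the input string."""
    return [x for x in re.split(split_regex, string.lower()) if x]

starWarsDarkSide = 'Only at the end do you realize the power of the Dark Side.'
tokens = simpleWordTokenizer(starWarsDarkSide)
print(tokens)

# Whitespace-only input yields no tokens because empty strings are filtered out.
print(simpleWordTokenizer(' '))
```

With this pattern the trailing period produces an empty string at the end of the split, which is why the comprehension filters on `if x`.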
fran0x / JoinDuplicatedLines.scala
Created June 15, 2015 13:27
Reads a CSV file with a header row, where the first column of each row is the key, and writes a new file without the header in which lines sharing the same key are merged into a single line
import java.io.File
import java.io.PrintWriter
import scala.annotation.migration
import scala.collection.immutable.ListMap
import scala.collection.mutable.Map
object JoinDuplicatedLines {
  def main(args: Array[String]): Unit = {
    val input = io.Source fromFile "input.csv"
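
The preview stops after the first lines of main, so the merging logic itself is not visible here. As a rough illustration of the behavior the description names, and not the author's Scala implementation, a Python sketch could look like this. The function name join_duplicated_lines is hypothetical, and merging by appending the remaining columns of each duplicate row to the first row with that key is an assumption.

```python
import csv
import io

def join_duplicated_lines(reader, writer):
    """Skip the header row, then merge rows that share the same first column."""
    rows = csv.reader(reader)
    next(rows, None)  # drop the header row
    merged = {}  # dicts preserve insertion order, so keys keep first-seen order
    for row in rows:
        if not row:
            continue  # ignore blank lines
        key, values = row[0], row[1:]
        # Assumption: duplicates are merged by appending their remaining columns.
        merged.setdefault(key, []).extend(values)
    out = csv.writer(writer, lineterminator="\n")
    for key, values in merged.items():
        out.writerow([key] + values)

# In-memory example; real use would pass open file handles instead.
source = io.StringIO("id,value\na,1\nb,2\na,3\n")
target = io.StringIO()
join_duplicated_lines(source, target)
result = target.getvalue()
print(result)
```

Passing file-like objects rather than paths keeps the sketch easy to try without touching the file system.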