alonsoir (@alonso_isidoro)
👋 Time to learn Python by doing experiments with AI.

@alonsoir
alonsoir / JavaDemo.java
Created September 25, 2015 14:59 — forked from jacek-lewandowski/JavaDemo.java
Java API for Spark Cassandra Connector - tutorial for blog post
package com.datastax.spark.demo;
import com.datastax.driver.core.Session;
import com.datastax.spark.connector.cql.CassandraConnector;
import com.google.common.base.Optional;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
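
The preview cuts off after the imports. As a rough companion sketch (not the original tutorial code), the connector's core pattern in Scala looks roughly like the following; the keyspace, table name, and contact point are made up for illustration:

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object ConnectorSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical contact point; point this at your own cluster.
    val conf = new SparkConf()
      .setAppName("cassandra-connector-sketch")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Execute a CQL statement through a connector-managed session.
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = " +
        "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    }

    // Read a (hypothetical) table as an RDD of CassandraRow and count it.
    val rows = sc.cassandraTable("demo", "products")
    println(rows.count())

    sc.stop()
  }
}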

10 Scala One Liners to Impress Your Friends

Here are 10 one-liners that show the power of Scala programming; impress your friends and woo women. OK, maybe not. Still, these one-liners are a good set of examples of functional programming and Scala syntax you may not be familiar with, and I feel there is no better way to learn than to see real examples.

Updated: June 17, 2011 - I'm amazed at the popularity of this post and glad everyone enjoyed it; it has since been duplicated across many other languages. I've included some of the suggestions to shorten up my Scala examples. Some I intentionally left longer as a way of explaining what the functions are doing, not necessarily to produce the shortest possible code, so I'll include both versions.

1. Multiply Each Item in a List by 2

The map function applies the given function to each element of the list. In this example we take each element and multiply it by 2, which returns a list of the same size as the original.
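
A minimal Scala sketch of the one-liner described above (the actual snippet is cut off in the gist preview):

(1 to 10).map(_ * 2)  // Vector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)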

// Spark shell session: read a Parquet dataset and cache it (the commented count shows its row count).
// mydf.count()
// 63385686
val mydf = sqlContext.read.parquet("ParaMarina/sensEnriched.parquet")
mydf.cache()
val r = scala.util.Random
import org.apache.spark.sql.functions.udf
@alonsoir
alonsoir / parse_properties.scala
Created May 27, 2017 20:33 — forked from ninthdrug/parse_properties.scala
Parsing Java properties files with Scala
import scala.io.Source.fromFile

// Parse a .properties file into a Map, ignoring comments and lines without '='.
def parseProperties(filename: String): Map[String, String] = {
  val lines = fromFile(filename).getLines.toSeq
  val cleanLines = lines.map(_.trim).filter(!_.startsWith("#")).filter(_.contains("="))
  cleanLines.map { line =>
    val Array(key, value) = line.split("=", 2)
    (key.trim, value.trim)
  }.toMap
}
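
A quick usage sketch, assuming a hypothetical properties file and key (not part of the original gist):

val props = parseProperties("/etc/myapp/app.properties")                    // hypothetical path
val dbUrl = props.getOrElse("db.url", "jdbc:postgresql://localhost/mydb")   // hypothetical key
println(dbUrl)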
@alonsoir
alonsoir / output
Last active September 11, 2017 09:24
playing with Word2Vec using Scala, spark-2.2.0 and my CV...
output of previous commands:
Text: [Alonso Isidoro Román.] =>
Vector: [-0.04789555072784424,-0.09852258116006851,0.13238833844661713]
Text: [(+34) 667 519 829 ♦ skype id: alonso.isidoro.roman] =>
Vector: [0.10119100660085678,-0.16546553373336792,-0.02654876373708248]
Text: [[email protected] ♦ http://www.linkedin.com/pub/alonso-isidoro-roman/45/574/8ab] =>
Vector: [0.14931941032409668,-0.11237160116434097,-0.040140967816114426]
@alonsoir
alonsoir / gist:bf284f731b3a2dd1fc1f4a66ade9a9c6
Created September 11, 2017 09:24
playing with my CV and Word2Vec
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.Row

// Run inside spark-shell, so the toDF implicit conversions are already in scope.
// Each line of the CV becomes a row whose "text" column is the line split into words.
val documentDF = sc.textFile("/home/aroman/Descargas/my-cv.txt")
  .map(line => line.split(" ").toSeq)
  .toDF("text")

// vectorSize is the dimensionality of the learned feature vectors (3 keeps the output
// short, matching the vectors in the previous gist); minCount drops words that appear
// fewer times than the threshold.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(3)
  .setMinCount(0)

val model = word2Vec.fit(documentDF)

try {
  // findSynonyms throws IllegalStateException when the word is not in the vocabulary.
  model.findSynonyms("alonso", 1).show()
} catch {
  case e: IllegalStateException => println("oooops!")
  case _: Throwable             => println("wtf!!!")
} finally {
  println("finally...")
}
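
The per-line vectors shown in the previous gist's output can be reproduced by transforming the documents with the fitted model; a minimal sketch using the documentDF and model defined above:

// Compute one averaged Word2Vec vector per row and print it in the
// "Text: [...] => Vector: [...]" shape seen in the output gist.
model.transform(documentDF).select("text", "result").collect().foreach {
  case Row(text: Seq[_], vector: Vector) =>
    println(s"Text: [${text.mkString(" ")}] => \nVector: $vector\n")
}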
@alonsoir
alonsoir / gist:9a3d76913b756661ad62f9fc54d0260f
Last active September 12, 2017 09:39
spark-shell command with the required dependencies, following the example from https://github.com/alonsoir/food2vec/blob/master/build.gradle
aroman@aroman:~/spark-distros/spark-2.2.0-bin-hadoop2.7$ clear && bin/spark-shell --driver-memory 1g --packages "org.deeplearning4j:deeplearning4j-core:0.9.1,org.deeplearning4j:deeplearning4j-ui:0.6.0,org.deeplearning4j:deeplearning4j-nlp:0.9.1,org.nd4j:nd4j-native:0.9.1,com.typesafe.scala-logging:scala-logging_2.11:3.7.2"
# To run crossdata in local mode:
aroman@aroman:~/spark-distros/spark-2.1.0-bin-hadoop2.7$ bin/spark-shell --driver-memory 1g --jars /home/aroman/stratio-internal-projects/crossdata-core_2.11-2.5.2-jar-with-dependencies.jar
@alonsoir
alonsoir / install-kafka.txt
Created May 2, 2018 11:38 — forked from jarrad/install-kafka.txt
Install Kafka on OSX via Homebrew
$> brew cask install java
$> brew install kafka
$> vim ~/bin/kafka
#!/bin/bash
# ~/bin/kafka: start ZooKeeper, then the Kafka broker.
zkServer start
kafka-server-start.sh /usr/local/etc/kafka/server.properties
@alonsoir
alonsoir / clean_docker.sh
Created May 25, 2018 16:14 — forked from urodoz/clean_docker.sh
Cleans the old images and exited containers
# Clean up exited containers
docker rm $(docker ps -a | grep Exit | cut -d ' ' -f 1)
# Clean up untagged ("<none>") images
docker rmi $(docker images | tail -n +2 | awk '$1 == "<none>" {print $3}')
# On Docker 1.9+ you can remove orphan (dangling) volumes with the next command
docker volume rm $(docker volume ls -qf dangling=true)