http://recsys.yoochoose.net/index.html
https://github.com/g-rutter/Clikistreams
https://github.com/opencypher/cypher-for-apache-spark
http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer
https://github.com/apache/tinkerpop/tree/master/spark-gremlin
https://github.com/datastax/graph-examples
https://www.datastax.com/dev/blog/dse-graph-frame
https://neo4j.com/docs/cypher-refcard/current/
https://developer.teradata.com/aster/articles/aster-npath-guide
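
These links all circle around path analysis over clickstream-style graphs (Cypher, Gremlin, DSE Graph Frames, Aster nPath). As a rough illustration of the kind of query they cover, a minimal motif search with GraphFrames, the library the DSE graph-frame post builds on (the toy data and local master are assumptions):

import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

val spark = SparkSession.builder().appName("paths").master("local[*]").getOrCreate()
import spark.implicits._

// toy clickstream: GraphFrame expects an "id" column on vertices, "src"/"dst" on edges
val vertices = Seq("home", "search", "item", "checkout").toDF("id")
val edges = Seq(("home", "search"), ("search", "item"), ("item", "checkout")).toDF("src", "dst")

val g = GraphFrame(vertices, edges)
// two-hop paths: the shape of query the Cypher/Gremlin/nPath links above express
g.find("(a)-[e1]->(b); (b)-[e2]->(c)").show()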

git clone https://github.com/jupyter-scala/jupyter-scala.git
cd jupyter-scala
./jupyter-scala
# inside jupyter
import $exclude.`org.slf4j:slf4j-log4j12`, $ivy.`org.slf4j:slf4j-nop:1.7.21` // for cleaner logs
import $profile.`hadoop-2.6`
import $ivy.`org.apache.spark::spark-sql:2.1.0` // adjust spark version - spark >= 2.0
import $ivy.`org.apache.hadoop:hadoop-aws:2.6.4`
import $ivy.`org.jupyter-scala::spark:0.4.2` // for JupyterSparkSession (SparkSession aware of the jupyter-scala kernel)
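
With those imports in place, the session comes from JupyterSparkSession rather than SparkSession.builder(); a minimal sketch along the lines of the jupyter-scala docs (master and app name are placeholders):

import jupyter.spark.session._

val spark = JupyterSparkSession.builder()
  .jupyter()           // jupyter-scala hook; call it immediately after builder()
  .master("local[*]")  // assumption: adjust to your cluster
  .appName("notebook")
  .getOrCreate()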

import org.apache.spark.sql.functions._

val df = spark
  .read
  .option("inferSchema", "true")
  .option("header", "true")
  .option("delimiter", ";")
  .csv("/Users/guilherme.braccialli/Desktop/simulado_1000_20k.csv")

val Array(dfTrain, dfTest) = df.randomSplit(Array(0.7, 0.3), seed=3)
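
One caveat: the two splits are re-sampled from df on every action, so if the source is re-read non-deterministically they can drift apart; caching the parent first is the usual precaution (the count check is just illustrative):

// pin the parent so repeated actions see the same split
df.cache()
println(s"train=${dfTrain.count}, test=${dfTest.count}")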

import scala.collection.parallel._
import org.apache.spark.sql.functions.lit
import spark.implicits._ // for (1 to 10).toDF

val df = (1 to 10).toDF
// (1 to 300).par, not Seq(1 to 300).par: the latter is a one-element collection holding the whole Range
val list = (1 to 300).par
list.tasksupport = new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(2)) // at most 2 Spark jobs in flight
val listR = list.map(l => df.withColumn("l", lit(l.toString)).groupBy("l").count.collect)
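
With several jobs submitted concurrently like this, Spark's default FIFO scheduler queues later jobs behind earlier ones unless they leave resources free; launching with spark.scheduler.mode=FAIR (a static setting, so it goes on spark-submit, not at runtime) shares executors across the parallel jobs more evenly.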

#%python
import base64
html64 = base64.b64encode("""
<div id=a>adfadfas</div>
<script>
function test(msg){
  document.getElementById('a').innerHTML = msg;
}
</script>
""")
# truncated in the original; presumably the encoded page is then rendered via a
# data-URI iframe, e.g. displayHTML("<iframe src='data:text/html;base64," + html64 + "'></iframe>") (Databricks-style; an assumption)

import sys, requests, json, time, datetime
from subprocess import call
def getLastEndJob(url, startTime):
    resp = requests.get(url + "/api/v1/applications")
    resp.encoding = 'utf-8'
    sparkuis = resp.json()
    kill = True
    msg = ""
    # truncated in the original; a plausible continuation (assumption): keep the app alive if any job completed after startTime
    for app in sparkuis:
        jobs = requests.get(url + "/api/v1/applications/" + app["id"] + "/jobs").json()
        for job in jobs:
            if job.get("completionTime", "") > str(startTime):
                kill = False
                msg = "job %s ended at %s" % (job["jobId"], job["completionTime"])
    return kill, msg

# collect worker hostnames from YARN, then fan out updated configs over scp/ssh and bounce the NodeManagers
yarn node -list|sed -n "s/^\(ip[^:]*\):.*/\1/p" > /home/hadoop/nodes.txt
< /home/hadoop/nodes.txt xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no /etc/hadoop/conf/mapred-site.xml_worker {}:/tmp/mapred-site.xml
< /home/hadoop/nodes.txt xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no {} "sudo cp -f /tmp/mapred-site.xml /etc/hadoop/conf/mapred-site.xml"
< /home/hadoop/nodes.txt xargs -t -I{} -P10 scp -o StrictHostKeyChecking=no /etc/hadoop/conf/yarn-site.xml_worker {}:/tmp/yarn-site.xml
< /home/hadoop/nodes.txt xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no {} "sudo cp -f /tmp/yarn-site.xml /etc/hadoop/conf/yarn-site.xml"
< /home/hadoop/nodes.txt xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no {} "sudo stop hadoop-yarn-nodemanager"
# (assumption) the matching restart, cut off in the original:
< /home/hadoop/nodes.txt xargs -t -I{} -P10 ssh -o StrictHostKeyChecking=no {} "sudo start hadoop-yarn-nodemanager"

spark.master                                                  yarn
spark.driver.memory                                           10G
spark.yarn.am.memory                                          2G
spark.executor.memory                                         20G
spark.executor.cores                                          4
spark.yarn.executor.memoryOverhead                            3G
spark.dynamicAllocation.enabled                               true
spark.dynamicAllocation.executorIdleTimeout                   60s
spark.dynamicAllocation.cachedExecutorIdleTimeout             60s
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
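
The same properties can also be set per application rather than cluster-wide; a minimal sketch mirroring a few of the values above (on YARN the master is normally supplied by spark-submit, not in code):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tuned-app") // app name is a placeholder
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.executor.memory", "20G")
  .config("spark.executor.cores", "4")
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()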

# option 1 - start jupyter via pyspark itself
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_DRIVER_PYTHON=/mnt/lib/python/anaconda2/bin/ipython
pyspark --queue queue3

# option 2 - vanilla jupyter, pointing the kernel at a local Spark plus extra jars
import os
os.environ["SPARK_HOME"] = "/Downloads/spark-2.2.1-bin-hadoop2.7/"
os.environ["SPARK_CLASSPATH"] = "/tmp/shared/postgresql-42.2.1.jar"
# then bootstrap pyspark from inside the notebook, e.g. import findspark; findspark.init() (assumption)

# optimize S3A for columnar / random-access reads
spark.hadoop.fs.s3a.experimental.input.fadvise random
# readahead window in bytes (64 MB)
spark.hadoop.fs.s3a.readahead.range 67108864
spark.hadoop.fs.s3a.connection.maximum 200
# connection timeouts in milliseconds
spark.hadoop.fs.s3a.connection.establish.timeout 2000000
spark.hadoop.fs.s3a.connection.timeout 2000000
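
These fs.s3a.* settings reach the Hadoop connector through the spark.hadoop. prefix, so any s3a read in the session picks them up. A minimal sketch (the bucket/path and the hadoop-aws dependency are assumptions; credentials come from the default provider chain):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3a-read").getOrCreate()

// random fadvise mainly pays off for columnar formats like Parquet/ORC
val df = spark.read.parquet("s3a://my-bucket/some/path/") // illustrative path
df.show(5)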