Tested with
Python 2.7, Ubuntu 16.04 LTS, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
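The pre-built download step can be sketched as follows; the archive URL and version pins are assumptions matching the versions listed above (check spark.apache.org/downloads.html for current mirrors):

```shell
# Fetch the pre-built Spark 2.1.0 / Hadoop 2.7 package from the Apache archive
SPARK_PKG="spark-2.1.0-bin-hadoop2.7"
wget "https://archive.apache.org/dist/spark/spark-2.1.0/${SPARK_PKG}.tgz"
tar xzf "${SPARK_PKG}.tgz"
# Put spark-shell, spark-submit, pyspark on the PATH for this session
export SPARK_HOME="$PWD/${SPARK_PKG}"
export PATH="$SPARK_HOME/bin:$PATH"
```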
[zookeeper]
hostname=slave1.example.com
hostname=slave2.example.com
hostname=slave3.example.com
port=2181
timeout=6
lock-path=/burrow/notifier

[kafka "XX-prod"]
broker=slave1.example.com
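The `[kafka]` section above is cut short; a fuller sketch of a Burrow legacy INI cluster section might look like the following. Key names are from the pre-1.0 Burrow config format and should be checked against the Burrow wiki; all values are placeholders:

```ini
[kafka "XX-prod"]
broker=slave1.example.com
broker-port=9092
zookeeper=slave1.example.com
zookeeper-port=2181
zookeeper-path=/kafka
offsets-topic=__consumer_offsets
```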
import org.elasticsearch.spark._
import org.apache.spark.sql._

// spark-shell already provides sqlContext; in a standalone app create one:
// val sqlContext = new SQLContext(sc)
val options = Map("pushdown" -> "true", "es.nodes" -> "host_ip_here", "es.port" -> "9200",
  "es.nodes.wan.only" -> "true")

// Read the index as a DataFrame and dump it to JSON files
sqlContext.read.format("es").options(options).load("index_name")
  .write.mode(SaveMode.Overwrite).json("path_to_output")

// Or read it as an RDD of (documentId, document) pairs
sc.esRDD("index_name", options)
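For these imports to resolve, the elasticsearch-hadoop connector has to be on the shell's classpath. One way to do that is shown below; the artifact coordinates are an assumption (Spark 2.x / Scala 2.11 build), so match the version to your Elasticsearch cluster:

```shell
# Pull the elasticsearch-spark connector from Maven Central at startup
spark-shell --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.2.0
```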
sudo su
mkdir -p /etc/elasticsearch/analysis
cd /etc/elasticsearch/analysis
wget http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
tar xvzf WNprolog-3.0.tar.gz
mv prolog/wn_s.pl .
rm -rf prolog
rm -f WNprolog-3.0.tar.gz
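The downloaded wn_s.pl file can then be referenced from a WordNet-format synonym token filter in the index settings. A sketch (the filter and analyzer names here are made up; the `"format": "wordnet"` option is what tells Elasticsearch to parse the Prolog file):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "wn_synonyms": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "analysis/wn_s.pl"
        }
      },
      "analyzer": {
        "synonyms": {
          "tokenizer": "standard",
          "filter": ["lowercase", "wn_synonyms"]
        }
      }
    }
  }
}
```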
Elasticsearch missing filter with nested objects
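A missing/exists check on a nested field has to be wrapped in a nested query so it runs against the nested documents rather than the root document. A sketch (field names are placeholders; note that parents with no nested objects at all will also match):

```json
{
  "query": {
    "bool": {
      "must_not": {
        "nested": {
          "path": "comments",
          "query": { "exists": { "field": "comments.author" } }
        }
      }
    }
  }
}
```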
Tested with
Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dfs.s3a.awsAccessKeyId=XXX -Dfs.s3a.awsSecretAccessKey=XXXX -m 250 hdfs:///data/* s3a://api-v3-data-sources/output/
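Note the credential property names: newer Hadoop releases renamed the s3a keys to the dotted form, while older s3a builds used the `awsAccessKeyId`-style names shown above. On a newer release the equivalent command would be (same placeholders; check the hadoop-aws docs for your version):

```shell
# distcp HDFS data to S3 using the renamed s3a credential properties
hadoop distcp \
  -Dmapreduce.map.memory.mb=4096 \
  -Dfs.s3a.access.key=XXX \
  -Dfs.s3a.secret.key=XXXX \
  -m 250 \
  hdfs:///data/* s3a://api-v3-data-sources/output/
```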
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh
bash Anaconda2-4.3.1-MacOSX-x86_64.sh
/Users/zeta/anaconda/bin/pip install matlab_kernel
/Users/zeta/anaconda/bin/python -m matlab_kernel install   # register the kernel with Jupyter (per the matlab_kernel docs)
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
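These steps put the packages at the root of the archive, which is the layout `spark-submit --py-files` (via Python's zipimport) expects. A quick local sanity check of that layout, using a stand-in package (`mypkg` is hypothetical):

```shell
# Build a fake dependency, zip it the same way, and confirm it imports
# straight from the archive root
mkdir -p dependencies/mypkg
printf 'VALUE = 42\n' > dependencies/mypkg/__init__.py
(cd dependencies && zip -qr ../dependencies.zip .)
PYTHONPATH=dependencies.zip python3 -c "import mypkg; print(mypkg.VALUE)"  # prints 42
```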