Tested with
Python 2.7, Ubuntu 16.04 LTS, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
[zookeeper]
hostname=slave1.example.com
hostname=slave2.example.com
hostname=slave3.example.com
port=2181
timeout=6
lock-path=/burrow/notifier

[kafka "XX-prod"]
broker=slave1.example.com
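The `[zookeeper]` section above repeats the `hostname` key once per host, which a standard INI reader would typically collapse or reject. As a minimal sketch, assuming the repeated keys are meant to form a list and that every host shares the single `port` value, the following Python snippet collects them and builds a ZooKeeper connection string. The `parse_sections` helper and the `SAMPLE` text are hypothetical and written only for illustration; they are not part of any tool's API.

```python
from collections import defaultdict

# Copy of the [zookeeper] section shown above, used as sample input.
SAMPLE = """\
[zookeeper]
hostname=slave1.example.com
hostname=slave2.example.com
hostname=slave3.example.com
port=2181
timeout=6
lock-path=/burrow/notifier
"""


def parse_sections(text):
    """Collect key=value pairs per [section], keeping repeated keys as lists.

    Hypothetical helper: a plain line-based reader, not a full config parser.
    """
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]  # section name, e.g. zookeeper
            continue
        if current is None or "=" not in line:
            continue  # ignore anything outside a section or without a value
        key, value = line.split("=", 1)
        sections[current][key.strip()].append(value.strip())
    return sections


config = parse_sections(SAMPLE)
zk = config["zookeeper"]

# Assumption: all hosts listen on the one configured port.
ensemble = ",".join("%s:%s" % (host, zk["port"][0]) for host in zk["hostname"])
print(ensemble)
```

Keeping every value as a list makes single-valued keys slightly clumsier to read (`zk["port"][0]`), but it avoids silently dropping hosts when a key is repeated.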
import org.elasticsearch.spark._
import org.apache.spark.sql._

// Uncomment if sqlContext is not already defined in your session.
//val sqlContext = new SQLContext(sc)
val options = Map("pushdown" -> "true", "es.nodes" -> "host_ip_here", "es.port" -> "9200",
  "es.nodes.wan.only" -> "true")
// Read the index as a DataFrame and dump it to JSON, overwriting any previous output.
sqlContext.read.format("es").options(options).load("index_name").write.mode(SaveMode.Overwrite).json("path_to_output")
// Alternatively, read the same index as an RDD.
sc.esRDD("index_name", options)
sudo su
mkdir -p /etc/elasticsearch/analysis
cd /etc/elasticsearch/analysis
wget http://wordnetcode.princeton.edu/3.0/WNprolog-3.0.tar.gz
tar xvzf WNprolog-3.0.tar.gz
mv prolog/wn_s.pl .
rm -rf prolog
rm -f WNprolog-3.0.tar.gz
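The commands above keep only `wn_s.pl`, the WordNet synonym file. To give a rough idea of what that file holds, here is a minimal Python sketch that groups words by synset. It assumes each line has the form `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).` and that a literal apostrophe inside a word is written as two single quotes. The sample lines are illustrative ones written in that assumed format, not verified against the downloaded file, and `synonym_groups` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Illustrative lines in the assumed wn_s.pl format (not checked against the real file).
SAMPLE = """\
s(100001740,1,'entity',n,1,11).
s(100001930,1,'physical entity',n,1,0).
s(100002137,1,'abstraction',n,6,0).
s(100002137,2,'abstract entity',n,1,0).
"""

# Fields: synset_id, w_num, 'word', ss_type, sense_number, tag_count
LINE_RE = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def synonym_groups(lines):
    """Return {synset_id: [words]} for synsets that contain more than one word."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like an s(...) fact
        word = match.group(3).replace("''", "'")  # undo the assumed quote escaping
        groups[match.group(1)].append(word)
    return {sid: words for sid, words in groups.items() if len(words) > 1}


groups = synonym_groups(SAMPLE.splitlines())
for synset_id, words in sorted(groups.items()):
    print(synset_id, ", ".join(words))
```

To inspect the real file, the same function can be fed an open file handle, for example `synonym_groups(open("/etc/elasticsearch/analysis/wn_s.pl"))`.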
Elasticsearch missing filter with nested objects
Tested with
Python 2.7, OS X 10.11.3 El Capitan, Apache Spark 2.1.0 & Hadoop 2.7
Download Apache Spark and build it or download the pre-built version.
hadoop distcp -Dmapreduce.map.memory.mb=4096 -Dfs.s3a.awsAccessKeyId=XXX -Dfs.s3a.awsSecretAccessKey=XXXX -m 250 hdfs:///data/* s3a://api-v3-data-sources/output/
wget https://repo.continuum.io/archive/Anaconda2-4.3.1-MacOSX-x86_64.sh
bash Anaconda2-4.3.1-MacOSX-x86_64.sh
/Users/zeta/anaconda/bin/pip install matlab_kernel
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
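Running `zip` from inside `dependencies` places each installed package at the root of the archive, which is what Python needs in order to import directly from a zip file. The sketch below reproduces that layout with the standard library and checks that the result is importable. The `demo_pkg` package and the `zip_directory_contents` helper are hypothetical stand-ins for real installed dependencies. The intended use of the archive is an assumption here: it could be put on `sys.path` as shown, or shipped with a Spark job via `spark-submit --py-files dependencies.zip`.

```python
import os
import sys
import tempfile
import zipfile


def zip_directory_contents(src_dir, zip_path):
    """Zip the contents of src_dir so that its entries sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to src_dir, mirroring `cd dependencies; zip -r ../x.zip .`
                zf.write(full, os.path.relpath(full, src_dir))


# Build a throwaway "dependencies" directory holding one tiny package.
work = tempfile.mkdtemp()
deps = os.path.join(work, "dependencies")
os.makedirs(os.path.join(deps, "demo_pkg"))
with open(os.path.join(deps, "demo_pkg", "__init__.py"), "w") as fh:
    fh.write("VALUE = 42\n")

archive = os.path.join(work, "dependencies.zip")
zip_directory_contents(deps, archive)

# A zip on sys.path is importable only if packages are at its root.
sys.path.insert(0, archive)
import demo_pkg

print(demo_pkg.VALUE)
```

Had the archive been created from the parent directory instead, every entry would carry a `dependencies/` prefix and `import demo_pkg` would fail.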