@zouzias
zouzias / lucenerdd-max-mind-cities.json
Last active October 17, 2016 21:06
MaxMind cities example
// Zeppelin notebook paragraph (extracted from the truncated notebook JSON):
// Add this in the interpreter
// %dep
// z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
// z.load("org.zouzias:spark-lucenerdd_2.11:0.2.1")

val citiesDF = sqlContext.read.parquet("s3://recordlinkage/world-cities-maxmind.parquet")
citiesDF.cache
val total = citiesDF.count

println(s"Cities: ${total}")

// Recorded output:
// citiesDF: org.apache.spark.sql.DataFrame = [Country: string, City: string ... 5 more fields]
// total: Long = 3173958
// Cities: 3173958
Originally:
https://gist.github.com/7565976a89d5da1511ce
Hi Donald (and Martin),
Thanks for pinging me; it's nice to know Typesafe is keeping tabs on this, and I
appreciate the tone. This is a Yegge-long response, but given that you and
Martin are the two people best-situated to do anything about this, I'd rather
err on the side of giving you too much to think about. I realize I'm being very
critical of something in which you've invested a great deal (both financially
<!-- for udp -->
<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashSocketAppender">
<host>logstash_server</host>
<port>logstash_port</port>
<encoder class="net.logstash.logback.encoder.LogstashEncoder" />
</appender>
<!-- for tcp -->
<appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
<host>logstash_server</host>
@zouzias
zouzias / lucenerdd-search-notebook.json
Created October 1, 2016 15:14
Spark LuceneRDD full text world cities search notebook
// Zeppelin notebook paragraph "Load Spark LuceneRDD Jars" (extracted from the truncated notebook JSON):
%dep
z.addRepo("Spark Packages Repo").url("http://dl.bintray.com/spark-packages/maven")
z.load("org.zouzias:spark-lucenerdd_2.11:0.2.0")

// Recorded result: ERROR -- "Must be used before SparkInterpreter (%spark) initialized
// Hint: put this paragraph before any Spark code and restart Zeppelin/Interpreter"
sed 's/^.\(.*\).$/\1/' filename.txt
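The one-liner above strips the first and last character of every line. A quick demonstration on a throwaway file (the filename and contents below are made up for the example):

```shell
# Create a small sample file (hypothetical contents, just for the demo).
printf 'abcde\n12345\n' > strip_demo.txt

# ^. matches the first character, \(.*\) greedily captures the middle,
# .$ matches the last character; the replacement keeps only the capture.
sed 's/^.\(.*\).$/\1/' strip_demo.txt
# prints:
# bcd
# 234
```

Note that lines shorter than two characters do not match the pattern and pass through unchanged.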
@zouzias
zouzias / convertCitiesToWKT.py
Created August 11, 2016 17:27
Convert GeoJSON data to WKT
import json
import geojson
from shapely.geometry import shape

# Get cities from https://github.com/mahemoff/geodata/blob/master/cities.geojson
with open('cities.geojson') as json_data:
    d = json.load(json_data)

# Print each city as "name<TAB>WKT": shapely's shape() parses the GeoJSON
# geometry dict, and .wkt serializes it to WKT.
for country in d['features']:
    name = country['properties']['city']
    wkt = shape(country['geometry']).wkt
    print('%s\t%s' % (name, wkt))
@zouzias
zouzias / bamboo-to-slack.py
Created June 24, 2016 11:58 — forked from remmelt/bamboo-to-slack.py
Post an Atlassian Bamboo build result to Slack
#!/usr/bin/python
"""
Create a stage in your project, make it the last stage.
Make a task in the stage with this inline script:
#! /bin/bash
/some/path/bamboo-to-slack.py "${bamboo.planKey}" "${bamboo.buildPlanName}" "${bamboo.buildResultsUrl}"
@zouzias
zouzias / compileSpark_2.11.sh
Created May 30, 2016 14:39
How to compile Spark with 2.11 support
git clean -ffxd
SPARK_VERSION="1.5.1"
git checkout -f "v$SPARK_VERSION" --
sh dev/change-scala-version.sh 2.11
export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m -XX:+CMSClassUnloadingEnabled"
mvn clean
./make-distribution.sh --tgz --skip-java-test -Pscala-2.11 -Pyarn -Phadoop-2.6 -DskipTests -Dspark.version=$SPARK_VERSION -Dscala.version=2.11.7 -DautoVersionSubmodules=true -U -Djline.version=2.13 -Djline.groupid=jline
@zouzias
zouzias / check_cron.json
Last active April 5, 2016 06:36
Install Sensu on Ubuntu
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handlers": ["default", "debug"]
    }
  }
}
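Before a check definition like the one above is dropped into Sensu's config directory (typically /etc/sensu/conf.d/ on Ubuntu), it is worth validating the JSON. A minimal sketch, reusing the gist's filename:

```shell
# Write the check definition (same content as the gist above).
cat > check_cron.json <<'EOF'
{
  "checks": {
    "cron": {
      "command": "check-process.rb -p cron",
      "standalone": true,
      "interval": 60,
      "handlers": ["default", "debug"]
    }
  }
}
EOF

# Fail fast on malformed JSON before the file ever reaches /etc/sensu/conf.d/.
python3 -m json.tool check_cron.json > /dev/null && echo "check_cron.json: valid JSON"
```

After copying the file into /etc/sensu/conf.d/, restart the sensu-client service so the check is loaded (the exact service name depends on the Sensu version installed).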
// Elasticsearch 2.x *copies* joda-time code and patches it into its own codebase, which causes several issues;
// see https://www.elastic.co/blog/to-shade-or-not-to-shade
// assemblyShadeRules in assembly := Seq(
// ShadeRule.rename("org.joda.time.base.**" -> "org.elasticsearch.joda.time.@1").inLibrary("org.elasticsearch" % "elasticsearch" % "2.1.1").inProject
// )