@ppearcy
ppearcy / gist:1661161
Created January 23, 2012 06:27
Tika PDFBox temporary file
public void parse(
InputStream stream, ContentHandler handler,
Metadata metadata, ParseContext context)
throws IOException, SAXException, TikaException {
File tmpFile = File.createTempFile("pdfbox-", ".tmp", null);
RandomAccess scratchFile = new RandomAccessFile(tmpFile, "rw");
PDDocument pdfDocument =
PDDocument.load(new CloseShieldInputStream(stream), scratchFile, true);
ppearcy / gist:4658564
Created January 28, 2013 20:06
Elasticsearch 0.19.3 crash after updating hunspell jar and then deleting an index
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00002aaaab028f31, pid=6221, tid=1210988864
#
# JRE version: 6.0_21-b06
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode linux-amd64 )
# Problematic frame:
# C [libzip.so+0xaf31]
#
ppearcy / gist:5294200
Created April 2, 2013 17:23
Code to dynamically set the number of shards per node for each elasticsearch index.
/**
* Iterate over the indexes and automatically set the index.routing.allocation.total_shards_per_node
* based on the total shards for the index and the number of data nodes that we have
*/
public void setTotalShardsPerNode() {
ClusterHealthResponse health = ESIndexer.es.client.admin().cluster().health(new ClusterHealthRequest()).actionGet();
// These values are used to decide what to do below
int numDataNodes = health.getNumberOfDataNodes();
int initShards = health.getInitializingShards();
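The preview above stops before the per-index calculation. As a rough sketch only, and not the author's code, one way to derive a value for index.routing.allocation.total_shards_per_node from an index's shard count and the number of data nodes is a ceiling division. The class name, method name, and parameters below are hypothetical, and the even-spread formula is an assumption.

```java
// Hypothetical helper, not part of the original gist.
class ShardsPerNode {

    /**
     * Assumed policy: spread all shard copies of one index evenly over the
     * data nodes, rounding up so every shard can still be allocated.
     */
    static int totalShardsPerNode(int primaryShards, int replicas, int numDataNodes) {
        if (numDataNodes <= 0) {
            throw new IllegalArgumentException("numDataNodes must be positive");
        }
        // Every primary has `replicas` additional copies.
        int totalShards = primaryShards * (1 + replicas);
        // Integer ceiling of totalShards / numDataNodes.
        return (totalShards + numDataNodes - 1) / numDataNodes;
    }
}
```

Rounding up rather than down matters here: a cap below the ceiling would leave some shards with no node allowed to host them.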
ppearcy / gist:5349534
Created April 9, 2013 21:24
Create a mapping and expand while allowing explicit field naming during expansion
curl -XDELETE 'http://localhost:9200/test/'
curl -XPOST localhost:9200/test/ -d '{
"mappings" : {
"test":{
"properties" : {
"multi_test" : {"type":"string", "index_name":"multi_test.string"}
}
}}
}
ppearcy / gist:5349703
Last active December 16, 2015 00:39
It looks like there can be some implicit conflicts around index_name and multi-fields. This is likely an edge case, but could cause some very funky behavior.
curl -XDELETE 'http://localhost:9200/test/'
curl -XPOST localhost:9200/test/ -d '{
"mappings" : {
"test":{
"properties" : {
"multi_test" : {"type":"string", "index_name":"multi_test.string"}
}
}}
}
ppearcy / gist:8480777
Created January 17, 2014 20:26
Ansible error
2014-01-17 15:18:01,404 p=5873 u=ppearcy |
2014-01-17 15:18:01,404 p=5873 u=ppearcy | /usr/local/Cellar/ansible/1.4.4/libexec/bin/ansible saf-local -vvvv -i ../hosts -m service -a name=isim state=started
2014-01-17 15:18:01,404 p=5873 u=ppearcy |
2014-01-17 15:18:01,440 p=5873 u=ppearcy | <192.168.100.10> ESTABLISH CONNECTION FOR USER: ppearcy
2014-01-17 15:18:01,440 p=5873 u=ppearcy | <192.168.100.10> EXEC ['ssh', '-tt', '-vvv', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=60s', '-o', 'ControlPath=/Users/ppearcy/.ansible/cp/ansible-ssh-%h-%p-%r', '-o', 'StrictHostKeyChecking=no', '-o', 'Port=22', '-o', 'KbdInteractiveAuthentication=no', '-o', 'PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey', '-o', 'PasswordAuthentication=no', '-o', 'ConnectTimeout=10', '192.168.100.10', "/bin/sh -c 'mkdir -p $HOME/.ansible/tmp/ansible-1389989881.44-54951323270350 && chmod a+rx $HOME/.ansible/tmp/ansible-1389989881.44-54951323270350 && echo $HOME/.ansible/tmp/ansible-1389989881.44-549513
{
"query" : {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{"term": {
"aggregations": {
"user_activity": {
"buckets": [
{
"key": "c2c5b6cb-d33a-4e94-bd6a-982b25c986e0",
"doc_count": 205,
"weekly": {
"buckets": [
{
"key_as_string": "2014-08-04T00:00:00.000Z",
ppearcy / gist:c5d969326b9e6ace8046
Created October 27, 2014 06:34
Elasticsearch get bound port
val nodeRequest = new NodesInfoRequestBuilder(testCluster.esClient.admin().cluster()).all()
val nodeResponse = nodeRequest.execute().get()
val remoteAddress = nodeResponse.remoteAddress()
val nodesInfo = nodeResponse.getNodes.toList.head
val inetString = nodesInfo.getTransport.address().publishAddress().toString
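// Matches a publish address of the form inet[/<ip>:<port>] and captures the port.
// Kept on one line: a multi-line literal would put literal newlines into the regex.
val pattern = """inet\[/\d+\.\d+\.\d+\.\d+:(\d+)\]""".r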
val finallyPort = pattern.findFirstMatchIn(inetString).get.group(1)
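For comparison, the same port extraction can be sketched in plain Java. This assumes the publish address prints in the form the regex above implies, for example inet[/127.0.0.1:9300]. The class and method names are hypothetical, and the sample address is illustrative rather than taken from a real cluster.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original gist.
class PublishAddressPort {

    // Same shape as the Scala pattern: inet[/<ipv4>:<port>], port captured in group 1.
    private static final Pattern INET =
            Pattern.compile("inet\\[/\\d+\\.\\d+\\.\\d+\\.\\d+:(\\d+)\\]");

    static int extractPort(String publishAddress) {
        Matcher m = INET.matcher(publishAddress);
        if (!m.find()) {
            // Fail with a readable message instead of an unchecked Option.get.
            throw new IllegalArgumentException("Unrecognized address: " + publishAddress);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Throwing an explicit exception on a failed match makes a format change in the address string easier to diagnose than a bare NoSuchElementException.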
ppearcy / gist:39e40489ab709e58d9a3
Created July 21, 2015 06:48
Schema with logicalType that can cause maven avro compiler error
{
"namespace": "schema.common",
"type": "record",
"name": "Action",
"fields": [
{
"name": "name",
"type": "string"
},
{