@srinivasanHadoop
srinivasanHadoop / TikaFileInputFormat.java
Created October 31, 2013 06:28
I integrated Apache Tika with Hadoop MapReduce code
package com.srini.tikacustom;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
@BHSPitMonkey
BHSPitMonkey / GIMP Batch Rotations.md
Last active December 26, 2015 04:59
GIMP Python Console: Batch Rotations

Ever wanted to rotate several layers at once? Today I figured out how to randomize a handful of layer rotations using the Python Console in the GIMP.

import random
import math

# Only one image was open, so I just grabbed the first image from the list
image = gimp.image_list()[0]
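
The preview stops before the rotation itself, so here is a minimal sketch of just the randomization step, not the gist's own code. The helper name `random_rotation_angles` and its `max_degrees` and `seed` parameters are hypothetical. Applying each angle to a layer would go through GIMP's procedure database in the Python Console, which is left out so the snippet runs in plain Python.

```python
import math
import random


def random_rotation_angles(layer_count, max_degrees=15.0, seed=None):
    """Return one random rotation angle (in radians) per layer.

    Hypothetical helper: angles are drawn uniformly from
    [-max_degrees, +max_degrees] and converted to radians.
    """
    rng = random.Random(seed)  # passing a seed makes the result repeatable
    return [
        math.radians(rng.uniform(-max_degrees, max_degrees))
        for _ in range(layer_count)
    ]


# Example: five layers, each tilted by at most 10 degrees either way.
angles = random_rotation_angles(5, max_degrees=10.0, seed=42)
for index, angle in enumerate(angles):
    print("layer %d: %.3f rad" % (index, angle))
```

In the console, `layer_count` would presumably be `len(image.layers)`, with each angle passed to whichever rotate procedure the installed GIMP version provides.
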
import spark.util.Vector
import scala.math.sqrt
def cosineDist(a:Vector,b:Vector):Double = {
if(a.length==b.length){
(a dot b)/(sqrt(a.squaredDist(Vector.zeros(a.length))*b.squaredDist(Vector.zeros(b.length))))
}
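
The expression above divides the dot product by the product of the two vector norms, which is the cosine similarity (1.0 for parallel vectors) rather than a distance. A minimal plain-Python sketch of the same formula follows. The name `cosine_similarity` is hypothetical, and raising an error on mismatched lengths or zero vectors is an assumption, since the preview cuts off before showing how the Scala code handles those cases.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    if len(a) != len(b):
        # Assumption: the truncated original does not show its else branch.
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq_a = sum(x * x for x in a)  # squared distance from the origin
    norm_sq_b = sum(y * y for y in b)
    if norm_sq_a == 0 or norm_sq_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
```

If a distance is what is wanted, one common convention is `1 - cosine_similarity(a, b)`.
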
@jpountz
jpountz / Recover.java
Last active December 22, 2015 10:48
File to restore a corrupted segment if the stored fields are not corrupted.
// Set codec, dir and segmentName according to the segment you are trying to restore
Codec codec = new Lucene42Codec();
Directory dir = FSDirectory.open(new File("/tmp/test"));
String segmentName = "_0";
IOContext ioContext = new IOContext();
SegmentInfo segmentInfos = codec.segmentInfoFormat().getSegmentInfoReader().read(dir, segmentName, ioContext);
Directory segmentDir;
if (segmentInfos.getUseCompoundFile()) {
segmentDir = new CompoundFileDirectory(dir, IndexFileNames.segmentFileName(segmentName, "", IndexFileNames.COMPOUND_FILE_EXTENSION), ioContext, false);
@mkhludnev
mkhludnev / q=*%3A*&wt=csv
Last active December 1, 2022 01:28
sample nested documents to demonstrate block join indexing and searching in Solr 4.5
id,_version_,BRAND_s,_root_,type_s,COLOR_s,SIZE_s
12,,,10,,Blue,XL
11,,,10,,Red,XL
10,1445176108735528960,Nike,10,parent,,
22,,,20,,Blue,XL
21,,,20,,Red,M
20,1445176108738674688,Nike,20,parent,,
32,,,30,,Blue,M
31,,,30,,Red,XL
30,1445176108740771840,Puma,30,parent,,
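
To make the nesting in this sample easier to see, here is a small sketch that regroups the rows into parent/child blocks using the `_root_` column and then finds parents that have a child matching a color and size. It only mimics the idea of a block join in plain Python and says nothing about how Solr implements it. The function names `group_blocks` and `parents_with_child` are hypothetical.

```python
import csv
import io
from collections import defaultdict

SAMPLE_CSV = """\
id,_version_,BRAND_s,_root_,type_s,COLOR_s,SIZE_s
12,,,10,,Blue,XL
11,,,10,,Red,XL
10,1445176108735528960,Nike,10,parent,,
22,,,20,,Blue,XL
21,,,20,,Red,M
20,1445176108738674688,Nike,20,parent,,
32,,,30,,Blue,M
31,,,30,,Red,XL
30,1445176108740771840,Puma,30,parent,,
"""


def group_blocks(csv_text):
    """Group rows by _root_ into {'parent': row, 'children': [rows]}."""
    blocks = defaultdict(lambda: {"parent": None, "children": []})
    for row in csv.DictReader(io.StringIO(csv_text)):
        block = blocks[row["_root_"]]
        if row["type_s"] == "parent":
            block["parent"] = row
        else:
            block["children"].append(row)
    return dict(blocks)


def parents_with_child(blocks, color, size):
    """Return root ids whose block has a child matching both fields."""
    return sorted(
        root
        for root, block in blocks.items()
        if any(
            child["COLOR_s"] == color and child["SIZE_s"] == size
            for child in block["children"]
        )
    )


blocks = group_blocks(SAMPLE_CSV)
print(parents_with_child(blocks, "Blue", "XL"))  # prints ['10', '20']
```

Both conditions must hold on the same child row, which is why parent 30 (Blue M and Red XL) is not returned for Blue XL.
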
@zacstewart
zacstewart / classifier.py
Last active September 19, 2024 23:56
Document Classification with scikit-learn
import os
import numpy
from pandas import DataFrame
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.cross_validation import KFold
from sklearn.metrics import confusion_matrix, f1_score
NEWLINE = '\n'
# coding=UTF-8
from __future__ import division
import re
# This is a naive text summarization algorithm
# Created by Shlomi Babluki
# April, 2013
class SummaryTool(object):
@ccstone
ccstone / BBEdit-TextWrangler_RegEx_Cheat_Sheet.txt
Last active June 15, 2025 17:57
BBEdit-TextWrangler Regular Expression Cheat-Sheet
————————————————————————————————————————————————————————————————————————————————————————————————————
BBEdit / BBEdit-Lite / TextWrangler Regular Expression Guide Modified: 2018/08/10 01:19
————————————————————————————————————————————————————————————————————————————————————————————————————
NOTES:
The PCRE engine (Perl Compatible Regular Expressions) is what BBEdit and TextWrangler use.
Items I'm unsure of are marked '# PCRE?'. The list, while fairly comprehensive, is not complete.
@omz
omz / Evernote Installer.py
Created February 27, 2013 15:07
Evernote Installer
# Simple installer script for using the Evernote SDK in Pythonista
#
# This script should be run from the root directory. In order to keep things
# tidy, it installs the module and all its dependencies in a directory named
# 'evernote-sdk'. In order to be able to import it, you have to add that to
# your import path, like this:
#
# import sys
# sys.path.append('evernote-sdk')
#
@benwaldie
benwaldie / 2013-02-24-TUAW_Waldie-1.applescript
Created February 24, 2013 20:33
TUAW > AppleScript > Generate OmniFocus Email Followups from Contacts
-- "using terms from" is necessary to let AppleScript know that these event handlers are terminology that belongs to the Contacts app.
using terms from application "Contacts"
-- This handler returns the Contacts property for which the plug-in should function.
on action property
return "email"
end action property
-- This handler returns the name of the plug-in to be displayed in the Contacts property popup menu.
on action title