@smerchek
smerchek / clean.sh
Created June 5, 2013 21:52
Remove all but most recent number of files in a directory
# Keep 4 most recent files/sub-directories
# source: http://superuser.com/a/260332
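# ls -t sorts by modification time, newest first; tail -n +5 passes on everything after the first 4
ls -t1 | tail -n +5 | xargs rm -rf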
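
The one-liner acts on the current directory and splits names on whitespace when they reach `xargs`. A minimal sketch of a more defensive variant follows. The function name `keep_newest` and its two arguments are hypothetical and not part of the original gist, and names containing newlines are still not handled.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist): delete everything in
# directory $1 except the $2 most recently modified entries.
keep_newest() {
    dir=$1
    keep=$2
    # Run in a subshell so the caller's working directory is unchanged.
    (
        # Stop before deleting anything if the directory is not reachable.
        cd "$dir" || exit 1
        # ls -t lists newest first; tail -n +N starts output at line N,
        # so N = keep + 1 skips the entries we want to keep.
        ls -t | tail -n "+$((keep + 1))" | while IFS= read -r name; do
            # Quoting keeps names with spaces intact; -- guards against
            # names that begin with a dash.
            rm -rf -- "$name"
        done
    )
}
```

Calling `keep_newest /var/backups/app 4` would leave the four newest entries of that (example) directory in place. Like the original, it ignores dotfiles because `ls` does not list them by default.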

Using the JSON output of a Solr field analysis to visualize the Lucene indexing/analysis pipeline.

@smerchek
smerchek / setup.sh
Created May 9, 2013 19:12
setting up a machine to build deb packages with fpm and fpm-cook
#setting up a machine to build deb packages
#http://lenni.info/blog/2012/05/installing-ruby-1-9-3-on-ubuntu-12-04-precise-pengolin/
sudo apt-get update
sudo apt-get install ruby1.9.1 ruby1.9.1-dev \
rubygems1.9.1 irb1.9.1 ri1.9.1 rdoc1.9.1 \
build-essential libopenssl-ruby1.9.1 libssl-dev zlib1g-dev
sudo update-alternatives --install /usr/bin/ruby ruby /usr/bin/ruby1.9.1 400 \
--slave /usr/share/man/man1/ruby.1.gz ruby.1.gz \
@smerchek
smerchek / lucene-analysis.md
Last active December 17, 2015 02:39
Strangeloop proposals

##Breaking Down the Lucene Analysis Process

The Lucene analysis process is very powerful, but most of us only know enough of the basics to put together a simple analyzer chain. Search isn't always plug-and-play, and the ability to manipulate and compose tokenizers and token filters will be the differentiator in developing your search product.

Using visualizations of the analysis chain, I will break down the Lucene analysis process to its most basic parts: char filters, tokenizers, and token filters. I'll show how differences in the composition of the token filters affect the final output. We'll also see that tokens are more than just a stream: with synonyms and generated word parts, they can form a token graph.
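
As a rough analogy only, the three stages can be imitated with a Unix pipeline. The sketch below uses standard text tools, not Lucene classes, and the function names, stop-word list, and sample text are all assumptions made for illustration. It shows why the order of token filters matters: lowercasing before or after the stop-word filter changes the output.

```shell
#!/bin/sh
# Toy analogy only: these are Unix text tools, not Lucene classes.

# "Char filter": rewrite the raw character stream (strip HTML-like tags).
char_filter() { sed 's/<[^>]*>//g'; }
# "Tokenizer": split on non-alphanumerics, one token per line.
tokenizer() { tr -cs '[:alnum:]' '\n' | sed '/^$/d'; }
# "Token filters": each one transforms or drops whole tokens.
lowercase() { tr '[:upper:]' '[:lower:]'; }
stop_filter() { grep -vx -e 'the' -e 'a'; }

text='<b>The</b> Quick fox'

# Lowercase first: "The" becomes "the" and the stop filter removes it.
chain_a=$(printf '%s\n' "$text" | char_filter | tokenizer | lowercase | stop_filter | paste -sd ' ' -)

# Stop filter first: "The" does not match "the", so it survives.
chain_b=$(printf '%s\n' "$text" | char_filter | tokenizer | stop_filter | lowercase | paste -sd ' ' -)

echo "lowercase then stop: $chain_a"
echo "stop then lowercase: $chain_b"
```

A real analyzer also tracks positions and offsets for each token, which is what makes token graphs possible; a plain line stream like this one cannot represent them.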

##Reviewer Comments

I've been working directly with Lucene for the past year, implementing Softek's proprietary ranking algorithm for searching radiology documents. In the process, I've submitted patches or extended core Lucene and Solr code. I've implemented our own query parser extension and

@smerchek
smerchek / puppet-session-proposal.md
Last active December 11, 2015 21:19
Session Proposal - Puppet Intro

####DevOps: Automating Your Infrastructure with Puppet

Puppet is an open source project built by PuppetLabs (http://puppetlabs.com) to automate the management of your IT infrastructure. Whether you manage a hosted environment or run your own servers in-house, Puppet can help alleviate management headaches. Puppet lets you declaratively describe what a machine should look like, then makes it happen (and makes sure it stays that way). This talk will go over the basics of Puppet, including how to get started, the essentials of Puppet modules, using existing modules on the Puppet Forge, and running Puppet on Windows. It will also touch on how to write a basic module.

@smerchek
smerchek / advanced-solr-session.md
Last active December 11, 2015 21:19
Session Proposal - Advanced Solr and Lucene

####Beyond the Basics: Lucene and Solr

If you are already using Lucene and/or Solr (or even ElasticSearch), then this is the talk for you. We will go beyond the basics of these brilliant open source search platforms. There are many ways to customize Solr through the standard configuration file, but there is so much more. Payloads offer many possibilities for customization, including the ability to tag words with part-of-speech information. There are also many ways to extend Lucene and Solr by creating your own filters, query parsers, tokenizers, token filters, and even highlighters with some simple Java code. If search is a core feature of your application, then you need to be using these advanced features to set yourself apart.

@smerchek
smerchek / gist:4032417
Created November 7, 2012 15:56
Linux: Run a shell script with an answer file
#each line in answers.txt represents a single answer.
sh pkg.sh < answers.txt
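
A minimal, self-contained sketch of the same redirection follows. The `pkg.sh` created here is a hypothetical stand-in that asks two questions with `read`; the real installer, its prompts, and the sample answers are assumptions.

```shell
#!/bin/sh
# Hypothetical stand-in for pkg.sh: it asks two questions via read.
workdir=$(mktemp -d)
cat > "$workdir/pkg.sh" <<'EOF'
# Prompts go to stderr so stdout carries only the result line.
printf 'Install directory? ' >&2
read -r install_dir
printf 'Continue? [y/n] ' >&2
read -r confirm
echo "dir=$install_dir confirm=$confirm"
EOF

# One answer per line, in the order the prompts appear.
printf '%s\n' '/opt/demo' 'y' > "$workdir/answers.txt"

# Redirect stdin from the answer file, as in the gist.
result=$(sh "$workdir/pkg.sh" < "$workdir/answers.txt")
echo "$result"

rm -rf "$workdir"
```

Each `read` consumes one line of the file, so the answers must be listed in prompt order. This only works for scripts that read from standard input; programs that open the terminal directly will not see the file.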
@smerchek
smerchek / site.pp
Created November 7, 2012 15:53
Puppet: Install a package with an answer file
#source: http://projects.puppetlabs.com/projects/1/wiki/debian_preseed_patterns
file { "/tmp/file.preseed":
source => 'puppet:///modules/modulename/file.preseed',
mode => '0600',
backup => false,
}
package { 'packagename':
responsefile => '/tmp/file.preseed',
@smerchek
smerchek / site.pp
Created November 7, 2012 15:49
Puppet: Require apt-get update before packages
#source: http://johnleach.co.uk/words/771/puppet-dependencies-and-run-stages
exec { "apt-update":
command => "/usr/bin/apt-get update"
}
Exec['apt-update'] -> Package <| |>
@smerchek
smerchek / index.html
Created June 12, 2012 02:54
Comment Activity on the Trello Development Board in the Last 30 Days
<!DOCTYPE html>
<meta charset="utf-8">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<script src="http://d3js.org/d3.v2.js?2.9.1"></script>
<script src="https://raw.github.com/timrwood/moment/1.6.2/min/moment.min.js"></script>
<style>
html {
font-family: Arial, Helvetica, sans-serif;
font-size: 10pt;
}