Last active
December 22, 2015 07:49
Simple asynchronous batch indexing with Sunspot. Currently an untested work in progress; I expect to refactor it and contribute it to Sunspot proper.
# rails generate migration add_indexed_at_to_searchable_models
class AddIndexedAtToSearchableModels < ActiveRecord::Migration
  TABLES = [ :articles, :authors, :comments ]

  def self.up
    TABLES.each do |name|
      change_table(name) do |t|
        # A datetime, so it can be compared against updated_at
        t.datetime :indexed_at
      end
    end
  end

  def self.down
    TABLES.each do |name|
      change_table(name) do |t|
        t.remove :indexed_at
      end
    end
  end
end
# lib/tasks/indexer.rb
# NOTE: Rails only loads *.rake files from lib/tasks, so this file likely
# needs to be saved as lib/tasks/indexer.rake.
namespace :indexer do
  desc "Poll for new or updated records and index them with Sunspot"
  task :run => :environment do
    verbose  = true
    interval = 10.seconds

    # Load the requested models, or all of Sunspot's searchable models
    # TODO: error handling for invalid models or models not found
    models = if ENV['MODELS']
      ENV['MODELS'].split(',').map { |m| m.strip.constantize }
    else
      Sunspot.searchable
    end

    if verbose
      puts "Requested the following models: #{models.map(&:name).join(', ')}"
    end

    # Filter the models to those with an indexed_at column
    models = models.select { |m| m.column_names.include?("indexed_at") }

    # Warn and exit if we don't have any models to work with
    if models.blank?
      puts "Your models must provide an indexed_at timestamp field"
      exit(1)
    end

    # Infinite loop to look for new and updated objects and reindex them
    loop do
      last_run = Time.now
      models.each do |model|
        # Find batches of records that have never been indexed, or have been
        # updated since they were last indexed.
        model.where('indexed_at IS NULL OR updated_at > indexed_at').find_in_batches do |batch|
          batch = batch.select { |record| record.indexable? }
          next if batch.empty?
          Sunspot.index(batch)
          # find_in_batches yields an Array, which has no update_all, so
          # update the matching rows through the model instead.
          model.where(:id => batch.map(&:id)).update_all(:indexed_at => Time.now)
        end
      end

      # Poll on an interval
      elapsed = Time.now - last_run
      sleep(interval - elapsed) if elapsed < interval
    end
  end
end
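The two decisions the loop makes (which records count as stale, and how long to sleep between passes) can be tried out without Rails or Solr. This is a minimal plain-Ruby sketch, not part of Sunspot: `Record`, `needs_indexing?` and `remaining_sleep` are hypothetical names introduced here for illustration only.

```ruby
# Hypothetical stand-in for a database row; the fields mirror the columns
# the task relies on, but the struct itself is an assumption.
Record = Struct.new(:updated_at, :indexed_at)

# Mirrors the SQL condition 'indexed_at IS NULL OR updated_at > indexed_at'.
def needs_indexing?(record)
  record.indexed_at.nil? || record.updated_at > record.indexed_at
end

# Seconds left to sleep so that passes start at least `interval` apart.
# Never negative, so a slow pass is followed immediately by the next one.
def remaining_sleep(interval, elapsed)
  [interval - elapsed, 0].max
end

now = Time.at(1_000_000)
records = {
  "never indexed"       => Record.new(now, nil),
  "updated after index" => Record.new(now, now - 60),
  "already up to date"  => Record.new(now - 60, now)
}

records.each do |label, record|
  puts "#{label}: #{needs_indexing?(record)}"
end
puts "sleep after a 3.5s pass: #{remaining_sleep(10, 3.5)}"
puts "sleep after a 12s pass: #{remaining_sleep(10, 12)}"
```

One consequence worth noting: records that fail `indexable?` never get an `indexed_at` value, so they match the query again on every pass.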
Let me stress: this is an untested rough draft, for inspirational and discussion purposes only.
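In the same discussion spirit, here is one possible way to approach the TODO about invalid or missing models. It is a sketch in plain Ruby that uses `Object.const_get` in place of ActiveSupport's `constantize` so it runs on its own. `resolve_models` and the `Article` and `Author` stand-in classes are hypothetical, not part of Sunspot or Rails.

```ruby
# Hypothetical helper: resolve a comma-separated list of class names and
# report the names that could not be found instead of raising.
def resolve_models(list)
  found = []
  missing = []
  list.to_s.split(',').map(&:strip).reject(&:empty?).each do |name|
    begin
      found << Object.const_get(name)
    rescue NameError
      # Raised for unknown constants and for names that are not valid constants
      missing << name
    end
  end
  [found, missing]
end

# Stand-in classes so the sketch runs without Rails.
class Article; end
class Author; end

found, missing = resolve_models('Article, Author, Nope')
puts "Found: #{found.map(&:name).join(', ')}"
puts "Missing: #{missing.join(', ')}"
```

In the rake task, the `missing` list could be printed as a warning before carrying on with whatever models were found.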