https://github.com/ankane/searchkick
By default, simply adding the call 'searchkick' to a model will naively index all of the model's own fields (but not has_many or belongs_to associations).
In practice, you'll need to customize what gets indexed. This is done by defining a method on your model called search_data:
def search_data
  {
    id: id,
    stringified_id: id.to_s,
    tags: tags.join(" "),
    user: user.full_name,
    pass_rate: calculate_pass_rate
  }
end
When you change the structure of the search_data hash, you'll need to reindex that model. You can do that in the Rails console by typing Model.reindex, use the rake task searchkick:reindex:all, or reindex just one specific model.
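For reference, the rake invocations look roughly like this (task names as given in the searchkick README):

```
rake searchkick:reindex:all
rake searchkick:reindex CLASS=Post   # reindex one specific model
```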
In all recent versions of Elasticsearch, you need to explicitly specify the fields you'll search.
Your search should look something like this:
ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])
By default, an integer field can only be searched as an integer, but if you coerce the field to a string it becomes searchable with full-text search.
def search_data
  {
    id: id,
    stringified_id: id.to_s
  }
end
Your search should look something like this:
ModelName.search(query, fields: ['stringified_id', 'name', 'description', ...])
It's worth noting that because you can index non-string types (including arrays of non-string types), it sometimes comes in handy to do more of your searching/filtering in Elasticsearch than in Postgres. You can combine a full text query with filters on specific fields.
def search_data
  {
    blog_id: blog_id,
    author: user.name,
    author_id: user.id,
    publish_year: publish_at.year,
    publish_month: publish_at.month,
    publish_day: publish_at.day,
    publish_at: publish_at,
    created_at: created_at,
    updated_at: updated_at,
    tags: tag_list,
    story: story,
    title: title,
    approved: approved
  }
end
Then you can build a flexible, type-aware search that still does full-text search on some fields, like title and story:
search_params = { approved: true, publish_at: { lte: 'now/m' } }
search_params = search_params.merge(blog_id: @blog.id) if @blog.present?
search_params = search_params.merge(publish_year: @year) if @year.present?
search_params = search_params.merge(publish_month: @month) if @month.present?
search_params = search_params.merge(publish_day: @day) if @day.present?
search_params = search_params.merge(tags: {all: @tags}) if @tags.present?
if @query.present?
  @posts = Post.search(@query, fields: [:title, :story], where: search_params, page: params[:page])
else
  @posts = Post.search("*", fields: [:title, :story], where: search_params, page: params[:page], per_page: 20,
    order: [{publish_at: :desc}])
end
logger.info({ query: @query, params: search_params })
Define a scope named search_import, and invoke the appropriate #joins or #includes.
scope :search_import, -> { includes(study_tracking: :study_tracking_details) }
Similar to the above solution, define the scope with a where clause:
scope :search_import, -> { where(deleted: false) }
However, this scope is used only for batch import. When an individual entity is saved, it is updated separately, so you'll also want to implement:
def should_index?
!deleted
end
By default, misspelling-tolerant search is turned on in Searchkick. So the two ways to reduce unwanted search results are to turn off or tune the misspelling tolerance, or to filter results by relevancy score.
For example: UserCourse.search(params[:query], fields: ["name^5", "id"], misspellings: {below: 5})
Alternatively, MyModel.search(query, body_options: {min_score: 1}) tunes out a lot of noise.
Make sure the index includes this configuration for the field you want:
searchkick word_start: [:name]
Match word start on a specific search:
UserCourse.search(query,
  fields: ['stringified_id', 'name', 'description', ...],
  match: :word_start
)
Make sure the index includes this configuration for the field you want:
searchkick word_middle: [:name]
Then for the search:
UserCourse.search(query,
  fields: ['stringified_id', 'name', 'description', ...],
  match: :word_middle
)
Don't match a "新潟大学" (Niigata University) organization with the query "新人" (newcomer), i.e. disabling ambiguity.
UserCourse.search(query,
  fields: [{email: :exact}, :name],
  match: :word_middle
)
This is a case sensitive search, however, and probably not exactly what you want. More likely you'll want a tokenizer that treats an email address as a single word, which is a little more complicated. The article below covers this, but it requires a custom mapping and a reconfigured analyzer to implement.
https://medium.com/linagora-engineering/searching-email-address-in-elasticsearch-3b09a11e3c2b
This will require something like:
searchkick merge_mappings: true, mappings: {...}
And may require using an explicit search body.
However, one solution that avoids this complexity is to use the exact matching above, index the field as lowercase, and perhaps pre-filter strings that look like email addresses in queries to lowercase.
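As a sketch of that pre-filtering idea (the regex and method name here are illustrative, not part of Searchkick):

```ruby
# Illustrative pre-filter: lowercase any token that looks like an email
# address before handing the query to Searchkick. The pattern is a rough
# heuristic, not a full RFC 5322 validator.
EMAIL_LIKE = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

def normalize_query(query)
  query.split.map { |t| t.match?(EMAIL_LIKE) ? t.downcase : t }.join(" ")
end

normalize_query("Emails From [email protected] Today")
# => "Emails From [email protected] Today"
```

You would then pass the normalized string to UserCourse.search with the exact-match field, as above.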
By default, searches are case insensitive. To override that for everything, you can alter the searchkick call: searchkick case_sensitive: [:field, :list], or use exact matching:
UserCourse.search(query,
  fields: [{my_field: :exact}, :other_field]
)
While there's reasonable support out of the box for Japanese search, you can get additional features with the elasticsearch analysis-kuromoji plugin.
searchkick language: "japanese"
If you go down this route, and want to support multiple analyzers, you need to use the searchkick mappings feature and multiple fields. It's not terribly hard, but it's more involved than a quick FAQ can handle.
See https://www.elastic.co/guide/en/elasticsearch/guide/current/mixed-lang-fields.html for some possible options, and the searchkick docs for how to do custom mappings and custom/advanced search.
Generally combinations are supported by choosing the right field to query. Most of the parameters that normally take a symbol can be replaced with a hash from that symbol to various options. ( https://github.com/ankane/searchkick will have better examples than I can provide).
In principle, you can create several fields, each with its own analyzer and behavior. When you build the search call, you can combine options. I'm not aware of specific incompatibilities, but relevancy weighting may seem better or worse depending on the user's expectations. So if you have a dilemma about how to search something, you could potentially index the same data under very dissimilar pseudo-fields with different search rules, and include all of them, with potentially different boosting rules, in your search call.
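For illustration, since the fields list is plain data, you can assemble dissimilar pseudo-fields with different rules and boosts programmatically before the search call (the field names here are hypothetical, and assume the index declares the matching word_start configuration):

```ruby
# Hypothetical: combine a boosted full-text field, an exact-match field,
# and a prefix-match field into one fields list for a single search call.
def build_search_fields(title_boost: 5)
  [
    "title^#{title_boost}",   # boosted full-text match
    { email: :exact },        # exact matching on email
    { name: :word_start },    # prefix matching on name
    :story                    # plain full-text match
  ]
end

build_search_fields
# => ["title^5", {:email=>:exact}, {:name=>:word_start}, :story]
```

You would then call something like Post.search(query, fields: build_search_fields).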
In conjunction with a scheduled background job, you can call ModelName.reindex(:custom_reindexer) and have a method like this that returns only the fields that need special treatment:
def custom_reindexer
  {
    just_the_field_that_matters: calculation_method
  }
end