Elasticsearch in Rails
Elasticsearch is a modern full-text search engine with a RESTful interface that is very customizable, easy to use, easy to set up, fast and scalable. Here I show how I set it up in my Rails environment.
- Powerful query language (nested queries, similarity queries, …)
- Realtime indexing (1 second delay by default)
- Configurable and accessible via REST and JSON
- ES is a key-value-store and can actually be used as such
- Schema-free
- An ES node automatically discovers other nodes in the same network and joins their cluster
- Sharding and replication are built in
- Easy to install and run
First download it:
cd
VERSION=elasticsearch-0.90.7
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/$VERSION.zip
unzip $VERSION.zip
Store the configuration files in your rails project:
$RAILS_APP
  app
  config
    services
      development
        $VERSION
          elasticsearch.yml
          logging.yml
      test
      production
I use separate configuration files for each environment. I also run separate instances for each environment. By using separate config files you can optimize the configuration for each environment. E.g. the test environment does not need replicas and can also be a pure in-memory database since it gets wiped after the tests anyway. Copy the config files to your rails project:
cd $VERSION/config
mkdir -p $RAILS_APP/config/services/development/$VERSION
cp elasticsearch.yml $RAILS_APP/config/services/development/$VERSION
Some things to tweak in the configuration:
# elasticsearch.yml
cluster.name: my_app_development # use a unique name so you don't accidentally join another cluster
node.name: '...'
index.store.type: memory # for test
index.refresh_interval: -1 # for test
index.number_of_shards: 1
index.number_of_replicas: 0
network.host: 127.0.0.1
http.port: 9210 # for test
discovery.zen.ping.multicast.enabled: false
Start elasticsearch with a custom configuration and generate a pid-file:
$ES_DIR/bin/elasticsearch -f -Des.path.conf=$RAILS_APP/config/services/test/elasticsearch -p $RAILS_APP/tmp/pids/elasticsearch_test.pid
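The pid-file written by the -p option makes it easy to check from Ruby whether the instance is still alive, for example before running the test suite. The following is only a minimal sketch; the helper name elasticsearch_running? is an assumption and not part of Elasticsearch or any gem:

```ruby
# Returns true if the process named in the pid-file is alive.
# `elasticsearch_running?` is a hypothetical helper, not a library method.
def elasticsearch_running?(pid_file)
  return false unless File.exist?(pid_file)

  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 sends nothing, it only checks that the process exists
  true
rescue Errno::ESRCH, ArgumentError
  false # no such process, or the file did not contain a valid pid
rescue Errno::EPERM
  true # the process exists but belongs to another user
end
```

It could be called with the same path that was passed to -p, e.g. the elasticsearch_test.pid file in tmp/pids.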
In your (rails) project add the stretcher-gem to your Gemfile:
gem 'stretcher'
Then write some initializer for ES:
#/config/initializers
module Search
  if Rails.env.test?
    ADDRESS = 'http://localhost:9210'
    OPTIONS = {:log_level => :debug}
  else
    ADDRESS = 'http://localhost:9200'
    OPTIONS = {}
  end

  def self.initialize_connection
    server = Stretcher::Server.new(ADDRESS, OPTIONS)
    server.up?
    return server
  end

  def self.connection
    @connection ||= initialize_connection
  end
end
Create a directory for your search related classes in your (rails) project:
mkdir -p $RAILS_APP/app/classes/search
The classes I use to implement all search-related features are the following:
Index # Defines an index (like a sql database)
IndexType # Defines an index type for a model (like a sql table)
Definition # A class that defines the mapping for a specific model/index
DocumentMapper # Converts a model to a JSON document
Engine # Base class for model-specific engines
Query # Used to construct an ES query
Highlighter # Defines which parts of a search result should be highlighted
SearchResultAdapter # Wrapper around a stretcher result that loads the model instances
FullReindexer # Reindexes all documents, is run at night or after deploys
DirtyReindexer # Reindexes changed documents
The Index-class is responsible for defining, creating and deleting the index, as well as indexing records in bulk. Both Index and IndexType are wrappers around the corresponding Stretcher-classes. The mapping from a database record to JSON is done by the DocumentMapper-class. For each model that has to be indexed in ES there is a corresponding DocumentMapper-class. There can be multiple DocumentMappers for a single model, so a model can be indexed in different ways and into different indices.
# Base class for all index-type-mapping-definitions
class Search::Definition
  def general_type(type)
    { type: type, include_in_all: false }
  end

  def string_type
    general_type('string')
  end

  def integer_type
    general_type('integer')
  end

  def to_hash
    raise NotImplementedError
  end
end
This is straightforward. It declares a to_hash method and some helper methods that can be used in subclasses of this class. The to_hash method returns the raw definition hash that can be passed to ES.
A concrete definition would then look like this:
# Defines the mapping for users
class Search::Users::Definition < Search::Definition
  def properties
    {
      id: integer_type,
      username: string_type
    }
  end

  def to_hash
    {
      settings: { ... },
      mappings: {
        :User => {
          dynamic: 'strict',
          properties: properties
        }
      }
    }
  end
end
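To make the shape of the resulting hash visible, here is a condensed, self-contained version of the two definition classes that prints the hash as JSON. It is only a sketch: the settings are elided in this post, so an empty hash is used as a placeholder.

```ruby
require 'json'

module Search
  class Definition
    def general_type(type)
      { type: type, include_in_all: false }
    end

    def string_type
      general_type('string')
    end

    def integer_type
      general_type('integer')
    end
  end

  module Users
    class Definition < Search::Definition
      def to_hash
        {
          settings: {}, # placeholder, the real settings are not shown in this post
          mappings: {
            User: {
              dynamic: 'strict',
              properties: { id: integer_type, username: string_type }
            }
          }
        }
      end
    end
  end
end

# Print the definition the way it would be serialized for ES
puts JSON.pretty_generate(Search::Users::Definition.new.to_hash)
```

Because dynamic is set to 'strict', documents that contain fields other than id and username are rejected by ES instead of silently extending the mapping.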
This next class is used to define indices. To define an index you just instantiate this class.
class Search::Index
  attr_reader :name, :types

  def initialize(name, types)
    @name = scoped_name(name)
    @types = types
    # create the underlying index first, because IndexType#index= calls #type on it
    @index = Search.connection.index(@name)
    @types.each { |t| t.index = self }
  end

  # returns merged mappings and settings of index-types stored in this index
  # (merge_hash is not shown in this post)
  def definition
    @definition ||= types.inject({}) { |hash, type| Search::Definition.merge_hash(hash, type) }
  end

  # delegate create/delete/bulk_index/type to @index

  # find the type that stores a certain model or a subtype of that model
  def index_type_for(model)
    types.find{ |t| model <= t.model } || raise("No index type found for #{model.name}")
  end

  # group the resources by class so that each can use an
  # appropriate document-mapper to convert the resources
  # to documents and index them.
  def index_resources(resources)
    resources.group_by(&:class).each do |cls, group|
      mapper = index_type_for(cls).document_mapper.new
      docs = mapper.to_documents(group)
      bulk_index(docs)
    end
  end

  private

  # scopes the index name by rails env (ES index names must not contain whitespace)
  def scoped_name(name)
    [Rails.env, name].compact.join('_')
  end
end
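The merge_hash helper used in the definition method is not shown in this post. As a rough sketch of what such a merge could look like, assuming every index type contributes a nested hash with settings and mappings, nested keys can be merged recursively. The method name deep_merge_definitions and the Post mapping are made up for illustration:

```ruby
# Recursively merges two definition hashes.
# For conflicting non-hash values the right-hand value wins.
# `deep_merge_definitions` is a hypothetical name, not the author's helper.
def deep_merge_definitions(left, right)
  left.merge(right) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge_definitions(old_value, new_value)
    else
      new_value
    end
  end
end

user_definition = {
  settings: { number_of_shards: 1 },
  mappings: { User: { properties: { id: { type: 'integer' } } } }
}
post_definition = {
  settings: { number_of_shards: 1 },
  mappings: { Post: { properties: { title: { type: 'string' } } } }
}

merged = [user_definition, post_definition].inject({}) do |hash, definition|
  deep_merge_definitions(hash, definition)
end
p merged[:mappings].keys # => [:User, :Post]
```

In a Rails app the deep_merge method that ActiveSupport adds to Hash does the same job.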
So an index takes a name and some index types. The index contains the index types. You can query an index for all its contained types. Let's first look at the IndexType-class before we see how to create indices:
class Search::IndexType
  # Search::Index reads the model and instantiates the document mapper
  attr_reader :model, :definition, :document_mapper

  # delegate methods to @type
  def initialize(model, definition, mapper)
    @model = model
    @definition = definition
    @document_mapper = mapper # the mapper class, instantiated by Search::Index
  end

  # this is required due to the way the indexes and types are defined globally
  def index=(idx)
    @type = idx.type(model.name)
  end
end

# Converts a single or multiple records to hashes that can be indexed in ES
class Search::DocumentMapper
  def to_documents(records)
    records.map{ |r| to_document(r) }
  end

  def to_document(record)
    # returns a hash
  end
end
An IndexType consists of a model, the mapping definition and a mapper that is used to convert resources to documents.
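A concrete mapper is not shown in this post. A minimal sketch of one for users could look like the following, assuming the document should contain exactly the id and username fields declared in the users definition. FakeUser is a stand-in for the real model so that the example runs without Rails:

```ruby
module Search
  class DocumentMapper
    def to_documents(records)
      records.map { |r| to_document(r) }
    end

    def to_document(record)
      raise NotImplementedError
    end
  end

  module Users
    # Assumed implementation: maps only the fields declared in the mapping
    class Mapper < Search::DocumentMapper
      def to_document(user)
        { id: user.id, username: user.username }
      end
    end
  end
end

# Stand-in for the real User model (hypothetical)
FakeUser = Struct.new(:id, :username)

users = [FakeUser.new(1, 'alice'), FakeUser.new(2, 'bob')]
p Search::Users::Mapper.new.to_documents(users)
```

Keeping the mapper separate from the model makes it possible to index the same model in different ways by writing a second mapper.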
The application-wide indices can then be created like this:
module Search
  # SNIP: see beginning of this blog-post

  def self.main_index
    @main_index ||= Search::Index.new(nil, [
      Search::IndexType.new(User, Search::Users::Definition, Search::Users::Mapper)
    ])
  end

  def self.other_index
    @other_index ||= Search::Index.new('other', [ ... ])
  end
end
Now you can access your indices with Search.main_index and even list all the types/models that it contains with Search.main_index.types.map(&:model).
The Engine-class is responsible for firing off search queries and wrapping the result in a SearchResultAdapter-instance.
class Search::Engine
  # delegate some methods to Stretcher::Server

  def initialize(model_class)
    # store model_class to pass it to SearchResultAdapter
  end

  def search(query_object, options)
    # convert query object to JSON
    # convert optional highlighter in options to JSON
    # execute query with highlighter
    # wrap the result with the result adapter using model_class
  end
end
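The outline above leaves the actual steps open. One possible way to fill them in is sketched below. Several things here are assumptions and not taken from the real implementation: the engine receives a backend object that responds to search(body) (in the real app this role would be played by a Stretcher object), the highlighter responds to call just like the query object, and the result is wrapped in a plain hash instead of a SearchResultAdapter so that the example runs on its own.

```ruby
module Search
  class Engine
    # `backend` is an assumed collaborator responding to `search(body)`
    def initialize(model_class, backend)
      @model_class = model_class
      @backend = backend
    end

    def search(query_object, options = {})
      # convert the query object to a hash
      body = { query: query_object.call }

      # convert the optional highlighter to a hash (assumed to respond to `call`)
      highlighter = options[:highlighter]
      body[:highlight] = highlighter.call if highlighter

      # execute the query and wrap the result
      wrap(@backend.search(body))
    end

    private

    # The real engine would build a SearchResultAdapter here;
    # a plain hash keeps this sketch self-contained.
    def wrap(result)
      { result: result, load: @model_class }
    end
  end
end

# Fake backend that records the request body instead of talking to ES
class RecordingBackend
  attr_reader :last_body

  def search(body)
    @last_body = body
    :fake_result
  end
end

backend = RecordingBackend.new
engine = Search::Engine.new(String, backend)
engine.search(-> { { match_all: {} } }, highlighter: -> { { fields: { username: {} } } })
p backend.last_body
```

Passing the backend in makes the engine easy to test without a running ES instance.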
The Query-class is the base-class for all ES-queries. It is passed to the engine's search-method. The engine then invokes the call-method of the query-object to get a hash-representation of the query.
class Search::Query
  def initialize(query_string, options)
  end

  def call
    # returns the query as a hash
  end
end
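As an illustration, a concrete query class could look like the sketch below. The subclass name UsernameQuery, the username field and the :operator option are assumptions made for this example; only the initialize/call interface comes from the base class above.

```ruby
module Search
  class Query
    attr_reader :query_string, :options

    def initialize(query_string, options = {})
      @query_string = query_string
      @options = options
    end

    def call
      raise NotImplementedError
    end
  end

  # Hypothetical subclass: searches the username field with a match query
  class UsernameQuery < Query
    def call
      {
        match: {
          username: {
            query: query_string,
            operator: options.fetch(:operator, 'and')
          }
        }
      }
    end
  end
end

p Search::UsernameQuery.new('alice smith').call
p Search::UsernameQuery.new('alice smith', operator: 'or').call
```

Since the engine only relies on call, a lambda returning a hash would work as a query object as well.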
The SearchResultAdapter class wraps the raw Hashie::Mash object that is returned by stretcher. It mixes in the Enumerable module so that it is traversable and also implements record-loading for the document results:
require 'ostruct' # OpenStruct is used in with_hits

module Search
  class SearchResultAdapter
    include Enumerable

    attr_reader :options, :records

    def initialize(result, options={})
      @result = result || raise(ArgumentError, 'result was nil')
      @options = options
      load_records! if options[:load]
    end

    def total_count
      @result.total
    end

    def length
      documents.length
    end

    def documents
      @result.documents
    end

    def ids
      documents.map(&:_id)
    end

    def model_class
      @options[:load]
    end

    def loaded?
      @records.present?
    end

    def each(&block)
      to_ary.each(&block)
    end

    def with_hits
      load_records! if !loaded?
      documents.zip(records).map do |doc, record|
        OpenStruct.new(model_class.to_s.downcase => record, document: doc)
      end
    end

    def each_with_hits(&block)
      with_hits.each(&block)
    end

    # required for rails collection-rendering
    def to_ary
      if loaded?
        with_hits
      else
        documents
      end
    end

    private

    def load_records!
      # finder syntax depends on the ORM; with ActiveRecord this would be where(id: ids)
      records = model_class.all(id: ids)
      mapping = records.inject({}) do |hash, record|
        hash.merge(record.id => record)
      end
      # keep the records in the order of the ES documents
      @records = documents.map do |doc|
        mapping[doc._id.to_i]
      end
    end
  end
end
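The record loading depends on one detail that is easy to miss: the database returns the records in arbitrary order, so they have to be put back into the ranking order of the ES documents. The following stand-alone sketch isolates that step. Doc and Record are stand-ins for the Stretcher documents and the model instances, and the method name is made up for this example:

```ruby
Doc = Struct.new(:_id)          # stand-in for a document returned by ES
Record = Struct.new(:id, :name) # stand-in for a model instance

# Returns the records in the order of the documents.
# A document without a matching record yields nil.
def records_in_document_order(documents, records)
  by_id = records.each_with_object({}) { |record, hash| hash[record.id] = record }
  documents.map { |doc| by_id[doc._id.to_i] }
end

documents = [Doc.new('3'), Doc.new('1'), Doc.new('2')]
records = [Record.new(1, 'alice'), Record.new(3, 'carol')] # record 2 was deleted

ordered = records_in_document_order(documents, records)
p ordered.map { |record| record && record.name }
# => ["carol", "alice", nil]
```

The nil entry shows what happens when the index is stale: a document is still in ES, but its record is gone from the database. Code that iterates over with_hits should be prepared for that.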
Tips
You wonder why your document was not found by your query? Use the explain-api of ES. You pass in a document-id and the query and get an explanation of why the document was found or why it wasn’t found.
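As a sketch of how such an explain request can be put together in plain Ruby: the request goes to the _explain endpoint of a single document and carries the query in its body. The index name, type, id and query below are made-up example values, and the helper only builds the request without sending it.

```ruby
require 'json'
require 'net/http'

# Builds (but does not send) an explain request for a single document.
# `build_explain_request` is a hypothetical helper name.
def build_explain_request(index, type, id, query)
  request = Net::HTTP::Post.new("/#{index}/#{type}/#{id}/_explain")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(query: query)
  request
end

request = build_explain_request('my_index', 'User', 42, { match: { username: 'alice' } })
puts request.path
puts request.body

# To actually send it to a local instance:
#   response = Net::HTTP.start('localhost', 9200) { |http| http.request(request) }
#   puts response.body
```

The response contains a flag that says whether the document matched and a nested explanation of how the score was computed.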
For simple searches use the match_query with a cutoff_frequency to treat stopwords correctly and with phrase_prefix to find partial matches. The query is very robust against malformed input and does not throw exceptions like the query_string_query. Use the simple_query_string_query if you need simple search operator support. It is still very robust like the match_query. You can actually specify which operators should be allowed and which operators shouldn’t.
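The query bodies for these tips can be sketched as plain Ruby hashes. The helper names, the cutoff value of 0.001 and the example fields are assumptions for illustration, and the available flags depend on your ES version, so treat this as a starting point rather than a reference:

```ruby
# Match query that treats very frequent terms (stopwords) specially.
# 0.001 is an arbitrary example value.
def match_query(field, text)
  { match: { field => { query: text, cutoff_frequency: 0.001 } } }
end

# Phrase-prefix variant of the match query for partial matches ("quick bro" finds "quick brown")
def phrase_prefix_query(field, text)
  { match_phrase_prefix: { field => { query: text } } }
end

# simple_query_string with an explicit list of allowed operators
def simple_query(fields, text, allowed_operators)
  {
    simple_query_string: {
      query: text,
      fields: fields,
      flags: allowed_operators.join('|')
    }
  }
end

p match_query(:username, 'the alice')
p phrase_prefix_query(:username, 'ali')
p simple_query([:username], 'alice | bob', %w[OR AND PREFIX])
```

Each of these hashes can be returned from the call method of a query object.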