# Elasticsearch in Rails

Elasticsearch is a modern full-text search engine with a RESTful interface. It is very customizable, easy to set up and use, fast and scalable. Here I show how I set it up in my Rails environment.

• Powerful query language (nested queries, similarity queries, …)
• Realtime indexing (1 second delay by default)
• Configurable and accessible via REST and JSON
• ES is a key-value-store and can actually be used as such
• Schema-free
• An ES node automatically discovers other nodes in the same network and joins their cluster
• Sharding and replication are built in
• Easy to install and run

First download it:

cd
VERSION=elasticsearch-0.90.7
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/$VERSION.zip
unzip $VERSION.zip


Store the configuration files in your rails project:

$RAILS_APP
  app
  config
    services
      development
        $VERSION
          elasticsearch.yml
          logging.yml
      test
      production


I use separate configuration files and run a separate instance for each environment. That way the configuration can be optimized per environment: the test environment, for example, does not need replicas and can be a pure in-memory database, since it gets wiped after the tests anyway. Copy the config files to your Rails project:

cd $VERSION/config
mkdir -p $RAILS_APP/config/services/development/$VERSION
cp elasticsearch.yml $RAILS_APP/config/services/development/$VERSION

Some things to tweak in the configuration:

# elasticsearch.yml
cluster.name: my_app_development # use a unique name so you don't accidentally join another cluster
node.name: '...'
index.store.type: memory # for test
index.refresh_interval: -1 # for test
index.number_of_shards: 1
index.number_of_replicas: 0
network.host: 127.0.0.1
http.port: 9210 # for test
discovery.zen.ping.multicast.enabled: false

Start Elasticsearch with a custom configuration and generate a pid file:

$ES_DIR/bin/elasticsearch -f -Des.path.conf=$RAILS_APP/config/services/test/elasticsearch -p $RAILS_APP/tmp/pids/elasticsearch_test.pid


In your (rails) project add the stretcher-gem to your Gemfile:

gem 'stretcher'


Then write an initializer for ES:

# in config/initializers
module Search

  # the test instance listens on its own port (see http.port above)
  if Rails.env.test?
    ADDRESS = 'http://localhost:9210'
    OPTIONS = {:log_level => :debug}
  else
    ADDRESS = 'http://localhost:9200'
    OPTIONS = {}
  end

  def self.initialize_connection
    server = Stretcher::Server.new(ADDRESS, OPTIONS)
    server.up? # pings the server; the result is not checked here
    return server
  end

  # memoized connection shared by the whole application
  def self.connection
    @connection ||= initialize_connection
  end

end


Create a directory for your search related classes in your (rails) project:

mkdir -p $RAILS_APP/app/classes/search


The classes I use to implement all search-related features are the following:

Index                 # Defines an index (like a sql database)
IndexType             # Defines an index type for a model (like a sql table)
Definition            # A class that defines the mapping for a specific model/index
DocumentMapper        # Converts a model to a JSON document
Engine                # Base class for model-specific engines
Query                 # Used to construct an ES query
Highlighter           # Defines which parts of a search result should be highlighted
SearchResultAdapter   # Wrapper around a stretcher result that loads the model instances
FullReindexer         # Reindexes all documents, is run at night or after deploys
DirtyReindexer        # Reindexes changed documents


The Index class is responsible for defining, creating and deleting the index, as well as indexing records in bulk. Both Index and IndexType are wrappers around the corresponding Stretcher classes. The mapping from a database record to JSON is done by the DocumentMapper class. For each model that has to be indexed in ES there is a corresponding DocumentMapper class. There can be multiple DocumentMappers for a single model, so a model can be indexed in different ways and into different indices.

# Base class for all index-type-mapping-definitions
class Search::Definition

  def general_type(type)
    { type: type, include_in_all: false }
  end

  def string_type
    general_type('string')
  end

  def integer_type
    general_type('integer')
  end

  def to_hash
    raise NotImplementedError
  end

end


This is straightforward. It declares a to_hash method and some helper methods that can be used in subclasses of this class. The to_hash method returns the raw definition hash that can be passed to ES.

A concrete definition would then look like this:

# Defines the mapping for users
class Search::Users::Definition < Search::Definition

  def properties
    {
      id:       integer_type,
      username: string_type
    }
  end

  def to_hash
    {
      settings: { ... },
      mappings: {
        :User => {
          dynamic: 'strict',
          properties: properties
        }
      }
    }
  end

end


This next class is used to define indices. To define an index you just instantiate this class.

class Search::Index

  attr_reader :name, :types

  def initialize(name, types)
    @name  = scoped_name(name)
    @types = types
    # look up the stretcher index first: each type asks it for its stretcher type
    @index = Search.connection.index(@name)
    @types.each { |t| t.index = self }
  end

  # returns merged mappings and settings of index-types stored in this index
  # (merge_hash is not shown in this post)
  def definition
    @definition ||= types.inject({}) { |hash, type| Definition.merge_hash(hash, type) }
  end

  # delegate create/delete/bulk_index/type to @index

  # find the type that stores a certain model or a subtype of that model
  def index_type_for(model)
    types.find { |t| model <= t.model } || raise("No index type found for #{model.name}")
  end

  # group the resources by class so that each can use an
  # appropriate document-mapper to convert the resources
  # to documents and index them.
  def index_resources(resources)
    resources.group_by(&:class).each do |cls, group|
      mapper = index_type_for(cls).mapper
      docs   = mapper.to_documents(group)
      bulk_index(docs)
    end
  end

  private

  # scopes the index name by rails env
  def scoped_name(name)
    [Rails.env, name].compact.join('_')
  end

end
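
The definition method above relies on Definition.merge_hash, which is not shown in this post. The following is only a minimal sketch of what such a helper could look like, assuming that every index type contributes a plain hash with settings and mappings keys and that nested hashes should be merged recursively. The module name DefinitionMerge and the two sample definitions are made up for illustration.

```ruby
# Hypothetical helper: recursively merges two definition hashes.
# Nested hashes are merged key by key; for any other value the
# right-hand side wins.
module DefinitionMerge
  def self.merge_hash(left, right)
    left.merge(right) do |_key, old_value, new_value|
      if old_value.is_a?(Hash) && new_value.is_a?(Hash)
        merge_hash(old_value, new_value)
      else
        new_value
      end
    end
  end
end

# Two made-up type definitions that live in the same index.
user_definition = {
  settings: { number_of_shards: 1 },
  mappings: { User: { dynamic: 'strict', properties: { id: { type: 'integer' } } } }
}
post_definition = {
  settings: { number_of_replicas: 0 },
  mappings: { Post: { dynamic: 'strict', properties: { title: { type: 'string' } } } }
}

# Same inject pattern as in Search::Index#definition.
merged = [user_definition, post_definition].inject({}) do |hash, definition|
  DefinitionMerge.merge_hash(hash, definition)
end

puts merged[:mappings].keys.inspect
puts merged[:settings].inspect
```

With a recursive merge, both types end up under the same mappings key, which a plain Hash#merge would not do because the second mappings hash would replace the first.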


So an index takes a name and some index types, and it contains those index types. You can ask an index for all the types it contains. Let's first look at the IndexType class before we see how to create indices:

class Search::IndexType

  attr_reader :model, :definition, :mapper

  # delegate methods to @type

  def initialize(model, definition, mapper)
    @model      = model
    @definition = definition
    @mapper     = mapper.new
  end

  # this is required because the indices and types are defined globally
  def index=(idx)
    @type = idx.type(model.name)
  end
end

# Converts a single or multiple records to hashes that can be indexed in ES
class Search::DocumentMapper
  def to_documents(records)
    records.map { |r| to_document(r) }
  end

  def to_document(record)
    # returns a hash
  end
end
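
To make the mapper idea more concrete, here is a small sketch of a model-specific mapper. It is not the mapper from my application: the User struct stands in for a real model, and the class names and the email attribute are assumptions for this example. The document contains only id and username, because a mapping with dynamic: 'strict' rejects fields that are not declared in the definition.

```ruby
# Stand-in for a real model class (assumption for this sketch).
User = Struct.new(:id, :username, :email)

# Base class with the same interface as Search::DocumentMapper.
class DocumentMapper
  def to_documents(records)
    records.map { |r| to_document(r) }
  end

  def to_document(record)
    raise NotImplementedError
  end
end

# Hypothetical mapper for users: emits only the fields that are
# declared in the mapping definition and leaves out everything else.
class UserMapper < DocumentMapper
  def to_document(record)
    { id: record.id, username: record.username }
  end
end

users = [
  User.new(1, 'alice', 'alice@example.com'),
  User.new(2, 'bob', 'bob@example.com')
]

docs = UserMapper.new.to_documents(users)
puts docs.inspect
```

Keeping this conversion out of the model makes it easy to index the same model in a second way: you write another mapper and register it with another index type.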


An IndexType consists of a model, the mapping definition and a mapper that is used to convert resources to documents. The application-wide indices can then be created like this:

module Search
  # SNIP: see beginning of this blog-post

  def self.main_index
    @main_index ||= Search::Index.new(nil, [
      Search::IndexType.new(User, Search::Users::Definition, Search::Users::Mapper)
    ])
  end

  def self.other_index
    @other_index ||= Search::Index.new('other', [ ... ])
  end
end


Now you can access your indices with Search.main_index and even list all the types/models that it contains with Search.main_index.types.map(&:model)

The Engine-class is responsible for firing off search queries and wrapping the result in a SearchResultAdapter-instance.

class Search::Engine
  # delegate some methods to Stretcher::Server

  def initialize(model_class)
    # store model_class to pass it to SearchResultAdapter
  end

  def search(query_object, options)
    # convert query object to JSON
    # convert optional highlighter in options to JSON
    # execute query with highlighter
    # wrap the result with the result adapter using model_class
  end
end


The Query class is the base class for all ES queries. A query object is passed to the engine's search method. The engine then invokes the call method of the query object to get a hash representation of the query.

class Search::Query
  def initialize(query_string, options)
  end

  def call
    # returns the query as a hash
  end
end
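
The interplay between engine, query object and highlighter can be sketched as follows. This is a simplified illustration, not my actual implementation: UsernameQuery, UsernameHighlighter and the backend parameter are hypothetical, and the backend is a plain lambda standing in for the stretcher call so that the example runs without a server.

```ruby
require 'json'

# Hypothetical query object: builds a match query on a single field.
class UsernameQuery
  def initialize(query_string, options = {})
    @query_string = query_string
    @options = options
  end

  # Returns the query as a hash.
  def call
    { match: { username: @query_string } }
  end
end

# Hypothetical highlighter: names the fields whose matches get highlighted.
class UsernameHighlighter
  def call
    { fields: { username: {} } }
  end
end

# Sketch of an engine. backend is any object that responds to
# call(body) and returns the raw result. In the real application this
# would be the stretcher search call, which is not reproduced here.
class Engine
  def initialize(model_class, backend)
    @model_class = model_class
    @backend = backend
  end

  def search(query_object, options = {})
    body = { query: query_object.call }
    highlighter = options[:highlighter]
    body[:highlight] = highlighter.call if highlighter
    raw = @backend.call(body)
    # The real engine would wrap raw in a SearchResultAdapter here.
    { model: @model_class, result: raw }
  end
end

# Fake backend that simply echoes the request body as JSON.
echo = ->(body) { JSON.generate(body) }

engine = Engine.new(:User, echo)
result = engine.search(UsernameQuery.new('alice'), highlighter: UsernameHighlighter.new)
puts result[:result]
```

Because the engine only depends on the call method, queries and highlighters can be tested on their own by comparing the hashes they return.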


The SearchResultAdapter class wraps the raw Hashie::Mash object that is returned by stretcher. It mixes in the Enumerable module so that it is traversable and also implements record-loading for the document results:

require 'ostruct'

module Search
  class SearchResultAdapter

    include Enumerable

    attr_reader :options, :records

    def initialize(result, options={})
      @result = result || raise(ArgumentError, 'result was nil')
      @options = options
      load_records! if options[:load]
    end

    def total_count
      @result.total
    end

    def length
      documents.length
    end

    def documents
      @result.documents
    end

    def ids
      documents.map(&:_id)
    end

    # the model class is passed in via the :load option
    def model_class
      @options[:load]
    end

    # true once load_records! has run, even if nothing was found
    def loaded?
      !@records.nil?
    end

    def each(&block)
      to_ary.each(&block)
    end

    # pairs every document with its record, e.g. hit.user and hit.document
    def with_hits
      load_records! if !loaded?

      documents.zip(records).map do |doc, record|
        OpenStruct.new(model_class.to_s.downcase => record, document: doc)
      end
    end

    def each_with_hits(&block)
      with_hits.each(&block)
    end

    # required for rails collection-rendering
    def to_ary
      if loaded?
        with_hits
      else
        documents
      end
    end

    private

    def load_records!
      # DataMapper-style finder; with ActiveRecord use model_class.where(id: ids)
      records = model_class.all(id: ids)

      # index the records by id so they can be put back into hit order
      mapping = records.inject({}) do |hash, record|
        hash.merge(record.id => record)
      end

      @records = documents.map do |doc|
        mapping[doc._id.to_i]
      end
    end

  end
end
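
The important detail in load_records! is that the database returns the records in its own order, while the search result is ordered by relevance. The following stand-alone sketch shows the reordering step in isolation. Record, Doc and records_in_hit_order are hypothetical names for this example.

```ruby
# Stand-ins for a model instance and a search hit (assumptions).
Record = Struct.new(:id, :name)
Doc = Struct.new(:_id)

# Returns the records in the same order as the documents. A document
# whose record no longer exists yields nil at its position.
def records_in_hit_order(documents, records)
  by_id = records.each_with_object({}) { |record, hash| hash[record.id] = record }
  documents.map { |doc| by_id[doc._id.to_i] }
end

# Hits arrive ordered by relevance, ids are strings.
documents = [Doc.new('3'), Doc.new('1'), Doc.new('2')]
# Records arrive in database order; record 2 has been deleted meanwhile.
records = [Record.new(1, 'a'), Record.new(3, 'c')]

ordered = records_in_hit_order(documents, records)
puts ordered.map { |r| r && r.name }.inspect
```

A nil entry means the index still contains a document whose record is gone, so a view that renders the hits should be prepared for that until the next reindex run.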


## Tips

Wondering why your document was not found by your query? Use the explain API of ES: you pass in a document id and the query and get an explanation of why the document did or did not match.
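
As a rough sketch, an explain request can be built like this. The index name, type, id and query are placeholders, and the helper explain_request is made up for this example. The request is only built and printed here; the commented lines at the end show how it could be sent to the test instance on port 9210.

```ruby
require 'json'
require 'net/http'

# Builds (but does not send) an explain request for one document.
# The path follows the /index/type/id/_explain scheme.
def explain_request(index, type, id, query)
  request = Net::HTTP::Post.new("/#{index}/#{type}/#{id}/_explain")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ query: query })
  request
end

request = explain_request('test_other', 'User', 42, { match: { username: 'alice' } })
puts request.path
puts request.body

# To send it to a running instance:
# response = Net::HTTP.start('localhost', 9210) { |http| http.request(request) }
# puts response.body
```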

For simple searches use the match query with a cutoff_frequency to treat stopwords correctly and with phrase_prefix to find partial matches. The match query is very robust against malformed input and does not throw exceptions like the query_string query does. Use the simple_query_string query if you need simple search operator support. It is just as robust as the match query, and you can specify which operators are allowed and which are not.
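
The three variants could be written as query hashes roughly like this. Take it as a sketch for the query DSL of the ES versions current at the time of writing: the field name, the cutoff value and the list of allowed operators are placeholders, and the method names are made up.

```ruby
require 'json'

# Field name used in all three examples (placeholder).
FIELD = 'username'

# match query with cutoff_frequency: very frequent terms such as
# stopwords only influence the score once the rarer terms have matched.
def match_query(text)
  { match: { FIELD => { query: text, cutoff_frequency: 0.001 } } }
end

# match query of type phrase_prefix: the last term is treated as a
# prefix, which finds partial matches while the user is still typing.
def prefix_query(text)
  { match: { FIELD => { query: text, type: 'phrase_prefix' } } }
end

# simple_query_string: flags lists the operators that are allowed.
def simple_query(text)
  { simple_query_string: { query: text, fields: [FIELD], flags: 'AND|OR|PREFIX' } }
end

puts JSON.generate(match_query('the quick fox'))
puts JSON.generate(prefix_query('quick f'))
puts JSON.generate(simple_query('quick + fox*'))
```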