Elasticsearch in Rails

Elasticsearch is a modern full-text search engine with a RESTful interface. It is very customizable, easy to use, easy to set up, fast and scalable. Here I show how I set it up in my Rails environment.

  • Powerful query language (nested queries, similarity queries, …)
  • Realtime indexing (1 second delay by default)
  • Configurable and accessible via REST and JSON
  • ES is a key-value-store and can actually be used as such
  • Schema-free
  • An ES node automatically discovers other nodes in the same network and joins their cluster
  • Sharding and replication are built in
  • Easy to install and run

First download it:

cd
VERSION=elasticsearch-0.90.7
wget http://download.elasticsearch.org/elasticsearch/elasticsearch/$VERSION.zip
unzip $VERSION.zip

Store the configuration files in your Rails project:

$RAILS_APP
  app
  config
    services
      development
        $VERSION
          elasticsearch.yml
          logging.yml
      test
      production

I use a separate configuration file and run a separate instance for each environment, which lets me optimize the configuration per environment. For example, the test environment does not need replicas and can use a pure in-memory store, since its data gets wiped after the tests anyway. Copy the config files to your Rails project:

cd $VERSION/config
mkdir -p $RAILS_APP/config/services/development/$VERSION
cp elasticsearch.yml logging.yml $RAILS_APP/config/services/development/$VERSION

Some things to tweak in the configuration:

# elasticsearch.yml
cluster.name: my_app_development              # use a unique name so you don't accidentally join another cluster
node.name: '...'
index.store.type: memory                      # for test
index.refresh_interval: -1                    # for test
index.number_of_shards: 1
index.number_of_replicas: 0
network.host: 127.0.0.1
http.port: 9210                               # for test
discovery.zen.ping.multicast.enabled: false

Start elasticsearch with a custom configuration and generate a pid-file:

$ES_DIR/bin/elasticsearch -f -Des.path.conf=$RAILS_APP/config/services/test/elasticsearch -p $RAILS_APP/tmp/pids/elasticsearch_test.pid
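
The pid-file makes it easy to check from Ruby whether that instance is still alive, for example before a test run. The following is only a minimal sketch: the method name is made up for this post, and it assumes the pid-file contains nothing but the process id.

```ruby
# Hypothetical helper (not part of Elasticsearch or Stretcher): reports
# whether the process recorded in a pid-file is still alive.
def elasticsearch_running?(pid_file)
  return false unless File.exist?(pid_file)

  pid = File.read(pid_file).to_i
  return false if pid <= 0

  # Signal 0 only checks that the process exists; nothing is delivered.
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  # No process with that pid: the pid-file is stale.
  false
rescue Errno::EPERM
  # The process exists but belongs to another user.
  true
end
```

With the start command above you would call it as elasticsearch_running?("#{Rails.root}/tmp/pids/elasticsearch_test.pid").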

In your Rails project, add the stretcher gem to your Gemfile:

gem 'stretcher'

Then write an initializer for ES:

#/config/initializers
module Search

  if Rails.env.test?
    ADDRESS = 'http://localhost:9210'
    OPTIONS = {:log_level => :debug}
  else
    ADDRESS = 'http://localhost:9200'
    OPTIONS = {}
  end

  def self.initialize_connection
    server = Stretcher::Server.new(ADDRESS, OPTIONS)
    server.up?
    return server
  end

  def self.connection
    @connection ||= initialize_connection
  end

end

Create a directory for your search-related classes in your Rails project:

mkdir -p $RAILS_APP/app/classes/search

The classes I use to implement all search-related features are the following:

Index                 # Defines an index (like a sql database)
IndexType             # Defines an index type for a model (like a sql table)
Definition            # A class that defines the mapping for a specific model/index
DocumentMapper        # Converts a model to a JSON document
Engine                # Base class for model-specific engines
Query                 # Used to construct an ES query
Highlighter           # Defines which parts of a search result should be highlighted
SearchResultAdapter   # Wrapper around a stretcher result that loads the model instances
FullReindexer         # Reindexes all documents, is run at night or after deploys
DirtyReindexer        # Reindexes changed documents

The Index class is responsible for defining, creating and deleting the index, as well as indexing records in bulk. Both Index and IndexType are wrappers around the corresponding Stretcher classes. The mapping from a database record to JSON is done by the DocumentMapper class. For each model that has to be indexed in ES there is a corresponding DocumentMapper class. There can be multiple DocumentMappers for a single model, so a model can be indexed in different ways and into different indices.

# Base class for all index-type-mapping-definitions
class Search::Definition

  def general_type(type)
    { type: type, include_in_all: false }
  end

  def string_type
    general_type('string')
  end

  def integer_type
    general_type('integer')
  end

  def to_hash
    raise NotImplementedError
  end

end

This is straightforward. It declares a to_hash method and some helper methods that can be used in subclasses of this class. The to_hash method returns the raw definition hash that can be passed to ES.

A concrete definition would then look like this:

# Defines the mapping for users
class Search::Users::Definition < Search::Definition

  def properties
    {
      id:       integer_type,
      username: string_type
    }
  end

  def to_hash
    {
      settings: { ... },
      mappings: {
        :User => {
          dynamic: 'strict',
          properties: properties
        }
      }
    }
  end

end
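
To see what ES actually receives, it helps to print the hash as JSON. The sketch below repeats a condensed copy of the two classes so that it runs on its own. The empty settings hash is only a placeholder for the settings elided above, not a recommendation.

```ruby
require 'json'

module Search
  # Condensed copy of the base class from above.
  class Definition
    def general_type(type)
      { type: type, include_in_all: false }
    end

    def string_type
      general_type('string')
    end

    def integer_type
      general_type('integer')
    end
  end

  module Users
    class Definition < Search::Definition
      def properties
        { id: integer_type, username: string_type }
      end

      def to_hash
        {
          # Assumption: an empty hash stands in for the elided settings.
          settings: {},
          mappings: { User: { dynamic: 'strict', properties: properties } }
        }
      end
    end
  end
end

definition = Search::Users::Definition.new.to_hash
puts JSON.pretty_generate(definition)
```

Because of dynamic: 'strict', ES should reject documents that contain fields missing from properties, so the definition and the document mapper have to be kept in sync.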

This next class is used to define indices. To define an index you just instantiate this class.

class Search::Index

  attr_reader :name, :types

  def initialize(name, types)
    @name   = scoped_name(name)
    @types  = types
    @types.each { |t| t.index = self }
    @index = Search.connection.index(@name)
  end

  # returns merged mappings and settings of index-types stored in this index
  def definition
    @definition ||= types.inject({}) { |hash, type| Definition.merge_hash(hash, type) }
  end

  # delegate create/delete/bulk_index

  # find the type that stores a certain model or a subtype of that model
  def index_type_for(model)
    types.find{ |t| model <= t.model } || raise("No index type found for #{model.name}")
  end

  # group the resources by class so that each can use an
  # appropriate document-mapper to convert the resources
  # to documents and index them.
  def index_resources(resources)
    resources.group_by(&:class).each do |cls, group|
      mapper  = index_type_for(cls).document_mapper.new
      docs    = mapper.to_documents(group)
      bulk_index(docs)
    end
  end

private

  # scopes the index name by rails env (index names must not contain whitespace)
  def scoped_name(name)
    [Rails.env, name].compact.join('_')
  end

end
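
The lookup in index_type_for and the grouping in index_resources can be tried out in isolation. This is a small sketch with made-up models (User, Admin, Post) and a Struct standing in for the index types. Only the `<=` comparison and the group_by call are taken from the class above.

```ruby
# Hypothetical models, used only for this illustration.
class User; end
class Admin < User; end
class Post; end

# Stand-in for Search::IndexType: it only knows its model.
TypeStub = Struct.new(:model)
TYPES = [TypeStub.new(User), TypeStub.new(Post)]

# Same lookup as Index#index_type_for: `model <= t.model` is true when
# model is t.model itself or one of its subclasses.
def index_type_for(model)
  TYPES.find { |t| model <= t.model } || raise("No index type found for #{model.name}")
end

resources = [User.new, Post.new, Admin.new, User.new]

# group_by(&:class) yields one batch per concrete class, so every batch
# can be converted by the mapper that belongs to its index type.
batches = resources.group_by(&:class)
batches.each do |cls, group|
  puts "#{cls.name} (#{group.size}) -> #{index_type_for(cls).model.name} type"
end
```

Note that Admin records form their own batch but end up in the User type, because Admin is a subclass of User.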

So an index takes a name and some index types. The index contains the index types, and you can query an index for all the types it contains. Let's first look at the IndexType class before we see how to create indices:

class Search::IndexType

  attr_reader :model, :definition, :document_mapper

  # delegate methods to @type

  def initialize(model, definition, mapper)
    @model = model
    @definition = definition
    # keep the mapper class; Index#index_resources instantiates it
    @document_mapper = mapper
  end

  # this is required because the indices and types are defined globally
  def index=(idx)
    @type = idx.type(model.name)
  end
end
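
Each IndexType carries its own definition, and Index#definition merges them into one hash for the whole index. The merge helper itself is not shown in this post. A minimal sketch of what such a recursive merge could look like, working on plain hashes and using a made-up method name, is:

```ruby
# Hypothetical stand-in for the merge helper: recursively merges two
# hashes so that nested keys from both sides survive.
def deep_merge(left, right)
  left.merge(right) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      # For non-hash values the right-hand side wins.
      new_value
    end
  end
end

# Two made-up definition hashes, as returned by Definition#to_hash.
users = { mappings: { User: { properties: { username: { type: 'string' } } } } }
posts = { mappings: { Post: { properties: { title: { type: 'string' } } } } }

merged = [users, posts].inject({}) { |hash, definition| deep_merge(hash, definition) }
puts merged[:mappings].keys.inspect
```

A plain Hash#merge would not be enough here, because the second :mappings hash would replace the first one instead of being combined with it.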

# Converts a single or multiple records to hashes that can be indexed in ES
class Search::DocumentMapper
  def to_documents(records)
    records.map{ |r| to_document(r) }
  end

  def to_document(record)
    # returns a hash
  end
end

An IndexType consists of a model, the mapping definition and a mapper that is used to convert resources to documents. The application-wide indices can then be created like this:

module Search
  # SNIP: see beginning of this blog-post

  def self.main_index
    @main_index ||= Search::Index.new(nil, [
      Search::IndexType.new(User, Search::Users::Definition, Search::Users::Mapper)
    ])
  end

  def self.other_index
    @other_index ||= Search::Index.new('other', [ ... ])
  end
end

Now you can access your indices with Search.main_index and even list all the types/models that it contains with Search.main_index.types.map(&:model).

The Engine-class is responsible for firing off search queries and wrapping the result in a SearchResultAdapter-instance.

class Search::Engine
  # delegate some methods to Stretcher::Server

  def initialize(model_class)
    # store model_class to pass it to SearchResultAdapter
  end

  def search(query_object, options)
    # convert query object to JSON
    # convert optional highlighter in options to JSON
    # execute query with highlighter
    # wrap the result with the result adapter using model_class
  end
end
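
The four steps in the search method can be sketched without a running ES instance by injecting the backend. Everything here is an assumption made for illustration: the second constructor argument, the EchoBackend class and the returned hash are not part of my real engine, which talks to Stretcher and returns a SearchResultAdapter.

```ruby
module Search
  # Sketch only: the backend is injected so the example runs without ES.
  class Engine
    def initialize(model_class, backend)
      @model_class = model_class
      @backend     = backend
    end

    def search(query_object, options = {})
      # 1. convert the query object to a hash
      body = { query: query_object.call }

      # 2. convert the optional highlighter to a hash
      highlighter = options[:highlighter]
      body[:highlight] = highlighter.call if highlighter

      # 3. execute the query
      raw_result = @backend.search(body)

      # 4. the real engine wraps raw_result in a SearchResultAdapter here
      { result: raw_result, load: @model_class }
    end
  end
end

# Hypothetical backend that simply echoes the request body.
class EchoBackend
  def search(body)
    { 'request' => body, 'hits' => [] }
  end
end

# Lambdas respond to #call, so they can stand in for query and highlighter objects.
match_all   = -> { { match_all: {} } }
highlighter = -> { { fields: { username: {} } } }

engine   = Search::Engine.new(:user_model, EchoBackend.new)
response = engine.search(match_all, highlighter: highlighter)
puts response[:result]['request'].inspect
```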

The Query class is the base class for all ES queries. A query object is passed to the engine's search method. The engine then invokes the call method of the query object to get a hash representation of the query.

class Search::Query
  def initialize(query_string, options)
  end

  def call
    # returns the query as a hash
  end
end

The SearchResultAdapter class wraps the raw Hashie::Mash object that is returned by stretcher. It mixes in the Enumerable module so that it is traversable and also implements record-loading for the document results:

module Search
  class SearchResultAdapter

    include Enumerable

    attr_reader :options, :records

    def initialize(result, options={})
      @result = result || raise(ArgumentError, 'result was nil')
      @options = options
      load_records! if options[:load]
    end

    def total_count
      @result.total
    end

    def length
      documents.length
    end

    def documents
      @result.documents
    end

    def ids
      documents.map(&:_id)
    end

    def model_class
      @options[:load]
    end

    def loaded?
      @records.present?
    end

    def each(&block)
      to_ary.each(&block)
    end

    def with_hits
      load_records! if !loaded?

      documents.zip(records).map do |doc, record|
        OpenStruct.new(model_class.to_s.downcase => record, document: doc)
      end
    end

    def each_with_hits(&block)
      with_hits.each(&block)
    end

    # required for Rails collection rendering
    def to_ary
      if loaded?
        with_hits
      else
        documents
      end
    end

  private

    def load_records!
      records = model_class.all(id: ids)

      mapping = records.inject({}) do |hash, record|
        hash.merge(record.id => record)
      end

      @records = documents.map do |doc|
        mapping[doc._id.to_i]
      end
    end

  end
end
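
The important detail in load_records! is the ordering: the database returns the records in arbitrary order, while the documents come back sorted by relevance. The same reordering can be shown standalone. The Structs below are stand-ins for a model and a Stretcher document, and the method name is made up for this example.

```ruby
# Stand-ins for a database record and an ES hit.
RecordStub   = Struct.new(:id, :name)
DocumentStub = Struct.new(:_id)

# Returns the records in the order of the documents. Documents without
# a matching record yield nil, just like in load_records! above.
def records_in_document_order(documents, records)
  by_id = records.each_with_object({}) { |record, hash| hash[record.id] = record }
  # ES returns ids as strings, the model uses integers.
  documents.map { |doc| by_id[doc._id.to_i] }
end

documents = [DocumentStub.new('3'), DocumentStub.new('1'), DocumentStub.new('2')]
records   = [RecordStub.new(1, 'a'), RecordStub.new(2, 'b'), RecordStub.new(3, 'c')]

ordered = records_in_document_order(documents, records)
puts ordered.map(&:id).inspect
```

A record that was deleted from the database but is still in the index shows up as nil, so callers of with_hits may want to handle that case.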

Tips

Wondering why a document was not found by your query? Use the explain API of ES. You pass in a document id and the query, and get an explanation of why the document matched or did not match.
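
As a rough sketch, such an explain request can be built with Ruby's standard library. The helper name, the index name 'test', the type 'User' and the document id 42 are placeholders. The request is only constructed here, not sent.

```ruby
require 'json'
require 'net/http'

# Hypothetical helper: builds an explain request for one document.
def build_explain_request(index, type, id, query)
  request = Net::HTTP::Get.new("/#{index}/#{type}/#{id}/_explain")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate({ query: query })
  request
end

request = build_explain_request('test', 'User', 42, { match: { username: 'alice' } })
puts request.path
puts request.body

# To send it to the test instance configured above (not executed here):
# response = Net::HTTP.start('localhost', 9210) { |http| http.request(request) }
# puts response.body
```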

For simple searches, use the match query with a cutoff_frequency to treat stopwords correctly and with phrase_prefix to find partial matches. This query is very robust against malformed input and does not throw exceptions like the query_string query does. Use the simple_query_string query if you need support for simple search operators. It is as robust as the match query, and you can specify which operators are allowed and which are not.
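
Restricting the operators could look like the sketch below. It assumes that your ES version supports the flags parameter of simple_query_string. The helper name and the field list are made up for the example.

```ruby
# Hypothetical helper: builds a simple_query_string query that only
# enables the given operators.
def simple_search_query(query_string, fields, allowed_operators)
  {
    simple_query_string: {
      query:  query_string,
      fields: fields,
      # Flags are combined with '|', e.g. "OR|PREFIX".
      flags:  allowed_operators.join('|')
    }
  }
end

query = simple_search_query('alice | bob*', ['username'], %w[OR PREFIX])
puts query.inspect
```

With only OR and PREFIX enabled, the other operator characters in the user input should be treated as plain text instead of being interpreted.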