Elasticsearch Tutorial – Power Up Your Searches

Last updated on May 22, 2019
Research Analyst at Edureka. A techno freak who likes to explore different technologies. Likes to follow the technology trends in market and write about...

In my previous blog on What is Elasticsearch, I introduced Elasticsearch, talked about its advantages, and walked through the installation on Windows. I also discussed the basic concepts and the different API conventions present in Elasticsearch. But let me tell you something interesting: whatever I discussed in the previous blog is just the tip of the iceberg. In this Elasticsearch tutorial blog, I will introduce the features which make Elasticsearch fast and popular among its competitors. I will also introduce you to the different APIs present in Elasticsearch and show how you can perform different searches using them.

Below are the topics that I will be discussing in this Elasticsearch tutorial blog:

  • Elasticsearch APIs
  • Query DSL
  • Mapping
  • Analysis
  • Modules

So, let’s get started with the very first topic of this Elasticsearch tutorial blog.

Elasticsearch APIs – Elasticsearch Tutorial

This section of the Elasticsearch tutorial blog talks about the various kinds of APIs supported by Elasticsearch. Let’s understand each of them in detail.

Document API

Elasticsearch provides both single document APIs and multi-document APIs.

  1. SINGLE DOCUMENT API
    • Index API
    • Get API
    • Update API 
    • Delete API
  2. MULTI-DOCUMENT API
    • Multi Get API
    • Bulk API
    • Delete By Query API
    • Update By Query API
    • Reindex API

Now that you know about the different types of Document APIs, let’s try to implement CRUD operations using them.

Index API

The index API is responsible for adding and updating a typed JSON document in a specific index and then making it searchable. The following example inserts the JSON document into the “playlist” index, under a type called “kpop” with an id of 1:

PUT /playlist/kpop/1
{
 "title" : "Beautiful Life",
 "artist" : "Crush",
 "album" : "Goblin",
 "year" : 2017
}

Get API

The get API is responsible for fetching a typed JSON document from the index based on its unique id. The following example gets a JSON document from the “playlist” index, under a type called “kpop”, with an id of 2:

GET /playlist/kpop/2

Update API

The update API is responsible for updating a document based on a script provided. The operation fetches the document from the index, runs the script, and then indexes the result back. It uses versioning to make sure no other updates happen between the “get” and the “reindex”. The following example updates a JSON document in the “playlist” index, under a type called “kpop”, by adding a new field called “time”:

POST /playlist/kpop/1/_update
{
 "script" : "ctx._source.time = 5"
}
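
The get, run-script, reindex cycle with a version check can be illustrated with a small in-memory sketch. This is not how Elasticsearch is implemented; `TinyStore`, `VersionConflict`, `update`, and `add_time` are hypothetical names used only to show the idea of rejecting a write when the document changed after it was read.

```python
class VersionConflict(Exception):
    """Raised when a document changed between the read and the write."""


class TinyStore:
    """Toy in-memory document store that tracks a version per document."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the caller read an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                "expected version {}, found {}".format(expected_version, current_version)
            )
        self._docs[doc_id] = (current_version + 1, dict(source))
        return current_version + 1


def update(store, doc_id, script):
    """Fetch the document, run the script on a copy, and index the result back."""
    version, source = store.get(doc_id)
    new_source = dict(source)
    script(new_source)
    return store.index(doc_id, new_source, expected_version=version)


def add_time(source):
    """Stand-in for the update script: add the "time" field."""
    source["time"] = 5


store = TinyStore()
store.index("1", {"title": "Beautiful Life", "artist": "Crush", "album": "Goblin", "year": 2017})
new_version = update(store, "1", add_time)
print(new_version, store.get("1")[1])
```

A write that still carries the old version number (here, 1) would raise `VersionConflict` instead of silently overwriting the newer document.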

Delete API

The delete API is responsible for deleting a typed JSON document from a specific index based on its unique id. The following example deletes a JSON document from the “playlist” index, under a type called “kpop”, with an id of 3:

DELETE /playlist/kpop/3
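
Several single-document operations can be batched with the Bulk API listed among the multi-document APIs. The sketch below only builds the newline-delimited JSON body that the `_bulk` endpoint expects (one action line followed by one source line per document) and does not send it anywhere. The helper name `build_bulk_body` and the assignment of ids to songs are assumptions made for illustration.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Return a newline-delimited JSON body for the _bulk endpoint.

    docs maps a document id to its source dictionary.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: which operation to run and where.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical id assignment for two songs used in this tutorial.
songs = {
    "1": {"title": "Beautiful Life", "artist": "Crush", "album": "Goblin", "year": 2017},
    "2": {"title": "MAMACITA", "artist": "SuJu", "album": "MAMACITA", "year": 2014},
}
body = build_bulk_body("playlist", "kpop", songs)
print(body)
```

The resulting string would be sent as the body of a `POST /_bulk` request; the `_type` entry matches the typed indices used throughout this tutorial.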

Search API

The search API is responsible for searching the content within Elasticsearch. You can search either by sending a GET request with the query as a query-string parameter, or by sending the query in the message body of a POST request. Generally, the search APIs are multi-index and multi-type.

Various parameters can be passed in a search operation that uses the Uniform Resource Identifier (URI):

Parameter        Description
q                This parameter specifies query string
lenient          By setting this parameter’s value to true, format based errors can be ignored
fields           This parameter fetches response from selective fields
sort             This parameter sorts the result
timeout          This parameter helps in restricting the search time
terminate_after  This parameter restricts the response to a specific number of documents in each shard
from             This parameter specifies the start index
size             This parameter specifies the number of hits to return
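
As a rough sketch of how these parameters combine, the snippet below assembles a URI search path with Python's standard library. The helper `build_search_path` is hypothetical, and the `field:value` query and `field:desc` sort values are just sample inputs.

```python
from urllib.parse import urlencode


def build_search_path(indices, **params):
    """Build a URI-search path such as /playlist/_search?q=...&size=..."""
    # 'from' is a Python keyword, so callers pass 'from_' and the
    # trailing underscore is stripped here.
    query = {key.rstrip("_"): value for key, value in params.items() if value is not None}
    return "/{}/_search?{}".format(",".join(indices), urlencode(query))


path = build_search_path(
    ["playlist", "my_playlist"], q="year:2014", sort="year:desc", from_=0, size=5
)
print(path)
# /playlist,my_playlist/_search?q=year%3A2014&sort=year%3Adesc&from=0&size=5
```

Together, `from` and `size` give simple paging: `from=0&size=5` returns the first five hits, and `from=5&size=5` the next five.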

Now that you are familiar with the search parameters, let’s see how you can perform a search across multiple indices and types.

  1. Multi-Index

    In Elasticsearch, you can search for the documents present in all the indices or in some particular indices. To search every index, leave out the index names (GET /_search). The following example searches the “playlist” and “my_playlist” indices for JSON documents containing the value 2014:

    GET playlist,my_playlist/_search?q=2014

    A document such as the following would match:

    {
     "title" : "MAMACITA",
     "artist" : "SuJu",
     "album" : "MAMACITA",
     "year" : 2014,
     "time" : 4
    }
  2. Multi-Type

    You can also search all the documents in a particular index across all types or in some specified type. The following example searches for JSON documents from a “playlist” index, under all types, where the year is 2017:

    GET playlist/_search?q=2017

The next section of this Elasticsearch tutorial talks about aggregations and the aggregation types supported by Elasticsearch.

Aggregations

In Elasticsearch, the aggregations framework is responsible for providing aggregated data based on a search query. Aggregations can be composed together in order to build complex summaries of the data. For a better understanding, consider an aggregation as a unit of work that builds analytic information over a set of documents available in Elasticsearch. Various types of aggregations are available, each with its own purpose and output. For simplification, they are grouped into 4 major families:

  1. Bucketing

    Here each bucket is associated with a key and a document criterion. Whenever the aggregation is executed, all the bucket criteria are evaluated on every document. Each time a criterion matches, the document is considered to “fall in” the relevant bucket.

  2. Metric

    Metric aggregations are responsible for keeping track of and computing metrics over a set of documents.

  3. Matrix

    Matrix aggregations are responsible for operating on multiple fields. They produce a matrix result out of the values extracted from the requested document fields. Matrix aggregations do not support scripting.

  4. Pipeline

    Pipeline aggregations are responsible for aggregating the output of other aggregations and their associated metrics.
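
To make the bucketing and metric families concrete, here is a small local simulation in plain Python. It does not call Elasticsearch; it only mimics what a bucket aggregation on “year” with an average metric on “time” would compute. The third song and its values are made up for illustration.

```python
from collections import defaultdict

songs = [
    {"artist": "Crush", "year": 2017, "time": 5},
    {"artist": "SuJu", "year": 2014, "time": 4},
    {"artist": "EXO", "year": 2017, "time": 4},  # hypothetical document
]


def bucket_by(docs, field):
    """Bucketing: every document falls into the bucket keyed by its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def average(docs, field):
    """Metric: compute a single number over the documents of one bucket."""
    values = [doc[field] for doc in docs]
    return sum(values) / len(values)


by_year = bucket_by(songs, "year")
avg_time = {year: average(docs, "time") for year, docs in by_year.items()}
print(avg_time)  # {2017: 4.5, 2014: 4.0}
```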

The following example shows how a basic aggregation is structured:

"aggregations" : {
 "<aggregation_name>" : {
  "<aggregation_type>" : {
   <aggregation_body>
  }
  [,"meta" : { [<meta_data_body>] } ]?
  [,"aggregations" : { [<sub_aggregation>]+ } ]?
 }
 [,"<aggregation_name_2>" : { ... } ]*
}
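
A concrete request body that follows this structure might look like the sketch below: a terms (bucketing) aggregation on “year” with an avg (metric) sub-aggregation on “time”. The helper name `terms_with_avg` and the aggregation names are assumptions; the body is only built and printed, not sent.

```python
import json


def terms_with_avg(bucket_name, bucket_field, metric_name, metric_field):
    """Build a search body with a terms aggregation and an avg sub-aggregation."""
    return {
        "size": 0,  # return only aggregation results, no hits
        "aggregations": {
            bucket_name: {
                "terms": {"field": bucket_field},
                "aggregations": {
                    metric_name: {"avg": {"field": metric_field}},
                },
            }
        },
    }


agg_body = terms_with_avg("songs_per_year", "year", "average_time", "time")
print(json.dumps(agg_body, indent=1))
```

The printed JSON could be used as the body of a request such as `POST /playlist/_search`.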

Indices API

In Elasticsearch, the indices APIs are responsible for managing individual indices, index settings, aliases, mappings, and index templates. Following are some of the operations that we can perform using the indices APIs:

  • Create Index

    The create index API is responsible for instantiating an index. An index is also created automatically whenever a user indexes a JSON document into an index that does not exist yet. The following example creates an index called “courses” with some settings:

    PUT courses
    {
     "settings" : {
      "index" : {
       "number_of_shards" : 3,
       "number_of_replicas" : 2
      }
     }
    }
  • Get Index

    The get index API is responsible for fetching information about an index. You call it by sending a GET request to one or more indices. The following example retrieves an index called “courses”:

    GET /courses
  • Delete Index

    The delete index API is responsible for deleting an existing index. The following example deletes an index called “courses”:

    DELETE /courses
  • Open/ Close Index API

    The open and close index APIs are responsible for closing an index and then opening it again. A closed index is blocked for read/write operations, but it can be reopened, at which point it goes through the normal recovery process. The following example closes and opens an index called “courses”:

    POST /courses/_close
    
    POST /courses/_open
    
    
  • Index Aliases

    APIs in Elasticsearch accept an index name when working against a specific index. The index aliases API lets you alias an index with another name, and all APIs automatically convert the alias name to the actual index name. The following example adds an index alias and then removes it:

    POST /_aliases
    {
     "actions" : [
     { "add" : { "index" : "courses", "alias" : "subjects" } }
     ]
    }
    
    POST /_aliases
    {
     "actions" : [
     { "remove" : { "index" : "courses", "alias" : "subjects" } }
     ]
    }
  • Analyze

    The analyze API performs the analysis process on a text and returns the token breakdown of that text. You can perform analysis without specifying any index. The following example performs a simple analysis:

    GET _analyze
    {
     "analyzer" : "standard",
     "text" : "this is a demo"
    }
  • Index Template

    Index templates are responsible for defining the templates that will be automatically applied when new indices are created. The following example shows a template format:

    PUT _template/template_1
    {
     "template": "te*",
     "settings": {
      "number_of_shards": 1
     },
     "mappings": {
      "type1": {
       "_source": {
        "enabled": false
       },
       "properties": {
        "host_name": {
         "type": "keyword"
        },
        "created_at": {
         "type": "date",
         "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
       }
      }
     }
    }
  • Index Stats

    In Elasticsearch, the indices stats API is responsible for providing statistics on the different operations happening on an index. The following example shows index-level stats for all indices and then for a specific index:

    GET /_stats
    
    GET /playlist/_stats
    
    
  • Flush

    The flush API is responsible for flushing one or more indices. Basically, it’s a process of releasing memory from the index by pushing the data to the index storage and clearing the internal transaction log. The following example shows an index being flushed:

    POST playlist/_flush
  • Refresh

    The refresh API is responsible for explicitly refreshing one or more indices. This makes all operations performed since the last refresh available for search. The following example refreshes a single index and then two indices at once:

    POST /courses/_refresh
    
    POST /playlist,courses/_refresh
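
One common use of the alias actions shown under Index Aliases is to move an alias from one index to another in a single `_aliases` request, with a remove and an add in the same actions list. The sketch below only builds that request body; `alias_swap_actions` is a hypothetical helper and “courses_v2” is a made-up index name.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build an _aliases body that moves an alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap_body = alias_swap_actions("subjects", "courses", "courses_v2")
print(json.dumps(swap_body, indent=1))
```

Clients that query the “subjects” alias would not need to change when the underlying index changes.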
    
    

Cluster API

The Cluster API in Elasticsearch is responsible for fetching information about a cluster and its nodes and making further changes in them. 

  • Cluster Health

    This API is responsible for retrieving the cluster’s health status; you call it by appending the ‘health’ keyword to the _cluster URL. The following example shows cluster health:

    GET _cluster/health
  • Cluster State

    The Cluster State API is responsible for retrieving state information about a cluster; you call it by appending the ‘state’ keyword to the _cluster URL. The state contains information such as the version, master node, other nodes, routing table, metadata, and blocks. The following example shows cluster state:

    GET /_cluster/state
  • Cluster Stats

    The Cluster Stats API is responsible for retrieving statistics from a cluster-wide perspective. It returns basic index metrics and information about the current nodes that form the cluster. The following example shows cluster stats:

    GET /_cluster/stats
  • Pending Cluster Tasks

    This API is responsible for monitoring pending tasks in a cluster. Tasks may include create index, update mapping, allocate shard, fail shard, etc. The following example shows pending cluster tasks:

    GET /_cluster/pending_tasks
  • Node Stats

    The cluster node stats API is responsible for retrieving statistics for one or more of the cluster’s nodes. The following example shows cluster node stats:

    GET /_nodes/stats
  • Nodes hot_threads

    This API is responsible for retrieving the current hot threads on each of the nodes in the cluster. The following example shows the cluster’s hot threads:

    GET /_nodes/hot_threads

Next section of this Elasticsearch Tutorial blog talks about the Query DSL provided by Elasticsearch.

Query DSL – Elasticsearch Tutorial

Elasticsearch provides a full Query DSL which is based on JSON and is responsible for defining queries. The Query DSL consists of two types of clauses:

  1. Leaf Query Clauses

    In Elasticsearch, the leaf query clauses search for a particular value in a particular field; the match, term, and range queries are examples. These queries can be used by themselves as well.

  2. Compound Query Clauses

    In Elasticsearch, the compound query clauses wrap up other leaf or compound queries. These queries are used for combining multiple queries in a logical fashion or for altering their behavior.

  • Match All Query

    This is the simplest query, which matches all the documents and gives every document a score of 1.0. The following example shows the match_all query:

    GET /_search
    {
     "query": {
     "match_all": {}
     }
    }
  • Full Text Queries

    These queries are used for running full-text queries on full-text fields. They are high-level queries which understand how the field being queried is analyzed, and they apply each field’s analyzer to the query string before executing. The following example shows a simple full-text query:

    POST /playlist*/_search
    {
     "query": {
      "match": {
       "title": "Beautiful Life"
      }
     }
    }

    Some of the full-text queries are:

    Query                Description
    match                This query is used for performing full-text queries.
    match_phrase         This query is used for matching exact phrases or word proximity matches.
    match_phrase_prefix  This query is used for wildcard search on the final word.
    multi_match          This query is the multi-field version of the match query.
    common_terms         This query is used for providing more preference to uncommon words.
    query_string         This query is used for specifying AND|OR|NOT conditions and multi-field search within a single query string.
    simple_query_string  This query is a robust version of query_string.
  • Term Level Queries

    Rather than a full-text field, these types of queries are used for structured data like numbers, dates, and enums. You can also craft low-level queries using them. The following example shows term level query:

    POST /playlist/_search
    {
     "query":{
     "term":{"title":"Silence"}
     }
    }

    Some of the term-level queries are:

    Query     Description
    term      This query is used for finding the documents containing the exact term specified.
    terms     This query is used for finding the documents which contain any of the exact terms specified.
    range     This query is used for finding the documents where the specified field contains values within the specified range.
    exists    This query is used for finding the documents where the specified field contains any non-null value.
    prefix    This query is used for finding the documents containing the terms beginning with the exact prefix specified.
    wildcard  This query is used for finding the documents containing the terms matching the pattern specified.
    regexp    This query is used for finding the documents containing the terms matching the regular expression.
    fuzzy     This query is used for finding the documents containing the terms fuzzily similar to the specified term.
    type      This query is used for finding the documents of the specified type.
    ids       This query is used for finding the documents with the specified type and IDs.
  • Compound Queries

    The compound queries in Elasticsearch are responsible for wrapping other compound or leaf queries together. This is done either to combine their results and scores, to change their behavior, or to switch from query to filter context. The following example shows a simple compound query that wraps a match query in a bool query:

    POST /playlist/_search
    {
     "query": {
      "bool": {
       "must": {
        "match": { "title": "Lucifer" }
       }
      }
     }
    }

    Some of the compound queries are:

    Query           Description
    constant_score  This query is used for wrapping up another query and executing it in filter context.
    bool            This is the default query for combining multiple leaf or compound query clauses.
    dis_max         This query accepts multiple queries and then returns the documents matching any of the query clauses.
    function_score  This query is used for modifying the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.
    boosting        This query is used for returning documents matching a positive query, but reducing the score of documents matching a negative query.
    indices         This query is used for executing one query for the specified indices and another for other indices.
  • Joining Queries

    In a distributed system like Elasticsearch, performing full SQL-style joins is very expensive. Thus, Elasticsearch provides two forms of join which are designed to scale horizontally.

    1. nested query

      This query is used for the documents containing nested type fields. Using this query, you can query each object as an independent document.

    2. has_child & has_parent queries

      These queries work with the parent-child relationship between two document types within a single index. The has_child query returns the matching parent documents, while the has_parent query returns the matching child documents.

The following example shows a simple join query:

POST /my_playlist/_search
{
 "query": {
  "has_child": {
   "type": "kpop",
   "query": {
    "match": {
     "artist": "EXO"
    }
   }
  }
 }
}
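  • Geo Queries

    In Elasticsearch, two types of geo data are supported:

    1. geo_point: These are the fields which support lat/lon pairs.
    2. geo_shape: These are the fields which support points, lines, circles, polygons, multi-polygons etc.

    The following example shows the body of a geo_distance query that matches documents whose “location” field lies within 150 km of the given point. The filter is wrapped in a bool query, since the older “filtered” query is no longer available in recent Elasticsearch versions:

    {
     "query": {
      "bool": {
       "filter": {
        "geo_distance": {
         "distance": "150km",
         "location": [42.056098, 86.674299]
        }
       }
      }
     }
    }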
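
The way compound queries combine leaf queries can be sketched by assembling a bool query body in Python. The helper `bool_query` is hypothetical and the clause values are sample inputs; `must`, `filter`, `must_not`, and `should` are the clause names of the bool query.

```python
import json


def bool_query(must=None, filter_=None, must_not=None, should=None):
    """Build a search body with a bool query, skipping empty clause lists."""
    clauses = {"must": must, "filter": filter_, "must_not": must_not, "should": should}
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


bool_body = bool_query(
    # Scored full-text clause.
    must=[{"match": {"title": "Beautiful Life"}}],
    # Filter context: restricts results without affecting the score.
    filter_=[{"range": {"year": {"gte": 2014, "lte": 2017}}}],
    # Documents matching this clause are excluded.
    must_not=[{"match": {"artist": "EXO"}}],
)
print(json.dumps(bool_body, indent=1))
```

Here the match clause under `must` contributes to the score, while the range clause under `filter` runs in filter context and only narrows the result set.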

Next part of this Elasticsearch Tutorial blog talks about different mappings available in Elasticsearch.

Mapping – Elasticsearch Tutorial

In Elasticsearch, mapping is responsible for defining how a document and its fields are stored and indexed. The following example creates the “playlist” index with a simple mapping:

PUT /playlist
{
 "mappings": {
  "report": {
   "_all": {
    "enabled": true
   },
   "properties": {
    "title": { "type": "text" },
    "artist": { "type": "text" },
    "album": { "type": "text" },
    "year": { "type": "integer" }
   }
  }
 }
}
  • Field Types

    Elasticsearch supports various data types for the fields in a document like:

    Datatype     Description
    Core         These are the basic data types that are supported by almost all the systems. The basic datatypes are integer, long, short, byte, double, float, string, date, boolean and binary.
    Complex      These are the data types that are a combination of core data types, for example the array, JSON object and nested data types.
    Geo          These are the data types which are used for defining geographic properties.
    Specialized  These are the data types that are used for special purposes.
  •  Mapping Types

    In Elasticsearch, each index has one or more mapping types. These mapping types are used to divide the documents of an index into logical groups/ units. Mapping can be differentiated on the basis of the following parameters:

    1. Meta-Fields: The meta-fields are responsible for customizing how a document’s associated metadata is treated. Meta-fields in Elasticsearch include the document’s _index, _type, _id and _source fields.
    2. Fields or Properties: In Elasticsearch, each mapping type has a list of fields or properties which are specific to it. In an index, fields with the same name but in different mapping types should have the same mapping.
    3. Dynamic Mapping: Elasticsearch allows the automatic creation of mapping called dynamic mapping. Using dynamic mapping a user can post data to any undefined mapping.

Following section of this Elasticsearch Tutorial blog will introduce you to the analysis processes in Elasticsearch.

Analysis – Elasticsearch Tutorial

In Elasticsearch, analysis is the process of converting text into tokens or terms. These tokens are then added to the inverted index for searching. Analysis is performed by an analyzer, which can be of two types:

  1. Built-in analyzer
  2. Custom analyzer defined per index

Thus, if no analyzer is defined, the built-in standard analyzer performs the analysis by default. The following example creates an index whose “title” field is analyzed with the standard analyzer:

PUT cities
{
 "mappings": {
  "metropolitan": {
   "properties": {
    "title": {
     "type": "text",
     "analyzer": "standard"
    }
   }
  }
 }
}
  • Analyzers

    In Elasticsearch, a tokenizer and optional token filters make up an analyzer. Inside the analysis module, these analyzers are registered with logical names. Using these names, the analyzers can be referenced either in mapping definitions or in some APIs. Following are some of the default analyzers:

    Analyzer    Description
    Standard    Using this analyzer you can set stopwords and max_token_length.
    Simple      The lowercase tokenizer composes this analyzer.
    Whitespace  The whitespace tokenizer composes this analyzer.
    Stop        Using this analyzer, stopwords and stopwords_path can be configured.
    Keyword     Using this analyzer, an entire stream can be tokenized into a single token.
    Pattern     Using this regular-expression-based analyzer you can configure settings like lowercase, pattern, flags, stopwords etc.
    Language    Using this analyzer you can analyze different languages like Hindi, Arabic, Dutch etc.
    Snowball    This analyzer utilizes a standard tokenizer, with standard filter, lowercase filter, stop filter, and snowball filter.
    Custom      Using this analyzer, you can create a customized analyzer from a tokenizer with optional token filters and char filters.
  • Tokenizer

    In Elasticsearch, tokenizers are responsible for generating tokens from a text. The text can be broken down into tokens at whitespace or punctuation. Elasticsearch provides a list of built-in tokenizers, which can be used in a custom analyzer. Following are some of the tokenizers used in Elasticsearch:

    Tokenizer       Description
    Standard        Developed on grammar-based tokenizer for which max_token_length can also be configured.
    Edge NGram      Different configurations can be set for this tokenizer like min_gram, max_gram, token_chars.
    Keyword         This tokenizer is responsible for generating the entire input as an output; its buffer_size can be configured.
    Letter          This tokenizer is responsible for capturing the whole word until a non-letter is encountered.
    Lowercase       This tokenizer works similar to the letter tokenizer. Once the tokens are created, it changes them into lower case.
    NGram           You can set min_gram, max_gram, token_chars etc. for this tokenizer.
    Whitespace      On the basis of whitespaces, this tokenizer divides the text.
    Pattern         This tokenizer uses regular expressions as a token separator.
    UAX Email URL   This works similar to the standard tokenizer but treats an email address or URL as a single token.
    Path Hierarchy  This tokenizer is responsible for generating all the possible paths present inside the input directory path.
    Classic         This tokenizer uses grammar based tokens for its functioning.
    Thai            This is used for the Thai language and uses a built-in Thai segmentation algorithm for processing.
  • Token Filters

    In Elasticsearch, tokenizers send their output to the token filters. These token filters can further modify, delete or add tokens in that stream.

  • Character Filters

    Before it reaches the tokenizer, the text is processed by the character filters. Character filters search for special characters, HTML tags or specified patterns, and then either delete them or change them to appropriate words.
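
The order of the stages described above (character filters, then the tokenizer, then token filters) can be sketched as a toy pipeline in plain Python. This is only an illustration of the flow, not the behavior of any built-in Elasticsearch analyzer; the function names and the tiny stopword list are assumptions.

```python
import re

# Tiny, made-up stopword list for the example.
STOPWORDS = {"a", "an", "is", "the", "this"}


def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: split on runs of non-word characters and drop empty strings."""
    return [token for token in re.split(r"\W+", text) if token]


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def remove_stopwords(tokens):
    """Token filter: drop tokens found in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


def analyze(text):
    """Run character filter, tokenizer, then token filters, in that order."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stopwords):
        tokens = token_filter(tokens)
    return tokens


print(analyze("<b>This</b> is a Demo"))  # ['demo']
```

The lowercase filter runs before the stopword filter so that “This” is recognized as a stopword.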

Next part of this Elasticsearch Tutorial blog talks about different modules provided by Elasticsearch.

Modules – Elasticsearch Tutorial

Elasticsearch is composed of different modules, which are responsible for various aspects of its functionality. The settings of each module fall into one of the following two kinds:

  1. static – These settings must be done at the node level and must be set on every relevant node.
  2. dynamic – These settings can be updated dynamically on a live cluster.

Module                                      Description
Cluster-level routing and shard allocation  Responsible for the settings which control where, when, and how shards are allocated to nodes.
Discovery                                   Responsible for discovering a cluster and maintaining the state of all the nodes in it.
Gateway                                     Responsible for maintaining the cluster state and the shard data across full cluster restarts.
HTTP                                        Responsible for managing the communication between HTTP client and Elasticsearch APIs.
Indices                                     Responsible for maintaining the settings that are set globally for every index.
Network                                     Responsible for controlling default network settings.
Node Client                                 Responsible for starting a node in a cluster.
Painless                                    Default scripting language responsible for safe use of inline and stored scripts.
Plugins                                     Responsible for enhancing the basic Elasticsearch functionality in a custom manner.
Scripting                                   Enables user to use scripts to evaluate custom expressions.
Snapshot/Restore                            Responsible for creating snapshots of individual indices or an entire cluster into a remote repository.
Thread pools                                Responsible for holding several thread pools in order to improve how thread memory consumption is managed within a node.
Transport                                   Responsible for configuring the transport networking layer.
Tribe Nodes                                 Responsible for joining one or more clusters and acting as a federated client across them.
Cross-Cluster Search                        Responsible for executing search requests across more than one cluster without joining them, acting as a federated client across them.

This brings us to the end of this Elasticsearch tutorial blog. I hope I was able to clearly explain the different Elasticsearch APIs and how to use them.
