Thursday, June 21, 2018

Elasticsearch - blog 1 (Inverted Index)



Elasticsearch uses an inverted index data structure.

Sample documents used to illustrate the inverted index:
Doc1: This is first sample document
Doc2: Second document for the Inverted Index
Doc3: Final sample document

Inverted index of the above three documents:

            dictionary                  referred documents
      term          frequency

      this          1                   1
      is            1                   1
      first         1                   1
      sample        2                   1, 3
      document      3                   1, 2, 3
      second        1                   2
      for           1                   2
      the           1                   2
      inverted      1                   2
      index         1                   2
      final         1                   3
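
The table above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming lowercasing and whitespace splitting as the "analyzer"; the names `docs` and `build_inverted_index` are made up for illustration and are not part of Elasticsearch or Lucene.

```python
from collections import defaultdict

# The three sample documents, keyed by document id.
docs = {
    1: "This is first sample document",
    2: "Second document for the Inverted Index",
    3: "Final sample document",
}

def build_inverted_index(documents):
    """Map each lowercased term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(docs)

# Print term, frequency (number of documents containing it) and referred documents.
for term, doc_ids in index.items():
    print(f"{term:<10} {len(doc_ids)}   {doc_ids}")
```

Here the frequency column is simply the length of each term's document list.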

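Index:
         a regular (forward) index lists the terms in a specific document; an inverted index reverses this and lists the documents for a specific term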
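
For contrast, a regular (forward) index for the same sample documents could look like the sketch below. It maps each document id to its terms, so answering "which documents contain this term?" means scanning every document. The names `forward_index` and `docs_containing` are hypothetical, and the tokenization (lowercase, split on whitespace) is an assumption.

```python
docs = {
    1: "This is first sample document",
    2: "Second document for the Inverted Index",
    3: "Final sample document",
}

# Forward index: document id -> sorted list of distinct terms in that document.
forward_index = {
    doc_id: sorted(set(text.lower().split()))
    for doc_id, text in docs.items()
}

def docs_containing(term):
    """Term lookup on a forward index has to visit every document."""
    return [doc_id for doc_id, terms in forward_index.items() if term in terms]

print(forward_index[3])
print(docs_containing("sample"))
```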

Some advantages of an inverted index:
        getting the list of all documents that contain a given term or terms
        AND and OR of terms
        prefix-based searching
        suffix-based searching (reverse the indexed terms, reverse the search suffix, then search by prefix)
                example: original term: fantastic, search suffix: astic
                         reversed term: citsatnaf, reversed search suffix: citsa
                         now do a prefix search, i.e. find all reversed terms that start with the reversed search suffix

        finding substrings (by splitting the terms into n-grams and searching for those n-grams)
        number searching, e.g. between 100 and 199 (Lucene can store 123 as "1"-hundreds, "12"-tens and "123", so a search for 100 to 199 only has to match the term "1"-hundreds, and it avoids other numbers like 1234 that a plain prefix search on "1" would also return)

   
   


References :
   https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up