Elasticsearch uses an inverted index data structure
Inverted index representation of sample documents:
Doc1: This is first sample document
Doc2: Second document for the Inverted Index
Doc3: Final sample document
Inverted index of the above three documents:
Term (dictionary)    Frequency    Referred documents
this                 1            1
is                   1            1
first                1            1
sample               2            1, 3
document             3            1, 2, 3
second               1            2
for                  1            2
the                  1            2
inverted             1            2
index                1            2
final                1            3
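The table above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming lowercasing and whitespace splitting as the tokenizer; the names `docs` and `build_inverted_index` are made up for illustration and are not part of Elasticsearch or Lucene, whose analyzers do considerably more.

```python
from collections import defaultdict

# The three sample documents, keyed by document number.
docs = {
    1: "This is first sample document",
    2: "Second document for the Inverted Index",
    3: "Final sample document",
}

def build_inverted_index(documents):
    """Map each lowercased term to the sorted list of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        # Assumption: tokens are simply whitespace-separated, lowercased words.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(docs)

# Print term, frequency (number of documents) and the referred documents.
for term, doc_ids in index.items():
    print(term, len(doc_ids), doc_ids)
```

Each key of `index` is a dictionary term, and the length of its list is the frequency column of the table.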
Index:
lists the terms in a specific document (the inverted index does the reverse: it lists the documents that contain a specific term)
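Some advantages of the inverted index:
Getting the list of all documents that contain a given term or terms
AND and OR queries over terms (intersect or union the document lists of the terms)
Prefix-based searching (in a sorted dictionary, all terms sharing a prefix sit next to each other)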
Suffix-based searching (index the reversed terms, reverse the search suffix, then search by prefix)
Example: original term: fantastic, search suffix: astic
Reversed term: citsatnaf, reversed search suffix: citsa. A prefix search for citsa over the reversed terms then finds every term that ends with astic.
Finding substrings (by splitting the terms into n-grams and searching for the n-grams of the search string)
Number range searching, e.g. between 100 and 199 (Lucene stores 123 as "1"-hundreds, "2"-tens and "3", so a search for 100 to 199 matches all terms with the prefix "1"-hundreds and avoids other numbers such as 1234)
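Several of the operations listed above can be sketched in plain Python over the sample index. This is an illustration of the ideas only, not how Lucene implements them; the helper names (`search_and`, `search_or`, `prefix_terms`, `suffix_terms`, `substring_terms`) are hypothetical, and the postings are copied by hand from the table.

```python
from bisect import bisect_left

# Postings copied from the sample table: term -> set of document numbers.
index = {
    "this": {1}, "is": {1}, "first": {1}, "sample": {1, 3},
    "document": {1, 2, 3}, "second": {2}, "for": {2}, "the": {2},
    "inverted": {2}, "index": {2}, "final": {3},
}

def search_and(index, *terms):
    """Documents containing all of the terms (intersection of postings)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Documents containing at least one of the terms (union of postings)."""
    return set().union(*(index.get(t, set()) for t in terms))

def prefix_terms(sorted_terms, prefix):
    """Terms starting with prefix; relies on sorted_terms being sorted."""
    matches = []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        if not term.startswith(prefix):
            break  # matching terms are adjacent, so stop at the first miss
        matches.append(term)
    return matches

def suffix_terms(terms, suffix):
    """Terms ending with suffix, found as a prefix search over reversed terms."""
    reversed_terms = sorted(t[::-1] for t in terms)
    return [t[::-1] for t in prefix_terms(reversed_terms, suffix[::-1])]

def substring_terms(terms, fragment, n=3):
    """Terms containing fragment, using n-grams to narrow the candidates."""
    gram_index = {}
    for term in terms:
        for i in range(len(term) - n + 1):
            gram_index.setdefault(term[i:i + n], set()).add(term)
    grams = [fragment[i:i + n] for i in range(len(fragment) - n + 1)]
    if not grams:
        # Fragment shorter than n: fall back to scanning every term.
        return sorted(t for t in terms if fragment in t)
    candidates = set.intersection(*(gram_index.get(g, set()) for g in grams))
    # Sharing all n-grams does not guarantee a match, so verify each candidate.
    return sorted(t for t in candidates if fragment in t)

print(search_and(index, "sample", "document"))
print(search_or(index, "first", "final"))
print(prefix_terms(sorted(index), "fi"))
print(suffix_terms(["fantastic", "plastic", "fast"], "astic"))
print(substring_terms(index, "ver"))
```

The prefix search relies on the term list being sorted, which is why the suffix search sorts the reversed terms before reusing it.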
References :
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up