Boost the ranking of exact phrase matches in Elasticsearch


Elasticsearch uses a JSON-based Query DSL to define queries.

A simple way to search is a multi_match query: pass the query keyword and list the fields to search.

Here is an example:


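from elasticsearch import Elasticsearch

# Assumes a local node at the default address; newer client versions require an explicit host.
es = Elasticsearch("http://localhost:9200")
indexname = 'myindex'

keyword = 'test'

# Search the "content" field for the keyword.
dsl = {
    "query": {
        "multi_match": {
            "query": keyword,
            "fields": ["content"]
        }
    }
}

dsize = 10  # maximum number of hits to return
result_r = es.search(index=indexname, body=dsl, size=dsize)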
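
To judge how results are ranked, it helps to look at each hit's _score. The following is a minimal sketch, not part of the original post: summarize_hits and sample_response are hypothetical names, and the sample data is invented only to show the shape of a search response (hits under response["hits"]["hits"], each with _id, _score and _source).

```python
def summarize_hits(response, field="content"):
    """Return (score, id, snippet) tuples for each hit in a search response."""
    rows = []
    for hit in response["hits"]["hits"]:
        # _source may be absent if the query disabled it, so fall back to "".
        text = hit.get("_source", {}).get(field, "")
        rows.append((hit["_score"], hit["_id"], text[:60]))
    return rows


# Hypothetical response, shaped like the value returned by es.search().
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 2.4, "_source": {"content": "a test of many test cases"}},
            {"_id": "2", "_score": 1.1, "_source": {"content": "unit test"}},
        ]
    }
}

for score, doc_id, snippet in summarize_hits(sample_response):
    print(f"{score:6.2f}  {doc_id}  {snippet}")
```

With a live cluster, passing result_r instead of sample_response should list the hits in the order Elasticsearch ranked them.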

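The problem with the above query is that when the keyword is a phrase, documents containing the exact phrase are often ranked below documents that merely contain the individual words. A plain match scores each term on its own and ignores word order and proximity.

To solve this problem, you might want to try the following DSL, which keeps the original query in must and adds two boosted should clauses: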

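dsl = {
    "query": {
        "bool": {
            # Decides which documents match: any query term in any field.
            "must": [
                {
                    "multi_match": {
                        "query": keyword,
                        "fields": ["content1", "content2"]
                    }
                }
            ],
            # Optional clauses: they only raise the score of documents that also match them.
            "should": [
                {
                    # Exact phrase match gets the largest boost.
                    "multi_match": {
                        "query": keyword,
                        "fields": ["content1", "content2"],
                        "type": "phrase",
                        "boost": 10
                    }
                },
                {
                    # All terms present in a single field gets a smaller boost.
                    "multi_match": {
                        "query": keyword,
                        "fields": ["content1", "content2"],
                        "operator": "and",
                        "boost": 4
                    }
                }
            ]
        }
    }
}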
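
Because the same keyword and field list are repeated three times, the query above can be built by a small helper. This is only a sketch: build_phrase_boost_query and _multi_match are hypothetical names, and the default boosts simply mirror the values 10 and 4 used above.

```python
def _multi_match(keyword, fields, **extra):
    """Build one multi_match clause; extra holds options such as type or boost."""
    return {"multi_match": {"query": keyword, "fields": list(fields), **extra}}


def build_phrase_boost_query(keyword, fields, phrase_boost=10, and_boost=4):
    """Build a bool query that ranks exact-phrase matches above loose matches."""
    return {
        "query": {
            "bool": {
                # Base clause: decides which documents match at all.
                "must": [_multi_match(keyword, fields)],
                # Scoring-only clauses: raise the score of better matches.
                "should": [
                    _multi_match(keyword, fields, type="phrase", boost=phrase_boost),
                    _multi_match(keyword, fields, operator="and", boost=and_boost),
                ],
            }
        }
    }


dsl = build_phrase_boost_query("machine learning", ["content1", "content2"])
```

The resulting dsl can be passed to es.search in the same way as before. The boost values are tuning knobs, so it is worth trying a few combinations against your own data.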

One thing worth explaining in the above query is the operator parameter. According to the official Elasticsearch documentation:

operator and minimum_should_match
The best_fields and most_fields types are field-centric — they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query": "Will Smith",
      "type": "best_fields",
      "fields": [ "first_name", "last_name" ],
      "operator": "and"
    }
  }
}

This query is executed as:

(+first_name:will +first_name:smith)
| (+last_name:will +last_name:smith)

In other words, all terms must be present in a single field for a document to match.
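
The field-centric behaviour can be imitated in a few lines of plain Python. This is a toy simulation, not how Elasticsearch is implemented: field_centric_and_match is a hypothetical helper, and lowercasing plus whitespace splitting stands in for a real analyzer.

```python
def field_centric_and_match(doc, query, fields):
    """Return True if ALL query terms occur within a single one of the fields."""
    terms = query.lower().split()
    for field in fields:
        tokens = set(str(doc.get(field, "")).lower().split())
        if all(term in tokens for term in terms):
            return True
    return False


fields = ["first_name", "last_name"]
split_doc = {"first_name": "Will", "last_name": "Smith"}
joined_doc = {"first_name": "Will Smith", "last_name": ""}

print(field_centric_and_match(split_doc, "Will Smith", fields))   # False
print(field_centric_and_match(joined_doc, "Will Smith", fields))  # True
```

In the boosted query, this means the operator "and" clause only rewards documents where every term of the keyword appears in the same field, for example all in content1 or all in content2.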


Author: robot learner
Reprint policy: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.