match: single-term query
When running a full-text search in Elasticsearch, use a match query to search a field for a specific term. For example:
# Prepare the data
POST /myindex-match-search/_bulk
{"index": {"_id": 1}}
{"title": "The flower and the dog"}
{"index": {"_id": 2}}
{"title": "The flower and the dog are beautiful"}
{"index": {"_id": 3}}
{"title": "the dog are beautiful"}
# Run a match query
GET /myindex-match-search/_search
{
  "query": {
    "match": {
      "title": "flower"
    }
  }
}
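The statement above executes the match query in the following steps:
With a match query, a document's score depends on the length of the field's content: the shorter the field, the higher the score. The result is as follows:
As you can see, the documents with _id 1 and _id 2 both satisfy the query, but the document with _id 1 has the shorter content, so it scores higher.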
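The effect of field length on the score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the textbook BM25 formula with the default parameters k1 = 1.2 and b = 0.75, and a lowercase-and-split tokenizer as a stand-in for the standard analyzer. The names tokenize and bm25_scores are hypothetical helpers, not part of any Elasticsearch API, and real scores will differ because of per-shard statistics and Lucene's lossy field-length encoding.

```python
import math

# The three documents indexed in the example above.
docs = {
    1: "The flower and the dog",
    2: "The flower and the dog are beautiful",
    3: "the dog are beautiful",
}


def tokenize(text):
    """Assumed stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def bm25_scores(term, docs, k1=1.2, b=0.75):
    """Score every document that contains `term` with the textbook BM25 formula."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    avg_len = sum(len(t) for t in tokens.values()) / len(tokens)
    matching = [doc_id for doc_id, t in tokens.items() if term in t]
    # Inverse document frequency: rarer terms contribute more.
    idf = math.log(1 + (len(tokens) - len(matching) + 0.5) / (len(matching) + 0.5))
    scores = {}
    for doc_id in matching:
        tf = tokens[doc_id].count(term)
        # Length normalization: fields longer than average are penalized.
        norm = k1 * (1 - b + b * len(tokens[doc_id]) / avg_len)
        scores[doc_id] = idf * tf * (k1 + 1) / (tf + norm)
    return scores


scores = bm25_scores("flower", docs)
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 4))
```

Both matching documents contain flower exactly once, so the only thing separating their scores in this sketch is the length-normalization term.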
match: multi-term query
For example:
GET /myindex-match-search/_search
{
  "query": {
    "match": {
      "title": "flower dog"
    }
  }
}
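Because the match query has to look for two words (flower and dog), internally it runs two term queries and then merges their results into the final result. To do this, it wraps the two term queries in a bool query. For example: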
GET /myindex-match-search/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "title": "flower"
          }
        },
        {
          "term": {
            "title": "dog"
          }
        }
      ]
    }
  }
}
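The two queries above return the same results, as shown below:
As you can see, the documents with _id 1 and _id 2 both match the two words, and the document with _id 1 scores higher because its content is shorter. The document with _id 3 matches only one of the words, so its score is lower.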
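The rewrite from a multi-term match query to a bool query can be sketched as follows. This is a simplified illustration, not Elasticsearch's actual implementation: analyze is an assumed stand-in for the standard analyzer (lowercase, split on whitespace), and match_to_bool is a hypothetical helper that only builds the equivalent request body as a Python dict.

```python
import json


def analyze(text):
    """Assumed stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match_to_bool(field, query_text, operator="or"):
    """Build the bool query that a multi-term match query is equivalent to.

    With the "or" operator each term becomes an optional `should` clause;
    with the "and" operator each term becomes a required `must` clause.
    """
    clauses = [{"term": {field: token}} for token in analyze(query_text)]
    occur = "should" if operator == "or" else "must"
    return {"query": {"bool": {occur: clauses}}}


body = match_to_bool("title", "flower dog")
print(json.dumps(body, indent=2))
```

Note that term queries are not analyzed, which is why the sketch lowercases the query text before building the clauses.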
match
的匹配精度根据前面范例中的索引数据,如果用户给定3个查询单词,想查找只包含其中两个的文档,那么我们将逻辑运算符设置成and
或者or
都不合适。而match
查询支持minimum_should_match
(最小匹配参数)选项,我们可以将其设置为某个具体数字。范例如下:
GET /myindex-match-search/_search
{
  "query": {
    "match": {
      "title": {
        "query": "flower dog the",
        "minimum_should_match": 3
      }
    }
  }
}
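The result is as follows:
As you can see, the results meet the requirement of the query: a document must match all 3 words. Note that in practice it is more common to set this option to a percentage, because we cannot control how many words a user types into a query. For example: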
GET /myindex-match-search/_search
{
  "query": {
    "match": {
      "title": {
        "query": "flower dog the",
        "minimum_should_match": "80%"
      }
    }
  }
}
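With three query terms, 80% works out to 2.4, which is rounded down to 2, so a document only needs to match two of the three terms. The result is as follows: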
The minimum_should_match parameter accepts the following value types:
Type | Example | Description |
---|---|---|
Integer | 3 | Indicates a fixed value regardless of the number of optional clauses. |
Negative integer | -2 | Indicates that the total number of optional clauses, minus this number should be mandatory. |
Percentage | 75% | Indicates that this percent of the total number of optional clauses are necessary. The number computed from the percentage is rounded down and used as the minimum. |
Negative percentage | -25% | Indicates that this percent of the total number of optional clauses can be missing. The number computed from the percentage is rounded down, before being subtracted from the total to determine the minimum. |
Combination | 3<90% | A positive integer, followed by the less-than symbol, followed by any of the previously mentioned specifiers is a conditional specification. It indicates that if the number of optional clauses is equal to (or less than) the integer, they are all required, but if it’s greater than the integer, the specification applies. In this example: if there are 1 to 3 clauses they are all required, but for 4 or more clauses only 90% are required. |
Multiple combinations | 2<-25% 9<-3 | Multiple conditional specifications can be separated by spaces, each one only being valid for numbers greater than the one before it. In this example: if there are 1 or 2 clauses both are required, if there are 3-9 clauses all but 25% are required, and if there are more than 9 clauses, all but three are required. |
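To make the rules in the table concrete, here is a minimal Python sketch that computes how many optional clauses must match for a given specification. It follows the rules as described in the table and is not taken from the Elasticsearch source. The function name min_should_match is hypothetical, and clamping the result to the range from 0 to the number of clauses is an assumption based on the documented behavior that a value larger than the number of optional clauses is never used.

```python
import re


def min_should_match(clause_count, spec):
    """Return how many of `clause_count` optional clauses must match.

    `spec` is a minimum_should_match value such as 3, "-2", "75%",
    "-25%", "3<90%" or "2<-25% 9<-3".
    """
    spec = re.sub(r"\s*<\s*", "<", str(spec).strip())
    result = clause_count

    if "<" in spec:
        # Conditional specs: each one applies only above its threshold.
        for part in spec.split():
            bound, inner = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, inner)
        return result

    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero, i.e. the computed number is rounded down.
        # The sign test uses the unrounded value so that small negative
        # percentages still mean "all clauses minus zero".
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        value = int(spec)
        result = clause_count + value if value < 0 else value

    # Assumption: never require fewer than 0 or more than all clauses.
    return max(0, min(clause_count, result))


# The two queries from this section both use three terms.
print(min_should_match(3, 3))
print(min_should_match(3, "80%"))
```

For the three-term query used in this section, the specification 3 requires every term, while "80%" requires only two of them.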