Friday, 26 February 2021

Scroll in Python Elasticsearch not working

I am trying to scroll through all documents with Python when querying Elasticsearch, so that I can retrieve more than 10,000 results:

from elasticsearch import Elasticsearch
es = Elasticsearch(ADDRESS, port=PORT)


result = es.search(
    index="INDEX",
    body=es_query,
    size=10000,
    scroll="3m")


scroll_id = result['_scroll_id']
scroll_size = result["hits"]["total"]  # total number of matching documents
counter = 0
print('total items = ' + str(scroll_size))

while scroll_size > 0:
    counter += scroll_size

    # fetch the next page of results
    result = es.scroll(scroll_id=scroll_id, scroll="1s")
    scroll_id = result['_scroll_id']
    scroll_size = len(result['hits']['hits'])

print('found = ' + str(counter))

The problem is that sometimes the counter (the sum of the results at the end of the program) is smaller than result["hits"]["total"]. Why is that? Why does scroll not iterate over all the results?

Elasticsearch version: 5.6
Lucene version: 6.6
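
For comparison, here is a minimal sketch of a scroll loop that counts hits page by page. It is a guess at what may help, not a confirmed diagnosis: it assumes the 1-second keep-alive on es.scroll can let the search context expire between requests, and that the counter should add the length of each page rather than start from hits.total. The function name count_scrolled_hits and the page_size and keep_alive defaults are hypothetical, chosen only for illustration. It also assumes Elasticsearch 5.x, where hits.total is a plain integer.

```python
def count_scrolled_hits(es, index, query, page_size=1000, keep_alive="2m"):
    """Scroll through every page of a query and count the hits returned.

    `es` is expected to be an elasticsearch.Elasticsearch client (assumption);
    any object exposing search, scroll and clear_scroll with the same keyword
    arguments will work.
    """
    # The first request opens the scroll context and returns the first page.
    result = es.search(index=index, body=query, size=page_size, scroll=keep_alive)
    scroll_id = result["_scroll_id"]
    total = result["hits"]["total"]  # an integer in Elasticsearch 5.x (assumption)
    hits = result["hits"]["hits"]
    counter = 0
    try:
        while hits:
            # Count the documents actually returned on this page.
            counter += len(hits)
            # Each scroll request renews the context for another keep_alive period.
            result = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = result["_scroll_id"]
            hits = result["hits"]["hits"]
    finally:
        # Release the server-side search context once done.
        es.clear_scroll(scroll_id=scroll_id)
    return total, counter
```

With the client from the question, this could be called as total, found = count_scrolled_hits(es, "INDEX", es_query) and the two numbers compared. The elasticsearch.helpers.scan helper in the Python client wraps a similar loop and may be a simpler option.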


