Wednesday, 29 August 2018

running mongo queries against data in memory

I have a MongoDB collection against which I need to run many count operations (each with a different query) every hour. When I first set this up, the collection was small and the counts finished in about a minute, which was acceptable. Now they take about 55 minutes, so they're running nearly continuously.

The query associated with each count operation is rather involved, and I don't think there's a way to get them all to run with indices (i.e. as COUNT_SCAN operations).

The only feasible solution I've come up with is to:

  • Run a full collection scan every hour, pulling every document out of the db
  • Once each document is in memory, run all of the count operations against it myself
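The shape of that loop can be sketched in plain Python. The document list below stands in for the cursor a full `collection.find()` scan would return, and the predicate names and fields are illustrative, not from my actual workload:

```python
# Sketch of the one-scan approach: pull every document once, then
# evaluate all of the count predicates against each document in memory.
# Predicates are plain Python callables standing in for the original
# mongo query documents (field names here are made up for illustration).

def count_all(docs, predicates):
    """Run every named predicate over every document in a single pass."""
    counts = {name: 0 for name in predicates}
    for doc in docs:
        for name, pred in predicates.items():
            if pred(doc):
                counts[name] += 1
    return counts

# In production, `docs` would be `collection.find()`; a list works the same.
docs = [
    {"status": "active", "retries": 5},
    {"status": "active", "retries": 1},
    {"status": "done", "retries": 0},
]
predicates = {
    "active": lambda d: d.get("status") == "active",
    "retried": lambda d: d.get("retries", 0) > 3,
}
print(count_all(docs, predicates))  # {'active': 2, 'retried': 1}
```

Because the cursor is only consumed once, adding another count is nearly free: it's one more predicate in the dict, not another collection scan.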

Without this approach the server runs dozens of full collection scans each hour; with it, the server runs only one. But it leaves me in a strange place: I have to take my complex queries and re-implement them myself so I can produce my own counts every hour.

So my question is whether there's any support in the mongo drivers (pymongo in my case, but I'm curious in general) for interpreting query documents and running them locally against data in memory, rather than against data on the MongoDB server.
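To make the idea concrete, here's a minimal interpreter for a small subset of the MongoDB query language (equality, `$gt`, `$lt`, `$in`, `$and`), evaluated against plain Python dicts. It's a sketch of what such driver support might look like, not a drop-in replacement: real query documents can use many more operators, and the comparison semantics here are simpler than the server's:

```python
# Evaluate a (small subset of a) mongo query document against a plain
# Python dict. Supports top-level equality, $gt, $lt, $in, and $and.

def matches(doc, query):
    for key, cond in query.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and any(k.startswith("$") for k in cond):
            value = doc.get(key)
            for op, operand in cond.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$lt":
                    if value is None or not value < operand:
                        return False
                elif op == "$in":
                    if value not in operand:
                        return False
                else:
                    raise NotImplementedError(op)
        else:
            # Plain value (or subdocument): exact equality.
            if doc.get(key) != cond:
                return False
    return True

query = {"status": "active", "retries": {"$gt": 3}}
print(matches({"status": "active", "retries": 5}, query))  # True
print(matches({"status": "active", "retries": 1}, query))  # False
```

With something like this, the same query documents used for the server-side counts could be fed unchanged to the in-memory pass, instead of being hand-translated into Python.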

Initially this felt like an odd request, but there are actually quite a few places where this approach would probably greatly lessen the load on the database in my particular use case. So I wonder if it comes up from time to time in other production deployments.

