Saturday, 16 January 2021

Django ORM annotate performance

I'm using Django and Django REST Framework at work and we've been having some performance issues with couple endpoints lately. We started by making sure that the SQL part is optimized, no unnesecary N+1 queries, indexes where possible, etc.

Looking at the database part itself, it seems to be very fast (3 SQL queries total, under a second), even with larger datasets, but the API endpoint still took >5 seconds to return. I started profiling the Python code using couple different tools and the majority of time is always spent inside the annotate and set_group_by functions in Django.

debug_toolbar_screenshot

I tried Googling about annotate and performance, looking at Django docs, but there's no mention of it being a 'costly' operation, especially when used with the F function.

The annotate part of the code looks something like this:

qs = qs.annotate(
    foo_name=models.F("foo__core__name"),
    foo_birth_date=models.F("foo__core__birth_date"),
    bar_name=models.F("bar__core__name"),
    spam_id=models.F("baz__spam_id"),
    spam_name=models.F("baz__spam__core__name"),
    spam_start_date=models.F("baz__spam__core__start_date"),
    eggs_id=models.F("baz__spam__core___eggs_id"),
    eggs_name=models.F("baz__spam__eggs__core___name"),
)

qs = (
    qs.order_by("foo_id", "eggs_id", "-spam_start_date", "bar_name")
    .values(
        "foo_name",
        "foo_birth_date",
        "bar_name",
        "spam_id",
        "spam_name",
        "eggs_id",
        "eggs_name",
    )
    .distinct()
)

The query is quite big, spans multiple relatonships, so I was sure that the problem is database related, but it doesn't seem to be. All the select_related and prefetch_related are there, indexes too.

I tried rewriting the code without annotate at all, but it didn't seem to help. I started wondering wether the time spent in annotate is really a red herring and it's only how the profiler sees it, but all profilers I tried showed the same thing.

While I feel like I know Django quite well and had success optimizing API endoints before, I'm not sure what 'thread' to pull in this case. I tried looking at Django internals, especially around annotate and set_group_by but couldn't pin point the time spent there. My last ditch effort will be trying to rewrite those couple endpoints with raw SQL, but I'd very much like to avoid that.

All help will be much appriciated : )



from Django ORM annotate performance

No comments:

Post a Comment