Saturday, 5 October 2019

Caching intermediate table with psycopg2

Take this block of psycopg2 calls which involve two SELECTs:

import psycopg2

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT a, b, c FROM table WHERE a > 5 AND d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM table WHERE a > 5 AND d > 20;")
        r2 = cur.fetchall()

This is somewhat inefficient: the potentially O(N) scan for WHERE a > 5 is performed twice, when it seems it could be performed just once, with the narrower filters applied to that intermediate result.

What's the canonical way to do this via the psycopg2 API?

Something like:

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT a, b, c FROM table WHERE a > 5;")
        # ...
        cur.execute("SELECT a, b, c FROM temp_table WHERE d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM temp_table WHERE d > 20;")
        r2 = cur.fetchall()

Is the best solution to use a literal "CREATE TEMP TABLE..."?
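A temp table does seem to be one straightforward way to materialise the shared filter, even if it isn't a dedicated psycopg2 feature. A minimal sketch, using the placeholder table and column names from the question; the temp table name "filtered" is my own, and ON COMMIT DROP (which ties the temp table's lifetime to the enclosing transaction) is optional:

```python
def fetch_split(conn):
    """Materialise the shared "a > 5" filter once, then query it twice.

    Sketch only: table/column names are the placeholders from the question.
    """
    with conn.cursor() as cur:
        # ON COMMIT DROP ties the temp table to the current transaction,
        # so repeated calls in new transactions do not collide.
        cur.execute(
            "CREATE TEMP TABLE filtered ON COMMIT DROP AS "
            "SELECT a, b, c, d FROM table WHERE a > 5;"
        )
        cur.execute("SELECT a, b, c FROM filtered WHERE d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM filtered WHERE d > 20;")
        r2 = cur.fetchall()
        return r1, r2

# Usage (with the connection string from the question):
# import psycopg2
# with psycopg2.connect("dbname=test user=postgres") as conn:
#     r1, r2 = fetch_split(conn)
```

Note that whether this actually wins depends on the planner: if an index on "a" makes the scan cheap, two plain SELECTs may beat the cost of writing the intermediate rows out.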

I'm coming to this from a Django ORM perspective, where subsequent evaluations of the QuerySet reuse the cached results. Is there anything similar offered by the psycopg2 API?
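As far as I know, psycopg2 itself offers no QuerySet-style result cache; the closest analogue is to fetch the intermediate result once and partition it client-side, which is viable when the "a > 5" set fits comfortably in memory. A sketch, assuming rows come back as (a, b, c, d) tuples from a single broad query:

```python
def split_cached(rows):
    """Partition an already-fetched intermediate result in Python.

    `rows` is assumed to be the fetchall() result of a single
    "SELECT a, b, c, d FROM table WHERE a > 5;" query.
    """
    r1 = [(a, b, c) for (a, b, c, d) in rows if d < 10]
    r2 = [(a, b, c) for (a, b, c, d) in rows if d > 20]
    return r1, r2

# Usage sketch:
# cur.execute("SELECT a, b, c, d FROM table WHERE a > 5;")
# r1, r2 = split_cached(cur.fetchall())
```

This pays one round trip and one scan, at the cost of shipping column d and any rows that match neither filter over the wire.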
