Thursday 22 June 2023

duckdb query takes too long to process and return inside Flask application

I have a Flask app and want to use duckdb as a database for several endpoints. My idea is to query the data and return it as a .parquet file. When I test my database with a simple Python script outside of the Flask app, it can query the data and save it as a .parquet in under a second. When I bring that same methodology to the Flask app, it still successfully queries the data and returns it as a .parquet file but it takes roughly 45 seconds. Other endpoints that return a .parquet file -- ones that are pre-staged and do not need to be queried -- can do so in just a second or two. So the issue, apparently, is incorporating duckdb inside my Flask application. Here is a sample boiler plate of what I have:

@test.route('/duckdb', methods = ['GET'])
def duckdb_test():

    con = duckdb.connect(database = '~/flask_db/test.db')

    # get tempfile .parquet
    tmp = tempfile.NamedTemporaryFile(suffix = '.parquet', mode = 'w+b', delete = False)

    # get data
    df = con.sql("SELECT * FROM tbl WHERE name = 'John'").to_df()

    # write to temporary .parquet
    df.to_parquet(tmp.name, engine='pyarrow', index=False)

    return send_file(tmp.name, mimetype='application/octet-stream', as_attachment=True, download_name="request.parquet")

I want to save it as a temp file. Not really sure what's wrong here. Again, it does work, but it just takes way, way too much time. The data being returned is about 12,000 rows in a ~5.5M row database -- but given that it works fairly quickly outside of the Flask app, on the same VM, the size itself shouldn't be an issue.



from duckdb query takes too long to process and return inside Flask application

No comments:

Post a Comment