Hemant Vishwakarma: Parallelising a select query in Python's sqlite does not seem to improve performance

Friday, 24 February 2023

Parallelising a select query in Python's sqlite does not seem to improve performance

After reading this post, I have been trying to compare parallelization with non parallelization in sqlite. I am using Python's sqlite3 library to create a database containing a table called randNums which contains a two columns, an id and a val. The val is a random number between 0 and 1. I then select all rows with val greater than a half. I have done this twice so as to compare run times of the parallelized version and unparallelized version, however they take the same amount of time. I'd like to know if I am using the keyword 'PARALLEL' incorrectly, or if I need to first enable parallelization with a python command.

Finally, I'd also like to know how parallelization differs for different databases, for example, mysql and postgresql.

import sqlite3
from time import time

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()

cmd  = ['PRAGMA table_info(randNums);']
cmd += ['SELECT count (*) from randNums where val>.5;']
cmd += ['SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;']

for c in cmd:
    t = time()
    cursorObj.execute(c)
    print('command: %s' % c)
    print(cursorObj.fetchall())
    print('run time in seconds: %.3f\n\n' % (time()-t))

Running a Python script containing the above code results in the following output:

command: PRAGMA table_info(randNums);
[(0, 'id', 'INTEGER', 0, None, 1), (1, 'val', 'REAL', 0, None, 0)]
run time in seconds: 0.000


command: SELECT count (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.604


command: SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.598

I first generated the database with the following code:

import sqlite3
from random import uniform as rand

con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()
cursorObj.execute("CREATE TABLE IF NOT EXISTS randNums(id integer PRIMARY KEY, val real)")
try:
    for i in range(10**8):
        if i%10**5==0: print(i)
        cursorObj.execute("INSERT INTO randNums VALUES(%d, '%f')" % (i,rand(0,1)))
except:
    print('datbase is already filled with data')
    pass
con.commit()

from Parallelising a select query in Python's sqlite does not seem to improve performance

Hemant Vishwakarma

Friday, 24 February 2023

Parallelising a select query in Python's sqlite does not seem to improve performance

No comments:

Post a Comment