After reading this post, I have been trying to compare parallelization with non parallelization in sqlite. I am using Python's sqlite3 library to create a database containing a table called randNums which contains a two columns, an id and a val. The val is a random number between 0 and 1. I then select all rows with val greater than a half. I have done this twice so as to compare run times of the parallelized version and unparallelized version, however they take the same amount of time. I'd like to know if I am using the keyword 'PARALLEL' incorrectly, or if I need to first enable parallelization with a python command.
Finally, I'd also like to know how parallelization differs for different databases, for example, mysql and postgresql.
import sqlite3
from time import time
con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()
cmd = ['PRAGMA table_info(randNums);']
cmd += ['SELECT count (*) from randNums where val>.5;']
cmd += ['SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;']
for c in cmd:
t = time()
cursorObj.execute(c)
print('command: %s' % c)
print(cursorObj.fetchall())
print('run time in seconds: %.3f\n\n' % (time()-t))
Running a Python script containing the above code results in the following output:
command: PRAGMA table_info(randNums);
[(0, 'id', 'INTEGER', 0, None, 1), (1, 'val', 'REAL', 0, None, 0)]
run time in seconds: 0.000
command: SELECT count (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.604
command: SELECT count /*+ PARALLEL(4) */ (*) from randNums where val>.5;
[(49996009,)]
run time in seconds: 3.598
I first generated the database with the following code:
import sqlite3
from random import uniform as rand
con = sqlite3.connect('mydatabase.db')
cursorObj = con.cursor()
cursorObj.execute("CREATE TABLE IF NOT EXISTS randNums(id integer PRIMARY KEY, val real)")
try:
for i in range(10**8):
if i%10**5==0: print(i)
cursorObj.execute("INSERT INTO randNums VALUES(%d, '%f')" % (i,rand(0,1)))
except:
print('datbase is already filled with data')
pass
con.commit()
from Parallelising a select query in Python's sqlite does not seem to improve performance
No comments:
Post a Comment