I have following pandas dataframe df
column1 column2 list_numbers sublist_column
x y [10,-6,1,-4]
a b [1,3,7,-2]
p q [6,2,-3,-3.2]
the sublist_column will contain the numbers from the column "list_numbers" that adds up to 0 (0.5 is a tolerance) I have written following code.
def return_list(original_lst,target_sum,tolerance):
memo=dict()
sublist=[]
for i, x in enumerate(original_lst):
if memo_func(original_lst, i + 1, target_sum - x, memo,tolerance) > 0:
sublist.append(x)
target_sum -= x
return sublist
def memo_func(original_lst, i, target_sum, memo,tolerance):
if i >= len(original_lst):
if target_sum <=tolerance and target_sum>=-tolerance:
return 1
else:
return 0
if (i, target_sum) not in memo:
c = memo_func(original_lst, i + 1, target_sum, memo,tolerance)
c += memo_func(original_lst, i + 1, target_sum - original_lst[i], memo,tolerance)
memo[(i, target_sum)] = c
return memo[(i, target_sum)]
Then I am using the "return_list" function on the "sublist_column" to populate the result.
target_sum = 0
tolerance=0.5
df['sublist_column']=df['list_numbers'].apply(lambda x: return_list(x,0,tolerance))
the following will be the resultant dataframe
column1 column2 list_numbers sublist_column
x y [10,-6,1,-4] [10,-6,-4]
a b [1,3,7,-2] []
p q [6,2,-3,-3.2] [6,-3,-3.2] #sum is -0.2(within the tolerance)
This is giving me correct result but it's very slow(takes 2 hrs to run if i use spyder IDE), as my dataframe size has roughly 50,000 rows, and the length of some of the lists in the "list_numbers" column is more than 15. The running time is particularly getting affected when the number of elements in the lists in the "list_numbers" column is greater than 15. e.g following list is taking almost 15 minutes to process
[-1572.35,-76.16,-261.1,-7732.0,-1634.0,-52082.42,-3974.15,
-801.65,-30192.79,-671.98,-73.06,-47.72,57.96,-511.18,-391.87,-4145.0,-1008.61,
-17.53,-17.53,-1471.08,-119.26,-2269.7,-2709,-182939.59,-19.48,-516,-6875.75,-138770.16,-71.11,-295.84,-348.09,-3460.71,-704.01,-678,-632.15,-21478.76]
How can i significantly improve my running time?
from Efficiently count all the combinations of numbers having a sum close to 0
No comments:
Post a Comment