Friday, 25 March 2022

Efficiently count all the combinations of numbers having a sum close to 0

I have following pandas dataframe df

column1 column2 list_numbers          sublist_column
x        y      [10,-6,1,-4]             
a        b      [1,3,7,-2]               
p        q      [6,2,-3,-3.2]             

the sublist_column will contain the numbers from the column "list_numbers" that adds up to 0 (0.5 is a tolerance) I have written following code.

def return_list(original_lst,target_sum,tolerance):
    memo=dict()
    sublist=[]
    for i, x in enumerate(original_lst):
    
        if memo_func(original_lst, i + 1, target_sum - x, memo,tolerance) > 0:
            sublist.append(x)
            target_sum -= x          
    return sublist  

def memo_func(original_lst, i, target_sum, memo,tolerance):
    
    if i >= len(original_lst):
        if target_sum <=tolerance and target_sum>=-tolerance:
            return 1
        else:
            return 0
    if (i, target_sum) not in memo:  
        c = memo_func(original_lst, i + 1, target_sum, memo,tolerance)
        c += memo_func(original_lst, i + 1, target_sum - original_lst[i], memo,tolerance)
        memo[(i, target_sum)] = c  
    return memo[(i, target_sum)]    
    

Then I am using the "return_list" function on the "sublist_column" to populate the result.

target_sum = 0
tolerance=0.5

df['sublist_column']=df['list_numbers'].apply(lambda x: return_list(x,0,tolerance))

the following will be the resultant dataframe

column1 column2 list_numbers          sublist_column
x        y      [10,-6,1,-4]             [10,-6,-4]
a        b      [1,3,7,-2]               []
p        q      [6,2,-3,-3.2]            [6,-3,-3.2]  #sum is -0.2(within the tolerance)

This is giving me correct result but it's very slow(takes 2 hrs to run if i use spyder IDE), as my dataframe size has roughly 50,000 rows, and the length of some of the lists in the "list_numbers" column is more than 15. The running time is particularly getting affected when the number of elements in the lists in the "list_numbers" column is greater than 15. e.g following list is taking almost 15 minutes to process

[-1572.35,-76.16,-261.1,-7732.0,-1634.0,-52082.42,-3974.15,
-801.65,-30192.79,-671.98,-73.06,-47.72,57.96,-511.18,-391.87,-4145.0,-1008.61,
-17.53,-17.53,-1471.08,-119.26,-2269.7,-2709,-182939.59,-19.48,-516,-6875.75,-138770.16,-71.11,-295.84,-348.09,-3460.71,-704.01,-678,-632.15,-21478.76]

How can i significantly improve my running time?



from Efficiently count all the combinations of numbers having a sum close to 0

No comments:

Post a Comment