Monday, 1 March 2021

Generate candidate itemsets based on Apriori algorithm

I'm trying to implement the Apriori algorithm. For that, I need to generate itemsets of length k+1 from itemsets of length k (given as a dictionary L). The Apriori principle must be followed when generating the combinations. The principle states: a set of length k+1 can only be generated if ALL its subsets are present in the input, L.
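To make the principle concrete, here is a minimal sketch of the subset check. It assumes the keys of L are sorted tuples that all have the same length k, so only the subsets of length k need to be tested. The name all_subsets_frequent and the sample dictionary L2 are hypothetical, used only for illustration.

```python
from itertools import combinations

def all_subsets_frequent(candidate, Lk):
    """Return True if every subset of `candidate` with one item removed is a key of Lk.

    Assumption: `candidate` and all keys of Lk are sorted tuples.
    """
    k = len(candidate) - 1
    return all(subset in Lk for subset in combinations(candidate, k))

# Hypothetical frequent 2-itemsets with made-up support counts.
L2 = {(1, 2): 5, (1, 3): 4, (2, 3): 6, (2, 4): 3}

print(all_subsets_frequent((1, 2, 3), L2))  # (1, 2), (1, 3), (2, 3) are all keys
print(all_subsets_frequent((1, 2, 4), L2))  # (1, 4) is not a key
```

A candidate such as (1, 2, 4) would be rejected here because one of its 2-item subsets is missing from L2.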

I have a dictionary from which I need to generate itemsets.

My current attempt is this:

import itertools as it
def generateItemsets(Lk,k):

    comb = sum(Lk.keys(), tuple())
    Ck = set(it.combinations(comb, k))
    return Ck

But the function takes forever and gets interrupted with the error: IOPub data rate exceeded.

Example-1:

Input (dictionary): {(150,): 2, (160,): 3, (170,): 3, (180,): 3}

Output (set): {(150, 160), (150, 170), (150, 180), (160, 170), (160, 180), (170, 180)}

Update-1

The dataset contains almost 16000 transactions. It looks like this:

Dataset

The unique items range from 0 to 999.
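A rough count suggests why brute-force combinations may be impractical at this scale. The sketch below assumes, as a worst case, that all 1000 item ids are still frequent; the helper name unpruned_candidates is hypothetical.

```python
import math

def unpruned_candidates(n_items, k):
    """Number of k-item combinations that can be formed from n_items distinct items."""
    return math.comb(n_items, k)

# Assumption: every item id from 0 to 999 is still frequent (worst case).
n_items = 1000
for k in (2, 3, 4):
    print(k, unpruned_candidates(n_items, k))
```

Under that assumption there are 499,500 pairs but already more than 166 million triples, so materializing every combination in a set, let alone printing it in a notebook, quickly becomes too expensive.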

What I want to achieve is this: Pictorial representation

As you can see, this function is given an input L_k and should output C_k+1. The input L_k is a dictionary such as {(301,350): 46, (966,970): 612, (310,350): 216, (548, 550): 457}, while the output C_k+1 should be a set, for example {(250,350),(360,370),(380,390),...}.
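One possible way to produce C_k+1 from L_k is the usual join-and-prune step, sketched below. This is not the attempt shown earlier but a swapped-in approach, and it assumes every key of L_k is a sorted tuple of the same length. The name generate_candidates is hypothetical.

```python
from itertools import combinations

def generate_candidates(Lk):
    """Build (k+1)-itemsets from the k-itemsets in Lk using join and prune.

    Assumption: every key of Lk is a sorted tuple of the same length k.
    """
    frequent = sorted(Lk)
    candidates = set()
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            # Join step: only merge itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            candidate = a + (b[-1],)
            # Prune step: every k-item subset of the candidate must be in Lk.
            if all(sub in Lk for sub in combinations(candidate, len(a))):
                candidates.add(candidate)
    return candidates

# Example-1 from the question.
L1 = {(150,): 2, (160,): 3, (170,): 3, (180,): 3}
print(sorted(generate_candidates(L1)))
```

Because only itemsets with a common prefix are joined, the number of generated tuples depends on how many frequent itemsets exist rather than on all combinations of all items.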



