I'm trying to implement the Apriori algorithm. For that, I need to generate candidate itemsets of length k+1 from the frequent itemsets of length k (given as a dictionary L). The Apriori principle must be followed when generating the combinations. The principle states: an itemset of length k+1 can only be generated if ALL of its subsets (in particular, every subset of length k) are present in the input, L.
I have a dictionary from which I need to generate itemsets.
My current attempt is this:
import itertools as it
def generateItemsets(Lk, k):
    comb = sum(Lk.keys(), tuple())
    Ck = set(it.combinations(comb, k))
    return Ck
But the function takes forever and gets interrupted with the error: IOPub data rate exceeded.
Example-1:
Input (dictionary): {(150,): 2, (160,): 3, (170,): 3, (180,): 3}
Output (set): {(150, 160), (150, 170), (150, 180), (160, 170), (160, 180), (170, 180)}
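For reference, here is a minimal sketch of the usual join-and-prune way of generating candidates, which is one way to honour the principle stated above. The name generate_candidates is hypothetical, and the sketch assumes every key of the dictionary is a tuple of exactly k comparable items; it is an illustration under those assumptions, not a tuned implementation.

```python
from itertools import combinations


def generate_candidates(Lk, k):
    """Return C_{k+1} as a set of sorted tuples, given frequent k-itemsets Lk.

    Hypothetical helper: assumes every key of Lk is a tuple of exactly k items.
    """
    # Normalise keys to sorted tuples so that prefixes are comparable.
    frequent = {tuple(sorted(itemset)) for itemset in Lk}
    ordered = sorted(frequent)
    candidates = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: only merge itemsets that share their first k-1 items.
            if a[:k - 1] != b[:k - 1]:
                break  # ordered is sorted, so no later itemset can match
            candidate = a + (b[-1],)
            # Prune step: every k-subset of the candidate must be frequent.
            if all(sub in frequent for sub in combinations(candidate, k)):
                candidates.add(candidate)
    return candidates


L1 = {(150,): 2, (160,): 3, (170,): 3, (180,): 3}
print(sorted(generate_candidates(L1, 1)))
```

With the Example-1 input this should print the six pairs listed in the expected output. Because only itemsets with a common prefix are joined, the number of tuples built depends on how many frequent itemsets share a prefix rather than on all combinations of all items.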
Update-1
The dataset contains almost 16000 transactions. It looks like this:
The unique items range from 0-999
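To give a sense of scale, the short sketch below counts how many combinations exist if nothing is pruned. It assumes, purely for illustration, that all 1000 item ids (0-999) are frequent; the helper name unpruned_count is hypothetical.

```python
import math


def unpruned_count(n_items, size):
    """Number of distinct itemsets of the given size over n_items items."""
    return math.comb(n_items, size)


# Assumption: every one of the 1000 item ids is frequent.
for size in (2, 3):
    print(size, unpruned_count(1000, size))
# prints:
# 2 499500
# 3 166167000
```

Building or printing sets of that size in a notebook is likely to be slow, which would be consistent with both the long running time and the output rate limit being hit.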
What I want to achieve is this: 
As you can see, this function will be given an input L_k and it should output C_k+1. Input L_k is a dictionary like {(301,350): 46, (966,970): 612, (310,350): 216, (548, 550): 457}, while the output C_k+1 should be a set (for example: {(250,350),(360,370),(380,390),...}).
from Generate candidate itemsets based on Apriori algorithm
