Monday, 8 November 2021

Finding weekly combinations of items bought together using pandas groupby

I have a df:

date            category            subcategory         order_id        product_id       branch
2021-05-04      A                   aa                  10              5                web
2021-06-04      A                   dd                  10              2                web
2021-05-06      B                   aa                  18              3                shop
2021-07-06      A                   aa                  50              10               web
2021-07-06      C                   cc                  10              15               web
2021-07-05      A                   ff                  101             30               shop
2021-10-04      D                   aa                  100             15               shop  

I am trying to answer a question which items categories and subcategories are bought together per branch type weekly. I am thinking of grouping the order_ids and aggregating the category & subcategory to a list like so:

a = (df.set_index('date')
     .groupby(['order_id','branch'])
     .resample('W-MON', label = 'left')
     .agg({'category':list, 'subcategory':list}))

Which returns :

                                                category            subcategory
order_id        branch          date            [A, A, A]           [aa, dd, aa]
10              web             2021-05-04      ...                 ...
18              shop            ...
50              web
100             web
101             shop

I am trying to build a structure which would show the frequency of each variation of the categories and subcategories bought each week per branch, something similar to this:

branch                                  date                        
                                        2021-05-04                      2021-05-011
                                                                        ...
web                  category            3, [A, A, A] 
                                         2, [A, A] 
                                         2, [A, A, B, B]

                     subcategory         5, [aa, dd, aa]
                                         4, [dd, aa]
                                         1, [dd]

shop                 category            3, [A, A, A] 
                                         2, [A, A] 
                                         2, [A, A, B, B]

                     subcategory         5, [aa, dd, aa]
                                         4, [dd, aa]
                                         1, [dd]

Where the number before the list denotes the number of times a certain combinations of categories and subcategories were bought in the same order. I am unsure how to achieve such a structure or a similar one that would show the weekly combination frequencies by branch. The order of the product_id in the order does not matter as the final basket is the same.

So the goal is to see the frequency of categories, subcategories & product_ids bought in the same order weekly. So if 2 different orders have the same products, the aggregated result would show 2, [A,B] [aa, bb] [5, 2] where the lists hold category, subcategory & product_id combinations.



from Finding weekly combinations of items bought together using pandas groupby

No comments:

Post a Comment