Efficient algorithms to perform Market Basket Analysis


I want to perform Market Basket Analysis (or Association Analysis) on a retail e-commerce dataset.

The problem I am facing is the size of the data: 3.3 million transactions in a single month. I cannot cut down the transactions, as I may miss some products. The structure of the data is provided below:

Order_ID = Unique transaction identifier

Customer_ID = Identifier of the customer who placed the order

Product_ID = List of all the products the customer has purchased

Date = Date on which the sale has happened

When I feed this data to the Apriori algorithm in Python, my system cannot handle the memory requirements. It can handle only 100K transactions. I have 16 GB of RAM.

Any help in suggesting a better (and faster) algorithm is much appreciated.

I can also use SQL to work around the data size, but then I get only 1 antecedent --> 1 consequent rules. Is there a way to get multi-item rules such as {A,B,C} --> {D,E}, i.e. if a customer purchases products A, B and C, there is a high chance they also purchase products D and E?


There is 1 solution below

Reoun On

For data this large, try FP-Growth, which improves on the Apriori method. It scans the dataset only twice, whereas Apriori makes a separate pass for each candidate itemset size.
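To make the two passes over the data concrete, here is a minimal pure-Python sketch of FP-Growth. It is an illustration under assumptions, not mlxtend's implementation: the names `Node`, `build_tree` and `fp_growth` are hypothetical, items are assumed to be hashable and mutually comparable (e.g. strings), and `min_count` is an absolute count rather than a fraction. The first pass counts items; the second inserts each transaction into a prefix tree, so mining then works on the compressed tree instead of rescanning the raw data.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links up and down."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from a list of (items, weight) pairs."""
    # Pass 1: count how often each item occurs.
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending frequency so common prefixes are shared.
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    results = {}
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        results[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the (much smaller) conditional data.
        results.update(fp_growth(conditional, min_count, itemset))
    return results


# Toy data, made up for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
itemsets = fp_growth([(t, 1) for t in transactions], min_count=3)
for itemset, count in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

On this toy data with `min_count=3`, the sketch finds the four frequent single items and four frequent pairs. For real workloads the library call shown next is the practical choice; the sketch is only meant to show where the two scans happen.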

from mlxtend.frequent_patterns import fpgrowth

Then just change:

apriori(df, min_support=0.6)

to:

fpgrowth(df, min_support=0.6)
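Regarding the multi-item rules asked about in the question ({A,B,C} --> {D,E}): once frequent itemsets and their counts are available, rules with several items on either side come from splitting each itemset into an antecedent and a consequent. mlxtend provides an `association_rules` function in the same `frequent_patterns` module for this; the sketch below only illustrates the idea in plain Python. The function name `generate_rules`, the toy baskets and the thresholds are assumptions made up for the example, and the brute-force counting stands in for the output of a real miner.

```python
from collections import Counter
from itertools import combinations


def generate_rules(itemset_counts, n_transactions, min_confidence):
    """Derive rules from frequent itemsets.

    itemset_counts maps frozenset -> support count and must contain every
    subset of each itemset (true for the output of a frequent-itemset miner).
    Returns tuples (antecedent, consequent, support, confidence, lift).
    """
    rules = []
    for itemset, count in itemset_counts.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset as the antecedent.
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                confidence = count / itemset_counts[antecedent]
                if confidence >= min_confidence:
                    consequent_support = itemset_counts[consequent] / n_transactions
                    lift = confidence / consequent_support
                    support = count / n_transactions
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Toy baskets, made up for illustration.
transactions = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C"},
    {"D", "E"},
    {"A", "D"},
]

# Brute-force support counting: fine for toy data only, used here as a
# stand-in for the itemsets a real miner such as FP-Growth would return.
counts = Counter()
for basket in transactions:
    for k in range(1, len(basket) + 1):
        for combo in combinations(sorted(basket), k):
            counts[frozenset(combo)] += 1
frequent = {s: c for s, c in counts.items() if c >= 3}

rules = generate_rules(frequent, len(transactions), min_confidence=0.7)
target = next(
    r for r in rules if r[0] == frozenset("ABC") and r[1] == frozenset("DE")
)
print(target)
```

Here {A,B,C} --> {D,E} has confidence 3/4 = 0.75, because 3 of the 4 baskets containing A, B and C also contain D and E.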

There is also research comparing these algorithms. For the memory issue, I recommend "Evaluation of Apriori, FP growth and Eclat association rule mining algorithms" or "Comparing the Performance of Frequent Pattern Mining Algorithms".