I am using mlxtend to find association rules. Here is the code:
from mlxtend.frequent_patterns import apriori, association_rules

df = apriori(dum_data, min_support=0.4, use_colnames=True)
rules = association_rules(df, metric="lift", min_threshold=1)
rules2 = rules[(rules['lift'] >= 1) & (rules['confidence'] >= 0.7)]
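dum_data is a one-hot encoded DataFrame. For reference, it was built roughly like the sketch below; the transactions here are just dummy values for illustration, not my real data:

from mlxtend.preprocessing import TransactionEncoder
import pandas as pd

# Dummy transactions, one list of items per basket (placeholder for the real data)
transactions = [['A', 'B'], ['A', 'B', 'C'], ['A', 'C'], ['A', 'B', 'C'], ['B'], ['A', 'B']]
te = TransactionEncoder()
dum_data = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)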
Output:
antecedents            consequents        antecedent support  consequent support  support  confidence  lift   leverage  conviction
frozenset({'C'})       frozenset({'B'})   0.63                0.705               0.45     0.726       1.030  0.013     1.077
frozenset({'A'})       frozenset({'B'})   0.98                0.705               0.69     0.70        1.003  0.0007    1.00081
frozenset({'A', 'C'})  frozenset({'B'})   0.63                0.705               0.45     0.72        1.030  0.013     1.0776
I have given min_support=0.4. What is the difference between antecedent support, consequent support and support?
What is meant by lift and leverage? How do I judge whether they are good or bad?
Confidence I think I understand: for the first rule in the output, it is how many times C and B occurred together. Is that correct?
Let's take the third rule ({A, C} => {B}) as an example:

support = support of {A, B, C}. Support means that you count the number of transactions that contain all three of A, B and C and divide it by the total number of transactions.

antecedent support = support of what precedes the =>, i.e. the support of {A, C}.

consequent support = support of what comes after the =>, i.e. the support of {B}.

confidence = how likely it is that a transaction additionally contains {B} once we have observed {A, C}. Think of it as the conditional probability P(B | {A, C}).

lift: the definition of lift can be found e.g. on Wikipedia. If lift < 1, then {A, C} and {B} occur together less often than expected; if lift > 1, then {A, C} and {B} appear together more often than expected.

leverage is roughly the same idea: it also compares the expected co-occurrence with the observed one. Further explanation e.g. here.
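To make the definitions concrete, here is a minimal sketch that computes the same quantities by hand for {A, C} => {B} on a made-up one-hot table (the data and variable names are invented for illustration; the comments show the formulas behind the columns in your output):

import pandas as pd

# Made-up one-hot encoded transactions, purely to illustrate the formulas
dum_data = pd.DataFrame({
    'A': [True, True, True, True, False, True],
    'B': [True, True, False, True, True, True],
    'C': [True, False, True, True, False, False],
})

# support of {A, B, C}: fraction of transactions containing all three items
support_abc = (dum_data['A'] & dum_data['B'] & dum_data['C']).mean()

# antecedent support = support of {A, C}; consequent support = support of {B}
support_ac = (dum_data['A'] & dum_data['C']).mean()
support_b = dum_data['B'].mean()

# confidence({A, C} => {B}) = support({A, B, C}) / support({A, C}), i.e. P(B | {A, C})
confidence = support_abc / support_ac

# lift = confidence / support({B}); > 1 means more co-occurrence than expected
lift = confidence / support_b

# leverage = support({A, B, C}) - support({A, C}) * support({B}); > 0 means more co-occurrence than expected
leverage = support_abc - support_ac * support_b

print(support_abc, support_ac, support_b, confidence, lift, leverage)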
What makes a good lift/leverage is subjective, but I'd suggest a lift > 1. When it comes to choosing between rules, I would look more at confidence.
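For example, you could rank the rules by confidence first and use lift as a tie-breaker (just a sketch; rules is the DataFrame returned by association_rules in your code):

# Strongest rules first: sort by confidence, then by lift
best_rules = rules.sort_values(['confidence', 'lift'], ascending=False)
print(best_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())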