Lec08: Data Mining

Data Mining Tasks

{LHS}->{RHS} (Shopping cart example.)

Different from “function dependency” in database?
- FD: SSN -> name (one with the same SSN must have the same name.)
- Rules: {pen} -> {ink}

Support for a set of items: % of transactions containing $LHS\cup RHS$
- Statistical significance
Confidence for a rule: % of tx containing LHS that also contain RHS
- “Strength” of association
- “confidence” threshold usually higher than “support” threshold.

pen, ink, diary, soap
pen, ink, diary
pen, diary
pen, ink, soap

$conf([pen, ink]\rightarrow[soap])=67\%$ and $supp([pen,ink,soap]) =50\%$

Find all “large” (or “frequent”) item sets together with their support.
Derive all association rules with sufficient confidence from each large itemset.

:grey_question: Every subset of a large (frequent) itemset is also large (frequent).

we should make sup(LHS) as a subroutine of solving sup(LHS & RHS) ?

Priori algorithm: many scans but in each scan we calculate much less.

2020-3-31 15:45:08