I'm trying to build a decision tree with the C4.5 algorithm for a school project. The decision tree is for Haberman's Survival Data Set; the attribute information is as follows.
Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
1 = the patient survived 5 years or longer
2 = the patient died within 5 years
We need to implement a decision tree where each leaf has exactly one distinct result (meaning the entropy of that leaf should be 0). However, there are six instances that have the same attribute values but different results.
For example:
66,58,0,2
66,58,0,1
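Just to make the problem concrete, here is a rough sketch of the entropy calculation for a leaf that holds only these two instances (this is not my actual project code, and the function name leaf_entropy is just something I made up for the illustration). With one case of each class the entropy is 1 bit, so it can never be reduced to 0 no matter how I split on the attributes:

#include <math.h>
#include <stdio.h>

/* Shannon entropy of a two-class leaf: -sum(p_i * log2(p_i)). */
static double leaf_entropy(int class1_count, int class2_count)
{
    int total = class1_count + class2_count;
    double entropy = 0.0;
    double p;

    if (class1_count > 0) {
        p = (double)class1_count / total;
        entropy -= p * log2(p);
    }
    if (class2_count > 0) {
        p = (double)class2_count / total;
        entropy -= p * log2(p);
    }
    return entropy;
}

int main(void)
{
    /* The contradictory pair 66,58,0,1 and 66,58,0,2:
       one case of each class, so the entropy is 1 bit. */
    printf("entropy = %f\n", leaf_entropy(1, 1));
    return 0;
}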
What does the C4.5 algorithm do in this type of situation? I've searched everywhere but couldn't find any information.
Thanks.
Read Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993. (It is a good resource for studying C4.5, especially if you have a college assignment.)
From what I studied, it seems that on page 137, in the source code listing of build.c, there is a comment saying, roughly, that if all cases are of the same class, or there are not enough cases to divide (like in your question), it will just return Node. That Node comes from this line:

Node = Leaf(ClassFreq, BestClass, Cases, Cases - NoBestClass);

So in this situation C4.5 does not keep splitting: the node becomes a leaf labelled with the most frequent (best) class, and the cases belonging to the other class are kept as the leaf's error count. All of this information is referenced from Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
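To make that idea concrete, here is a simplified sketch of what that leaf construction amounts to. This is my own paraphrase, not the actual code from the book: the structure and the function make_leaf are names I invented, and the class counts in main are made-up numbers, not the real distribution of your six contradictory instances.

#include <stdio.h>

#define NCLASSES 2  /* survival status: index 0 = survived, index 1 = died */

/* A simplified stand-in for a C4.5 leaf node. */
typedef struct {
    int best_class;           /* class label assigned to the leaf           */
    int cases;                /* number of training cases reaching the leaf */
    int errors;               /* cases whose class differs from best_class  */
    int class_freq[NCLASSES]; /* class distribution at the leaf             */
} LeafNode;

/* Build a leaf for a group of cases that cannot be split further:
   label it with the majority class and record the rest as errors. */
static LeafNode make_leaf(const int class_freq[NCLASSES], int cases)
{
    LeafNode leaf;
    int c, best = 0;

    for (c = 1; c < NCLASSES; c++) {
        if (class_freq[c] > class_freq[best]) {
            best = c;
        }
    }
    for (c = 0; c < NCLASSES; c++) {
        leaf.class_freq[c] = class_freq[c];
    }
    leaf.best_class = best;
    leaf.cases = cases;
    leaf.errors = cases - class_freq[best];  /* cf. Cases - NoBestClass */
    return leaf;
}

int main(void)
{
    /* Hypothetical group of 6 identical-attribute cases: 4 of one class,
       2 of the other. The leaf gets the majority class and 2 errors. */
    int freq[NCLASSES] = { 4, 2 };
    LeafNode leaf = make_leaf(freq, 6);

    printf("leaf class = %d, cases = %d, errors = %d\n",
           leaf.best_class, leaf.cases, leaf.errors);
    return 0;
}

So the contradictory instances do not break the tree; they just end up in a leaf whose entropy is not 0, and the minority cases are counted as training errors.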
If anyone knows more about this, please comment if I have gotten something wrong. Thanks!