Discretize Weka attributes into specific intervals

I need to discretize a column in Weka. The column is named age and holds numeric values, for example from 2 to 90.

I need to discretize the age attribute using specific ranges of values, based on the following categories:

  - Youth: 15 to <= 25
  - Adult: > 25 to <= 64
  - Senior: > 64

How is this possible in Weka?

How can I label and adjust the intervals?

Answer by fracpete:

Neither the supervised nor the unsupervised version of the Discretize filter will allow you to do that.

But you can achieve that goal by building a filter chain using MultiFilter:

  1. Use MathExpression to apply your manual binning strategy with nested ifelse expressions. Set ignoreRange to the attribute you want to convert and also enable invertSelection, so that only that attribute is processed. As the expression, use something like ifelse(A<=25,0,ifelse(A<=64,1,2)): values of 25 or lower become 0, values above 25 up to 64 become 1, and the rest become 2.
  2. Convert the generated bin values into nominal values using NumericToNominal. Specify the attribute you want to convert in attributeIndices.
  3. Finally, rename the numeric-looking labels 0, 1, 2 to more meaningful ones using RenameNominalValues. Specify the attribute you want to update in selectedAttributes and use 0:Youth,1:Adult,2:Senior as valueReplacements.
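To make the binning logic of these three steps concrete, here is a minimal plain-Java sketch with no Weka dependency. It is not Weka's implementation; the class and method names (AgeBinning, bin, label) are hypothetical. The nested conditional mirrors the ifelse expression from step 1, and the map mirrors the 0:Youth,1:Adult,2:Senior replacements from step 3. Note that, like the MathExpression above, it puts every value of 25 or lower into Youth, including ages below 15; add another branch if those need their own category.

```java
import java.util.Map;

/**
 * Hypothetical, Weka-free sketch of the filter chain:
 * numeric age -> bin index (like MathExpression) -> label (like RenameNominalValues).
 */
public class AgeBinning {

    // Same cut points as ifelse(A<=25,0,ifelse(A<=64,1,2)).
    public static int bin(double age) {
        return age <= 25 ? 0 : (age <= 64 ? 1 : 2);
    }

    // Same mapping as the valueReplacements 0:Youth,1:Adult,2:Senior.
    private static final Map<Integer, String> LABELS =
            Map.of(0, "Youth", 1, "Adult", 2, "Senior");

    // Bins the age, then looks up the label for that bin index.
    public static String label(double age) {
        return LABELS.get(bin(age));
    }

    public static void main(String[] args) {
        // A few values from the 2-90 range mentioned in the question.
        double[] ages = {2, 15, 25, 26, 64, 65, 90};
        for (double age : ages) {
            System.out.println(age + " -> " + label(age));
        }
    }
}
```

With these cut points, 25 maps to Youth, 26 and 64 map to Adult, and 65 maps to Senior, which matches the <= boundaries in the question.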

The following MultiFilter setup converts the 7th attribute of a dataset in this way (copy it and paste it into the Weka Explorer via the right-click menu, and change the index 7 to the position of your age attribute):

weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.MathExpression -E ifelse(A<=25,0,ifelse(A<=64,1,2)) -V -R 7" -F "weka.filters.unsupervised.attribute.NumericToNominal -R 7" -F "weka.filters.unsupervised.attribute.RenameNominalValues -R 7 -N 0:Youth,1:Adult,2:Senior" -S 1