Java Weka API: Getting ROC Area values


I am trying to use the Weka API in a Java class. I have performed 10-fold cross-validation and then binarized my data using different thresholds.

However, I am new to the Weka API, so I am not sure that what I have done is right. I am getting ROC values but am not sure they are correct. Here is my code:

for(int i = 0; i < threshold.length; i++)
{
     // Deep copy of learnSet
     ArrayList<ArrayList<int[]>> learnSetCopy = new ArrayList<>();
     for (ArrayList<int[]> innerList : learnSet)
     {
         ArrayList<int[]> innerCopy = new ArrayList<>();
         for (int[] array : innerList) 
         {
             int[] arrayCopy = Arrays.copyOf(array, array.length);
             innerCopy.add(arrayCopy);
         }
         learnSetCopy.add(innerCopy);
     }

     // Deep copy of validSet
     ArrayList<ArrayList<int[]>> validSetCopy = new ArrayList<>();
     for (ArrayList<int[]> innerList : validSet) 
     {
         ArrayList<int[]> innerCopy = new ArrayList<>();
         for (int[] array : innerList) 
         {
             int[] arrayCopy = Arrays.copyOf(array, array.length);
             innerCopy.add(arrayCopy);
         }
         validSetCopy.add(innerCopy);
     }

     //Binarize the chemical protein interaction values
     binarizeCpiAttributes(learnSetCopy, threshold[i]);
     //Generate an ARFF file to be run through Weka
     generateARFF(fileName + "LearningThreshold" + threshold[i] + "Fold" + j + ".arff", attributeNames, learnSetCopy);

     binarizeCpiAttributes(validSetCopy, threshold[i]);
     generateARFF(fileName + "ValidThreshold" + threshold[i] + "Fold" + j + ".arff", attributeNames, validSetCopy);

     //Create an Instances of the learning and valid arff files
     Instances learningInstances = DataSource.read(fileName + "LearningThreshold" + threshold[i] + "Fold" + j + ".arff");
     Instances validInstances = DataSource.read(fileName + "ValidThreshold" + threshold[i] + "Fold" + j + ".arff");

     //Set class label for learning and valid sets
     if(learningInstances.classIndex() == -1)
     {
         learningInstances.setClassIndex(learningInstances.numAttributes()-1);
     }

     if(validInstances.classIndex() == -1)
     {
         validInstances.setClassIndex(validInstances.numAttributes()-1);
     }

     RandomForest cls = new RandomForest();
     String[] options = {
         "-P", "100",
         "-I", "100",
         "-num-slots", "1",
         "-K", "0",
         "-M", "1.0",
         "-V", "0.001",
         "-S", "1"
     };
     cls.setOptions(options);
     cls.buildClassifier(learningInstances);

     Evaluation eval = new Evaluation(learningInstances);
     eval.evaluateModel(cls, validInstances);

     // areaUnderROC takes a class index; 1 is the second value of the class attribute
     System.out.println("Area under ROC curve: " + eval.areaUnderROC(1));

     // Report which threshold produced the value above
     System.out.println("Processing with Threshold: " + threshold[i]);
}

Here is an example of the output I am getting. I expected the values to be higher, which makes me question whether what I have done is correct:

Area under ROC curve: 0.6000602772754672
Processing with Threshold: 0.4
Area under ROC curve: 0.5848854731766124
Processing with Threshold: 0.5
Area under ROC curve: 0.594831223628692
Processing with Threshold: 0.6
Area under ROC curve: 0.560051235684147
Processing with Threshold: 0.7

Is what I am doing here correct, and is this the right way to get the ROC Area values, or am I getting some other value from the Weka API?

I have tried to read through the Weka documentation but found parts of it confusing.
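
One way I could cross-check the number is to compute the area myself from the predicted scores and compare it with what eval.areaUnderROC(1) prints. Below is a minimal sketch in plain Java (no Weka dependency) that uses the pairwise definition of AUC: the fraction of positive/negative pairs in which the positive instance gets the higher score, counting ties as half. The class name AucCheck and the method auc are hypothetical names made up for this illustration. I am assuming the scores would be the probability of class index 1 for each validation instance (for example the second entry of cls.distributionForInstance(...)), and that the result should agree closely with Weka's value, though I have not confirmed that Weka computes it in exactly this way.

```java
final class AucCheck {

    // Hypothetical helper: AUC as the probability that a randomly chosen
    // positive instance is scored higher than a randomly chosen negative one.
    // Ties between a positive and a negative count as half a win.
    static double auc(double[] scores, boolean[] positive) {
        if (scores.length != positive.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        double wins = 0.0;
        long pairs = 0;
        for (int p = 0; p < scores.length; p++) {
            if (!positive[p]) {
                continue; // outer loop walks over positives only
            }
            for (int n = 0; n < scores.length; n++) {
                if (positive[n]) {
                    continue; // inner loop walks over negatives only
                }
                pairs++;
                if (scores[p] > scores[n]) {
                    wins += 1.0;
                } else if (scores[p] == scores[n]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up scores and labels, purely for illustration.
        double[] scores = {0.9, 0.4, 0.6, 0.1};
        boolean[] positive = {true, true, false, false};
        System.out.println("AUC: " + auc(scores, positive));
    }
}
```

This loop is quadratic in the number of instances, which is fine for a sanity check on one fold but not meant as an efficient implementation. A value near 0.5 here would mean the scores barely separate the two classes.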
