I'm building a binary classification model with ML.NET. I thought everything was working, but the model was wrong surprisingly often. When I tried adjusting the threshold, nothing changed, so I started to investigate.
I checked the predictedValues from my binary classification and noticed that most of them were negative.
Here is my model:
    public ITransformer TrainCategorialModel(IEnumerable<TrainingCategorial> trainingData)
    {
        var columnNames = typeof(TrainingCategorial)
            .GetProperties()
            .Where(property => property.DeclaringType != typeof(TrainingCategorial))
            .Select(property => property.Name)
            .ToArray();

        // Check for null values in training data
        if (trainingData.Any(item => item == null))
        {
            throw new ArgumentException("Training data contains null values.");
        }

        var pipeline = mLContext.Transforms.Concatenate("Features", columnNames)
            .Append(mLContext.BinaryClassification.Trainers.SdcaNonCalibrated(labelColumnName: "CHPlabels", featureColumnName: "Features"));

        var data = mLContext.Data.LoadFromEnumerable(trainingData);
        var model = pipeline.Fit(data);
        return model;
    }
The features are the properties that TrainingCategorial inherits from a parameter model defined in another class.
My prediction looks like this:
    public List<bool> PredictCategorialModel(ITransformer model, IEnumerable<PredictionCategorial> input)
    {
        // Load the input rows into an IDataView
        IDataView testingData = mLContext.Data.LoadFromEnumerable(input);

        // Run the model and collect the predicted value for each row
        List<float> predictedValues = mLContext.Data.CreateEnumerable<BinaryPrediction>(
                model.Transform(testingData), reuseRowObject: false)
            .Select(row => row.LabelPrediction)
            .ToList();

        // Apply a threshold (currently 0.3) to convert the predicted values into boolean predictions
        var threshold = 0.3;
        List<bool> predictedLabels = predictedValues.Select(value => value > threshold).ToList();
        return predictedLabels;
    }
I've checked my data and it appears fine. How can I fix this?
Update: I think the issue lies in the model. I've tried other ways of producing the predicted boolean, but I get the same errors. I also tried LightGBM, since I know the model shouldn't be linear, but that created a whole new set of problems (about which I also have an unanswered question). Does anyone know a good way to check whether a model works?