I'm a beginner in C# and ML.NET, and I'm working on loading data in ML.NET. My data is stored in a .csv file. The first step is to create a class for the data model, like this:
    using Microsoft.ML.Data;

    public class ModelInput
    {
        [ColumnName("sepallength"), LoadColumn(1)]
        public float Sepallength { get; set; }

        [ColumnName("sepalwidth"), LoadColumn(2)]
        public float Sepalwidth { get; set; }

        [ColumnName("petallength"), LoadColumn(3)]
        public float Petallength { get; set; }

        [ColumnName("petalwidth"), LoadColumn(4)]
        public float Petalwidth { get; set; }

        [ColumnName("variety"), LoadColumn(5)]
        public string Variety { get; set; }
    }
So I need to store all the features that I have on the first line of my .csv in this class, before loading the .csv file. I also need to know the type of variable represented by the features (string, float, DateTime,...), and the column in which the feature is represented in my .csv.
After having created this class, I will load my data from my .csv with this simple line of code :
var myData = mlContext.Data.LoadFromTextFile<ModelInput>("Iris.csv",
hasHeader: true, separatorChar: ',');
So this command uses the class I created first for the Data Model.
Here is the problem: I want to do the first step dynamically. As it stands, I have to know the structure of my .csv and implement the class before I can load the file. But what if I don't know the structure in advance? (For example, what if someone wants to load their own file?)
My idea was to write a script that first reads my .csv, then creates a .cs file and fills it with the code needed for my class. The generated code would depend on the features in my .csv and on their types (and possibly other things).
But I'm wondering about the feasibility of my idea, in the sense that after automatically creating my class for the Data Model by reading the .csv, it must be compiled, but I have already compiled all my files which means that I will need to recompile again somehow...
Is this the right way to do it, or am I missing something easier?
Thank you in advance for your answers
If you need to load data without having an input schema class, there are actually a couple of ways you can do that.
Infer Columns AutoML
You can use the Microsoft.ML.AutoML package, which provides the InferColumns method. Its result exposes two properties, ColumnInformation and TextLoaderOptions; the latter is what you pass into context.Data.CreateTextLoader. Then, with the loader, you can load the file into an IDataView.
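As a rough illustration of the steps above, here is a minimal sketch. It assumes the Microsoft.ML and Microsoft.ML.AutoML NuGet packages are referenced, that the file has a header row, and that the label column is named "variety" (taken from the question). The class name InferredSchemaLoader and the method name Load are hypothetical helpers, not part of ML.NET.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical helper: loads a CSV without a hand-written schema class.
public static class InferredSchemaLoader
{
    public static IDataView Load(MLContext mlContext, string path, string labelColumnName)
    {
        // Ask AutoML to guess column names and types from the file contents.
        // groupColumns: false keeps every CSV column as its own column
        // instead of merging same-typed columns into a vector.
        ColumnInferenceResults inference = mlContext.Auto().InferColumns(
            path,
            labelColumnName: labelColumnName,
            separatorChar: ',',
            groupColumns: false);

        // Build a loader from the inferred options and read the file.
        TextLoader loader = mlContext.Data.CreateTextLoader(inference.TextLoaderOptions);
        return loader.Load(path);
    }

    // Prints the inferred schema, one "name: type" line per column.
    public static void PrintSchema(IDataView data)
    {
        foreach (DataViewSchema.Column column in data.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

It could be called as `InferredSchemaLoader.Load(new MLContext(), "Iris.csv", "variety")`. The inferred types are a best guess from the data, so it is worth printing the schema and checking it before training.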
DataFrame API
The DataFrame API comes with the Microsoft.Data.Analysis NuGet package, and it offers an easy way to load CSV data with the LoadCsv method. This gives you a DataFrame object, which you can use to do some processing and analysis on the data, similar to what you can do with the pandas library in Python. However, note that this library is in the very early stages, so it's a bit limited in what you can do with it right now.
The DataFrame object is compatible with IDataView, and you can do an explicit cast on it. After that, you can use it for any ML.NET operations.
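The DataFrame route can be sketched in a similar way. This is only an illustration, assuming the Microsoft.ML and Microsoft.Data.Analysis NuGet packages are referenced and that the file is comma-separated with a header row. The class name DataFrameLoader and its methods are hypothetical helpers.

```csharp
using System;
using Microsoft.Data.Analysis;
using Microsoft.ML;

// Hypothetical helper: loads a CSV through the DataFrame API.
public static class DataFrameLoader
{
    public static IDataView Load(string path)
    {
        // LoadCsv guesses each column's type from the first rows of the file.
        DataFrame frame = DataFrame.LoadCsv(path, separator: ',', header: true);

        // DataFrame implements IDataView, so the cast always succeeds.
        return (IDataView)frame;
    }

    // Prints each column's name and the .NET type LoadCsv chose for it.
    public static void PrintColumns(DataFrame frame)
    {
        foreach (DataFrameColumn column in frame.Columns)
        {
            Console.WriteLine($"{column.Name}: {column.DataType}");
        }
    }
}
```

The returned IDataView can then be passed to ML.NET operations in the same way as the result of LoadFromTextFile in the question.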