I want to use the Parquet file format in C# for interop with other data languages, such as Python and R, that others use. I know the data I will be dealing with will be a DataFrame (Python) or a data.table (R). However, at the time of reading the data I will not know the column types in the Parquet data, as I am looking to write a generic function to read/write Parquet to/from Deedle Frames.
I've seen the documentation here, but cannot seem to adapt it to this task. If I look in the metadata for a field I see that the ElementType is Int32, but when I try
rowGroupReader.Column(0).LogicalReader<int>().ReadAll(numRows); // 1
it complains that it cannot convert from Nullable to int. If I use
rowGroupReader.Column(0).LogicalReader<int?>().ReadAll(numRows); // 2
I get the expected result. To make it generic I was using
private LogicalColumnReader<T> GetLogicalReader<T>(T dataType, ColumnReader reader)
{
    return reader.LogicalReader<T>();
}
which I call using
GetLogicalReader(rowGroupReader.Column(0).ElementType, rowGroupReader.Column(0))
as a way of getting the same effect as the code in (2) above.
How do I make this generic so that it caters for the nullable types? If I change the function to use T?, I need to constrain T to struct (where T : struct), but then my call passing ElementType will not compile.
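To show the problem in isolation, here is a minimal sketch that does not use ParquetSharp at all. The names GenericInferenceDemo, BoundFromArgument and BoundFromTypeParameter are made up for illustration, and I am assuming that ElementType only gives me a System.Type value at runtime. As far as I can tell, the compiler infers T from the static type of the argument, so passing a Type binds T to System.Type rather than to int?. The only bridge from a runtime Type to T that I know of is reflection:

```csharp
using System;
using System.Reflection;

public static class GenericInferenceDemo
{
    // Hypothetical stand-in for GetLogicalReader<T>(T dataType, ...):
    // returns the type that T was bound to at the call site.
    public static Type BoundFromArgument<T>(T dataType) => typeof(T);

    // Hypothetical stand-in for a parameterless generic call such as LogicalReader<T>().
    public static Type BoundFromTypeParameter<T>() => typeof(T);

    public static void Main()
    {
        // Assumption: the element type is only available as a runtime System.Type value.
        Type elementType = typeof(int?);

        // T is inferred from the static type of the argument, so T is System.Type, not int?.
        Console.WriteLine(BoundFromArgument(elementType) == typeof(Type)); // True

        // Reflection closes the generic method over the runtime type instead.
        MethodInfo open = typeof(GenericInferenceDemo).GetMethod(nameof(BoundFromTypeParameter));
        MethodInfo closed = open.MakeGenericMethod(elementType);
        Type bound = (Type)closed.Invoke(null, null);
        Console.WriteLine(bound == typeof(int?)); // True
    }
}
```

This would also explain why the where T : struct version fails to compile for my call, since System.Type is a class. I do not know whether reflection like this is the intended approach with this library or whether there is a better-supported way.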
This is a first attempt and I'm likely going about it the wrong way, but otherwise I cannot see a use for the file format without a performant way of loading the data with the correct types.