ParquetSharp - Read columns into correct types without knowing the types in the file


I'm looking to use the Parquet file format in C# to interoperate with colleagues who work in other data languages such as Python and R. I know the data I will be dealing with will be a DataFrame (Python) or a data.table (R). However, at the time of reading I will not know the column types in the Parquet data, because I am looking to write a generic function to read/write Parquet to/from Deedle Frames.

I've seen the ParquetSharp documentation, but cannot seem to adapt it to this task. If I look in the metadata for a field, I see that the ElementType is Int32, but when I try

rowGroupReader.Column(0).LogicalReader<int>().ReadAll(numRows);  // 1

it complains that it cannot convert from Nullable to int. If I use

rowGroupReader.Column(0).LogicalReader<int?>().ReadAll(numRows);  // 2

I get the expected result. To make this generic, I wrote

private LogicalColumnReader<T> GetLogicalReader<T>(T dataType, ColumnReader reader)
{
    return reader.LogicalReader<T>();
}

and called it with

GetLogicalReader(rowGroupReader.Column(0).ElementType, rowGroupReader.Column(0))

as a generic way of achieving call (2) above.
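The call above does not reach `LogicalReader<int?>` because C# infers `T` at compile time from the static type of the argument. `ElementType` is a `System.Type`, so `T` becomes `System.Type` rather than the type it describes. One common workaround is to close a generic method over the runtime type with reflection (`MethodInfo.MakeGenericMethod`).

The sketch below shows both behaviours using only the standard library. `GenericDispatch`, `InferFromArgument`, `CreateArray` and `CreateArrayOf` are hypothetical names. In the ParquetSharp case, the generic method would instead be one whose body does something like `reader.LogicalReader<T>().ReadAll(numRows)`, with the reader passed through `Invoke`.

```csharp
using System;
using System.Reflection;

public static class GenericDispatch
{
    // Same shape as GetLogicalReader<T>(T dataType, ...): T is inferred from the
    // compile-time type of the argument, so passing a System.Type gives T = System.Type.
    public static Type InferFromArgument<T>(T dataType) => typeof(T);

    // Hypothetical stand-in for "read this column as T[]"
    // (e.g. reader.LogicalReader<T>().ReadAll(numRows) in ParquetSharp).
    public static T[] CreateArray<T>(int length) => new T[length];

    // Closes CreateArray<T> over a Type known only at runtime and invokes it.
    public static Array CreateArrayOf(Type elementType, int length)
    {
        MethodInfo open = typeof(GenericDispatch).GetMethod(nameof(CreateArray))
            ?? throw new MissingMethodException(nameof(GenericDispatch), nameof(CreateArray));
        MethodInfo closed = open.MakeGenericMethod(elementType);
        return (Array)closed.Invoke(null, new object[] { length })!;
    }
}
```

Reflection adds some overhead per call. Here it would be paid once per column rather than once per value, so it is unlikely to dominate a bulk read.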

How do I make this generic so that it handles the nullable types I need? If I change the function to use T?, I need to constrain T to struct, but then my call that passes ElementType will not compile.
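As I understand ParquetSharp's default type mapping, it reads an optional (nullable) primitive column as `T?` and a required one as `T`. That would explain why (1) fails and (2) succeeds. Files written from pandas or R via Arrow usually mark columns as optional.

Writing `T?` in a signature needs `where T : struct`, which also rules out reference types like `string`. Instead of doing that, the nullable element type can be built at runtime as a `Type`. It can then be used wherever a runtime `Type` is expected, for example as the argument to `MethodInfo.MakeGenericMethod`.

This is a minimal sketch. `ElementTypes.ForColumn` and its `isOptional` flag are hypothetical; with ParquetSharp, that flag would have to come from the column's schema, for example its repetition.

```csharp
using System;

public static class ElementTypes
{
    // Hypothetical helper: given the non-nullable CLR type of a column's values and
    // whether the column is optional, return the element type to request.
    // Value types in optional columns become Nullable<T>; reference types
    // (e.g. string) already admit null and are returned unchanged, as are
    // types that are already Nullable<T>.
    public static Type ForColumn(Type valueType, bool isOptional)
    {
        if (!isOptional || !valueType.IsValueType || Nullable.GetUnderlyingType(valueType) != null)
            return valueType;
        return typeof(Nullable<>).MakeGenericType(valueType);
    }
}
```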

This is my first attempt and I'm likely going about it the wrong way, but I can't see a use for the file format without a performant way of loading the data with the correct types.
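A reflection-free alternative is double dispatch through a visitor. ParquetSharp's documentation describes this approach. The non-generic `LogicalColumnReader` (from `ColumnReader.LogicalReader()`) has an `Apply` method that takes an `ILogicalColumnReaderVisitor<TReturn>`. That visitor's generic `OnLogicalColumnReader<TValue>` method then receives the reader already typed, nullability included. Please check the exact signatures against the ParquetSharp version you use.

The sketch below reproduces only the shape of that pattern, so it runs with the standard library alone. It uses hypothetical stand-in types: `UntypedColumn`, `TypedColumn<T>`, `IColumnVisitor<TReturn>` and `ReadAllVisitor`.

```csharp
using System;

// Hypothetical stand-in for ILogicalColumnReaderVisitor<TReturn>.
public interface IColumnVisitor<TReturn>
{
    // T is supplied by the concrete column, not inferred at the call site.
    TReturn Visit<T>(TypedColumn<T> column);
}

// Hypothetical stand-in for the non-generic LogicalColumnReader: what you hold
// when the element type is unknown at compile time.
public abstract class UntypedColumn
{
    public abstract Type ElementType { get; }
    public abstract TReturn Apply<TReturn>(IColumnVisitor<TReturn> visitor);
}

// Hypothetical stand-in for LogicalColumnReader<TValue>.
public sealed class TypedColumn<T> : UntypedColumn
{
    private readonly T[] _values;

    public TypedColumn(T[] values) => _values = values;

    public override Type ElementType => typeof(T);

    // Double dispatch: inside this class T is known, so the visitor's generic
    // method is closed over the real element type without reflection.
    public override TReturn Apply<TReturn>(IColumnVisitor<TReturn> visitor) => visitor.Visit(this);

    public T[] ReadAll() => (T[])_values.Clone();
}

// Example visitor: reads a column into a strongly typed array, returned as Array.
public sealed class ReadAllVisitor : IColumnVisitor<Array>
{
    public Array Visit<T>(TypedColumn<T> column) => column.ReadAll();
}
```

With ParquetSharp itself, the visitor would implement `OnLogicalColumnReader<TValue>` and return something like `columnReader.ReadAll(numRows)`. It would be applied per column via `rowGroupReader.Column(i).LogicalReader().Apply(visitor)`. The values then come back as, for example, `int?[]` or `string[]`. The array's element type (`array.GetType().GetElementType()`) could then drive how each Deedle series is built.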
