Copy contents of the Excel file from a stream asynchronously using SqlBulkCopy

682 Views Asked by At

I have a project where I need to copy the contents of the .xlsx file I received in Web API Controller (in the form of the Stream from MultipartReader) to SQL Server Database. I'm using SqlBulkCopy for copying itself (I already did a similar task for .csv files), but all of the solutions I was able to find suffer from one or more of the following problems:

  • Require saving the file to the disk first (not possible in my case)
  • Don't have any way of reading the file asynchronously
  • Load entire file into memory first (I'm expecting to deal with fairly large files, so this is not acceptable for me)
  • Are commercially licensed

Are there any ways of doing this?

1

There are 1 best solutions below

0
MarkPflug On

Jeroen is correct, in that it is not possible to handle Excel files in a purely streaming manner. While it might require loading the entire .xlsx file in memory, the efficiency of the library can have an even larger impact on the memory usage than the file size. I say this as the author of the most efficient Excel reader for .NET: Sylvan.Data.Excel.

In benchmarks comparing it to other libraries, you can see that not only is it significantly faster than other implementations, but it also uses only a tiny fraction of the memory that other libraries consume.

With the exception of "Load entire file into memory first", it should satisfy all of your requirements. It can process data out of a MemoryStream, it doesn't need to write to disk. It implements DbDataReader which provides ReadAsync. The ReadAsync implementation defaults to the base DbDataReader implementation which defers to the synchronous Read() method, but when the file is buffered in a MemoryStream this doesn't present a problem, and allows the SqlBulkCopy.WriteToServerAsync to process it asynchronously. Finally, it is MIT licensed, so you can do whatever you want with it.

using Sylvan.Data;
using Sylvan.Data.Excel;
using System.Data.Common;
using System.Data.SqlClient;

// provide a schema that maps the columns in the Excel file to the names/types in your database.
var opts = new ExcelDataReaderOptions
{
    Schema = MyDataSchemaProvider.Instance
};

var filename = "mydata.xlsx";
var ms = new MemoryStream();

// asynchronously load the file into memory
// this might be loading from an Asp.NET IFormFile instead
using(var f = File.OpenRead(filename))
{
    await f.CopyToAsync(ms);
    ms.Seek(0, SeekOrigin.Begin);
}

// determine the workbook type from the file-extension
var workbookType = ExcelDataReader.GetWorkbookType(filename);
var edr = ExcelDataReader.Create(ms, workbookType, opts);

// "select" the columns to load. This extension method comes from the Sylvan.Data library.
var dataToLoad = edr.Select("PartNumber", "ServiceDate");

// bulk copy the data to the server.
var conn = new SqlConnection("Data Source=.;Initial Catalog=mydb;Integrated Security=true;");
conn.Open();
var bc = new SqlBulkCopy(conn);
bc.DestinationTableName = "MyData";
bc.EnableStreaming = true;
await bc.WriteToServerAsync(dataToLoad);

// Implement an ExcelSchemaProvider that maps the columns in the excel file
sealed class MyDataSchemaProvider : ExcelSchemaProvider
{
    public static ExcelSchemaProvider Instance = new MyDataSchemaProvider();

    static readonly DbColumn PartNumber = new MyColumn("PartNumber", typeof(int));
    static readonly DbColumn ServiceDate = new MyColumn("ServiceDate", typeof(DateTime));
    // etc...

    static readonly Dictionary<string, DbColumn> Mapping = new Dictionary<string, DbColumn>(StringComparer.OrdinalIgnoreCase)
    {
        { "partnumber", PartNumber },
        { "number", PartNumber },
        { "prt_nmbr", PartNumber },
        { "servicedate", ServiceDate },
        { "service_date", ServiceDate },
        { "svc_dt", ServiceDate },
        { "sd", ServiceDate },
    };

    public override DbColumn? GetColumn(string sheetName, string? name, int ordinal)
    {
        if (string.IsNullOrEmpty(name))
        {
            // There was no name in the header row, can't map to anything.
            return null;
        }

        if (Mapping.TryGetValue(name, out DbColumn? col))
        {
            return col;
        }
        // header name is unknown. Might be better to throw in this case.
        return null;
    }

    class MyColumn : DbColumn
    {
        public MyColumn(string name, Type type, bool allowNull = false)
        {
            this.ColumnName = name;
            this.DataType = type;
            this.AllowDBNull = allowNull;
        }
    }

    public override bool HasHeaders(string sheetName)
    {
        return true;
    }
}

The most complicated part of this is probably the "schema provider" which is used to provide header name mappings and define the column types, which are required for SqlBulkCopy to operate correctly.

I also maintain the Sylvan.Data.Csv library, which provides very similar capabilities for CSV files, and is a fully asynchronous streaming CSV reader impelementation. The API it provides is nearly identical to the Sylvan ExcelDataReader. It is also the fastest CSV reader for .NET.

If you end up trying these libraries and have any troubles, open an issue in the github repo and I can take a look.