Export HTML table to Excel file using C# (WinForms .NET 5)

3.4k Views Asked by At

I need to convert a HTML table with styling to an .XLSX file.

I managed to do that using the free version of GemBox.Spreadsheet, for now I don't mind the limitation of the free version but the problem is that numbers are considered as text when I open the Excel file. Is there any solution to that without manually opening the Excel file and converting them myself ? Or even a free alternative to GemBox library ?

File.WriteAllText("Table.html", html);
ExcelFile.Load("Table.html", LoadOptions.HtmlDefault).Save("Test.xlsx");

My HTML looks like this

<html>
  <body>
    <center>
      <table border="1" cellpadding="0" cellspacing="0">
        <tr>
          <td>Crimson Witch</td>
          <td>HP</td>
          <td>ATK</td>
          <td>DEF</td>
        </tr>
        <tr>
          <td>Flower</td>
          <td>10</td>
          <td style="background-color: #808080"></td>
          <td style="background-color: #808080"></td>
        </tr>
        <tr>
          <td>Plume</td>
          <td style="background-color: #808080"></td>
          <td>10</td>
          <td style="background-color: #808080"></td>
        </tr>
      </table>
    </center>
  </body>
</html>

Solution :

File.WriteAllText("Table.html", html.Replace("<center>", string.Empty).Replace("</center>", string.Empty));
ExcelFile.Load("Table.html", LoadOptions.HtmlDefault).Save("Test.xlsx");
2

There are 2 best solutions below

0
Mario Z On BEST ANSWER

The problem occurs because the <table> is inside the <center>.

Try removing the <center> element.

EDIT:

This issue is now resolved in the current latest bugfix version: https://www.gemboxsoftware.com/spreadsheet/nightlybuilds/GBS47v1336.zip

And in the current latest NuGet package:
Install-Package GemBox.Spreadsheet -Version 47.0.1336-hotfix

0
Jim G. On

I agree with @Jonathan Wood.

It would be best to do this in two steps:

  1. Parse the HTML table with the HTML Agility Pack NuGet package.

    var query = from table in
                doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
                from row in table.SelectNodes("tr").Cast<HtmlNode>()
                from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
                select new {Table = table.Id, CellText = cell.InnerText};
    
  2. Use EPPlus and its LoadFromArrays method.

string firstName = "Bob";
string lastName = "Burton";
DateTime dateOfBirth = new DateTime(2000, 1, 1);
var testData = new List<object[]>()
                {
                    new object[] {"First name", firstName},
                    new object[] {"Last name", lastName},
                    new object[] {"Date of birth", dateOfBirth}
                };

using (var excelPackage = new ExcelPackage())
{
    ExcelWorksheet excelWorksheet = excelPackage.Workbook.Worksheets.Add("Test worksheet");
    //Load the summary data into the sheet, starting from cell A1. Print the column names on row 1
    excelWorksheet.Cells["A1"].LoadFromArrays(testData);
}