I'm using a PowerShell script found at https://blog.netnerds.net/2015/01/powershell-high-performance-techniques-for-importing-csv-to-sql-server/ to load data from quote-delimited, comma-separated .csv files into a SQL Server (2016) database.
The source csv files contain extended characters within the data, such as umlauts in some first names and the superscript 2 in m² (metres squared). I believe I need to specify a codepage of 65001 in order for such extended character values to flow from the csv file and be inserted into varchar datatypes.
I don't know the syntax to specify a particular codepage (or encoding) value.
I've attempted this:
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring,([System.Data.SqlClient.SqlBulkCopyOptions] 'TableLock'))
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize
#$bulkcopy.Codepage = $codepage
where $codepage has already been assigned the value 65001, but the last line errors when uncommented. I've googled this for ages but not found an example. Any help will be much appreciated.
Using a simple split command to read CSV files is not recommended, because the CSV format is more complex than it appears, with different delimiters, quoted fields, and line breaks within fields. To handle these complexities and parse CSV files correctly, use a dedicated CSV parsing library such as CsvHelper, which supports a variety of CSV specifications, including RFC 4180. Additionally, if the CSV file contains characters that are not in the default encoding, the output may display incorrectly, so check the encoding of the file and set the appropriate encoding before reading it.
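For example, one quick sanity check (the path below is just an example) is to read the file once with an explicit encoding and look at a few lines:

# Hypothetical check: read the first lines as UTF-8 (codepage 65001) and confirm
# that the umlauts and the superscript two display correctly.
$path = "C:\temp\test.csv"
[System.IO.File]::ReadAllLines($path, [System.Text.Encoding]::UTF8) |
    Select-Object -First 5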
Here is a sample using CsvHelper, which is actually a NuGet package, used from inside PowerShell. The sample adds records to a System.Data.DataTable and to a PSCustomObject. You need to implement the batched bulk inserts from the DataTable on your own (this should not be too difficult).
As mentioned in the comments, there can be several pitfalls here: non-compliant source files, incorrect reading of the source files, database collations, and applications that read from the database incorrectly. On top of that, you don't have real control over all parts of these environments. If your editing is limited to the PowerShell script, this should at least give your script side correct parsing through CsvHelper and some control over the encoding.
To test the script below, just create a C:\temp\test.csv.
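For instance, a small test file could be created like this (the column names and values are made up purely for illustration):

# Write a tiny CSV containing an umlaut and a superscript two, encoded as UTF-8.
$lines = @(
    '"FirstName","Area"'
    '"Jürgen","12 m²"'
)
[System.IO.File]::WriteAllLines("C:\temp\test.csv", $lines, [System.Text.Encoding]::UTF8)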
Advantage: RFC 4180 compliant parsing, and control over the read/write encoding used for the DataTable.
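What follows is a minimal sketch of such a script, assuming CsvHelper.dll (a recent release) has been downloaded from NuGet into C:\temp and its dependencies can be resolved; the paths and column names are only examples, and exact constructor signatures can vary slightly between CsvHelper versions.

# Load the CsvHelper assembly downloaded from NuGet.
Add-Type -Path "C:\temp\CsvHelper.dll"

$csvPath  = "C:\temp\test.csv"
# UTF-8 (codepage 65001); swap in e.g. GetEncoding(1252) to see how the values
# in the DataTable change when the read encoding does not match the file.
$encoding = [System.Text.Encoding]::GetEncoding(65001)

# Read the file with an explicit encoding so the umlauts and ² survive.
$reader = New-Object System.IO.StreamReader($csvPath, $encoding)
$csv    = New-Object CsvHelper.CsvReader($reader, [System.Globalization.CultureInfo]::InvariantCulture)

# CsvDataReader exposes the parsed rows as an IDataReader, which
# DataTable.Load() (and also SqlBulkCopy.WriteToServer) can consume.
$dataReader = New-Object CsvHelper.CsvDataReader($csv)
$dataTable  = New-Object System.Data.DataTable
$dataTable.Load($dataReader)
$reader.Dispose()

# The same rows as PSCustomObjects, if that is easier to inspect.
$records = foreach ($row in $dataTable.Rows) {
    $props = [ordered]@{}
    foreach ($col in $dataTable.Columns) { $props[$col.ColumnName] = $row[$col] }
    [PSCustomObject]$props
}
$records | Format-Table

If memory becomes a concern, the CsvDataReader could also be handed straight to SqlBulkCopy.WriteToServer(), which streams the rows instead of materialising the whole DataTable first.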
This will end up in the DataTable as shown, because I changed the encoding in between.