I have several .csv files in a folder whose second column is empty, and I would like to fill that column with the data from a companion file per CSV, named after the CSV with a "_column3" suffix.
Example:
firstfile.csv
header1,translation,source
first,,"third"
one,,three
firstfile_column3.txt (the CSV's base name + _column3.txt)
source
second, fifth
two
firstfileoutput.csv (the source values from the _column3 file became the translation column)
header1,translation,source
first,"second, fifth","third"
one,two,three
I was able to get it to work, but accented and Asian characters come out wrong. I am using PowerShell 5 on Windows and need to modify the script so that the output is BOM-less UTF-8. Editing the last few lines has not solved it.
param(
    $SourceDir = $PWD,
    $OutDir = $PWD,
    $OutFileSuffix = "output" # Define the suffix for the output file.
)

# Get all primary CSV files in the source directory.
$csvFiles = Get-ChildItem -Path $SourceDir -Recurse -Filter "*.csv"

foreach ($csvFile in $csvFiles) {
    # Construct the name for the corresponding _column3 file.
    $column3FileName = "{0}_column3.txt" -f $csvFile.BaseName
    $column3FilePath = Join-Path -Path $SourceDir -ChildPath $column3FileName

    # Check if the _column3 file exists.
    if (Test-Path $column3FilePath) {
        # Import the primary CSV file and the corresponding _column3 file.
        $primaryCsv = Import-Csv -Path $csvFile.FullName
        $column3Data = Get-Content $column3FilePath

        # Assuming the first line in the _column3 file is a header, skip it.
        $column3Values = $column3Data | Select-Object -Skip 1

        # Update the second column (translation) in the primary CSV with data from the _column3 file.
        for ($i = 0; $i -lt $primaryCsv.Count; $i++) {
            $primaryCsv[$i].translation = $column3Values[$i]
        }

        # Construct the output file path.
        $outputFilePath = Join-Path -Path $csvFile.DirectoryName -ChildPath ("{0}{1}.csv" -f $csvFile.BaseName, $OutFileSuffix)

        # Export the updated CSV data to a new file.
        $primaryCsv | Export-Csv -Path $outputFilePath -NoTypeInformation -Encoding UTF8
    }
    else {
        Write-Warning "Corresponding column3 file not found for $($csvFile.Name)"
    }
}
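For reference, the script can be invoked roughly like this (the file name fill-translations.ps1 is only an illustrative choice, since the script above is unnamed):

.\fill-translations.ps1 -SourceDir 'C:\data\csv' -OutFileSuffix 'output'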
To ensure consistent, BOM-less UTF-8 handling in Windows PowerShell:
On reading:

Use -Encoding utf8 to ensure that BOM-less UTF-8 files are read as such:

Import-Csv -Encoding utf8 $csvFile.FullName
Get-Content -Encoding utf8 $column3FilePath

On writing:

Avoid -Encoding utf8, because it creates files with a BOM.

A non-obvious workaround for your Export-Csv call is needed, which relies on the fact that New-Item does create BOM-less UTF-8 files, by default and invariably (see the sketch below).

Note that none of these things are necessary in PowerShell (Core) 7+, which consistently defaults to (BOM-less) UTF-8, across all built-in cmdlets (as well as when reading source code from files).
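A minimal sketch of that New-Item-based workaround (the exact original code isn't reproduced here; this assumes it replaces the Export-Csv line in the script above):

# ConvertTo-Csv builds the CSV text in memory; New-Item -Value then writes it out.
# In Windows PowerShell, New-Item creates BOM-less UTF-8 files by default and invariably.
$csvText = ($primaryCsv | ConvertTo-Csv -NoTypeInformation) -join [Environment]::NewLine
$null = New-Item -Force -Path $outputFilePath -Value ($csvText + [Environment]::NewLine)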
Background information:
Unfortunately, Windows PowerShell and its file-reading cmdlets exhibit inconsistent behavior with respect to what character encoding to assume in the absence of a BOM.
In your code:

Import-Csv happens to assume UTF-8, ...

... while Get-Content assumes ANSI, i.e. the system's active legacy ANSI code page, as does the PowerShell engine itself when it reads source code.

Similarly, there's no consistency when writing to files: Set-Content uses ANSI encoding, whereas Out-File (and its virtual aliases, > and >>) uses UTF-16LE ("Unicode").

By contrast, PowerShell (Core) 7+ and its cmdlets - both reading and writing ones - now consistently default to (BOM-less) UTF-8.
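A quick way to see these writing defaults in Windows PowerShell 5.1 (the file names below are arbitrary examples):

'café' | Set-Content .\demo-setcontent.txt   # written in the active ANSI code page
'café' | Out-File .\demo-outfile.txt         # written as UTF-16LE ("Unicode"), with a BOM
Format-Hex .\demo-outfile.txt | Select-Object -First 1   # first two bytes FF FE are the UTF-16LE BOM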
The upshot is:
In Windows PowerShell, to be safe, always use -Encoding utf8 if you know you're dealing with UTF-8-encoded files - whether they do or do not have a BOM (in the former case, using -Encoding utf8 is technically redundant, but causes no harm).

The same applies if you want to write files using UTF-8, but note that in Windows PowerShell -Encoding utf8 invariably creates UTF-8 files with a BOM. See this answer for plain-text workarounds (the New-Item workaround is shown above).

By contrast, PowerShell (Core) 7+ creates BOM-less UTF-8 files both by default and with -Encoding utf8; in case you do want a BOM there, use -Encoding utf8BOM.
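One example of such a plain-text workaround (a sketch, not the linked answer's verbatim code) is to drop down to the .NET file APIs, which write BOM-less UTF-8 when no encoding is specified:

# .NET resolves relative paths against its own current directory, so pass a full path.
$outPath = Join-Path (Convert-Path $PWD) 'demo-utf8.txt'   # example file name
$lines = 'header1,translation,source', 'first,"second, fifth","third"'
[System.IO.File]::WriteAllLines($outPath, $lines)   # BOM-less UTF-8 by default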