Get 3rd column and put it in another file for all CSV files with Powershell

136 Views Asked by At

I have a folder, with subfolders full of csv files. For each file I would like to get only the third column and save it in another file.

Example:

filedata1.csv

aa,bb,cc
cat,horse,"dog, bird"
4th,33,first and second

filedata1column3.csv

cc
dog, bird
first and second

what I tried but it turns out wrong:

param ( [String] $Filename  = '*.csv',
            [String] $SourceDir = 'C:\Data\',
    )
    
ForEach ($csvFile in Get-ChildItem -Path $SourceDir -Filter $Filename){
            (Import-Csv $csvFile)
            cut -d, -f 3 *.csv > *column3.csv
1

There are 1 best solutions below

17
mklement0 On BEST ANSWER
  • Use the intrinsic psobject property to determine each file's column names from the objects that Import-Csv parses the CSV data into, and extract the third name.

  • Then use member-access enumeration to extract all property values from the rows (objects), relying on PowerShell's ability to use expressions (variable references) as property names.

  • Use Join-Path in conjunction with -f, the format operator to determine a suitable output file path and name, based on the input file name.

# ... parameter declaration and foreach loop omitted.
# The following is what needs to go *inside the foreach loop*.

# Import all rows.
$rows = Import-Csv $csvFile
# Determine the name of the third column
$thirdColName = ($rows[0].psobject.Properties.Name)[2]

# Determine a suitable output file path and name.
# Here, the output file is placed in the same directory as the input file, 
# with '_colummn3' appended to the latter's base name.
$outFile =
  Join-Path $csvFile.DirectoryName ('{0}_column3.csv' -f $csvFile.BaseName)

# Output the header column
$thirdColName > $outFile
# Append the values of the third column
$rows.$thirdColName >> $outFile

Note:

  • In Windows PowerShell, > and >> create "Unicode" (UTF-16LE) files, whereas in PowerShell (Core) 7+ they create BOM-less UTF-8 files.

  • If you need a different encoding, use Set-Content and Add-Content with an -Encoding argument, but note that in Windows PowerShell you cannot create BOM-less UTF-8 files that way.

    • If you do need BOM-less UTF-8 files in Windows PowerShell, use the workaround from this answer (use this instead of the > and >> statements, and use "`r`n" instead of "`n" if you need Windows-format CRLF newlines):

      # Write the entire file as BOM-less UTF-8.
      $null =
        New-Item -Force $outFile -Value (
          $thirdColName + "`n" + ($rows.$thirdColName -join "`n") + "`n"
        )
      

To put it all together in a solution that:

  • parameterizes the inputs further while providing default values
  • creates BOM-less UTF-8 files.

Place the code in a .ps1 file and invoke the latter (e.g. ./Script.ps1), optionally with arguments that override the defaults.

param(
  $FileName = '*.csv', # File-name filter
  $SourceDir = $PWD,   # Input-files dir., default to current
  $OutDir = $PWD,      # Output-files dir., default to current
  $ColumnIndex = 3,    # 1-based index of the col. of interest
  # Suffix to append to the input file name's base name 
  # to form the output file name.
  $OutFileBaseNameSuffix = "_column$ColumnIndex"
)

foreach ($csvFile in Get-ChildItem -LiteralPath $SourceDir -Filter $Filename) {

  # If the output files are written to the same dir. as the input files,
  # exclude any output files from a previous run.
  if ($SourceDir -eq $OutDir -and $csvFile.BaseName -like "*$OutFileBaseNameSuffix") { continue }

  # Import all rows for the file at hand.
  $rows = Import-Csv $csvFile
  # Determine the name of the column with the index of interest.
  # Subtract -1 from $ColumnIndex, because array indices are *0*-based.
  $colName = ($rows[0].psobject.Properties.Name)[$ColumnIndex-1]

  # Determine the output file path and name.
  $outFile =
    Join-Path $OutDir ('{0}{1}.csv' -f $csvFile.BaseName, $OutFileBaseNameSuffix)

  # Write the entire file, as BOM-less UTF-8.
  $null =
    New-Item -Force $outFile -Value (
      $colName + "`n" + ($rows.$colName -join "`n") + "`n"
    )

}