PowerShell script to check and Skip rows that have yellow cells, gray cells or blank cells

90 Views Asked by At

I need to prepare s Powershell script to check my NSG rules design sheet and Skip rows that have yellow cells, gray cells or blank cells. I have prepared below script but it is not working as expected. Can someone please help me to fix this script.

Powershell script:

# Load the EPPlus library
Add-Type -Path "/home/lokesh/EPPlus.dll"  # Replace with the path to your EPPlus.dll file

# Create an Excel package object
$excelPackage = New-Object OfficeOpenXml.ExcelPackage

# Load the Excel file
$filePath = "/home/lokesh/NSG_Rules.xlsx"
$excelPackage = [OfficeOpenXml.ExcelPackage]::Load($filePath)

# Get the first worksheet from the Excel package
$worksheet = $excelPackage.Workbook.Worksheets[0]  # Change the index if needed

# Define a function to check cell color
function IsYellow($cell) {
    return ($cell.Style.Fill.BackgroundColor.Rgb -eq "FFFFFF00")  # Yellow color code
}

# Skip rows with yellow, gray, or blank cells
$rowsToKeep = @()
foreach ($row in $worksheet.Cells.Rows) {
    $skipRow = $false
    foreach ($cell in $row.Cells) {
        if (IsYellow($cell) -or $cell.Style.Fill.BackgroundColor.Rgb -eq "FFA9A9A9" -or -not $cell.Value) {
            $skipRow = $true
            break
        }
    }
    if (-not $skipRow) {
        $rowsToKeep += $row
    }
}
2

There are 2 best solutions below

0
mklement0 On
  • The solution below:

    • addresses your question as asked, based on use of the current version (v7) of the EPPlus .NET library alone.

      • Note that EPPlus versions v5 and above require a paid license for commercial use.

      • Direct use of this library is not very PowerShell-friendly; the ImportExcel PowerShell module - which actually builds on EPPlus - albeit on a locked-in v4.x version due to the v5+ license requirements - is usually the better choice.

        • The lack of PowerShell-friendliness starts with PowerShell's lack of proper support for NuGet packages, requiring extra effort to download and reference the EPPlus NuGet package - see this answer for background information.
    • It:

      • collects the (data) rows of interest as [OfficeOpenXml.ExcelRangeRow] instances in an output array for further processing; the cells of each such row can be enumerated via the .Range property, and each enumerated cell has a .Value property.

      • does not create PowerShell friendly [pscustomobject] instances, with properties named for the header cells, the way that Import-Excel from the ImportExcel does.

      • does not attempt to update the spreadsheet itself, i.e. it doesn't remove the rows you want to skip (omit).

      • The other answer shows an ImportExcel-based solution that performs these tasks.

# Specify the license type (required in v5+ of EPPlus)
$env:EPPlusLicenseContext = 'NonCommercial'

# Load the EPPlus assembly.
Add-Type -LiteralPath "/home/lokesh/EPPlus.dll"  # Replace with the path to your EPPlus.dll file

# The file path of the workbook to open.
$filePath = "/home/lokesh/NSG_Rules.xlsx"

# Create an Excel package object and load the workbook file.
# Convert-Path ensures that a *full path* is passed, which is required in
# .NET calls, given that .NET's working dir. usually differs from PowerShell's.
$excelPackage = 
  [OfficeOpenXml.ExcelPackage]::new((Convert-Path -LiteralPath $filePath))

# Get the first worksheet from the Excel package
$worksheet = $excelPackage.Workbook.Worksheets[0]

# Iterate over all rows that have cells with content and/or formatting
# and collect the data rows of interest.
$headerProcessed = $false
[array] $rowsOfInterest = 
  foreach ($row in $worksheet.Rows) { # process each row
    $rowCellCount = 0; $skipRow = $false
    foreach ($cell in $row.Range) { # process each cell in a row
      ++$rowCellCount
      if (-not $headerProcessed) { continue } # header row: just count the cells.
      # data row: check for a background color of interest. 
      $skipRow = $cell.Style.Fill.BackgroundColor.Rgb -in 'FFFFFF00', 'FFA9A9A9'  
      if ($skipRow) { break }
    }
    if (-not $headerProcessed) { 
      # header row: Save the count of cells in the header row (assumed to be the first one),
      # i.e. the count of columns
      $headerRowCellCount = $rowCellCount
      $headerProcessed = $true
      continue
    } 
    # data row: Also check if not all cells in the rows had values.
    $skipRow = $skipRow -or $rowCellCount -lt $headerRowCellCount
    if (-not $skipRow) {
      $row # output
    }
  }

# $rowsOfInterest is now an array of [OfficeOpenXml.ExcelRangeRow] instances
# containing only those data rows that neither have empty cells nor
# cells with the specified background colors.

# Sample output: Print the address and value of the cells in these rows.
$rowsOfInterest.Range | Select-Object Address, Value
0
mklement0 On

As an alternative to the EPPlus-only solution, this is an ImportExcel-based solution, which improves on the former in the following ways:

  • Since ImportExcel wraps the EPPlus library and is distributed as a PowerShell module, installation is as simple as Install-Module ImportExcel -Scope CurrentUser, for instance.

    • Note: As discussed in the other answer, ImportExcel wraps an old version of the EPPlus library for licensing reasons. This actually makes the code that analyzes the spreadsheet more cumbersome, because the spreadsheet objects don't have a .Rows property for row-by-row processing and the .Cells property (cell-by-cell processing) must be used instead.
  • It modifies the spreadsheet to delete the unwanted rows.

    • Note: To be safe, the modifications aren't saved in the code below, but you can do so by removing the -NoSafe switch from the Close-ExcelPackage call.
  • It then uses Import-Excel to parse the remaining rows - i.e. only those of interest - into PowerShell-friendly [pscustomobject] instances reflecting the spreadsheet rows in an OO fashion, with properties named for the header cells.

$path = '/home/lokesh/NSG_Rules.xlsx'

# Open the Excel package (workbook) and obtain a reference to its
# first worksheet.
$pkg = Open-ExcelPackage $path
$ws = $pkg.Workbook.Worksheets[1]

$rowsToDeleteIndices = [System.Collections.Generic.List[int]] @()

$startRow = 0; $headerColCount = $colCount = 0; $prevRow = 0; $prevSkippedRow = 0
foreach ($cell in $ws.Cells) { # enumerate all cells with values and/or formatting
  $currRow = $cell.Start.Row
  # Header row: Only count the number of cells.
  if ($startRow -eq 0) { $startRow = $prevRow = $currRow }
  if ($currRow -eq $startRow) { ++$headerColCount; continue }
  # Data row:
  if ($currRow -ne $prevRow) { # new data row reached
    # If the previous row wasn't the header row and didn't have all cells filled, skipt it too
    # (unless already skipped)
    if ($prevRow -ne $startRow -and $colCount -ne $headerColCount -and $prevRow -ne $prevSkippedRow) { 
      $rowsToDeleteIndices.Add($prevRow)
      $prevSkippedRow = $prevRow
    }
    $colCount = 1
    $prevRow = $currRow
  } else {
    ++$colCount
  }
  # If the current row has already been skipped, ignore its remaining cells.
  if ($currRow -eq $prevSkippedRow) { continue }
  # Skip any row that has a cell with the specified background colors.
  if ($cell.Style.Fill.BackgroundColor.Rgb -in 'FFFFFF00', 'FFA9A9A9') {
    $rowsToDeleteIndices.Add($currRow)
    $prevSkippedRow = $currRow
  }
}
# If the *last* row was incomplete, skip it too.
if ($currRow -ne $prevSkippedRow -and $colCount -ne $headerColCount) {
  $rowsToDeleteIndices.Add($currRow)
}

# Now delete the unwanted rows from the spreadsheet.
$i = 0
$rowsToDeleteIndices | ForEach-Object { $ws.DeleteRow($_ + $i--) }

# Now use Import-Excel to parse the remaining rows - i.e. only the
# ones of interest - as PowerShell-friendly [pscsutomobject] instances.
# (The first worksheet is targeted by default.)
$rowsOfInterest = 
  Import-Excel -ExcelPackage $pkg

# Close the package.
# Note:
#  To also save the modified workbook, omit -NoSave
Close-ExcelPackage $pkg -NoSave

# Display the results.
$rowsOfInterest | Format-Table