Collections and the PowerShell Pipeline

I regularly need to remove a large number of small files (sometimes more than 100,000) from a server. These files contain monitoring data from remote sensors and are generated on different schedules by different devices. Unfortunately, I can't optimise the input.

[Edit] Updated the code to the version that originally sparked the question. I had posted a later version that had similar problems.

I can do something like this:

$filePath = '\my\path'
$CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago. 
Get-ChildItem -File -Path $filePath -Recurse | Where-Object {$_.lastWriteTime -le $CutoffDate} | remove-item 

This works well for small numbers of files, but with the volumes I have to deal with it can use a huge amount of memory and take a long time.

It appears that the Get-ChildItem cmdlet is building the complete collection before submitting it to the pipeline.

I can't filter on date with Get-ChildItem, so every file in the target folders is read, and there can be millions.


Is my assumption about the initial collection correct?

Is there some way to modify the pipeline operation so that each element is submitted to the pipeline as it is found?

Alternatively, is there some way to move the date filtering to Get-ChildItem so that the initial search is reduced in size?

Answer by Santiago Squarzon (accepted)

Get-ChildItem is pretty slow; it's a known issue. If you want faster code you need to use the .NET APIs directly. The code below should be considerably faster than your current version and should consume less memory. Note that this implementation does not exclude hidden files and folders. If you need to exclude them, an extra condition has to be added (essentially, check .Attributes.HasFlag([System.IO.FileAttributes]::Hidden) and, if it is set, continue to the next item); let me know in that case and I'll update my answer.

$filePath = Get-Item '\my\path'
$CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago.

$enum = $filePath.EnumerateFiles('*', [System.IO.SearchOption]::AllDirectories).GetEnumerator()

while ($true) {
    try {
        if (-not $enum.MoveNext()) {
            break
        }
    }
    catch {
        # ignore inaccessible folders, go next
        continue
    }

    if ($enum.Current.LastWriteTime -le $CutoffDate) {
        try {
            $enum.Current.Delete()
        }
        catch {
            # Handle files that couldn't be deleted here (possibly a
            # permission issue); otherwise leave this empty to ignore
            # any error
        }
    }
}
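The hidden-file check mentioned at the start of this answer could look something like the sketch below. It wraps the same enumerator loop in a function (the name Remove-StaleVisibleFile and its parameters are hypothetical, not part of any module), skips any file whose own Hidden attribute is set, and returns how many files it deleted. It only inspects each file's attribute, so a visible file inside a hidden folder would still be deleted.

```powershell
# Hypothetical helper: deletes files last written on or before $CutoffDate
# under $Path, skipping any file whose own Hidden attribute is set.
function Remove-StaleVisibleFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [datetime] $CutoffDate
    )

    $root = Get-Item -LiteralPath $Path
    $enum = $root.EnumerateFiles('*', [System.IO.SearchOption]::AllDirectories).GetEnumerator()
    $deleted = 0

    while ($true) {
        try {
            if (-not $enum.MoveNext()) { break }
        }
        catch {
            continue   # ignore inaccessible folders, go next
        }

        $file = $enum.Current
        # The extra condition: skip hidden files (parent folders are not checked)
        if ($file.Attributes.HasFlag([System.IO.FileAttributes]::Hidden)) { continue }

        if ($file.LastWriteTime -le $CutoffDate) {
            try {
                $file.Delete()
                $deleted++
            }
            catch {
                # Could not delete (e.g. permissions); skip this file
            }
        }
    }

    $deleted   # number of files actually removed
}
```

For example, Remove-StaleVisibleFile -Path '\my\path' -CutoffDate (Get-Date).AddDays(-30) would return the number of visible stale files it removed.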

EDIT: Just noticed the question's tag. On PowerShell 7+ there is a much better and easier approach using the EnumerationOptions class. Note that this class isn't available in .NET Framework, so it won't work in Windows PowerShell 5.1.

$filePath = Get-Item '\my\path'
$CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago.

$options = [System.IO.EnumerationOptions]@{
    IgnoreInaccessible    = $true
    RecurseSubdirectories = $true
    AttributesToSkip      = [System.IO.FileAttributes]::Hidden # Skips hidden files. Removing this line does NOT include them: the default is Hidden, System
}
foreach ($file in $filePath.EnumerateFiles('*', $options)) {
    if ($file.LastWriteTime -le $CutoffDate) {
        $file.Delete()
    }
}
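A related sketch for PowerShell 7+, in case hidden files should be deleted too: because a fresh EnumerationOptions already skips Hidden and System entries, AttributesToSkip has to be set to 0 ("skip nothing") rather than simply left out. The function name Get-StaleFile is hypothetical; it returns the matching FileInfo objects instead of deleting them, which makes a dry run easy. Building the 0 value with [System.Enum]::ToObject is a cautious choice to avoid depending on how PowerShell converts a bare integer to a [Flags] enum.

```powershell
# Requires PowerShell 7+ (System.IO.EnumerationOptions does not exist in .NET Framework).
# Hypothetical helper: returns (does not delete) files last written on or before
# $CutoffDate, including hidden and system files.
function Get-StaleFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [datetime] $CutoffDate
    )

    $options = [System.IO.EnumerationOptions]@{
        IgnoreInaccessible    = $true
        RecurseSubdirectories = $true
        # 0 = skip nothing, so hidden and system files are enumerated as well
        AttributesToSkip      = [System.Enum]::ToObject([System.IO.FileAttributes], 0)
    }

    $root = Get-Item -LiteralPath $Path
    foreach ($file in $root.EnumerateFiles('*', $options)) {
        if ($file.LastWriteTime -le $CutoffDate) {
            $file   # emit the stale file; the caller decides what to do with it
        }
    }
}
```

Piping the result to ForEach-Object { $_.Delete() } would then perform the actual deletion.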

Answer by TheMadTechnician

If you are confident that you won't get any errors while enumerating files and folders, you could probably speed this up greatly by building a custom type that crawls the directory tree and returns the full path of every file, then piping those paths on and building an object with just the Path and LastWriteTime for each file. In my testing it ran through about 75k files in 8 or 9 seconds, roughly half the time Get-ChildItem took to do the same.

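Add-Type -TypeDefinition @"
  public class QuickDir
  {
    // Returns the full path of every file under dir, recursing into all subdirectories.
    // The whole list is built in memory before it is returned.
    public static System.Collections.ArrayList ListDir(string dir)
    {
      System.Collections.ArrayList ret = new System.Collections.ArrayList();
      ListDir(dir, ret);
      return ret;
    }

    // Adds the files in dir to list, then recurses into each subdirectory.
    // Any access error (e.g. an unreadable folder) throws and stops the whole scan.
    public static void ListDir(string dir, System.Collections.ArrayList list)
    {
      foreach (string file in System.IO.Directory.GetFiles(dir))
        list.Add(file);
      string[] subdirs = System.IO.Directory.GetDirectories(dir);
      // Uncomment to include the directories themselves in the list:
      // foreach (string d in subdirs)
      //   list.Add(d);
      foreach (string d in subdirs)
        ListDir(d, list);
    }
  }
"@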

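# .NET resolves relative paths against the process working directory, not PowerShell's
# current location, so Convert-Path hands ListDir a full path
$AllFiles = [QuickDir]::ListDir((Convert-Path $filePath)) |
    Select-Object @{l='Path';e={$_}}, @{l='LastWriteTime';e={[System.IO.File]::GetLastWriteTime($_)}}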

You should be able to modify that to filter on date easily enough and pipe the result to Remove-Item.
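One way to do that filtering, sketched under the assumption that the input is the plain list of full paths ListDir returns: a hypothetical Remove-StalePath function that reads each file's timestamp with [System.IO.File]::GetLastWriteTime and deletes stale files with Remove-Item -LiteralPath, so names containing [ or ] aren't treated as wildcards. Declaring SupportsShouldProcess lets -WhatIf flow through to Remove-Item for a dry run. Note that ListDir still builds the complete list of paths in memory before anything reaches the pipeline; this only avoids creating a per-file object.

```powershell
# Hypothetical helper: accepts full file paths from the pipeline and deletes
# those last written on or before $CutoffDate.
function Remove-StalePath {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [string] $FullName,
        [Parameter(Mandatory)] [datetime] $CutoffDate
    )

    process {
        # Reads only the timestamp; no FileInfo or custom object is built
        if ([System.IO.File]::GetLastWriteTime($FullName) -le $CutoffDate) {
            # -LiteralPath: no wildcard expansion; -WhatIf is inherited from this function
            Remove-Item -LiteralPath $FullName
        }
    }
}
```

With the QuickDir type above loaded, a dry run against the question's path would be: [QuickDir]::ListDir((Convert-Path $filePath)) | Remove-StalePath -CutoffDate (Get-Date).AddDays(-30) -WhatIf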

Answer by Tangentially Perpendicular

After some investigation, it transpires that Get-ChildItem itself wasn't the root of the problem.

My original monolithic approach created a huge collection of file objects, which forced the server to start using the page file instead of keeping everything in memory. This slowed the whole process dramatically.

The solution I finally adopted was to use Get-ChildItem to recurse down the folder tree looking only for directories. This returned a collection of around 4000 objects.

I then iterated through this collection, using Get-ChildItem to retrieve the files in each folder. There's no need to recurse at this stage, since I already have all the directories.

Splitting the search this way reduces each iteration to a few hundred files (around 2000 at most), which can be filtered and deleted before the collection is discarded. This eliminated the need for the page file.

This change reduced execution time from about 11 hours to around 40 minutes, which is fast enough for the purpose.

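Here's the code:

$delPath = '\my\path'
$CutoffDate = (Get-Date).AddDays(-30) # Calculate date thirty days ago.
$wotif = $false # Passed to -WhatIf; set to $true for a dry run
$fileCount = 0

# One recursive pass that returns only directories. Get-Item adds the root folder itself,
# because -Directory -Recurse returns only its subfolders and the root's own files would be missed.
$folderList = @(Get-Item -LiteralPath $delPath) + @(Get-ChildItem -Directory -Path $delPath -Recurse)

$folderList | ForEach-Object -Process {
    # Read only the files in the current folder (no -Recurse needed) and keep the stale ones.
    # @() guarantees an array, so .Count is a file count even when a single file matches
    # (on a lone FileInfo, .Length would be its size in bytes).
    $staleFiles = @(Get-ChildItem -File -LiteralPath $_.FullName | Where-Object { $_.LastWriteTime -le $CutoffDate })
    if ($staleFiles.Count -gt 0) {
        $fileCount += $staleFiles.Count
        $staleFiles | Remove-Item -WhatIf:$wotif
    }
}
"Found $fileCount stale files"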
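A sketch of the same per-folder idea wrapped in a function, so it can be tried against a throwaway folder first. The name Remove-StaleFileByFolder is hypothetical. Two differences from the code above are deliberate design choices rather than part of the original: the directories are streamed straight from Get-ChildItem into ForEach-Object instead of being stored in $folderList first (with around 4000 folders the stored list is small either way, so this is mostly a matter of taste), and -WhatIf comes from SupportsShouldProcess instead of a $wotif variable.

```powershell
# Hypothetical helper: per-folder cleanup of files last written on or before
# $CutoffDate. Returns the number of stale files found.
function Remove-StaleFileByFolder {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [datetime] $CutoffDate
    )

    $fileCount = 0

    # Root folder first, then every subfolder; folders are processed as they are found
    & { Get-Item -LiteralPath $Path; Get-ChildItem -Directory -Recurse -LiteralPath $Path } |
        ForEach-Object {
            # Only the current folder's files are held in memory at any one time
            $staleFiles = @(Get-ChildItem -File -LiteralPath $_.FullName |
                Where-Object { $_.LastWriteTime -le $CutoffDate })
            if ($staleFiles.Count -gt 0) {
                $fileCount += $staleFiles.Count
                $staleFiles | Remove-Item   # honours -WhatIf passed to this function
            }
        }

    $fileCount
}
```

Running Remove-StaleFileByFolder -Path '\my\path' -CutoffDate (Get-Date).AddDays(-30) -WhatIf first lists what would be removed without deleting anything.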