I'm curious to test out the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files and so came up with the following test:
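New-Item "000" -ItemType Directory -Force # Move the old zip files in here
foreach ($i in $zipfiles) {
    # -split takes a regex and returns an array, so strip the extension instead
    $name = $i -replace '\.zip$'
    Start-Job -ScriptBlock {
        # Caller variables are not visible inside a job unless passed with $using:
        $zip  = $using:i
        $name = $using:name
        7z.exe x "-o$name" $zip
        Move-Item $zip 000\ -Force
        7z.exe a $zip ".\$name\*.*"
    }
}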
The problem with this is that it would start jobs for all 100 zips at once, which would probably be too much. I want to set a value $numjobs, say 5, which I can change, so that only $numjobs jobs are started at the same time; the script would then wait for all 5 of those jobs to end before starting the next block of 5. I'd then like to watch the CPU and memory usage depending upon the value of $numjobs.
How would I tell a loop only to run 5 times, then wait for the Jobs to finish before continuing?
I see that it's easy to wait for jobs to finish
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob
but how might I wait for Start-Process tasks to end?
Although I would like to use ForEach-Object -Parallel, the enterprises that I work in will be solidly tied to PowerShell 5.1 for the next 3-4 years, I expect, with no chance to install PowerShell 7.x (although I would be curious to test with ForEach-Object -Parallel on my home system to compare all approaches).
ForEach-Object -Parallel and Start-ThreadJob have built-in functionality to limit the number of threads that can run at the same time. The same applies to runspaces with their RunspacePool, which is what both cmdlets use behind the scenes.
Start-Job does not offer such functionality because each job runs in a separate process, as opposed to the cmdlets mentioned before, which run in different threads all in the same process. I would also personally not consider it as a parallelism alternative: it is pretty slow, and in most cases a linear loop will be faster than it. Serialization and deserialization can be a problem in some cases too.
How to limit the number of running threads?
Both cmdlets offer the -ThrottleLimit parameter for this.
How would the code look?
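A minimal sketch of both cmdlets with -ThrottleLimit. It assumes PowerShell 7+ for ForEach-Object -Parallel and the ThreadJob module for Start-ThreadJob; $numjobs, $items and the Start-Sleep workload are stand-ins for the real throttle value, zip files and 7z.exe calls.

```powershell
$numjobs = 5       # hypothetical throttle value, change to compare CPU / memory
$items   = 1..20   # stand-in for the zip files

# PowerShell 7+: at most $numjobs script blocks run at the same time
$fromParallel = $items | ForEach-Object -Parallel {
    Start-Sleep -Milliseconds 100   # placeholder for the real work (e.g. 7z.exe)
    $_ * 2
} -ThrottleLimit $numjobs

# ThreadJob module: jobs beyond the limit are queued until a running one completes
$jobs = foreach ($i in $items) {
    Start-ThreadJob -ThrottleLimit $numjobs -ArgumentList $i -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds 100   # placeholder for the real work
        $n * 2
    }
}
$fromThreadJob = $jobs | Receive-Job -Wait -AutoRemoveJob
```

Neither variant guarantees that output arrives in input order, so sort the results afterwards if order matters.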
How to achieve the same having only PowerShell 5.1 available and no ability to install new modules?
The RunspacePool offers this same functionality, either with its .SetMaxRunspaces(Int32) method or by targeting one of the RunspaceFactory.CreateRunspacePool overloads that take a maxRunspaces limit as an argument.
How would the code look?
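A minimal sketch using the CreateRunspacePool(minRunspaces, maxRunspaces) overload, which should work on PowerShell 5.1 without extra modules. As before, $numjobs, $items and the Start-Sleep workload are assumptions standing in for the real values and the 7z.exe calls.

```powershell
$numjobs = 5       # hypothetical throttle value
$items   = 1..20   # stand-in for the zip files

# The pool never runs more than $numjobs runspaces at once
$pool = [runspacefactory]::CreateRunspacePool(1, $numjobs)
$pool.Open()

# Queue one PowerShell instance per item; BeginInvoke returns immediately
$tasks = foreach ($i in $items) {
    $ps = [powershell]::Create().AddScript({
        param($n)
        Start-Sleep -Milliseconds 100   # placeholder for the real work
        $n * 2
    }).AddArgument($i)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Instance = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that instance has finished, then returns its output
$results = foreach ($t in $tasks) {
    $t.Instance.EndInvoke($t.Handle)
    $t.Instance.Dispose()
}
$pool.Dispose()
```

Because the results are collected in the order the tasks were queued, $results keeps the input order even though the work itself overlaps.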
Note that, for all examples, it's unclear whether the 7zip code is correct. This answer attempts to demonstrate how async is done in PowerShell, not how to zip files / folders.
Below is a helper function that can simplify the process of parallel invocations. It tries to emulate ForEach-Object -Parallel and is compatible with PowerShell 5.1, though it shouldn't be taken as a robust solution.
NOTE: This Q&A offers a much better and more robust alternative to the function below.
An example of how it works:
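A minimal sketch of a helper in that spirit, followed by a usage example. This is not the function from the original answer: the name Invoke-ParallelSketch and its parameters are hypothetical, and unlike ForEach-Object -Parallel it passes each item as the first argument of the script block rather than as $_.

```powershell
function Invoke-ParallelSketch {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object] $InputObject,

        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $ScriptBlock,

        [int] $ThrottleLimit = 5
    )
    begin {
        # One pool for the whole pipeline, capped at $ThrottleLimit runspaces
        $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit)
        $pool.Open()
        $tasks = [System.Collections.Generic.List[object]]::new()
    }
    process {
        # Queue the script block for this item; it receives the item as its first argument
        $ps = [powershell]::Create().AddScript($ScriptBlock.ToString()).AddArgument($InputObject)
        $ps.RunspacePool = $pool
        $tasks.Add([pscustomobject]@{ Instance = $ps; Handle = $ps.BeginInvoke() })
    }
    end {
        try {
            # Collect output in input order, waiting for each task in turn
            foreach ($t in $tasks) {
                $t.Instance.EndInvoke($t.Handle)
                $t.Instance.Dispose()
            }
        }
        finally {
            $pool.Dispose()
        }
    }
}

# Usage: 10 items, at most 5 running at the same time
$results = 1..10 | Invoke-ParallelSketch -ThrottleLimit 5 -ScriptBlock {
    param($n)
    Start-Sleep -Milliseconds 200   # placeholder for the real work (e.g. 7z.exe)
    "Item $n done"
}
```

The script block runs in a separate runspace, so it cannot see the caller's variables or functions; anything it needs has to be passed in with the item itself.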