Working with Functions in Windows PowerShell

  • 11/11/2015

Understanding filters

A filter is a special-purpose function. It is used to operate on each object in a pipeline and is often used to reduce the number of objects that are passed along the pipeline. Typically, a filter does not use the Begin or End blocks that a function might need to use, so a filter is often thought of as a function that has only a Process block. Many functions are written without Begin or End blocks, but filters are never written in such a way that they use them. The biggest difference between a function and a filter is a bit subtler, however. When a function is used inside a pipeline, it halts the processing of the pipeline until the preceding element in the pipeline has run to completion. The function then accepts the input from that element and begins its own processing. When the processing in the function is completed, it passes the results along to the next element in the pipeline. A function runs once for all of the pipelined data. A filter, on the other hand, runs once for each piece of data passed along the pipeline. In short, a filter will stream the data when in a pipeline, and a function will not. This can make a big difference in performance. To illustrate this point, let’s examine a function and a filter that accomplish the same thing.

In the MeasureAddOneFilter.ps1 script, which follows, an array of 50,000 elements is created by using the 1..50000 syntax. (In Windows PowerShell 1.0, 50,000 was the maximum size of an array created in this manner. In Windows PowerShell 5.0, the ceiling is the maximum value of an [Int32], which is 2,147,483,647; whether you can actually create an array that large depends on available memory.) This is shown here.

PS C:\> 1..[Int32]::MaxValue
Array dimensions exceeded supported range.
At line:1 char:1
+ 1..[Int32]::MaxValue
+ ~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException
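
For reference, the upper bound itself can be displayed without attempting to build an array of that size; the value is fixed by the .NET [Int32] type, so the result is the same on any computer.

PS C:\> [Int32]::MaxValue
2147483647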

The array is then pipelined into the AddOne filter. The filter prints out the string add one filter and then adds the number 1 to the current number on the pipeline. The length of time it takes to run the command is then displayed. On my computer, it takes about 2.6 seconds to run the MeasureAddOneFilter.ps1 script.

MeasureAddOneFilter.ps1

Filter AddOne
{
 "add one filter"
  $_ + 1
}

Measure-Command { 1..50000 | addOne }

The function version is shown following. In a similar fashion to the MeasureAddOneFilter.ps1 script, it creates an array of 50,000 numbers and pipelines the results to the AddOne function. The string Add One Function is displayed. When input is pipelined to a function, an automatic variable named $input is created. The $input variable is an enumerator, not just a plain array. It has a moveNext method, which can be used to advance to the next item in the collection. Because $input is not a plain array, you cannot index directly into it; $input[0] would fail. To retrieve the item at the enumerator’s current position, you use the $input.current property. When I run the following script, it takes 4.3 seconds on my computer (noticeably longer than the 2.6 seconds the filter took).

MeasureAddOneFunction.ps1

Function AddOne
{
  "Add One Function"
  While ($input.moveNext())
   {
     $input.current + 1
   }
}

Measure-Command { 1..50000 | addOne }
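
Because $input is consumed as an enumerator, one way to make its contents indexable is to materialize it into a plain array. The following sketch (the ShowInput function name is illustrative and not part of the original scripts) wraps $input in @(), which copies the pipelined items into an array that can be indexed:

Function ShowInput
{
  # $input[0]          # indexing the enumerator directly would fail
  $items = @($input)   # copy the pipelined input into a plain array
  $items[0]            # indexing now works; this returns the first pipelined item
}

1..5 | ShowInput       # outputs 1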

What made the filter so much faster than the function in this example? The filter runs once for each item on the pipeline. This is shown here.

add one filter
2
add one filter
3
add one filter
4
add one filter
5
add one filter
6

The DemoAddOneFilter.ps1 script is shown here.

DemoAddOneFilter.ps1

Filter AddOne
{
 "add one filter"
  $_ + 1
}

1..5 | addOne

The AddOne function runs to completion once for all the items in the pipeline. This effectively stops the processing in the middle of the pipeline until all the elements of the array are created. Then all the data is passed to the function via the $input variable at one time. This type of approach does not take advantage of the streaming nature of the pipeline, which in many instances is more memory-efficient.

Add One Function
2
3
4
5
6

The DemoAddOneFunction.ps1 script is shown here.

DemoAddOneFunction.ps1

Function AddOne
{
  "Add One Function"
  While ($input.moveNext())
   {
     $input.current + 1
   }
}

1..5 | addOne

To close this performance gap between functions and filters when they are used in a pipeline, you can write your function so that it behaves like a filter. To do this, you must explicitly declare a Process block. When you use the Process block, you can also use the $_ automatic variable instead of being restricted to $input. When you do this, the script looks like DemoAddOneR2Function.ps1, the results of which are shown here.

add one function r2
2
add one function r2
3
add one function r2
4
add one function r2
5
add one function r2
6

The complete DemoAddOneR2Function.ps1 script is shown here.

DemoAddOneR2Function.ps1

Function AddOneR2
{
   Process {
   "add one function r2"
   $_ + 1
  }
} #end AddOneR2

1..5 | addOneR2

What does using an explicit Process block do to the performance? When run on my computer, the function takes about 2.6 seconds, which is virtually the same amount of time taken by the filter. The MeasureAddOneR2Function.ps1 script is shown here.

MeasureAddOneR2Function.ps1

Function AddOneR2
{
   Process {
   "add one function r2"
   $_ + 1
  }
} #end AddOneR2

Measure-Command {1..50000 | addOneR2 }
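
If both the AddOne filter and the AddOneR2 function are loaded into the same session, a rough side-by-side comparison might look like the following sketch. The variable names are illustrative, and the exact timings will vary from computer to computer.

$filterTime   = Measure-Command { 1..50000 | AddOne }    # streaming filter
$functionTime = Measure-Command { 1..50000 | AddOneR2 }  # function with an explicit Process block
"Filter:   {0:N2} seconds" -f $filterTime.TotalSeconds
"Function: {0:N2} seconds" -f $functionTime.TotalSeconds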

Another reason for using filters is that they visually stand out and therefore improve the readability of the script. The typical pattern for a filter is shown here.

Filter FilterName
{
 #insert code here
}

The HasMessage filter, found in the FilterHasMessage.ps1 script, begins with the Filter keyword, followed by the name of the filter, HasMessage. Inside the script block (the braces), the $_ automatic variable provides access to the current object on the pipeline. It is sent to the Where-Object cmdlet, which performs the filtering. In the calling script, the results of the HasMessage filter are sent to the Measure-Object cmdlet, which reports how many events in the Application log have a message attached to them. The FilterHasMessage.ps1 script is shown here.

FilterHasMessage.ps1

Filter HasMessage
{
 $_ |
 Where-Object { $_.message }
} #end HasMessage

Get-WinEvent -LogName Application | HasMessage | Measure-Object
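
Because a filter body runs once for each pipelined object, the same result can also be obtained without the nested Where-Object pipeline by emitting the object only when it has a message. The following is a sketch of such an alternative; the HasMessage2 name is illustrative and not part of the original script.

Filter HasMessage2
{
 # Emit the current event only when its Message property is not empty.
 if ($_.Message) { $_ }
} #end HasMessage2

Get-WinEvent -LogName Application | HasMessage2 | Measure-Object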

Although the filter has an implicit Process block, this does not prevent you from using the Begin, Process, and End script blocks explicitly. In the FilterToday.ps1 script, a filter named IsToday is created. To make the filter a stand-alone entity with no external dependencies (such as requiring a DateTime object to be passed to it), you need the filter to obtain the current date itself. However, if the call to the Get-Date cmdlet were made inside the Process block, the filter would continue to work, but Get-Date would be called once for each object piped in from the folder. So, if there were 25 items in the folder, the Get-Date cmdlet would be called 25 times. When you have something that you want to occur only once in the processing of the filter, you can place it in a Begin block. The Begin block is called only once, whereas the Process block is called once for each item in the pipeline. If you wanted any post-processing to take place (such as printing a message stating how many files were found today), you would place the relevant code in the End block of the filter; a sketch of that appears after the FilterToday.ps1 script.

The FilterToday.ps1 script is shown here.

FilterToday.ps1

Filter IsToday
{
 Begin {$dte = (Get-Date).Date}
 Process { $_ |
           Where-Object { $_.LastWriteTime -ge $dte }
         }
}

Get-ChildItem -Path C:\fso | IsToday
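
As the preceding paragraph suggests, an End block can be used for post-processing, such as reporting how many files were written today. The following sketch shows one way that might look; the IsTodayWithCount name, the $count variable, and the message text are illustrative and not part of the original script.

Filter IsTodayWithCount
{
 Begin   { $dte = (Get-Date).Date ; $count = 0 }
 Process { if ($_.LastWriteTime -ge $dte) { $count++ ; $_ } }
 End     { "$count file(s) were last written today" }
}

Get-ChildItem -Path C:\fso | IsTodayWithCount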