Presenting the PowerShell Pipeline
Little makes sense in PowerShell without an understanding of the pipeline.
July 23, 2015
The pipeline is a core PowerShell concept, and very little else in the shell makes sense until you understand it. Unix shells introduced the pipeline concept, Cmd.exe adopted it, and PowerShell takes it to the next level. Before we get into the PowerShell pipeline, I need to provide some background on standard input and output.
Standard Input and Output
In Cmd.exe, the pipeline is closely related to standard input, input redirection, standard output, and output redirection. Briefly: Standard input is input you enter at the keyboard unless you tell the shell to read the input from somewhere else (i.e., input redirection). Standard output is the normal output that commands display on the screen unless you tell the shell to store the output somewhere else (i.e., output redirection).
To see how standard input works, enter the following command at a PowerShell or Cmd.exe prompt:
sort.exe
When you do this, the cursor will sit and wait for you to enter something. This is because, by default, Sort.exe sorts standard input. We didn’t provide any input, so it will wait for us to enter something. (Press Ctrl+C to cancel.)
Now, suppose you have a file called MyData.txt that you want to sort. Here is how you would display the sorted output of the file on the screen (standard output):
type MyData.txt | sort.exe
In this example, the Type command sends the content of the file MyData.txt to standard output. The pipe (|) takes this output and uses it as input to the Sort.exe program.
The core concept is this: When you use the pipe (|) character in a command, you are creating a pipeline. In a pipeline, the shell uses the output of the command on the pipe’s left-hand side as the input of the command on the right-hand side.
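As a quick illustration of this idea, here is a minimal sketch you can try at a PowerShell prompt on Windows. The three fruit names are made-up sample data standing in for the content of a file; PowerShell hands each string to Sort.exe as one line of standard input.

```powershell
# Sample data (hypothetical): three lines of text in unsorted order.
$lines = 'pear', 'apple', 'banana'

# The command on the left of the pipe produces the text;
# sort.exe on the right reads it from standard input and sorts it.
$lines | sort.exe
```

Sort.exe should print apple, banana, and pear, one per line, just as it would if you had typed those lines at the keyboard after running Sort.exe on its own.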
In most command shells (such as Cmd.exe), standard output and standard input are text only. This can make many kinds of data manipulation and extraction an awkward exercise. Figure 1 shows a simple example of the contortions that Cmd.exe makes us go through to list text files last written in the current year.
Figure 1 - Cmd.exe shell scripts to list text files last written in the current year
The Sample1.cmd script outputs the last-write date and time for each file, followed by a hard tab character, followed by the file name. Sample2.cmd gets the current year and runs Sample1.cmd, outputting only the files whose year matches the current year. (A red arrow marks the hard tab character in both scripts.) Finally, Figure 1 shows the output of Sample2.cmd (only File1.txt and File3.txt).
Notice that both scripts are forced to use string parsing that depends on the format of the date string (%%~tF in Sample1.cmd and %DATE% in Sample2.cmd). On non-US English versions of Windows, the lines of code that use these date strings will have to be updated, because different locales use different date formats. In addition, the arcane nature of the Cmd.exe script syntax makes readability and maintainability of these scripts a challenge (what does %DATE:~10,4% mean, anyway?).
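To make the question concrete: the expression %DATE:~10,4% means "skip the first 10 characters of %DATE%, then take the next 4." The sketch below mimics that slicing with the .NET Substring method so you can see why it is fragile. Both date strings are assumed sample values (a typical US English format and a typical German format); the actual %DATE% text on a given machine depends on its regional settings.

```powershell
# Assumed sample values; real %DATE% output varies with regional settings.
$usDate = 'Thu 07/23/2015'   # typical US English style
$deDate = '23.07.2015'       # typical German style

# Equivalent of %DATE:~10,4% on the US string: skip 10 characters, take 4.
$usDate.Substring(10, 4)     # 2015

# The German string is only 10 characters long, so there is nothing
# left at offset 10 and the same offsets cannot find the year.
$deDate.Length               # 10
```

The offsets 10 and 4 only work because the US format happens to place the four-digit year at that exact position, which is why the scripts need editing for other locales.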
The point of this example is to show that a seemingly simple task (list files last written in the current year) is clumsy and awkward to do in a batch file, and a big part of the problem is that we’re forced to parse strings to determine the year. In addition, the year parsing depends on the locale, which may be a significant problem for environments that share scripts. Also notice that if the requirements change (for example, delete files last written before this year), the scripts are going to get even more complicated and even harder to read. There has to be a better way! Let’s see how PowerShell solves these kinds of problems.