Importing and Exporting CSV and XML Files in PowerShell
Manually parsing CSV and XML files is a thing of the past
August 19, 2013
The classic Cmd.exe shell in Windows OSs provides a very simple means of processing text. For example, the For /f command lets you read lines of text from a file and process them one line at a time, and the > operator lets you write a command's output to a text file.
However, simple line-by-line text parsing starts to break down when you need to process structured data. For example, comma-separated value (CSV) files are an extremely common data exchange format. I've lost count of how many times I've seen the following question in online forums: How can I read input from a CSV file using a batch file (i.e., a Cmd.exe shell script)? It's possible, but you're forced to do all the parsing manually and it's fraught with problems. For example, if an input string contains special characters such as < or >, the parsing will fail. Parsing XML files using Cmd.exe is even more difficult, if not downright impossible.
Using PowerShell for Importing and Exporting Structured Data
Windows PowerShell overcomes the aforementioned difficulties by providing a set of cmdlets for importing and exporting structured data. All these cmdlets have either CSV or XML in their names, so you can get a list of them by entering the following command at a PowerShell prompt:
Get-Command | Where-Object { ($_.Name -like "*csv*") -or ($_.Name -like "*xml*") } | Select-Object Name
When you run this command, you'll see a list of all the commands in your PowerShell session (cmdlets, functions, and aliases) whose names contain CSV or XML. I'm going to focus on the cmdlets containing the verbs Export and Import so that you can get a grip on the basics.
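If you only want the import and export cmdlets, a narrower query is possible. The following is a minimal sketch that relies on Get-Command's -Verb and -Noun parameters; the $cmdlets variable name is just for illustration.

```powershell
# List only the Import/Export cmdlets whose noun is Csv or Clixml.
# -Verb and -Noun both accept arrays of values.
$cmdlets = Get-Command -Verb Import, Export -Noun Csv, Clixml

# Show just the names, sorted for readability.
$cmdlets | Sort-Object Name | Select-Object -ExpandProperty Name
```

Filtering by verb and noun avoids the wildcard match on names, so aliases and unrelated commands are less likely to show up in the list.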
Importing CSV Files
As I noted previously, CSV is an extremely common format for exchanging data. A CSV file is a plain-text file that represents a table of data. Each line of the file represents one record (row) of data. The first line of the file usually (but not always) specifies the names of the fields (columns). Data items within each row are separated by a delimiter character, most commonly a comma. Because commas can also appear inside the data itself (especially in textual data), the data items in a CSV file are usually enclosed within double quotes (") or some other quoting character. Table 1 shows a sample data table.
Table 1: Sample Data Table
DisplayName |
---|
Garvin, Fred |
Flynn, Phineas |
Bates, Gil |
Figure 1 shows how this data would be represented in a CSV file.
Figure 1: Sample.csv
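A file like Sample.csv can be created directly from PowerShell. The following is a minimal sketch: the names come from Table 1, but the example.com addresses are placeholders assumed for illustration, not values from the original file.

```powershell
# Hypothetical sample data: names from Table 1, placeholder addresses.
$csvText = @'
"DisplayName","Mail"
"Garvin, Fred","fred.garvin@example.com"
"Flynn, Phineas","phineas.flynn@example.com"
"Bates, Gil","gil.bates@example.com"
'@

# Write the text to Sample.csv in the current directory.
Set-Content -Path Sample.csv -Value $csvText

# Confirm that the file parses into three rows.
(Import-Csv Sample.csv | Measure-Object).Count   # 3
```

Note that every field is quoted, which is what keeps the comma inside each name from being treated as a delimiter.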
The Import-Csv cmdlet reads a CSV file and outputs a list of custom PowerShell objects, one for each row of the input data. PowerShell uses the first row of the CSV file as the objects' property names, and each subsequent line of the file becomes one output object. For example, if you run the command
Import-Csv Sample.csv
PowerShell will output three objects with two properties each: DisplayName and Mail, as shown in Figure 2.
Figure 2: Output from Using Import-Csv to Read a CSV File
If the CSV file you want to import doesn't have a header row, you can use the -Header parameter to name the object properties. That is, if Sample.csv were missing the first line (the header row), you'd use a command like this instead:
Import-Csv Sample.csv -Header DisplayName,EmailAddress
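To see the effect of -Header without creating a file, here is a small sketch that uses ConvertFrom-Csv, the sibling cmdlet that parses CSV text from the pipeline and accepts the same -Header parameter. The addresses are placeholders assumed for illustration.

```powershell
# Headerless CSV text; the example.com addresses are placeholders.
$rows = @'
"Garvin, Fred","fred.garvin@example.com"
"Flynn, Phineas","phineas.flynn@example.com"
'@ | ConvertFrom-Csv -Header DisplayName, EmailAddress

# Each object now exposes the property names supplied through -Header.
$rows | ForEach-Object { $_.EmailAddress }
```

One caveat: if the input does have a header row and you also pass -Header, the original header line is treated as an ordinary data row.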
Import-Csv uses the comma character as its default delimiter, but you can use the -Delimiter parameter to specify a different delimiter character. For example, if Sample.csv used a tab character as the delimiter, you'd use this command:
Import-Csv Sample.csv -Delimiter "`t"
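The same idea can be tried on in-memory text. This sketch assumes a two-line, tab-separated input and uses ConvertFrom-Csv, which takes the same -Delimiter parameter as Import-Csv; the address is a placeholder.

```powershell
# Build two tab-separated lines; "`t" is PowerShell's escape for a tab.
$lines = @(
    "DisplayName`tMail",
    "Garvin, Fred`tfred.garvin@example.com"   # placeholder address
)

# Parse with the tab delimiter; the comma in the name is left alone.
$row = $lines | ConvertFrom-Csv -Delimiter "`t"
$row.DisplayName   # Garvin, Fred
```

Because the delimiter is a tab, the comma inside the name no longer needs quoting to survive parsing.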
Because Import-Csv outputs PowerShell objects, you can then use other PowerShell cmdlets to process the objects. For example, suppose you want to sort the output by DisplayName, but you only want to output the Mail property from each object. To do this, you can use the Sort-Object and Select-Object cmdlets:
Import-Csv Sample.csv | Sort-Object DisplayName | Select-Object Mail
You can also pass these objects along to the ForEach-Object cmdlet for processing:
Import-Csv Sample.csv | ForEach-Object { '"{0}" <{1}>' -f $_.DisplayName, $_.Mail }
This command uses the -f operator to output a formatted string for each object and produces the output shown in Figure 3. If you're unfamiliar with how to use the ForEach-Object, Sort-Object, and Select-Object cmdlets, see "PowerShell Basics: Filtering Objects" and "PowerShell Basics: Select-Object."
Figure 3: Output from Using Import-Csv and ForEach-Object to Read and Process a CSV File
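The same pipeline pattern extends to filtering. As a sketch, the following adds a Where-Object stage before the formatting step; the data is a placeholder stand-in for Sample.csv, and the filter condition is chosen only for illustration.

```powershell
# Placeholder data standing in for Sample.csv.
$people = @'
"DisplayName","Mail"
"Garvin, Fred","fred.garvin@example.com"
"Flynn, Phineas","phineas.flynn@example.com"
"Bates, Gil","gil.bates@example.com"
'@ | ConvertFrom-Csv

# Keep display names that start with F, then format each as "Name" <address>.
$formatted = $people |
    Where-Object { $_.DisplayName -like 'F*' } |
    ForEach-Object { '"{0}" <{1}>' -f $_.DisplayName, $_.Mail }

$formatted
```

Only "Flynn, Phineas" passes the filter, because the match is made against the whole DisplayName value, which begins with the surname.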
Exporting CSV Files
Sometimes you need to create a CSV file from PowerShell output objects. To do so, you pipe PowerShell's output to the Export-Csv cmdlet and specify a filename. PowerShell will then write the output objects to a CSV file. It really is that simple, with one minor caveat. By default, Export-Csv writes a line starting with the string #TYPE as the first line of the CSV file. Export-Csv's -NoTypeInformation parameter omits this extra line of output, so I usually include this parameter.
For example, suppose you want to create a copy of Sample.csv sorted by the DisplayName property. All you need to do is import the file, pipe its contents to the Sort-Object cmdlet, then export the contents to a new CSV file:
Import-Csv Sample.csv | Sort-Object DisplayName | Export-Csv Sample-Sorted.csv -NoTypeInformation
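To check what Export-Csv actually writes, you can export a couple of objects and read the raw text back. This is a minimal sketch; the objects, the placeholder addresses, and the temporary file path are assumptions made for illustration.

```powershell
# Placeholder objects standing in for the rows of Sample.csv.
$people = @(
    [pscustomobject]@{ DisplayName = 'Garvin, Fred'; Mail = 'fred.garvin@example.com' }
    [pscustomobject]@{ DisplayName = 'Bates, Gil';   Mail = 'gil.bates@example.com' }
)

# Sort and export to a temporary file (path chosen for illustration).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'Sample-Sorted.csv'
$people | Sort-Object DisplayName | Export-Csv $path -NoTypeInformation

# Read the raw text back to see the header row and the quoting.
Get-Content $path
```

The first line of the file is the header row built from the property names, followed by one line per object in sorted order.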
Note that Export-Csv can output any PowerShell objects, not just objects produced from using Import-Csv. For example, consider the command:
Get-ChildItem | Sort-Object Length | Select-Object FullName,LastWriteTime,Length | Export-Csv Data.csv -NoTypeInformation
This command creates a CSV file containing the files in the current directory, sorted by file size. Note that this command uses the Select-Object cmdlet to select each file's full filename, last write time, and file size (length), so these three properties will be the columns in the CSV file.
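A variation on this command is sketched below. It assumes PowerShell 3.0 or later for the -File switch, which skips directories (directory objects have no Length property), and it adds a calculated property named SizeKB. The scratch folder and file names are hypothetical and exist only to give the example known contents.

```powershell
# Work in a scratch folder so the example has known contents.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'csv-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content (Join-Path $dir 'small.txt') 'a'
Set-Content (Join-Path $dir 'large.txt') ('a' * 2048)

# Export outside the scratch folder so the CSV does not list itself.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'Data.csv'

# -File returns files only; SizeKB is a calculated property.
Get-ChildItem $dir -File |
    Sort-Object Length |
    Select-Object FullName, LastWriteTime,
        @{ Name = 'SizeKB'; Expression = { [math]::Round($_.Length / 1KB, 1) } } |
    Export-Csv $csv -NoTypeInformation

Import-Csv $csv
```

A calculated property is a hash table with Name and Expression keys; the column it produces is written to the CSV file like any other property.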
Importing XML Files
XML is another type of text file that stores structured data. Listing 1 shows an XML representation of the data in Table 1.
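Listing 1: Sample.xml
<?xml version="1.0"?>
<database>
  <record>
    <DisplayName>Garvin, Fred</DisplayName>
    <Mail>[email protected]</Mail>
  </record>
  <record>
    <DisplayName>Flynn, Phineas</DisplayName>
    <Mail>[email protected]</Mail>
  </record>
  <record>
    <DisplayName>Bates, Gil</DisplayName>
    <Mail>[email protected]</Mail>
  </record>
</database>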
The data in an XML document is arranged hierarchically. In Sample.xml (Listing 1), you have a root element (<database>) and three child elements of the root element (<record>). The elements are in pairs, and elements can contain other elements. An opening element uses angle brackets around its name (e.g., <record>), and its closing element uses a forward slash before the element's name (e.g., </record>). When working with XML data in PowerShell, you must have a single root element. The other elements are contained within the root element.
PowerShell has an Import-Clixml cmdlet, but Import-Clixml can't import Sample.xml because Sample.xml isn't in the exact format needed by the cmdlet. (I'll discuss this more in the next section.) Instead, you can use the Get-Content cmdlet and the [Xml] type accelerator:
$Data = [Xml] (Get-Content Sample.xml)
After entering this command, the $Data variable contains an XmlDocument object. PowerShell exposes the document's top-level nodes as two properties: xml (the XML declaration at the top of the file) and database (the root element). You can output the data from the XML file as follows:
$Data.database.record
This command produces the exact same output as Figure 2—that is, it outputs three objects with two properties each (DisplayName and Mail).
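The steps above can be tried without a file on disk. This sketch assumes a document with the same database/record shape as Sample.xml; the addresses are placeholders, and the trailing comment shows a -Raw variant (PowerShell 3.0 and later) for reading a real file as a single string.

```powershell
# Placeholder document with the same shape as Sample.xml.
$xmlText = @'
<?xml version="1.0"?>
<database>
  <record><DisplayName>Garvin, Fred</DisplayName><Mail>fred.garvin@example.com</Mail></record>
  <record><DisplayName>Bates, Gil</DisplayName><Mail>gil.bates@example.com</Mail></record>
</database>
'@

# The [xml] type accelerator parses the string into an XmlDocument.
$Data = [xml]$xmlText

# Dotted property access walks the element tree.
$Data.database.record | ForEach-Object { '{0}: {1}' -f $_.DisplayName, $_.Mail }

# With a file, -Raw reads the whole document as one string:
# $Data = [xml](Get-Content Sample.xml -Raw)
```

If the text isn't well-formed XML, the [xml] cast throws an error, so wrapping it in try/catch is worth considering for input you don't control.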
If the XML data file you want to import was saved by the Export-Clixml cmdlet, you don't need the [Xml] type accelerator and Get-Content cmdlet. Instead, you can use the Import-Clixml cmdlet, as discussed in the next section.
Exporting XML Files
You can export an XmlDocument object to a file by using the Export-Clixml cmdlet. Just like Export-Csv, Export-Clixml requires a filename. Consider the following commands:
$Data = [Xml] (Get-Content Sample.xml)
$Data | Export-Clixml Data.xml
The first command imports Sample.xml (Listing 1) as an XmlDocument object. The second command exports the XmlDocument object to Data.xml.
Import-Clixml is the inverse of Export-Clixml. It reads a file written by Export-Clixml and re-creates the object that was saved, which in this case is an XmlDocument object. For example, in the following command, Import-Clixml is retrieving Data.xml:
$Data2 = Import-Clixml Data.xml
After you run this command, the $Data2 variable contains a copy of the XmlDocument object stored in $Data.
Keep in mind that you can only use the Import-Clixml command to import an XML file created by Export-Clixml—that is, the XML file must contain a specific set of elements in order for Import-Clixml to import it. If the XML file isn't in the specific format required, you need to use the [Xml] type accelerator and Get-Content, as discussed in the previous section.
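Export-Clixml isn't limited to XmlDocument objects; it can serialize other PowerShell objects as well. The following is a minimal sketch of a round trip; the objects, the Joined property, and the temporary file path are assumptions made for illustration.

```powershell
# Any objects can be serialized, not only XmlDocument instances.
$people = @(
    [pscustomobject]@{ DisplayName = 'Garvin, Fred'; Joined = [datetime]'2013-08-19' }
    [pscustomobject]@{ DisplayName = 'Bates, Gil';   Joined = [datetime]'2012-01-05' }
)

# Path chosen for illustration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'People.clixml'
$people | Export-Clixml $path

# Import-Clixml rebuilds the objects, including the DateTime values,
# which a CSV round trip would return as strings.
$restored = Import-Clixml $path
$restored[0].Joined.GetType().Name   # DateTime
```

This is the main practical difference from Export-Csv: the CLIXML format records property types, so values such as dates come back as typed objects rather than text.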
Take Control of CSV and XML Files
CSV and XML text files are both extremely common data interchange formats. PowerShell's designers provided some exceptionally powerful and easy-to-use cmdlets to help you import and export both of these formats. Thankfully, manually parsing CSV and XML files is now a thing of the past.