Using Bzip2 Compression with Cmd.exe and PowerShell
This useful free utility can help you transparently use compressed data at the command line or in scripts
February 5, 2008
Executive Summary:
Bzip2 is a free command-line tool for compressing and decompressing data that you can also use within scripts. You can write bzip2-compressed data to a new file or append the data to an existing file. If you want to output or append bzip2 data from within Windows PowerShell, you'll have to use a batch file to spawn cmd.exe from within PowerShell.
Although the graphical Windows shell has built-in support for compressing data into .zip files and extracting it again, Windows doesn't include an equivalent tool that you can use from a command prompt or in a batch file. One of the handier free tools you can find for compressing data is the open-source bzip2 console application. However, if you're used to graphical tools such as WinZip, bzip2 isn't an intuitive tool. So, after telling you how to get bzip2, I'll demonstrate how to use it from scripts and at the command prompt. I'll also examine the tradeoffs in using bzip2. Although it offers good compression, it can be very CPU-intensive.
Getting Bzip2
You can download bzip2 from the bzip2 Web site's download page (http://www.bzip.org/downloads.html). The downloaded file is the actual executable, with a name in the form bzip2-version-target cpu-target platform.exe. I'm using version 1.0.4, compiled for x86 processors and 32-bit Windows OSs; the name of the file therefore is bzip2-104-x86-win32.exe. Be sure to download the current version, then rename it to bzip2.exe.
To further simplify using bzip2, you might want to do one more thing: Create copies of bzip2.exe with the names bunzip2.exe and bzcat.exe. From a command prompt in the same directory as bzip2.exe, just enter the following cmd.exe commands to make the copies:
copy bzip2.exe bunzip2.exe
copy bzip2.exe bzcat.exe
The renamed copies are useful because bzip2 is actually three different tools: a file compressor (bzip2.exe), a file decompressor (bunzip2.exe), and a tool for reading compressed data and decompressing it directly to a console window (bzcat.exe). Bzip2 is aware of the names it has and automatically switches to the appropriate mode when invoked as bunzip2 or bzcat. In the rest of my explanations, I'll assume you’ve created the duplicate files.
Bzip2 is designed to work in a command prompt window, and it does so very well. Any tool that reads or writes console streams should work with bzip2. For simplicity, I'll put bzip2 through its paces using standard cmd.exe commands, but you can substitute any tool that uses console input and output.
Compressing Data
When migrating users to new PCs, I usually make a catalog of the files present in key locations on their computers. One of the places I check is the local Documents and Settings folder. Typically, I make a generic listing of the complete path to each file, using a command such as
dir /s /b "C:\Documents and Settings" > userfiles.txt
The resulting file is usually very large, so there might be some benefit to compressing it. Although you could use a standard GUI zipping tool for occasionally compressing result data, it's easier to compress the data as you collect it—which bzip2 lets you do. Within the cmd.exe environment, you can pipe the data into bzip2, then redirect bzip2's output to a file:
dir /s /b "C:\Documents and Settings" | bzip2 > userfiles.txt.bz2
Note that I've added .bz2, the standard extension for bzip2-compressed files, to the target file.
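If the listing comes from a script instead of from a console tool, the same compress-as-you-collect idea can be sketched in Python with the standard bz2 module. This is only an illustration under my own assumptions, not something that ships with bzip2: the function name write_listing and the choice of UTF-8 for the text are mine.

```python
import bz2
import os


def write_listing(root, target):
    """Walk root and write one full file path per line into a bzip2 file.

    Rough counterpart of: dir /s /b root | bzip2 > target
    (unlike dir /s /b, this sketch lists files only, not folders).
    Returns the number of paths written.
    """
    count = 0
    # Mode "wt" creates or overwrites target and compresses text as it is written.
    with bz2.open(target, "wt", encoding="utf-8") as out:
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(folder, name) + "\n")
                count += 1
    return count
```

A call such as write_listing(r"C:\Documents and Settings", "userfiles.txt.bz2") would then produce a file that bzcat can read.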
Appending Compressed Data
If you want to add more information to the output file, bzip2's design lets you append data using file redirection. Each run of bzip2 writes a complete compressed stream with its own header and trailer, and the decompressor reads several such streams stored back to back as if they were one. This means that if you want to add more compressed data to a file, you can simply append it using standard console redirection; you won't mangle the .bz2 file in the process.
For example, if you want to add a listing of the Program Files directory to the existing userfiles.txt file, you use the >> redirection symbol within cmd.exe to append the new data:
dir /s /b "C:\Program Files" >> userfiles.txt
The same procedure works with bzip2:
dir /s /b "C:\Program Files" | bzip2 >> userfiles.txt.bz2
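To see why appending is safe, it can help to try it on a small scale. The following Python sketch uses the standard bz2 module and two made-up one-line listings (the paths are hypothetical) to show that two independently compressed pieces, glued together byte for byte, decompress to the two originals in order.

```python
import bz2

# Two hypothetical listing fragments; the paths are invented for illustration.
first = b"C:\\Documents and Settings\\alice\\notes.txt\r\n"
second = b"C:\\Program Files\\Example\\example.exe\r\n"

# Compress each fragment separately, giving two independent bzip2 streams.
stream_one = bz2.compress(first)
stream_two = bz2.compress(second)

# Appending with >> amounts to concatenating the raw bytes of the two streams.
combined = stream_one + stream_two

# The decompressor works through the streams one after another.
restored = bz2.decompress(combined)
print(restored == first + second)  # prints True
```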
Reading Bzip2-Compressed Data
When using a console tool, you usually use the Type command to get data from a text file into the console window:
type userfiles.txt
For compressed files, the Bzcat command is analogous to the Type command. The Bzcat command decompresses and sends the content to the console window. You use a command such as
bzcat userfiles.txt.bz2
Because the output in this example is plain text, text-manipulation tools will work with bzcat output. If you want to page through the output, you can use the More command:
bzcat userfiles.txt.bz2 | more
When you've found what you're after, you can exit More by pressing Ctrl+C; bzcat terminates as well.
If you want to search the listing for RDP files used by Terminal Services, you can use the Find command:
bzcat userfiles.txt.bz2 | find /i ".rdp"
This method works the same for any similar tools. You can even create an uncompressed copy of the data by redirecting the output to a file:
bzcat userfiles.txt.bz2 > userfiles.txt
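If you would rather do the search inside a script than in a pipeline, the bzcat-plus-Find combination can be approximated in Python. This is a sketch under assumptions: the function name find_in_bz2 is mine, and I assume the listing can be read as UTF-8 (undecodable bytes are replaced instead of raising an error).

```python
import bz2


def find_in_bz2(path, needle):
    """Return the lines of a bzip2-compressed text file that contain needle.

    The comparison ignores case, like find /i.
    """
    needle = needle.lower()
    matches = []
    # Mode "rt" decompresses on the fly and yields one text line at a time.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as listing:
        for line in listing:
            if needle in line.lower():
                matches.append(line.rstrip("\r\n"))
    return matches
```

For example, find_in_bz2("userfiles.txt.bz2", ".rdp") returns the matching paths as a list without creating an uncompressed copy on disk.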
Decompressing and Compressing Files
There might be situations where you want to convert the data to decompressed form, or compress pre-existing uncompressed files. You can decompress a bzip2 file by using bunzip2 with the compressed file provided as an argument:
bunzip2 userfiles.txt.bz2
This command creates the file userfiles.txt and deletes the file userfiles.txt.bz2. Bunzip2 essentially toggles the state of the file from compressed to decompressed, which can help you avoid confusion about which file is the original data source. You can modify the file or use it as necessary, then recompress it using
bzip2 userfiles.txt
which creates the compressed file userfiles.txt.bz2 and deletes userfiles.txt. If you want to hang onto the file when compressing or decompressing, use the -k (for keep) option:
bunzip2 -k userfiles.txt.bz2
or
bzip2 -k userfiles.txt
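For scripts that can't call bunzip2 directly, the effect of bunzip2 -k can be approximated with Python's standard library. Treat this as a rough sketch: the function name bunzip2_keep is hypothetical, and the only behavior I've modeled is writing the decompressed file next to the original and leaving the original in place.

```python
import bz2
import shutil


def bunzip2_keep(path):
    """Decompress path (which must end in .bz2) and keep the compressed original.

    Returns the name of the decompressed file.
    """
    if not path.endswith(".bz2"):
        raise ValueError("expected a .bz2 file: " + path)
    target = path[:-len(".bz2")]
    # Mode "xb" refuses to overwrite an existing file (raises FileExistsError).
    with bz2.open(path, "rb") as src, open(target, "xb") as dst:
        shutil.copyfileobj(src, dst)  # copy in chunks instead of loading it all
    return target
```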
PowerShell and Bzip2: Output and Append
As I mentioned earlier, the simple output and append methods for bzip2 work from cmd.exe. You can't use the same method from within Windows PowerShell. PowerShell's > and >> redirection operators don't work for binary data: PowerShell converts a program's output to text strings before writing it, which corrupts compressed data. In fact, it's safest to think of PowerShell's > and >> operators as screen dump facilities because of how they handle data.
You can still use bzip2 with PowerShell, however. The trick is to use cmd.exe within PowerShell, which you can do transparently. I have two standard batch files I use for PowerShell compatibility: Write-Bzip2.cmd and Add-Bzip2.cmd.
Write-Bzip2 is one line of code that directly overwrites or creates the specified file from within cmd.exe, avoiding the PowerShell pipeline:
@bzip2 > %1
Because this code is a .cmd file, the cmd.exe shell automatically spawns and runs the script. PowerShell pipes the raw text data to the script, and bzip2 reads it. Then, within cmd.exe, the output is redirected to the file you specified as an argument of Write-Bzip2.
The Add-Bzip2 batch file works similarly but appends instead of overwrites. Again, it's just one line of code:
@bzip2 >> %1
So, provided the two batch files are in a folder on your path, you can use this command from within PowerShell to write a new archive:
Get-ChildItem "C:\Documents and Settings" -Name -Recurse | Write-Bzip2 userfiles.txt.bz2
The following command appends data to an existing archive:
Get-ChildItem "C:\Program Files" -Name -Recurse | Add-Bzip2 userfiles.txt.bz2
PowerShell works well with bzcat for reading files. You can also use bzip2 and bunzip2 with PowerShell for compressing and decompressing files without a problem.
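The batch files work because the process that does the compressing also opens the output file, so PowerShell never handles the compressed bytes. A variation on that idea, sketched here in Python and not taken from the batch files themselves, is a helper that reads raw input and writes or appends the bzip2 data itself. The function name compress_stream and the 64KB chunk size are my own choices.

```python
import bz2


def compress_stream(source, target, append=False):
    """Compress everything readable from source (a binary stream) into target.

    With append=False the target is created or overwritten (like >);
    with append=True a new bzip2 stream is added to the end (like >>).
    """
    mode = "ab" if append else "wb"
    with bz2.open(target, mode) as out:
        while True:
            chunk = source.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)
```

A small script could call compress_stream(sys.stdin.buffer, sys.argv[1]) after importing sys, and you could then pipe data to that script from PowerShell in the same way as to Write-Bzip2.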
Tradeoffs of Using Bzip2
You should have a good idea of how to use bzip2 now, but the real question is whether it makes sense for you to use it. Let's look at some of the characteristics bzip2 has as a tool from the perspective of IT support work.
If you want transparent compression for scripts, bzip2 is just about as simple a solution as possible. Pre-existing scripts or tools don't need to know anything about compression or decompression; they just use the data. If a tool you need to use doesn't work with command-line input or output, you can still use bzip2 separately to decompress input files before using the tool or compress output files after running the tool. If high compression is your goal, bzip2 also comes out ahead of most commonly available tools. Although there are some predictive compression schemes with 10 to 15 percent higher compression than bzip2, bzip2 typically compresses text considerably better than Deflate-based alternatives such as WinZip's standard .zip format, gzip, and Info-ZIP.
Performance is probably the weakest point of bzip2. Although it doesn't hog memory, it is CPU-intensive. Scripts that do very little processing work but read or write large amounts of data will most likely see the biggest performance hit. In general, if you're running a script locally on an overloaded server and speed of completion of the script is a major factor, bzip2 probably isn't your best choice. In such a case, it might be better to work with decompressed local data instead. This problem isn't unique to bzip2, because all compression tools take extra processing power to work, but bzip2 is generally among the most CPU-intensive of the commonly used tools.
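The size-versus-speed tradeoff is easy to measure on your own data before you commit to bzip2. The following Python sketch compares the standard zlib module (Deflate, the method behind gzip and ordinary .zip files) with the bz2 module. The sample data is a synthetic stand-in for a file listing, so the numbers it prints are only indicative, and Python's library bindings won't match the timing of the command-line tools exactly.

```python
import bz2
import time
import zlib


def measure(compress, data):
    """Return (compressed size in bytes, elapsed seconds) for one compression call."""
    start = time.perf_counter()
    size = len(compress(data))
    return size, time.perf_counter() - start


# Synthetic stand-in for a large file listing; real data will behave differently.
sample = "".join(
    "C:\\Documents and Settings\\user%d\\file%d.txt\r\n" % (i % 50, i)
    for i in range(50000)
).encode("ascii")

print("original        %10d bytes" % len(sample))
for name, func in (("zlib (Deflate)", zlib.compress), ("bzip2", bz2.compress)):
    size, seconds = measure(func, sample)
    print("%-15s %10d bytes  %.3f s" % (name, size, seconds))
```

Replacing sample with the contents of a real listing, for example the bytes of userfiles.txt, gives a more honest picture for your environment.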
Bzip2 is free, open-source software distributed under a BSD-style license, so there are few restrictions on reuse and redistribution. It also runs on every Windows OS from Windows 95 forward as well as most Unix-like OSs.
You might prefer to have a graphical interface when working with .bz2 files. Although bzip2 doesn't have a GUI, there are some GUI archiving tools which can handle bzip2 compression and decompression. The most notable is the free 7-Zip (http://www.7-zip.org).
Despite the performance limitations of bzip2, it can make an excellent addition to any administrative toolkit. The method I used for making bzip2 compression work from within PowerShell is also a useful concept to take with you. When you need to use a command-line application within PowerShell and PowerShell mangles the application's output stream, you can always use a batch file wrapper to redirect the output to a file from within cmd.exe, just as I've shown you with bzip2.