Lean, Mean Storage

Scripting data cleanup

Dick Lewis

June 12, 2005



Although the cost of storage is at an all-time low, stored information incurs other hidden and not-so-hidden costs. An important part of data management is keeping your storage lean and mean—specifically, keeping storage clean of out-of-scope and obsolete files. Files such as employee vacation photos, resumes, stock portfolio records, gaming files, and the entire backup of a user's C drive are likely to be out of scope for server storage. Many companies aren't overly concerned about such files if they're stored on an individual user PC hard drive but don't want server storage space and backup resources used for these nonbusiness files.

I've created three data-cleanup scripts that you can clip to your scripting tool belt. DeleteFilesByExtension.bat deletes files according to their extension, DeleteOlderFiles.bat deletes files that exceed an age threshold, and DeleteLargeFiles.bat deletes files that exceed a size threshold. I'll discuss each script after a brief review of some of the costs you can avoid by keeping your storage space cleaned up.

Obviously, scripts that delete files have the potential to do real damage. For ways to reduce the risk of misusing these and other scripts, see the sidebar "Practice Safe Scripting."

The Cost of Keeping Data
If the My Documents folder on my PC is any indication, you're probably storing source documents, old versions, works in progress, and just plain junk that you have no idea why you keep. Even on my own PC, it takes me a few minutes to sort through this data clutter and find the correct version of a file. If I upload that file to a shared storage area that contains 80 percent junk or out-of-scope files and point another user to the uploaded file, that user can have a difficult time finding the file because of all the extraneous data he or she has to wade through to locate it. Sure, disk space is cheap, but storage clutter causes users to waste time searching for data and exacerbates version confusion.

Virus scanning overhead, the cost of user downtime associated with server or storage migration, and general costs of retaining data (e.g., replacements for failed drives, backup tapes and hardware, floor space, power, climate control for storage resources) all go up as the space used increases. In a disaster recovery situation, the time—and thus cost—of partially or totally recovering a data area is directly related to the amount of data it contains. The more data, the longer recovery will take. Furthermore, users often don't remember the entire name or path of a file they need to recover, and the time you spend looking for the correct file adds to the recovery cost.

Identifying Out-of-Scope Files
In any data structure, you'll have three types of files: clearly out-of-scope files, clearly in-scope files, and files whose status can't be determined without actually opening them. Scripts are most useful for deleting files that are obviously out-of-scope. The characteristics of those files vary depending on your corporate environment.

The biggest explosion of out-of-scope files I've experienced lately is in MP3 music files. Some users store personal music files on their work PCs, and despite policies against saving MP3s on servers, some of those files eventually find their way onto servers. If you have a large employee base, I can almost guarantee that you're storing at least some, and possibly many, MP3 files. Besides the obvious concerns about using corporate bandwidth to download music files, MP3s can take up a fair chunk of space. Because my company doesn't use the MP3 format for storing business-related sound files, I target those files for automatic deletion.

Another problematic group is executables, which have extensions such as .exe, .com, .bat, and .scr. Although I exercise tight configuration control of all desktop PCs and installing unapproved software is against corporate policy, such programs sometimes make their way onto shared storage areas and need to be deleted. Deleting executable files not only saves space and reduces clutter but also minimizes the possibility that apparently harmless files (e.g., greeting cards, games) that hide harmful components (e.g., viruses) are circulated. I give server and desktop administrators dedicated areas for storing executable files used in administrative tasks, and these areas aren't subject to automatic file deletions.

Before you delete any files, management and users must agree on which file types are indisputably out of scope. And you need to implement a way to allow user feedback in the event that files that were previously out of scope begin to be used for authorized work. For example, let's say you've found several Moving Pictures Experts Group (MPEG) media files on your server that contain commercials and other cute videos that your user community seems to enjoy exchanging. Working with management, you determine that the company uses no known business-related MPEG files, so you implement an automated deletion script that regularly removes such files. A few months later, the manufacturing division begins using MPEG files to document production procedures and make training videos. The last thing you want is to delete business-related files, so regular communication with users about which file types are targeted for automatic deletion is vital.

If you have a small group of users who need to store file types that are deletion targets, you can give that group a dedicated storage area for those files and disable automated deletion for that storage area. If you're concerned about possibly deleting files that someone will need, you can schedule your scripts to run after a regular backup. Although that approach uses backup resources for files that might be out of scope, it eliminates the possibility of deleting hundreds of the CEO's irreplaceable vacation photos and being unable to recover them.
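The idea of exempting a dedicated storage area can be sketched in a few lines. This is a Python illustration, not one of the article's batch scripts, and the function names and the notion of a "protected roots" list are assumptions made for the example.

```python
from pathlib import Path


def is_protected(path, protected_roots):
    """Return True if path lies inside any of the exempt folders."""
    resolved = Path(path).resolve()
    for root in protected_roots:
        try:
            # relative_to raises ValueError when resolved is not under root.
            resolved.relative_to(Path(root).resolve())
            return True
        except ValueError:
            continue
    return False


def filter_unprotected(paths, protected_roots):
    """Drop deletion candidates that live in an exempt storage area."""
    return [p for p in paths if not is_protected(p, protected_roots)]
```

A cleanup job would pass its candidate list through filter_unprotected before deleting anything, so files in the exempt area are never touched.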

If your shared folders contain files that have unfamiliar extensions, you'll need to determine what file type is associated with those extensions. The following Web sites can help you make that determination:

  • TechTutorials list of common file formats at http://www.techtutorials.com/reference/fileformats.shtml

  • Computer Knowledge's FILExt Web site at http://filext.com

Deleting Out-of-Scope Files
There are several ways to automatically delete files according to their extension. My file deletion scripts use the ForFiles utility, which is available in the resource kits for Windows 2000 and later.

The ForFiles utility's basic syntax is

forfiles -p path -s -m -c command

where path is the path used to begin the search and command is the command that ForFiles will execute on each file that meets the search criteria. The -p switch precedes the path, and the -c switch precedes the command. The -s switch tells ForFiles to include subdirectories in the search. The -m switch lets you specify the types of files to search for. This switch's default value of *.* searches for files of all types, but DeleteFilesByExtension.bat targets specific extensions. (Although ForFiles appears on multiple lines here, the entire command must be entered on one line if you run it from a command prompt.)

ForFiles has two main limitations: It doesn't support Universal Naming Convention (UNC) paths, and it runs best on the machine where the data resides. You can map a drive to use ForFiles against a remote share, but running the tool locally is better for performance and sidesteps the UNC path limitation. For more information about ForFiles' switches and its usage possibilities, go to http://www.microsoft.com/technet/prodtechnol/windowsserver2003/library/serverhelp/9660fea1-65c7-48cf-b466-204ba159381e.mspx, or type

ForFiles -?

to view the online Help file. (The common /? syntax for online Help doesn't work with ForFiles.)

Listing 1 shows DeleteFilesByExtension.bat. As you can see, I've put the ForFiles command inside a For command loop so the script can locate and delete multiple file types. Note that you must enclose commands that contain spaces in double quotes. And remember that ForFiles is a resource kit tool, so you might need to specify its path. All the deletion scripts in this article contain a place to configure the path to the ForFiles utility.
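For readers who want to see the loop-over-extensions logic outside of a batch file, here is a minimal Python sketch. It is not the author's DeleteFilesByExtension.bat and doesn't call ForFiles. The extension list and function names are hypothetical, and deletion is off unless you ask for it.

```python
from pathlib import Path

# Hypothetical target list; agree on the real one with management first.
TARGET_EXTENSIONS = {".mp3", ".exe", ".com", ".scr"}


def find_by_extension(root, extensions=TARGET_EXTENSIONS):
    """Return files under root, including subfolders, with a targeted extension."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def clean_by_extension(root, extensions=TARGET_EXTENSIONS, delete=False):
    """Print every match; remove the files only when delete=True."""
    matches = find_by_extension(root, extensions)
    for path in matches:
        print(("DELETE " if delete else "LIST   ") + str(path))
        if delete:
            path.unlink()
    return matches
```

Calling clean_by_extension on a folder with the default delete=False only lists what would be removed, which is a sensible first run.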

Deleting Files by Age
Deleting files according to their age and extension is another way to reduce file clutter. My company has a shared folder area that the user community uses to exchange files. The original idea was that the shared folder would be a collaboration area in which users would place files for other users to pick up to read or modify. Conceptually, the receiving users were to delete the copy on the server share. In reality, they often forgot to delete files, so the collaboration area turned into a gigantic unmanaged public share. After a couple of months, we had to begin deleting files.

In DeleteOlderFiles.bat, which you can see in Listing 2, I again use the ForFiles tool, this time with a date switch that tests for file age. The -d switch lets you specify the age or age range of the files in which you're interested. For example, the switch -d-60 tells ForFiles to search for files that are 60 days old or older.
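The age test can be sketched as follows. This is a Python illustration rather than the author's DeleteOlderFiles.bat. It assumes that "age" means time since the file was last modified and compares timestamps to the second, which may differ slightly from how ForFiles compares dates. The function name and parameters are made up for the example.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_older_than(root, days, pattern="*", now=None):
    """Return files under root last modified at least `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_mtime <= cutoff
    )
```

With days=60, a file modified 61 days ago is returned and a file modified today is not. The function only reports candidates and leaves deletion to the caller.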

Deleting Files by Size
In some situations, you might want to delete files according to their size and extension. For example, my company uses MPEG media files for business purposes. Recently, some very large (25MB or more) personal MPEG files have shown up. These files are definitely out of scope.

For cases like that, I wrote DeleteLargeFiles.bat, which you can see in Listing 3. This script includes the @FSIZE file size variable in the Del command to be executed, as callout A shows. The script uses @FSIZE to pick out files above a size threshold, in this case files that exceed 26,214,400 bytes (25MB). The ForFiles utility's @FSIZE variable must be specified in bytes, so make certain that you use bytes rather than kilobytes or megabytes lest you delete many more files than you intend to.
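The byte arithmetic is worth checking before you set a threshold. The following Python sketch, which is not the author's DeleteLargeFiles.bat, converts megabytes to bytes and lists files above a size limit. The function names and the *.mpg pattern are assumptions made for the example.

```python
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # 1,048,576 bytes


def mb_to_bytes(megabytes):
    """Convert megabytes to bytes; 25MB becomes 26,214,400 bytes."""
    return megabytes * BYTES_PER_MB


def find_larger_than(root, min_bytes, pattern="*.mpg"):
    """Return files under root matching pattern whose size exceeds min_bytes."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and p.stat().st_size > min_bytes
    )
```

Passing 25 where 26,214,400 is expected would match nearly every file, which is exactly the mistake the byte requirement invites. Keeping the conversion in one named function makes the threshold easier to review.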

Using the Scripts in Your Environment
I've tested DeleteFilesByExtension

To download these scripts from the Windows Scripting Solutions Web site, go to http://www.windowsitpro.com/windowsscripting, enter 46486 in the InstantDoc ID text box, then click the 46486.zip hotlink. Each script contains comments detailing how to configure the script for your environment. In addition, keep in mind the following items:

  • The utility path should not contain spaces.

  • ForFiles works on local paths but not on UNC paths unless you map a drive.

  • Test the scripts in List mode by running them with no parameters, and inspect the log file thoroughly before running the script in Delete mode.

  • If you need to run the scripts periodically, you can use Task Scheduler to run them as often as needed.
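The list-first, delete-later habit can be sketched like this. It is a Python illustration, not the logic of the downloadable scripts. The "delete" argument name and the log line formats are assumptions. The only behavior taken from the article is that running with no parameters means List mode.

```python
from pathlib import Path


def run(candidates, log_path, args):
    """Log each candidate; delete only when the first argument is 'delete'."""
    delete_mode = len(args) > 0 and args[0].lower() == "delete"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write("Mode: " + ("Delete" if delete_mode else "List") + "\n")
        for path in candidates:
            path = Path(path)
            if not delete_mode:
                log.write(f"Would delete: {path}\n")
                continue
            try:
                path.unlink()
                log.write(f"Deleted: {path}\n")
            except OSError as err:
                # Locked or already-removed files are logged, not fatal.
                log.write(f"Failed: {path} ({err})\n")
    return delete_mode
```

A scheduled job might call run(matches, "cleanup.log", sys.argv[1:]) after importing sys, so that an operator has to pass the extra argument on purpose before anything is removed.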

Minimizing clutter on your shared folders is a never-ending task. DeleteFilesByExtension.bat, DeleteOlderFiles.bat, and DeleteLargeFiles.bat can save you time as well as money.
