Folder FAQs for Scripted Data Cleanup
Keep storage lean and mean
July 10, 2005
In the article "Lean, Mean Storage," July 2005, InstantDoc ID 46486, I offered scripting techniques for minimizing data clutter on shared folders. As a result of that article, I received many questions about data management and cleaning up files, and I share them here as FAQs.
Keep in mind that the code snippets and script in this article can delete or move directories and directory structures. Be sure to carefully test code in a nonproduction environment before you deploy it against production data. In this article, I use For command iterators with double percent signs, as they would be used in a script (e.g., %%i, %%j). If you run code directly at a command prompt, use single percent signs (e.g., %i, %j).
I want to get a count of empty folders in a file structure. I thought that would be simple to script, but after I tried I realized that command shell scripting doesn't seem to have an Is Empty test. Is there an easy way to do what I want?
The Rmdir command lets you delete empty folders, but you can't test for whether a folder is empty other than to try to delete it. However, if you run a Dir command on a folder and nothing is returned, obviously the folder is empty. Here's an example of code that will test a directory structure for empty folders and return the number of empty folders it finds:
Set folcntr=&For /F "tokens=*" %%i in ('Dir /AD /B /S "R:\sharedfolder"') Do For /F "tokens=*" %%j in ('Dir /B "%%i" ^| Findstr $ ^>NUL ^|^| Echo blah') Do Set /A folcntr+=1
This somewhat cryptic code works by checking whether the second Dir command (Dir /B "%%i" ^| Findstr $) produces any file or folder output. The Findstr command determines whether there's output by looking for an end-of-line character, which the $ regular-expression character represents. When Findstr fails to find an end-of-line character, the logical OR (||) operator prompts the Echo command to display the word blah. In other words, the Echo command runs only when the Findstr command fails. Each time the Echo command runs, the counter variable folcntr advances by 1. Note that all reserved characters inside the For command must be preceded by a caret character (^). The | and > symbols are reserved characters; hence, ^> is used for the redirection to NUL and ^|^| is used to make the embedded OR operator work properly.
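If you want to cross-check the count that the batch code returns, the same test can be sketched in Python. This is only an illustration in a different language, not part of the original script; the function name count_empty_folders is made up for the example. Unlike Dir /B, which skips hidden and system files by default, this sketch treats a folder that holds only hidden files as not empty, so the two counts can differ.

```python
import os

def count_empty_folders(root):
    """Count subfolders under root that hold no files and no subfolders."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the root itself; only the folders beneath it are counted.
        if dirpath != root and not dirnames and not filenames:
            count += 1
    return count

# Hypothetical usage:
# print(count_empty_folders(r"R:\sharedfolder"))
```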
We have a very large user community with multiple shared folders. I performed a search on several of these folders and discovered many empty New Folder directories. How did these directories get created in our folder structure, and what's an easy way to get rid of them to reduce our folder clutter?
A couple of years ago I did a similar search and discovered several thousand New Folder directories on our shared folder areas. Many of these directories were empty. The directories appear to be created when users right-click and select New, Folder. When the new folder is created in the folder structure, it often drops to the bottom of the file and folder list in the Windows Explorer window, making it difficult for the user to locate and rename it. The user loses track of the folder, then creates and renames another folder. The orphaned folder is never renamed or used until it is eventually discovered and deleted. What complicates matters is that some folders with the name New Folder might actually have contents, and those folders obviously need to be retained.
If you want only a list of empty New Folder directories, you can use a derivation of the Dir command Is Empty test I introduced in the preceding FAQ. Use the code:
Forfiles -p"R:\sharedfolder" -s -m"*new folder*" -c"CMD /C If @ISDIR==TRUE Dir /B 0x22@PATH@FILE0x22 |Findstr $>NUL||Echo @PATH@FILE Is empty"
To delete the folders, use the following command:
Forfiles -p"R:\sharedfolder" -s -m"*new folder*" -c"CMD /C If @ISDIR==TRUE Rmdir 0x22@PATH@FILE0x22 2>NUL&Echo @PATH@FILE"
Remember that Forfiles works only on local paths, not on UNC paths. If you need to address files on a remote server, you can use a mapped drive. Also, notice that unlike many other built-in commands and resource kit tools, there are no spaces between Forfiles switches and parameter information. As an example, in the case of the path switch -p"R:\sharedfolder", note the absence of a space between the p and the leading double quote of the path. This is true for all Forfiles switches. If you add a space, Forfiles will fail.
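For comparison, here is a minimal Python sketch of the same two steps: list the empty folders whose names contain "new folder", then remove them. It is a hedged illustration rather than a replacement for the Forfiles commands, and the function names are hypothetical. The removal step relies on os.rmdir, which, like Rmdir without switches, refuses to delete a folder that has contents.

```python
import os

def find_empty_new_folders(root, needle="new folder"):
    """Return empty folders under root whose name contains needle (case-insensitive)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        name = os.path.basename(dirpath).lower()
        if dirpath != root and needle.lower() in name and not dirnames and not filenames:
            matches.append(dirpath)
    return matches

def remove_folders(paths):
    """Try to remove each folder; return the ones that were actually removed."""
    removed = []
    for path in paths:
        try:
            os.rmdir(path)  # fails on non-empty folders, so contents are never lost
            removed.append(path)
        except OSError:
            pass  # folder has contents or is in use; leave it alone
    return removed
```

Reviewing the list that find_empty_new_folders returns before passing it to remove_folders gives you the same list-first, delete-second workflow as the two commands above.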
We have multiple empty folders on our shares. We need to delete them but want to exempt any folders that were created in the previous 30 days in the event users have created folders or folder structures but not yet added data. I was trying to use the Dir command with the Find command, but this seemed to result in having to look for files on each of the dates separately. How can I easily set up folder deletions with a date threshold?
Again, Forfiles is the best tool. Use the -d switch date option to identify the older folders, using the code:
Forfiles -p"R:\sharedfolder" -s -m*.* -d-30 -c"CMD /C If @ISDIR==TRUE Rmdir 0x22@PATH@FILE0x22 2>NUL&Echo @PATH@FILE"
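The date threshold can also be sketched in Python for readers who want to preview which folders qualify before deleting anything. This example is an assumption-laden illustration: it uses each folder's last-modified time as the age test, and the name old_empty_folders and the now parameter (handy for testing) are invented for the sketch.

```python
import os
import time

def old_empty_folders(root, min_age_days, now=None):
    """Return empty folders under root last modified at least min_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # 86400 seconds per day
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root or dirnames or filenames:
            continue  # skip the root and anything that has contents
        if os.path.getmtime(dirpath) <= cutoff:
            result.append(dirpath)
    return result

# Hypothetical usage: preview empty folders untouched for 30 days or more.
# for folder in old_empty_folders(r"R:\sharedfolder", 30):
#     print(folder)
```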
We have a problem with users creating folder structures with multiple levels of nested folders. If the business project that the folder structure is being created for doesn't proceed, we end up with a large structure that can be deleted. How can we easily clean out these structures while also ensuring that we're not deleting anything that has file contents?
I deal with a lot of cancelled projects and have this same problem. I've seen a folder structure with literally hundreds of folders that contain no files. The problem is that Rmdir can't delete folders with contents; therefore, Rmdir as used in the code in the preceding FAQ will delete only folders at the bottom of the structure. You would need to rerun the code repeatedly, and each run would remove only a single bottom folder level from the structure. However, if you use Rmdir with the /S and /Q switches, you could unintentionally delete an entire folder structure that has file contents. So, the goal you need to achieve is safely deleting only empty folders from the bottom up. The Dir command (i.e., Dir /S /AD /B "R:\sharedfolder") will net you a list of folders in the default top-down format. The Sort command (^| Sort /R) reverses the order of the results so that the Rmdir folder deletions take place in a bottom-up order. Here's code that uses the Sort command to perform these inverted deletions:
For /F "Tokens=*" %%i in ('Dir /S /AD /B "R:\sharedfolder" ^| Sort /R') Do Rmdir "%%i"
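The bottom-up idea is easy to see in a short Python sketch, offered here as an illustration and not as the script from this article. It walks the tree with the deepest folders first and removes a folder only when it has no files and every subfolder has already been removed. The function name and the dry_run flag are hypothetical; with dry_run=True nothing is deleted and you get the list of folders that would go.

```python
import os

def prune_empty_folders(root, dry_run=True):
    """Remove (or, in dry-run mode, just list) empty folders from the bottom up."""
    removable = set()
    ordered = []
    # topdown=False yields child folders before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue  # never remove the starting folder itself
        children = [os.path.join(dirpath, d) for d in dirnames]
        if filenames or not all(child in removable for child in children):
            continue  # folder, or something beneath it, still has contents
        if not dry_run:
            try:
                os.rmdir(dirpath)  # refuses to delete a folder with contents
            except OSError:
                continue
        removable.add(dirpath)
        ordered.append(dirpath)
    return ordered
```

Because the removal uses os.rmdir and not a recursive delete, a folder that gains a file between the scan and the removal is left in place.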
If you're concerned that you might be deleting folders that are recently created but not yet populated with file data, use the following Forfiles code that performs a 60-day date check, then makes the inverted deletions on only the folders that are over that date threshold:
For /F "Tokens=*" %%i in ('Forfiles -p"R:\sharedfolder" -s -m*.* -d-60 -c"CMD /C If @ISDIR==TRUE Echo @PATH@FILE" ^| Sort /R') Do Rmdir "%%i"
I need to determine the size of individual top-level folders in a file structure, as well as their combined size. This doesn't seem easy to accomplish with a Dir command. Is there a better way?
Yes, the Dir command would require a lot of filtering to get the output you want. A better way is to use the DirUse tool from the Microsoft Windows Server 2003 Resource Kit, Windows XP resource kit, or Windows 2000 resource kit. This tool has many options that let you specify target folders and folder size. To query top-level folders and look for folders larger than 25MB, use the following basic syntax:
DirUse /m /q:25 /* R:\sharedfolder
where R:\sharedfolder is your target folder. This syntax will return the size of all the top-level folders and mark any larger than 25MB with an exclamation point. See the DirUse online help (DirUse /?) for more switch options.
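If DirUse isn't available, a rough equivalent can be sketched in Python. This is an illustration only: the function names are hypothetical, the report layout is invented and does not reproduce DirUse output, and the 25MB threshold is assumed to mean 25 x 1024 x 1024 bytes.

```python
import os

def top_level_sizes(root):
    """Map each top-level subfolder name to the total bytes of files beneath it."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
    return sizes

def size_report(sizes, limit_bytes=25 * 1024 * 1024):
    """Build report lines, flagging folders over limit_bytes with an exclamation point."""
    mb = 1024 * 1024
    lines = []
    for name, size in sorted(sizes.items()):
        flag = "!" if size > limit_bytes else " "
        lines.append(f"{flag} {size / mb:12.2f} MB  {name}")
    lines.append(f"  {sum(sizes.values()) / mb:12.2f} MB  TOTAL")
    return lines

# Hypothetical usage:
# print("\n".join(size_report(top_level_sizes(r"R:\sharedfolder"))))
```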
We are using Robocopy to copy a data area to a new server location. We are getting a ton of failures that seem to be caused by path-length problems. How can we copy the data even if it is over the traditional 256-character limit?
Early Robocopy versions (1.95 and earlier) don't support long paths. Robocopy version XP010, which comes with the Windows 2003 and XP resource kits, copies paths longer than 256 characters by default. If you want to force Robocopy XP010 to fail on files and folders exceeding that threshold, use the /256 switch, which prevents those files and folders from copying and writes errors to the log file. If you force failures, be sure to use the switches /R:1 (number of retries) and /W:1 (seconds to wait before retrying). If you're curious about why you might want to force Robocopy to fail, remember that a number of applications might not be able to access file paths longer than 256 characters. Running Robocopy with the /256 switch lets you identify those problem areas and perhaps get data owners to shorten the paths prior to your migration.
Another trick to help you get around the long paths during your copy operation is to map drives down into the folder structure on the source and destination locations. You can use this technique to reduce the effective length of the path. When you copy data from \\ServerA\Share1\mydata\mylongfilepathdata, you have 42 characters used just for getting to the basic location, not including subfolders and filenames. If you map a drive to this location and use the drive letter as the source location in your Robocopy script, you can remove almost 40 characters from the path.
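To find out ahead of a migration which paths exceed the threshold, a simple scan can help. The Python sketch below is an illustration, not a Robocopy feature; the function name paths_over_limit is made up. It measures the full path as seen from the starting folder, so running it against a mapped drive letter shows the shortened lengths. Be aware that on Windows, Python itself may be unable to descend into very long paths unless long-path support is enabled, in which case those folders are silently skipped.

```python
import os

def paths_over_limit(root, limit=256):
    """Return (length, path) pairs for every file or folder path longer than limit."""
    over = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) > limit:
                over.append((len(full), full))
    # Longest paths first, so the worst offenders head the list.
    return sorted(over, reverse=True)
```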
We have a very large folder structure with a lot of seldom-used reference data. Users want to have this data available online, but keeping it in the main folder structure results in a lot of clutter. What are some architecture solutions and scripted ways to improve this situation?
One of the strategies I've used over the years is to offload files into a duplicate parallel folder structure. The situation that prompted this strategy was that two main types of documents were showing up on our storage areas: older reference documents and newer "in process" documents. The parallel data folder can be an effective way to split up data between two locations, making it a bit easier to deal with. Also, depending on the permissions you apply on the reference area, you can lock down the files in the older-file area: Read-only permissions at the share level prevent those files from being modified.
Here's how to set up duplicate parallel folders. First, determine the suitability of this approach with management and your user community. Then, if you get a green light, determine the age or date threshold you'll use to move files from the current area to the reference area. You can choose a date and move all the files that are older than that date to the parallel area. Alternatively, you can choose a number-of-days figure and use that as the move threshold.
To create the older-data area, you need to determine the size of the reference area in which the older files will reside by performing a search according to the age of the files on your current shared location. You can use Windows Search or Agent Ransack (available at http://www.mythicsoft.com/agentransack) to search for files that match your date specifications. After you determine the age threshold and sizing information, configure the ParallelRobo.bat script and do a test run in the List mode. Here's the command syntax for Robocopy in the /MOV file move mode:
"D:\Robocopy.exe" "\\ServerName\SrcShare" "\\ServerName\DestShare" /L /S /NP /TEE /COPY:DATSOU /MOV /MINAGE:730 /R:1 /W:1 /LOG+:"D:\Log.txt"
I used the following Robocopy switches in ParallelRobo.bat:
/S: Copies subdirectories, but not empty ones.
/L: List only. This switch gives you a list of what would have been moved in an actual run. No files are copied or moved.
/NP: Suppresses display of progress information.
/TEE: Displays Robocopy activity on the console while output is still going to the log file.
/COPY:DATSOU: You can use the /COPY switch with the D (data), A (attributes), T (timestamp), S (NTFS security), O (ownership), and U (audit) settings.
/MOV: Moves files but leaves original folder structure intact. (This switch is different from /MOVE, which can delete source structure.)
/MINAGE:730: Specifies the minimum age of files for move activity. Can be specified as the number of days (e.g., 365, 730) or a date in the past following the format YYYYMMDD (e.g., 20010121).
/R:1: Retry this many times if copy or move (in the case of ParallelRobo.bat) fails. The default is 1 million!
/W:1: Wait this many seconds before retrying. The default is 30 seconds.
/LOG+:"D:\Log.txt": Write results to log file. The plus sign specifies appending results to the log file and not overwriting previous results.
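To make the combined effect of /L, /S, /MOV, and /MINAGE concrete, here is a minimal Python sketch of the same idea: move files older than a threshold into a parallel structure, leave the source folders in place, and default to a list-only run. It is not ParallelRobo.bat and not a substitute for Robocopy; the function name and parameters are hypothetical, age is judged by last-modified time, NTFS security, ownership, and auditing information may not be carried over, and name collisions at the destination aren't handled. The destination must not sit inside the source.

```python
import os
import shutil
import time

def move_old_files(src, dest, min_age_days, list_only=True, now=None):
    """Move files at least min_age_days old from src to the same relative path in dest.

    Returns (source, target) pairs. With list_only=True nothing is moved.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    planned = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        for name in filenames:
            source = os.path.join(dirpath, name)
            if os.path.getmtime(source) > cutoff:
                continue  # file is newer than the threshold; leave it
            target = os.path.normpath(os.path.join(dest, rel, name))
            planned.append((source, target))
            if not list_only:
                # Folders are created only when a file lands in them,
                # so empty source folders are not reproduced.
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.move(source, target)
    return planned
```

As with the /L switch, inspect the returned list from a list-only run before calling the function again with list_only=False.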
To put ParallelRobo.bat to work in your environment, follow these steps:
Download the script by opening this article online and clicking the .zip file link at the beginning of the article.
Configure the Robocopy utility location, the target area for deletion, and the other items you're prompted to configure at the top of the script. The paths you configure must not contain spaces.
Test the script in list (/L) mode by running the script with no parameters. Carefully inspect the log file results before running the script in move (/MOV) mode. (You should always test your scripts carefully before implementation against a production area.)
Before running the script against a production area, make sure you have an adequate backup of that area's data in case data recovery is required.