JSI Tip 3538. Amended batch file to remove duplicate records.

Jerold Schulman

April 1, 2001


Phil Robyn, at Berkeley, pointed out that the batch file I scripted in tip 3530 did not handle blank records and did not preserve leading space characters in the records. Phil submitted the following batch file:

@echo off
setlocal
if {%1} EQU {} goto syntax
if not exist %1 goto syntax
set infile=%1
if {%2} EQU {} goto syntax
set outfile=%2
type nul > %outfile%
for /f "tokens=1* delims=:" %%a in ('type %infile% ^| sort ^| findstr /n /v /c:"CoLoRlEsS gReEn IdEaS"') do call :dedup %%a "%%b"
endlocal&goto :EOF
:syntax
@echo **************************************
@echo Syntax: SortDup Input_File Output_File
@echo **************************************
endlocal&goto :EOF
:dedup
set curr_rec=%2
if [%curr_rec%]==[""] set curr_rec=$$$blankline$$$
set curr_rec=

%curr_rec%

set curr_rec=%curr_rec:

"=%
set curr_rec=%curr_rec:

=%
if not defined prev_rec goto :write
if "%curr_rec%" EQU "%prev_rec%" goto :EOF
:write
if "%curr_rec%" EQU "$$$blankline$$$" (
   echo.>>%outfile%
) else (
   echo>>%outfile% %curr_rec%
)
set prev_rec=%curr_rec%
goto :EOF

Borrowing Phil's findstr idea, I countered with the following amendment:

@echo off
setlocal
if {%1} EQU {} goto syntax
if not exist %1 goto syntax
set file=%1
set file="%file:"=%"
set work=%~pd1%~nx1.tmp
set work="%work:"=%"
set work=%work:\=%
sort %file% /O %work%
del /f /q %file%
for /f "Tokens=1* Delims=:" %%s in ('findstr /n /v /c:"dO nOt FiNd" %work%') do set record=###%%t###&call :output
REM if exist %work% del /q %work%
endlocal
goto :EOF
:syntax
@echo ***************************
@echo Syntax: SortDup Input_File
@echo ***************************
goto :EOF
:output
if not defined prev_rec goto :write
if "%record%" EQU "%prev_rec%" goto :EOF
:write
set prev_rec=%record%
set record=%record:###=%
if "%record%" EQU "" goto :blknul
if "%record%" GTR " " @echo>>%file% %record%&goto :EOF
:blknul
if defined bn_rec goto :EOF
set bn_rec=Y
@echo.>>%file%

NOTE: Neither script gracefully handles records that contain batch control characters, such as &, |, and >. Nor do they address multiple blank records of differing length, or null records. I elected to handle multiple blank records and null records by outputting a single blank record. If you don't want to output any blank records, remove the last line (@echo.>>%file%).
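
Both scripts number every record with findstr /n before handing it to for /f, because for /f skips empty lines. The following is a minimal sketch that isolates that one step. The file name sample.txt is hypothetical, and the search string is simply text assumed never to occur in the data, so that /v matches every line.

```batch
@echo off
setlocal
REM Sketch only: sample.txt is a hypothetical input file for this illustration.
if not exist sample.txt goto :EOF
REM findstr /n prefixes each line with "N:", so even an empty line reaches for /f
REM as "N:" and is no longer skipped. /v with a string assumed absent from the
REM data makes findstr return every line.
for /f "tokens=1* delims=:" %%a in ('findstr /n /v /c:"dO nOt FiNd" sample.txt') do (
    REM %%a holds the line number, %%b holds the record (empty for a blank line).
    echo Record %%a: [%%b]
)
endlocal
```

A blank record arrives with %%b empty, and leading spaces survive because the colon is the only delimiter. Since for /f treats consecutive delimiters as one, a record that itself begins with a colon would likely lose that leading colon.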

NOTE: Phil's script pipes the output of the sort command directly into the findstr command, while my script lets the sort write an output file (%work%). Phil's script runs faster on very small files, while mine is twice as fast when sorting larger files.
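
To make the difference concrete, the two ways of feeding findstr can be sketched side by side. The names in.txt and work.tmp are hypothetical and stand in for Input_File and the sort output file. This is an illustration of the two shapes, not a timing test.

```batch
@echo off
setlocal
REM Sketch only: in.txt and work.tmp are hypothetical names for this comparison.
if not exist in.txt goto :EOF
REM Piped form: sort hands its output straight to findstr.
type in.txt | sort | findstr /n /v /c:"dO nOt FiNd"
REM File form: sort writes work.tmp first, then findstr reads that file.
sort in.txt /O work.tmp
findstr /n /v /c:"dO nOt FiNd" work.tmp
REM Remove the temporary file created by this sketch.
del /q work.tmp
endlocal
```

Both forms should print the same numbered, sorted lines. Inside a for /f command string the pipe must be escaped as ^|, as in Phil's script.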

NOTE: Phil's script uses an Input_File and an Output_File, while I elected to return the results in the Input_File. I don't delete the sort output file, which I create in the same folder as Input_File. If you wish to delete it, remove the REM from REM if exist %work% del /q %work%.


