FINDDUPE: Duplicate file detector and eliminator

Version 1.0   Nov 25 2006

Finddupe is a tool for quickly detecting duplicate files on a hard drive under Windows. Duplicate files can be just reported, replaced with hard links, or deleted.

Finddupe has several possible uses:

Deleting duplicate files
When working through somebody else's photo or MP3 collection, this tool is useful for deleting duplicate files. Depending on how the media is organized, a collection can contain a lot of duplicates.

Freeing hard drive space
Sometimes it's intentional to have certain media in multiple places. By running finddupe and hard linking the identical files, you can keep the files in all those places while having only one physical copy on the hard drive.

Detecting changed files for backup
Finddupe is useful for detecting which files have changed and need backing up. Simply back up the media, then run finddupe to eliminate the files in the new copy that are already contained in a previous backup.

Finddupe is a command line program. If you don't know what a command prompt under Windows is, you may have to do a bit of learning before attempting to use this program. The command prompt is not DOS (the pre-Windows operating system), although it looks and acts a lot like it, and people unfamiliar with it often think that it is DOS. Please don't ask me for help if you don't know how to use command line based programs; learn about that first.

finddupe command line options

finddupe [options] [-ref] <filepat> [filepat]...

Example uses

If you have a previous backup in a directory tree at c:\prev_backup, and have just copied your work files to a directory tree at c:\new_backup, you can remove any files that are already in the previous backup with the following command:
    finddupe -del -ref c:\prev_backup c:\new_backup
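The rule behind '-ref' can be sketched in C roughly as follows. This is a minimal illustration, not finddupe's source: the names is_reference() and pick_victim() are hypothetical, the reference tree is matched with a simple case-sensitive prefix test (real Windows paths are case-insensitive), and deleting the second file when neither file is a reference file is an assumption. The only rule taken from the text above is that files in the reference tree are kept and only their duplicates elsewhere are removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Which file of a duplicate pair may be deleted (hypothetical type). */
enum victim { DELETE_NONE, DELETE_FIRST, DELETE_SECOND };

/* True if `path` lies inside directory `ref_dir`. Simple prefix test on
   Windows-style paths; case-sensitive, unlike real Windows paths. */
bool is_reference(const char *path, const char *ref_dir)
{
    assert(path != NULL && ref_dir != NULL);
    size_t n = strlen(ref_dir);
    /* Require a separator after the prefix, so a sibling directory such as
       "prev_backup2" does not count as being inside "prev_backup". */
    return strncmp(path, ref_dir, n) == 0 && (path[n] == '\\' || path[n] == '\0');
}

/* Reference files are only compared against, never deleted.
   When neither file is a reference file, deleting the second one
   is an assumption made for this sketch. */
enum victim pick_victim(const char *first, const char *second, const char *ref_dir)
{
    bool first_ref = is_reference(first, ref_dir);
    bool second_ref = is_reference(second, ref_dir);
    if (first_ref && second_ref) return DELETE_NONE;  /* both protected */
    if (first_ref) return DELETE_SECOND;
    if (second_ref) return DELETE_FIRST;
    return DELETE_SECOND;
}
```

With the backup example above, a file in c:\new_backup that matches one in c:\prev_backup is the one selected for deletion, whichever order the pair is found in.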


If you have a large photo collection on c:\photos, and you wish to replace duplicates with hard links, you can run:

    finddupe -hardlink c:\photos
Note that this only works on NTFS file systems (such as the C drive under Windows XP). It won't work on FAT file systems, like the ones used on most external hard disks or USB flash drives.


If you just want to know which files are common between two directory trees, you can run:

    finddupe -bat work.bat -del c:\media\** c:\media2\**
This will create the file "work.bat" containing file delete commands. The '-bat' option tells finddupe not to delete anything itself, but rather to write the actions to a batch file. This lets you review what finddupe would do before taking any action. The '**' tells it to process all files recursively.
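The '-bat' idea can be sketched in C: instead of deleting a duplicate, append a command for it to a batch file. The function name record_delete() and the line format (a 'rem' comment naming the matching file, then a 'del' command) are assumptions for illustration; the text does not show exactly what finddupe writes to the batch file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dry-run helper: record what would be deleted in a batch
   file instead of deleting it. The line format is an assumption.
   Returns 0 on success, -1 on a rejected path or write error. */
int record_delete(FILE *bat, const char *dupe, const char *original)
{
    assert(bat != NULL && dupe != NULL && original != NULL);
    /* A quote or line break inside a path would corrupt the batch file. */
    if (strpbrk(dupe, "\"\r\n") || strpbrk(original, "\"\r\n"))
        return -1;
    /* Comment line saying which file the duplicate matched. */
    if (fprintf(bat, "rem duplicate of \"%s\"\n", original) < 0)
        return -1;
    /* The action itself, left for the user to run after review. */
    if (fprintf(bat, "del \"%s\"\n", dupe) < 0)
        return -1;
    return 0;
}
```

After reviewing work.bat, you would run it from the command prompt to carry out the deletions, or edit out any lines you disagree with first.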

"Screenshot" of finddupe while running:

Command Prompt - finddupe
C:\>finddupe testdir\*
Duplicate: 'testdir\aab.txt'
With:      'testdir\aab.zzz'
Duplicate: 'testdir\aab.bak'
With:      'testdir\aac.txt'
Duplicate: 'testdir\dup1'
With:      'testdir\foo2'
Duplicate: 'testdir\foo'
With:      'testdir\makefile'
Duplicate: 'testdir\foo'
With:      'testdir\makefile.bak'
Duplicate: 'testdir\dup1'
With:      'testdir\myglob.bak'
Duplicate: 'testdir\longdiff.bak'
With:      'testdir\nadine.txt'
Files:    23285 kBytes in    23 files
Dupes:     6971 kBytes in     7 files

Compatibility

Finddupe has been tested on Windows 2000 and XP. Hard linking does not work on Windows 98 or ME.

Why I wrote this program

I wanted to eliminate some duplicate files on my Windows computer. Naturally, I searched the internet, but mostly I could only find fancy payware, whereas all I wanted was a really simple command line utility. So I eventually wrote one.

I also wrote it to be very fast, which helps a lot with large media files. Finddupe reads only the first 32 KB of a file and computes a hash of that. Only if that hash matches another file's does it read the entire files. I use it mostly on media such as JPEGs and MP3s, to find and eliminate any duplicates I may have.
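The two-stage check described above can be sketched in C for a single pair of files. FNV-1a is used only as a stand-in hash, since the text does not say which hash finddupe computes, and the function names are hypothetical. A real tool would hash every file once and run the full comparison only on files whose prefix hashes collide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PREFIX_BYTES (32 * 1024)  /* only the first 32 KB is hashed */

/* FNV-1a hash of at most the first PREFIX_BYTES of a file (stand-in hash).
   Returns 0 on success, -1 if the file cannot be opened. */
int hash_prefix(const char *path, uint32_t *out)
{
    static unsigned char buf[PREFIX_BYTES];  /* static: keeps 32 KB off the stack */
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    uint32_t h = 2166136261u;                /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= buf[i];
        h *= 16777619u;                      /* FNV prime */
    }
    *out = h;
    return 0;
}

/* Byte-for-byte comparison of two whole files.
   Returns 1 if identical, 0 if different, -1 if either cannot be opened. */
int files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = -1;
    if (fa != NULL && fb != NULL) {
        unsigned char ba[4096], bb[4096];
        result = 1;
        for (;;) {
            size_t na = fread(ba, 1, sizeof ba, fa);
            size_t nb = fread(bb, 1, sizeof bb, fb);
            if (na != nb || memcmp(ba, bb, na) != 0) { result = 0; break; }
            if (na == 0) break;              /* both files ended together */
        }
    }
    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return result;
}

/* Cheap test first; read the entire files only if the prefix hashes match. */
int is_duplicate(const char *a, const char *b)
{
    uint32_t ha, hb;
    assert(a != NULL && b != NULL);
    if (hash_prefix(a, &ha) != 0 || hash_prefix(b, &hb) != 0) return -1;
    if (ha != hb) return 0;                  /* prefixes differ: not duplicates */
    return files_identical(a, b);            /* prefixes agree: confirm */
}
```

Two large files that share their first 32 KB but differ later get equal prefix hashes, so only the full comparison tells them apart; for files that differ early, the full read is skipped entirely.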

Licence

Finddupe is totally free. Do whatever you like with it. You can integrate it into GPL or BSD style licensed programs if you like.

Bugs

Presently, finddupe does not check for an NTFS filesystem before attempting to hard link. If you run it on a non-NTFS filesystem, it will stop at the first failed hard link attempt, but not before deleting the file it meant to replace with a hard link.
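One way to avoid that failure mode is to create the new link before touching the duplicate. The sketch below uses the POSIX calls link() and rename() rather than the Windows API that finddupe runs on, and replace_with_hardlink() is a hypothetical helper: if the filesystem cannot make the link, link() fails and the duplicate is left in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: replace `dupe` with a hard link to `keep` without
   deleting `dupe` first. Returns 0 on success, -1 on failure (in which
   case `dupe` is left as it was). POSIX calls, not the Windows API. */
int replace_with_hardlink(const char *keep, const char *dupe)
{
    char tmp[4096];
    assert(keep != NULL && dupe != NULL);
    /* Temporary name in the same directory, so rename() stays on one filesystem. */
    int len = snprintf(tmp, sizeof tmp, "%s.fdtmp", dupe);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;
    /* Step 1: make the link under the temporary name. On a filesystem
       without hard link support this fails and nothing has been deleted. */
    if (link(keep, tmp) != 0)
        return -1;
    /* Step 2: atomically replace the duplicate with the new link. */
    if (rename(tmp, dupe) != 0) {
        unlink(tmp);  /* undo step 1 */
        return -1;
    }
    /* If `dupe` was already a link to `keep`, rename() does nothing and the
       temporary name survives; otherwise this unlink() fails harmlessly. */
    unlink(tmp);
    return 0;
}
```

The design choice is simply ordering: the only destructive step is the final rename, and it runs only after the link is known to exist.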

Downloads

Got questions? Email me: The address is in the PNG file so no robot can pick it up

Other handy free utilities by Matthias Wandel:

