If you are like me, you create a lot of backups. My most common backup routine is cutting and pasting every recognizable file I may have created onto an external drive before routinely reinstalling Windows. This often leaves me with many copies of the same file spread across multiple drives and folders. I needed a way to scan for all duplicate files so that I could finally consolidate my backups down to unique files only.
I decided to create a simple script that would scan two given top-level directories and check each file in one directory for a duplicate in the other. If a duplicate was found, the names and locations of both files would be written to a CSV file that could then be easily read and sorted in Excel.
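A minimal sketch of that approach, assuming Python and duplicate detection by content hash (the SHA-256 hashing and the helper names file_hash, hash_tree, and find_duplicates are illustrative, not necessarily the exact script), looks something like this:

```python
import csv
import hashlib
import os
import sys


def file_hash(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(top):
    """Map content hash -> list of file paths under a top-level directory."""
    hashes = {}
    for root, _, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                hashes.setdefault(file_hash(path), []).append(path)
            except OSError:
                pass  # skip unreadable files
    return hashes


def find_duplicates(dir_a, dir_b, log_path="Log.csv"):
    """Compare every file in dir_a against dir_b and log matches to a CSV."""
    hashes_b = hash_tree(dir_b)
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["File A", "File B"])
        for root, _, files in os.walk(dir_a):
            for name in files:
                path_a = os.path.join(root, name)
                try:
                    digest = file_hash(path_a)
                except OSError:
                    continue  # skip unreadable files
                for path_b in hashes_b.get(digest, []):
                    writer.writerow([path_a, path_b])


if __name__ == "__main__":
    # Usage: python find_duplicates.py <first directory> <second directory>
    find_duplicates(sys.argv[1], sys.argv[2])
```

Hashing the second directory once up front means each file in the first directory only needs a dictionary lookup instead of a full re-scan, and comparing by content rather than by name catches duplicates that have been renamed.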
When you run the above code, you will get a Log.csv file like below if any duplicates are found.
As you can see from the example above, I have many copies of the same file, sometimes even under different names. This would be a nightmare to sort out manually. The script is currently single-threaded and will take some time to run on a large dataset. I plan to upgrade it to use multithreading later on, and to follow up with a secondary script that reads this CSV file and either moves or deletes the duplicates.