Find duplicate files from the CLI using checksums
Date: July 30, 2011
Linux:
find . -type f -print0 | xargs -0 md5sum | sort | uniq -w 32 --all-repeated=separate | sed -e 's/^[0-9a-f]* *//'
Mac OS X:
find . -type f -print0 | xargs -0 cksum | sort | awk '{if($1 == prevsum) {printf("----\n %s\n %s\n", prev, $0);} prev=$0; prevsum=$1;}'
Two files of different sizes cannot be identical, so there is no point in checksumming a file whose size differs from all the others. You only need to read the files that share their size with at least one other file:
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
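The size-first one-liner is easier to follow when split into stages. Below is a minimal sketch of the same idea as a shell function, assuming GNU findutils and coreutils (`-printf`, `xargs -r` and `uniq -w` are GNU extensions, so it will not run unchanged on Mac OS X). The name `find_dupes` is made up for illustration.

```shell
#!/bin/sh
# find_dupes (hypothetical helper): print groups of files under DIR that
# have the same MD5 sum, hashing only files whose size occurs more than once.
# Assumes GNU find, xargs, md5sum and uniq.
find_dupes() {
    dir=${1:-.}

    # Stage 1: list the size in bytes of every non-empty regular file,
    # then keep only the sizes that appear at least twice.
    find "$dir" -type f -not -empty -printf '%s\n' |
        sort -n | uniq -d |
        # Stage 2: for each repeated size, emit the matching file names
        # (NUL-terminated so odd characters in names survive).
        while read -r size; do
            find "$dir" -type f -size "${size}c" -print0
        done |
        # Stage 3: checksum only those candidates; -r skips md5sum
        # entirely when there are none.
        xargs -0 -r md5sum |
        # Stage 4: group lines whose first 32 characters (the hash) match.
        sort | uniq -w32 --all-repeated=separate
}
```

Calling `find_dupes ~/Pictures` (or any other directory) prints each set of duplicates as a block of `hash  filename` lines, with blank lines between blocks. Files with a unique size are never opened, which is where the time saving comes from on large trees.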