Find duplicate files from the CLI using checksums

Linux:

find . -type f -print0 | xargs -0 md5sum | sort | uniq -w 32 --all-repeated=separate | sed -e 's/^[0-9a-f]*\ *//;'
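
A commented version of the Linux pipeline, wrapped in a shell function with a small demo. This is a sketch: the name find_md5_dups is hypothetical, GNU coreutils are assumed (uniq -w and --all-repeated are GNU extensions), and sort runs under LC_ALL=C so the ordering does not depend on the locale.

```shell
#!/bin/sh
# Sketch assuming GNU coreutils. find_md5_dups is a hypothetical name.
# Prints files under "$1" that share an MD5 sum, one blank-line-separated
# group per checksum.
find_md5_dups() {
    # -print0 / xargs -0 keep names with spaces intact; without -n1, xargs
    # hands many files to each md5sum call instead of one process per file.
    # uniq -w 32 compares only the 32 hex characters of the hash.
    # sed strips the hash plus md5sum's two-character separator ("  " or " *").
    find "$1" -type f -print0 \
        | xargs -0 -r md5sum \
        | LC_ALL=C sort \
        | uniq -w 32 --all-repeated=separate \
        | sed -e 's/^[0-9a-f]\{32\} [ *]//'
}

# Demo on a throwaway directory: a.txt and b.txt share content, c.txt differs.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'world\n' > "$dir/c.txt"
find_md5_dups "$dir"
```

Only a.txt and b.txt should be listed; c.txt has a unique checksum and is dropped by uniq.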

Mac OS X:

find . -type f -print0 | xargs -0 cksum | LC_ALL=C sort | awk '{key = $1 " " $2; if (key == prevkey) {printf("----\n %s\n %s\n", prev, $0);} prev = $0; prevkey = key;}'
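
cksum computes a CRC, which is not collision-resistant, so two different files can in principle share a checksum. A minimal sketch that reports only the candidate pairs cmp confirms are byte-for-byte identical; it uses only find, xargs, cksum, sort, awk and cmp, so it should run on both macOS and Linux. The function name find_cksum_dups is hypothetical, and paths are assumed to contain no tabs or newlines.

```shell
#!/bin/sh
# Sketch: pair up files whose cksum output (CRC and byte count) match, then
# keep only the pairs cmp confirms are identical.
# find_cksum_dups is a hypothetical name; paths must not contain tabs or newlines.
find_cksum_dups() {
    find "$1" -type f -print0 \
        | xargs -0 cksum \
        | LC_ALL=C sort \
        | awk '{
            key = $1 " " $2                     # CRC and size together
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)    # drop both fields, keep the path
            if (key == prevkey) printf("%s\t%s\n", prevname, name)
            prevkey = key; prevname = name
        }' \
        | while IFS="$(printf '\t')" read -r first second; do
            # cmp -s prints nothing and exits 0 only for identical contents
            if cmp -s "$first" "$second"; then
                printf '%s == %s\n' "$first" "$second"
            fi
        done
}

# Demo on a throwaway directory: a.txt and b.txt share content, c.txt differs.
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'world\n' > "$dir/c.txt"
find_cksum_dups "$dir"
```

With three or more identical files, each adjacent pair in sorted order is reported, so a group of n copies yields n - 1 lines.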
  • Duplicate files always have the same size, so there is no need to read (and checksum) a file whose size differs from all the others. Collect the sizes first and hash only the files that share one:

    find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
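
The size-first pipeline starts one extra find per duplicated size. A possible single-pass variant, sketched below under the assumption of GNU find and coreutils, has find print each size next to its path and lets awk keep only the paths whose size occurs more than once before anything is hashed. This is a swapped-in technique rather than the command quoted in the comment; the function name find_size_then_md5_dups is hypothetical, and paths are assumed to contain no tabs or newlines.

```shell
#!/bin/sh
# Sketch assuming GNU find/coreutils (find -printf, uniq -w, xargs -r).
# find_size_then_md5_dups is a hypothetical name; paths must not contain
# tabs or newlines.
find_size_then_md5_dups() {
    # One find pass: "size<TAB>path" for every non-empty regular file.
    # awk remembers all records, counts each size, and in END prints only
    # the paths whose size was seen more than once.
    find "$1" -type f -not -empty -printf '%s\t%p\n' \
        | awk 'BEGIN { FS = "\t" }
            { size[NR] = $1; path[NR] = $2; count[$1]++ }
            END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }' \
        | tr '\n' '\0' \
        | xargs -0 -r md5sum \
        | LC_ALL=C sort \
        | uniq -w 32 --all-repeated=separate
}

# Demo: a.txt and b.txt are identical; c.txt has the same size but different
# content (hashed, not reported); d.txt has a unique size (never hashed).
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a.txt"
printf 'hello\n' > "$dir/b.txt"
printf 'world\n' > "$dir/c.txt"
printf 'hi\n' > "$dir/d.txt"
find_size_then_md5_dups "$dir"
```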
