Re: Tool to compare directories and delete duplicate files from one directory
- In reply to: David Christensen : "Re: Tool to compare directories and delete duplicate files from one directory"
Date: Mon, 15 May 2023 08:43:38 UTC
On 5/15/23 01:29, David Christensen wrote:
> On 5/14/23 15:48, Sysadmin Lists wrote:
>> #!/bin/sh -e
>> # remove or report duplicate files: $0 [-n] dir[1] dir[2] ... dir[n]
>> if [ "X$1" = "X-n" ]; then n=1; shift; fi
>>
>> echo "Building files list from: ${@}"
>>
>> find "${@}" -xdev -type f |
>> awk -v n=$n 'BEGIN { cmd = "stat -f %z "
>> for (x = 1; x < ARGC; x++) args = args ? args "|" ARGV[x] : ARGV[x];
>> ARGC = 0 }
>> { files[$0] = match($0, "(" args ")/?") + RLENGTH }
>> END { for (i in ARGV) sub("/*$", "/", ARGV[i])
>> print "Comparing files ..."
>> for (i = 1; i < x; i++) for (file in files) if (file ~ "^"
>> ARGV[i]) {
>> for (j = i +1; j < x; j++)
>> if (ARGV[j] substr(file, files[file]) in files) {
>> dup = ARGV[j] substr(file, files[file])
>> cmd "\"" file "\"" | getline fil_s; close(cmd "\""
>> file "\"")
>> cmd "\"" dup "\"" | getline dup_s; close(cmd "\""
>> dup "\"")
>> if (dup_s == fil_s) act("dup")
>> else act("diff") }
>> delete files[file]
>> } }
>> function act(message) {
>> print ((message == "dup") ? "duplicates:" : "difference:"), dup,
>> file
>> if (!n) system("rm -vi \"" dup "\" </dev/tty")
>> }' "${@}"
>
> Your script does not appear to do anything (?):
>
> 2023-05-15 01:19:00 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:33 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
> 26 24 82
>
> 2023-05-15 01:19:35 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:48 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
> 26 24 82

It looks like your script only finds duplicates when the subpath is
identical (?):

2023-05-15 01:38:20 dpchrist@vf1 /vf1zpool1/dpchrist
$ cp -Ra foo bar

2023-05-15 01:39:18 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:39:41 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
26 24 82

2023-05-15 01:39:44 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
26 24 82

2023-05-15 01:40:10 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:40:22 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
26 24 82

2023-05-15 01:40:29 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
26 24 82

2023-05-15 01:40:34 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
remove bar/1/2/a? n
duplicates: bar/1/i-j foo/1/i-j
remove bar/1/i-j? n
duplicates: bar/1/2/e foo/1/2/e
remove bar/1/2/e? n
duplicates: bar/1/a-b foo/1/a-b
remove bar/1/a-b? n
duplicates: bar/1/g foo/1/g
remove bar/1/g? n
duplicates: bar/1/2/i foo/1/2/i
remove bar/1/2/i? n
duplicates: bar/q-r foo/q-r
remove bar/q-r? n
duplicates: bar/m-n foo/m-n
remove bar/m-n? n
duplicates: bar/1/2/m foo/1/2/m
remove bar/1/2/m? n
duplicates: bar/c foo/c
remove bar/c? n
duplicates: bar/e-f foo/e-f
remove bar/e-f? n
duplicates: bar/1/s foo/1/s
remove bar/1/s? n
duplicates: bar/k foo/k
remove bar/k? n
duplicates: bar/o foo/o
remove bar/o? n
duplicates: bar/q foo/q
remove bar/q? n
duplicates: bar/1/c-d foo/1/c-d
remove bar/1/c-d? n
duplicates: bar/1/2/s-t foo/1/2/s-t
remove bar/1/2/s-t? n
duplicates: bar/1/2/o-p foo/1/2/o-p
remove bar/1/2/o-p? n
duplicates: bar/1/2/k-l foo/1/2/k-l
remove bar/1/2/k-l? n
duplicates: bar/g-h foo/g-h
remove bar/g-h? n

David
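
For comparison with the subpath-only matching seen in the runs above, here is a minimal sketch that groups files by content instead of by relative path. It is not the quoted script and not anyone's proposed fix; the function name find_content_dupes is hypothetical. It assumes POSIX find, cksum, sort and awk, and file names without newlines. It only reports candidates and removes nothing; a matching checksum and size is not proof of identical content, so confirm with cmp before deleting.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not the script quoted in this thread):
# report files whose cksum CRC and size match, wherever they sit in the trees.
# Assumes file names contain no newlines. Nothing is deleted.
find_content_dupes() {
    # cksum prints "<crc> <size> <path>" for each file operand.
    find "$@" -xdev -type f -exec cksum {} + |
    LC_ALL=C sort |                       # stable, locale-independent order
    awk '{
        key = $1 " " $2                   # CRC plus size identifies a group
        name = $0
        sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", name)   # remainder is the path
        if (key in first)
            print "duplicates:", name, first[key]
        else
            first[key] = name             # first path seen for this content
    }'
}
```

Called as "find_content_dupes foo bar", it prints one "duplicates:" line per extra file in a group, with the later path first and the first path seen for that content second. Any pair it reports can be checked with "cmp -s" before anything is removed.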