Date: Mon, 15 May 2023 01:43:38 -0700
From: David Christensen
To: questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <818813a2-8ab0-df54-3c59-0e1ba9ce743d@holgerdanske.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On 5/15/23 01:29, David Christensen wrote:
> On 5/14/23 15:48, Sysadmin Lists wrote:
>> #!/bin/sh -e
>> # remove or report duplicate files: $0 [-n] dir[1] dir[2] ... dir[n]
>> if [ "X$1" = "X-n" ]; then n=1; shift; fi
>>
>> echo "Building files list from: ${@}"
>>
>> find "${@}" -xdev -type f |
>> awk -v n=$n 'BEGIN { cmd = "stat -f %z "
>> for (x = 1; x < ARGC; x++) args = args ? args "|" ARGV[x] : ARGV[x];
>> ARGC = 0 }
>>       { files[$0] = match($0, "(" args ")/?") + RLENGTH }
>> END  { for (i in ARGV) sub("/*$", "/", ARGV[i])
>>         print "Comparing files ..."
>>         for (i = 1; i < x; i++) for (file in files) if (file ~ "^" ARGV[i]) {
>>             for (j = i +1; j < x; j++)
>>                 if (ARGV[j] substr(file, files[file]) in files) {
>>                     dup = ARGV[j] substr(file, files[file])
>>                     cmd "\"" file "\"" | getline fil_s; close(cmd "\"" file "\"")
>>                     cmd "\"" dup  "\"" | getline dup_s; close(cmd "\"" dup  "\"")
>>                     if (dup_s == fil_s) act("dup")
>>                     else act("diff") }
>>             delete files[file]
>>       } }
>> function act(message) {
>>      print ((message == "dup") ? "duplicates:" : "difference:"), dup, file
>>      if (!n) system("rm -vi \"" dup "\" > }' "${@}"

> Your script does not appear to do anything (?):
>
> 2023-05-15 01:19:00 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:33 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
>       26      24      82
>
> 2023-05-15 01:19:35 dpchrist@vf1 /vf1zpool1/dpchrist
> $ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo
> Building files list from: foo
> Comparing files ...
>
> 2023-05-15 01:19:48 dpchrist@vf1 /vf1zpool1/dpchrist
> $ ls -R1 foo | wc
>       26      24      82


It looks like your script only finds duplicates when the subpath is
identical (?):

2023-05-15 01:38:20 dpchrist@vf1 /vf1zpool1/dpchrist
$ cp -Ra foo bar

2023-05-15 01:39:18 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:39:41 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
      26      24      82

2023-05-15 01:39:44 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
      26      24      82

2023-05-15 01:40:10 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh -n foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
duplicates: bar/1/i-j foo/1/i-j
duplicates: bar/1/2/e foo/1/2/e
duplicates: bar/1/a-b foo/1/a-b
duplicates: bar/1/g foo/1/g
duplicates: bar/1/2/i foo/1/2/i
duplicates: bar/q-r foo/q-r
duplicates: bar/m-n foo/m-n
duplicates: bar/1/2/m foo/1/2/m
duplicates: bar/c foo/c
duplicates: bar/e-f foo/e-f
duplicates: bar/1/s foo/1/s
duplicates: bar/k foo/k
duplicates: bar/o foo/o
duplicates: bar/q foo/q
duplicates: bar/1/c-d foo/1/c-d
duplicates: bar/1/2/s-t foo/1/2/s-t
duplicates: bar/1/2/o-p foo/1/2/o-p
duplicates: bar/1/2/k-l foo/1/2/k-l
duplicates: bar/g-h foo/g-h

2023-05-15 01:40:22 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 foo | wc
      26      24      82

2023-05-15 01:40:29 dpchrist@vf1 /vf1zpool1/dpchrist
$ ls -R1 bar | wc
      26      24      82

2023-05-15 01:40:34 dpchrist@vf1 /vf1zpool1/dpchrist
$ sysadmin.lists_mailfence.com-20230514-1548-find-dupes.sh foo bar
Building files list from: foo bar
Comparing files ...
duplicates: bar/1/2/a foo/1/2/a
remove bar/1/2/a? n
duplicates: bar/1/i-j foo/1/i-j
remove bar/1/i-j? n
duplicates: bar/1/2/e foo/1/2/e
remove bar/1/2/e? n
duplicates: bar/1/a-b foo/1/a-b
remove bar/1/a-b? n
duplicates: bar/1/g foo/1/g
remove bar/1/g? n
duplicates: bar/1/2/i foo/1/2/i
remove bar/1/2/i? n
duplicates: bar/q-r foo/q-r
remove bar/q-r? n
duplicates: bar/m-n foo/m-n
remove bar/m-n? n
duplicates: bar/1/2/m foo/1/2/m
remove bar/1/2/m? n
duplicates: bar/c foo/c
remove bar/c? n
duplicates: bar/e-f foo/e-f
remove bar/e-f? n
duplicates: bar/1/s foo/1/s
remove bar/1/s? n
duplicates: bar/k foo/k
remove bar/k? n
duplicates: bar/o foo/o
remove bar/o? n
duplicates: bar/q foo/q
remove bar/q? n
duplicates: bar/1/c-d foo/1/c-d
remove bar/1/c-d? n
duplicates: bar/1/2/s-t foo/1/2/s-t
remove bar/1/2/s-t? n
duplicates: bar/1/2/o-p foo/1/2/o-p
remove bar/1/2/o-p? n
duplicates: bar/1/2/k-l foo/1/2/k-l
remove bar/1/2/k-l? n
duplicates: bar/g-h foo/g-h
remove bar/g-h? n


David
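
For comparison, here is a minimal sketch of a report-only variant that matches files by content rather than by subpath. The quoted script looks a file up under the same relative path in each later directory and then compares the sizes reported by "stat -f %z", so a copy that has been renamed or moved is never considered. The function name find_dupes is hypothetical, and the approach (cksum CRC plus byte count as the key) is my substitution, not the method used in the quoted script. It assumes path names contain no newlines, and a CRC match is only a strong hint: confirm with cmp(1) before deleting anything.

```sh
#!/bin/sh
# find_dupes (hypothetical name): report files under the given directories
# whose cksum CRC and byte count match an earlier file, regardless of subpath.
# Report-only sketch; it never removes anything.
find_dupes() {
    find "$@" -xdev -type f -exec cksum {} + |
    sort |
    awk '{
        key = $1 " " $2                 # CRC and size printed by cksum
        name = $0
        # Strip the two numeric fields; the rest of the line is the path.
        sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", name)
        if (key in first)
            print "duplicates:", name, first[key]
        else
            first[key] = name
    }'
}
```

Calling "find_dupes foo bar" prints one "duplicates:" line for each extra copy, naming the copy first and the file it matches second. Because of the sort, which file counts as the original depends on path order rather than on the order of the directory arguments.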