From: Paul Procacci
Date: Thu, 4 May 2023 18:32:04 -0400
Subject: Re: Tool to compare directories and delete duplicate files from one directory
To: Kaya Saman
Cc: freebsd-questions@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-questions


On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:


On 5/4/23 17:29, Paul Procacci wrote:


On Thu, May 4, 2023 at 11:53 AM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
Hi,


I'm wondering if anyone knows of a tool like diff or so that can also
delete files based on name and size from either left/right or
source/destination directory?


Basically what I have done is performed an rsync without using the
--remove-source-files option onto a newly bought and created disk pool
(yes zpool) onto which I am trying to consolidate my data - as it's currently
spread out over multiple pools with the same folder name.


The issue I am facing mainly is that if I perform another rsync and use the
--remove-source-files option, rsync will delete files based on name,
while there are some files that have the same name but not the same size and
I would like to retain these files.


Right now I have looked at many different options in both rsync and
other tools but found nothing suitable. I even tested using a few test
dirs and files that I put into /tmp and whatever I tried, the files of
different size either got transferred or deleted.


What would be a good way to approach this problem?


Even if I create some kind of shell script and use diff, I think it will
only compare names and not file sizes.


I'm really lost here....


Regards,


Kaya




It sounds like you want fdupes. It's in the ports tree.

~Paul

--
__________________

:(){ :|:& };:



I tried fdupes and installed it a while back. For me it felt like it only works on a single directory.


My dir structure is that I have:


/dir <- main directory where everything has now been rsync'ed to

/dir_1 <- old directory with partial content

/dir_2 <- more partial content

/dir_3 <- more partial content


The key thing here is that I need to compare:


/dir_(x) with /dir


if the files are different sizes in /dir_(x) then leave them, otherwise delete if both name and file size are the same.


Then a tiny shell script does the job, assuming your file names contain no spaces or other weird characters:

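#!/bin/sh

# a is the consolidated directory; b, c and d are the old partial copies.
for i in b c d;
do
  ls "$i"/ | while read file;
  do
    # Not in a yet: copy it over and move on to the next file.
    [ ! -f "a/$file" ] && cp "$i/$file" "a/$file" && continue

    # Same name in both places: compare sizes in bytes (FreeBSD stat syntax).
    ref=`stat -f '%z' "a/$file"`
    src=`stat -f '%z' "$i/$file"`
    # Delete the old copy only when the sizes match.
    [ "$ref" -eq "$src" ] && rm -f "$i/$file"

  done
done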
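
The script above assumes file names without spaces. For names that do contain them, a minimal sketch of the same idea could look like the following. It is an illustration rather than part of the original script: dedupe_dir is a made-up helper name, the loop uses a glob instead of ls, and the size comes from wc -c instead of stat -f '%z' purely so the sketch also runs outside FreeBSD.

```shell
#!/bin/sh
# Sketch only: dedupe_dir is a hypothetical helper, not from the original mail.
# Usage: dedupe_dir MAIN OLD
# Copies files missing from MAIN, then removes files from OLD whose name and
# size both match the copy in MAIN. Non-recursive, regular files only, and
# (like plain ls) it skips dotfiles.
dedupe_dir() {
  main=$1
  old=$2
  for path in "$old"/*; do
    [ -f "$path" ] || continue            # skip directories and an unmatched glob
    name=${path##*/}                      # strip the directory part
    if [ ! -e "$main/$name" ]; then
      cp -p "$path" "$main/$name"         # new file: copy it, keep the source
      continue
    fi
    [ -f "$main/$name" ] || continue      # name exists in MAIN but is not a file
    # wc -c prints the size in bytes; $(( )) strips any padding spaces.
    ref=$(( $(wc -c < "$main/$name") ))
    src=$(( $(wc -c < "$path") ))
    if [ "$ref" -eq "$src" ]; then
      rm -f "$path"                       # same name and size: drop the old copy
    fi
  done
  return 0
}
```

With the layout described earlier in the thread, this would be called once per old directory, for example dedupe_dir /dir /dir_1. Files with the same name but a different size are left alone in both places.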

Change paths accordingly and back up your stuff. ;)
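
On top of a backup, it may help to print what a run would remove before deleting anything. The sketch below is an illustration only: list_dupes is a made-up helper name, and it uses cmp -s to check that the contents are identical, which is stricter than the name-and-size rule asked for in the thread.

```shell
#!/bin/sh
# Sketch only: list_dupes is a hypothetical helper, not from the original mail.
# Usage: list_dupes MAIN OLD
# Prints each regular file in OLD that has a byte-identical file of the same
# name in MAIN. Nothing is copied or deleted.
list_dupes() {
  main=$1
  old=$2
  for path in "$old"/*; do
    [ -f "$path" ] || continue            # skip directories and an unmatched glob
    name=${path##*/}
    [ -f "$main/$name" ] || continue      # no counterpart in MAIN
    # cmp -s is silent and exits 0 only when the two files are identical.
    if cmp -s "$main/$name" "$path"; then
      printf '%s\n' "$path"
    fi
  done
  return 0
}
```

Running list_dupes /dir /dir_1 and reading the output first is a cheap check that the paths are the right way round.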

~Paul

--
__________________

:(){ :|:& };: