From: Kaya Saman <kayasaman@optiplex-networks.com>
Date: Fri, 5 May 2023 00:53:14 +0100
To: Paul Procacci
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
List-Archive: https://lists.freebsd.org/archives/freebsd-questions


On 5/4/23 23:32, Paul Procacci wrote:


On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:


On 5/4/23 17:29, Paul Procacci wrote:


On Thu, May 4, 2023 at 11:53 AM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
Hi,


I'm wondering if anyone knows of a tool like diff or so that can also
delete files based on name and size from either left/right or
source/destination directory?


Basically what I have done is performed an rsync without using the
--remove-source-files option onto a newly bought and created disk pool
(yes, a zpool) on which I am trying to consolidate my data, as it's currently
spread out over multiple pools with the same folder name.


The main issue I am facing is that if I perform another rsync and use the
--remove-source-files option, rsync will delete files based on name,
while there are some files that have the same name but not the same size, and
I would like to retain these files.


Right now I have looked at many different options in both rsync and
other tools but found nothing suitable. I even tested using a few test
dirs and files that I put into /tmp and whatever I tried, the files of
different size either got transferred or deleted.


What would be a good way to approach this problem?


Even if I create some kind of shell script and use diff, I think it will
only compare names and not file sizes.


I'm really lost here....


Regards,


Kaya




It sounds like you want fdupes.  It's in the ports tree.

~Paul

--
__________________

:(){ :|:& };:



I tried fdupes and installed it a while back. For me it felt like it only works on a single directory.


My dir structure is as follows:


/dir <- main directory where everything has now been rsync'ed to

/dir_1 <- old directory with partial content

/dir_2 <- more partial content

/dir_3 <- more partial content


The key thing here is that I need to compare:


/dir_(x) with /dir


If a file in /dir_(x) has a different size from the file of the same name in /dir, leave it; otherwise delete it when both name and file size are the same.


Then a tiny shell script does the job, assuming your file names don't contain spaces or other weird characters:

#!/bin/sh
# a = merged directory; b, c and d = old directories (adjust the paths)

for i in b c d;
do
  ls $i/ | while read file;
  do
    [ ! -f a/$file ] && cp $i/$file a/$file && continue

    # Same name and same size: remove the old copy.
    ref=`stat -f '%z' a/$file`
    src=`stat -f '%z' $i/$file`
    [ $ref -eq $src ] && rm -f $i/$file

  done
done

Change paths accordingly and back up your stuff. ;)
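
In the spirit of backing things up first, a dry run that only prints what would be removed may be worth doing before any real deletion. The following is a minimal sketch, not part of Paul's script: the helper names size_of and would_remove are hypothetical, and wc -c is used instead of stat -f '%z' only so the sketch also runs outside FreeBSD. It looks at top-level regular files only and skips dotfiles, since * does not match them.

```shell
#!/bin/sh
# Dry run: list files in an old directory that have a same-named,
# same-sized regular file in the reference directory. Nothing is deleted.

# Print the size of a file in bytes (assumption: wc -c is used for
# portability; on FreeBSD, stat -f '%z' reports the same number).
size_of() {
  wc -c < "$1" | tr -d ' '
}

# would_remove REF_DIR OLD_DIR   (hypothetical helper name)
would_remove() {
  ref_dir=$1
  old_dir=$2
  for f in "$old_dir"/*; do
    [ -f "$f" ] || continue              # skip subdirectories and unmatched globs
    name=${f##*/}                        # strip the directory part
    [ -f "$ref_dir/$name" ] || continue  # no counterpart: keep the old file
    if [ "$(size_of "$f")" -eq "$(size_of "$ref_dir/$name")" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Calling would_remove a b prints one path per line for every file in b that the size rule would delete, which makes it easy to review the list before running anything destructive.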

~Paul

--
__________________
:(){ :|:& };:


Thanks Paul,


I should be able to work with this. There are actually spaces and weird characters in the file names, so I assume quoting the variable, as in "$file", should allow for that?
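
On the quoting question: double-quoting every expansion is the usual way to keep names with spaces intact in sh. A small sketch of the size check for a single pair of files, with all expansions quoted, might look like this. The helper names size_of and remove_if_same_size are made up for illustration, and wc -c stands in for stat -f '%z' so the sketch is not FreeBSD-specific.

```shell
#!/bin/sh
# Compare one old file against its reference copy and remove the old file
# only when both sizes match. Every expansion is double-quoted, so spaces
# and most odd characters in the names are passed through unchanged.

size_of() {
  wc -c < "$1" | tr -d ' '
}

# remove_if_same_size REF_FILE OLD_FILE   (hypothetical helper name)
remove_if_same_size() {
  [ -f "$1" ] && [ -f "$2" ] || return 1
  if [ "$(size_of "$1")" -eq "$(size_of "$2")" ]; then
    rm -f -- "$2"     # "--" keeps a leading "-" in the name from being read as an option
  fi
}
```

Without the quotes, a name such as "my file.txt" would be split into two words before stat or rm ever saw it.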


I don't think I need the line after the 'do' statement, do I? From what I understand, it copies the file from directory i to directory a. As I explained initially, the files have already been rsync'ed, so I just need to compare and delete accordingly.
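
If the copy step is not wanted, one option is to replace it with a plain existence test, so that a file missing from the merged directory is simply left where it is. The sketch below assumes that behaviour is what is intended; compare_only and size_of are hypothetical names, a glob replaces the ls pipeline so that names with spaces survive, and wc -c again stands in for stat -f '%z'.

```shell
#!/bin/sh
# Compare-and-delete without any copying: files that have no counterpart
# in the merged directory are left untouched.

size_of() {
  wc -c < "$1" | tr -d ' '
}

# compare_only MAIN_DIR OLD_DIR...   (hypothetical helper name)
compare_only() {
  main=$1
  shift
  for i in "$@"; do
    for path in "$i"/*; do
      file=${path##*/}
      [ -f "$path" ] || continue          # not a regular file (or empty dir)
      [ -f "$main/$file" ] || continue    # takes the place of the cp line
      if [ "$(size_of "$main/$file")" -eq "$(size_of "$path")" ]; then
        rm -f -- "$path"
      fi
    done
  done
}
```

A call such as compare_only /dir /dir_1 /dir_2 /dir_3 would then cover all three old directories in one pass, top level only.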

Each rsync run took around a week to complete. Currently zfs list shows around 12TB of usage for /dir, the merged directory, and that's with compression enabled.


A quick Google shows that I can use something like this:

search_dir=/the/path/to/base/dir
for entry in "$search_dir"/*
do
  echo "$entry"
done


That lists the files in the directory, though this might be Bash and not Csh.
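
For what it is worth, the for ... in ... do ... done loop above is plain Bourne shell syntax, so it should behave the same under /bin/sh as under Bash; csh and tcsh use a different foreach construct. A slightly more defensive sketch follows, with the hypothetical function name list_entries. The extra test is there because an unmatched glob is left in place as a literal string when the directory is empty.

```shell
#!/bin/sh
# List the entries of a directory with a glob, one per line.

# list_entries DIR   (hypothetical helper name)
list_entries() {
  search_dir=$1
  for entry in "$search_dir"/*; do
    # In an empty directory the pattern stays unexpanded; skip that case,
    # but still report dangling symlinks.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    printf '%s\n' "$entry"
  done
}
```

Because "$entry" is quoted, names containing spaces come through as a single line each.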


Otherwise, clunkily (my scripting style is pretty rubbish and inefficient), I could do something like this (it probably won't work!):


#!/bin/sh

# fb = file base: the copy in the merged directory
# fm = file merge: the old copy that rsync has already merged, unless the size was different

dir_base=/dir
dir_merge=/dir_1

for fm in "$dir_merge"/*
do
  fb="$dir_base/${fm##*/}"
  # Only compare regular files that exist on both sides.
  [ -f "$fm" ] && [ -f "$fb" ] || continue

  ref=`stat -f '%z' "$fb"`
  src=`stat -f '%z' "$fm"`
  [ "$ref" -eq "$src" ] && rm -f "$fm"
done
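
Since the directories were filled by rsync, they may well contain subdirectories, which a "$dir"/* loop does not descend into. A possible recursive sketch is shown below, under the assumption that each old directory mirrors the layout of the merged one and that directory arguments are given without a trailing slash. The name prune_tree is hypothetical, find's -exec ... {} + is used so that unusual file names are handed over intact, and wc -c replaces stat -f '%z' for portability.

```shell
#!/bin/sh
# Walk OLD_DIR recursively and delete every regular file whose counterpart
# at the same relative path under MAIN_DIR has the same size.

# prune_tree MAIN_DIR OLD_DIR   (hypothetical helper name)
prune_tree() {
  find "$2" -type f -exec sh -c '
    main=$1
    old=$2
    shift 2
    for path do
      rel=${path#"$old"/}          # path relative to the old directory
      ref=$main/$rel
      [ -f "$ref" ] || continue    # no counterpart: keep the old file
      a=$(wc -c < "$path" | tr -d " ")
      b=$(wc -c < "$ref" | tr -d " ")
      if [ "$a" -eq "$b" ]; then
        rm -f -- "$path"
      fi
    done
  ' sh "$1" "$2" {} +
}
```

Swapping the rm line for a printf of "$path" turns this into a dry run; for example, prune_tree /dir /dir_1 would then only report the candidates in /dir_1.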



Regards,


Kaya
