From: Paul Procacci
Date: Thu, 4 May 2023 23:36:19 -0400
Subject: Re: Tool to compare directories and delete duplicate files from one directory
To: Kaya Saman
Cc: freebsd-questions@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-questions
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000bd96d105fae9fe90"

--000000000000bd96d105fae9fe90
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


On Thu, May 4, 2023 at 11:20 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:


On 5/5/23 04:01, Paul Procacci wrote:
On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:


On 5/5/23 03:08, Paul Procacci wrote:
There are multiple reasons why it may not work.  My guess is that it's because of characters that could be showing up within the filenames and whatnot.

This can be solved with an interpreted language that's a bit more forgiving.
Take the following Perl script.  It does the same thing as the shell script (almost).  It renames the source file instead of making a copy of it.

run as:  ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x

###################################################################################
#!/usr/bin/env perl

use strict;
use warnings;

sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "$0 dir_base dir\n";
  print $msg;
  exit($ret);
}

msgDie(1) unless(scalar @ARGV eq 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

opendir(my $dh, $dir) or msgDie("Unable to open directory: $dir\n");
while(readdir $dh)
{
  next if($_ eq '.' || $_ eq '..');
  if( ! -f "$base/$_" ){
    rename("$dir/$_", "$base/$_");
    next;
  }

  my ($ref) = (stat("$base/$_"))[7];
  my ($src) = (stat("$dir/$_"))[7];
  unlink("$dir/$_") if($ref == $src);
}
###################################################################################

~Paul



This didn't seem to work :-(


What exactly happened is this:


I created a set of test directories in /tmp


So, I have /tmp/test1 and /tmp/test2


To mimic the structure of the directories I intend to run this on, I did this:


create a subdir called: dupdir in /tmp/test1 and /tmp/test2


/tmp/test2/dupdir contains these files: dup and dup1


/tmp/test1/dupdir contains a modified 'dup' file but a copied dup1 file.


However*, now things get interesting as dup from test1 contains "1234567" and dup from test2 contains "111" <- this is to simulate the file size difference.





Worked for me!  Regardless.  Use rsync then.

rsync --ignore-existing --remove-source-files /src /dest
This would at the very least move files that don't yet exist in the dest over from the source AND remove those source files AFTER the transfer happens.
You'll be 1/2 way there doing that.  What you'll be left with are files that exist in BOTH src AND dest.

~Paul


Paul, I think we've got wires crossed....


I *have* already performed the rsync. Apologies if I wasn't clear!


The problem I am faced with is that the destination directory is already populated with the information from 3 source directories.


I need to remove the sync'ed files in the source directories and leave files that match in name but are of different sizes.


The problem is I can't use rsync again for this as there aren't any options to simply compare files based on size. I can't use the --existing option as the files exist in both directories....


This is the dilemma I am facing:


ls -l /merged_dir/folder/

234904506 - file 'a'


ls -l /source_dir/folder/

1080918146 - file 'a'


so in this case file 'a' is in both directories with the same name but different size. I need to keep both versions. However, *if* they were the same size then remove the file in the source_dir.....


That's all.. I don't need to transfer anything or copy anything at all... just compare and remove files of same name and size.


Hopefully I am explaining better and things are more clear? Again I apologize for the confusion  :-(


You're at least partially right that I was confused, because comparing by name and by size makes no sense to me.  Change a single byte in one file and you still have the same name and the same size, but the contents are different!  ;)
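To make that caveat concrete, here is a minimal sh sketch (an illustration only, not something that was run in this thread). It assumes nothing beyond POSIX wc and cmp, and the helper names same_size and same_bytes are made up for the example.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# same_size A B: succeed if both are regular files with the same byte count.
same_size() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    # $(( ... )) strips the leading blanks that BSD wc prints.
    a=$(( $(wc -c < "$1") ))
    b=$(( $(wc -c < "$2") ))
    [ "$a" -eq "$b" ]
}

# same_bytes A B: succeed only if the contents are byte-for-byte identical.
same_bytes() {
    cmp -s "$1" "$2"
}
```

Given one file containing 1111 and another containing 1211, same_size succeeds while same_bytes fails, which is the case a name-and-size check cannot see.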
Is the below output what you're expecting to happen?

% mkdir a b
% echo 1111 > a/test.txt
% echo 1111 > b/test.txt
% ./test.pl a b
% ls -l a b
a:
total 5
-rw-r--r--  1 pprocacci  pprocacci  5 May  5 03:26 test.txt

b:
total 0

----------

The Perl script below is what was run above.
1) Find a file in directory "b".
2) Go to the top of the loop if the file doesn't exist in directory "a".
3) Go to the top of the loop if the file sizes do not match.
4) Unlink the file from "b" if neither 2 nor 3 skipped it.

#################################################
#!/usr/bin/env perl

use strict;
use warnings;

# Print a message (the usage line by default) and exit with status $ret.
sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "$0 dir_base dir\n";
  print $msg;
  exit($ret);
}

# Exactly two arguments are required; compare the count numerically.
msgDie(1) unless(scalar @ARGV == 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

# Pass the exit status first so the message isn't mistaken for it.
opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
while(readdir $dh)
{
  next if($_ eq '.' || $_ eq '..');
  next if(! -f "$dir/$_");    # only regular files in the source are candidates
  next if(! -f "$base/$_");   # no file of that name in base: keep it

  # Element 7 of stat() is the size in bytes.
  my ($ref) = (stat("$base/$_"))[7];
  my ($src) = (stat("$dir/$_"))[7];
  unlink("$dir/$_") if($ref == $src);
}
closedir($dh);
#################################################
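If byte-for-byte equality is what is really wanted, the same loop can be sketched in plain sh with cmp in place of the size check. This is a stricter variant, not the script that was run above: the function name prune_dups and the optional -n (dry run) argument are hypothetical, and like the Perl script it only looks at the top level of the directory.

```shell
#!/bin/sh
# prune_dups BASE DIR [-n]
# Hypothetical helper: remove regular files from DIR that also exist in BASE
# under the same name with identical contents. Not recursive.
prune_dups() {
    base=${1:-}
    dir=${2:-}
    dry_run=${3:-}

    if [ ! -d "$base" ] || [ ! -d "$dir" ]; then
        echo "usage: prune_dups dir_base dir [-n]" >&2
        return 1
    fi

    # Refuse to compare a directory with itself; every file would match.
    if [ "$(cd "$base" && pwd -P)" = "$(cd "$dir" && pwd -P)" ]; then
        echo "prune_dups: both arguments are the same directory" >&2
        return 1
    fi

    # The three patterns cover ordinary names and dotfiles, but not . or ..
    for src in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        [ -f "$src" ] || continue                # directories, unmatched globs
        name=${src##*/}
        [ -f "$base/$name" ] || continue         # only in DIR: keep it
        cmp -s "$base/$name" "$src" || continue  # contents differ: keep both
        if [ "$dry_run" = "-n" ]; then
            echo "would remove $src"
        else
            rm -- "$src"
        fi
    done
    return 0
}
```

Running it once with -n as the third argument prints what would be removed without touching anything, which seems a sensible first step on real data.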

--
__________________

:(){ :|:& };:
--000000000000bd96d105fae9fe90--