From: Kaya Saman <kayasaman@optiplex-networks.com>
Date: Fri, 5 May 2023 04:20:23 +0100
To: Paul Procacci
Cc: freebsd-questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
List-Archive: https://lists.freebsd.org/archives/freebsd-questions


On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
>
>> On 5/5/23 03:08, Paul Procacci wrote:
>>> There are multiple reasons why it may not work. My guess is the characters that could be showing up within the filenames and whatnot.
>>>
>>> This can be solved with an interpreted language that's a bit more forgiving.
>>> Take the following Perl script. It does the same thing as the shell script (almost): it renames the source file instead of making a copy of it.
>>>
>>> Run as: ./test.pl /absolute/path/to/master_dir /absolute/path/to/dir_x
>>>
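>>> ##################################################################################
>>> #!/usr/bin/env perl
>>>
>>> use strict;
>>> use warnings;
>>>
>>> # Print a message (usage text by default) and exit with the given status.
>>> sub msgDie
>>> {
>>>   my ($ret) = shift;
>>>   my ($msg) = shift // "$0 dir_base dir\n";
>>>   print $msg;
>>>   exit($ret);
>>> }
>>>
>>> # Exactly two arguments are required (numeric ==, not string eq).
>>> msgDie(1) unless(scalar @ARGV == 2);
>>>
>>> my $base = $ARGV[0];
>>> my $dir  = $ARGV[1];
>>>
>>> msgDie(1, "base directory doesn't exist\n") unless -d $base;
>>> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>>>
>>> # Only the top level of $dir is scanned; subdirectories are not descended into.
>>> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
>>> while(readdir $dh)
>>> {
>>>   next if($_ eq '.' || $_ eq '..');
>>>   # Not a plain file in $base: move it there. A subdirectory is not -f, so it
>>>   # also lands here; rename() onto an existing non-empty directory fails, and
>>>   # the result is not checked.
>>>   if( ! -f "$base/$_" ){
>>>     rename("$dir/$_", "$base/$_");
>>>     next;
>>>   }
>>>
>>>   # Present in both: delete the source copy if the sizes (stat field 7) match.
>>>   my ($ref) = (stat("$base/$_"))[7];
>>>   my ($src) = (stat("$dir/$_"))[7];
>>>   unlink("$dir/$_") if($ref == $src);
>>> }
>>> closedir($dh);
>>> ##################################################################################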
>>>
>>> ~Paul
>>
>> This didn't seem to work :-(
>>
>> Here is exactly what happened:
>>
>> I created a set of test directories in /tmp, so I have /tmp/test1 and /tmp/test2.
>>
>> To mimic the structure of the directories I intend to run this on, I did this:
>>
>> I created a subdir called dupdir in both /tmp/test1 and /tmp/test2.
>>
>> /tmp/test2/dupdir contains these files: dup and dup1
>>
>> /tmp/test1/dupdir contains a modified 'dup' file and a straight copy of dup1.
>>
>> Now things get interesting: dup in test1 contains "1234567" and dup in test2 contains "111" <- this is to simulate the file size difference.
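That test layout can be rebuilt with a short Perl sketch, which is handy for trying any of the scripts in this thread. It rests on a few assumptions. The contents are written without a trailing newline, so dup is 7 bytes in test1 and 3 bytes in test2. The contents of dup1 were not stated, so "same\n" is a placeholder. `write_file` and `build_fixture` are hypothetical names.

```perl
#!/usr/bin/env perl
# Sketch: recreate the test1/test2 layout described above under a root dir.
# Assumptions: contents are written without a trailing newline, and "same\n"
# is a placeholder for dup1, whose contents were not stated.
use strict;
use warnings;
use File::Path qw(make_path);

# Write $content to $path, replacing any existing file.
sub write_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "cannot write $path: $!\n";
    print {$fh} $content;
    close($fh) or die "cannot close $path: $!\n";
}

sub build_fixture {
    my ($root) = @_;
    make_path("$root/test1/dupdir", "$root/test2/dupdir");

    # 'dup' differs between the trees: 7 bytes vs 3 bytes.
    write_file("$root/test1/dupdir/dup", "1234567");
    write_file("$root/test2/dupdir/dup", "111");

    # 'dup1' is an identical copy in both trees.
    write_file("$root/test1/dupdir/dup1", "same\n");
    write_file("$root/test2/dupdir/dup1", "same\n");
}

# Usage: perl mkfixture.pl /tmp   (does nothing without an argument)
build_fixture($ARGV[0]) if @ARGV;
```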
>
> Worked for me! Regardless, use rsync then.
>
> rsync -a --ignore-existing --remove-source-files /src /dest
>
> This would at the very least move files that don't already exist in the dest over from the source AND remove those source files AFTER the transfer happens.
> You'll be halfway there doing that. What you'll be left with are files that exist in BOTH src AND DEST.
>
> ~Paul


Paul, I think we've got wires crossed....


I *have* already performed the rsync. Apologies if I wasn't clear!


The problem I am faced with is that the destination directory is already populated with the information from 3 source directories.


I need to remove the sync'ed files in the source directories and leave files that match in name but are of different sizes.


The problem is I can't use rsync again for this as there aren't any options to simply compare files based on size. I can't use the --existing option as the files exist in both directories....
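Comparing by size is straightforward in Perl. The following is a report-only sketch, so nothing is moved or deleted. It walks the source tree recursively with the core File::Find module. For every regular file that also exists at the same relative path in the merged tree, it prints both sizes and SAME or DIFF. `common_files` is a hypothetical name and symlinks are skipped. The argument order (merged first, source second) mirrors Paul's script.

```perl
#!/usr/bin/env perl
# Report-only sketch: list files present at the same relative path in both
# trees, with their sizes. Nothing is moved, copied or deleted.
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;

# Returns a list of [relative_path, source_size, merged_size], sorted by path.
sub common_files {
    my ($merged, $source) = @_;
    my @rows;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path;     # skip symlinks
            return unless -f _;     # regular files only (reuses the lstat above)
            my $rel  = File::Spec->abs2rel($path, $source);
            my $twin = File::Spec->catfile($merged, $rel);
            return if -l $twin || !-f $twin;    # no counterpart in merged tree
            push @rows, [$rel, (stat $path)[7], (stat $twin)[7]];
        },
    }, $source);
    return sort { $a->[0] cmp $b->[0] } @rows;
}

# Usage: perl common.pl /merged_dir /source_dir
if (@ARGV == 2) {
    my ($merged, $source) = @ARGV;
    for my $row (common_files($merged, $source)) {
        my ($rel, $src_size, $merged_size) = @$row;
        printf "%-4s %12d %12d  %s\n",
            ($src_size == $merged_size ? 'SAME' : 'DIFF'),
            $src_size, $merged_size, $rel;
    }
}
```

Piping the output through grep DIFF would show the files that a size-based cleanup leaves in place.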


This is the dilemma I am facing:


ls -l /merged_dir/folder/

234904506 - file 'a'


ls -l /source_dir/folder/

1080918146 - file 'a'


So in this case file 'a' is in both directories with the same name but a different size, and I need to keep both versions. However, *if* they were the same size, then I'd remove the file in /source_dir.....
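The rule for one pair of files fits in a small Perl function. This is a sketch, and `is_duplicate_by_size` is a hypothetical name. It returns 1 only when both paths are regular files (not symlinks) of equal size. It returns 0 for anything else, including a missing file, so the cautious default is "keep". Equal size is a heuristic: two different files can have the same length.

```perl
use strict;
use warnings;

# Returns 1 only if both paths are regular, non-symlink files of equal size,
# i.e. the copy in the source dir may be removed. Anything else returns 0.
# With the ls -l sizes above (234904506 vs 1080918146 bytes) this returns 0,
# so both copies of 'a' are kept.
sub is_duplicate_by_size {
    my ($merged_file, $source_file) = @_;
    for my $p ($merged_file, $source_file) {
        return 0 if -l $p || !-f $p;
    }
    my $same = (stat $merged_file)[7] == (stat $source_file)[7];
    return $same ? 1 : 0;
}
```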


That's all. I don't need to transfer or copy anything at all... just compare, and remove files with the same name and size.
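A Perl sketch of exactly that, using the core File::Find module, follows. It walks SOURCE_DIR recursively. For each regular file whose relative path also exists in MERGED_DIR as a regular file of the same size, it removes the source copy. Files that differ in size, or that exist only in SOURCE_DIR, are left in place, and nothing is copied or moved.

It is a dry run unless --delete is given. That option is this sketch's own, and `prune_duplicates` is a hypothetical name. With three source directories it would be run once per source, for example first `perl prune.pl /merged_dir /source_dir` to preview, then again with `--delete`.

```perl
#!/usr/bin/env perl
# Sketch: remove files under SOURCE_DIR that also exist, at the same relative
# path, under MERGED_DIR as a regular file of the same size. Files that differ
# in size or exist only in SOURCE_DIR are left alone; nothing is copied/moved.
# Dry run unless --delete is given (an option invented for this sketch).
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;

sub prune_duplicates {
    my ($merged, $source, $really_delete) = @_;
    my @victims;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path;          # never follow or delete symlinks
            return unless -f _;          # regular files only
            my $rel  = File::Spec->abs2rel($path, $source);
            my $twin = File::Spec->catfile($merged, $rel);
            return if -l $twin || !-f $twin;             # no counterpart
            my $same = (stat $path)[7] == (stat $twin)[7];
            return unless $same;                          # sizes differ: keep
            push @victims, $path;
        },
    }, $source);

    # Act only after the walk, so find() never sees a half-modified tree.
    for my $path (@victims) {
        if ($really_delete) {
            unlink($path) or warn "could not remove $path: $!\n";
        } else {
            print "would remove: $path\n";
        }
    }
    return @victims;
}

if (@ARGV) {
    my $delete = 0;
    if ($ARGV[0] eq '--delete') {
        $delete = 1;
        shift @ARGV;
    }
    die "usage: $0 [--delete] MERGED_DIR SOURCE_DIR\n" unless @ARGV == 2;
    my ($merged, $source) = @ARGV;
    for my $d ($merged, $source) {
        die "not a directory: $d\n" unless -d $d;
    }
    # Refuse to run when both arguments are the same directory (same device
    # and inode), which would otherwise match every file against itself.
    my @m = stat $merged;
    my @s = stat $source;
    die "MERGED_DIR and SOURCE_DIR are the same directory\n"
        if $m[0] == $s[0] && $m[1] == $s[1];
    prune_duplicates($merged, $source, $delete);
}
```

Unlike the earlier script, this one never renames anything into the merged tree. Files that exist only in the source directory simply stay where they are.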


Hopefully I'm explaining this better and things are clearer? Again, I apologize for the confusion :-(
