From: Paul Procacci
Date: Thu, 4 May 2023 23:01:54 -0400
Subject: Re: Tool to compare directories and delete duplicate files from one directory
To: Kaya Saman
Cc: freebsd-questions@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On Thu, May 4, 2023 at 10:30 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:


On 5/5/23 03:08, Paul Procacci wrote:
There are multiple reasons why it may not work. My guess is that it's because of characters that could be showing up within the filenames and whatnot.

This can be solved with an interpreted language that's a bit more forgiving.
Take the following Perl script. It does the same thing as the shell script (almost). It renames the source file instead of making a copy of it.

run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x

###################################################################################
#!/usr/bin/env perl

use strict;
use warnings;

# Print a message (the usage text by default) to STDERR and exit with $ret.
sub msgDie
{
  my ($ret) = shift;
  my ($msg) = shift // "usage: $0 dir_base dir\n";
  print STDERR $msg;
  exit($ret);
}

msgDie(1) unless(scalar @ARGV == 2);

my $base = $ARGV[0];
my $dir  = $ARGV[1];

msgDie(1, "base directory doesn't exist\n") unless -d $base;
msgDie(1, "source directory doesn't exist\n") unless -d $dir;

# Only the top-level entries of $dir are examined; subdirectories are
# not descended into.
opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir: $!\n");
while(defined(my $name = readdir $dh))
{
  next if($name eq '.' || $name eq '..');

  # Not present in base as a regular file: move it over.
  if( ! -f "$base/$name" ){
    rename("$dir/$name", "$base/$name")
      or warn "rename $dir/$name failed: $!\n";
    next;
  }

  # Present in both: delete the source copy when the sizes match.
  my ($ref) = (stat("$base/$name"))[7];
  my ($src) = (stat("$dir/$name"))[7];
  unlink("$dir/$name") if(defined $src && $ref == $src);
}
closedir($dh);
###################################################################################

~Paul



This didn't seem to work :-(


What exactly happened is this:


I created a set of test directories in /tmp


So, I have /tmp/test1 and /tmp/test2


To mimic the structure of the directories I intend to run this on, I did this:


Created a subdir called dupdir in /tmp/test1 and /tmp/test2


/tmp/test2/dupdir contains these files: dup and dup1


/tmp/test1/dupdir contains a modified 'dup' file but an unmodified copy of dup1.


However, now things get interesting, as dup from test1 contains "1234567" and dup from test2 contains "111" <- this is to simulate the file size difference.
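
For anyone wanting to reproduce this, the layout described above could be recreated roughly as follows. This is only a sketch: make_layout is a hypothetical helper name, the root directory is passed as an argument (use /tmp to match the description), and the content of dup1 is an assumption, since only its being a copy is stated.

```shell
#!/bin/sh
# Sketch: recreate the described test layout under a given root directory.
# make_layout is a hypothetical helper; the content of dup1 is an assumption.
make_layout() {
    root=$1
    mkdir -p "$root/test1/dupdir" "$root/test2/dupdir"
    # Same file name on both sides, different sizes: 7 bytes versus 3 bytes.
    printf '1234567' > "$root/test1/dupdir/dup"
    printf '111'     > "$root/test2/dupdir/dup"
    # dup1 is an identical copy on both sides.
    printf 'same content\n' > "$root/test2/dupdir/dup1"
    cp "$root/test2/dupdir/dup1" "$root/test1/dupdir/dup1"
}
```

Called as `make_layout /tmp`, this puts the files one level below /tmp/test1 and /tmp/test2. The Perl script lists only the top-level entries of its second argument, so a run against those two directories sees just dupdir itself, which may be part of why nothing appeared to happen.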





Worked for me! Regardless, use rsync then.

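rsync -a --ignore-existing --remove-source-files /src/ /dest/
(-a makes rsync recurse into subdirectories; the trailing slashes mean the contents of /src land directly in /dest.)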
This would at the very least move files that don't yet exist in the dest over from the source AND remove those source files AFTER the transfer happens.
You'll be halfway there doing that. What you'll be left with are files that exist in BOTH src AND dest.
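
For the files left in both trees, one possible follow-up is sketched below. It is not part of rsync: prune_dups is a hypothetical helper name, the dest path is assumed to be absolute, and it compares file contents with cmp -s, which is a stricter check than the size comparison in the Perl script.

```shell
#!/bin/sh
# Sketch only: prune_dups is a hypothetical helper, not an rsync feature.
# Removes each regular file under src whose counterpart at the same
# relative path under dest has byte-identical content.
# dest must be an absolute path, because the search runs from inside src.
prune_dups() {
    src=$1
    dest=$2
    # Run in a subshell so the cd does not leak to the caller.
    (
        cd "$src" || exit 1
        # Passing names via -exec ... {} + keeps odd characters in
        # filenames intact.
        find . -type f -exec sh -c '
            dest=$1; shift
            for f in "$@"; do
                # cmp -s is silent and exits 0 only when both files match.
                if [ -f "$dest/$f" ] && cmp -s "$f" "$dest/$f"; then
                    rm -- "$f"
                fi
            done
        ' sh "$dest" {} +
    )
}
```

Run as `prune_dups /src /dest` after the rsync pass. Files that differ are left in place in src for manual review, and nothing in dest is touched.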

~Paul