Re: Tool to compare directories and delete duplicate files from one directory
Date: Fri, 05 May 2023 03:20:23 UTC
On 5/5/23 04:01, Paul Procacci wrote:
> On Thu, May 4, 2023 at 10:30 PM Kaya Saman
> <kayasaman@optiplex-networks.com> wrote:
>
>     On 5/5/23 03:08, Paul Procacci wrote:
>> There are multiple reasons why it may not work. My guess is
>> that it's because of characters that could be showing up
>> within the filenames and whatnot.
>>
>> This can be solved with an interpreted language that's a bit more
>> forgiving.
>> Take the following perl script. It does the same thing as the
>> shell script (almost). It renames the source file instead of
>> making a copy of it.
>>
>> run as: ./test.pl /absolute/path/to/master_dir
>> /absolute_path_to_dir_x
>>
>> ###################################################################################
>>
>> #!/usr/bin/env perl
>>
>> use strict;
>> use warnings;
>>
>> # Print a message (the usage line by default) and exit with $ret.
>> sub msgDie
>> {
>>   my ($ret) = shift;
>>   my ($msg) = shift // "$0 dir_base dir\n";
>>   print $msg;
>>   exit($ret);
>> }
>>
>> msgDie(1) unless(scalar @ARGV == 2);
>>
>> my $base = $ARGV[0];
>> my $dir = $ARGV[1];
>>
>> msgDie(1, "base directory doesn't exist\n") unless -d $base;
>> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>>
>> opendir(my $dh, $dir) or msgDie(1, "Unable to open directory: $dir\n");
>> while(readdir $dh)
>> {
>>   next if($_ eq '.' || $_ eq '..');
>>   # Not in the base directory yet: move it across.
>>   if( ! -f "$base/$_" ){
>>     rename("$dir/$_", "$base/$_");
>>     next;
>>   }
>>
>>   # Present in both: remove the source copy when the sizes match.
>>   my ($ref) = (stat("$base/$_"))[7];
>>   my ($src) = (stat("$dir/$_"))[7];
>>   unlink("$dir/$_") if($ref == $src);
>> }
>> ###################################################################################
>>
>> ~Paul
>
>     This didn't seem to work :-(
>
>     What exactly happened is this:
>
>     I created a set of test directories in /tmp
>
>     So, I have /tmp/test1 and /tmp/test2
>
>     to mimic the structure of the directories I intend to run this
>     thing I did this:
>
>     create a subdir called: dupdir in /tmp/test1 and /tmp/test2
>
>     /tmp/test2/dupdir contains these files: dup and dup1
>
>     /tmp/test1/dupdir contains a modified 'dup' file but copied dup1 file.
>
>     However, now things get interesting as dup from test1 contains
>     "1234567" and dup from test2 contains "111" <- this is to simulate
>     the file size difference.
>
> Worked for me! Regardless. Use rsync then.
>
> rsync --ignore-existing --remove-source-files /src /dest
>
> This would at the very least move non-existent files from the source
> over to the dest AND remove those source files AFTER the transfer
> happens.
>
> You'll be 1/2 way there doing that. What you'll be left with are files
> that exist in BOTH src AND DEST.
>
> ~Paul

Paul, I think we've got wires crossed.... I *have* already performed the rsync.

Apologies if I wasn't clear!

The problem I am faced with is that the destination directory is already populated with the information from 3 source directories.

I need to remove the sync'ed files in the source directories and leave files that match in name but are of different sizes.

The problem is I can't use rsync again for this as there aren't any options to simply compare files based on size. I can't use the --existing option as the files exist in both directories....

This is the dilemma I am facing:

ls -l /merged_dir/folder/

234904506 - file 'a'

ls -l /source_dir/folder/

1080918146 - file 'a'

so in this case file 'a' is in both directories with the same name but different size. I need to keep both versions.

However, *if* they were the same size then remove the file in the source_dir.....

That's all.. I don't need to transfer anything or copy anything at all... just compare and remove files of same name and size.

Hopefully I am explaining better and things are more clear? Again I apologize for the confusion :-(
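
For illustration only, here is a minimal sketch in Python of the "compare and remove files of same name and size" step described above. It is not something posted in this thread, and it makes a few assumptions: the function name prune_same_size and its dry_run flag are made up for this example, subdirectories (such as dupdir in the /tmp test) are walked recursively, and symlinks are skipped. It compares sizes only, as requested, so two different files that happen to have the same size would be treated as duplicates.

```python
import os


def prune_same_size(source_dir, merged_dir, dry_run=True):
    """Return source files that match a file in merged_dir by relative
    path and size. With dry_run=False those source files are deleted.

    Files missing from merged_dir, or differing in size, are left alone.
    Nothing is ever copied, moved, or changed in merged_dir.
    """
    matched = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            # Same relative path underneath the merged tree.
            ref = os.path.join(merged_dir, os.path.relpath(src, source_dir))
            # Only regular files on both sides; symlinks are skipped.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            if not os.path.isfile(ref):
                continue
            if os.path.getsize(src) == os.path.getsize(ref):
                matched.append(src)
                if not dry_run:
                    os.remove(src)
    return sorted(matched)
```

Calling prune_same_size("/source_dir", "/merged_dir") only reports what would be removed; passing dry_run=False performs the deletion. In the example above, file 'a' (234904506 vs 1080918146 bytes) would be left in place in both directories.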