Re: Tool to compare directories and delete duplicate files from one directory
Date: Fri, 05 May 2023 02:30:14 UTC
On 5/5/23 03:08, Paul Procacci wrote:
> There are multiple reasons why it may not work. My guess is because
> the potential for characters that could be showing up within the
> filenames and whatnot.
>
> This can be solved with an interpreted language that's a bit more
> forgiving.
> Take the following perl script. It does the same thing as the shell
> script (almost). It renames the source file instead of making a copy
> of it.
>
> run as: ./test.pl /absolute/path/to/master_dir /absolute_path_to_dir_x
>
> ###################################################################################
>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
>
> sub msgDie
> {
>     my ($ret) = shift;
>     my ($msg) = shift // "$0 dir_base dir\n";
>     print $msg;
>     exit($ret);
> }
>
> msgDie(1) unless(scalar @ARGV eq 2);
>
> my $base = $ARGV[0];
> my $dir = $ARGV[1];
>
> msgDie(1, "base directory doesn't exist\n") unless -d $base;
> msgDie(1, "source directory doesn't exist\n") unless -d $dir;
>
> opendir(my $dh, $dir) or msgDie("Unable to open directory: $dir\n");
> while(readdir $dh)
> {
>     next if($_ eq '.' || $_ eq '..');
>     if( ! -f "$base/$_" ){
>         rename("$dir/$_", "$base/$_");
>         next;
>     }
>
>     my ($ref) = (stat("$base/$_"))[7];
>     my ($src) = (stat("$dir/$_"))[7];
>     unlink("$dir/$_") if($ref == $src);
> }
> ###################################################################################
>
> ~Paul

This didn't seem to work :-(

What exactly happened is this: I created a set of test directories in /tmp. So I have /tmp/test1 and /tmp/test2 to mimic the structure of the directories I intend to run this on.

I did this: created a subdir called dupdir in both /tmp/test1 and /tmp/test2.

/tmp/test2/dupdir contains these files: dup and dup1
/tmp/test1/dupdir contains a modified 'dup' file and a copied 'dup1' file.

However, now things get interesting, as dup from test1 contains "1234567" and dup from test2 contains "111" <- this is to simulate the file size difference.

I then ran:

./test.pl /tmp/test1 /tmp/test2

The expected behavior is that I should retain the file 'dup' in test1 while 'dup1' should be removed.

In my actual file system I have many of these subdirs, so a fair test would probably be something like creating:

/tmp/test1/dupdir1
/tmp/test2/dupdir1
/tmp/test1/dupdir2
/tmp/test2/dupdir2

then putting the file dup into dupdir1 and dup1 into dupdir2.

I guess my issue is complex??

If only I had used the --remove-source-files option during my initial rsync, I wouldn't have had to worry about any of this: I used the --ignore-existing option, so that would have done the trick initially. But I decided to play it safe instead and have now ended up with a slight headache on my hands.
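
Note that the quoted script reads only the top level of its second argument (a single readdir loop), so files inside subdirectories such as dupdir are never compared. Below is a minimal recursive sketch, not a tested replacement for the script above. It assumes the goal is to delete files under the second directory that also exist, at the same relative path and with identical content, under the first. The function name prune_duplicates and its dry-run flag are hypothetical names chosen for illustration. It uses the core modules File::Find and File::Compare, and it compares file content rather than size alone, which is a stricter check than the quoted script makes. Unlike the quoted script, it never renames or moves anything.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use File::Compare;
use File::Spec;

# Hypothetical helper (name and arguments are assumptions for this sketch).
# Walks $dir recursively; a file is a duplicate when $base holds a regular
# file at the same relative path with the same size and the same content.
# With $dry_run true nothing is deleted. Returns the relative paths of the
# duplicates that were removed (or, in a dry run, would be removed).
sub prune_duplicates {
    my ($base, $dir, $dry_run) = @_;
    my @removed;

    find({
        no_chdir => 1,    # keep $File::Find::name usable as a path
        wanted   => sub {
            my $src = $File::Find::name;
            return if -l $src;        # leave symlinks alone
            return unless -f $src;    # regular files only

            my $rel = File::Spec->abs2rel($src, $dir);
            my $ref = File::Spec->catfile($base, $rel);
            return unless -f $ref;    # no counterpart in $base

            # Cheap size check first, then a full content comparison.
            return unless (stat($src))[7] == (stat($ref))[7];
            return unless compare($src, $ref) == 0;

            if (!$dry_run) {
                unless (unlink $src) {
                    warn "could not remove $src: $!\n";
                    return;
                }
            }
            push @removed, $rel;
        },
    }, $dir);

    return @removed;
}

# Command-line use: ./prune.pl /path/to/master_dir /path/to/dir_x
# This sketch only reports (dry run); pass 0 instead of 1 to really delete.
if (@ARGV == 2) {
    my ($base, $dir) = @ARGV;
    die "base directory doesn't exist\n"   unless -d $base;
    die "source directory doesn't exist\n" unless -d $dir;
    print "duplicate: $_\n" for prune_duplicates($base, $dir, 1);
}
elsif (@ARGV) {
    die "usage: $0 dir_base dir\n";
}
```

With the test layout described above, a call such as prune_duplicates('/tmp/test1', '/tmp/test2', 1) should report only dupdir/dup1, since the two dup files differ in both size and content. Running in dry-run mode first and checking the list seems the safer order before deleting anything for real.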