Re: Tool to compare directories and delete duplicate files from one directory
Date: Thu, 04 May 2023 23:53:14 UTC
On 5/4/23 23:32, Paul Procacci wrote:
> On Thu, May 4, 2023 at 5:47 PM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
>
> On 5/4/23 17:29, Paul Procacci wrote:
>> On Thu, May 4, 2023 at 11:53 AM Kaya Saman <kayasaman@optiplex-networks.com> wrote:
>>
>> Hi,
>>
>> I'm wondering if anyone knows of a tool like diff or so that can also
>> delete files based on name and size from either left/right or
>> source/destination directory?
>>
>> Basically what I have done is performed an rsync without using the
>> --remove-source-files option onto a newly bought and created disk pool
>> (yes zpool) that i am trying to consolidate my data - as it's currently
>> spread out over multiple pools with the same folder name.
>>
>> The issue I am facing mainly is that I perform another rsync and use the
>> --remove-source-files option, rsync will delete files based on name
>> while there are some files that have the same name but not same size and
>> I would like to retain these files.
>>
>> Right now I have looked at many different options in both rsync and
>> other tools but found nothing suitable. I even tested using a few test
>> dirs and files that I put into /tmp and whatever I tried, the files of
>> different size either got transferred or deleted.
>>
>> How would be a good way to approach this problem?
>>
>> Even if I create some kind of shell script and use diff, I think it will
>> only compare names and not file sizes.
>>
>> I'm really lost here....
>>
>> Regards,
>>
>> Kaya
>>
>> It sounds like you want fdupes. It's in the ports tree.
>>
>> ~Paul
>>
>> --
>> __________________
>>
>> :(){ :|:& };:
>
> I tried fdupes and installed it a while back. For me it felt like
> it only works on a single directory.
>
> My dir structure is that I have:
>
> /dir <- main directory where everything has now been rsync'ed to
> /dir_1 <- old directory with partial content
> /dir_2 <- more partial content
> /dir_3 <- more partial content
>
> The key thing here is that I need to compare:
>
> /dir_(x) with /dir
>
> if the files are different sizes in /dir_(x) then leave them,
> otherwise delete if both name and file size are the same.
>
> Then a tiny shell script does the job assuming your files don't have
> any spaces and no weird characters exist:
>
> #!/bin/sh
>
> for i in b c d;
> do
>   ls $i/ | while read file;
>   do
>     [ ! -f a/$file ] && cp $i/$file a/$file && continue
>
>     ref=`stat -f '%z' a/$file`
>     src=`stat -f '%z' %i/$file`
>     [ $ref -eq $src ] && rm -f $i/file
>
>   done
> done
>
> Change paths accordingly and backup your stuff. ;)
>
> ~Paul
>
> --
> __________________
>
> :(){ :|:& };:

Thanks Paul, I should be able to work with this.

There are actually spaces and weird characters in the file names, so I assume doing something like "file" should allow for that?

I don't think I need the line after the 'do' statement, do I? From what I understand it copies the file from directory i to directory a?

As I explained initially, the files have already been rsync'ed, so I just need to compare and delete accordingly.

When I performed the rsync, each run took around a week to complete. Currently zfs list shows around 12TB usage for /dir, the merged directory, and that's with compression enabled.
A quick Google shows that I can use something like this:

    search_dir=/the/path/to/base/dir
    for entry in "$search_dir"/*
    do
      echo "$entry"
    done

to list the files in the directory, though this might be Bash and not Csh.

Otherwise, clunkily (my scripting style is pretty rubbish and inefficient), I could do something like this (it probably won't work!):

    #!/bin/sh

    #fb = file base
    #fm - file merge - file that has already been merged using rsync unless size was different

    dir_base=/dir
    for fb in "$dir_base"/*
    do
      echo "$fs"
    done

    dir_merge=/dir_1
    for fm in "$dir_merge"/*
    do
      echo "$fm"
    done

    do
      ref=`stat -f '%z' $dir_base/$fb`
      src=`stat -f '%z' %i$dir_merge/$fm`
      [ $ref -eq $src ] && rm -f $dir_merge/$fm
    done

Regards,

Kaya
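
For reference, the compare-and-delete step discussed in this thread could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not a vetted solution for a 12TB pool. The names file_size, prune_same_size and DRY_RUN are made up for illustration. It only looks at regular files directly inside the old directory (no recursion into subdirectories, and the * glob skips dot-files), and it treats "same name and same size" as a duplicate without comparing contents. The size helper tries the GNU-style stat -c '%s' first and falls back to the BSD-style stat -f '%z' used earlier in the thread. Every variable expansion is double-quoted so that names with spaces survive.

```shell
#!/bin/sh
# Sketch only: remove files from an old directory when a file with the same
# name AND the same size already exists in the merged directory.
# file_size, prune_same_size and DRY_RUN are hypothetical names.

# Print the size of a file in bytes.
# Tries GNU stat (-c '%s') first, then falls back to BSD stat (-f '%z').
file_size() {
    stat -c '%s' "$1" 2>/dev/null || stat -f '%z' "$1"
}

# Usage: prune_same_size MERGED_DIR OLD_DIR
# With DRY_RUN unset or set to 1, nothing is deleted; candidates are printed.
prune_same_size() {
    base=$1
    old=$2
    for src in "$old"/*; do
        [ -f "$src" ] || continue       # skip subdirectories and an unmatched glob
        name=${src##*/}                 # file name without the directory part
        ref=$base/$name
        [ -f "$ref" ] || continue       # not present in the merged dir: keep it
        if [ "$(file_size "$ref")" -eq "$(file_size "$src")" ]; then
            if [ "${DRY_RUN:-1}" -eq 1 ]; then
                printf 'would remove: %s\n' "$src"
            else
                rm -f -- "$src"         # same name and same size: delete old copy
            fi
        fi
    done
}
```

With the directory layout described above, a dry run would be `prune_same_size /dir /dir_1`, and setting `DRY_RUN=0` beforehand performs the actual deletion. Files whose sizes differ, and files that exist only in the old directory, are left in place.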