Re: Tool to compare directories and delete duplicate files from one directory
Date: Sun, 07 May 2023 20:25:18 UTC
On 5/6/23 21:33, David Christensen wrote:
> I thought I sent this, but it never hit the list (?) -- David
>
>
> On 5/4/23 21:06, Kaya Saman wrote:
>
>> To start with this is the directory structure:
>>
>>
>> ls -lhR /tmp/test1
>> total 1
>> drwxr-xr-x 2 root wheel 3B May 5 04:57 dupdir1
>> drwxr-xr-x 2 root wheel 3B May 5 04:57 dupdir2
>>
>> /tmp/test1/dupdir1:
>> total 1
>> -rw-r--r-- 1 root wheel 8B Apr 30 03:17 dup
>>
>> /tmp/test1/dupdir2:
>> total 1
>> -rw-r--r-- 1 root wheel 7B May 5 03:23 dup1
>>
>>
>> ls -lhR /tmp/test2
>> total 1
>> drwxr-xr-x 2 root wheel 3B May 5 04:56 dupdir1
>> drwxr-xr-x 2 root wheel 3B May 5 04:56 dupdir2
>>
>> /tmp/test2/dupdir1:
>> total 1
>> -rw-r--r-- 1 root wheel 4B Apr 30 02:53 dup
>>
>> /tmp/test2/dupdir2:
>> total 1
>> -rw-r--r-- 1 root wheel 7B Apr 30 02:47 dup1
>>
>>
>> So what I want to happen is the script to recurse from the top level
>> directories test1 and test2 then expected behavior should be to
>> remove file dup1 as dup is different between directories.
>
>
> My previous post missed the mark, but I have been watching this thread
> with interest (trepidation?).
>
>
> I think Tim already identified a tool that will safely get you close
> to your goal, if not all the way:
>
> On 5/4/23 09:28, Tim Daneliuk wrote:
>> I've never used it, but there is a port of fdupes in the ports tree.
>> Not sure if it does exactly what you want though.
>
>
> fdupes(1) is also available as a package:
>
> 2023-05-04 21:25:31 toor@vf1 ~
> # freebsd-version; uname -a
> 12.4-RELEASE-p2
> FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD
> 12.4-RELEASE-p1 GENERIC amd64
>
> 2023-05-04 21:25:40 toor@vf1 ~
> # pkg search fdupes
> fdupes-2.2.1,1 Program for identifying or deleting
> duplicate files
>
>
> Looking at the man page:
>
> https://man.freebsd.org/cgi/man.cgi?query=fdupes&sektion=1&manpath=FreeBSD+13.2-RELEASE+and+Ports
>
>
>
> I am fairly certain that you will want to give the destination
> directory as the first argument and the source directories after that:
>
> $ fdupes --recurse /dir /dir_1 /dir_2 /dir_3
>
>
> The above will provide you with information, but not delete anything.
>
>
> Practice under /tmp to gain familiarity with fdupes(1) is a good idea.
>
>
> As you are using ZFS, I assume you know how to take snapshots and do
> rollbacks (?). These could serve as backup and restore operations if
> things go badly.
>
>
> Given a 12+ TB of data, you may want the --noprompt option when you do
> give the --delete option and actual arguments,
>
>
> David
>

Thanks David!

I tried using fdupes like this but I wasn't able to see any output,
probably because it took so long to run that it never completed.

It does have a -d flag which deletes files, but from my testing it
deletes all duplicates and doesn't let you choose which directory the
duplicate files are deleted from, unless I failed to understand the
man page.

At present the Perl script from Paul, in its last iteration, has solved
my problem and was pretty fast as well.

Of course I first tested it on my test dirs in /tmp, then I took ZFS
snapshots of the actual working dirs, and finally ran the script. It
worked flawlessly.

Regards,

Kaya
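
For anyone finding this thread later: the behaviour asked for above
(delete duplicates from one tree only, never from the other) can be
sketched in Python roughly as follows. This is not Paul's Perl script,
which is not reproduced in this message. The function names and the
"same size, then same SHA-256" matching rule are my own assumptions,
and the sketch matches on content anywhere in the kept tree, not on
file name or relative path. It only lists matches unless delete=True
is passed.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def regular_files(root):
    """Yield the path of every regular file under root (symlinks skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def prune_duplicates(keep_root, prune_root, delete=False):
    """List (and optionally delete) files under prune_root whose content
    also exists somewhere under keep_root. keep_root is never modified."""
    # Hypothetical safety check: refuse to compare a directory with itself.
    if os.path.realpath(keep_root) == os.path.realpath(prune_root):
        raise ValueError("keep_root and prune_root are the same directory")

    # Group the kept files by size so only plausible matches get hashed.
    keep_by_size = defaultdict(list)
    for path in regular_files(keep_root):
        keep_by_size[os.path.getsize(path)].append(path)

    keep_digests = {}  # size -> set of digests, filled in lazily
    duplicates = []
    for path in regular_files(prune_root):
        size = os.path.getsize(path)
        if size not in keep_by_size:
            continue  # no kept file has this size, so no duplicate
        if size not in keep_digests:
            keep_digests[size] = {file_digest(p) for p in keep_by_size[size]}
        if file_digest(path) in keep_digests[size]:
            duplicates.append(path)
            if delete:
                os.remove(path)
    return duplicates
```

With the test layout quoted above, and assuming the two 7-byte dup1
files really have identical content, prune_duplicates("/tmp/test1",
"/tmp/test2") should report only /tmp/test2/dupdir2/dup1, since the two
dup files differ in size. As in the thread, try it under /tmp first and
take a ZFS snapshot before running with delete=True.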