From: Kaya Saman
Date: Sun, 7 May 2023 21:25:18 +0100
To: questions@freebsd.org
Subject: Re: Tool to compare directories and delete duplicate files from one directory
Message-ID: <6a0aba81-485a-8985-d20d-6da58e9b5580@optiplex-networks.com>
In-Reply-To: <7c2429c5-55d0-1649-a442-ce543f2d46c2@holgerdanske.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On 5/6/23 21:33, David Christensen wrote:
> I thought I sent this, but it never hit the list (?) -- David
>
>
> On 5/4/23 21:06, Kaya Saman wrote:
>
>> To start with this is the directory structure:
>>
>>
>>   ls -lhR /tmp/test1
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:57 dupdir2
>>
>> /tmp/test1/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     8B Apr 30 03:17 dup
>>
>> /tmp/test1/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B May  5 03:23 dup1
>>
>>
>> ls -lhR /tmp/test2
>> total 1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir1
>> drwxr-xr-x  2 root  wheel     3B May  5 04:56 dupdir2
>>
>> /tmp/test2/dupdir1:
>> total 1
>> -rw-r--r--  1 root  wheel     4B Apr 30 02:53 dup
>>
>> /tmp/test2/dupdir2:
>> total 1
>> -rw-r--r--  1 root  wheel     7B Apr 30 02:47 dup1
>>
>>
>> So what I want to happen is the script to recurse from the top level
>> directories test1 and test2 then expected behavior should be to
>> remove file dup1 as dup is different between directories.
>
>
> My previous post missed the mark, but I have been watching this thread
> with interest (trepidation?).
>
>
> I think Tim already identified a tool that will safely get you close
> to your goal, if not all the way:
>
> On 5/4/23 09:28, Tim Daneliuk wrote:
>> I've never used it, but there is a port of fdupes in the ports tree.
>> Not sure if it does exactly what you want though.
>
>
> fdupes(1) is also available as a package:
>
> 2023-05-04 21:25:31 toor@vf1 ~
> # freebsd-version; uname -a
> 12.4-RELEASE-p2
> FreeBSD vf1.tracy.holgerdanske.com 12.4-RELEASE-p1 FreeBSD
> 12.4-RELEASE-p1 GENERIC  amd64
>
> 2023-05-04 21:25:40 toor@vf1 ~
> # pkg search fdupes
> fdupes-2.2.1,1                 Program for identifying or deleting
> duplicate files
>
>
> Looking at the man page:
>
> https://man.freebsd.org/cgi/man.cgi?query=fdupes&sektion=1&manpath=FreeBSD+13.2-RELEASE+and+Ports
>
>
>
> I am fairly certain that you will want to give the destination
> directory as the first argument and the source directories after that:
>
> $ fdupes --recurse /dir /dir_1 /dir_2 /dir_3
>
>
> The above will provide you with information, but not delete anything.
>
>
> Practice under /tmp to gain familiarity with fdupes(1) is a good idea.
>
>
> As you are using ZFS, I assume you know how to take snapshots and do
> rollbacks (?).  These could serve as backup and restore operations if
> things go badly.
>
>
> Given a 12+ TB of data, you may want the --noprompt option when you do
> give the --delete option and actual arguments,
>
>
> David
>

Thanks David!

I tried using fdupes like this, but I wasn't able to see any output,
probably because it took so long to run that it never completed. It does
have a -d flag that deletes files, but in my testing this deletes all
duplicates and doesn't let you choose which directory the duplicate
files are deleted from, unless I misunderstood the man page.

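A minimal Python sketch of the behaviour wanted here: remove a file from
one tree only when a byte-identical file sits at the same relative path
in the other tree. This is not Paul's Perl script (which is not
reproduced in this message) and not how fdupes works. The function
names, the keep_root/prune_root arguments and the match-by-relative-path
rule are all assumptions made for illustration, and it defaults to a dry
run so nothing is deleted by accident.

```python
import filecmp
import os


def find_duplicates(keep_root, prune_root):
    """Yield files under prune_root that are byte-identical to the file
    at the same relative path under keep_root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(prune_root):
        for name in filenames:
            candidate = os.path.join(dirpath, name)
            rel = os.path.relpath(candidate, prune_root)
            reference = os.path.join(keep_root, rel)
            if not os.path.isfile(reference):
                continue  # nothing to compare against in the kept tree
            if os.path.samefile(candidate, reference):
                continue  # same file on disk: never treat it as a duplicate
            # shallow=False compares contents instead of trusting
            # size and mtime alone.
            if filecmp.cmp(candidate, reference, shallow=False):
                yield candidate


def prune_duplicates(keep_root, prune_root, dry_run=True):
    """Return the duplicates found under prune_root; delete them only
    when dry_run is False."""
    duplicates = list(find_duplicates(keep_root, prune_root))
    if not dry_run:
        for path in duplicates:
            os.remove(path)
    return duplicates
```

Calling prune_duplicates("/tmp/test1", "/tmp/test2") only lists
candidates; passing dry_run=False deletes them. With the test layout
quoted above it would list only /tmp/test2/dupdir2/dup1, provided the
two dup1 files really have the same content (both are 7B, but their
contents are not shown). Which tree to prune is also an assumption.
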
For now, the last iteration of Paul's Perl script solved my problem, and
it was pretty fast too.

I first tested it on my test dirs in /tmp, then took ZFS snapshots of
the actual working dirs, and finally ran the script. It worked
flawlessly.


Regards,

Kaya