Re: hard link pointing to itself?

From: Greg 'groggy' Lehey <grog_at_freebsd.org>
Date: Sun, 18 Feb 2024 01:14:28 UTC

On Saturday, 17 February 2024 at  8:28:53 -0700, Gary Aitken wrote:
> running 13.2-release,
> created a tar archive, went to extract on another 13.2-release system,
> and got several messages of the form:
>
> $ tar xf tmp.tar
> path-to-file/filename.jpg:
>  Skipping hardlink pointing to itself: path-to-file/filename.jpg:
>  File exists

Fascinating.  I'm rearranging the rest of your message to hopefully
explain things better.

> How does one tell it's a hard link?

The simple answer is: it *is* a hard link.  But read on.  There's also
a complicated answer.

> I can get the inode number using ls -il, but that doesn't tell me
> much regarding its being a hard link when there is only one file.

Yes, it does.  The inode number tells you that it's a hard link: every
name that refers to an inode is one, even when it's the only name.

A step back: Unix file systems identify files by inode number.
The names are something of an afterthought: they're stored in
directories (and emphatically *not* folders), each with a pointer to
the inode number, rather like names in a telephone directory.  This
combination of a name and the inode it points to is called a link.

Later, symbolic links came along.  They're a different beast: instead
of pointing to an inode, they point to another name.  But that gave
rise to the term "hard link" to distinguish real links from symbolic
links.
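
A minimal sh sketch of the distinction, for illustration only.  The
file names are invented, and it assumes a file system that supports
both kinds of link.  It gives one file a second name and a symbolic
link, then compares inode numbers:

```shell
#!/bin/sh
# Sketch: two names (hard links) for one inode, plus a symbolic link.
# All file names are made up for this demonstration.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "hello" > original
ln original second-name     # another directory entry for the same inode
ln -s original symbolic     # a separate object that stores the name "original"

# With -i, ls prints the inode number in front of each entry.
ls -li original second-name symbolic

# Pull out just the inode numbers.  -d keeps ls from following the
# symbolic link when it is named on the command line.
ino_original=$(ls -di original | awk '{print $1}')
ino_second=$(ls -di second-name | awk '{print $1}')
ino_symlink=$(ls -di symbolic | awk '{print $1}')

if [ "$ino_original" = "$ino_second" ]; then
    echo "original and second-name share one inode"
fi
if [ "$ino_original" != "$ino_symlink" ]; then
    echo "symbolic has an inode of its own"
fi
```

Neither "original" nor "second-name" is the real file with the other
being the link: both are equally valid names for the same inode.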

> If I go to the source directory on the original system and do:
>
> find . | grep filename.jpg
>
> I see only one file.
>
> Any idea how this file could have been created?

Yes, it's confusing.  It confused me too.  I went and took a look at
the sources (in this case the file
/usr/src/contrib/libarchive/libarchive/archive_write_disk_posix.c),
and found what's going on--I think.  The hard links aren't in the file
system, they're in the tar archive.  One of the more obscure things
about a tar archive is that it needs to keep track of files with
multiple links (names).  It stores the file's data under one name, and
for each additional name it stores only a reference to the first.  It
seems that this somewhat confusing message means that tar discovered
an inconsistency that it (and the author of libarchive) wasn't
expecting.  From the source to archive_write_disk_header() (round
line 563):

  /*
   * Extract this entry to disk.
   *
   * TODO: Validate hardlinks.  According to the standards, we're
   * supposed to check each extracted hardlink and squawk if it refers
   * to a file that we didn't restore.  I'm not entirely convinced this
   * is a good idea, but more importantly: Is there any way to validate
   * hardlinks without keeping a complete list of filenames from the
   * entire archive?? Ugh.
   */
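
To see how an archive records a file with two names, here is a small
sketch.  It does not reproduce the error above; it only shows the
normal case.  The directory and file names are invented, and the exact
layout of the verbose listing may differ between tar implementations:

```shell
#!/bin/sh
# Sketch: how a tar archive records a file that has two names.
# Directory and file names are made up for this demonstration.
dir=$(mktemp -d)
cd "$dir" || exit 1

mkdir src
echo "picture data" > src/a.jpg
ln src/a.jpg src/b.jpg      # second name for the same inode

tar cf demo.tar src

# In the verbose listing, one of the two names should appear as
# "link to <other name>" rather than as a second copy of the data.
listing=$(tar tvf demo.tar)
echo "$listing"

# Extract into an empty directory and check that the two names
# again refer to a single inode.
mkdir out
tar xf demo.tar -C out
ls -li out/src
```

Which of the two names carries the data and which becomes the
reference depends on the order in which tar reads the directory.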

Without going into too much detail, this looks like some kind of bug.
I've tried to think of scenarios that could cause it, but I can't come
up with any at the moment.  It would be interesting to know what you
were trying to do.  Does it happen when you try to extract the entire
archive to an empty hierarchy?  Does this file have multiple links?
That's the number in the second column of ls -l output.  Normally it's
1 for a file, but if there are additional links, it shows the total
number of links.
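
A short sketch of checking link counts, with invented file names.  It
reads the second column of ls -l and uses the -links primary of find
to list regular files that have more than one name:

```shell
#!/bin/sh
# Sketch: spotting files with more than one link.
# File names are made up for this demonstration.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "one name" > single
echo "two names" > first
ln first second

# The second column of ls -l is the link count.
ls -l single first second
count_single=$(ls -l single | awk '{print $2}')   # 1
count_first=$(ls -l first | awk '{print $2}')     # 2

# find can select regular files whose link count is greater than 1.
multi=$(find . -type f -links +1 | sort)
echo "$multi"
```

Running the same find command at the top of the source hierarchy
would list every file there that has an additional name somewhere on
the same file system.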

Another thing that might be interesting would be to try GNU tar
(gtar, in the ports).  It might accept the archive, or it might
produce a different error result.

> I'm pretty sure the original was a lower-resolution file written
> from gimp, but that wouldn't have been a hard link.  There's a
> reasonable chance I tried to create a hard link at some point, but I
> don't see a reference to it.

There's not much to reference.  To create a link, you do

  ln existing-file new-file

That's a normal ("hard") link.  A symlink would be

  ln -s existing-file new-file
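
The practical difference shows up when the original name goes away.
A minimal sketch, with invented names (hard-name and soft-name are
not from the original message):

```shell
#!/bin/sh
# Sketch: what happens to each kind of link when the original name
# is removed.  File names are made up for this demonstration.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "contents" > existing-file
ln existing-file hard-name      # normal ("hard") link
ln -s existing-file soft-name   # symbolic link

rm existing-file                # removes one name, not the inode

# The hard link still reaches the data through its own name.
hard_result=$(cat hard-name)
echo "hard-name: $hard_result"

# The symbolic link still points at the name "existing-file",
# which no longer exists.
if cat soft-name >/dev/null 2>&1; then
    soft_state=ok
else
    soft_state=dangling
fi
echo "soft-name: $soft_state"
```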

> Suspiciously, the tarball had several of these, and quit after
> reporting a few on extraction.

My guess is that there might be two different issues here.  The
message you show is a warning, though it does mean that the file
doesn't get restored.  Was there another message at the end?

Greg