[BUG] Getting path to program binary sometimes fails

Fri Nov 14 13:30:30 UTC 2014

On Nov 13, 2014, at 8:07 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:

> This is not a defect.  The vnode->path translation uses namecache, which
> could be purged at any time.  The behaviour is typical for most unix
> implementations.  Linux and new Solaris have 'rigid' namecache, where
> name entry lifetime is the same as the vnode lifetime it is attached to.
> I am not aware of any useful consequences of such design, except
> vn_fullpath() working more reliable, but at the cost of increased
> memory usage.

The man page for sysctl(3) states that “Unless explicitly noted below, sysctl() returns a consistent snapshot of the data requested” (surely we don’t expect half a path to be returned; I’m just trying to read thoroughly). Further down there are no special notes on {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME}: nothing about the unstable behavior I am observing, and none of the internal implementation details you describe. The ERRORS section describes ENOENT only as “The name array specifies a value that is unknown,” which certainly is not the case here.

Since you’re saying that the current behavior is not a defect, maybe the documentation is wrong (incomplete, misleading) then? I will readily accept the “not a defect” explanation, but only if one doesn’t have to ask you every time this oddity is encountered. If this is the expected error condition, what should I do to get the path reliably? Should I retry (and how many times)? You’re saying the cache is being purged; does that mean the cache is populated again when I ask for the path? Does that guarantee that I’ll be able to get the path on the next call? Can you guarantee that I’ll be able to get the path at all if I fail two or more times? Should I rely on ENOENT specifically when retrying?
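For concreteness, the retry being asked about might look like the sketch below. It is not a recommended fix: nothing in this thread establishes that a retry ever succeeds, and the retry count is arbitrary. The names get_path_with_retry, path_lookup_fn, lookup_self_path and stub_lookup are hypothetical; only the sysctl() call and the {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1} name come from sysctl(3). The stub exists only so the retry logic can be exercised on a system other than FreeBSD.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* One lookup attempt: returns 0 on success, -1 with errno set on failure. */
typedef int (*path_lookup_fn)(char *buf, size_t *len);

#ifdef __FreeBSD__
/* The query discussed in this thread; a pid of -1 means the current process. */
static int
lookup_self_path(char *buf, size_t *len)
{
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };

	return sysctl(mib, 4, buf, len, NULL, 0);
}
#endif

/* Stand-in lookup (hypothetical): fails with ENOENT stub_failures times. */
static int stub_failures;

static int
stub_lookup(char *buf, size_t *len)
{
	static const char path[] = "/hypothetical/path/to/binary";

	if (stub_failures > 0) {
		stub_failures--;
		errno = ENOENT;
		return -1;
	}
	if (*len < sizeof(path)) {
		errno = ENOMEM;		/* buffer too short, as sysctl(3) reports it */
		return -1;
	}
	memcpy(buf, path, sizeof(path));
	*len = sizeof(path);
	return 0;
}

/*
 * Hypothetical helper: make at most max_tries attempts, retrying only when
 * the failure is ENOENT. Returns 0 on success, or -1 with errno left from
 * the last attempt.
 */
static int
get_path_with_retry(path_lookup_fn lookup, char *buf, size_t buflen,
    int max_tries)
{
	assert(lookup != NULL && max_tries > 0);

	for (int attempt = 0; attempt < max_tries; attempt++) {
		size_t len = buflen;	/* the call overwrites len, so reset it */

		if (lookup(buf, &len) == 0)
			return 0;
		if (errno != ENOENT)
			return -1;	/* some other error: retrying is pointless */
	}
	errno = ENOENT;			/* every attempt failed with ENOENT */
	return -1;
}
```

On FreeBSD the call would be get_path_with_retry(lookup_self_path, buf, sizeof(buf), 3) with a buffer of at least PATH_MAX bytes. Whether any number of attempts is enough is exactly what the questions above are asking.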

It would also be nice if you could tell me whether anything changed between the 8 and 9 releases that could lead to this behavior. As I said before, the same code has worked on FreeBSD 8 with no errors for more than two years. I also didn’t mention this before, but the 8 and 9 systems I’m currently testing on are installed on completely identical hardware.

> Another possible reason for failed translation is the replacement of
> the binary while it runs.  There, rigid namecache does not help.

Not the case here.

Kind regards,
Mike