[BUG] Getting path to program binary sometimes fails
John Baldwin
jhb at freebsd.org
Thu Nov 20 17:57:11 UTC 2014
On Friday, November 14, 2014 4:54:18 am Mike Gelfand wrote:
> On Nov 13, 2014, at 8:07 PM, Konstantin Belousov <kostikbel at gmail.com> wrote:
>
> > This is not a defect. The vnode->path translation uses namecache, which
> > could be purged at any time. The behaviour is typical for most unix
> > implementations. Linux and new Solaris have 'rigid' namecache, where
> > name entry lifetime is the same as the vnode lifetime it is attached to.
> > I am not aware of any useful consequences of such design, except
> > vn_fullpath() working more reliably, but at the cost of increased
> > memory usage.
>
> The man page for sysctl(3) states that “Unless explicitly noted below,
> sysctl() returns a consistent snapshot of the data requested” (surely we don’t
> expect half the path being returned; I’m just trying to read thoroughly).
> Later on there are no special notes on {CTL_KERN, KERN_PROC,
> KERN_PROC_PATHNAME}; at least no notes on the unstable behavior being
> observed, and no funny details of internal implementation you describe. ERRORS
> section only describes ENOENT condition as “The name array specifies a value
> that is unknown,” which certainly is not the case here.

Note that sysctl(3) describes a generic interface that mostly returns
integers. The language is trying to state that when you read a node you
get a consistent snapshot of whatever logical values that node provides.
For example, for a 64-bit int on a 32-bit system it will try to return a
consistent value rather than one which mixes 32-bit halves from different
values of the associated variable, and the kern.cp_times sysctl (for the
cp_times[] array) will return a consistent snapshot of the entire array of
ints. It is not saying that a node is not permitted to say "I have no valid
data at this time." If anything, I think that a node is obligated to return
that instead of partial data (as you somewhat noted).
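To make the "no valid data at this time" case concrete, here is a minimal
sketch of how a caller might tell it apart from a real failure. The names
classify_path_result(), choose_program_path() and query_program_path() are
hypothetical, and falling back to a caller-supplied path (say, one derived
from argv[0]) is an assumption of the sketch, not something the kernel
provides. Only the sysctl() call itself is FreeBSD-specific, so it is guarded.

```c
#include <errno.h>
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical classification of a KERN_PROC_PATHNAME query result. */
enum path_status {
	PATH_OK,		/* the kernel returned a complete path */
	PATH_UNAVAILABLE,	/* ENOENT: no valid data at this time */
	PATH_FAILED		/* any other error */
};

/* rc is the sysctl() return value, err the errno observed right after it. */
enum path_status
classify_path_result(int rc, int err)
{
	if (rc == 0)
		return (PATH_OK);
	return (err == ENOENT ? PATH_UNAVAILABLE : PATH_FAILED);
}

/*
 * Pick the path to use: the kernel's answer when it is valid, otherwise a
 * caller-supplied fallback (an assumption of this sketch). Returns NULL
 * on a real failure.
 */
const char *
choose_program_path(enum path_status st, const char *kernel_path,
    const char *fallback)
{
	switch (st) {
	case PATH_OK:
		return (kernel_path);
	case PATH_UNAVAILABLE:
		return (fallback);
	default:
		return (NULL);
	}
}

#ifdef __FreeBSD__
/* Query the current process's binary path (pid -1 means "this process"). */
enum path_status
query_program_path(char *buf, size_t buflen)
{
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };
	size_t len = buflen;
	int rc = sysctl(mib, 4, buf, &len, NULL, 0);

	return (classify_path_result(rc, errno));
}
#endif
```

A caller would pass the result of query_program_path() to
choose_program_path() together with its own fallback. The sketch
deliberately does not retry, since nothing here says that asking again
repopulates the name cache.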

> Since you’re saying that current behavior is not a defect, maybe
> documentation is wrong (incomplete, misleading) then? I will readily accept
> the “not a defect” explanation, but only if one wouldn’t have to ask you every
> time this oddity is met. If this is the expected error condition, what should
> I do to get the path reliably? Should I retry (and how many times)? You’re
> saying cache is being purged; does it mean that when I ask for path then cache
> is populated again? Does it guarantee then that I’ll be able to get the path
> on next call? Could you guarantee that I’ll be able to get the path at all if
> I fail two or more times? Should I rely on ENOENT specifically when retrying?

Is this over NFS? NFS is more aggressive than local filesystems in purging
name cache entries because there are inherent races in NFS with certain
fileservers (ones that don't use sub-second timestamps), so by default entries
always expire after about a minute. You can change that via the 'nametimeo'
mount option (takes a count in seconds).
--
John Baldwin