svn commit: r277199 - in head/sys: fs/devfs kern
Hans Petter Selasky
hps at selasky.org
Fri Jan 16 08:59:28 UTC 2015
Hi Konstantin,
On 01/16/15 09:03, Konstantin Belousov wrote:
> On Thu, Jan 15, 2015 at 01:14:39PM +0100, Hans Petter Selasky wrote:
>> On 01/15/15 12:51, Konstantin Belousov wrote:
>>> On Thu, Jan 15, 2015 at 11:49:09AM +0100, Hans Petter Selasky wrote:
>>>>
>>>> I see no leakage in that case either!
>>> Because these cases come through destroy_dev().
>>>
>>>>
>>>> Are there more cases which I don't see?
>>> You are breaking existig devfs KPI by your hack. You introduce yet another
>>> reference on the device, which is not supposed to be there.
>>
>> Hi Konstantin,
>>
>> I need a non-sleeping way to say a character device is no longer
>> supposed to be used and be able to re-use the device name right away
>> creating a new device. I guess you got that.
> Yes, I got it. The devfs design is not suitable for this, and your
> hack is the good witness of the fact.
>
> My opinion is that you should have tried to handle the issue at the driver
> level, instead of making this devfs issue. I.e., if you already have
> cdev node with the correct name, driver should have replaced the private
> data to point to new device.
I think this way cannot be implemented in a clean way, because of
locking order reversal. And if you try to avoid the LOR you end up with
the race. Chess mate sort of ;-) Let me explain:
Thread 1:
usb_sx_lock();
cdev = Look in freelist for existing device();
else
cdev = make_dev();
usb_sx_unlock();
Thread 2:
usb_sx_lock();
put cdev on freelist
usb_sx_unlock();
Thread 3:
usb_sx_lock();
cdev = remove first entry in freelist
usb_sx_unlock();
/*
* XXX because USB needs to call destroy_dev() unlocked we
* are racing with Thread 1 again
*/
destroy_dev(cdev);
>
> This would also close a window where /dev node is non-existent or operate
> erronously.
I'm not saying I plan so, but I think "cdevs" at some point need to
understand mutexes and locks. That means like with other API's in the
kernel we can associate a lock with the "cdev", and this lock is then
used to ensure an atomic shutdown of the system in a non-blocking
fashion. In my past experience multithreaded APIs should be high level
implemented like this:
NON-BLOCKING methods:
lock(); **
xxx_start();
xxx_stop();
unlock();
BLOCKING methods:
setup(); // init
unsetup(); // drain
Any callbacks should always be called locked **
In devfs there was no non-blocking stop before I added the delist_dev()
function.
>
> You do not understand my point.
>
> I object against imposing one additional global reference on all cdevs
> just to cope with the delist hack. See the patch at the end of the message.
It's fine by me.
>
> WRT destroy_dev_sched_cb() calling delist_dev(), even after calling
> delist_dev(), the node still exists in the /dev. It is only removed
> after populate loop is run sometime later. dev_sched() KPI is inheritly
> racy, drivers must handle the races for other reasons.
The populate loop is all running under the dev_lock() from what I can
see and make_dev() is also keeping the same lock when inserting and
removing new cdevs. The populate loop should always give a consistent
view of the character devices available, and I don't see how "cdev"
structures without the CDP_ACTIVE flag can appear with recently created
ones, even if the name is the same.
>
>>
>>> Entry can be random, since after the populate loop is ran, your code in
>>> other thread could start and create duplicate entry. There is a window
>>> in the lookup where both directory vnode lock and mount point sx locks
>>> are dropped. So running the populate does not guarantee anything.
>>
>> If there is such a race, it is already there! My patch changes nothing
>> in that area:
>>
>> Thread1:
>> Calls destroy_dev() and clears CDP_ACTIVE, after dev_unlock() it goes
>> waiting for some refs for xxx milliseconds.
>> Thread2:
>> Tries to create create a new character device having the same name like
>> the one in thread1. Device name duplication check is missed because
>> CDP_ACTIVE is cleared. Still thread1 is waiting.
>> Thread3:
>> Tries to open character device created by thread2 while thread1 is still
>> waiting for some ref held by a userspace app to go away.
>>
>> This can happen already before my patches! What do you think?
> Possibly.
>
>> At what level do you mean duplicate names, I don't get this fully? At
>> the directory level (DE nodes)? Or inside the list of character devices
>> (LIST_XXX)?
> It does not matter, dup at either one directory level, or dup of full
> names in the global list are equivalent (bad) things.
Like I write above I don't see where the problem is. At the cdev level,
we are protecting the cdev's LIST with dev_lock() and only one entry
will exist having CDP_ACTIVE bit set per unique cdev name and path. Else
we will hit a panic in make_dev() and friends.
In the directory entry level the populate loop will also ensure a
consistent view, and hence the cdev's LIST is consistent, the view
presented to userspace will also be consistent.
That system functions can still call into the dangling read/write/ioctl
functions is another story, and that is why I tell, that in order to
simplify this teardown, we possibly should associate a client selectable
lock with each CDEV, for teardown purposes. Like done for several years
in the callout and USB APIs and possibly many other places.
>
>>
>>> Let me summarize:
>>> - the extra reference on arbitrary cdev should be eliminated. The
>>> delist_dev_locked() may add the ref and set some CDP_ flag to
>>> indicate to destroy_dev() that it should do dev_rel().
>>
>> It is possible to do this. I thought about this before doing my patch,
>> but decided to try to avoid adding yet another cdev flag.
>>
>>> - the existence of the duplicated entries should be either eliminated
>>> (I am not sure it is possible with your code), or we must ensure
>>> that only one name with CDP_ACTIVE flag set and given name exists.
>>> Then, all lookup code must be audited to take CDP_ACTIVE into account
>>> when accessing names. I see at least devfs_find() and devfs_mknod()
>>> which must be changed. I did not performed full audit.
>>
>> I will check this path out aswell.
>>
>
> It seems it is simpler for me to try to clean up after the commit.
> The patch was only lightly tested. I post the patch for discussion,
> not for you to committing it. I will expedite the change into HEAD
> after the consensus on it is made and adequate testing is performed.
I don't see any problems about your patch, except it adds a bit more
code to the kernel.
--HPS
More information about the svn-src-all
mailing list