Re: pondering pi futexes
- Reply: Konstantin Belousov : "Re: pondering pi futexes"
- In reply to: Konstantin Belousov : "Re: pondering pi futexes"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Thu, 01 Jul 2021 21:53:01 UTC
On Mon, Jun 28, 2021 at 02:20:25PM +0300, Konstantin Belousov wrote: > On Sun, Jun 27, 2021 at 10:39:35PM +0300, Dmitry Chagin wrote: > > > > Hi, > > some time ago I have changed Linuxulator futexes from sx lock to mtx. > > sx was used as it allows copyin/copyout with sx lock held. > > to use mtx I have changed the code like: > > 1. lock mtx; > > 2. disable_pagefaults; > > 3. copyin() > > 4. enable_pagefaults; > > 5. if error > > - unlock mtx; > > copyin(); > > if error == 0 goto 1. > > > > it works (needto replace copyin() by fueword32()), but pondering pi futexes > > imlementation, I see that it is not possible to drop the futex lock on a return > > from msleep() path. > > > > below a simplified FUTEX_LOCK_PI operation, where on enter to the kernel current thread: > > > > 0. acquire futex lock (which is mtx) > > 1. cmpset(0 -> current thread TID), return (0) on success; > > 2. fetch() from futex *uaddr (for TID of owner): > > - check EDEADLK case (the futex word at *uaddr is already locked by the caller); > > - check that is no waiters on *uaddr exists which is waiting via FUTEX_WAIT or > > FUTEX_WAIT_BITSET, return (EINVAL) if so; > > - cmpset(TID -> (FUTEX_WAITERS|TID)); > > - on error, the futex owner changed in user-space, repeat from 1. > > 3. Here we have: the owner, one waiter (current thread) and 0 or more waiters > > sleeping on a waiting_proc. FUTEX_WAITERS bit is set, so any new waiters go to > > the kernel and owner should unlock futex via the FUTEX_UNLOCK_PI op; > > 4. Try to find the thread which is associated with the owner’s TID: > > - on error, something bad happened, owner died? Clean owner state link? > > return (ESRCH). Or if no other waiters? Check this... > > - on success: > > - save owner state link to the struct futex (save priority); > > - check the owner's priority, bump it if needed; > > - put the current thread to the waiters list in descending priority order; > > - change priority of all waiters if needed; > > - msleep on a futex waiting_proc; come back with futex lock held; > > - restore own priority? If last waiter?; [ponders..] > > - on timeout return (ETIMEDOUT); > > - the current thread is the new owner: > > bah!! - store() the owner TID to *uaddr; [check what should I do on error..] > > - release futex lock; > > - return (0). > > > > is it possible to hold *uaddr page to prevent page faults? > > I did not followed exact algorithm you trying to describe. Still, I can make > two points which could be useful for formulation of the working solution. > > 1. Umtx AKA FreeBSD native implementation of something very similar to > futex, has a concept of the umtx queue chain. The chain owns the mutex > lock used for 'fast' ops, but for situations like accesses to userspace, > we 'busy' the umtxq chain. De-facto busy state is the hand-rolled > sleepable lock, with usual interlocking against chain mutex, and > msleep/wakeup inter-thread notifications. > reading umtx impl I see umtxq_sleep drop lock (PDROP bit is set) and, in case of a signal, reacquire lock and breaks from loop. is there a possible small window for wakeup loss? as ERESTART returned, or the caller should to take care of this? Or I missing something? Im trying to reuse the umtx code for futexes, at least queue chain. Thanks in advance! > 2. If you insist on accessing userspace pages while owning non-sleepable > locks, or even sleepable locks which do not compose cleanly with whole > VM+VFS+network (the later due to NFS) locking protocols, we do have a > technique that can be used. Althought it is complicated and sometimes > slow. > > You already noted half of it above, namely disable of the page faults and > copyin_nofault-like operations. Second half is prefaulting, which is > done by vm_fault_quick_hold_pages(). There are two ways you can use it. > > Either you hold the target userspace page, then remap it into KVA and > access userspace object through the kernel mapping. This assumes that > the object fits into page, also it does not interact with parallel forks > and CoW, which you either have to accept to recheck after. The usual > races, e.g. userspace remapping the held (actually wired) page under us, > is there, of course, but it is a race anyway. > > Or, you hold the page but still access it using copyin_nofault. Then any > reported error from copyin becomes fatal instead of an indicator to retry. > Error can happen for the same reason, because userspace could unmap or > remap this address under us. > > Umtx implementation only uses busy on chains, might be because _nofault+ > vm_fault_quick_hold() appeared later.