Stale memory during post fork cow pmap update

Tue Feb 13 09:11:48 UTC 2018

On 02/10/2018 04:56 PM, Konstantin Belousov wrote:
> On Sat, Feb 10, 2018 at 09:56:20PM +0000, Elliott.Rabe at dell.com wrote:
>> On 02/10/2018 05:18 AM, Konstantin Belousov wrote:
>>> On Sat, Feb 10, 2018 at 05:12:11AM +0000, Elliott.Rabe at dell.com wrote:
>>>> ...
>>>> I've been hunting for the root cause of elusive, slight memory
>>>> corruptions in a large, complex process that manages many threads. All
>>>> failures and experimentation thus far has been on x86_64 architecture
>>>> machines, and pmap_pcid is not in use.
>>>> ...
>>> It is necessary for you to provide the test and provide
>>> some kind of the test trace or the output which illustrates the issue
>>> you found.
>> Here is the sequence of actions I am referring to.  There is only one
>> lock, and all the writes/reads are on one logical page.
>>
>> +The process is forked transitioning a map entry to COW
>> +Thread A writes to a page on the map entry, faults, updates the pmap to
>> writable at a new phys addr, and starts TLB invalidations...
>> +Thread B acquires a lock, writes to a location on the new phys addr,
>> and releases the lock
>> +Thread C acquires the lock, reads from the location on the old phys addr...
>> +Thread A ...continues the TLB invalidations which are completed
>> +Thread C ...reads from the location on the new phys addr, and releases
>> the lock
>>
>> In this example Thread B and C [lock, use and unlock] properly and
>> neither own the lock at the same time.  Thread A was writing somewhere
>> else on the page and so never had/needed the lock.  Thread B sees a
>> location that is only ever read|modified under a lock change beneath it
>> while it is the lock owner.
> I believe you mean 'Thread C' in the last sentence.
You are correct, I did mean Thread C.
>> I will get a test patch together and make it available as soon as I can.
> Please.

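For anyone following the thread without the attachment, the sequence quoted
above can be replayed in a small toy model.  This is only a sketch under loud
assumptions: one virtual page, one software "TLB" slot per thread, and invented
names (Machine, replay_race) that have nothing to do with the real pmap code or
with the attached fixture.  It illustrates how Thread C can read two different
values from one location while it holds the lock.

```python
# Toy model only: none of these names are FreeBSD kernel APIs.
class Machine:
    def __init__(self, cpus):
        self.phys = {"old": {"x": 0}}           # physical pages: name -> contents
        self.pte = ("old", False)               # (physical page, writable) for the one virtual page
        self.tlb = {cpu: None for cpu in cpus}  # per-CPU cached copy of the PTE

    def translate(self, cpu):
        # Use the cached translation if present, else reload it from the page table.
        if self.tlb[cpu] is None:
            self.tlb[cpu] = self.pte
        return self.tlb[cpu]

    def read(self, cpu, key):
        page, _ = self.translate(cpu)
        return self.phys[page][key]

    def write(self, cpu, key, value):
        page, writable = self.translate(cpu)
        if not writable:
            raise PermissionError("write fault")
        self.phys[page][key] = value

    def invalidate(self, cpu):
        self.tlb[cpu] = None


def replay_race():
    m = Machine(["A", "B", "C"])
    # After fork the entry is COW: all three CPUs cache the read-only mapping.
    for cpu in ("A", "B", "C"):
        m.translate(cpu)

    # Thread A faults on a write: copy the page, then switch the PTE to the
    # new physical page and to writable in a single update.
    m.phys["new"] = dict(m.phys["old"])
    m.pte = ("new", True)
    m.invalidate("A")
    m.invalidate("B")          # invalidation has reached B but not yet C

    m.write("B", "x", 1)       # Thread B: lock, write, unlock
    first = m.read("C", "x")   # Thread C: lock, read through its stale TLB entry
    m.invalidate("C")          # Thread A's invalidations complete
    second = m.read("C", "x")  # Thread C: read again, still holding the lock
    return first, second
```

In this model replay_race() returns (0, 1): the first read goes to the old
physical page and the second to the new one, although no other thread took the
lock in between.
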
Sorry for my delayed response; I had been working off a separate project
based on releng/11.1 and it took me longer than I expected to get a dev
rig set up off of master on which I could re-evaluate the situation.

I am attaching my test apparatus, however, calling it a test is probably 
a disservice to tests everywhere.  I consider this entire fixture 
disposable, so I didn't get carried away trying to properly 
style/partition/locate the code.  I never wanted anything this 
complicated either; it pretty much just evolved into a development aid 
to spelunk around in the fault/pmap handling.  My attempts thus far at
reducing the fixture to be user-space only have not been successful.
Additionally, I have noticed that the fixture is /very/ sensitive to any
changes in timing; several of the debugging entries even seem key to
hitting the problem.  I didn't have much luck getting the problem to
manifest on a virtual machine guest with a VirtualBox host either.  For
all of these reasons, I don't think there is value here in trying to use
this as any sort of regression fixture, unless perhaps someone is
willing to try to turn it into something less ridiculous.  Despite all
its shortcomings, on my hardware anyway, it is able to reproduce the
example I described pretty much immediately when I use it with the
debugging knob "-v". Instructions and expectations are at the top of the
main test fixture source file.

I am also attaching a patch that I have been using to prevent the 
problem.  I was looking at things with a much narrower view and made the 
changes directly in pmap_enter.  I suspect the internal
double-update-invalidate is slightly better performance-wise than taking
two whole faults, but I haven't benchmarked it.  It probably doesn't
matter much compared to the cost and frequency of the actual copies, and
it also has the disadvantage of being architecture-specific.  I also
don't feel like I have enough experience with the vm fault code in 
general for my commentary to be very valuable here.  However, I do 
wonder: 1) if there are any other scenarios where a potentially 
accessible page might be undergoing an [address+writable] change in the 
same way (this sort of thing seems hard to read out of code), and 2) if 
there is ever any legal reason why an accessible page should be 
undergoing such a change?  If not, perhaps we could come up with an
appropriate sanity-check condition to guard against any cases of this
sort of thing accidentally slipping in in the future.
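
To make the idea of a sanity-check condition concrete, here is a toy sketch.
It is not the attached patch and not real pmap code: the names unsafe_change
and cow_update_two_step are hypothetical, a PTE is modeled as a (page, writable)
tuple or None for "invalid", and the two-step update is only one plausible
reading of "double-update-invalidate".  The check flags any single update that
moves a live mapping to a different physical page and makes it writable at the
same time.

```python
# Toy model only; hypothetical names, not the attached patch.
def unsafe_change(old, new):
    """True if a live mapping would change page and become writable in one step."""
    if old is None or new is None:
        return False  # entering or leaving the invalid state is not flagged
    old_page, _ = old
    new_page, new_writable = new
    return old_page != new_page and new_writable


def cow_update_two_step(pte, tlbs, new_page):
    """Resolve a COW fault in two PTE writes; return the values written."""
    trace = []
    for target in (None, (new_page, True)):
        assert not unsafe_change(pte, target)
        pte = target
        trace.append(pte)
        for cpu in tlbs:      # full invalidation after every PTE write
            tlbs[cpu] = None
    return trace
```

In this model a direct update from ("old", False) to ("new", True) is flagged,
while going through the invalid state first, with a full invalidation after
each write, passes the check.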

The attached git patches should apply and build cleanly on master commit 
fe0ee5c.  I have verified at least these three scenarios in my environment:
1) the fixture alone reproduces the problem.
2) the fixture with my patch does not reproduce the problem.
3) the fixture with your patch does not reproduce the problem.

Thanks!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-DISPOSABLE-A-test-fixture-that-can-repro-a-pmap-upda.patch
Type: text/x-patch
Size: 70608 bytes
Desc: 0001-DISPOSABLE-A-test-fixture-that-can-repro-a-pmap-upda.patch
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20180213/73a33d63/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-TRIAL-Double-invalidate-when-finishing-COW-pmap-upda.patch
Type: text/x-patch
Size: 3688 bytes
Desc: 0002-TRIAL-Double-invalidate-when-finishing-COW-pmap-upda.patch
URL: <http://lists.freebsd.org/pipermail/freebsd-hackers/attachments/20180213/73a33d63/attachment-0003.bin>