cvs commit: src/sys/ia64/ia64 exception.S interrupt.c machdep.c mp_machdep.c pmap.c trap.c vm_machdep.c src/sys/ia64/include proc.h smp.h

Mon Aug 8 16:57:13 GMT 2005

On 8 Aug 2005, at 17:49, Marcel Moolenaar wrote:

> On Aug 8, 2005, at 1:11 AM, Doug Rabson wrote:
>
>
>>> What I'd like to do is get a better sense of how critical it is if
>>> there's a VHPT miss. Maybe we can implement the code that handles it
>>> in C, use locks and open the doors to having various different hash
>>> bucket implementations to play with. I still have my concerns about
>>> the assembly in exception.S and the lack of locking therein. This in
>>> the context of having spurious core dumps.
>>>
>>>
>>
>> If you make it a spin mutex, I think it might be possible to take the
>> mutex from exception.S safely. The uses of this mutex should be
>> extremely short (and collisions rare).
>>
>
> I made them spin mutexes already. For the reasons you mentioned. I'll
> play with it a bit.

Obviously, the hard bit is making sure that you don't deadlock by  
taking a TLB fault while holding the mutex. It ought to be possible  
to do this by explicitly doing any necessary VA->PA conversions  
outside the mutex. I would be worried that the compiler would try to  
break things by doing unexpected stack accesses.
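
To illustrate the ordering I mean, here is a minimal user-space sketch in C. It is not code from the tree: `struct bucket`, `va_to_pa()`, `bucket_insert()` and the `DIRECT_MAP_BASE` constant are all hypothetical names invented for the example, and a C11 `atomic_flag` stands in for the kernel spin mutex. A real handler would also have to reach the bucket itself through an address that cannot fault, which this sketch does not attempt to model.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical base of a direct-mapped region, assumed for illustration. */
#define DIRECT_MAP_BASE 0xe000000000000000ULL

/* Hypothetical hash bucket: a spin lock plus the chain head. */
struct bucket {
    atomic_flag lock;      /* stand-in for a spin mutex */
    uint64_t    chain_pa;  /* physical address of the first chain entry */
};

/* Hypothetical translation: in a direct map, PA is VA minus the base. */
static uint64_t va_to_pa(uint64_t va)
{
    return va - DIRECT_MAP_BASE;
}

static void bucket_insert(struct bucket *b, uint64_t entry_va)
{
    /* Step 1: do the VA->PA conversion before taking the lock. */
    uint64_t entry_pa = va_to_pa(entry_va);

    /* Step 2: keep the critical section short and free of new lookups. */
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
    b->chain_pa = entry_pa;
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}
```

The point is only the order of operations: the conversion is finished before the lock is taken, so nothing inside the critical section needs a translation that has not already been resolved.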

>
>
>>> In parallel, I'm measuring the effect on performance of bumping up
>>> the page size to 16K and 32K. I suspect the cost of a VHPT miss is
>>> mostly due to us needing to find the PTE in the hash bucket by
>>> walking a linked list. Keeping the average length of the list small
>>> may improve our overall performance.
>>>
>>> Lots to learn...
>>>
>>>
>>
>> How about the effect of different VHPT sizes?
>>
>
> A larger VHPT does not necessarily improve performance. I think I got
> the best results with a 64K VHPT in a 2GB machine. The performance
> deltas were really small, but that might be due to the particulars of
> the load I put onto the machine. The effects on large databases may
> be better for example.
>
>
>> A long time ago I
>> experimented with different ways of assigning region IDs to processes
>> in an attempt to reduce collisions (and therefore reduce collision
>> chain length). I think there still might be some mileage in that
>> direction.
>>
>
> I think the algorithm is defined in the architecture specification. It
> should be possible to analyze it and determine if we get a good
> distribution.
>
> We probably get better results if we share translations across
> processes. For this to work, we need to use the permission keys so
> that we can assign different permissions per process without having
> to create new translations for it.

Right.
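
On the collision chain question from earlier in the thread, a rough way to reason about lookup cost is to count how many entries a miss handler has to visit. The sketch below is a toy model in C, not the pmap code: `struct entry`, `toy_hash()`, `chain_lookup_cost()` and `NBUCKETS` are hypothetical, and `toy_hash()` is deliberately not the architected VHPT hash. It only shows how a region ID feeding into the hash changes which bucket a page lands in.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8  /* toy table size, not a real VHPT size */

/* Hypothetical chain entry: a tag plus the collision chain link. */
struct entry {
    uint64_t      tag;   /* identifies the translation */
    struct entry *next;  /* next entry in the same bucket */
};

/* Toy hash mixing region ID and virtual page number (an assumption). */
static size_t toy_hash(uint32_t rid, uint64_t vpn)
{
    return (size_t)((vpn ^ rid) % NBUCKETS);
}

/*
 * Walk one chain looking for tag. Returns the number of entries
 * visited when the tag is found, or 0 when it is not on the chain.
 */
static unsigned chain_lookup_cost(const struct entry *head, uint64_t tag)
{
    unsigned steps = 0;

    for (const struct entry *e = head; e != NULL; e = e->next) {
        steps++;
        if (e->tag == tag)
            return steps;
    }
    return 0;
}
```

With a model like this it is easy to try different region ID assignments and see how the chain lengths move, before touching the real hash.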