Problem with ufs not releasing vm_pages on busy volume. (soft updates related)

Wed Aug 9 00:44:58 UTC 2006

On 09/08/2006, at 2:07 AM, Eric Anderson wrote:

> On 08/08/06 00:14, Q wrote:
>> On 02/08/2006, at 8:10 PM, Q wrote:
>>> I have a problem that seems to be related to something in UFS
>>> not releasing some vm_pages on busy filesystems. I have
>>> two servers running PostgreSQL, one running 6.0-RELEASE, the
>>> other 6.1-RELEASE. Both are under the same (fairly heavy) load,
>>> performing the same operations in bursts every five minutes. The
>>> filesystems in question are 450-500GB. Each server uses a
>>> different brand of RAID card, and both have soft updates enabled.
>>>
>>> The problem is that both servers are seeing an accumulation of
>>> about 100MB of active pages per day (looking at
>>> vm.stats.vm.v_active_count) that never get released. The only
>>> way to release these pages is to unmount the filesystem and
>>> remount it. Failing to do this results in the server eventually
>>> locking up.
>>>
>>> If someone could provide me with some direction on how to go
>>> about tracking down what might be causing this to happen it
>>> would be much appreciated.
>> I have narrowed the cause of this issue down further to something
>> to do with soft updates. If I turn off soft updates for the
>> filesystem hosting the database the system no longer accumulates
>> active vm_pages constantly. Instead of accumulating 100MB a day
>> of active vm_pages until all memory is consumed, it will hover
>> around 50-60MB with soft updates disabled.
>> If someone familiar with the soft updates code is willing to help
>> me pinpoint the cause of this problem it would be much appreciated.
>
>
> Is it possible for you to upgrade to the latest 6-STABLE branch,  
> just to make sure that the issue hasn't been fixed already?

I did a buildworld on the machine running 6.0-RELEASE last night.
I just have to schedule some downtime to do the upgrade sometime
today. Having two identical servers has its advantages.

> Is there any way to reproduce this on another box for testing?  (I  
> assume not, due to the nature of these things)

I think this bug is very circumstance-specific, and my database
design just happens to exercise the bug.
The fact that I have two identical servers experiencing exactly the
same problem does help narrow the field of possibilities, but the
servers are by no means "expendable".

> Also - I wonder if doing a snapshot on the filesystem would flush  
> out the pages - is that something you can try?

Possibly. I will see what I can do.
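
To compare the growth rate before and after the upgrade (or a snapshot),
something like the following rough sketch could sample
vm.stats.vm.v_active_count and convert the change into MB per day. It
assumes FreeBSD's `sysctl -n` and the hw.pagesize sysctl; the helper names
(read_sysctl, growth_mb_per_day, monitor) and the default interval are
purely illustrative, not part of any existing tool.

```python
import subprocess
import time


def read_sysctl(name):
    """Return the integer value of a sysctl using `sysctl -n` (FreeBSD)."""
    out = subprocess.check_output(["sysctl", "-n", name])
    return int(out.decode().strip())


def growth_mb_per_day(samples, page_size):
    """Estimate growth in MB/day from (timestamp_seconds, page_count) samples.

    Only the first and last samples are used, so short bursts of activity
    average out as long as the sampling window is long enough.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    bytes_per_second = (p1 - p0) * page_size / (t1 - t0)
    return bytes_per_second * 86400 / (1024 * 1024)


def monitor(interval=300, count=12):
    """Sample the active page count `count` times, `interval` seconds apart.

    The 300-second default is an assumption chosen to match a load that
    arrives in five-minute bursts.
    """
    page_size = read_sysctl("hw.pagesize")
    samples = []
    for i in range(count):
        samples.append((time.time(), read_sysctl("vm.stats.vm.v_active_count")))
        if i < count - 1:
            time.sleep(interval)
    return growth_mb_per_day(samples, page_size)
```

Calling monitor() over an hour or so with soft updates on, and again with
them off, should make a steady climb versus a flat 50-60MB plateau fairly
obvious.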

-- 
Seeya...Q

                -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

                           _____  /  Quinton Dolan - qdolan at gmail.com
   __  __/  /   /   __/   /      /
      /    __  /   _/    /      /        Gold Coast, QLD, Australia
   __/  __/ __/ ____/   /   -  /            Ph: +61 419 729 806
                     _______  /
                             _\