Suddenly slow lstat syscalls on CURRENT from July
Beat Gätzi
beat at chruetertee.ch
Sat Jan 1 16:00:54 UTC 2011
On 01.01.2011 16:45, Kostik Belousov wrote:
> On Sat, Jan 01, 2011 at 04:37:44PM +0100, Beat Gätzi wrote:
>> On 01.01.2011 16:10, Alexander Best wrote:
>>> On Sat Jan 1 11, Beat Gätzi wrote:
>>>> Hi,
>>>>
>>>> For the last couple of days, commands like ls(1) have been very slow
>>>> on one of my tinderboxes. Checking with ktrace/kdump I can see that
>>>> lstat syscalls take about one second each:
>>>>
>>>> 70559 ls 0.004644 CALL lstat(0x284472f8,0x28447298)
>>>> 70559 ls 0.004651 NAMI "Mk"
>>>> 70559 ls 0.004664 STRU struct stat {... }
>>>> 70559 ls 0.004672 RET lstat 0
>>>> 70559 ls 0.004688 CALL lstat(0x2844a1b8,0x2844a158)
>>>> 70559 ls 0.004696 NAMI "README,v"
>>>> 70559 ls 1.004245 STRU struct stat {...}
>>>> 70559 ls 1.004263 RET lstat 0
>>>> 70559 ls 1.004277 CALL lstat(0x2844a2b8,0x2844a258)
>>>> 70559 ls 1.004286 NAMI ".cvsignore,v"
>>>> 70559 ls 2.004282 STRU struct stat {...}
>>>> 70559 ls 2.004302 RET lstat 0
>>>> 70559 ls 2.004316 CALL lstat(0x2844a3b8,0x2844a358)
>>>> 70559 ls 2.004325 NAMI "CHANGES,v"
>>>> 70559 ls 3.004275 STRU struct stat {...}
>>>> 70559 ls 3.004296 RET lstat 0
>>>> 70559 ls 3.004310 CALL lstat(0x2844a4b8,0x2844a458)
>>>> 70559 ls 3.004318 NAMI "COPYRIGHT,v"
>>>> 70559 ls 4.004300 STRU struct stat {...}
>>>> 70559 ls 4.004318 RET lstat 0
>>>>
>>>> When I immediately re-execute the command in the same directory, it
>>>> looks normal:
>>>>
>>>> 78007 ls 0.004665 CALL lstat(0x284472f8,0x28447298)
>>>> 78007 ls 0.004673 NAMI "Mk"
>>>> 78007 ls 0.004686 STRU struct stat {...}
>>>> 78007 ls 0.004693 RET lstat 0
>>>> 78007 ls 0.004708 CALL lstat(0x2844a1b8,0x2844a158)
>>>> 78007 ls 0.004715 NAMI "README,v"
>>>> 78007 ls 0.004728 STRU struct stat {...}
>>>> 78007 ls 0.004735 RET lstat 0
>>>> 78007 ls 0.004742 CALL lstat(0x2844a2b8,0x2844a258)
>>>> 78007 ls 0.004749 NAMI ".cvsignore,v"
>>>> 78007 ls 0.004761 STRU struct stat {...}
>>>> 78007 ls 0.004768 RET lstat 0
>>>> 78007 ls 0.004775 CALL lstat(0x2844a3b8,0x2844a358)
>>>> 78007 ls 0.004782 NAMI "CHANGES,v"
>>>> 78007 ls 0.004795 STRU struct stat {...}
>>>> 78007 ls 0.004802 RET lstat 0
>>>> 78007 ls 0.004809 CALL lstat(0x2844a4b8,0x2844a458)
>>>> 78007 ls 0.004817 NAMI "COPYRIGHT,v"
>>>> 78007 ls 0.004828 STRU struct stat {...}
>>>> 78007 ls 0.004835 RET lstat 0
>>>>
>>>> The system is an HP DL360 G3 with a Compaq Smart Array 5i controller.
>>>> There is no evidence of a RAID controller or hard disk problem in the
>>>> system logs. Writing to and reading from files also looks normal.
>>>> The running kernel is a GENERIC kernel from July, without debugging options:
>>>>
>>>> # uname -a
>>>> FreeBSD tinderbox.chruetertee.ch 9.0-CURRENT FreeBSD 9.0-CURRENT #5
>>>> r209980: Tue Jul 13 11:25:50 CEST 2010
>>>> root at tinderbox.chruetertee.ch:/usr/obj/usr/home/beat/dev/src/head/sys/BEASTIE
>>>> i386
>>>
>>> no problems here.
>>
>> I hadn't had this problem for the first ~166 days since I last rebooted
>> the box. Then the problem suddenly occurred in all directories on all
>> partitions. The box has been under load pretty much the whole time, as it
>> builds packages almost 24/7. I forgot to mention that the filesystem in
>> use is UFS.
>>
>> # diskinfo -ct da0
>> da0
>> 512 # sectorsize
>> 299992412160 # mediasize in bytes (279G)
>> 585922680 # mediasize in sectors
>> 0 # stripesize
>> 0 # stripeoffset
>> 36472 # Cylinders according to firmware.
>> 255 # Heads according to firmware.
>> 63 # Sectors according to firmware.
>> # Disk ident.
>>
>> I/O command overhead:
>> time to read 10MB block 0.333161 sec = 0.016 msec/sector
>> time to read 20480 sectors 3.392359 sec = 0.166 msec/sector
>> calculated command overhead = 0.149 msec/sector
>>
>> Seek times:
>> Full stroke: 250 iter in 1.202862 sec = 4.811 msec
>> Half stroke: 250 iter in 1.120656 sec = 4.483 msec
>> Quarter stroke: 500 iter in 2.109077 sec = 4.218 msec
>> Short forward: 400 iter in 1.892342 sec = 4.731 msec
>> Short backward: 400 iter in 1.399378 sec = 3.498 msec
>> Seq outer: 2048 iter in 0.399352 sec = 0.195 msec
>> Seq inner: 2048 iter in 0.385460 sec = 0.188 msec
>> Transfer rates:
>> outside: 102400 kbytes in 2.325967 sec = 44025 kbytes/sec
>> middle: 102400 kbytes in 2.157681 sec = 47458 kbytes/sec
>> inside: 102400 kbytes in 2.717089 sec = 37687 kbytes/sec
>>
>> I also observed that the cpu output of vmstat is a little bit confusing
>> on this system:
>>
>> # vmstat
>> procs memory page disk faults cpu
>> r b w avm fre flt re pi po fr sr da0 in sy cs us sy id
>> 0 0 0 1663M 1120M 117 0 1 0 120 21 0 172 27 245 -29 -10 139
>>
> Check the output of sysctl kern.maxvnodes and vfs.numvnodes. I suspect
> they are quite close or equal. If so, consider increasing maxvnodes.
> Another workaround, if you have a huge nested directory hierarchy, is
> to set vfs.vlru_allow_cache_src to 1.
Thanks for the hint. vfs.numvnodes had indeed reached kern.maxvnodes:
# sysctl kern.maxvnodes vfs.numvnodes
kern.maxvnodes: 100000
vfs.numvnodes: 100765
I've increased kern.maxvnodes and the problem went away until
vfs.numvnodes reached kern.maxvnodes again:
# sysctl kern.maxvnodes vfs.numvnodes
kern.maxvnodes: 150000
vfs.numvnodes: 150109
As the directory structure on this server is quite large, I've now set
vfs.vlru_allow_cache_src to 1.
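For reference, the check Kostik suggests can be sketched as a small sh script that warns when the vnode cache is close to its limit. The values below are hardcoded from the numbers in this thread for illustration; on a live FreeBSD system they would come from sysctl(8), and the 1% threshold is an arbitrary choice of mine, not anything FreeBSD defines:

```shell
#!/bin/sh
# Sketch: warn when the vnode cache is nearly exhausted.
# On a live system the values would come from sysctl, e.g.:
#   maxvnodes=$(sysctl -n kern.maxvnodes)
#   numvnodes=$(sysctl -n vfs.numvnodes)
# Hardcoded here with the figures reported above.
maxvnodes=150000
numvnodes=150109

# Flag when numvnodes is within 1% of the limit (arbitrary margin).
threshold=$((maxvnodes - maxvnodes / 100))
if [ "$numvnodes" -ge "$threshold" ]; then
    echo "vnode cache nearly exhausted: $numvnodes of $maxvnodes"
fi
```

With the figures above the warning fires, matching the symptom in this thread: once vfs.numvnodes hits kern.maxvnodes, vnode reclamation starts stalling lookups.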
> You did not specify how much memory your machine has, but I assume it
> is > 1GB. In any case, increasing maxvnodes on i386 should be done very
> cautiously, since it is easy to exhaust KVA.
The server has 4GB of RAM. Is it possible to check how much I could
increase kern.maxvnodes without exhausting KVA?
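Not an authoritative answer, but one rough way to estimate headroom: on i386 the sysctls vm.kvm_size and vm.kvm_free (where available) report kernel virtual address space, and each cached vnode costs roughly the size of a struct vnode plus the filesystem-specific inode. Both figures in this sketch are illustrative assumptions, not measurements from this box:

```shell
#!/bin/sh
# Rough KVA-headroom estimate -- every number here is an assumption.
# On FreeBSD i386 the free kernel VA could be read with something like:
#   kvm_free=$(sysctl -n vm.kvm_free)
kvm_free=$((400 * 1024 * 1024))   # assume ~400MB of free KVA
vnode_cost=800                    # guessed bytes per vnode + UFS inode

headroom=$((kvm_free / vnode_cost))
echo "rough extra vnodes before KVA pressure: $headroom"
```

With these placeholder numbers the estimate comes out to 524288 additional vnodes; the real answer depends on the actual struct sizes of the running kernel and on how much KVA everything else is using, so any increase should still be done in modest steps.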
Thanks a lot!
Beat