Process in T state does not want to die.....

Willem Jan Withagen wjw at digiware.nl
Fri Nov 29 15:06:19 UTC 2019


On 29-11-2019 12:40, Konstantin Belousov wrote:
> On Fri, Nov 29, 2019 at 12:05:34PM +0100, Willem Jan Withagen wrote:
>> On 29-11-2019 11:43, Eugene Grosbein wrote:
>>> 29.11.2019 16:24, Eugene Grosbein wrote:
>>>
>>>> 29.11.2019 4:46, Konstantin Belousov wrote:
>>>>
>>>>>> sys_extattr_set_fd+0xee amd64_syscall+0x364 fast_syscall_common+0x101
>>>>> This is an example of the cause for your problem.
>>>>
>>>> I observe this problem too, but my use case is different.
>>>>
>>>> I have several bhyve instances running Windows guests over ZVOLs over SSD-only RAIDZ1 pool.
>>>> "zfs destroy" for snapshots with large "used" numbers takes long time (several minutes) due to slow TRIM.
>>>> Sometimes this makes the virtual guest unresponsive, and an attempt to restart the bhyve instance may leave it in the exiting (E)
>>>> state for several minutes, after which it finishes successfully. But sometimes the bhyve process hangs in the T state indefinitely.
>>>>
>>>> This is 11.3-STABLE/amd64 r354667. Should I try your patch too?
>>>
>>> OTOH, the same system has several FreeBSD jails on mounted ZFS file systems on the same pool.
>>> These file systems have snapshots created and removed too, and the snapshots are large (up to 10G).
>>>
>>
>>   From what I gather from Konstantin, this problem is due to memory
>> pressure built up by both ZFS and the buffer cache used by UFS.
>> And the buffer cache is waiting for some buffer memory to become free
>> so that it can do its work.
>>
>> If wanted, I can try putting a ZFS filesystem on /dev/ggate0 so that
>> any buffering happens in ZFS rather than in UFS.
>>
>> But even with the patch I still now have:
>> root 3471   0.0  5.8  646768 480276  - TsJ  11:16  0:10.74 ceph-osd -i 0
>> root 3530   0.0 11.8 1153860 985020  - TsJ  11:17  0:11.51 ceph-osd -i 1
>> root 3532   0.0  5.3  608760 438676  - TsJ  11:17  0:07.31 ceph-osd -i 2
>> root 3534   0.0  3.2  435564 266328  - IsJ  11:17  0:07.35 ceph-osd -i 3
>> root 3536   0.0  4.8  565792 398392  - IsJ  11:17  0:08.73 ceph-osd -i 5
>> root 3553   0.0  2.3  362892 192348  - TsJ  11:17  0:04.21 ceph-osd -i 6
>> root 3556   0.0  3.0  421516 246956  - TsJ  11:17  0:04.81 ceph-osd -i 4
>>
>> And from procstat -kk below it looks like things are still stuck in
>> bwillwrite, but now with another set of functions. I guess this time
>> not writing an extended attribute but writing a file.
> Yes, it should resolve after you end the load that starves the buffer
> cache's dirty space.  Or wait some time until the thread gets its
> share, which is unfair and could take a long time.

Eh, right....
This pointed me in a direction that offers some stress relief.

The other process is:
root    3581   0.0  0.0   10724 2372 v1  D+ 11:20 0:00.91 bonnie -s 256

Which is also stuck in disk I/O, in the kernel I guess.
So killing it only works once any of the writes succeed.

Luckily the GEOM gateway (rbd-ggate) is also in userspace, and can be 
killed. Which I guess is where the buffers collected, because shooting 
it down immediately allows the ceph-osd processes to continue crashing.
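As an aside, filtering procstat's kernel stacks for bwillwrite is a quick way to see which threads the buffer cache is throttling. A minimal sketch (the stuck_writers helper name is mine, not a standard tool):

```shell
# Sketch: print PID, TID and command of threads sleeping in bwillwrite(),
# i.e. blocked because the buffer cache's dirty space is exhausted.
stuck_writers() {
    # expects `procstat -kk -a`-style lines on stdin
    awk '/bwillwrite/ { print $1, $2, $3 }'
}

# On a live FreeBSD system:
#   procstat -kk -a | stuck_writers
```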

So are there any controls I can apply to make all these
components behave better?
- One thing would be more memory, but this board only allows 8G. (It's 
an oldie.)
- Don't run heavy UFS buffer consumers.
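A third, untested option might be lowering the dirty-buffer watermarks so the write throttle kicks in earlier, if the kernel exposes them as read-write sysctls (check with `sysctl -d` first). A sketch of the idea, not advice from this thread, with purely illustrative values:

```shell
# Sketch: lower the bwillwrite() throttle points; whether these sysctls
# are writable at runtime depends on the FreeBSD version, and the
# numbers below are illustrative only.
sysctl vfs.hidirtybuffers=8000
sysctl vfs.lodirtybuffers=4000
```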

Are there any sysctl values I can monitor to check the buffer usage?
I guess the system has these values, since top can find them:

# sysctl -a | grep buffer
vfs.hifreebuffers: 768
vfs.lofreebuffers: 512
vfs.numfreebuffers: 52820
vfs.hidirtybuffers: 13225
vfs.lodirtybuffers: 6612
vfs.numdirtybuffers: 0
vfs.altbufferflushes: 0
vfs.dirtybufferflushes: 0
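Since bwillwrite() starts blocking writers once the dirty-buffer count approaches vfs.hidirtybuffers, watching vfs.numdirtybuffers as a percentage of that watermark gives a rough pressure gauge. A minimal sketch (dirty_pct is a hypothetical helper, not a system tool):

```shell
# Sketch: report vfs.numdirtybuffers as a percentage of
# vfs.hidirtybuffers, the point where bwillwrite() throttles writers.
dirty_pct() {
    # expects "vfs.numdirtybuffers: N" and "vfs.hidirtybuffers: M" on stdin
    awk -F': ' '
        /numdirtybuffers/ { n = $2 }
        /hidirtybuffers/  { hi = $2 }
        END { if (hi > 0) printf "%d\n", 100 * n / hi }'
}

# On a live FreeBSD system:
#   sysctl vfs.numdirtybuffers vfs.hidirtybuffers | dirty_pct
```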


--WjW

> 
> I will commit the VN_OPEN_INVFS patch shortly.
>>
>> # ps -o pid,lwp,flags,flags2,state,tracer,command -p 3471
>>    PID    LWP        F       F2 STAT TRACER COMMAND
>> 3471 104097 11080081 00000000 TsJ       0 ceph-osd -i 0
>>
>> # procstat -kk 3471:
>>    3471 104310 ceph-osd            journal_write       mi_switch+0xe0
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_writev+0x6e amd64_syscall+0x362 fast_syscall_common+0x101
>>    3471 104311 ceph-osd            fn_jrn_objstore     mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104312 ceph-osd            tp_fstore_op        mi_switch+0xe0
>> sleepq_wait+0x2c _sleep+0x247 bwillwrite+0x97 dofilewrite+0x93
>> sys_write+0xc1 amd64_syscall+0x362 fast_syscall_common+0x101
>>    3471 104313 ceph-osd            tp_fstore_op        mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104314 ceph-osd            fn_odsk_fstore      mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104315 ceph-osd            fn_appl_fstore      mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104316 ceph-osd            safe_timer          mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104355 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104356 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104357 ceph-osd            safe_timer          mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104358 ceph-osd            fn_anonymous        mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104359 ceph-osd            safe_timer          mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104360 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104361 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104362 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104363 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104364 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104365 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104366 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104367 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104368 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104369 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104370 ceph-osd            ms_dispatch         mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104371 ceph-osd            ms_local            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104372 ceph-osd            fn_anonymous        mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104373 ceph-osd            finisher            mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104374 ceph-osd            safe_timer          mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104375 ceph-osd            safe_timer          mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104376 ceph-osd            osd_srv_agent       mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104377 ceph-osd            tp_osd_tp           mi_switch+0xe0
>> thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f
>>    3471 104378 ceph-osd            tp_osd_tp           mi_switch+0xe0
>> thread_suspend_switch+0x140 thread_single+0x47b sigexit+0x53
>> postsig+0x304 ast+0x327 fast_syscall_common+0x198
>>
>>
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
> 


