linux threaded application hang
Andriy Gapon
avg at icyb.net.ua
Wed Oct 13 02:23:28 PDT 2004
This is probably not a problem of our linux emulation, but I think this
is most likely place for me to get help.
Here's what I have:
linux_base-7.1_7
linux_devtools-7.1_3
I have installed IBM WebSphere MQ:
MQSeriesJava-5.3.0-1
MQSeriesSDK-5.3.0-1
MQSeriesRuntime-5.3.0-1
MQSeriesServer-5.3.0-1
MQSeriesClient-5.3.0-1
MQSeriesJava-U486878-5.3.0-4
MQSeriesRuntime-U486878-5.3.0-4
MQSeriesSDK-U486878-5.3.0-4
MQSeriesServer-U486878-5.3.0-4
MQSeriesClient-U486878-5.3.0-4
I am trying to execute `setmqcap -1` command and it hangs, seemingly
because of inter-thread communication problem.
Here's ktrace | linux_kdump of the ineteresting part:
#main thread creates a new thread/process with pid 5716
5715 setmqcap CALL linux_clone(0xf00,0x805b3a0)
5715 setmqcap RET linux_clone 5716/0x1654
5716 setmqcap RET linux_fork 0
#writes/reads are made to a pipe and probably are some sort of
#inter-thread communication in old linux threads
#fd 4 in 5715 is writing end and fd 3 in 5716 is reading end of the pipe
5715 setmqcap CALL write(0x4,0xbfbfd900,0x94)
5715 setmqcap RET write 148/0x94
5716 setmqcap CALL linux_rt_sigprocmask(0x2,0x805b304,0,0x8)
5716 setmqcap RET linux_rt_sigprocmask 0
5715 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbfbfd9d4,0x8)
5715 setmqcap RET linux_rt_sigprocmask 0
5716 setmqcap CALL read(0x3,0x805b264,0x94)
5716 setmqcap GIO fd 3 read 148 bytes
5716 setmqcap RET read 148/0x94
5715 setmqcap CALL write(0x4,0xbfbfd9c0,0x94)
5715 setmqcap GIO fd 4 wrote 148 bytes
5715 setmqcap RET write 148/0x94
#child threas performs timed poll on a pipe before reading from it
#0x7d0 == 2000ms == 2s
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
5716 setmqcap RET poll 1
5715 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbfbfd910,0x8)
5715 setmqcap RET linux_rt_sigprocmask 0
5716 setmqcap CALL getppid
5716 setmqcap RET getppid 5715/0x1653
#it seems that main thread starts to wait on some condition
#until it is awaken by child thread by a signal
5715 setmqcap CALL linux_rt_sigsuspend(0xbfbfd910,0x8)
5716 setmqcap CALL read(0x3,0x805b264,0x94)
5716 setmqcap GIO fd 3 read 148 bytes
5716 setmqcap RET read 148/0x94
5716 setmqcap CALL linux_mmap(0x805b194)
5716 setmqcap RET linux_mmap -1086394368/0xbf3ef000
5716 setmqcap CALL mprotect(0xbf3ef000,0x1000,0)
5716 setmqcap RET mprotect 0
#child thread creates another thread, that's probably not important
5716 setmqcap CALL linux_clone(0xf21,0xbf3ffbf8)
5716 setmqcap RET linux_clone 5717/0x1655
5717 setmqcap RET linux_fork 0
#child thread wants to wake up main thread (pid == 5715 == 0x1653)
#by sending signal 0x20 == 32 (== linux SIGRTMIN ?)
5716 setmqcap CALL linux_kill(0x1653,0x20)
#but it seems to fail.. or not ?
#I am not sure how to interpret "RESTART" here
5716 setmqcap RET linux_kill RESTART
5717 setmqcap CALL linux_getpid
5717 setmqcap RET linux_getpid 5717/0x1655
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
5717 setmqcap CALL linux_rt_sigprocmask(0x2,0xbf3ffca4,0,0x8)
5717 setmqcap RET linux_rt_sigprocmask 0
5717 setmqcap CALL linux_sched_setscheduler(0x1655,0,0xbf3ffd28)
5717 setmqcap RET linux_sched_setscheduler 0
5717 setmqcap CALL linux_getpid
5717 setmqcap RET linux_getpid 5717/0x1655
5717 setmqcap CALL linux_getpid
5717 setmqcap RET linux_getpid 5717/0x1655
5717 setmqcap CALL linux_sched_setscheduler(0x1655,0,0xbf3ff7a8)
5717 setmqcap RET linux_sched_setscheduler 0
5717 setmqcap CALL linux_rt_sigprocmask(0,0xbf3feecc,0xbf3ff5ac,0x8)
5717 setmqcap RET linux_rt_sigprocmask 0
5717 setmqcap CALL linux_rt_sigaction(0x1c,0xbf3fefcc,0xbf3fef3c,0x8)
5717 setmqcap RET linux_rt_sigaction 0
5717 setmqcap CALL linux_rt_sigprocmask(0x2,0,0xbf3feb3c,0x8)
5717 setmqcap RET linux_rt_sigprocmask 0
#second child thread starts to wait for a signal
5717 setmqcap CALL linux_rt_sigsuspend(0xbf3feb3c,0x8)
#first child thread continues polling for read communication pipe
#but there is nothing there because 5715 is still suspended
5716 setmqcap RET poll 0
5716 setmqcap CALL getppid
5716 setmqcap RET getppid 5715/0x1653
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
5716 setmqcap RET poll 0
5716 setmqcap CALL getppid
5716 setmqcap RET getppid 5715/0x1653
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
5716 setmqcap RET poll 0
5716 setmqcap CALL getppid
5716 setmqcap RET getppid 5715/0x1653
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
5716 setmqcap RET poll 0
5716 setmqcap CALL getppid
5716 setmqcap RET getppid 5715/0x1653
5716 setmqcap CALL poll(0x805b25c,0x1,0x7d0)
#this keeps going on forever (until program is interrupted)
If my understanding of how old linux threads work is correct, then it
seems that the problem is that thread 5715 is never waken up from
linux_rt_sigsuspend. I am not sure if this is because that signal 32 was
never delivered or something more fishy is going on there:
I have attached with gdb to main thread process and here's some data:
(gdb) bt
#0 0x481c69b6 in __sigsuspend (set=0xbfbfd8b0) at
../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1 0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at
pthread.c:978
#2 0x48186e1d in __pthread_create_2_1 (thread=0xbfbfdaf4,
attr=0xbfbfdf80, start_routine=0x480b6b80 <ThreadMain>, arg=0x8058438)
at restart.h:34
#3 0x480b5683 in xcsCreateThreadWithHandle () at eval.c:41
#4 0x480861ed in xehStartAsySignalMonitor () at eval.c:41
#5 0x480824d0 in xehInitialize () at eval.c:41
#6 0x480928b5 in InitPrivateServices () at eval.c:41
#7 0x48094531 in xcsInitialize () at eval.c:41
#8 0x08049a01 in main () at eval.c:41
#9 0x481b4336 in __libc_start_main (main=0x80496d0 <main>, argc=1,
ubp_av=0xbfbfe744, init=0x8049108 <_init>, fini=0x80540a0 <_fini>,
rtld_fini=0x4806325c <_dl_fini>, stack_end=0xbfbfe73c) at
../sysdeps/generic/libc-start.c:129
(gdb) p set
$1 = (sigset_t *) 0x482c87f0
(gdb) p *set
$2 = {__val = {1250168, 1208405312, 1208363760, 1209738430, 1209738446,
1210126092, 1209738478, 1209738494, 1210131344, 1209738526, 1209738542,
1209738558,
1209738574, 1209738590, 1209554352, 1209738622, 1209738638,
1209738654, 1209738670, 1209738686, 1209738702, 1209738718, 1209738734,
1209738750,
1209738766, 1209738782, 1209738798, 1210090948, 1209738830,
1209738846, 1209738862, 1209738878}}
if I understand correctly linux sigset_t then mask for signal 32 should
be bit 31 (highest) of __val[0]:
1250168 == 100110001001101111000
e.g. it seems that the signal was blocked.
Unfortunately I was not able to attach to other threads, but this looks
insane, especially considering this:
(gdb) fr 1
#1 0x48186d45 in __pthread_wait_for_restart_signal (self=0x481905a0) at
pthread.c:978
978 pthread.c: No such file or directory.
in pthread.c
(gdb) p mask
$3 = {__val = {0, 0, 0, 0, 0, 1208347125, 0, 3217021232, 0, 0, 0, 0, 0,
0, 3, 0, 0, 0, 0, 0, 16777216, 0, 1209525676, 102, 1209536638, 1209529244,
1209524044, 1208404144, 7, 1210900512, 3, 1210877936}}
(gdb) p __pthread_sig_restart
$4 = 32
(gdb) p &mask
$5 = (sigset_t *) 0xbfbfd890
How can &mask in frame 1 be different from set in frame 0 ?
P.S.
I tried setting LD_ASSUME_KERNEL to various values from 2.2.5 to 2.4.2,
but it didn't change anything.
--
Andriy Gapon
More information about the freebsd-emulation
mailing list