32-bit powerpc head -r360311 has signal 11 process crashes with rendezvous_request involved (old PowerMac dual-socket, 1 core each)
Mark Millard
marklmi at yahoo.com
Fri May 1 02:41:10 UTC 2020
[After some time dhclient and sendmail did have the problem
despite the debug-kernel being in use.]
On 2020-Apr-29, at 21:29, Mark Millard <marklmi at yahoo.com> wrote:
> [Notes on using the artifact.ci head -r360311 debug-kernel added.]
>
> On 2020-Apr-29, at 02:27, Mark Millard <marklmi at yahoo.com> wrote:
>
>> Since updating from head -r359427 based to head -r360311 based
>> (my own non-debug builds), various things report segmentation
>> faults. It appears that rendezvous_request may always be
>> involved. I've not had time (yet) to deal with substituting a
>> debug kernel from:
>>
>> https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/
>>
>> I expect to do so at some point.
>>
>>
>> I give 2 examples below (mountd and rpcbind). Both involve
>> rendezvous_request and svc_getreq_common .
>>
>> First mountd :
>>
>> # gdb mountd /mountd.core
>> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> . . .
>> Reading symbols from mountd...
>> Reading symbols from /usr/lib/debug//usr/sbin/mountd.debug...
>> [New LWP 100105]
>> Core was generated by `/usr/sbin/mountd -r'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0 atomic_load_u (a=<optimized out>, mo=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70
>>
>> warning: Source file is more recent than executable.
>> 70 JEMALLOC_GENERATE_INT_ATOMICS(unsigned, u, LG_SIZEOF_INT)
>> (gdb) bt
>> #0 atomic_load_u (a=<optimized out>, mo=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/atomic.h:70
>> #1 rtree_leaf_elm_szind_read (tsdn=<optimized out>, rtree=<optimized out>, elm=<optimized out>, dependent=<optimized out>) at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:230
>> #2 rtree_szind_slab_read (tsdn=0x50094018, rtree=<optimized out>, rtree_ctx=0x50094044, key=1869636193, dependent=true, r_szind=<optimized out>, r_slab=<optimized out>)
>> at /usr/src/contrib/jemalloc/include/jemalloc/internal/rtree.h:504
>> #3 ifree (tsd=<optimized out>, ptr=0x6f706261, tcache=<optimized out>, slow_path=false) at jemalloc_jemalloc.c:2574
>> #4 __je_free_default (ptr=0x6f706261) at jemalloc_jemalloc.c:2775
>> #5 0x50235db0 in __free (ptr=0x6f706261) at jemalloc_jemalloc.c:2852
>> #6 0x5026525c in freenetconfigent (netconfigp=0x50049170) at /usr/src/lib/libc/rpc/getnetconfig.c:540
>> #7 0x50260d8c in __rpc_sockinfo2netid (sip=<optimized out>, netid=<optimized out>) at /usr/src/lib/libc/rpc/rpc_generic.c:573
>> #8 0x502521f0 in makefd_xprt (fd=10, sendsize=9000, recvsize=9000) at /usr/src/lib/libc/rpc/svc_vc.c:270
>> #9 0x50252fa4 in rendezvous_request (xprt=0x5007b120, msg=<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:315
>> #10 0x50254588 in svc_getreq_common (fd=<optimized out>) at /usr/src/lib/libc/rpc/svc.c:640
>> #11 0x502543d0 in svc_getreqset (readfds=<optimized out>) at /usr/src/lib/libc/rpc/svc.c:611
>> #12 0x1001434c in main (argc=<optimized out>, argv=0xffffde3c) at /usr/src/usr.sbin/mountd/mountd.c:683
>>
>> (gdb) disass
>> Dump of assembler code for function __je_free_default:
>> 0x50235244 <+0>: mflr r0
>> 0x50235248 <+4>: stw r0,4(r1)
>> 0x5023524c <+8>: stwu r1,-80(r1)
>> 0x50235250 <+12>: stw r30,72(r1)
>> 0x50235254 <+16>: stw r21,36(r1)
>> 0x50235258 <+20>: stw r22,40(r1)
>> 0x5023525c <+24>: stw r23,44(r1)
>> 0x50235260 <+28>: stw r24,48(r1)
>> 0x50235264 <+32>: stw r25,52(r1)
>> 0x50235268 <+36>: stw r26,56(r1)
>> 0x5023526c <+40>: stw r27,60(r1)
>> 0x50235270 <+44>: stw r28,64(r1)
>> 0x50235274 <+48>: stw r29,68(r1)
>> 0x50235278 <+52>: bl 0x5023527c <__je_free_default+56>
>> 0x5023527c <+56>: mr r28,r3
>> 0x50235280 <+60>: mflr r30
>> 0x50235284 <+64>: addis r30,r30,14
>> 0x50235288 <+68>: addi r30,r30,-20816
>> 0x5023528c <+72>: lwz r4,64(r30)
>> 0x50235290 <+76>: lwz r5,6188(r30)
>> 0x50235294 <+80>: lwz r4,0(r4)
>> 0x50235298 <+84>: stw r4,32(r1)
>> 0x5023529c <+88>: lbz r4,0(r5)
>> 0x502352a0 <+92>: cmplwi r4,0
>> 0x502352a4 <+96>: bne 0x502353c4 <__je_free_default+384>
>> 0x502352a8 <+100>: cmplwi r28,0
>> 0x502352ac <+104>: beq 0x50235378 <__je_free_default+308>
>> 0x502352b0 <+108>: addi r3,r30,4332
>> 0x502352b4 <+112>: rlwinm r24,r28,0,0,9
>> 0x502352b8 <+116>: bl 0x502f06b4 <00000000.plt_pic32.__tls_get_addr>
>> 0x502352bc <+120>: rlwinm r26,r28,13,25,28
>> 0x502352c0 <+124>: mr r29,r3
>> 0x502352c4 <+128>: lbz r3,0(r3)
>> 0x502352c8 <+132>: cmplwi r3,0
>> 0x502352cc <+136>: bne 0x502353fc <__je_free_default+440>
>> 0x502352d0 <+140>: add r25,r29,r26
>> 0x502352d4 <+144>: lwz r3,44(r25)
>> 0x502352d8 <+148>: addi r27,r29,44
>> 0x502352dc <+152>: cmplw r3,r24
>> 0x502352e0 <+156>: bne 0x50235578 <__je_free_default+820>
>> 0x502352e4 <+160>: lwz r3,48(r25)
>> 0x502352e8 <+164>: rlwinm r4,r28,20,22,31
>> 0x502352ec <+168>: mulli r4,r4,12
>> 0x502352f0 <+172>: add r3,r3,r4
>> => 0x502352f4 <+176>: lwz r6,4(r3)
>> 0x502352f8 <+180>: lwz r5,4368(r30)
>> 0x502352fc <+184>: addi r26,r29,288
>> 0x50235300 <+188>: lbz r4,8(r3)
>> 0x50235304 <+192>: rlwinm r3,r6,2,0,29
>> 0x50235308 <+196>: lwz r7,28(r29)
>> 0x5023530c <+200>: lwzx r5,r5,r3
>> 0x50235310 <+204>: lwz r8,24(r29)
>> 0x50235314 <+208>: andi. r4,r4,1
>> 0x50235318 <+212>: addc r4,r7,r5
>> 0x5023531c <+216>: addze r5,r8
>> 0x50235320 <+220>: stw r5,24(r29)
>> 0x50235324 <+224>: stw r4,28(r29)
>> 0x50235328 <+228>: ble 0x50235534 <__je_free_default+752>
>> . . .
>>
>> (gdb) info reg
>> r0 0x50235c04 1344494596
>> r1 0xffffcfb0 4294954928
>> r2 0x5009b018 1342812184
>> r3 0x2448 9288
>> r4 0x500940ac 1342783660
>> r5 0x2448 9288
>> r6 0x500940cc 1342783692
>> r7 0x1 1
>> r8 0x0 0
>> r9 0x80808080 2155905152
>> r10 0xc 12
>> r11 0x502e3b50 1345207120
>> r12 0x500491a0 1342476704
>> r13 0x0 0
>> r14 0x1 1
>> r15 0x10040000 268697600
>> r16 0x0 0
>> r17 0x10040000 268697600
>> r18 0x2 2
>> r19 0x0 0
>> r20 0x1 1
>> r21 0x5007b164 1342681444
>> r22 0xffffd2dc 4294955740
>> r23 0x80 128
>> r24 0x6f400000 1866465280
>> r25 0x50094080 1342783616
>> r26 0x68 104
>> r27 0x50094044 1342783556
>> r28 0x6f706261 1869636193
>> r29 0x50094018 1342783512
>> r30 0x5031012c 1345388844
>> r31 0x10040000 268697600
>> pc 0x502352f4 0x502352f4 <__je_free_default+176>
>> msr <unavailable>
>> cr 0x242008a4 606079140
>> lr 0x50235c04 0x50235c04 <__je_free_default+2496>
>> ctr 0x0 0
>> xer 0x0 0
>> fpscr 0x0 0
>> vscr <unavailable>
>> vrsave <unavailable>
>>
>>
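As a cross-check on the mountd register dump above, the rlwinm
results in the disassembly can be recomputed from r28 (the
ptr=0x6f706261 argument). What follows is only a sketch: a minimal
Python emulation of the 32-bit rlwinm instruction, written for this
note and not taken from the FreeBSD tree. The helper names rotl32,
ppc_mask and rlwinm are made up for illustration.

```python
# Sketch only: emulate 32-bit PowerPC rlwinm to re-derive values seen
# in the mountd register dump. Helper names are hypothetical.
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit value left by shift bits."""
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def ppc_mask(mb, me):
    """Mask with bits mb..me set, IBM numbering (bit 0 is the MSB)."""
    head = MASK32 >> mb
    tail = (MASK32 << (31 - me)) & MASK32
    # mb > me means the mask wraps around through bit 31 to bit 0.
    return head & tail if mb <= me else head | tail


def rlwinm(rs, sh, mb, me):
    """Rotate Left Word Immediate then AND with Mask."""
    return rotl32(rs, sh) & ppc_mask(mb, me)


r28 = 0x6F706261  # ptr handed to free(), from the register dump
r29 = 0x50094018  # base register used for the cache-slot address

leafkey = rlwinm(r28, 0, 0, 9)             # rlwinm r24,r28,0,0,9
slot_off = rlwinm(r28, 13, 25, 28)         # rlwinm r26,r28,13,25,28
elm_off = rlwinm(r28, 20, 22, 31) * 12     # rlwinm r4,... ; mulli r4,r4,12

print(hex(leafkey), hex(slot_off), hex(r29 + slot_off), hex(elm_off))
```

The computed values are 0x6f400000, 0x68, 0x50094080 and 0x2448,
which agree with r24, r26, r25 and r3 in the dump. r3 being equal to
the bare element offset suggests, but does not prove, that the base
it was added to was 0, so that the faulting lwz r6,4(r3) touched
address 0x244c. I have not confirmed which code path reached +176.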
>> Then rpcbind :
>>
>> # gdb rpcbind /rpcbind.core
>> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
>> Copyright (C) 2020 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> . . .
>> Reading symbols from rpcbind...
>> Reading symbols from /usr/lib/debug//usr/sbin/rpcbind.debug...
>> [New LWP 100098]
>> Core was generated by `/usr/sbin/rpcbind'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0 __xdrrec_setnonblock (xdrs=0x500472d8, maxrec=9000) at /usr/src/lib/libc/xdr/xdr_rec.c:607
>> 607 rstrm->nonblock = TRUE;
>> (gdb) bt
>> #0 __xdrrec_setnonblock (xdrs=0x500472d8, maxrec=9000) at /usr/src/lib/libc/xdr/xdr_rec.c:607
>> #1 0x502440d4 in rendezvous_request (xprt=<optimized out>, msg=<optimized out>) at /usr/src/lib/libc/rpc/svc_vc.c:348
>> #2 0x50245588 in svc_getreq_common (fd=<optimized out>) at /usr/src/lib/libc/rpc/svc.c:640
>> #3 0x5024580c in svc_getreq_poll (pfdp=<optimized out>, pollretval=1) at /usr/src/lib/libc/rpc/svc.c:739
>> #4 0x10018568 in my_svc_run () at /usr/src/usr.sbin/rpcbind/rpcb_svc_com.c:1167
>> #5 0x10014ad8 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/usr.sbin/rpcbind/rpcbind.c:250
>>
>> (gdb) disass
>> Dump of assembler code for function __xdrrec_setnonblock:
>> 0x50250468 <+0>: lwz r5,12(r3)
>> 0x5025046c <+4>: li r6,1
>> 0x50250470 <+8>: cmplwi r4,0
>> => 0x50250474 <+12>: stw r6,64(r5)
>> 0x50250478 <+16>: bne 0x50250480 <__xdrrec_setnonblock+24>
>> 0x5025047c <+20>: lwz r4,60(r5)
>> 0x50250480 <+24>: li r3,1
>> 0x50250484 <+28>: stw r4,92(r5)
>> 0x50250488 <+32>: blr
>> End of assembler dump.
>>
>> (gdb) info reg
>> r0 0x5c 92
>> r1 0xffffb400 4294947840
>> r2 0x500a1018 1342836760
>> r3 0x500472d8 1342468824
>> r4 0x2328 9000
>> r5 0x2020 8224
>> r6 0x1 1
>> r7 0xffffb364 4294947684
>> r8 0x500472f4 1342468852
>> r9 0x0 0
>> r10 0x20 32
>> r11 0x502d8ea0 1345162912
>> r12 0x24200880 606079104
>> r13 0x0 0
>> r14 0x0 0
>> r15 0xffffbc28 4294949928
>> r16 0x10002848 268445768
>> r17 0x10040000 268697600
>> r18 0x2 2
>> r19 0x0 0
>> r20 0x1 1
>> r21 0x5004c044 1342488644
>> r22 0xffffb63c 4294948412
>> r23 0x80 128
>> r24 0x50048010 1342472208
>> r25 0x14 20
>> r26 0xffffb630 4294948400
>> r27 0x500472d0 1342468816
>> r28 0xe 14
>> r29 0x50047220 1342468640
>> r30 0x5030112c 1345327404
>> r31 0x10040000 268697600
>> pc 0x50250474 0x50250474 <__xdrrec_setnonblock+12>
>> msr <unavailable>
>> cr 0x44200080 1142947968
>> lr 0x502440d4 0x502440d4 <rendezvous_request+584>
>> ctr 0x502d8ea0 1345162912
>> xer 0x0 0
>> fpscr 0x0 0
>> vscr <unavailable>
>> vrsave <unavailable>
>>
>>
>> dhclient and sendmail have notices of signal 11's
>> but I do not find any .core files around for them.
>>
>> Prior to this upgrade I was having no such problems
>> with the 32-bit powerpc PowerMac.
>
> I substituted the debug-kernel from:
>
> https://artifact.ci.freebsd.org/snapshot/head/r360311/powerpc/powerpc/kernel.txz
>
> and with it there is no evidence so far of the
> problem(s) occurring. Since the same world build is
> in use in both contexts, it looks like the kernel
> is what makes the difference for the problem(s).
>
> With the debug-kernel avoiding the problem, I've yet
> to figure out how to gather evidence.
>
About 8.5 hours after the boot completed, the system
reported:
kernel: pid 659 (dhclient), jid 0, uid 65: exited on signal 11
About 7 hours later it started getting sendmail messages
similar to:
kernel: pid 3722 (sendmail), jid 0, uid 25: exited on signal 11
These seem to be 0.5 hours apart. Between the 3rd and 4th
is a lock order reversal notice that I've separately
submitted to the lists (but that does not yet appear there,
last I checked).
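
One more way to look at the two quoted core files: both bad
pointer values are made up entirely of printable ASCII bytes. The
small Python sketch below only decodes them; the helper name
as_ascii is made up for illustration, and big-endian byte order is
assumed because this is 32-bit powerpc.

```python
# Sketch only: show the bad pointer values from the two backtraces as
# big-endian ASCII text. The helper name as_ascii is hypothetical.
def as_ascii(value, width=4):
    """Render value as width big-endian bytes, '.' for non-printables."""
    raw = value.to_bytes(width, "big")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


mountd_ptr = 0x6F706261   # ptr passed to free() in the mountd backtrace
rpcbind_r5 = 0x2020       # r5 (rstrm) in __xdrrec_setnonblock

print(repr(as_ascii(mountd_ptr)))      # text form of the freed pointer
print(repr(as_ascii(rpcbind_r5, 2)))   # low two bytes of r5
print(hex(rpcbind_r5 + 64))            # target of stw r6,64(r5)
```

The mountd pointer decodes to "opba" and the low two bytes of the
rpcbind r5 decode to two spaces, with the faulting store aimed at
0x2060. "opba" happens to be a substring of "loopback", so this may
be text having been written over pointer fields, but that is only a
guess from these two samples.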
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)