Re: armv7-on-aarch64 stuck at urdlck: I got a replication of the "ampere2" bulk build hangup problem on a Windows DevKit 2023
Date: Sun, 21 Jul 2024 10:36:56 UTC
On Jul 20, 2024, at 16:42, Mark Millard <marklmi@yahoo.com> wrote:

> On Jul 20, 2024, at 01:57, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
>> [Everything and everybody in Cc: are stripped for good].
>>
>> On Fri, Jul 19, 2024 at 10:38:36PM -0700, Mark Millard wrote:
>>> 0x201375c0 - 0x2014092c is .bss in /lib/libthr.so.3
>>>
>>> (gdb) bt
>>> #0 0x201aeec0 in __pthread_map_stacks_exec () from /lib/libc.so.7
>>> #1 0x2005d1e4 in ?? () from /libexec/ld-elf.so.1
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>> (gdb) disass
>>> Dump of assembler code for function __pthread_map_stacks_exec:
>>> => 0x201aeec0 <+0>: ldr r0, [pc, #8] @ 0x201aeed0 <__pthread_map_stacks_exec+16>
>>>    0x201aeec4 <+4>: add r0, pc, r0
>>>    0x201aeec8 <+8>: ldr r0, [r0, #156] @ 0x9c
>>>    0x201aeecc <+12>: bx r0
>>>    0x201aeed0 <+16>: andseq r6, r7, r4, lsr #12
>>> End of assembler dump.
>>>
>>
>> Do the following:
>> 1. Rebuild rtld/libc/libthr with the debugging info and no optimization,
>> i.e. ensure that flags are "-O0 -g" or "-Og -g" and not -O2. See
>> the first comment in libexec/rtld-elf/Makefile for the hint how to
>> do it.
>
> I did a full buildworld with "-Og -g" via temporary
> use of:
>
> diff --git a/share/mk/sys.mk b/share/mk/sys.mk
> index 44db9266784f..9c6c7ce575a4 100644
> --- a/share/mk/sys.mk
> +++ b/share/mk/sys.mk
> @@ -145,7 +145,8 @@ CC ?= c89
>  CFLAGS ?= -O
>  .else
>  CC ?= cc
> -CFLAGS ?= -O2 -pipe
> +#CFLAGS ?= -O2 -pipe
> +CFLAGS ?= -Og -g -pipe
>  .if defined(NO_STRICT_ALIASING)
>  CFLAGS += -fno-strict-aliasing
>  .endif
>
> I installed the result armv7 world into a
> directory tree and installed pkg and cairo.
>
>> 2. Reproduce the issue
>
> The dlopen_test.c based case does not fail under the world
> built with "-Og -g":
>
> # cc -g -std=c11 -pedantic -Wall -pthread dlopen_test.c ; ./a.out
> #
>
>> under gdb
>
> (gdb) run
> Starting program: /root/a.out
> [Inferior 1 (process 36680) exited normally]
> (gdb)
>
> So it does not reproduce in gdb when buildworld was based
> on "-Og -g".

I found another context that has useful debugger information
and also fails. It avoids graphviz being involved:

) a pkgbase install that I had around (pkgbase has debug information)
) also set up /home/pkgbuild/worktrees/main/ to refer to the /usr/src/ that pkgbase put in place
) pkg install cairo
) use of my simple dlopen program

(gdb) run
Starting program: /root/a.out

Catchpoint 7
  Inferior loaded /lib/libgcc_s.so.1
    /lib/libthr.so.3
    /lib/libc.so.7
    /lib/libsys.so.7
r_debug_state (rd=<optimized out>, m=<optimized out>) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
4485 }
(gdb) c
Continuing.

Breakpoint 3, get_program_var_addr (name=0x20042f2a "__progname", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.

Breakpoint 3, get_program_var_addr (name=0x20043c97 "environ", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.

Breakpoint 3, get_program_var_addr (name=0x20043c9f "__elf_aux_vector", lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.

Breakpoint 3, get_program_var_addr (name=0x200442e8 "__libc_atexit", lockstate=lockstate@entry=0xffffd668) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Catchpoint 7
  Inferior loaded /usr/local/lib/libcairo.so.2
    /usr/local/lib/libpixman-1.so.0
    /usr/local/lib/libfontconfig.so.1
    /usr/local/lib/libfreetype.so.6
    /usr/local/lib/libEGL.so.1
    /usr/lib/libdl.so.1
    /usr/local/lib/libpng16.so.16
    /usr/local/lib/libxcb-shm.so.0
    /usr/local/lib/libxcb.so.1
    /usr/local/lib/libxcb-render.so.0
    /usr/local/lib/libXrender.so.1
    /usr/local/lib/libX11.so.6
    /usr/local/lib/libXext.so.6
    /lib/libz.so.6
    /usr/local/lib/libGL.so.1
    /lib/libm.so.5
    /usr/local/lib/libexpat.so.1
    /usr/lib/libbz2.so.4
    /usr/local/lib/libbrotlidec.so.1
    /usr/local/lib/libGLdispatch.so.0
    /usr/local/lib/libXau.so.6
    /usr/local/lib/libXdmcp.so.6
    /usr/local/lib/libGLX.so.0
    /usr/local/lib/libbrotlicommon.so.1
r_debug_state (rd=<optimized out>, m=<optimized out>) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
4485 }
(gdb) c
Continuing.

Breakpoint 3, get_program_var_addr (name=0x200435bf "__pthread_map_stacks_exec", lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
4523 symlook_init(&req, name);
(gdb) c
Continuing.
Breakpoint 8.3, _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
140 round_up(thrd->attr.guardsize_attr),
(gdb) bt
#0 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#1 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#2 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#3 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#4 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#5 0x00020510 in main () at dlopen_test.c:14
(gdb) s
139 mprotect((char *)thrd->attr.stackaddr_attr +
(gdb) s
141 round_up(thrd->attr.stacksize_attr),
(gdb) s
140 round_up(thrd->attr.guardsize_attr),
(gdb) s
round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:129
129 if (size % _thr_page_size != 0)
(gdb) s
130 size = ((size / _thr_page_size) + 1) *
(gdb) bt
#0 round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#1 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#2 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#6 0x00020510 in main () at dlopen_test.c:14
(gdb) si
129 if (size % _thr_page_size != 0)
(gdb)
130 size = ((size / _thr_page_size) + 1) *
(gdb) bt
#0 round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#1 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#2 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#6 0x00020510 in main () at dlopen_test.c:14
(gdb) disass /s
Dump of assembler code for function __thr_map_stacks_exec:
. . .
130 size = ((size / _thr_page_size) + 1) *
   0x20112eec <+340>: mov r0, r6

129 if (size % _thr_page_size != 0)
   0x20112ef0 <+344>: ldr r4, [pc, r4]

130 size = ((size / _thr_page_size) + 1) *
=> 0x20112ef4 <+348>: mov r1, r4
   0x20112ef8 <+352>: bl 0x20116b60
NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
--Type <RET> for more, q to quit, c to continue without paging--
   0x20112efc <+356>: mov r9, r0
   0x20112f00 <+360>: mov r0, r5
   0x20112f04 <+364>: mov r1, r4
   0x20112f08 <+368>: bl 0x20116b60
NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
   0x20112f0c <+372>: mls r1, r0, r4, r5
. . .
(gdb) si
0x20112ef8 130 size = ((size / _thr_page_size) + 1) *
(gdb)
0x20116b60 in ?? () from /lib/libthr.so.3
(gdb) bt
#0 0x20116b60 in ?? () from /lib/libthr.so.3
#1 0x20112efc in round_up (size=4096) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
#2 _thr_stack_fix_protection (thrd=0x20070000) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
#3 __thr_map_stacks_exec () at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
#4 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
#5 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2", fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized out>, mode=1, lockstate=0xffffd290) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
#6 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2", fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
#7 0x00020510 in main () at dlopen_test.c:14
(gdb) si
0x20116b64 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116b68 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116760 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116764 in ?? () from /lib/libthr.so.3
(gdb) si
0x20116768 in ?? () from /lib/libthr.so.3
(gdb) si
0x2011676c in ?? () from /lib/libthr.so.3
(gdb) si
_rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
78 stmdb sp!,{r0-r5,sl,fp}
(gdb) bt
#0 _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
#1 0x201373b0 in ?? () from /lib/libthr.so.3
NOTE: 0x201373a8 - 0x201375a0 is .got.plt in /lib/libthr.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Turns out that _thr_rtld_rlock_acquire is looping when the
process is stuck:

. . .
(gdb) bt
#0 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
#1 0x20060788 in rlock_acquire (lock=0x2008af10 <rtld_locks>, lockstate=lockstate@entry=0xffffd0ec) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld_lock.c:259
#2 0x20059098 in _rtld_bind (obj=0x2008f404, reloff=496) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:1035
#3 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
#4 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
#5 0x2005483c in _rtld_bind_start () at /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
. . .
(gdb) info threads
  Id Target Id Frame
* 1 LWP 100174 of process 97711 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121

So: Only the one main thread. It is repeating the
_thr_rwlock_rdlock loop (lines 121/122):

(gdb) list 115
110 _thr_rtld_rlock_acquire(void *lock)
111 {
112     struct pthread *curthread;
113     struct rtld_lock *l;
114     int errsave;
115
116     curthread = _get_curthread();
117     SAVE_ERRNO();
118     l = (struct rtld_lock *)lock;
119
(gdb)
120     THR_CRITICAL_ENTER(curthread);
121     while (_thr_rwlock_rdlock(&l->lock, 0, NULL) != 0)
122         ;
123     curthread->rdlock_count++;
124     RESTORE_ERRNO();
125 }

>> , and backtrace all threads from userspace.
>> I only need userspace backtrace, not either kernel-side stacks nor
>> the syscall history.
>>
>> Are you sure that the issue is specific to armv7, might be it takes more
>> efforts to reproduce on host native?

===
Mark Millard
marklmi at yahoo.com