qemu-arm-static under amd64: example of stuck looping atomic_cmpset_int while building graphics/poppler-qt5 [a tested fix included]
Mark Millard
marklmi at yahoo.com
Wed Jan 2 09:38:06 UTC 2019
On 2019-Jan-1, at 18:43, Mark Millard <marklmi at yahoo.com> wrote:
> The below showed up for poudiere-devel bulk getting stuck using one FreeBSD
> cpu while building graphics/poppler-qt5 . This is not a kevent hang-up, unlike
> the last hang-up that I analyzed. I do not yet know how repeatable this is
> but the original hang-up and the one experiment the below is from.
>
> From top:
>
> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
> 12789 root 4 52 0 166M 33M 0 uwait 6 36:06 97.22% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/poppler-qt5/work/poppler-0
>
> Note: The vast margority of the 36:06 has been stuck in the uwait loop involved.
>
> From ps -auxd:
>
> root 94075 0.0 0.0 12932 3552 1 S+ 10:42 0:01.21 | `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root 1944 0.0 0.0 12932 3540 1 I+ 10:42 0:00.00 | |-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
> root 1957 0.0 0.0 12932 3556 1 I 10:42 0:00.04 | |-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root 12328 0.0 0.0 12932 3548 1 I 10:49 0:00.00 | | `-- sh: poudriere[FBSDFSSDjailArmV7-default][01]: build_pkg (poppler-qt5-0.72.0) (sh)
> root 12329 0.0 0.0 10328 1756 1 IJ 10:49 0:00.01 | | `-- /usr/bin/make -C /usr/ports/graphics/poppler-qt5 stage
> root 12350 0.0 0.0 9860 1248 1 IJ 10:49 0:00.00 | | `-- /usr/bin/make -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/poppler-qt5/work/stage install
> root 12353 0.0 0.0 10236 1664 1 IJ 10:49 0:00.05 | | `-- /nxb-bin/usr/bin/make -f CMakeFiles/Makefile2 qt5/all
> root 12787 0.0 0.0 9856 1236 1 IJ 10:50 0:00.00 | | `-- /nxb-bin/usr/bin/make -f qt5/tests/CMakeFiles/check_qt5_attachments_autogen.dir/build.make qt5/test
> root 12789 100.0 0.0 169868 33528 1 IJ 10:50 36:35.26 | | `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/graphics/
> root 94423 0.0 0.0 12932 3484 1 S+ 10:42 0:12.91 | `-- sh -e /usr/local/share/poudriere/bulk.sh -jFBSDFSSDjailArmV7 -w graphics/poppler-qt5
>
>
> (gdb) attach 12789
> Attaching to process 12789
> Reading symbols from /usr/local/poudriere/data/.m/FBSDFSSDjailArmV7-default/01/usr/local/bin/qemu-arm-static...done.
> [New LWP 101168 of process 12789]
> [New LWP 101178 of process 12789]
> [New LWP 101499 of process 12789]
> [Switching to LWP 100304 of process 12789]
> _umtx_op () at _umtx_op.S:3
> 3 RSYSCALL(_umtx_op)
> (gdb) info threads
> Id Target Id Frame
> * 1 LWP 100304 of process 12789 _umtx_op () at _umtx_op.S:3
> 2 LWP 101168 of process 12789 _umtx_op_err () at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 3 LWP 101178 of process 12789 _umtx_op () at _umtx_op.S:3
> 4 LWP 101499 of process 12789 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> (gdb) thread 4
> [Switching to thread 4 (LWP 101499 of process 12789)]
> #0 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> 220 ATOMIC_CMPSET(int);
>
> (gdb) bt
> #0 0x0000000060051c26 in atomic_cmpset_int (dst=<optimized out>, expect=<optimized out>, src=536870912) at /usr/include/machine/atomic.h:220
> #1 tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> #2 freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
> at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
> arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> #5 0x00000000600392f0 in target_cpu_loop (env=0x86159b118) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/arm/target_arch_cpu.h:207
> #6 0x0000000060038c99 in cpu_loop (env=0xf4dede80) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/main.c:121
> #7 0x0000000060050c1a in new_freebsd_thread_start (arg=<optimized out>) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:152
> #8 0x00000000601ad5f6 in thread_start (curthread=0x861571300) at /usr/src/lib/libthr/thread/thr_create.c:291
> #9 0x0000000000000000 in ?? ()
> Backtrace stopped: Cannot access memory at address 0x7fffdfbb3000
>
> 181 #define ATOMIC_CMPSET(TYPE) \
> 182 static __inline int \
> 183 atomic_cmpset_##TYPE(volatile u_##TYPE *dst, u_##TYPE expect, u_##TYPE src) \
> 184 { \
> (gdb)
> 185 u_char res; \
> 186 \
> 187 __asm __volatile( \
> 188 " " MPLOCKED " " \
> 189 " cmpxchg %3,%1 ; " \
> 190 " sete %0 ; " \
> 191 "# atomic_cmpset_" #TYPE " " \
> 192 : "=q" (res), /* 0 */ \
> 193 "+m" (*dst), /* 1 */ \
> 194 "+a" (expect) /* 2 */ \
> (gdb)
> 195 : "r" (src) /* 3 */ \
> 196 : "memory", "cc"); \
> 197 return (res); \
> 198 } \
> . . .
> 218 ATOMIC_CMPSET(char);
> 219 ATOMIC_CMPSET(short);
> 220 ATOMIC_CMPSET(int);
> 221 ATOMIC_CMPSET(long);
>
> (gdb) up
> #1 tcmpset_32 (addr=<optimized out>, a=<optimized out>, b=536870912) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:178
> 178 return atomic_cmpset_acq_32(addr, current, new);
> (gdb) list
> 173 static int tcmpset_32(uint32_t *addr, uint32_t a, uint32_t b)
> 174 {
> 175 uint32_t current = tswap32(a);
> 176 uint32_t new = tswap32(b);
> 177
> 178 return atomic_cmpset_acq_32(addr, current, new);
> 179 }
>
> (gdb) up
> #2 freebsd_rw_unlock (target_addr=4108246528) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.c:1264
> 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) {
> (gdb) list
> 1259 }
> 1260 }
> 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> 1262 /* decrement reader count */
> 1263 for (;;) {
> 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) {
> 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266 unlock_user_struct(target_urwlock,
> 1267 target_addr, 1);
> 1268 return -TARGET_EPERM;
> (gdb)
> 1269 }
> 1270 } else {
> 1271 break;
> 1272 }
> 1273 }
> 1274 } else {
> 1275 unlock_user_struct(target_urwlock, target_addr, 1);
> 1276 return -TARGET_EPERM;
> 1277 }
> 1278
>
> The above loop is where the code is stuck looping, using most of the capacity of
> one cpu (as FreeBSD classifies cpus). The value of state is stuck at:
>
> (gdb) print state
> $2 = 536870913
>
> I'll note that the loop is inside qemu-arm-atatic, not in the cmake that it
> is running.
>
>
>
> To show a little more context before the assembler for the routine that ends up
> containing that loop:
>
> (gdb) up
> #3 0x000000006004ab33 in do_freebsd__umtx_op (obj=<optimized out>, op=536870912, val=<optimized out>, uaddr=<optimized out>, target_time=<optimized out>)
> at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/freebsd/os-thread.h:474
> 474 ret = freebsd_rw_unlock(obj);
> (gdb) list
> 469 ret = freebsd_rw_wrlock(obj, val, 0, NULL);
> 470 }
> 471 break;
> 472
> 473 case TARGET_UMTX_OP_RW_UNLOCK:
> 474 ret = freebsd_rw_unlock(obj);
> 475 break;
> 476
> 477 #if defined(__FreeBSD_version) && __FreeBSD_version > 900000
> 478 #ifdef UMTX_OP_MUTEX_WAKE2
>
> (gdb) up
> #4 0x0000000060041b83 in do_freebsd_syscall (cpu_env=0x86159b118, num=454, arg1=<optimized out>, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=-184411592, arg7=-199471616,
> arg8=-1622188640) at /wrkdirs/usr/ports/emulators/qemu-user-static/work/qemu-bsd-user-4ef7d07/bsd-user/syscall.c:1364
> 1364 ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> (gdb) list
> 1359 ret = do_freebsd__umtx_unlock(arg1);
> 1360 break;
> 1361 #endif
> 1362
> 1363 case TARGET_FREEBSD_NR__umtx_op: /* undocumented */
> 1364 ret = do_freebsd__umtx_op(arg1, arg2, arg3, arg4, arg5);
> 1365 break;
> 1366
> 1367 /*
> 1368 * ioctl(2)
>
> As for the assembler level view for the routine containing the
> loop (via inlining and such):
>
> Dump of assembler code for function freebsd_rw_unlock:
> 0x0000000060051bd0 <+0>: push %rbp
> 0x0000000060051bd1 <+1>: mov %rsp,%rbp
> 0x0000000060051bd4 <+4>: push %rbx
> 0x0000000060051bd5 <+5>: push %rax
> 0x0000000060051bd6 <+6>: mov %edi,%ebx
> 0x0000000060051bd8 <+8>: mov $0x20,%esi
> 0x0000000060051bdd <+13>: mov $0x3,%edx
> 0x0000000060051be2 <+18>: callq 0x60036b70 <page_check_range>
> 0x0000000060051be7 <+23>: mov %eax,%ecx
> 0x0000000060051be9 <+25>: mov $0xfffffff2,%eax
> 0x0000000060051bee <+30>: test %ecx,%ecx
> 0x0000000060051bf0 <+32>: jne 0x60051c50 <freebsd_rw_unlock+128>
> 0x0000000060051bf2 <+34>: mov %ebx,%edi
> 0x0000000060051bf4 <+36>: add 0x26a6355(%rip),%rdi # 0x626f7f50 <guest_base>
> 0x0000000060051bfb <+43>: je 0x60051c50 <freebsd_rw_unlock+128>
> 0x0000000060051bfd <+45>: mov (%rdi),%ecx
> 0x0000000060051bff <+47>: mov 0x4(%rdi),%edx
> 0x0000000060051c02 <+50>: test %ecx,%ecx
> 0x0000000060051c04 <+52>: js 0x60051c30 <freebsd_rw_unlock+96>
> 0x0000000060051c06 <+54>: test $0x1fffffff,%ecx
> 0x0000000060051c0c <+60>: je 0x60051c4b <freebsd_rw_unlock+123>
> 0x0000000060051c0e <+62>: lea -0x1(%rcx),%esi
> 0x0000000060051c11 <+65>: nopw %cs:0x0(%rax,%rax,1)
> 0x0000000060051c1b <+75>: nopl 0x0(%rax,%rax,1)
> NOTE: loop is below:
> 0x0000000060051c20 <+80>: mov %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
> 0x0000000060051c26 <+86>: sete %bl
> 0x0000000060051c29 <+89>: test %bl,%bl
> 0x0000000060051c2b <+91>: je 0x60051c20 <freebsd_rw_unlock+80>
> Note: loop is above
> 0x0000000060051c2d <+93>: jmp 0x60051c57 <freebsd_rw_unlock+135>
> 0x0000000060051c2f <+95>: nop
> 0x0000000060051c30 <+96>: mov %ecx,%esi
> 0x0000000060051c32 <+98>: and $0x7fffffff,%esi
> 0x0000000060051c38 <+104>: mov %ecx,%eax
> 0x0000000060051c3a <+106>: lock cmpxchg %esi,(%rdi)
> 0x0000000060051c3e <+110>: sete %bl
> 0x0000000060051c41 <+113>: test %bl,%bl
> 0x0000000060051c43 <+115>: jne 0x60051c57 <freebsd_rw_unlock+135>
> 0x0000000060051c45 <+117>: mov (%rdi),%ecx
> 0x0000000060051c47 <+119>: test %ecx,%ecx
> 0x0000000060051c49 <+121>: js 0x60051c30 <freebsd_rw_unlock+96>
> 0x0000000060051c4b <+123>: mov $0xffffffff,%eax
> 0x0000000060051c50 <+128>: add $0x8,%rsp
> 0x0000000060051c54 <+132>: pop %rbx
> 0x0000000060051c55 <+133>: pop %rbp
> 0x0000000060051c56 <+134>: retq
> 0x0000000060051c57 <+135>: test $0x2,%dl
> 0x0000000060051c5a <+138>: jne 0x60051c70 <freebsd_rw_unlock+160>
> 0x0000000060051c5c <+140>: test $0x40000000,%ecx
> 0x0000000060051c62 <+146>: jne 0x60051c82 <freebsd_rw_unlock+178>
> 0x0000000060051c64 <+148>: xor %eax,%eax
> 0x0000000060051c66 <+150>: test $0x20000000,%ecx
> 0x0000000060051c6c <+156>: je 0x60051c50 <freebsd_rw_unlock+128>
> 0x0000000060051c6e <+158>: jmp 0x60051c89 <freebsd_rw_unlock+185>
> 0x0000000060051c70 <+160>: test $0x20000000,%ecx
> 0x0000000060051c76 <+166>: jne 0x60051c89 <freebsd_rw_unlock+185>
> 0x0000000060051c78 <+168>: xor %eax,%eax
> 0x0000000060051c7a <+170>: test $0x40000000,%ecx
> 0x0000000060051c80 <+176>: je 0x60051c50 <freebsd_rw_unlock+128>
> 0x0000000060051c82 <+178>: mov $0x1,%edx
> 0x0000000060051c87 <+183>: jmp 0x60051c8e <freebsd_rw_unlock+190>
> 0x0000000060051c89 <+185>: mov $0x7fffffff,%edx
> 0x0000000060051c8e <+190>: mov $0x3,%esi
> 0x0000000060051c93 <+195>: xor %ecx,%ecx
> 0x0000000060051c95 <+197>: xor %r8d,%r8d
> 0x0000000060051c98 <+200>: callq 0x601b5970 <_umtx_op>
> 0x0000000060051c9d <+205>: mov %eax,%edi
> 0x0000000060051c9f <+207>: add $0x8,%rsp
> 0x0000000060051ca3 <+211>: pop %rbx
> 0x0000000060051ca4 <+212>: pop %rbp
> --Type <RET> for more, q to quit, c to continue without paging--
> 0x0000000060051ca5 <+213>: jmpq 0x60040a40 <get_errno>
> End of assembler dump.
>
> For just before the "lock cmpxchg %esi,(%rdi)" in the loop:
>
> rax 0x20000001 536870913
> rbx 0xf4dede00 4108246528
> rcx 0x20000001 536870913
> rdx 0x2 2
> rsi 0x20000000 536870912
> rdi 0xf4dede80 4108246656
>
> (gdb) x 0xf4dede80
> 0xf4dede80: 0x40000001
>
> 0x0000000060051c20 <+80>: mov %ecx,%eax
> => 0x0000000060051c22 <+82>: lock cmpxchg %esi,(%rdi)
> 0x0000000060051c26 <+86>: sete %bl
> 0x0000000060051c29 <+89>: test %bl,%bl
> 0x0000000060051c2b <+91>: je 0x60051c20 <freebsd_rw_unlock+80>
>
>
> At this point I do not have interpretation of the details, not even
> a comparison to the source code.
Note: The problem was observed in building ports head -r488859 's graphics/poppler-qt5 .
In the below code I've added a missing statement:
> 1261 } else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
> 1262 /* decrement reader count */
> 1263 for (;;) {
> 1264 if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) {
__get_user(state, &target_urwlock->rw_state);
> 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266 unlock_user_struct(target_urwlock,
> 1267 target_addr, 1);
> 1268 return -TARGET_EPERM;
> (gdb)
> 1269 }
> 1270 } else {
> 1271 break;
> 1272 }
> 1273 }
The lack of the update to 'state' is why it was valid for the
compiler to eliminate the code for:
> 1265 if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
> 1266 unlock_user_struct(target_urwlock,
> 1267 target_addr, 1);
> 1268 return -TARGET_EPERM;
> (gdb)
> 1269 }
So (without old line numbers):
} else if (TARGET_URWLOCK_READER_COUNT(state) != 0) {
/* decrement reader count */
for (;;) {
if (!tcmpset_32(&target_urwlock->rw_state, state, (state - 1))) {
__get_user(state, &target_urwlock->rw_state);
if (TARGET_URWLOCK_READER_COUNT(state) == 0) {
unlock_user_struct(target_urwlock,
target_addr, 1);
return -TARGET_EPERM;
}
} else {
break;
}
}
This follows the structure of other tcmpset_32 use in the source file.
With this change poudriere-devel's bulk worked for graphics/poppler-qt5
as a amd64->armv7 cross-build (FreeBSD head -r341836 based, under Hyper-V,
with 28 logical-processors assigned):
[00:00:13] [01] [00:00:00] Building graphics/poppler-qt5 | poppler-qt5-0.72.0
[00:08:03] [01] [00:07:50] Finished graphics/poppler-qt5 | poppler-qt5-0.72.0: Success
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-arm
mailing list