Re: ZFS deadlock in 14
Date: Sat, 12 Aug 2023 04:55:40 UTC
The poudriere build machine building amd64 packages also panicked, but with:

Dumping 2577 out of 8122 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
59              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:407
#2  0xffffffff806c10e0 in kern_reboot (howto=260)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:528
#3  0xffffffff806c15df in vpanic (
    fmt=0xffffffff80b6c5f5 "%s: possible deadlock detected for %p (%s), blocked for %d ticks\n",
    ap=ap@entry=0xfffffe008e698e90)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:972
#4  0xffffffff806c1383 in panic (fmt=<unavailable>)
    at /opt/src/git-src/sys/kern/kern_shutdown.c:896
#5  0xffffffff8064a5ea in deadlkres ()
    at /opt/src/git-src/sys/kern/kern_clock.c:201
#6  0xffffffff80677632 in fork_exit (callout=0xffffffff8064a2c0 <deadlkres>,
    arg=0x0, frame=0xfffffe008e698f40)
    at /opt/src/git-src/sys/kern/kern_fork.c:1162
#7  <signal handler called>
(kgdb)

This is consistent with PR 271945. Reducing -J to 1 or 5:1 circumvents this
panic. It is certainly a different panic from the one experienced on the
poudriere builder building i386 packages. Both machines run in amd64 mode.
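For the record, the workaround is simply a matter of capping poudriere's
parallelism. Roughly like this; the jail and ports tree names below are
placeholders, not this builder's actual config:

    # 5 parallel builders, 1 job for the prepare phases:
    poudriere bulk -j 14amd64 -p default -J 5:1 -a

If a wedged machine needs to stay up long enough to poke at it before
deadlkres panics it, the resolver's trigger thresholds can also be raised.
The sysctl names come from sys/kern/kern_clock.c; the values below (in
seconds) are arbitrary examples, not recommendations:

    sysctl debug.deadlkres.blktime_threshold=1800
    sysctl debug.deadlkres.slptime_threshold=3600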
--
Cheers,
Cy Schubert <Cy.Schubert@cschubert.com>
FreeBSD UNIX: <cy@FreeBSD.org> Web: https://FreeBSD.org
NTP: <cy@nwtime.org> Web: https://nwtime.org

e^(i*pi)+1=0


Cy Schubert writes:
> This is new. Instead of affecting the machine with poudriere building amd64
> packages, it affected the other machine with poudriere building i386
> packages. This is new since the two recent ZFS patches.
>
> Don't get me wrong, the two new patches have resulted in, I believe, better
> availability of the poudriere machine building amd64 packages. I doubt the
> two patches caused this, but they may have exposed this problem, probably
> to be fixed by another patch or two.
>
> Sorry, there was no dump produced by this panic. I'll need to check the
> config of this machine; swap is a gmirror, which it doesn't like to dump
> to. Below are serial console messages captured by conserver.
>
> panic: vm_page_dequeue_deferred: page 0xfffffe00028fb0d0 has unexpected
> queue state
> cpuid = 3
> time = 1691807572
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c50bc600
> vpanic() at vpanic+0x132/frame 0xfffffe00c50bc730
> panic() at panic+0x43/frame 0xfffffe00c50bc790
> vm_page_dequeue_deferred() at vm_page_dequeue_deferred+0xb2/frame 0xfffffe00c50bc7a0
> vm_page_free_prep() at vm_page_free_prep+0x11b/frame 0xfffffe00c50bc7c0
> vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfffffe00c50bc7f0
> vm_object_page_remove() at vm_object_page_remove+0xb6/frame 0xfffffe00c50bc850
> vn_pages_remove_valid() at vn_pages_remove_valid+0x48/frame 0xfffffe00c50bc880
> zfs_rezget() at zfs_rezget+0x35/frame 0xfffffe00c50bca60
> zfs_resume_fs() at zfs_resume_fs+0x1c8/frame 0xfffffe00c50bcab0
> zfs_ioc_rollback() at zfs_ioc_rollback+0x157/frame 0xfffffe00c50bcb00
> zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfffffe00c50bcbc0
> zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfffffe00c50bcbf0
> devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfffffe00c50bcc40
> vn_ioctl() at vn_ioctl+0xc2/frame 0xfffffe00c50bccb0
> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00c50bccd0
> kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe00c50bcd30
> sys_ioctl() at sys_ioctl+0x152/frame 0xfffffe00c50bce00
> amd64_syscall() at amd64_syscall+0x138/frame 0xfffffe00c50bcf30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c50bcf30
> --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x20938296107a, rsp = 0x209379aeee18, rbp = 0x209379aeee90 ---
> Uptime: 42m33s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 3
>
>
> --
> Cheers,
> Cy Schubert <Cy.Schubert@cschubert.com>
> FreeBSD UNIX: <cy@FreeBSD.org> Web: https://FreeBSD.org
> NTP: <cy@nwtime.org> Web: https://nwtime.org
>
> e^(i*pi)+1=0
>
>
> Cy Schubert writes:
> > I haven't experienced any problems (yet) either.
> >
> >
> > --
> > Cheers,
> > Cy Schubert <Cy.Schubert@cschubert.com>
> > FreeBSD UNIX: <cy@FreeBSD.org> Web: https://FreeBSD.org
> > NTP: <cy@nwtime.org> Web: https://nwtime.org
> >
> > e^(i*pi)+1=0
> >
> >
> > In message <CAK7dMtDJQtaai3_6VjEkwVwW5JN6e8v=kKTOPffp371xb=ORUg@mail.gmail.com>,
> > Kevin Bowling writes:
> > > The two MFVs on head have improved/fixed stability with poudriere for
> > > me on 48-core bare metal.
> > >
> > > On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
> > > >
> > > > In message <CAK7dMtDJeuf8rjWbsNEZABUfeqpjUyCHzuOL9AAhKk93sy+PKg@mail.gmail.com>,
> > > > Kevin Bowling writes:
> > > > > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d
> > > > >
> > > > > On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav <des@freebsd.org> wrote:
> > > > > >
> > > > > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > > > > deadlock was introduced in ZFS. It is still present as of 9c2823bae9 (4
> > > > > > August) and is 100% reproducible just by starting poudriere bulk in a
> > > > > > 16-core VM and waiting a few hours until deadlkres kicks in.
> > > > > > In the latest instance, deadlkres complained about a bash process:
> > > > > >
> > > > > >     #0  sched_switch (td=td@entry=0xfffffe02fb1d8000, flags=flags@entry=259)
> > > > > >         at /usr/src/sys/kern/sched_ule.c:2299
> > > > > >     #1  0xffffffff80b5a0a3 in mi_switch (flags=flags@entry=259)
> > > > > >         at /usr/src/sys/kern/kern_synch.c:550
> > > > > >     #2  0xffffffff80babcb4 in sleepq_switch (wchan=0xfffff818543a9e70,
> > > > > >         pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > > > >     #3  0xffffffff80babb8c in sleepq_wait (wchan=<unavailable>,
> > > > > >         pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > > > > >     #4  0xffffffff80b1c1b0 in sleeplk (lk=lk@entry=0xfffff818543a9e70,
> > > > > >         flags=flags@entry=2121728, ilk=ilk@entry=0x0,
> > > > > >         wmesg=wmesg@entry=0xffffffff8222a054 "zfs", pri=<optimized out>,
> > > > > >         pri@entry=64, timo=timo@entry=6, queue=1)
> > > > > >         at /usr/src/sys/kern/kern_lock.c:310
> > > > > >     #5  0xffffffff80b1a23f in lockmgr_slock_hard (lk=0xfffff818543a9e70,
> > > > > >         flags=2121728, ilk=<optimized out>,
> > > > > >         file=0xffffffff812544fb "/usr/src/sys/kern/vfs_subr.c",
> > > > > >         line=3057, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
> > > > > >     #6  0xffffffff80c59ec3 in VOP_LOCK1 (vp=0xfffff818543a9e00,
> > > > > >         flags=2105344, file=0xffffffff812544fb "/usr/src/sys/kern/vfs_subr.c", line=3
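Not part of the quoted thread, but for anyone chasing the same hang: once
deadlkres fires, the usual next step is to find out which thread owns the
vnode lock that frame #4 above is sleeping on. A rough sketch; the vmcore
path is only an example, and the lock address is the one from des's trace:

    # At the DDB prompt on the console:
    db> ps                  # look for threads blocked on the "zfs" wait channel
    db> show lockedvnods    # locked vnodes and the threads holding them
    db> alltrace            # stack traces for every thread

    # Or post-mortem with kgdb on the crash dump:
    kgdb /boot/kernel/kernel /var/crash/vmcore.last
    (kgdb) p/x ((struct lock *)0xfffff818543a9e70)->lk_lock
    # lk_lock encodes the owning thread pointer plus flag bits in its low bits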