FreeBSD-HEAD gets stuck on vnode operations
Roger Pau Monné
roger.pau at citrix.com
Mon May 13 11:18:13 UTC 2013
Hello,
I've set up a FreeBSD-HEAD VM on Xen and compiled the XENHVM kernel. The
last commit in the repository is:
Date: Tue, 7 May 2013 12:39:14 +0000
Subject: [PATCH] By request, add an arrow from NetBSD-0.8 to FreeBSD-1.0.
While here, add a few more NetBSD versions to the tree itself.
Submitted by: Alan Barrett <apb at cequrux.com>
Submitted by: Thomas Klausner <wiz at netbsd.org>
I've been stressing the VM with the following tight loop:
while [ 1 ]; do make -j12 buildkernel; done
From time to time I see the VM getting stuck; when breaking into
KDB, this is the output of ps:
pid ppid pgrp uid state wmesg wchan cmd
32343 670 670 25 L *Name Cac 0xfffffe0017183780 sendmail
32342 32256 5273 0 L+ *Name Cac 0xfffffe0017183780 sh
32341 32335 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32340 32284 5273 0 D+ ufs 0xfffffe000dc76068 ctfconvert
32339 32293 5273 0 L+ *Name Cac 0xfffffe0017183780 ctfconvert
32337 32332 5273 0 L+ *vnode_fr 0xfffffe00872c23c0 cc
32335 32334 5273 0 S+ wait 0xfffffe010537d000 cc
32334 30655 5273 0 S+ wait 0xfffffe010ac974b8 sh
32333 32329 5273 0 L+ *vnode in 0xfffffe0017183180 cc
32332 32331 5273 0 S+ wait 0xfffffe00416a14b8 cc
32331 30655 5273 0 S+ wait 0xfffffe00b9846970 sh
32329 32328 5273 0 S+ wait 0xfffffe0049ace000 cc
32328 30655 5273 0 S+ wait 0xfffffe0041670000 sh
32324 32320 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32320 32318 5273 0 S+ wait 0xfffffe00416184b8 cc
32318 30655 5273 0 S+ wait 0xfffffe00879fd970 sh
32314 32313 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32313 32311 5273 0 S+ wait 0xfffffe00b9846000 cc
32312 32309 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32311 30655 5273 0 S+ wait 0xfffffe00415844b8 sh
32310 32305 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32309 32307 5273 0 S+ wait 0xfffffe00499624b8 cc
32307 30655 5273 0 S+ wait 0xfffffe010537c970 sh
32305 32304 5273 0 S+ wait 0xfffffe0041669000 cc
32304 30655 5273 0 S+ wait 0xfffffe00173d9970 sh
32303 32298 5273 0 L+ *Name Cac 0xfffffe0017183780 cc
32298 32295 5273 0 S+ wait 0xfffffe0049924000 cc
32295 30655 5273 0 S+ wait 0xfffffe0041631970 sh
32293 30655 5273 0 S+ wait 0xfffffe000d15b000 sh
32284 30655 5273 0 S+ wait 0xfffffe00416684b8 sh
32256 31391 5273 0 S+ select 0xfffffe000d965840 make
32022 30655 5273 0 L+ *Name Cac 0xfffffe0017183780 sh
31391 31386 5273 0 S+ wait 0xfffffe0041680970 sh
31386 30664 5273 0 S+ select 0xfffffe0017942cc0 make
30664 30663 5273 0 S+ wait 0xfffffe004169f4b8 sh
30663 30662 5273 0 S+ select 0xfffffe000d1c0d40 make
30662 30655 5273 0 S+ wait 0xfffffe00b9ddb4b8 sh
30655 5287 5273 0 S+ select 0xfffffe000d1c0040 make
5287 5280 5273 0 S+ wait 0xfffffe004148f970 sh
5280 5278 5273 0 S+ select 0xfffffe000d964540 make
5278 5273 5273 0 S+ wait 0xfffffe000de3a970 sh
5273 736 5273 0 S+ select 0xfffffe000bc81740 make
91658 735 91658 0 Ss+ ttyin 0xfffffe000d6caca8 bash
736 735 736 0 Ss+ wait 0xfffffe000d5084b8 bash
735 734 735 0 Ss select 0xfffffe000bc817c0 screen
734 730 734 0 S+ pause 0xfffffe000d6330a8 screen
730 729 730 0 S+ wait 0xfffffe000d58b000 bash
729 728 729 1001 S+ wait 0xfffffe000d7ab970 su
728 727 728 1001 Ss+ wait 0xfffffe000de3a4b8 sh
727 724 724 1001 S select 0xfffffe000d98f340 sshd
724 664 724 0 Ss select 0xfffffe000d4180c0 sshd
722 721 722 0 L+ *Name Cac 0xfffffe0017183780 bash
721 1 721 0 Ss+ wait 0xfffffe000d7ab000 login
720 1 720 0 Ss+ ttyin 0xfffffe000d0470a8 getty
719 1 719 0 Ss+ ttyin 0xfffffe000d0474a8 getty
718 1 718 0 Ss+ ttyin 0xfffffe000bd914a8 getty
717 1 717 0 Ss+ ttyin 0xfffffe000bd918a8 getty
716 1 716 0 Ss+ ttyin 0xfffffe000bd91ca8 getty
715 1 715 0 Ss+ ttyin 0xfffffe000d0440a8 getty
714 1 714 0 Ss+ ttyin 0xfffffe000d0444a8 getty
713 1 713 0 Ss+ ttyin 0xfffffe000d0448a8 getty
674 1 674 0 Ls *Name Cac 0xfffffe0017183780 cron
670 1 670 25 Ss pause 0xfffffe000d5090a8 sendmail
667 1 667 0 Ls *Name Cac 0xfffffe0017183780 sendmail
664 1 664 0 Ss select 0xfffffe000d964840 sshd
566 1 566 0 Ss select 0xfffffe000d98f4c0 syslogd
472 1 472 0 Ss select 0xfffffe000bc81840 devd
471 1 471 65 Ls *Name Cac 0xfffffe0017183780 dhclient
431 1 431 0 Ss select 0xfffffe000d98fc40 dhclient
18 0 0 0 LL *Name Cac 0xfffffe0017183780 [softdepflush]
17 0 0 0 RL CPU 1 [syncer]
9 0 0 0 DL vlruwt 0xfffffe000d15c000 [vnlru]
8 0 0 0 DL psleep 0xffffffff815305ac [bufdaemon]
7 0 0 0 DL pgzero 0xffffffff81538c94 [pagezero]
6 0 0 0 DL psleep 0xffffffff81537ed0 [vmdaemon]
5 0 0 0 DL psleep 0xffffffff815a4084 [pagedaemon]
4 0 0 0 DL ccb_scan 0xffffffff81300f30 [xpt_thrd]
3 0 0 0 DL waiting_ 0xffffffff81598160 [sctp_iterator]
2 0 0 0 DL balloon 0xffffffff8079b9f0 [balloon]
16 0 0 0 SL xbread 0xfffffe00feffc000 [xenstore_rcv]
15 0 0 0 SL waitev 0xffffffff81538e98 [xenwatch]
14 0 0 0 DL - 0xffffffff8131d008 [yarrow]
13 0 0 0 DL (threaded) [geom]
100023 D - 0xffffffff8158ca50 [g_down]
100022 D - 0xffffffff8158ca48 [g_up]
100021 D - 0xffffffff8158ca40 [g_event]
12 0 0 0 WL (threaded) [intr]
100044 I [irq7: ppc0]
100043 I [swi0: uart]
100042 I [irq12: psm0]
100041 I [irq1: atkbd0]
100038 I [irq28: xenpci0]
100037 I [irq15: ata1]
100036 I [irq14: ata0]
100032 I [swi2: cambio]
100030 I [swi5: fast taskq]
100028 I [swi6: Giant taskq]
100027 I [swi6: task queue]
100020 I [swi3: vm]
100019 I [swi1: netisr 0]
100018 I [swi4: clock]
100017 I [swi4: clock]
100016 I [swi4: clock]
100015 I [swi4: clock]
100014 I [swi4: clock]
100013 I [swi4: clock]
100012 I [swi4: clock]
100011 I [swi4: clock]
11 0 0 0 RL (threaded) [idle]
100010 Run CPU 7 [idle: cpu7]
100009 Run CPU 6 [idle: cpu6]
100008 Run CPU 5 [idle: cpu5]
100007 Run CPU 4 [idle: cpu4]
100006 Run CPU 3 [idle: cpu3]
100005 Run CPU 2 [idle: cpu2]
100004 CanRun [idle: cpu1]
100003 Run CPU 0 [idle: cpu0]
1 0 1 0 SLs wait 0xfffffe000bc86970 [init]
10 0 0 0 DL audit_wo 0xffffffff8159b860 [audit]
0 0 0 0 DLs (threaded) [kernel]
100046 D - 0xffffffff81343440 [deadlkres]
100045 D - 0xfffffe000bcbd600 [mca taskq]
100035 D - 0xfffffe000bcbd700 [acpi_task_2]
100034 D - 0xfffffe000bcbd700 [acpi_task_1]
100033 D - 0xfffffe000bcbd700 [acpi_task_0]
100031 D - 0xfffffe000bcbd900 [kqueue taskq]
100029 D - 0xfffffe000bcbdb00 [thread taskq]
100026 D - 0xfffffe000bcbde00 [ffs_trim taskq]
100024 D - 0xfffffe000bcbe000 [firmware taskq]
100000 D sched 0xffffffff8158cb48 [swapper]
db> tr 17
Tracing pid 17 tid 100055 td 0xfffffe000d14d000
cpustop_handler() at cpustop_handler+0x28/frame 0xffffff8000232cf0
ipi_nmi_handler() at ipi_nmi_handler+0x3f/frame 0xffffff8000232d10
trap() at trap+0x42/frame 0xffffff8000232f20
nmi_calltrap() at nmi_calltrap+0x8/frame 0xffffff8000232f20
--- trap 0x13, rip = 0xffffffff808b4bd6, rsp = 0xffffff8000232fe0, rbp = 0xffffff8120c569a0 ---
_mtx_trylock_flags_() at _mtx_trylock_flags_+0x106/frame 0xffffff8120c569a0
mnt_vnode_next_active() at mnt_vnode_next_active+0x141/frame 0xffffff8120c569e0
ffs_sync() at ffs_sync+0xb5/frame 0xffffff8120c56ab0
sync_fsync() at sync_fsync+0x12d/frame 0xffffff8120c56ae0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0xf0/frame 0xffffff8120c56b10
sched_sync() at sched_sync+0x34c/frame 0xffffff8120c56bb0
fork_exit() at fork_exit+0x84/frame 0xffffff8120c56bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffff8120c56bf0
--- trap 0, rip = 0, rsp = 0xffffff8120c56cb0, rbp = 0 ---
The VM can stay stuck in this state for quite some time, generally
anywhere from a couple of minutes (5-10 min) to an hour or two, after
which it recovers by itself and resumes normal operation. I still have
to test this on a bare-metal FreeBSD install, but I would like to ask
whether anyone has seen similar behaviour, or suspects a change that
could cause this.
Thanks, Roger.
More information about the freebsd-current mailing list