FreeBSD-9.1: machine reboots during snapshot creation, LORs found
Jeremy Chadwick
jdc at koitsu.org
Sun Jun 16 08:49:54 UTC 2013
On Sun, Jun 16, 2013 at 10:02:39AM +0200, Andre Albsmeier wrote:
> On Sun, 16-Jun-2013 at 08:54:41 +0200, Jeremy Chadwick wrote:
> > On Fri, May 31, 2013 at 07:25:23PM +0200, Andre Albsmeier wrote:
> > > On Fri, 31-May-2013 at 16:51:03 +0200, John Baldwin wrote:
> > > > On Friday, May 31, 2013 8:26:11 am Andre Albsmeier wrote:
> > > > > Each day at 5:15 we are generating snapshots on various machines.
> > > > > This used to work perfectly under 7-STABLE for years but since
> > > > > we started to use 9.1-STABLE the machine reboots in about 10%
> > > > > of all cases.
> > > > >
> > > > > After rebooting we find a new snapshot file which is a bit
> > > > > > smaller than the good ones and has different permissions.
> > > > > > It does not pass fsck. In this example it is the one
> > > > > > whose name begins with s3:
> > > > >
> > > > > -r--r----- 1 root operator snapshot 72802894528 29 May 05:15 s2-2013.05.28-03.15.04
> > > > > -r-------- 1 root operator snapshot 72802893824 29 May 05:15 s3-2013.05.29-03.15.03
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s4-2013.05.23-06.38.44
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s5-2013.05.24-03.15.03
> > > > > -r--r----- 1 root operator snapshot 72802894528 28 May 14:22 s6-2013.05.25-03.15.03
> > > > >
> > > > > After enabling DIAGNOSTIC, WITNESS and INVARIANTS in the kernel
> > > > > I see the following LORs (mksnap_ffs starts exactly at 5:15):
> > > > >
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: lock order reversal:
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: 1st 0xc2371da8 ufs (ufs) @ /src/src-9/sys/kern/vfs_mount.c:1240
> > > > > May 29 05:15:00 <kern.crit> palveli kernel: 2nd 0xc2371ec4 devfs (devfs) @ /src/src-9/sys/ufs/ffs/ffs_vfsops.c:1414
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: lock order reversal:
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: 1st 0xc228471c snaplk (snaplk) @ /src/src-9/sys/ufs/ufs/ufs_vnops.c:976
> > > > > May 29 05:15:04 <kern.crit> palveli kernel: 2nd 0xc22f25e4 ufs (ufs) @ /src/src-9/sys/ufs/ffs/ffs_snapshot.c:1626
> > > > >
> > > > > > Unfortunately, no core files are being generated ;-(.
> > > > >
> > > > > I have checked and even rebuilt the (UFS1) fs in question
> > > > > > from scratch. I have also seen this happen on a UFS2 fs on
> > > > > another machine and on a third one when running "dump -L"
> > > > > on a root fs.
> > > > >
> > > > > Any hints of how to proceed?
> > > >
> > > > > Would it be possible to set up a logged serial console on this machine
> > > > > to see if it is panicking but failing to write out a crash dump?
> > >
> > > I'll try to arrange that. It'll take a bit since this
> > > box is 200 km away...
> > >
> > > Maybe I'll find another one nearby to reproduce it...
> >
> > SPECIFICALLY regarding "lack of crash dumps": I need to see the
> > following:
> >
> > * cat /etc/rc.conf
> > * cat /etc/fstab
> >
> > I may need output from other commands, but shall deal with that when I
> > see output from the above. Thanks.
>
> No problem, see below...
>
> To make a long story short, the machine dumps core perfectly
> (tested that a while ago), but not when dealing with _this_
> issue...
>
> I dump on da1s1b and savecore fetches it from there and puts
> it on /var (sitting on da0), that's faster.
>
> rc.conf (beware, rc.conf.local exists):
> ---------------------------------------
> rcshutdown_timeout=180
> tmpmfs=YES
> tmpsize="$(( `/sbin/sysctl -n hw.usermem` / 3000000 ))m"
> tmpmfs_flags="$tmpmfs_flags -v 1 -n"
>
> background_fsck=NO
>
> nisdomainname=ofw.tld
> pflog_flags=-S
>
> syslogd_flags=-svv
> inetd_enable=YES
> inetd_flags=-l
> named_flags="-S 1000"
> named_chrootdir=""
> rwhod_enable=YES
> sshd_enable=YES
> amd_enable=YES
> amd_flags="-F /etc/amd.conf"
> nfs_client_enable=YES
> nfs_access_cache=2
> mountd_flags=-n
> rpcbind_enable=YES
>
> ntpdate_enable=YES
> ntpdate_hosts=ntp
> ntpd_enable=YES
> ntpd_flags="-p /var/run/ntpd.pid"
>
> nis_client_enable=YES
> nis_client_flags="-s -S ofw.tld,nis-16-1,nis-16-2"
> nis_server_flags=-n
> nis_yppasswdd_flags="-t /var/yp/src/master.passwd -f -v"
>
> defaultrouter=192.168.16.2
>
> keyrate=fast
>
> sendmail_flags="-bd -q5m"
> sendmail_submit_flags="$sendmail_flags -ODaemonPortOptions=Addr=localhost"
> sendmail_msp_queue_flags="-Ac -q30m"
> sendmail_rebuild_aliases=NO
>
> lpd_enable=YES
> lpd_flags=-s
> chkprintcap_enable=YES
> dumpdev=AUTO
> clear_tmp_X=NO
> ldconfig_paths=/usr/local/lib
> ldconfig_paths_aout=""
> entropy_file=/boot/entropy-file
>
>
> rc.conf.local:
> --------------
> hostname=typhon.ofw.tld
> ifconfig_msk0="inet 192.168.24.1/21"
> ifconfig_msk0_alias0="inet 192.168.24.10/32"
>
> named_enable=YES
> nfs_server_enable=YES
>
> nis_client_flags="-s -S ofw.tld,nis-24-1,nis-24-2"
> nis_server_enable=YES
>
> defaultrouter=192.168.24.2
>
> lpd_flags=-l
> dumpdev=/dev/da1s1b
> quota_enable=YES
>
>
> fstab:
> ------
> /dev/da0s1a  /        ufs     noatime,rw                            0 1
> /dev/da0s1b  none     swap    sw                                    0 0
> proc         /proc    procfs  rw                                    0 0
> /dev/da0s1d  /usr     ufs     noatime,rw                            0 2
> /dev/da0s1e  /var     ufs     noatime,nosuid,rw                     0 2
>
> /dev/da10p1  /share2  ufs     suiddir,groupquota,noatime,nosuid,rw  0 2
> /dev/da10p2  /raid2   ufs     userquota,noatime,nosuid,rw           0 2
Thank you. Can you show me output from the following?

* camcontrol devlist
* gpart show -p da1

I'm pretty sure I see the problem, but I want to be extra sure.
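
A side note on the two dumpdev lines quoted above, separate from the
question I'm asking: rc.conf.local is read after rc.conf, so the later
assignment is the one that takes effect. The sketch below only illustrates
that ordering. The temporary directory and the effective_dumpdev helper are
made up for the example, and the two files contain nothing but the dumpdev
lines from the quoted configs.

```shell
#!/bin/sh
# Sketch only: shows that a variable set in rc.conf.local overrides
# the same variable set in rc.conf, because it is sourced second.

# Hypothetical helper: print the dumpdev value that results from
# sourcing <dir>/rc.conf and then <dir>/rc.conf.local (if present).
effective_dumpdev() {
    (
        # Subshell, so the sourced variables do not leak into the caller.
        . "$1/rc.conf"
        if [ -r "$1/rc.conf.local" ]; then
            . "$1/rc.conf.local"
        fi
        printf '%s\n' "$dumpdev"
    )
}

# Scratch copies holding only the relevant lines from the quoted files.
demo_dir=$(mktemp -d)
printf 'dumpdev=AUTO\n' > "$demo_dir/rc.conf"
printf 'dumpdev=/dev/da1s1b\n' > "$demo_dir/rc.conf.local"

effective_dumpdev "$demo_dir"

rm -rf "$demo_dir"
```

With both files present the helper prints the rc.conf.local value; with
only rc.conf present it falls back to AUTO.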
--
| Jeremy Chadwick jdc at koitsu.org |
| UNIX Systems Administrator http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |