Multi processor locking problem under 7.0
Dennis Glatting
freebsd at penx.com
Tue Jan 29 18:27:52 PST 2008
On Tue, 2008-01-29 at 19:00 -0500, John Baldwin wrote:
> On Tuesday 29 January 2008 03:26:44 pm Paul wrote:
> >
> > >I have several systems of two different types running 7.0. One is an IBM
> > >3550 and the other a Dell 2950. The IBMs more than the Dells
> > >consistently seem to have a kernel locking problem during dump.
> > >Specifically, if I execute this command:
> > >
> > > dump 0uaLCf 64 /dev/null /usr
> > >
> > >Dump consistently stops in Phase IV. However, if I set
> > >machdep.hlt_logical_cpus=1, dump does not stop. At the end of this
> > >message is my boot information.
> > >
> > >When logical_cpus=0, the following is typical of what is displayed by
> > >top when dump stops:
> > >
> > > > PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> > > 926 root 1 4 0 75476K 71744K sbwait 0 0:04 0.00% dump
> > > 928 root 1 20 0 75348K 67740K pause 1 0:02 0.00% dump
> > > 929 root 1 20 0 75348K 67740K pause 1 0:02 0.00% dump
> > > 927 root 1 20 0 75348K 67740K pause 1 0:02 0.00% dump
> > > 919 root 1 8 0 75348K 67144K wait 0 0:00 0.00% dump
> > >
> > >Fooling around a bit, I have found that if I truss dump, the dump
> > >continues. On the Dells, if I force disk activity during the dump, such
> > >as by executing an ls -lR /usr > /dev/null, the dump finishes.
> > >
> > >I am unsure how to proceed in debugging this problem. It has been around
> > >for a while, but I am now installing the IBMs and the dump problem is a
> > >non-starter. Please contact me directly on how to proceed.
> >
> > I have noticed something similar on my Intel test box.
> >
> > When compiling many ports in the tree that is updated on 7.0RC1 with
> > an S5000PAL with 2 quad-core Xeons, the process just STOPS. I am using
> > the install disk and have not updated to the latest cvsup release yet
> > (I am trying to make the world now with fingers crossed :) ). I tried
> > it with just one quad-core and the same problem happens.
> >
> > There are no errors on the screen, but it no longer proceeds with the
> > port build. When I suspend the process and restart the make in the
> > same session, it has no problem getting past this impasse, and with a
> > few suspends the make finishes without error. It does not happen
> > every time, which is very odd.
> >
> > Based on your description above it seems like it may be the same problem.
> >
> > What do you think?
>
> If you have threads blocked on "vmo_de" then upgrade to the latest RELENG_7 or
> RELENG_7_0 (specifically the sys/kern/subr_sleepqueue.c file) and try again.
>
I got the right file and updated my systems. I ran dump on the IBM
system five times. Dump hung four times, three times when 99.99%
complete. Below is a ps output.
How do I tell what the threads are blocked on?
Daffy> ps -axwHl | grep dump
    0   801     1   0  96  0 20952  4060 select Is    ??    0:00.00 /usr/sbin/sshd -f /etc/ssh/dumper/sshd_config
    0 14682   870   0   8  0 34388 26628 wait   I+    p0    0:00.20 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14774 14682   0   4  0 34388 30680 sbwait I+    p0    0:01.01 dump: /dev/aacd0s1e: pass 4: 14.97% done, finished in 0:03 at T
    0 14775 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14776 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
    0 14777 14774   0  20  0 34388 26644 pause  I+    p0    0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
  600 14896 12552   0  96  0  5900  1184 -      R+    p2    0:00.00 grep dump
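
On the question of what the threads are blocked on: in the FreeBSD `ps -l` layout, the ninth column is MWCHAN, the wait channel, so the listing above already carries that information (`sbwait`, `pause`, `wait`). The following is only a minimal sketch, assuming that column order and assuming each process is on one unwrapped line. The names `PS_L_COLUMNS`, `parse_ps_l`, and `wait_channels` are hypothetical helpers made up for illustration, and the sample text is the ps output from this message.

```python
from collections import Counter

# Column order assumed from the FreeBSD `ps -l` layout.
PS_L_COLUMNS = ["uid", "pid", "ppid", "cpu", "pri", "ni", "vsz", "rss",
                "mwchan", "stat", "tt", "time", "command"]

# The ps output from this message, one process or thread per line.
SAMPLE = """\
0 801 1 0 96 0 20952 4060 select Is ?? 0:00.00 /usr/sbin/sshd -f /etc/ssh/dumper/sshd_config
0 14682 870 0 8 0 34388 26628 wait I+ p0 0:00.20 dump 0uaLCf 24 /dev/null /usr (dump)
0 14774 14682 0 4 0 34388 30680 sbwait I+ p0 0:01.01 dump: /dev/aacd0s1e: pass 4: 14.97% done, finished in 0:03 at T
0 14775 14774 0 20 0 34388 26644 pause I+ p0 0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
0 14776 14774 0 20 0 34388 26644 pause I+ p0 0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
0 14777 14774 0 20 0 34388 26644 pause I+ p0 0:00.69 dump 0uaLCf 24 /dev/null /usr (dump)
600 14896 12552 0 96 0 5900 1184 - R+ p2 0:00.00 grep dump
"""


def parse_ps_l(text):
    """Split each `ps -l` line into named fields; skip headers and short lines."""
    rows = []
    for line in text.splitlines():
        # Split on whitespace, but keep the command (last column) intact.
        fields = line.split(None, len(PS_L_COLUMNS) - 1)
        if len(fields) != len(PS_L_COLUMNS) or not fields[1].isdigit():
            continue
        rows.append(dict(zip(PS_L_COLUMNS, fields)))
    return rows


def wait_channels(rows, needle="dump"):
    """Return (pid, mwchan) pairs for rows whose command contains `needle`."""
    return [(int(r["pid"]), r["mwchan"]) for r in rows if needle in r["command"]]


if __name__ == "__main__":
    chans = wait_channels(parse_ps_l(SAMPLE))
    for pid, chan in chans:
        print(pid, chan)
    # Tally how many matching rows sit on each wait channel.
    print(Counter(chan for _, chan in chans))
```

Like the `grep dump` it imitates, the substring match also picks up the sshd and grep rows; a stricter filter on the command name would exclude them.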
More information about the freebsd-amd64 mailing list