FreeBSD 7-STABLE, isp(4), QLE2462: panic & deadlocks
Panagiotis Christias
p.christias at noc.ntua.gr
Sun Nov 16 17:13:23 PST 2008
On Wed, Oct 15, 2008 at 08:54:53PM +0300, Panagiotis Christias wrote:
> On Wed, Oct 15, 2008 at 09:44:15AM +0400, Oleg Sharoiko wrote:
> > Hi!
> >
> > On Wed, 2008-10-15 at 01:23 +0300, Panagiotis Christias wrote:
> >
> > > However, when we connect them to the CX3-40, create and mount a new
> > > partition and then do something as simple as "tar -C /san -xf ports.tgz"
> > > the system panics and deadlocks. We have tried several FreeBSD versions
> > > (6.3 i386/amd64, 7.0 i386/amd64, 7.1 i386/amd64 and lastly 7-STABLE i386
> > > - we also tried the latest 8-CURRENT snapshot but it panicked too soon).
> > > The result is always the same; panic and deadlock.
> >
> > Try reducing the number of "tagged openings" with 'camcontrol tags' down
> > to 46. If it doesn't work try reducing it further to 2. Also be advised
> > that I've seen panics with geom_multipath in FreeBSD-7, unfortunately I
> > had no time to test it in -current.
>
>
> Hm.. that would probably explain the fact that I was unable to panic the
> system when I had set the hint.isp.0.debug="0x1F" in /boot/device.hints.
>
> Currently I am stress testing the server with the tagged openings set to
> 44 (first value tested). Until now there is no panic or deadlock. I am
> trying concurrent tar extractions and rsync copies. The filesystem looks
> ok till now according to fsck. I will let it write/copy/delete overnight
> and tomorrow I will try different tagged opening values.
>
> Thank you for the hint! I am wondering what is the performance penalty
> with decreased tagged openings. Also, is there anything else I could try
> in order to get more useful debug output? I have at least three servers
> that I could use for any kind of tests and I am willing to spend as much
> time I can get to help solving the problem.
>
> Finally, the only output in the logs is:
>
> Expensive timeout(9) function: 0xc06f4210(0xc67e1200) 0.059422635 s
> Expensive timeout(9) function: 0xc08d4fd0(0) 0.060676147 s
>
> I suppose that is related to the CAMDEBUG kernel config options.
For the record, I have since run many tests using several stress tools
in parallel, different FreeBSD versions (up to 7.1-BETA2), various
filesystem configurations (plain UFS2 with soft updates, UFS2 with
gjournal, ZFS) and various tagged openings values (down to 2).

Regardless of the configuration, the system deadlocks or panics, or the
filesystem becomes badly corrupted, within seconds, minutes or a few hours.

The only configuration that seems to work without problems(?) is with
tagged openings set to the minimum value of 2, which is more or less the
same as disabling tagged command queueing altogether, and it comes with
an unacceptably *severe* performance penalty.
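
For anyone who wants to repeat the tagged openings tests, the values can
be stepped through with camcontrol(8) roughly as sketched below. This is
only an illustration: da0 matches the LUN used here, the list holds just
the first and last values mentioned above, and the DRY_RUN switch and the
helper function names are made up for this example. By default it prints
the commands instead of running them.

```shell
#!/bin/sh
# Sketch: step the tagged openings of one CAM device through a list of values.
# DEV, DRY_RUN and the function names are illustrative assumptions.
DEV=${DEV:-da0}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

set_tags() {
    # "camcontrol tags <device> -N <count>" sets the number of tagged openings
    run camcontrol tags "$DEV" -N "$1"
}

show_tags() {
    # -v prints the full tag accounting for the device
    run camcontrol tags "$DEV" -v
}

for n in 44 2; do
    set_tags "$n"
    show_tags
    # run the stress workload here before moving on to the next value
done
```

Setting DRY_RUN=0 makes the script issue the commands for real; that
needs root and should only be done on a test system.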
All tests were run on a 500 GB RAID 5 LUN on an EMC Clariion CX3-40:
da0 at isp0 bus 0 target 0 lun 0
da0: <DGC RAID 5 0326> Fixed Direct Access SCSI-4 device
da0: Serial Number CK200083100148
da0: 400.000MB/s transfers
da0: Command Queueing Enabled
da0: 512000MB (1048576000 512 byte sectors: 255H 63S/T 65270C)
Previously, a Sun StorEdge T3 was tested and worked flawlessly. However,
it has a 1 Gbps Fibre Channel interface instead of the Clariion's 4 Gbps,
is recognized as a SCSI-3 device, and has only 2 tagged openings by
default (no surprise):
da1 at isp1 bus 0 target 0 lun 0
da1: <SUN T300 0302> Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers
da1: 241724MB (495050752 512 byte sectors: 255H 63S/T 30815C)
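
As a sanity check on the probe output for both devices, the reported
sizes and cylinder counts follow directly from the sector counts. A small
sketch, assuming 512-byte sectors, MB meaning 2^20 bytes and the 255H
63S/T geometry shown above (the function names are made up for this
example):

```shell
#!/bin/sh
# Sketch: recompute the size and cylinder figures from the sector counts.

# Size in MB (2^20 bytes): 512-byte sectors, so 2048 sectors per MB.
sectors_to_mb() {
    echo $(( $1 / 2048 ))
}

# Whole cylinders for a geometry of 255 heads and 63 sectors per track.
sectors_to_cyl() {
    echo $(( $1 / (255 * 63) ))
}

sectors_to_mb 1048576000     # da0: prints 512000
sectors_to_cyl 1048576000    # da0: prints 65270
sectors_to_mb 495050752      # da1: prints 241724
sectors_to_cyl 495050752     # da1: prints 30815
```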
As I mentioned before, I am willing to spend time and/or provide
access to the system for testing and debugging.
Regards,
Panagiotis
--
Panagiotis J. Christias Network Management Center
P.Christias at noc.ntua.gr National Technical Univ. of Athens, GREECE