kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Markus Gebert
markus.gebert at hostpoint.ch
Fri Jul 5 08:30:01 UTC 2013
The following reply was made to PR kern/179932; it has been noted by GNATS.
From: Markus Gebert <markus.gebert at hostpoint.ch>
To: bug-followup at FreeBSD.org,
Philipp Mächler <philipp.maechler at hostpoint.ch>,
"sean_bruno at yahoo.com" <sean_bruno at yahoo.com>
Cc:
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200
Hey Sean
I'm glad to hear you're getting the same controller as ours to test. In
the meantime, the backported ciss changes from head seem to help a lot
on the G8 blades with the p220 controllers. It's quite likely that the
G8 problem is already fixed in head. Of course, we can't be sure yet,
but it might still be better to focus on the G7 with p410 and storage
blade, where the issue has occurred even with ciss from head. So it's
good you're getting a p410.
We discussed your test scenario. ZFS is known to go nuts and do a lot
of IO once a zpool gets quite full, so is your goal just to maximise IO
to reproduce the problem more reliably? Or is there a specific reason
why you want us to fill a zpool?
Our problem is that half of the G7 blades are in production, so filling
the zpool is not an option there. The second half is where the first
half replicates all data to, so they're kind of hot standby and we're
more flexible doing tests there, but we still have to keep the
replication running, which makes filling the pool impossible as well.
The day before yesterday we installed the patched kernel that has ciss
from head and CISS_DEBUG defined on all these standby systems. We run
zpool scrubs non-stop on all of them to generate IO, and as they are
replication targets, they also receive some amount of write IO. That
way, we hope to get a system to stall more often, so we can progress
more quickly in debugging the G7 problem. If you think that more write
IO would help, we can look into using iozone, but as stated before, we
won't be able to do things like filling the zpool.
Also, once a G7 blade stalls, is there any information apart from
alltrace and the DDB ciss debug print that you want us to pull out of
the system?
When reading through the ciss driver source I noticed that the DDB print
may only output information about the first controller. Since the
storage blade contains a second p410, do you think it'd be worth
altering the debug function to print out information about every ciss
controller in the system?
Markus