CAM Target Layer and Linux (continued)

Thu Oct 4 15:18:00 UTC 2012

On Oct 4, 2012, at 2:52 AM, Chuck Tuffli <ctuffli at gmail.com> wrote:

> On Tue, Oct 2, 2012 at 3:03 AM, Nikolay Denev <ndenev at gmail.com> wrote:
>> 
>> On Sep 27, 2012, at 6:33 PM, Nikolay Denev <ndenev at gmail.com> wrote:
>> 
>>> Hi All,
>>> 
>>> With the help of Chuck Tuffli, I'm now able to use CTL to export a zvol over FC to a Linux host:
>>> 
>>> LUN Backend       Size (Blocks)   BS Serial Number    Device ID
>>>  0 block            4185915392  512 FBSDZFS001       ORA_ASM_01
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_01
>>>  1 block            4185915392  512 FBSDZFS002       ORA_ASM_02
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_02
>>>  2 block            4185915392  512 FBSDZFS003       ORA_ASM_03
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_03
>>>  3 block            4185915392  512 FBSDZFS004       ORA_ASM_04
>>>      lun_type=0
>>>      num_threads=14
>>>      file=/dev/zvol/tank/oracle_asm_04
>>> 
>>> Then we ran some tests using Oracle's ORION benchmark tool from the Linux host.
>>> The first test passed successfully.
>>> I then disabled ZFS prefetch ("vfs.zfs.prefetch_disable=1")
>>> and reran the test, which failed with the error below.
>>> 
>>> On the FreeBSD side:
>>> 
>>> (0:3:0:1): READ(10). CDB: 28 0 84 f9 58 0 0 4 0 0
>>> (0:3:0:1): Tag: 0x116220, Type: 1
>>> (0:3:0:1): CTL Status: SCSI Error
>>> (0:3:0:1): SCSI Status: Check Condition
>>> (0:3:0:1): SCSI sense: NOT READY asc:4b,0 (Data phase error)
> ...
>> After a whole day of ORION tests without problems, we started an Oracle ASM instance from the Linux host and
>> again got an error, this time a WRITE error:
>> 
>> (0:3:0:3): WRITE(10). CDB: 2a 0 1 5b 10 0 0 4 0 0
>> (0:3:0:3): Tag: 0x110940, Type: 1
>> (0:3:0:3): CTL Status: SCSI Error
>> (0:3:0:3): SCSI Status: Check Condition
>> (0:3:0:3): SCSI sense: NOT READY asc:4b,0 (Data phase error)
>> 
>> I've tried to track down this "Data phase error" in the CTL code and it looks like it is something related to the isp(4) driver:
> 
> This would have been my first guess if there had been something in the
> logs from isp, but since there wasn't, it's hard to tell. I've been
> running orion for ~3hrs now with a different FC driver + an analyzer
> but haven't seen this problem.
> 
> Would it be possible to stick some prints in the default clause of
> ctlfedone() to confirm whether this is a front-end or back-end problem?
> Especially interesting would be the value of done_ccb->ccb_h.status.
> 
> ---chuck

I have added a printf like this:

--- sys/cam/ctl/scsi_ctl.c.orig	2012-10-04 10:52:57.413144029 +0200
+++ sys/cam/ctl/scsi_ctl.c	2012-10-04 11:23:35.501143149 +0200
@@ -1415,6 +1415,7 @@
 				 */
 				io->io_hdr.port_status = 0xbad1;
 				ctl_set_data_phase_error(&io->scsiio);
+				printf("XXX: done_ccb->ccb_h.status = %lu\n", (long unsigned int)done_ccb->ccb_h.status);
 				/*
 				 * XXX KDM figure out residual.
 				 */

However, I've postponed the tests: the pool had nearly filled up, and the zvols had probably become very fragmented.
They were extremely slow to access and caused SCSI timeout and abort command errors on the Linux host.
Even deleting them took roughly 40 minutes.

There was also some bad interaction between accessing the zvols over CAM and using an NFS share from this host at the same time,
which brought all disk I/O on the pool almost to a stop.

I will create a new zvol tomorrow and retest with the printf enabled, while the machine is idle (no NFS activity).
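
Since the printf above emits ccb_h.status as a decimal number, here is a small userland sketch for decoding whatever value shows up. It assumes that the low six bits hold the CAM status code and that the higher bits are flags, as I understand sys/cam/cam.h. The SK_* constants and sk_* functions are hypothetical local names, and the numeric values are copied from memory, so please check them against the cam.h in the tree you are building. Only a handful of status codes are listed.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Local copies of a few values believed to match sys/cam/cam.h.
 * ASSUMPTION: verify against the header in your source tree.
 */
#define SK_CAM_STATUS_MASK   0x3Fu   /* low bits: status code */
#define SK_CAM_DEV_QFRZN     0x40u   /* device queue frozen */
#define SK_CAM_AUTOSNS_VALID 0x80u   /* autosense data valid */
#define SK_CAM_RELEASE_SIMQ  0x100u  /* SIM queue should be released */
#define SK_CAM_SIM_QUEUED    0x200u  /* CCB still queued in the SIM */

/* Map the masked status code to a name (partial table only). */
static const char *
sk_cam_status_name(uint32_t status)
{
	switch (status & SK_CAM_STATUS_MASK) {
	case 0x00: return "CAM_REQ_INPROG";
	case 0x01: return "CAM_REQ_CMP";
	case 0x02: return "CAM_REQ_ABORTED";
	case 0x04: return "CAM_REQ_CMP_ERR";
	case 0x0a: return "CAM_SEL_TIMEOUT";
	case 0x0b: return "CAM_CMD_TIMEOUT";
	case 0x0c: return "CAM_SCSI_STATUS_ERROR";
	default:   return "(other, see cam.h)";
	}
}

/* Render the raw value as hex, status name, and any flag bits set. */
static int
sk_format_status(uint32_t status, char *buf, size_t len)
{
	return snprintf(buf, len, "0x%x: %s%s%s%s%s",
	    (unsigned int)status,
	    sk_cam_status_name(status),
	    (status & SK_CAM_DEV_QFRZN) ? " +DEV_QFRZN" : "",
	    (status & SK_CAM_AUTOSNS_VALID) ? " +AUTOSNS_VALID" : "",
	    (status & SK_CAM_RELEASE_SIMQ) ? " +RELEASE_SIMQ" : "",
	    (status & SK_CAM_SIM_QUEUED) ? " +SIM_QUEUED" : "");
}
```

As a purely made-up example (not a value seen in the logs), passing 76 to sk_format_status() would render as "0x4c: CAM_SCSI_STATUS_ERROR +DEV_QFRZN". Keeping the mask and the flag tests separate makes it easy to tell an error code apart from queue-state bits that happen to be set at the same time.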