changed cable, server still hangs after ~24hrs ...
Marc G. Fournier
scrappy at hub.org
Sun Apr 27 06:05:46 PDT 2003
'K, after the last hang, I got the techs to replace the SCSI cable in the
box, which made no difference ...

I've removed the KVA_PAGES settings from the kernel, so that there is
nothing 'weird' configured into it, and now aaccli for the 5400 works (I
haven't been able to get my hands on a version for the 2120S yet). I'm not
sure what sort of info I should be looking for (or even what is safe to
run) ... but does any of the output below provide *anything*?

Note that this enclosure is one of the Intel SR2200s, and I'm still getting
the occasional 'Time-out', which to me indicates a problem, but according
to the controller:
AAC0> disk show smart
Executing: disk show smart
       Smart   Method of        Enable
       Capable Informational    Exception Performance Error
C:ID:L Device  Exceptions(MRIE) Control   Enabled     Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y       6                Y         N           0
0:01:0 Y       6                Y         N           0
0:02:0 Y       6                Y         N           0
0:03:0 Y       6                Y         N           0
0:04:0 Y       6                Y         N           0
0:05:0 Y       6                Y         N           0
I would have expected the Error Count to have increased by at least 1 if
there were a problem at the hardware level ... no?

The system itself is a dual PIII with 4G of RAM ... Intel motherboard and
chassis, so the only SCSI cable I'm dealing with is the one from the
motherboard to the backplane itself ...

The hangs are similar to the original ones, where I'd get TIMEOUT
scrolling up the screen, but since Scott's last "fix" for the 2G
allocation issue, I no longer get the actual error messages ...

On each hang, I've asked the techs to do a 'ctl-alt-esc', but, like
before, this doesn't work :(

Help? Is there anything else I can get the techs to try to eliminate
hardware as the cause? :(
neptune# grep aac /var/log/messages
Apr 27 07:42:02 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 27 07:42:05 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
Apr 27 09:29:19 neptune /kernel: aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 2 at device 9.0 on pci1
Apr 27 09:29:19 neptune /kernel: aac0: i960RX 100MHz, 48MB cache memory, optional battery present
Apr 27 09:29:19 neptune /kernel: aac0: Kernel 4.0-0, Build 5770, S/N 232fb7
Apr 27 09:29:19 neptune /kernel: aac0: Supported Options=1f7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,NORECOND,SGMAP64,ALARM,NONDASD>
Apr 27 09:29:20 neptune /kernel: aacd0: <RAID 5> on aac0
Apr 27 09:29:20 neptune /kernel: aacd0: 174993MB (358387200 sectors)
Apr 27 09:29:20 neptune /kernel: Mounting root from ufs:/dev/aacd0s1a
neptune# zgrep aac /var/log/messages.0.gz
neptune# zgrep aac /var/log/messages.1.gz
neptune# zgrep aac /var/log/messages.2.gz
Apr 24 14:56:45 neptune /kernel: aac0: **Monitor** ID(0:05:0) Abort Time-out. Resetting bus.
Apr 24 14:56:48 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
neptune# zgrep aac /var/log/messages.3.gz
Apr 23 02:20:20 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116328, size: 4096
Apr 23 02:20:29 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 104256, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111896, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116304, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 112576, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116952, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 113144, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 87424, size: 4096
Apr 23 02:20:30 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116312, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 117016, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116408, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 43984, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 116296, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111224, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 112440, size: 8192
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 104840, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 111856, size: 4096
Apr 23 02:20:31 neptune /kernel: swap_pager: indefinite wait buffer: device: #aacd/0x20001, blkno: 15208, size: 4096
Apr 23 02:20:31 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
Apr 23 02:20:31 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
Apr 23 10:39:30 neptune /kernel: aac0: <Adaptec SCSI RAID 2120S> mem 0xf8000000-0xfbffffff irq 2 at device 9.0 on pci1
Apr 23 10:39:30 neptune /kernel: aac0: i960RX 100MHz, 48MB cache memory, optional battery present
Apr 23 10:39:30 neptune /kernel: aac0: Kernel 4.0-0, Build 5770, S/N 232fb7
Apr 23 10:39:30 neptune /kernel: aac0: Supported Options=1f7e<CLUSTERS,WCACHE,DATA64,HOSTTIME,RAID50,WINDOW4GB,SOFTERR,NORECOND,SGMAP64,ALARM,NONDASD>
Apr 23 10:39:30 neptune /kernel: aacd0: <RAID 5> on aac0
Apr 23 10:39:30 neptune /kernel: aacd0: 174993MB (358387200 sectors)
Apr 23 10:39:30 neptune /kernel: Mounting root from ufs:/dev/aacd0s1a
Apr 23 23:32:39 neptune /kernel: aac0: **Monitor** ID(0:01:0) Abort Time-out. Resetting bus.
Apr 23 23:32:42 neptune /kernel: aac0: **Monitor** SCSI bus reset issued on channel 0
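Rather than running grep/zgrep against each rotated log by hand, the same search can be folded into one pass that tallies the aac0 "Abort Time-out" monitor messages per SCSI target. This is only a sketch: the function name count_aac_timeouts is hypothetical, and it assumes newsyslog-style rotation like the listing above (messages, messages.0.gz, messages.1.gz, ...). Seeing the counts side by side makes it easier to tell whether the resets keep landing on one ID or wander across the bus.

```shell
#!/bin/sh
# Hypothetical helper: count aac(4) "Abort Time-out" monitor messages per
# SCSI target (C:ID:L) across the live and rotated syslog files.
# Assumes newsyslog-style names: messages, messages.0.gz, messages.1.gz, ...
count_aac_timeouts() {
    logdir=${1:-/var/log}
    for f in "$logdir"/messages "$logdir"/messages.*.gz; do
        [ -e "$f" ] || continue           # skip a glob that matched nothing
        case "$f" in
            *.gz) gzip -dc "$f" ;;        # decompress rotated logs to stdout
            *)    cat "$f" ;;
        esac
    done | grep 'aac[0-9]*: .*Abort Time-out' \
         | sed -n 's/.*ID(\([0-9:]*\)).*/\1/p' \
         | sort | uniq -c                 # one "count C:ID:L" line per target
}

count_aac_timeouts "${1:-/var/log}"
```

Run as e.g. `sh count_timeouts.sh /var/log` (file name hypothetical). Roughly, a count piling up on a single ID would point at that drive or its slot, while counts spread over several IDs would point back at the shared path (backplane, cable or controller).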
AAC0> controller details
Executing: controller details
Controller Information
----------------------
Remote Computer: S
Device Name: S
Controller Type: No Info
Access Mode: READ-WRITE
Controller Serial Number: Last Six Digits = 232FB7
Number of Buses: 1
Devices per Bus: 15
Controller CPU: i960 R series
Controller CPU Speed: 100 Mhz
Controller Memory: 64 Mbytes
Battery State: Not Present
Component Revisions
-------------------
CLI: 1.0-0 (Build #5263)
API: 1.0-0 (Build #5263)
Miniport Driver: 4.0-0 (Build #5770)
Controller Software: 4.0-0 (Build #5770)
Controller BIOS: 4.0-0 (Build #5770)
Controller Firmware: (Build #5770)
Controller Hardware: 2.64
Scsi   Partition     Container  MultiLevel
C:ID:L Offset:Size   Num Type   Num Type   R/W
------ ------------- --- ------ --- ------ ---
0:00:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
0:01:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
0:02:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
0:03:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
0:04:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
0:05:0 64.0KB:34.1GB 0   RAID-5 0   None   RW
       Smart   Method of        Enable
       Capable Informational    Exception Performance Error
C:ID:L Device  Exceptions(MRIE) Control   Enabled     Count
------ ------- ---------------- --------- ----------- ------
0:00:0 Y       6                Y         N           0
0:01:0 Y       6                Y         N           0
0:02:0 Y       6                Y         N           0
0:03:0 Y       6                Y         N           0
0:04:0 Y       6                Y         N           0
0:05:0 Y       6                Y         N           0
0:06:0 N
0:06:1 N
0:06:2 N
0:06:3 N
0:06:4 N
0:06:5 N
0:06:6 N
0:06:7 N
C:ID:L Device Type    Blocks    Bytes/Block Usage            Shared Rate
------ -------------- --------- ----------- ---------------- ------ ----
0:00:0 Disk           71687372  512         Initialized      NO     320
0:01:0 Disk           71687372  512         Initialized      NO     320
0:02:0 Disk           71687372  512         Initialized      NO     320
0:03:0 Disk           71687372  512         Initialized      NO     320
0:04:0 Disk           71687372  512         Initialized      NO     320
0:05:0 Disk           71687372  512         Initialized      NO     320
Num          Total  Oth Stripe         Scsi   Partition
Label Type   Size   Ctr Size   Usage   C:ID:L Offset:Size
----- ------ ------ --- ------ ------- ------ -------------
0     RAID-5 170GB      64KB   Open    0:00:0 64.0KB:34.1GB
/dev/aacd0 FreeBSD                     0:01:0 64.0KB:34.1GB
                                       0:02:0 64.0KB:34.1GB
                                       0:03:0 64.0KB:34.1GB
                                       0:04:0 64.0KB:34.1GB
                                       0:05:0 64.0KB:34.1GB
Enclosure
ID (C:ID:L) Fan Power Slot Sensor Door Speaker  Standard Diagnostic
----------- --- ----- ---- ------ ---- -------- -------- ----------
0 0:06:0    0   2     7    1      0    No       SAF-TE   PASSED
1 0:06:1    0   0     0    0      0    No       SAF-TE   FAILED
2 0:06:2    0   0     0    0      0    No       SAF-TE   FAILED
3 0:06:3    0   0     0    0      0    No       SAF-TE   FAILED
4 0:06:4    0   0     0    0      0    No       SAF-TE   FAILED
5 0:06:5    0   0     0    0      0    No       SAF-TE   FAILED
6 0:06:6    0   0     0    0      0    No       SAF-TE   FAILED
7 0:06:7    0   0     0    0      0    No       SAF-TE   FAILED
AAC0> enclosure show temperature
Executing: enclosure show temperature
Enclosure
ID (C:ID:L) Sensor Temperature Threshold Status
----------- ------ ----------- --------- --------
0 0:06:0    0      87 F        120       NORMAL
Is there any other information that I can pull?
More information about the freebsd-scsi mailing list