Stable SATA pci card for FreeBSD 6.x/7.0
Sebastiaan van Erk
sebster at sebster.com
Thu Aug 21 07:49:25 UTC 2008
Hi,
Cian Hughes wrote:
> Sebastiaan,
> Have you tried connecting your 250GB drives to the troublesome
> controller? If so, does "stressing" them cause the system to panic?
>
> ~Cian Hughes
Thanks for your reply.
I have not tried stress-testing the 250GB drives on the troublesome
controller. The problem with those drives is that, even though they are
mirrored, the data is very important to me and I do not want it to get
corrupted. I do have backups of course, but the problem with data
corruption is that it often takes very long to notice...
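One way to notice such corruption sooner is to keep a checksum manifest of the files and re-verify it from time to time. Below is only a minimal sketch in Python, not something I have running here; the function names (file_sha256, build_manifest, find_mismatches) are made up for illustration, and it assumes the files are not supposed to change between runs.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash real file contents.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def find_mismatches(old, new):
    """Return sorted paths from old whose digest changed or that vanished."""
    return sorted(path for path, digest in old.items() if new.get(path) != digest)
```

A saved manifest would be compared against a fresh build_manifest() run; files that were edited on purpose show up as mismatches too, so this fits best for data that is written once and then only read.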
I was thinking of buying the Promise SATA300 TX4 PCI Controller. I've
searched on Google and do see some negative posts about it in
combination with FreeBSD; however, they all date back at least 2 years...
Does anybody have positive/negative experiences using this card?
Regards,
Sebastiaan
> --
> University of Bristol Medical School
>
> On 14 Aug 2008, at 10:37, Sebastiaan van Erk wrote:
>
>> Thanks Jonathan,
>>
>> I'm starting to suspect it has to be the controller as well. About 20
>> minutes after I posted this message yesterday (and thus 20 minutes
>> after ad6 got disconnected - atacontrol list showed "no device
>> present" for it) the machine crashed while writing to the remaining
>> ad4 drive (kernel panic). I attached the logs below. I also ran the
>> long smart self test on both drives, and no errors were found on
>> either drive (logs also attached).
>>
>> Unfortunately I could not attach the new disks to my mainboard SATA
>> because my mainboard SATA somehow hangs trying to detect them. So I
>> cannot test if *not* using the controller is going to solve the
>> problems, though it seems logical at the moment that it has to be the
>> controller, especially if other people have had similar issues.
>>
>> I guess I'll be buying another controller.
>>
>> Regards,
>> Sebastiaan
>>
>> Jonathan Groll wrote:
>>> On Wed, Aug 13, 2008 at 03:10:56PM +0200, Sebastiaan van Erk wrote:
>>>> Hi,
>>>>
>>>> Just an update on this issue.
>>>>
>>>> Quick summary: I fixed the BIOS issues, the hardware monitor issues,
>>>> and the rl0/rl1 watchdog timeout issues (it seems). However I'm
>>>> still having problems with my SATA drives (or at least one of them).
>>>> More info below.
>>>>
>>>> BIOS:
>>>> I flashed my BIOS to the latest version about a year ago, and never
>>>> noticed that there was any problem, but it turns out there was. I
>>>> never reset the BIOS to default factory settings after the upgrade,
>>>> and it seems the settings were corrupt. After having reset the BIOS
>>>> to the "default optimized factory settings" it stopped crashing when
>>>> I go into the H/W monitor and also when using healthd -d (output below):
>>>>
>>>> Temp.= 40.0, 36.0, 66.0; Rot.= 0, 0, 0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.= 0, 0, 0
>>>> Vcore = 1.44, 3.14; Volt. = 3.33, 4.97, 1.95, -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.= 0, 0, 0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 4.97, 1.95, -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.= 0, 0, 0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.= 0, 0, 0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00, 1.95, -0.11, -1.54
>>>>
>>>> This also seems to have fixed the rl0 watchdog timeout problems. I
>>>> no longer see those in my logs.
>>>>
>>>> SATA DRIVES:
>>>>
>>>> I'm still having problems with the SATA drives.
>>>>
>>>> I tried connecting the 1TB Samsung drives to my mainboard, but then
>>>> the box hangs when booting with the "Detecting IDE drives" message.
>>>> The regular (PATA) IDE drives are detected first, and then it
>>>> repeats the "Detecting IDE drives" message to detect the sata
>>>> drives, and hangs. When I connect my 250GB SATA drives to my
>>>> mainboard they detect fine, and the box boots normally.
>>>>
>>>> I did another rsync of my old mirror (the 250GB disks) to the new
>>>> mirror (1TB disks), but again one of the disks got detached. This
>>>> time there are no other messages in the log, the only thing I see is
>>>> the following:
>>>>
>>>> Aug 13 14:35:27 piglet su: sebster to root on /dev/ttyp5
>>>> Aug 13 14:55:38 piglet kernel: ad6: FAILURE - device detached
>>>> Aug 13 14:55:38 piglet kernel: subdisk6: detached
>>>> Aug 13 14:55:38 piglet kernel: ad6: detached
>>>> Aug 13 14:55:38 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6
>>>> disconnected.
>>>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to
>>>> size>100K
>>>>
>>>> (Unfortunately the log file just got rotated, but in the new log
>>>> file there is nothing except the one expected line):
>>>>
>>>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to
>>>> size>100K
>>>>
>>>> So, nothing after the disconnect...
>>>>
>>>> The questions I have now are:
>>>> 1) Could an upgrade to FreeBSD 7-STABLE fix the issue (it's a LOT of
>>>> work for me, but I'll do it if there are SATA driver issues fixed).
>>> I suspect the problem may be the SiI driver in FreeBSD. As a reference
>>> point, I've had a similar problem, even on 7-STABLE, but with sparc64
>>> hardware (see earlier post in this thread).
>>> It'll probably be simplest for you to just buy another controller of
>>> another brand. On the other hand, it'll be worth knowing exactly what
>>> is wrong with the SiI driver...
>>> Cheers,
>>> Jonathan
>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to
>> size>100K
>> Aug 13 15:11:26 piglet su: sebster to root on /dev/ttyp4
>> Aug 13 15:34:55 piglet kernel:
>> mirror/gm1s1e[WRITE(offset=875450693632, length=2048)]error = 6
>> Aug 13 15:34:55 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450695680,
>> length=2048)]error = 6
>>
>> [snip 335750 similar lines]
>>
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450931200,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450933248,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450935296,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450937344,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450939392,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450941440,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450943488,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450945536,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450947584,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450949632,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450951680,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450953728,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450955776,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450957824,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450959872,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450961920,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450963968,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450966016,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450968064,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450970112,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450972160,
>> length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel:
>> g_vfs_done():mirror/gm1s1e[WRITE(offset=875450974208,
>> length=2048)]error = 6
>> Aug 13 15:42:23 piglet syslogd: kernel boot file is /boot/kernel/kernel
>> Aug 13 15:42:23 piglet kernel: Copyright (c) 1992-2008 The FreeBSD
>> Project.
>> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8
>> Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Device Model: SAMSUNG HD103UJ
>> Serial Number: S13PJ1BQ606865
>> Firmware Version: 1AA01112
>> User Capacity: 1,000,204,886,016 bytes
>> Device is: In smartctl database [for details use: -P show]
>> ATA Version is: 8
>> ATA Standard is: ATA-8-ACS revision 3b
>> Local Time is: Thu Aug 14 11:28:13 2008 CEST
>>
>> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual
>> for details.
>>
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status: (0x02) Offline data collection activity
>> was completed without error.
>> Auto Offline Data Collection: Disabled.
>> Self-test execution status: ( 0) The previous self-test routine
>> completed
>> without error or no self-test has ever
>> been run.
>> Total time to complete Offline
>> data collection: (11811) seconds.
>> Offline data collection
>> capabilities: (0x7b) SMART execute Offline immediate.
>> Auto Offline data collection on/off support.
>> Suspend Offline collection upon new
>> command.
>> Offline surface scan supported.
>> Self-test supported.
>> Conveyance Self-test supported.
>> Selective Self-test supported.
>> SMART capabilities: (0x0003) Saves SMART data before entering
>> power-saving mode.
>> Supports SMART auto save timer.
>> Error logging capability: (0x01) Error logging supported.
>> General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time: ( 2) minutes.
>> Extended self-test routine
>> recommended polling time: ( 198) minutes.
>> Conveyance self-test routine
>> recommended polling time: ( 21) minutes.
>> SCT capabilities: (0x003f) SCT Status supported.
>> SCT Feature Control supported.
>> SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
>> UPDATED WHEN_FAILED RAW_VALUE
>> 1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail
>> Always - 0
>> 3 Spin_Up_Time 0x0007 076 076 011 Pre-fail
>> Always - 8010
>> 4 Start_Stop_Count 0x0032 100 100 000 Old_age
>> Always - 8
>> 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail
>> Always - 0
>> 7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail
>> Always - 0
>> 8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail
>> Offline - 10255
>> 9 Power_On_Hours 0x0032 100 100 000 Old_age
>> Always - 272
>> 10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail
>> Always - 0
>> 11 Calibration_Retry_Count 0x0012 100 100 000 Old_age
>> Always - 0
>> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
>> Always - 8
>> 13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age
>> Always - 0
>> 183 Unknown_Attribute 0x0032 100 100 000 Old_age
>> Always - 0
>> 184 Unknown_Attribute 0x0033 100 100 099 Pre-fail
>> Always - 0
>> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age
>> Always - 0
>> 188 Unknown_Attribute 0x0032 100 100 000 Old_age
>> Always - 0
>> 190 Airflow_Temperature_Cel 0x0022 057 052 000 Old_age
>> Always - 43 (Lifetime Min/Max 43/48)
>> 194 Temperature_Celsius 0x0022 056 050 000 Old_age
>> Always - 44 (Lifetime Min/Max 43/50)
>> 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age
>> Always - 195799724
>> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age
>> Always - 0
>> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age
>> Always - 0
>> 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age
>> Offline - 0
>> 199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age
>> Always - 0
>> 200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age
>> Always - 0
>> 201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age
>> Always - 0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 0
>> Warning: ATA Specification requires self-test log structure revision
>> number = 1
>> Num Test_Description Status Remaining
>> LifeTime(hours) LBA_of_first_error
>> # 1 Offline Completed without error 00% 261
>> -
>> # 2 Offline Aborted by host 40% 251
>> -
>> # 3 Short offline Aborted by host 00% 250
>> -
>>
>> SMART Selective Self-Test Log Data Structure Revision Number (0)
>> should be 1
>> SMART Selective self-test log data structure revision number 0
>> Warning: ATA Specification requires selective self-test log data
>> structure revision number = 1
>> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
>> 1 0 0 Not_testing
>> 2 0 0 Not_testing
>> 3 0 0 Not_testing
>> 4 0 0 Not_testing
>> 5 0 0 Not_testing
>> Selective self-test flags (0x0):
>> After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute
>> delay.
>>
>> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8
>> Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Device Model: SAMSUNG HD103UJ
>> Serial Number: S13PJ1BQ607102
>> Firmware Version: 1AA01112
>> User Capacity: 1,000,204,886,016 bytes
>> Device is: In smartctl database [for details use: -P show]
>> ATA Version is: 8
>> ATA Standard is: ATA-8-ACS revision 3b
>> Local Time is: Thu Aug 14 11:28:39 2008 CEST
>>
>> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual
>> for details.
>>
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status: (0x02) Offline data collection activity
>> was completed without error.
>> Auto Offline Data Collection: Disabled.
>> Self-test execution status: ( 0) The previous self-test routine
>> completed
>> without error or no self-test has ever
>> been run.
>> Total time to complete Offline
>> data collection: (12131) seconds.
>> Offline data collection
>> capabilities: (0x7b) SMART execute Offline immediate.
>> Auto Offline data collection on/off support.
>> Suspend Offline collection upon new
>> command.
>> Offline surface scan supported.
>> Self-test supported.
>> Conveyance Self-test supported.
>> Selective Self-test supported.
>> SMART capabilities: (0x0003) Saves SMART data before entering
>> power-saving mode.
>> Supports SMART auto save timer.
>> Error logging capability: (0x01) Error logging supported.
>> General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time: ( 2) minutes.
>> Extended self-test routine
>> recommended polling time: ( 203) minutes.
>> Conveyance self-test routine
>> recommended polling time: ( 22) minutes.
>> SCT capabilities: (0x003f) SCT Status supported.
>> SCT Feature Control supported.
>> SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
>> UPDATED WHEN_FAILED RAW_VALUE
>> 1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail
>> Always - 0
>> 3 Spin_Up_Time 0x0007 077 077 011 Pre-fail
>> Always - 7810
>> 4 Start_Stop_Count 0x0032 100 100 000 Old_age
>> Always - 10
>> 5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail
>> Always - 0
>> 7 Seek_Error_Rate 0x000f 253 253 051 Pre-fail
>> Always - 0
>> 8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail
>> Offline - 9978
>> 9 Power_On_Hours 0x0032 100 100 000 Old_age
>> Always - 272
>> 10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail
>> Always - 0
>> 11 Calibration_Retry_Count 0x0012 100 100 000 Old_age
>> Always - 0
>> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
>> Always - 10
>> 13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age
>> Always - 0
>> 183 Unknown_Attribute 0x0032 100 100 000 Old_age
>> Always - 0
>> 184 Unknown_Attribute 0x0033 100 100 099 Pre-fail
>> Always - 0
>> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age
>> Always - 0
>> 188 Unknown_Attribute 0x0032 100 100 000 Old_age
>> Always - 0
>> 190 Airflow_Temperature_Cel 0x0022 059 054 000 Old_age
>> Always - 41 (Lifetime Min/Max 41/46)
>> 194 Temperature_Celsius 0x0022 058 053 000 Old_age
>> Always - 42 (Lifetime Min/Max 41/47)
>> 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age
>> Always - 31616
>> 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age
>> Always - 0
>> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age
>> Always - 0
>> 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age
>> Offline - 0
>> 199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age
>> Always - 0
>> 200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age
>> Always - 0
>> 201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age
>> Always - 0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 0
>> Warning: ATA Specification requires self-test log structure revision
>> number = 1
>> Num Test_Description Status Remaining
>> LifeTime(hours) LBA_of_first_error
>> # 1 Offline Completed without error 00% 261
>> -
>> # 2 Offline Aborted by host 40% 251
>> -
>> # 3 Short offline Aborted by host 00% 250
>> -
>>
>> SMART Selective Self-Test Log Data Structure Revision Number (0)
>> should be 1
>> SMART Selective self-test log data structure revision number 0
>> Warning: ATA Specification requires selective self-test log data
>> structure revision number = 1
>> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
>> 1 0 0 Not_testing
>> 2 0 0 Not_testing
>> 3 0 0 Not_testing
>> 4 0 0 Not_testing
>> 5 0 0 Not_testing
>> Selective self-test flags (0x0):
>> After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute
>> delay.
>>
>
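With several hundred thousand near-identical g_vfs_done() lines like the ones quoted above, it can help to boil the log down to one summary per provider. The following is a rough sketch in Python, assuming the raw syslog text is available without the mail quoting; the function name summarize_io_errors is hypothetical. For reference, errno 6 on FreeBSD is ENXIO ("Device not configured"), which fits a provider that has gone away.

```python
import re
from collections import defaultdict

# Matches e.g. mirror/gm1s1e[WRITE(offset=875450695680, length=2048)]error = 6
# \s* after the comma tolerates a line wrap between offset and length.
PATTERN = re.compile(
    r"(?P<provider>[\w/]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+),\s*length=(?P<length>\d+)\)\]"
    r"error = (?P<errno>\d+)"
)


def summarize_io_errors(log_text):
    """Group failed I/O requests by (provider, operation, errno).

    Returns a dict whose values hold the number of matching lines and the
    lowest and highest byte offset seen for that group.
    """
    summary = defaultdict(
        lambda: {"count": 0, "min_offset": None, "max_offset": None}
    )
    for match in PATTERN.finditer(log_text):
        key = (match["provider"], match["op"], int(match["errno"]))
        offset = int(match["offset"])
        entry = summary[key]
        entry["count"] += 1
        if entry["min_offset"] is None or offset < entry["min_offset"]:
            entry["min_offset"] = offset
        if entry["max_offset"] is None or offset > entry["max_offset"]:
            entry["max_offset"] = offset
    return dict(summary)
```

Feeding it the contents of the messages file yields one entry per provider and error code, which is easier to post than the full log.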