Stable SATA PCI card for FreeBSD 6.x/7.0

Sebastiaan van Erk sebster at sebster.com
Thu Aug 21 07:49:25 UTC 2008


Hi,

Cian Hughes wrote:
 > Sebastiaan,
 > Have you tried connecting your 250GB drives to the troublesome
 > controller? If so, does "stressing" them cause the system to panic?
 >
 > ~Cian Hughes

Thanks for your reply.

I have not tried stress-testing the 250GB drives on the troublesome 
controller. Even though those drives are mirrored, the data on them is 
very important to me and I do not want to risk corrupting it. I do have 
backups, of course, but the problem with data corruption is that it 
often takes a very long time to notice...
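
One way to shorten that detection window is to keep a checksum manifest of 
the files and re-verify it periodically. The following is only a minimal 
sketch in Python; the helper names (file_sha256, build_manifest, 
verify_manifest) are hypothetical and not part of any existing tool, and it 
assumes the data is a plain directory tree that can be read in full.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose content changed."""
    current = build_manifest(root)
    return sorted(path for path, digest in manifest.items()
                  if current.get(path) != digest)
```

Files that were changed on purpose are reported as well, so a check like 
this fits best on data that is rarely modified, with the manifest itself 
stored on a different disk.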

I was thinking of buying the Promise SATA300 TX4 PCI controller. I 
searched on Google and did find some negative posts about it in 
combination with FreeBSD, but they all date back at least two years...

Does anybody have positive or negative experience with this card?

Regards,
Sebastiaan


> --
> University of Bristol Medical School
> 
> On 14 Aug 2008, at 10:37, Sebastiaan van Erk wrote:
> 
>> Thanks Jonathan,
>>
>> I'm starting to suspect it has to be the controller as well. About 20 
>> minutes after I posted this message yesterday (and thus 20 minutes 
>> after ad6 got disconnected - atacontrol list showed "no device 
>> present" for it) the machine crashed while writing to the remaining 
>> ad4 drive (kernel panic). I attached the logs below. I also ran the 
>> long SMART self-test on both drives, and no errors were found on 
>> either drive (logs also attached).
>>
>> Unfortunately I could not attach the new disks to my mainboard SATA 
>> because my mainboard SATA somehow hangs trying to detect them. So I 
>> cannot test whether *not* using the controller would solve the 
>> problems, though at the moment it seems logical that it has to be the 
>> controller, especially if other people have had similar issues.
>>
>> I guess I'll be buying another controller.
>>
>> Regards,
>> Sebastiaan
>>
>> Jonathan Groll wrote:
>>> On Wed, Aug 13, 2008 at 03:10:56PM +0200, Sebastiaan van Erk wrote:
>>>> Hi,
>>>>
>>>> Just an update on this issue.
>>>>
>>>> Quick summary: I fixed the BIOS issues, the hardware monitor issues, 
>>>> and the rl0/rl1 watchdog timeout issues (it seems). However I'm 
>>>> still having problems with my SATA drives (or at least one of them). 
>>>> More info below.
>>>>
>>>> BIOS:
>>>> I flashed my BIOS to the latest version about a year ago, and never 
>>>> noticed that there was any problem, but it turns out there was. I 
>>>> never reset the BIOS to default factory settings after the upgrade, 
>>>> and it seems the settings were corrupt. After having reset the BIOS 
>>>> to the "default optimized factory settings" it stopped crashing when 
>>>> I go into the H/W monitor and also when using healthd -d (output below):
>>>>
>>>> Temp.= 40.0, 36.0, 66.0; Rot.=    0,    0,    0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.=    0,    0,    0
>>>> Vcore = 1.44, 3.14; Volt. = 3.33, 4.97,  1.95,  -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.=    0,    0,    0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 4.97,  1.95,  -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.=    0,    0,    0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54
>>>> Temp.= 40.0, 36.0, 66.0; Rot.=    0,    0,    0
>>>> Vcore = 1.44, 3.12; Volt. = 3.34, 5.00,  1.95,  -0.11, -1.54
>>>>
>>>> This also seems to have fixed the rl0 watchdog timeout problems. I 
>>>> no longer see those in my logs.
>>>>
>>>> SATA DRIVES:
>>>>
>>>> I'm still having problems with the SATA drives.
>>>>
>>>> I tried connecting the 1TB Samsung drives to my mainboard, but then 
>>>> the box hangs when booting with the "Detecting IDE drives" message. 
>>>> The regular (PATA) IDE drives are detected first, and then it 
>>>> repeats the "Detecting IDE drives" message to detect the SATA 
>>>> drives, and hangs. When I connect my 250GB SATA drives to my 
>>>> mainboard they detect fine, and the box boots normally.
>>>>
>>>> I did another rsync of my old mirror (the 250GB disks) to the new 
>>>> mirror (1TB disks), but again one of the disks got detached. This 
>>>> time there are no other messages in the log, the only thing I see is 
>>>> the following:
>>>>
>>>> Aug 13 14:35:27 piglet su: sebster to root on /dev/ttyp5
>>>> Aug 13 14:55:38 piglet kernel: ad6: FAILURE - device detached
>>>> Aug 13 14:55:38 piglet kernel: subdisk6: detached
>>>> Aug 13 14:55:38 piglet kernel: ad6: detached
>>>> Aug 13 14:55:38 piglet kernel: GEOM_MIRROR: Device gm1: provider ad6 disconnected.
>>>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K
>>>>
>>>> (Unfortunately the log file had just been rotated, but the new log 
>>>> file contains nothing except the one expected line:)
>>>>
>>>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K
>>>>
>>>> So, nothing after the disconnect...
>>>>
>>>> The questions I have now are:
>>>> 1) Could an upgrade to FreeBSD 7-STABLE fix the issue? (It's a LOT of 
>>>> work for me, but I'll do it if SATA driver issues have been fixed there.)
>>> I suspect the problem may be the SiI driver in FreeBSD. As a reference
>>> point, I've had a similar problem, even on 7-STABLE, but with sparc64
>>> hardware (see earlier post in this thread).
>>> It'll probably be simplest for you to just buy another controller of
>>> another brand. On the other hand, it'll be worth knowing exactly what
>>> is wrong with the SiI driver...
>>> Cheers,
>>> Jonathan
>> Aug 13 15:00:00 piglet newsyslog[1800]: logfile turned over due to size>100K
>> Aug 13 15:11:26 piglet su: sebster to root on /dev/ttyp4
>> Aug 13 15:34:55 piglet kernel: mirror/gm1s1e[WRITE(offset=875450693632, length=2048)]error = 6
>> Aug 13 15:34:55 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450695680, length=2048)]error = 6
>>
>> [snip 335750 similar lines]
>>
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450931200, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450933248, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450935296, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450937344, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450939392, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450941440, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450943488, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450945536, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450947584, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450949632, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450951680, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450953728, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450955776, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450957824, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450959872, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450961920, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450963968, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450966016, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450968064, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450970112, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450972160, length=2048)]error = 6
>> Aug 13 15:36:30 piglet kernel: g_vfs_done():mirror/gm1s1e[WRITE(offset=875450974208, length=2048)]error = 6
>> Aug 13 15:42:23 piglet syslogd: kernel boot file is /boot/kernel/kernel
>> Aug 13 15:42:23 piglet kernel: Copyright (c) 1992-2008 The FreeBSD Project.
>>
>> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8 
>> Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     SAMSUNG HD103UJ
>> Serial Number:    S13PJ1BQ606865
>> Firmware Version: 1AA01112
>> User Capacity:    1,000,204,886,016 bytes
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   8
>> ATA Standard is:  ATA-8-ACS revision 3b
>> Local Time is:    Thu Aug 14 11:28:13 2008 CEST
>>
>> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual 
>> for details.
>>
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status:  (0x02) Offline data collection activity
>> was completed without error.
>> Auto Offline Data Collection: Disabled.
>> Self-test execution status:      (   0) The previous self-test routine 
>> completed
>> without error or no self-test has ever
>> been run.
>> Total time to complete Offline
>> data collection: (11811) seconds.
>> Offline data collection
>> capabilities: (0x7b) SMART execute Offline immediate.
>> Auto Offline data collection on/off support.
>> Suspend Offline collection upon new
>> command.
>> Offline surface scan supported.
>> Self-test supported.
>> Conveyance Self-test supported.
>> Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>> power-saving mode.
>> Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>> General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time: (   2) minutes.
>> Extended self-test routine
>> recommended polling time: ( 198) minutes.
>> Conveyance self-test routine
>> recommended polling time: (  21) minutes.
>> SCT capabilities:       (0x003f) SCT Status supported.
>> SCT Feature Control supported.
>> SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>>   3 Spin_Up_Time            0x0007   076   076   011    Pre-fail  Always       -       8010
>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       8
>>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>>   8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10255
>>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       272
>>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       8
>>  13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
>> 183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>> 184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>> 188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>> 190 Airflow_Temperature_Cel 0x0022   057   052   000    Old_age   Always       -       43 (Lifetime Min/Max 43/48)
>> 194 Temperature_Celsius     0x0022   056   050   000    Old_age   Always       -       44 (Lifetime Min/Max 43/50)
>> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       195799724
>> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
>> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
>> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 0
>> Warning: ATA Specification requires self-test log structure revision 
>> number = 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Offline             Completed without error       00%       261         -
>> # 2  Offline             Aborted by host               40%       251         -
>> # 3  Short offline       Aborted by host               00%       250         -
>>
>> SMART Selective Self-Test Log Data Structure Revision Number (0) 
>> should be 1
>> SMART Selective self-test log data structure revision number 0
>> Warning: ATA Specification requires selective self-test log data 
>> structure revision number = 1
>> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>    1        0        0  Not_testing
>>    2        0        0  Not_testing
>>    3        0        0  Not_testing
>>    4        0        0  Not_testing
>>    5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>  After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute 
>> delay.
>>
>> smartctl version 5.38 [i386-portbld-freebsd6.3] Copyright (C) 2002-8 
>> Bruce Allen
>> Home page is http://smartmontools.sourceforge.net/
>>
>> === START OF INFORMATION SECTION ===
>> Device Model:     SAMSUNG HD103UJ
>> Serial Number:    S13PJ1BQ607102
>> Firmware Version: 1AA01112
>> User Capacity:    1,000,204,886,016 bytes
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   8
>> ATA Standard is:  ATA-8-ACS revision 3b
>> Local Time is:    Thu Aug 14 11:28:39 2008 CEST
>>
>> ==> WARNING: May need -F samsung or -F samsung2 enabled; see manual 
>> for details.
>>
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status:  (0x02) Offline data collection activity
>> was completed without error.
>> Auto Offline Data Collection: Disabled.
>> Self-test execution status:      (   0) The previous self-test routine 
>> completed
>> without error or no self-test has ever
>> been run.
>> Total time to complete Offline
>> data collection: (12131) seconds.
>> Offline data collection
>> capabilities: (0x7b) SMART execute Offline immediate.
>> Auto Offline data collection on/off support.
>> Suspend Offline collection upon new
>> command.
>> Offline surface scan supported.
>> Self-test supported.
>> Conveyance Self-test supported.
>> Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>> power-saving mode.
>> Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>> General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time: (   2) minutes.
>> Extended self-test routine
>> recommended polling time: ( 203) minutes.
>> Conveyance self-test routine
>> recommended polling time: (  22) minutes.
>> SCT capabilities:       (0x003f) SCT Status supported.
>> SCT Feature Control supported.
>> SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 16
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
>>   3 Spin_Up_Time            0x0007   077   077   011    Pre-fail  Always       -       7810
>>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
>>   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>>   7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
>>   8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       9978
>>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       272
>>  10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
>>  11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
>>  13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
>> 183 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>> 184 Unknown_Attribute       0x0033   100   100   099    Pre-fail  Always       -       0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>> 188 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>> 190 Airflow_Temperature_Cel 0x0022   059   054   000    Old_age   Always       -       41 (Lifetime Min/Max 41/46)
>> 194 Temperature_Celsius     0x0022   058   053   000    Old_age   Always       -       42 (Lifetime Min/Max 41/47)
>> 195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       31616
>> 196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>> 198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
>> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
>> 200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
>> 201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0
>>
>> SMART Error Log Version: 1
>> No Errors Logged
>>
>> SMART Self-test log structure revision number 0
>> Warning: ATA Specification requires self-test log structure revision 
>> number = 1
>> Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
>> # 1  Offline             Completed without error       00%       261         -
>> # 2  Offline             Aborted by host               40%       251         -
>> # 3  Short offline       Aborted by host               00%       250         -
>>
>> SMART Selective Self-Test Log Data Structure Revision Number (0) 
>> should be 1
>> SMART Selective self-test log data structure revision number 0
>> Warning: ATA Specification requires selective self-test log data 
>> structure revision number = 1
>> SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>    1        0        0  Not_testing
>>    2        0        0  Not_testing
>>    3        0        0  Not_testing
>>    4        0        0  Not_testing
>>    5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>  After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute 
>> delay.
>>
> 
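
The flood of g_vfs_done() write errors in the quoted kernel log can be 
condensed into a short summary. On FreeBSD, errno 6 is ENXIO ("Device not 
configured"), which is consistent with the underlying provider having 
disappeared. The following is a rough Python sketch only: the function name 
summarize_io_errors is hypothetical, and it assumes the wrapped log lines 
have first been joined back into one line per message.

```python
import re
from collections import Counter

# Matches e.g. "g_vfs_done():mirror/gm1s1e[WRITE(offset=1024, length=2048)]error = 6"
# The "g_vfs_done():" prefix is optional because some log lines lack it.
PATTERN = re.compile(
    r"(?:g_vfs_done\(\):)?(?P<provider>[\w/.]+)"
    r"\[(?P<op>READ|WRITE)\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<errno>\d+)"
)


def summarize_io_errors(lines):
    """Count errors per (provider, operation, errno) and find the offset range.

    Returns (counts, (lowest_offset, highest_offset)); the range is None
    when no line matches.
    """
    counts = Counter()
    offsets = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not an I/O error line (su, newsyslog, boot messages, ...)
        key = (match["provider"], match["op"], int(match["errno"]))
        counts[key] += 1
        offsets.append(int(match["offset"]))
    if not offsets:
        return counts, None
    return counts, (min(offsets), max(offsets))
```

Reading the log file and printing the result gives one line per failing 
provider instead of hundreds of thousands, which makes it easier to see that 
every error hits the same mirror partition with the same errno.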