adding BBU relearn support to mfiutil

David Gwynne david at gwynne.id.au
Thu Nov 7 02:02:16 UTC 2013


On 7 Nov 2013, at 9:03 am, Mark Johnston <markj at FreeBSD.org> wrote:

> On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
>> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4 
>> (we extracted r250483 and r250497 from stable/8 and applied to 
>> releng/8.4).  I'm seeing some results that make me question whether or 
>> not caching is really working correctly after a BBU relearn operation 
>> has completed -- or maybe whether or not the new BBU patch is talking to 
>> LSI controller properly.
>> 
>> Our test system had a BBU in the failed state (relearn needed).  We used 
>> the "start learn" command and it seemed to go well, but strangely, now 
>> that the process appears to have completed (several days later), the 
>> status is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery").  
>> This may be entirely normal -- maybe it says that because the autolearn 
>> feature is now enabled?
> 
> I suspect that the status is bogus and that the battery is in fact dead.
> There seem to be a few firmware bugs in the BBU status reporting, at
> least with iBBU07. In your output below, I see:
> 
>        Design Capacity: 1215 mAh
>   Full Charge Capacity: 65262 mAh
>       Current Capacity: 61543 mAh
> 
> which clearly isn't right. I've seen this problem before as well: over
> time, the full charge capacity decreases, and eventually it seems to
> wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports
> exactly the same thing, so it's a problem with the controller firmware.
> If you look at MegaCli output you get things like "Absolute charge: 6000%".
> So I suspect that the status is incorrect as well; when I've run into
> this problem, I still see "status: normal".
> 

i've been staring at bbus on dell perc5s and perc6s recently after we had a bunch of bbus get too old.

i haven't seen the full charge or current capacity values wrap, but what i did figure out is that the write cache won't be enabled if the SOH (state of health) flag is set in what's reported by the BBU STATE response. the SOH flag seems to be based either on whether the firmware thinks the battery will last a reasonable amount of time (like 72h or something), or on whether the bbu's full charge capacity is above 30% of its design capacity.

either way, the reality is that batteries degrade and need to be replaced. the nearly four-year-old battery that has gone through 120 charge cycles in your output below is what i call a good candidate for replacement.

later megaraid firmwares (well, firmwares on later megaraids) have more status bits that clearly indicate whether the firmware wants you to replace the battery. it takes an annoying amount of interpretation on the older ones.

dlg

>> 
>> The output of the "cache" status command also seems a bit strange. Here 
>> is the raw output of these status commands:
>> 
>> # mfiutil cache mfid0
>> mfi0 volume mfid0 cache settings:
>>              I/O caching: disabled
>>            write caching: write-back
>> write cache with bad BBU: disabled
>>               read ahead: adaptive
>>        drive write cache: enabled
>> Cache disabled due to dead battery or ongoing battery relearn
>> 
>> 
>> # ./mfiutil show battery
>> mfi0: Battery State:
>>      Manufacture Date: 3/18/2010
>>         Serial Number: 77
>>          Manufacturer: LS1111001A
>>                 Model: 3598501
>>             Chemistry: LION
>>       Design Capacity: 1215 mAh
>>  Full Charge Capacity: 65262 mAh
>>      Current Capacity: 61543 mAh
>>         Charge Cycles: 120
>>        Current Charge: 94%
>>        Design Voltage: 3700 mV
>>       Current Voltage: 4081 mV
>>           Temperature: 23 C
>>      Autolearn period: 30 days
>>       Next learn time: Tue Nov 26 20:06:40 2013
>>  Learn delay interval: 0 hours
>>        Autolearn mode: enabled
>>                Status: LEARN_CYCLE_REQUESTED
>> 
>> 
>> Why does the cache status now say "Cache disabled due to dead battery or 
>> ongoing battery relearn"?  Shouldn't this no longer be the case, since 
>> I've run the "learn" operation?  Does this indicate that I/O caching 
>> is really disabled?
> 
> I believe so. You can try changing the write caching policy to write-back
> with bad BBU and see if that re-enables the cache. If it does, that's
> more evidence that the BBU is dead and needs to be replaced.
> 
>> 
>> I'd appreciate any and all assistance.  Here's a bit of other info that 
>> might be of interest:
>> 
>> # mfiutil show adapter
>> mfi0 Adapter:
>>     Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
>>    Serial Number:
>>         Firmware: 11.0.1-0036
>>      RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
>>   Battery Backup: present
>>            NVRAM: 32K
>>   Onboard Memory: 512M
>>   Minimum Stripe: 8k
>>   Maximum Stripe: 1M
>> 
>> # mfiutil show drives
>> mfi0 Physical Drives:
>>  1 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005JE> SAS E1:S0
>>  2 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005JV> SAS E1:S1
>>  3 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005KD> SAS E1:S4
>>  4 (  136G) ONLINE    <SEAGATE ST9146852SS 0005 serial=6TB005BQ> SAS E1:S2
>>  5 (  136G) HOT SPARE <SEAGATE ST9146852SS 0005 serial=6TB005FJ> SAS E1:S3
>> 
>> The storage volume is a 4-drive RAID10.  The system has 16GB RAM and dual 
>> Xeon E5530 CPUs, on an Intel S5520UR motherboard.
> 
> It might be useful to check the output of "mfiutil show events -c info".
> 
>> 
>> Thanks!
>> 
>> Charles Owens
>> Great Bay Software
>> 
>> 
>> 
>> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
>>> 
>>> On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
>>>> 
>>>> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
>>>>> 
>>>>> Hi Everyone,
>>>>> 
>>>>> I recently needed to add a couple of features to mfiutil related to BBU
>>>>> relearning. I've pasted a patch below which
>>>>> 
>>>>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
>>>>> properties. This is essentially the output of
>>>>> 
>>>>> # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
>>>>> 
>>>>> and consists of info about battery learning: the learn period, the
>>>>> time at which the controller will start the next relearn, and the BBU
>>>>> mode (which indicates whether the battery supports transparent
>>>>> relearning).
>>>>> 
>>>>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
>>>>> the same BBU properties that MegaCli can set.
>>>>> 
>>>>> 3. adds a command "mfiutil start learn" which immediately kicks off a
>>>>> battery relearn.
>>>>> 
>>>>> These changes grew out of concern about the fact that the controller
>>>>> write cache is set to write-through mode during a relearn period (which
>>>>> usually lasts for several hours). This ended up causing some mysterious
>>>>> and intermittent performance issues, so I needed a way of getting more
>>>>> info about what was going on (using MegaCli isn't really an option for
>>>>> several reasons). Some BBUs support transparent relearning, which
>>>>> basically means that the controller write cache doesn't get turned off
>>>>> during a relearn. However, LSI's default config doesn't enable it, and
>>>>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
>>>>> 
>>>>> I was hoping someone would be able to review the patch. If anyone is able
>>>>> and willing to test it, I'd very much appreciate their feedback.
>>>>> 
>>>>> Thanks!
>>>>> -Mark
>>>> 
>>>> 
>>>> Just to document this for the record: I finally got around to testing
>>>> this today, with Mark providing updates. It looks good overall, with a
>>>> couple of nits that he is handling at the moment (man page and variable
>>>> name collision).
>>> 
>>> 
>>> The updated patch is here:
>>> http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
>>> 
>>> I'll commit it in a few days if there aren't any problems.
>>> 
>>> Thanks,
>>> -Mark
>>> _______________________________________________
>>> freebsd-scsi at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"
>>> 


