adding BBU relearn support to mfiutil
Mark Johnston
markj at freebsd.org
Wed Nov 6 22:04:14 UTC 2013
On Wed, Nov 06, 2013 at 12:01:55PM -0500, Charles Owens wrote:
> Hi, we've been playing with this patch in the context of 8.4-RELEASE-p4
> (we extracted r250483 and r250497 from stable/8 and applied to
> releng/8.4). I'm seeing some results that make me question whether or
> not caching is really working correctly after a BBU relearn operation
> has completed -- or maybe whether or not the new BBU patch is talking to
> LSI controller properly.
>
> Our test system had a BBU in the failed state (relearn needed). We used
> the "start learn" command and it seemed to go well, but strangely, now
> that the process seems to have completed, and several days later, the status
> is still LEARN_CYCLE_REQUESTED (as seen with "mfiutil show battery").
> This may be entirely normal -- maybe it says that because the autolearn
> feature is now enabled?
I suspect that the status is bogus and that the battery is in fact dead.
There seem to be a few firmware bugs in the BBU status reporting, at
least with iBBU07. In your output below, I see:
Design Capacity: 1215 mAh
Full Charge Capacity: 65262 mAh
Current Capacity: 61543 mAh
which clearly isn't right. I've seen this problem before as well: over
time, the full charge capacity decreases, and eventually it seems to
wrap around to 65535. MegaCli (LSI's binary RAID management tool) reports
exactly the same thing, so it's a problem with the controller firmware.
If you look at MegaCli output you get things like "Absolute charge: 6000%".
So I suspect that the status is incorrect as well; when I've run into
this problem, I still see "status: normal".
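
To make the arithmetic concrete, here is a rough sketch of the sanity check I
have in mind. It assumes the firmware reports capacity as an unsigned 16-bit
value (which the wrap to 65535 suggests, but I have not confirmed it). The
function names and the 2x threshold are made up for illustration and are not
part of mfiutil.

```python
def as_signed16(value: int) -> int:
    """Reinterpret an unsigned 16-bit reading as a signed 16-bit value."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def capacity_looks_wrapped(design_mah: int, full_charge_mah: int) -> bool:
    """Heuristic: a full charge capacity far above design capacity is implausible.

    The factor of 2 is an arbitrary assumption, not a documented limit.
    """
    return full_charge_mah > 2 * design_mah


# Values taken from the "mfiutil show battery" output in this thread.
design, full, current = 1215, 65262, 61543

print(capacity_looks_wrapped(design, full))  # True
print(as_signed16(full))                     # -274
print(round(100 * full / design))            # 5371 (percent of design capacity)
```

A full charge capacity of more than 50 times the design capacity is the same
kind of nonsense as the four-digit percentages MegaCli prints.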
>
> The output of the "cache" status command is also a bit strange. Here is
> the raw output of these status commands:
>
> # mfiutil cache mfid0
> mfi0 volume mfid0 cache settings:
> I/O caching: disabled
> write caching: write-back
> write cache with bad BBU: disabled
> read ahead: adaptive
> drive write cache: enabled
> Cache disabled due to dead battery or ongoing battery relearn
>
>
> # ./mfiutil show battery
> mfi0: Battery State:
> Manufacture Date: 3/18/2010
> Serial Number: 77
> Manufacturer: LS1111001A
> Model: 3598501
> Chemistry: LION
> Design Capacity: 1215 mAh
> Full Charge Capacity: 65262 mAh
> Current Capacity: 61543 mAh
> Charge Cycles: 120
> Current Charge: 94%
> Design Voltage: 3700 mV
> Current Voltage: 4081 mV
> Temperature: 23 C
> Autolearn period: 30 days
> Next learn time: Tue Nov 26 20:06:40 2013
> Learn delay interval: 0 hours
> Autolearn mode: enabled
> Status: LEARN_CYCLE_REQUESTED
>
>
> Why does cache status now say "Cache disabled due to dead battery or
> ongoing battery relearn"? Shouldn't this no longer be the case since
> I've run the "learn" operation? Does this indicate that the I/O caching
> is really disabled?
I believe so. You can try changing the write caching policy to write-back
with bad BBU and see if that re-enables the cache. If it does, that's
more evidence that the BBU is dead and needs to be replaced.
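
As I understand mfiutil(8), the setting in question is changed with something
like "mfiutil cache mfid0 bad-bbu-write-cache enable", and it should only be
left on for the duration of the test. To compare the output of
"mfiutil cache mfid0" before and after, a small sketch along these lines may
help. The helper names are hypothetical, and the parsing assumes the output
format shown in your message.

```python
def parse_cache_settings(text: str) -> dict:
    """Collect 'key: value' lines from 'mfiutil cache <volume>' output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and value:
            settings[key] = value
    return settings


def cache_blocked_by_battery(text: str) -> bool:
    """True if the output contains the dead-battery/relearn notice."""
    return "Cache disabled due to dead battery" in text


# Sample copied from the "mfiutil cache mfid0" output quoted in this thread.
SAMPLE = """\
mfi0 volume mfid0 cache settings:
I/O caching: disabled
write caching: write-back
write cache with bad BBU: disabled
read ahead: adaptive
drive write cache: enabled
Cache disabled due to dead battery or ongoing battery relearn
"""

settings = parse_cache_settings(SAMPLE)
print(settings["write cache with bad BBU"])  # disabled
print(cache_blocked_by_battery(SAMPLE))      # True
```

If the notice goes away once "write cache with bad BBU" reads "enabled", the
controller considers the battery bad regardless of the reported status.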
>
> I'd appreciate any and all assistance. Here's a bit of other info that
> might be of interest:
>
> # mfiutil show adapter
> mfi0 Adapter:
> Product Name: Integrated Intel(R) RAID Controller SROMBSASMP2
> Serial Number:
> Firmware: 11.0.1-0036
> RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
> Battery Backup: present
> NVRAM: 32K
> Onboard Memory: 512M
> Minimum Stripe: 8k
> Maximum Stripe: 1M
>
> # mfiutil show drives
> mfi0 Physical Drives:
> 1 ( 136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005JE> SAS E1:S0
> 2 ( 136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005JV> SAS E1:S1
> 3 ( 136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005KD> SAS E1:S4
> 4 ( 136G) ONLINE <SEAGATE ST9146852SS 0005 serial=6TB005BQ> SAS E1:S2
> 5 ( 136G) HOT SPARE <SEAGATE ST9146852SS 0005 serial=6TB005FJ> SAS E1:S3
>
> The storage volume is a 4-drive RAID10. System has 16GB RAM, dual Xeon
> E5530 CPUs, on an Intel S5520UR motherboard.
It might be useful to check the output of "mfiutil show events -c info".
>
> Thanks!
>
> Charles Owens
> Great Bay Software
>
>
>
> On Fri Apr 5 20:08:09 2013, Mark Johnston wrote:
> >
> > On Fri, Apr 05, 2013 at 02:22:36PM -0700, Sean Bruno wrote:
> >>
> >> On Sun, 2013-03-03 at 22:38 -0500, Mark Johnston wrote:
> >>>
> >>> Hi Everyone,
> >>>
> >>> I recently needed to add a couple of features to mfiutil related to BBU
> >>> relearning. I've pasted a patch below which
> >>>
> >>> 1. adds extra fields to the output of "mfiutil show battery" showing BBU
> >>> properties. This is essentially the output of
> >>>
> >>> # MegaCli -AdpBbuInfo -GetBbuProperties -aLL
> >>>
> >>> and consists of info about battery learning: the learn period, the
> >>> time at which the controller will start the next relearn, and the BBU
> >>> mode (which indicates whether the battery supports transparent
> >>> relearning).
> >>>
> >>> 2. adds a couple of subcommands under "mfiutil bbu" which let users set
> >>> the BBU properties which can be set by MegaCli.
> >>>
> >>> 3. adds a command "mfiutil start learn" which immediately kicks off a
> >>> battery relearn.
> >>>
> >>> These changes grew out of concern about the fact that the controller
> >>> write cache is set to write-through mode during a relearn period (which
> >>> usually lasts for several hours). This ended up causing some mysterious
> >>> and intermittent performance issues, so I needed a way of getting more
> >>> info about what was going on (using MegaCli isn't really an option for
> >>> several reasons). Some BBUs support transparent relearning, which
> >>> basically means that the controller write cache doesn't get turned off
> >>> during a relearn. However, LSI's default config doesn't enable it, and
> >>> now mfiutil can be used to do that (through "mfiutil bbu bbu-mode").
> >>>
> >>> I was hoping someone would be able to review the patch. If anyone's able
> >>> and willing to test it, I'd very much appreciate feedback from that.
> >>>
> >>> Thanks!
> >>> -Mark
> >>
> >>
> >> Just to document for the record. Finally got around to testing this
> >> today with Mark providing updates. Looks good overall with a couple of
> >> nits that he is handling at the moment (man page and variable name
> >> collision).
> >
> >
> > The updated patch is here:
> > http://people.freebsd.org/~markj/patches/20130405-mfi-bbu.diff
> >
> > I'll commit it in a few days if there aren't any problems.
> >
> > Thanks,
> > -Mark
> > _______________________________________________
> > freebsd-scsi at freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"
> >
> >
> >