cvs commit: src/etc Makefile sensorsd.conf src/etc/defaults
rc.conf src/etc/rc.d Makefile sensorsd src/lib/libc/gen
sysctl.3 src/sbin/sysctl sysctl.8 sysctl.c src/share/man/man5
rc.conf.5 src/share/man/man9 Makefile sensor_attach.9 src/sys/conf f
John Baldwin
jhb at freebsd.org
Wed Oct 17 06:42:50 PDT 2007
On Tuesday 16 October 2007 06:14:34 pm Constantine A. Murenin wrote:
> On 16/10/2007 17:01, John Baldwin wrote:
>
> > On Monday 15 October 2007 10:57:48 pm Constantine A. Murenin wrote:
> >
> >>On 15/10/2007, John Baldwin <jhb at freebsd.org> wrote:
> >>
> >>>On Monday 15 October 2007 09:43:21 am Alexander Leidinger wrote:
> >>>
> >>>>Quoting Scott Long <scottl at samsco.org> (from Mon, 15 Oct 2007
> >>>
> >>>01:47:59 -0600):
> >>>
> >>>>>Alexander Leidinger wrote:
> >>>>>
> >>>>>>Quoting Poul-Henning Kamp <phk at phk.freebsd.dk> (from Sun, 14 Oct
> >>>>>>2007 17:54:21 +0000):
> >>>>
> >>>>>>>listen to the various mumblings about putting RAID-controller status
> >>>>>>>under sensors framework.
> >>>>>>
> >>>>>>What's wrong with this? Currently each RAID driver has to come up
> >>>>>>with its own way of displaying the RAID status. It's like saying
> >>>>>>that each network driver has to implement/display the stuff you can
> >>>>>> see with ifconfig in its own way, instead of using the proper
> >>>>>>network driver interface for this.
> >>>>>>
> >>>>>
> >>>>>For the love of God, please don't use RAID as an example to support your
> >>>>>argument for the sensorsd framework. Representing RAID state is several
> >>>>>orders of magnitude more involved than representing network state.
> >>>>>There are also landmines in the OpenBSD bits of RAID support that are
> >>>>>best left out of FreeBSD, unless you like alienating vendors and risking
> >>>>>legal action. Leave it alone. Please. I don't care what you do with
> >>>>>lmsensors or cpu power settings or whatever. Leave RAID out of it.
> >>>>
> >>>>Talking about RAID status is not talking about alienating vendors. I
> >>>>don't talk about alienating vendors and I don't intend to. You may
> >>>>not be able to display a full-blown RAID status with the sensors
> >>>>framework, but it allows for a generic "works/works not" or
> >>>>"OK/degraded" status display in drivers we have the source for. This
> >>>>is enough for status monitoring (e.g., nagios).
> >>>
> >>>As I mentioned in the thread on arch@ where people brought up objections that
> >>>were apparently completely ignored, this is far from useful for RAID
> >>>monitoring. For example, if my RAID is down, which disk do I need to
> >>>replace? Again, all this was covered earlier and (apparently) ignored.
> >>>Also, what strikes me as odd is that I didn't see this patch posted again for
> >>>review this time around before it was committed.
> >>
> >>This was addressed back in July. You'd use bioctl to see which
> >>exact disc needs to be replaced. Sensorsd is intended for an initial
> >>alert about something being wrong.
> >
> >
> > In July you actually said you weren't sure about bioctl(8). :) But also, this
> > model really isn't sufficient, since it doesn't handle things like drives
> > going away, etc. You really need to maintain a decent amount of state to
> > keep track of all that, and this is far easier done in userland than in the
> > kernel. However, you can ignore real-world experience if you choose.
> >
> > Basically, with so little data in hw.sensors, if I had to write a RAID
> > monitoring daemon I just wouldn't use hw.sensors, since it's easier for me to
> > figure out the simple status myself based on the other state I already have
> > to track (unless you write an event-driven daemon based on messages posted by
> > the firmware, in which case you wouldn't use hw.sensors for that either).
>
> There is no other daemon that you'd need; you'd simply use sensorsd for
> this. You could write a script that would be executed by sensorsd if a
> certain logical disc drive sensor changes state, and this script
> would then call the bio framework and give you additional details on why
> the state changed.
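For concreteness, the hook described above could be sketched as a sensorsd.conf(5) entry along these lines. The controller name (mfi0), sensor index, and script path are hypothetical, and the exact syntax should be checked against the sensorsd.conf(5) manual page:

```
# Hypothetical sensorsd.conf entry: when the logical-drive sensor on a
# made-up controller changes state, run a script that can then query
# the bio framework (bioctl) for details on which disc is at fault.
hw.sensors.mfi0.drive0:command=/etc/sensorsd/raid-alert.sh
```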
That's actually not quite good enough: for example, I want to keep yelling
about a busted volume on a periodic basis until it's fixed. Also, having a volume
change state doesn't tell me if a drive was pulled. On at least one RAID
controller firmware I am familiar with, the only way you can figure this out is
to keep track of which drives are currently present with a generation count and
use that to determine when a drive goes away. Even my monitoring daemon for
ata-raid has to do this, since the ata(4) driver just detaches and removes a drive
when it fails, and you have no way to figure out which drive died because the
kernel thinks that drive no longer exists.
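The generation-count scheme described above can be sketched in a few lines. This is an illustrative model only, not any particular daemon's code; the drive names (da0, da1, ...) and the polling source are hypothetical:

```python
# Sketch of generation-count tracking of RAID member drives: stamp every
# drive seen on the current poll with the current generation; any drive
# still carrying an older stamp has disappeared since the last poll.

def update_generation(seen, drives, gen):
    """Mark currently-present drives with generation 'gen' and return
    the list of drives that vanished since the previous poll."""
    for d in drives:
        seen[d] = gen                      # present on this poll
    # a drive stamped with an older generation is gone
    removed = [d for d, g in seen.items() if g < gen]
    for d in removed:
        del seen[d]                        # forget it so we alert once
    return removed

# usage sketch: da1 is pulled between the first and second poll
seen = {}
update_generation(seen, ["da0", "da1", "da2"], gen=1)   # first poll
gone = update_generation(seen, ["da0", "da2"], gen=2)   # gone == ["da1"]
```

A real daemon would refresh the drive list from the controller on each poll and raise (and keep re-raising) an alert for every entry in the removed list.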
--
John Baldwin