RFC: GEOM and hard disk LEDs

From: Alan Somers <asomers_at_freebsd.org>
Date: Tue, 07 Feb 2023 17:06:15 UTC

Most modern SES backplanes have two LEDs per hard disk.  There's a
"fault" LED and a "locate" LED.  You can control either one with
sesutil(8) or, with a little more work, sg_ses from
sysutils/sg3_utils.  They're very handy for tasks like replacing a
failed disk, especially in large enclosures.  However, there isn't any
way to automatically control them.  It would be very convenient if,
for example, zfsd(8) could do it.  Basically, it would just set the
fault LED for any disk that has been kicked out of a ZFS pool, and
clear it for any disk that is healthy or is being resilvered.  But
zfsd does not do that.  Today, users' only options are to write a
custom daemon or to use sesutil by hand.  Rather than forcing all of us
to write our own custom daemons, why not teach zfsd to do it?
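
To make the intended policy concrete, here is a minimal sketch in Python.
It is illustration only: zfsd contains nothing like it today, the function
name is hypothetical, and which vdev states count as "kicked out" is my
assumption (FAULTED, REMOVED, and UNAVAIL, as printed by zpool status).

```python
# Hypothetical policy helper; not part of zfsd.
# ASSUMPTION: these zpool-status states mean "kicked out of the pool".
KICKED_OUT_STATES = {"FAULTED", "REMOVED", "UNAVAIL"}


def want_fault_led(state: str, resilvering: bool = False) -> bool:
    """Return True if the disk's fault LED should be lit."""
    # A disk that is being resilvered is on its way back to health,
    # so its fault LED should be clear.
    if resilvering:
        return False
    # Otherwise light the LED only for disks that were kicked out.
    return state in KICKED_OUT_STATES
```

With this policy a FAULTED disk gets its LED lit, while an ONLINE disk, or
any disk that is currently resilvering, gets it cleared.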

My proposal is to add boolean GEOM attributes for "fault" and
"locate".  A userspace program would be able to look up their values
for any geom with DIOCGATTR.  Setting them would require a new ioctl
(DIOCSATTR?).  The disk class would issue an ENCIOC_SETELMSTAT to
actually change the LEDs whenever this attribute changes.  GEOM
transforms such as geli would simply pass the attribute through to
lower layers.  Many-to-one transforms like gmultipath would pass the
attribute through to all lower layers.  zfsd could then set all vdevs'
fault attributes when it starts up, and adjust individual disks' as
appropriate on an event-driven basis.
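
The pass-through behavior described above can be modeled with a toy
userspace simulation.  This is a sketch under stated assumptions, not
kernel code: the Geom and Disk classes are hypothetical stand-ins for
GEOM nodes, and the led_requests list merely records where the disk
class would issue ENCIOC_SETELMSTAT.  I also assume that an attribute
that has never been set counts as "unknown", so the first set always
reaches the enclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Geom:
    """Toy stand-in for a GEOM transform (geli, gmultipath, ...)."""
    name: str
    lower: List["Geom"] = field(default_factory=list)   # geoms beneath this one
    attrs: Dict[str, bool] = field(default_factory=dict)

    def set_attr(self, attr: str, value: bool) -> None:
        # Record the value locally, then pass it through to every
        # lower layer (one for geli, several for gmultipath).
        self.attrs[attr] = value
        for g in self.lower:
            g.set_attr(attr, value)

    def get_attr(self, attr: str) -> bool:
        # ASSUMPTION: an attribute that was never set reads as False.
        return self.attrs.get(attr, False)


@dataclass
class Disk(Geom):
    """Bottom of the stack; records the LED changes it would request."""
    led_requests: List[Tuple[str, bool]] = field(default_factory=list)

    def set_attr(self, attr: str, value: bool) -> None:
        # Only act when the attribute actually changes (or is first set).
        changed = attr not in self.attrs or self.attrs[attr] != value
        super().set_attr(attr, value)
        if changed:
            # The real disk class would issue ENCIOC_SETELMSTAT here.
            self.led_requests.append((attr, value))


# Two paths to the same physical disk, under gmultipath, under geli.
da0, da1 = Disk("da0"), Disk("da1")
mp = Geom("multipath/m0", lower=[da0, da1])
eli = Geom("multipath/m0.eli", lower=[mp])

eli.set_attr("fault", True)
eli.set_attr("fault", True)   # no change, so no second LED request
```

Setting the attribute once at the top of the stack reaches both
underlying paths, and repeating the same value generates no further
enclosure traffic.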

Questions:

* Are there any obvious flaws in this plan, any reasons why GEOM
attributes can't be used this way?

* For one-to-many transforms like gpart the correct behavior is less
clear: what if a disk has two partitions in two different pools, and
one of them is healthy but the other isn't?

* Besides ZFS, are there any other systems that could take advantage?

* SATA enclosures use SGPIO instead of SES.  SGPIO is too limited,
IMHO, to be of much use at all.  I suggest not even trying to
make it work with this scheme.
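
To make the gpart question above concrete, here is a toy comparison of
two candidate policies for a disk whose partitions disagree.  Neither is
a recommendation; the function name and the "any"/"all" policy labels are
hypothetical and exist only for this sketch.

```python
from typing import Dict


def disk_fault(partition_faults: Dict[str, bool], policy: str = "any") -> bool:
    """Combine per-partition fault attributes into one disk-level value."""
    values = list(partition_faults.values())
    if not values:
        # No partition has expressed an opinion; leave the LED off.
        return False
    if policy == "any":
        # Light the LED if at least one partition is faulted.
        return any(values)
    if policy == "all":
        # Light the LED only if every partition is faulted.
        return all(values)
    raise ValueError(f"unknown policy: {policy}")


# One disk, two partitions in two different pools, only one of them faulted.
faults = {"da0p1": False, "da0p2": True}
lit_under_any = disk_fault(faults, "any")
lit_under_all = disk_fault(faults, "all")
```

Under "any" the LED lights even though one pool still considers the disk
healthy; under "all" it stays dark even though one pool has kicked the
disk out.  That disagreement is exactly the ambiguity in question.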