[RFC] [patch] periodic status-zfs: list pools in daily emails
Jeremy Chadwick
freebsd at jdc.parodius.com
Wed Jun 29 11:19:18 UTC 2011
On Wed, Jun 29, 2011 at 06:37:32AM -0400, Glen Barber wrote:
> Hi Alexander,
>
> On 6/29/11 4:46 AM, Alexander Leidinger wrote:
> >> I added a default behavior to list the pools on the system, in addition
> >> to checking if the pool is healthy. I think it might be useful for others
> >> to have this as the default behavior, for example on systems where dedup
> >> is enabled to track the dedup statistics over time.
> >
> > I do not think this is a bad idea to be able to see the pools... but
> > IMHO it should be configurable (no strong opinion about "enabled or
> > disabled by default").
> >
>
> Agreed. I can add this in.
>
> >> The output of the script after my changes follows:
> >
> > Info to others: this is the default output, there is no special option
> > to track DEDUP.
> >
> >> Checking status of zfs pools:
> >> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> >> zroot    456G   147G   309G    32%  1.00x  ONLINE  -
> >> zstore   928G   258G   670G    27%  1.00x  ONLINE  -
> >> all pools are healthy
> >>
> >> Feedback would be appreciated. A diff is attached.
> >
> > Did you test it with an unhealthy pool? If yes, what does the result look
> > like?
> >
>
> I have not, yet. I can do this later today by breaking a mirror.
>
> > For the healthy case we have redundant info (but as the brain is good at
> > pattern matching, I would object to replacing the status with the list
> > output, in case someone suggests this). In the unhealthy case we
> > will surely have more info; my question is whether an empty line
> > between the list and the status would make it more readable.
> >
>
> I will reply later today with the output of the script with an unhealthy
> pool, and will make listing the pools configurable. I imagine an empty
> line would certainly make it more readable in either case. I would be
> reluctant to replace 'status' output with 'list' output for healthy pools,
> mostly to avoid headaches for people parsing their daily email and
> specifically looking for (or missing) 'all pools are healthy.'
At my workplace we use a heavily modified version of Netsaint, with some
Nagios-like bits and pieces added. I happened to write the Perl code used
to monitor our production Solaris systems (2000+ servers) for ZFS pool
status. It parses "zpool status -x" output, monitoring read, write, and
checksum errors per pool, vdev, and device, in addition to general pool
status. I had to test a great many conditions, not to mention deal with
parsing pains caused by ZFS code changes, plus support
completely different revisions of Solaris 10 in production. And before
someone asks: no, I cannot provide the source (employee agreements, LCA,
etc.). I also had to dig through ZFS source code to figure out a
bunch of necessary bits, so don't be surprised if you have to as well.
My recommendation: just look for pools which are in any state other than
ONLINE (don't try to be smart with an OR regex looking for all the
combos; it doesn't scale when ZFS changes), and you should also handle
situations where a device is currently undergoing manual or automatic
device replacement (specifically regex '^[\t\s]+replacing\s+DEGRADED'),
which will be important to people who keep spares in pools. This might
be difficult with just standard BSD sh, but BSD awk should be able to
handle this.
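A minimal sketch of that approach in sh plus awk follows. It is
illustrative only and is not the Perl code mentioned above. The function
names unhealthy_pools and replacing_pools are made up for this example,
and both read command output on stdin so they can be tried without a live
pool. The "zpool status" layout assumed here (a "pool:" line, then a
config table with the vdev name in the first column and its state in the
second) can differ between ZFS versions, so check it against your own
output. Newer ZFS versions name the vdev "replacing-N", so the match
accepts both forms.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; names are not from any existing script.

# Input: output of "zpool list -H -o name,health" (one pool per line).
# Prints every pool whose health is anything other than ONLINE, so new
# states added by future ZFS versions are caught without an OR regex.
unhealthy_pools() {
    awk '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Input: output of "zpool status".
# Remembers the current pool from each "pool:" line and prints that pool
# name for every "replacing" (or "replacing-N") vdev in state DEGRADED.
replacing_pools() {
    awk '
        $1 == "pool:" { pool = $2 }
        $1 ~ /^replacing(-[0-9]+)?$/ && $2 == "DEGRADED" { print pool }
    '
}
```

Feed the first function from "zpool list -H -o name,health" and the second
from "zpool status". Matching on awk fields instead of a leading-whitespace
regex avoids depending on how deeply a given ZFS version indents the vdev.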
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |