docs/159897: [patch] improve HAST section of Handbook
Benjamin Kaduk
kaduk at MIT.EDU
Sun Aug 21 02:00:29 UTC 2011
The following reply was made to PR docs/159897; it has been noted by GNATS.
From: Benjamin Kaduk <kaduk at MIT.EDU>
To: Warren Block <wblock at wonkity.com>
Cc: freebsd-gnats-submit at freebsd.org, freebsd-doc at freebsd.org
Subject: Re: docs/159897: [patch] improve HAST section of Handbook
Date: Sat, 20 Aug 2011 21:56:06 -0400 (EDT)
On Thu, 18 Aug 2011, Warren Block wrote:
> FreeBSD lightning 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Aug 17 19:31:39 MDT 2011 root at lightning:/usr/obj/usr/src/sys/LIGHTNING i386
>> Description:
> Edit and polish the HAST section of the Handbook with an eye to conciseness and clarity.
"concision" is three fewer characters :) (though OED has conciseness as
older)
>> How-To-Repeat:
>
>> Fix:
> Apply patch.
>
> Patch attached with submission follows:
>
> --- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig 2011-08-18 15:22:56.000000000 -0600
> +++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml 2011-08-18 16:35:46.000000000 -0600
> @@ -4038,7 +4038,7 @@
> <sect2>
> <title>Synopsis</title>
>
> - <para>High-availability is one of the main requirements in serious
> + <para>High availability is one of the main requirements in serious
> business applications and highly-available storage is a key
> component in such environments. Highly Available STorage, or
> <acronym>HAST<remark role="acronym">Highly Available
> @@ -4109,7 +4109,7 @@
> drives.</para>
> </listitem>
> <listitem>
> - <para>File system agnostic, thus allowing to use any file
> + <para>File system agnostic, thus allowing use of any file
I think "allowing the use" is better here.
> system supported by &os;.</para>
> </listitem>
> <listitem>
> @@ -4152,7 +4152,7 @@
> total.</para>
> </note>
>
> - <para>Since the <acronym>HAST</acronym> works in
> + <para>Since <acronym>HAST</acronym> works in
"in a primary-secondary"
> primary-secondary configuration, it allows only one of the
> cluster nodes to be active at any given time. The
> <literal>primary</literal> node, also called
> @@ -4334,51 +4334,51 @@
> available.</para>
> </note>
>
> - <para>HAST is not responsible for selecting node's role
> - (<literal>primary</literal> or <literal>secondary</literal>).
> - Node's role has to be configured by an administrator or other
> - software like <application>Heartbeat</application> using the
> + <para>A HAST node's role (<literal>primary</literal> or
> + <literal>secondary</literal>) is selected by an administrator
> + or other
> + software like <application>Heartbeat</application> using the
> &man.hastctl.8; utility. Move to the primary node
> (<literal><replaceable>hasta</replaceable></literal>) and
> - issue the following command:</para>
> + issue this command:</para>
>
> <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
>
> - <para>Similarly, run the following command on the secondary node
> + <para>Similarly, run this command on the secondary node
> (<literal><replaceable>hastb</replaceable></literal>):</para>
>
> <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
>
> <caution>
> - <para>It may happen that both of the nodes are not able to
> - communicate with each other and both are configured as
> - primary nodes; the consequence of this condition is called
> - <literal>split-brain</literal>. In order to troubleshoot
> + <para>When the nodes are unable to
> + communicate with each other, and both are configured as
> + primary nodes, the condition is called
> + <literal>split-brain</literal>. To troubleshoot
> this situation, follow the steps described in <xref
> linkend="disks-hast-sb">.</para>
> </caution>
>
> - <para>It is possible to verify the result with the
> + <para>Verify the result with the
> &man.hastctl.8; utility on each node:</para>
>
> <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
>
> - <para>The important text is the <literal>status</literal> line
> - from its output and it should say <literal>complete</literal>
> + <para>The important text is the <literal>status</literal> line,
> + which should say <literal>complete</literal>
> on each of the nodes. If it says <literal>degraded</literal>,
> something went wrong. At this point, the synchronization
> between the nodes has already started. The synchronization
> - completes when the <command>hastctl status</command> command
> + completes when <command>hastctl status</command>
> reports 0 bytes of <literal>dirty</literal> extents.</para>
>
>
> - <para>The last step is to create a filesystem on the
> + <para>The next step is to create a filesystem on the
> <devicename>/dev/hast/<replaceable>test</replaceable></devicename>
> - GEOM provider and mount it. This has to be done on the
> - <literal>primary</literal> node (as the
> + GEOM provider and mount it. This must be done on the
> + <literal>primary</literal> node, as
> <filename>/dev/hast/<replaceable>test</replaceable></filename>
> - appears only on the <literal>primary</literal> node), and
> - it can take a few minutes depending on the size of the hard
> + appears only on the <literal>primary</literal> node.
> + It can take a few minutes depending on the size of the hard
The pronoun "it" may be confusing here -- I would probably just say
"Creating the filesystem".
> drive:</para>
>
> <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
> @@ -4387,9 +4387,9 @@
>
> <para>Once the <acronym>HAST</acronym> framework is configured
> properly, the final step is to make sure that
> - <acronym>HAST</acronym> is started during the system boot time
> - automatically. The following line should be added to the
> - <filename>/etc/rc.conf</filename> file:</para>
> + <acronym>HAST</acronym> is started automatically during the system
> + boot. This line is added to
> + <filename>/etc/rc.conf</filename>:</para>
"This line is added" is a pretty unusual grammatical construct for what it
is attempting to convey.  "To do so, add this line to" I think says
things more clearly.
>
> <programlisting>hastd_enable="YES"</programlisting>
>
> @@ -4397,26 +4397,25 @@
> <title>Failover Configuration</title>
>
> <para>The goal of this example is to build a robust storage
> - system which is resistant from the failures of any given node.
> - The key task here is to remedy a scenario when a
> - <literal>primary</literal> node of the cluster fails. Should
> - it happen, the <literal>secondary</literal> node is there to
> + system which is resistant to failures of any given node.
The plural is not consistent between "failures" and "node".  "resistant to
the failure of any given node" is, I think, the conventional way to say
this (note that the original also had the incorrect plural "failures").
> + The scenario is that a
> + <literal>primary</literal> node of the cluster fails. If
> + this happens, the <literal>secondary</literal> node is there to
> take over seamlessly, check and mount the file system, and
> continue to work without missing a single bit of data.</para>
>
> - <para>In order to accomplish this task, it will be required to
> - utilize another feature available under &os; which provides
> + <para>To accomplish this task, another &os; feature provides
> for automatic failover on the IP layer —
> - <acronym>CARP</acronym>. <acronym>CARP</acronym> stands for
> - Common Address Redundancy Protocol and allows multiple hosts
> + <acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address
> + Redundancy Protocol) allows multiple hosts
> on the same network segment to share an IP address. Set up
> <acronym>CARP</acronym> on both nodes of the cluster according
> to the documentation available in <xref linkend="carp">.
> - After completing this task, each node should have its own
> + After setup, each node will have its own
> <devicename>carp0</devicename> interface with a shared IP
> address <replaceable>172.16.0.254</replaceable>.
> - Obviously, the primary <acronym>HAST</acronym> node of the
> - cluster has to be the master <acronym>CARP</acronym>
> + The primary <acronym>HAST</acronym> node of the
> + cluster must be the master <acronym>CARP</acronym>
> node.</para>
>
> <para>The <acronym>HAST</acronym> pool created in the previous
> @@ -4430,17 +4429,17 @@
>
> <para>In the event of <acronym>CARP</acronym> interfaces going
> up or down, the &os; operating system generates a &man.devd.8;
> - event, which makes it possible to watch for the state changes
> + event, making it possible to watch for the state changes
> on the <acronym>CARP</acronym> interfaces. A state change on
> the <acronym>CARP</acronym> interface is an indication that
> - one of the nodes failed or came back online. In such a case,
> - it is possible to run a particular script which will
> + one of the nodes failed or came back online. These state change
> + events make it possible to run a script which will
> automatically handle the failover.</para>
I think "handle HAST failover" would be an improvement.
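For context, in case it helps the wording here: the script's job is essentially
a two-way switch on the argument devd passes in.  A minimal sketch of that
control flow follows, with hastctl stubbed out so it can be read (and run)
without a HAST setup -- the real carp-hast-switch script in the Handbook does
considerably more (fsck, mount, and so on), so treat this as an illustration
only:

```shell
# Sketch of the control flow behind carp-hast-switch, NOT the actual
# Handbook script.  hastctl is stubbed so the logic is readable anywhere;
# the resource name "test" matches the example in this section.
hastctl() { echo "hastctl $*"; }

carp_hast_switch() {
    case "$1" in
    master)
        # This node became CARP master: promote the HAST resource.
        hastctl role primary test
        ;;
    slave)
        # This node became CARP backup: demote the HAST resource.
        hastctl role secondary test
        ;;
    *)
        echo "usage: carp_hast_switch master|slave" >&2
        return 1
        ;;
    esac
}

carp_hast_switch master   # prints "hastctl role primary test"
```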
>
> - <para>To be able to catch the state changes on the
> - <acronym>CARP</acronym> interfaces, the following
> - configuration has to be added to the
> - <filename>/etc/devd.conf</filename> file on each node:</para>
> + <para>To be able to catch state changes on the
> + <acronym>CARP</acronym> interfaces, add this
> + configuration to
> + <filename>/etc/devd.conf</filename> on each node:</para>
>
> <programlisting>notify 30 {
> match "system" "IFNET";
> @@ -4456,12 +4455,12 @@
> action "/usr/local/sbin/carp-hast-switch slave";
> };</programlisting>
>
> - <para>To put the new configuration into effect, run the
> - following command on both nodes:</para>
> + <para>Restart &man.devd.8; on both nodes o put the new configuration
"to"
> + into effect:</para>
>
> <screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
>
> - <para>In the event that the <devicename>carp0</devicename>
> + <para>When the <devicename>carp0</devicename>
> interface goes up or down (i.e. the interface state changes),
> the system generates a notification, allowing the &man.devd.8;
> subsystem to run an arbitrary script, in this case
> @@ -4615,41 +4614,40 @@
> <sect3>
> <title>General Troubleshooting Tips</title>
>
> - <para><acronym>HAST</acronym> should be generally working
> - without any issues, however as with any other software
> + <para><acronym>HAST</acronym> should generally work
> + without issues. However, as with any other software
> product, there may be times when it does not work as
> supposed. The sources of the problems may be different, but
> the rule of thumb is to ensure that the time is synchronized
> between all nodes of the cluster.</para>
>
> - <para>The debugging level of the &man.hastd.8; should be
> - increased when troubleshooting <acronym>HAST</acronym>
> - problems. This can be accomplished by starting the
> + <para>When troubleshooting <acronym>HAST</acronym> problems,
> + the debugging level of &man.hastd.8; should be increased
> + by starting the
> &man.hastd.8; daemon with the <literal>-d</literal>
> - argument. Note, that this argument may be specified
> + argument. Note that this argument may be specified
> multiple times to further increase the debugging level. A
> - lot of useful information may be obtained this way. It
> - should be also considered to use <literal>-F</literal>
> - argument, which will start the &man.hastd.8; daemon in
> + lot of useful information may be obtained this way. Consider
> + also using the <literal>-F</literal>
> + argument, which starts the &man.hastd.8; daemon in the
> foreground.</para>
> </sect3>
>
> <sect3 id="disks-hast-sb">
> <title>Recovering from the Split-brain Condition</title>
>
> - <para>The consequence of a situation when both nodes of the
> - cluster are not able to communicate with each other and both
> - are configured as primary nodes is called
> - <literal>split-brain</literal>. This is a dangerous
> + <para><literal>Split-brain</literal> is when the nodes of the
> + cluster are unable to communicate with each other, and both
> + are configured as primary. This is a dangerous
> condition because it allows both nodes to make incompatible
> - changes to the data. This situation has to be handled by
> - the system administrator manually.</para>
> + changes to the data. This problem must be corrected
> + manually by the system administrator.</para>
>
> - <para>In order to fix this situation the administrator has to
> + <para>The administrator must
> decide which node has more important changes (or merge them
> - manually) and let the <acronym>HAST</acronym> perform
> + manually) and let <acronym>HAST</acronym> perform
> the full synchronization of the node which has the broken
Just "full synchronization", I think.
Thanks for spotting these grammar rough edges and putting together a
patch!
-Ben Kaduk
> - data. To do this, issue the following commands on the node
> + data. To do this, issue these commands on the node
> which needs to be resynchronized:</para>
>
> <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
>
>
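One more thought while I'm here (not a comment on the patch text itself):
since the <screen> block above is truncated in this quoting, readers of the
PR may find a sketch of the recovery sequence useful.  I am going from
memory, so the exact subcommands are an assumption to be checked against
hastctl(8); hastctl is stubbed below so the sequence can be read and run
outside a HAST setup:

```shell
# Hedged sketch of split-brain recovery, run on the node whose data is to
# be discarded.  hastctl is stubbed; "test" is the resource name used in
# this section.  Verify the subcommands against hastctl(8) before use.
hastctl() { echo "hastctl $*"; }

recover_split_brain() {
    hastctl role init test       # drop the node's current role and state
    hastctl create test          # re-initialize the local HAST metadata
    hastctl role secondary test  # rejoin as secondary; full resync follows
}

recover_split_brain
```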