docs/159897: [patch] improve HAST section of Handbook
Benjamin Kaduk
kaduk at MIT.EDU
Sun Aug 21 01:56:10 UTC 2011
On Thu, 18 Aug 2011, Warren Block wrote:
> FreeBSD lightning 8.2-STABLE FreeBSD 8.2-STABLE #0: Wed Aug 17 19:31:39 MDT 2011 root at lightning:/usr/obj/usr/src/sys/LIGHTNING i386
>> Description:
> Edit and polish the HAST section of the Handbook with an eye to conciseness and clarity.
"concision" is three fewer characters :) (though OED has conciseness as
older)
>> How-To-Repeat:
>
>> Fix:
> Apply patch.
>
> Patch attached with submission follows:
>
> --- en_US.ISO8859-1/books/handbook/disks/chapter.sgml.orig 2011-08-18 15:22:56.000000000 -0600
> +++ en_US.ISO8859-1/books/handbook/disks/chapter.sgml 2011-08-18 16:35:46.000000000 -0600
> @@ -4038,7 +4038,7 @@
> <sect2>
> <title>Synopsis</title>
>
> - <para>High-availability is one of the main requirements in serious
> + <para>High availability is one of the main requirements in serious
> business applications and highly-available storage is a key
> component in such environments. Highly Available STorage, or
> <acronym>HAST<remark role="acronym">Highly Available
> @@ -4109,7 +4109,7 @@
> drives.</para>
> </listitem>
> <listitem>
> - <para>File system agnostic, thus allowing to use any file
> + <para>File system agnostic, thus allowing use of any file
I think "allowing the use" is better here.
> system supported by &os;.</para>
> </listitem>
> <listitem>
> @@ -4152,7 +4152,7 @@
> total.</para>
> </note>
>
> - <para>Since the <acronym>HAST</acronym> works in
> + <para>Since <acronym>HAST</acronym> works in
"in a primary-secondary"
> primary-secondary configuration, it allows only one of the
> cluster nodes to be active at any given time. The
> <literal>primary</literal> node, also called
> @@ -4334,51 +4334,51 @@
> available.</para>
> </note>
>
> - <para>HAST is not responsible for selecting node's role
> - (<literal>primary</literal> or <literal>secondary</literal>).
> - Node's role has to be configured by an administrator or other
> - software like <application>Heartbeat</application> using the
> + <para>A HAST node's role (<literal>primary</literal> or
> + <literal>secondary</literal>) is selected by an administrator
> + or other
> + software like <application>Heartbeat</application> using the
> &man.hastctl.8; utility. Move to the primary node
> (<literal><replaceable>hasta</replaceable></literal>) and
> - issue the following command:</para>
> + issue this command:</para>
>
> <screen>&prompt.root; <userinput>hastctl role primary test</userinput></screen>
>
> - <para>Similarly, run the following command on the secondary node
> + <para>Similarly, run this command on the secondary node
> (<literal><replaceable>hastb</replaceable></literal>):</para>
>
> <screen>&prompt.root; <userinput>hastctl role secondary test</userinput></screen>
>
> <caution>
> - <para>It may happen that both of the nodes are not able to
> - communicate with each other and both are configured as
> - primary nodes; the consequence of this condition is called
> - <literal>split-brain</literal>. In order to troubleshoot
> + <para>When the nodes are unable to
> + communicate with each other, and both are configured as
> + primary nodes, the condition is called
> + <literal>split-brain</literal>. To troubleshoot
> this situation, follow the steps described in <xref
> linkend="disks-hast-sb">.</para>
> </caution>
>
> - <para>It is possible to verify the result with the
> + <para>Verify the result with the
> &man.hastctl.8; utility on each node:</para>
>
> <screen>&prompt.root; <userinput>hastctl status test</userinput></screen>
>
> - <para>The important text is the <literal>status</literal> line
> - from its output and it should say <literal>complete</literal>
> + <para>The important text is the <literal>status</literal> line,
> + which should say <literal>complete</literal>
> on each of the nodes. If it says <literal>degraded</literal>,
> something went wrong. At this point, the synchronization
> between the nodes has already started. The synchronization
> - completes when the <command>hastctl status</command> command
> + completes when <command>hastctl status</command>
> reports 0 bytes of <literal>dirty</literal> extents.</para>
>
>
> - <para>The last step is to create a filesystem on the
> + <para>The next step is to create a filesystem on the
> <devicename>/dev/hast/<replaceable>test</replaceable></devicename>
> - GEOM provider and mount it. This has to be done on the
> - <literal>primary</literal> node (as the
> + GEOM provider and mount it. This must be done on the
> + <literal>primary</literal> node, as
> <filename>/dev/hast/<replaceable>test</replaceable></filename>
> - appears only on the <literal>primary</literal> node), and
> - it can take a few minutes depending on the size of the hard
> + appears only on the <literal>primary</literal> node.
> + It can take a few minutes depending on the size of the hard
The pronoun "it" may be confusing here; I would probably just say
"Creating the filesystem".
> drive:</para>
>
> <screen>&prompt.root; <userinput>newfs -U /dev/hast/test</userinput>
> @@ -4387,9 +4387,9 @@
>
> <para>Once the <acronym>HAST</acronym> framework is configured
> properly, the final step is to make sure that
> - <acronym>HAST</acronym> is started during the system boot time
> - automatically. The following line should be added to the
> - <filename>/etc/rc.conf</filename> file:</para>
> + <acronym>HAST</acronym> is started automatically during the system
> + boot. This line is added to
> + <filename>/etc/rc.conf</filename>:</para>
"This line is added" is a pretty unusual grammatical construct for what is
being conveyed. I think "To do so, add this line to" says
things more clearly.
>
> <programlisting>hastd_enable="YES"</programlisting>
>
> @@ -4397,26 +4397,25 @@
> <title>Failover Configuration</title>
>
> <para>The goal of this example is to build a robust storage
> - system which is resistant from the failures of any given node.
> - The key task here is to remedy a scenario when a
> - <literal>primary</literal> node of the cluster fails. Should
> - it happen, the <literal>secondary</literal> node is there to
> + system which is resistant to failures of any given node.
The plural is not consistent between "failures" and "node". I think
"resistant to the failure of any given node" is the conventional way to say
this (note that the original also had the incorrect plural "failures").
> + The scenario is that a
> + <literal>primary</literal> node of the cluster fails. If
> + this happens, the <literal>secondary</literal> node is there to
> take over seamlessly, check and mount the file system, and
> continue to work without missing a single bit of data.</para>
>
> - <para>In order to accomplish this task, it will be required to
> - utilize another feature available under &os; which provides
> + <para>To accomplish this task, another &os; feature provides
> for automatic failover on the IP layer —
> - <acronym>CARP</acronym>. <acronym>CARP</acronym> stands for
> - Common Address Redundancy Protocol and allows multiple hosts
> + <acronym>CARP</acronym>. <acronym>CARP</acronym> (Common Address
> + Redundancy Protocol) allows multiple hosts
> on the same network segment to share an IP address. Set up
> <acronym>CARP</acronym> on both nodes of the cluster according
> to the documentation available in <xref linkend="carp">.
> - After completing this task, each node should have its own
> + After setup, each node will have its own
> <devicename>carp0</devicename> interface with a shared IP
> address <replaceable>172.16.0.254</replaceable>.
> - Obviously, the primary <acronym>HAST</acronym> node of the
> - cluster has to be the master <acronym>CARP</acronym>
> + The primary <acronym>HAST</acronym> node of the
> + cluster must be the master <acronym>CARP</acronym>
> node.</para>
>
> <para>The <acronym>HAST</acronym> pool created in the previous
> @@ -4430,17 +4429,17 @@
>
> <para>In the event of <acronym>CARP</acronym> interfaces going
> up or down, the &os; operating system generates a &man.devd.8;
> - event, which makes it possible to watch for the state changes
> + event, making it possible to watch for the state changes
> on the <acronym>CARP</acronym> interfaces. A state change on
> the <acronym>CARP</acronym> interface is an indication that
> - one of the nodes failed or came back online. In such a case,
> - it is possible to run a particular script which will
> + one of the nodes failed or came back online. These state change
> + events make it possible to run a script which will
> automatically handle the failover.</para>
I think "handle HAST failover" would be an improvement.
>
> - <para>To be able to catch the state changes on the
> - <acronym>CARP</acronym> interfaces, the following
> - configuration has to be added to the
> - <filename>/etc/devd.conf</filename> file on each node:</para>
> + <para>To be able to catch state changes on the
> + <acronym>CARP</acronym> interfaces, add this
> + configuration to
> + <filename>/etc/devd.conf</filename> on each node:</para>
>
> <programlisting>notify 30 {
> match "system" "IFNET";
> @@ -4456,12 +4455,12 @@
> action "/usr/local/sbin/carp-hast-switch slave";
> };</programlisting>
>
> - <para>To put the new configuration into effect, run the
> - following command on both nodes:</para>
> + <para>Restart &man.devd.8; on both nodes o put the new configuration
"to"
> + into effect:</para>
>
> <screen>&prompt.root; <userinput>/etc/rc.d/devd restart</userinput></screen>
>
> - <para>In the event that the <devicename>carp0</devicename>
> + <para>When the <devicename>carp0</devicename>
> interface goes up or down (i.e. the interface state changes),
> the system generates a notification, allowing the &man.devd.8;
> subsystem to run an arbitrary script, in this case
> @@ -4615,41 +4614,40 @@
> <sect3>
> <title>General Troubleshooting Tips</title>
>
> - <para><acronym>HAST</acronym> should be generally working
> - without any issues, however as with any other software
> + <para><acronym>HAST</acronym> should generally work
> + without issues. However, as with any other software
> product, there may be times when it does not work as
> supposed. The sources of the problems may be different, but
> the rule of thumb is to ensure that the time is synchronized
> between all nodes of the cluster.</para>
>
> - <para>The debugging level of the &man.hastd.8; should be
> - increased when troubleshooting <acronym>HAST</acronym>
> - problems. This can be accomplished by starting the
> + <para>When troubleshooting <acronym>HAST</acronym> problems,
> + the debugging level of &man.hastd.8; should be increased
> + by starting the
> &man.hastd.8; daemon with the <literal>-d</literal>
> - argument. Note, that this argument may be specified
> + argument. Note that this argument may be specified
> multiple times to further increase the debugging level. A
> - lot of useful information may be obtained this way. It
> - should be also considered to use <literal>-F</literal>
> - argument, which will start the &man.hastd.8; daemon in
> + lot of useful information may be obtained this way. Consider
> + also using the <literal>-F</literal>
> + argument, which starts the &man.hastd.8; daemon in the
> foreground.</para>
> </sect3>
>
> <sect3 id="disks-hast-sb">
> <title>Recovering from the Split-brain Condition</title>
>
> - <para>The consequence of a situation when both nodes of the
> - cluster are not able to communicate with each other and both
> - are configured as primary nodes is called
> - <literal>split-brain</literal>. This is a dangerous
> + <para><literal>Split-brain</literal> is when the nodes of the
> + cluster are unable to communicate with each other, and both
> + are configured as primary. This is a dangerous
> condition because it allows both nodes to make incompatible
> - changes to the data. This situation has to be handled by
> - the system administrator manually.</para>
> + changes to the data. This problem must be corrected
> + manually by the system administrator.</para>
>
> - <para>In order to fix this situation the administrator has to
> + <para>The administrator must
> decide which node has more important changes (or merge them
> - manually) and let the <acronym>HAST</acronym> perform
> + manually) and let <acronym>HAST</acronym> perform
> the full synchronization of the node which has the broken
Just "full synchronization", I think.
Thanks for spotting these grammar rough edges and putting together a
patch!
-Ben Kaduk
> - data. To do this, issue the following commands on the node
> + data. To do this, issue these commands on the node
> which needs to be resynchronized:</para>
>
> <screen>&prompt.root; <userinput>hastctl role init <resource></userinput>
>
>