[simon at FreeBSD.org: Re: Hints for precision benchmarking...]
Simon L. Nielsen
simon at FreeBSD.org
Sun Feb 8 13:18:39 UTC 2004
[ Since mailman seems to have sent the first version of this to the
eternal bit fields, I'm resending this. ]
From: "Simon L. Nielsen" <simon at FreeBSD.org>
Date: Mon, 26 Jan 2004 22:56:02 +0100
To: freebsd-doc at freebsd.org
Cc: Poul-Henning Kamp <phk at phk.freebsd.dk>,
Robert Watson <rwatson at FreeBSD.org>
User-Agent: Mutt/1.5.5.1i
Subject: Re: Hints for precision benchmarking...
On 2004.01.26 12:12:58 -0500, Robert Watson wrote:
>
> On Mon, 26 Jan 2004, Poul-Henning Kamp wrote:
>
> > * Run in single user mode. Cron(8) and and other daemons
> > only add noise.
>
> Of particular interest here is sshd -- either disable its SSHv1 key
> regeneration, or kill the parent sshd (or don't run sshd) during the
> tests.
I thought these lists were nice to have in the documentation for
future reference, so I converted them into DocBook. I couldn't
really find a good place in the Developers' Handbook to put the
list, so Robert Watson suggested I create a new Regression and
Performance Testing chapter.
I had to do some rewriting to make the wording more like
documentation, so I probably managed to totally mess up the
grammar, but I hope I did not mess up the actual content.
Compiled version at http://people.freebsd.org/~simon/work/testing.html .
Comments, suggestions?
<!--
The FreeBSD Documentation Project
$FreeBSD$
-->
<chapter id="testing">
<title>Regression and Performance Testing</title>
<para>Regression tests are used to exercise a particular bit of the
system to check that it works as expected, and to make sure that
old bugs are not reintroduced.</para>
<para>The &os; regression testing tools can be found in the &os;
source tree in the directory <filename
class="directory">src/tools/regression</filename>.</para>
<section id="testing-micro-benchmark">
<title>Micro Benchmark Checklist</title>
<para>This section contains suggestions for doing proper
micro-benchmarking on &os;, or of &os; itself.</para>
<para>It is not possible to use all of the suggestions below every
single time, but the more of them that are used, the better the
benchmark's ability to detect small differences will be.</para>
<itemizedlist>
<listitem>
<para>Disable <acronym>APM</acronym> and any other kind of
clock fiddling (<acronym>ACPI</acronym> ?).</para>
</listitem>
<listitem>
<para>Run in single user mode. Daemons such as &man.cron.8;
only add noise. The &man.sshd.8; daemon can also
cause problems. If ssh access is required during the test,
either disable the SSHv1 key regeneration, or kill the
parent <command>sshd</command> daemon during the tests.</para>
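<para>A sketch of killing the parent daemon, assuming the
default <filename>/var/run/sshd.pid</filename> location;
existing sessions keep running, but no new logins are
accepted:</para>
<screen>&prompt.root; <userinput>kill `cat /var/run/sshd.pid`</userinput></screen>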
</listitem>
<listitem>
<para>Do not run &man.ntpd.8;.</para>
</listitem>
<listitem>
<para>If &man.syslog.3; events are generated, run
&man.syslogd.8; with an empty
<filename>/etc/syslog.conf</filename>; otherwise, do not
run it.</para>
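<para>A sketch of one way to do this (the file name is only an
example):</para>
<screen>&prompt.root; <userinput>:> /tmp/<replaceable>syslog-empty.conf</replaceable></userinput>
&prompt.root; <userinput>syslogd -f /tmp/<replaceable>syslog-empty.conf</replaceable></userinput></screen>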
</listitem>
<listitem>
<para>Minimize disk I/O; avoid it entirely if possible.</para>
</listitem>
<listitem>
<para>Do not mount file systems that are not needed.</para>
</listitem>
<listitem>
<para>Mount <filename class="directory">/</filename>,
<filename class="directory">/usr</filename>, and any other
file system as read-only if possible. This removes atime
updates to disk (etc.) from the I/O picture.</para>
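<para>An already mounted file system can be downgraded to
read-only in place; note that this only succeeds if nothing
on it is currently open for writing:</para>
<screen>&prompt.root; <userinput>mount -u -o ro /usr</userinput></screen>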
</listitem>
<listitem>
<para>Reinitialize the read/write test file system with
&man.newfs.8; and populate it from a &man.tar.1; or
&man.dump.8; file before every run. Unmount and mount it
before starting the test. This results in a consistent file
system layout. For a worldstone test this would apply to
<filename class="directory">/usr/obj</filename> (just
reinitialize with <command>newfs</command> and mount). To
get 100% reproducibility, populate the file system from a
&man.dd.1; file (i.e.: <command>dd
if=<filename>myimage</filename> of=<filename
class="devicefile">/dev/ad0s1h</filename>
bs=1m</command>).</para>
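<para>A minimal sketch of one such cycle; the device, mount
point, and archive name are purely illustrative:</para>
<screen>&prompt.root; <userinput>umount /res</userinput>
&prompt.root; <userinput>newfs /dev/<replaceable>ad0s1h</replaceable></userinput>
&prompt.root; <userinput>mount /dev/<replaceable>ad0s1h</replaceable> /res</userinput>
&prompt.root; <userinput>tar -xpf /var/tmp/<replaceable>testfiles.tar</replaceable> -C /res</userinput></screen>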
</listitem>
<listitem>
<para>Use malloc backed or preloaded &man.md.4;
partitions.</para>
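<para>For example, a malloc backed &man.md.4; file system can
be created like this (the unit number, size, and mount point
are only examples):</para>
<screen>&prompt.root; <userinput>mdconfig -a -t malloc -s 256m -u 10</userinput>
&prompt.root; <userinput>newfs -U /dev/md10</userinput>
&prompt.root; <userinput>mount /dev/md10 <replaceable>/scratch</replaceable></userinput></screen>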
</listitem>
<listitem>
<para>Reboot between individual iterations of the test; this
gives a more consistent state.</para>
</listitem>
<listitem>
<para>Remove all non-essential device drivers from the kernel.
For instance, if USB is not needed for the test, do not put
USB in the kernel. Drivers which attach often have timeouts
ticking away.</para>
</listitem>
<listitem>
<para>Unconfigure hardware that is not in use. Detach disks
with &man.atacontrol.8; and &man.camcontrol.8; if the disks
are not used for the test.</para>
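<para>For example, an unused disk on an ATA channel can be
detached like this (the channel name is only an example, and
the exact syntax depends on the &os; version):</para>
<screen>&prompt.root; <userinput>atacontrol detach <replaceable>ata1</replaceable></userinput></screen>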
</listitem>
<listitem>
<para>Do not configure the network unless it is being tested,
or wait until after the test has been performed to ship the
results off to another computer.</para>
<para>If the system must be connected to a public network,
watch out for spikes of broadcast traffic. Even though it
is hardly noticeable, it will take up CPU cycles. Multicast
has similar caveats.</para>
</listitem>
<listitem>
<para>Put each file system on its own disk. This minimizes
jitter from head-seek optimizations.</para>
</listitem>
<listitem>
<para>Minimize output to serial or VGA consoles. Running
output into files gives less jitter. (Serial consoles
easily become a bottleneck.) Do not touch the keyboard while
the test is running; even <keycap>space</keycap> or
<keycap>back-space</keycap> shows up in the numbers.</para>
</listitem>
<listitem>
<para>Make sure the test is long enough, but not too long. If
the test is too short, timestamping is a problem. If it is
too long, temperature changes and drift will affect the
frequency of the quartz crystals in the computer. Rule of
thumb: more than a minute, less than an hour.</para>
</listitem>
<listitem>
<para>Try to keep the temperature as stable as possible around
the machine. This affects both quartz crystals and disk
drive algorithms. To get a truly stable clock, consider
stabilized clock injection, e.g. get an OCXO + PLL and
inject its output into the clock circuits instead of the
motherboard xtal.
Contact &a.phk; for more information about this.</para>
</listitem>
<listitem>
<para>Run the test at least 3 times, but it is better to run
it more than 20 times, both for the <quote>before</quote> and
the <quote>after</quote> code. Try to interleave if possible
(i.e. do not run 20 times before and then 20 times after);
this makes it possible to spot environmental effects. Do not
interleave 1:1, but 3:3; this makes it possible to spot
interaction effects.</para>
<para>A good pattern is <literal>bababa{bbbaaa}*</literal>.
This gives a hint after the first 1+1 runs (so it is possible
to stop the test if it goes entirely the wrong way), a
standard deviation after the first 3+3 (which gives a good
indication of whether it is going to be worth a long run),
and trending and interaction numbers later on.</para>
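<para>A sketch of a driver script for this pattern, where
<command>run_bench</command> is a hypothetical wrapper that
runs one iteration of the named code and appends its result
to a file:</para>
<programlisting>#!/bin/sh
# b = "before" code, a = "after" code (run_bench is hypothetical)
# First bababa: the early hint and the first 3+3 deviation.
for v in b a b a b a; do
        run_bench $v >> $v.results
done
# Then {bbbaaa}*: 3:3 interleaving for trending and interaction.
while true; do
        for v in b b b a a a; do
                run_bench $v >> $v.results
        done
done</programlisting>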
</listitem>
<listitem>
<para>Use <filename
class="directory">/usr/src/tools/tools/ministat</filename>
to see if the numbers are significant. Consider buying
<quote>Cartoon Guide to Statistics</quote> (ISBN
0062731025); it is highly recommended if you have forgotten
or never learned about standard deviation and Student's
T.</para>
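<para>Typical usage is simply the following, assuming one
result per line in each input file (the file names are
examples):</para>
<screen>&prompt.root; <userinput>ministat <replaceable>a.results</replaceable> <replaceable>b.results</replaceable></userinput></screen>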
</listitem>
<listitem>
<para>Do not use background &man.fsck.8; unless the test is a
benchmark of background <command>fsck</command>. Also,
disable <varname>background_fsck</varname> in
<filename>/etc/rc.conf</filename> unless the benchmark is
not started until at least 60 plus
<quote><command>fsck</command> runtime</quote> seconds after
boot, as &man.rc.8; wakes up and checks if
<command>fsck</command> needs to run on any file systems
when background <command>fsck</command> is enabled.
Likewise, make sure there are no snapshots lying around
unless the benchmark is a test with snapshots.</para>
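<para>The corresponding line in
<filename>/etc/rc.conf</filename>:</para>
<programlisting>background_fsck="NO"</programlisting>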
</listitem>
<listitem>
<para>If the benchmark shows unexpectedly bad performance,
check for things like high interrupt volume from an
unexpected source. Some versions of <acronym>ACPI</acronym>
have been reported to <quote>misbehave</quote> and generate
excess interrupts. To help diagnose odd test results, take a
few snapshots of <command>vmstat -i</command> output and look
for anything unusual.</para>
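<para>For example, interrupt counters can be captured before
and after a run and compared; the file names, and
<filename>runtest.sh</filename> standing in for the benchmark
itself, are purely illustrative:</para>
<screen>&prompt.root; <userinput>vmstat -i > /tmp/vmstat-i.before</userinput>
&prompt.root; <userinput>sh <replaceable>runtest.sh</replaceable></userinput>
&prompt.root; <userinput>vmstat -i > /tmp/vmstat-i.after</userinput>
&prompt.root; <userinput>diff /tmp/vmstat-i.before /tmp/vmstat-i.after</userinput></screen>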
</listitem>
<listitem>
<para>Be careful about optimization parameters for the kernel
and for userspace, and likewise about debugging. It is easy
to let something slip through and realize later that the test
was not comparing the same thing.</para>
</listitem>
<listitem>
<para>Do not ever benchmark with the
<literal>WITNESS</literal> and <literal>INVARIANTS</literal>
kernel options enabled, unless the test is intended to
benchmark those features. <literal>WITNESS</literal> can
cause 400%+ drops in performance. Likewise, userspace
&man.malloc.3; parameters default differently in -CURRENT
from the way they ship in production releases.</para>
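<para>As a quick sanity check before building the benchmark
kernel, make sure the configuration file does not pull these
options in (the configuration file name is only an
example):</para>
<screen>&prompt.root; <userinput>grep -E 'WITNESS|INVARIANT' /sys/i386/conf/<replaceable>BENCHMARK</replaceable></userinput></screen>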
</listitem>
</itemizedlist>
</section>
</chapter>
<!--
Local Variables:
mode: sgml
sgml-declaration: "../chapter.decl"
sgml-indent-data: t
sgml-omittag: nil
sgml-always-quote-attributes: t
sgml-parent-document: ("../book.sgml" "part" "chapter")
End:
-->
----- End forwarded message -----
--
Simon L. Nielsen
FreeBSD Documentation Team