stress2 is now in projects
Peter Holm
peter at holm.cc
Sun Jan 18 14:25:00 PST 2009
On Sun, Jan 18, 2009 at 12:12:02PM -0800, Bakul Shah wrote:
> On Sun, 18 Jan 2009 15:09:24 +0100 Peter Holm <pho at freebsd.org> wrote:
> > On Sun, Jan 18, 2009 at 03:28:19PM +0200, Kostik Belousov wrote:
> > > On Sun, Jan 18, 2009 at 02:10:28PM +0100, Peter Holm wrote:
> > > > On Sun, Jan 18, 2009 at 01:11:25PM +0100, Dag-Erling Smørgrav wrote:
> > > > > Peter Holm <pho at freebsd.org> writes:
> > > > > > The key functionality of this test suite is that it runs a random
> > > > > > number of test programs for a random period, in random incarnations
> > > > > > and in random sequence.
> > > > >
> > > > > In other words, it's non-deterministic and non-reproducible.
> > > > >
> > > >
> > > > Yes, by design.
> > > >
> > > > > You should at the very least allow the user to specify the random seed.
> > > > >
> > > >
> > > > Yes, it would be interesting to see if this is enough to reproduce a
> > > > problem in a deterministic way. I'll look into this.
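A minimal sketch of what a user-specifiable seed could look like, written in Python for brevity. stress2 itself is not structured this way, and the test names, the limits and the functions `choose_seed` and `make_schedule` are hypothetical. Every random choice comes from one seeded generator and the seed is always printed, so a failing run can at least be relaunched with the same schedule. A fixed seed only pins down which programs run, how many copies, for how long and in what order; it does not control kernel scheduling.

```python
import random
import time

# Hypothetical test names; the real stress2 test list is different.
TESTS = ["mmap", "rw", "link", "mkdir", "rename", "lockf", "swap", "tcp"]


def choose_seed(user_seed=None):
    """Use the seed the user gave, or derive one from the clock; always report it."""
    seed = user_seed if user_seed is not None else time.time_ns() & 0xFFFFFFFF
    print(f"random seed: {seed}")
    return seed


def make_schedule(seed, max_programs=5, max_seconds=600):
    """Return a list of (test, incarnations, seconds) tuples.

    Every random choice is drawn from a single random.Random seeded with
    `seed`, so the same seed always yields the same schedule.
    """
    rng = random.Random(seed)
    count = rng.randint(1, min(max_programs, len(TESTS)))  # how many programs
    picked = rng.sample(TESTS, count)                       # which ones, in what order
    return [(name, rng.randint(1, 10), rng.randint(10, max_seconds))
            for name in picked]                             # copies and run time


if __name__ == "__main__":
    seed = choose_seed()  # or choose_seed(12345) to repeat an earlier run
    for name, incarnations, seconds in make_schedule(seed):
        print(f"{name}: {incarnations} incarnation(s) for {seconds}s")
```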
> > >
> > > I can state from my experience using it (or, rather, inspecting bug
> > > reports generated by stress2) that it is in fact quite repeatable.
> > > I.e., when looking into one area, you almost always get _that_ problem,
> > > together with 2-3 related issues.
> > >
> > > Due to the nature of the tests and non-deterministic kernel operations,
> > > I think that using the same random seed gains nothing with regard to
> > > repeatability of the tests.
> >
> > It is an old issue that has come up many times: it would be so great
> > if it were possible to somehow record the exact sequence that led up
> > to a panic and play it back.
> >
> > But on the other hand, as you say, it *is* repeatable. The only
> > issue is that it may take 5 minutes or 5 hours.
> >
> > But I'm still game to see if it is possible at all (in single-user
> > mode, with no network activity, etc.)
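Recording the exact sequence of kernel events is the hard part; the user-space half of the idea can be sketched as logging each step of a test schedule before it starts and replaying that log later. This assumes a schedule of `(test, incarnations, seconds)` tuples and a hypothetical `run` callback that launches one test; neither `run_and_record`, `replay` nor `run` is part of stress2. Replay reproduces only the order of the test programs, not the interleaving inside the kernel.

```python
import json
import os


def run_and_record(schedule, path, run):
    """Log each step to `path` before handing it to `run`.

    `run` is a hypothetical callback that actually launches a test.
    Each line is flushed and fsync'ed before its step starts, so after
    a panic the log should end at the step that was running.
    """
    with open(path, "w") as log:
        for step, (name, incarnations, seconds) in enumerate(schedule):
            entry = {"step": step, "test": name,
                     "incarnations": incarnations, "seconds": seconds}
            log.write(json.dumps(entry) + "\n")
            log.flush()             # push the line out of Python's buffer
            os.fsync(log.fileno())  # and ask the OS to put it on disk
            run(name, incarnations, seconds)


def replay(path, run):
    """Re-run the recorded steps in the recorded order."""
    with open(path) as log:
        for line in log:
            entry = json.loads(line)
            run(entry["test"], entry["incarnations"], entry["seconds"])
```

Calling `fsync` after every line is slow, but without it the last few entries could still be sitting in a cache when the machine panics.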
>
> Allowing a user to specify the random seed (and *always*
> reporting the random seed of every test) can't hurt, and it
> may actually gain you repeatability in some cases. Most bugs
> are typically of the garden variety, not dependent on some
> complex interactions between parallel programs (or worse, on
> processor heisenbugs).
>
Ah, yes, if that were the case. But most of the problems I encounter are
typically lock-related.
> You can always try repeating a failing
> test on a more deterministic setup like qemu, etc.
>
Different hardware also seems to play a big role in finding bugs:
the number of CPUs, etc.
> One trick I have used in the past is to record "significant"
> events in one or more ring buffers using some cheap encoding.
> You then have access to the past N events during any post-crash
> kernel analysis. This has far less overhead than debug
> printfs, and you can even leave it enabled in production use.
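The ring-buffer trick can be sketched as follows, again in Python for readability; in a kernel it would be a statically allocated array that a debugger reads out of the crash dump. The event codes, the 15-byte record layout and the `EventRing` name are assumptions for illustration, and locking and per-CPU buffers are left out. Logging an event is a single `pack_into` into a preallocated buffer, with no string formatting or allocation, which is where the saving over debug printfs comes from.

```python
import struct

# Hypothetical event codes; a real kernel would define its own set.
EV_LOCK, EV_UNLOCK, EV_SLEEP, EV_WAKEUP = range(4)

# Cheap fixed-size encoding: 1-byte code, 2-byte CPU, 4-byte argument,
# 8-byte timestamp, little-endian with no padding (15 bytes per record).
RECORD = struct.Struct("<BHIQ")


class EventRing:
    """Keep only the last `n` events in a preallocated buffer."""

    def __init__(self, n):
        self.n = n
        self.buf = bytearray(RECORD.size * n)
        self.count = 0  # total number of events ever logged

    def log(self, code, cpu, arg, ts):
        # Once the ring is full, the oldest slot is overwritten in place.
        slot = self.count % self.n
        RECORD.pack_into(self.buf, slot * RECORD.size, code, cpu, arg, ts)
        self.count += 1

    def dump(self):
        """Decode the retained events, oldest first, for post-crash analysis."""
        start = max(0, self.count - self.n)
        return [RECORD.unpack_from(self.buf, (i % self.n) * RECORD.size)
                for i in range(start, self.count)]
```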
--
Peter Holm
More information about the freebsd-arch mailing list