gmirror crash writing to disk? Or is it su+j crash?
Zaphod Beeblebrox
zbeeble at gmail.com
Thu Sep 5 15:16:43 UTC 2013
Replying to myself again: I doubled kern.bio_transient_maxcnt once more
(original value 160, failed doubling 360, new value 720), and the machine was
able to complete "for i in `jot 10`; do make -j4 buildkernel; done" ...
But doesn't this mean that we still have a resource exhaustion to worry
about? Isn't this just another race waiting for the right set of
conditions?
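
The repeated-build stress test above can be sketched as a small POSIX sh helper that stops at the first failing run, which makes it easier to see after how many iterations a problem shows up. The function name repeat_build is hypothetical and not part of the original message; the build command is the one quoted above.

```shell
#!/bin/sh
# Sketch only: repeat_build is a hypothetical helper name.
# Usage: repeat_build COUNT COMMAND [ARGS...]
# Runs COMMAND up to COUNT times and returns 1 at the first failure.
repeat_build() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i of $count"
        # Run the command exactly as given; stop on a non-zero exit status.
        "$@" || { echo "run $i failed" >&2; return 1; }
        i=$((i + 1))
    done
}

# On the affected machine this would be invoked as, for example:
#   cd /usr/src && repeat_build 10 make -j4 buildkernel
```

A kernel panic ends the loop by itself, of course; the early exit only helps with ordinary build failures.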
On Tue, Sep 3, 2013 at 11:06 AM, Zaphod Beeblebrox <zbeeble at gmail.com> wrote:
> Since there weren't any more ideas here, I tried turning off
> hyper-threading. This is an old Pentium-D type CPU, that is, one core
> with HT. I'm wondering if HT is contributing to this resource
> exhaustion, so I turned off HT (basically making this a single-threaded
> CPU) and it seems to have made the problem go away.
>
> That is not to say that the problem is fixed: it simply means that
> reproducing it may depend on multiple CPUs and/or the allocation of
> resources by an HT CPU core.
>
>
> On Mon, Sep 2, 2013 at 3:53 AM, Zaphod Beeblebrox <zbeeble at gmail.com> wrote:
>
>> The first one (setting kern.geom.transient_map_retries to 0) causes the
>> system to wedge.
>>
>> The second one (kern.bio_transient_maxcnt; default is 180, I doubled it
>> to 360) causes the system to crash but not dump.
>>
>> So... neither fixes the problem.
>>
>>
>> On Sat, Aug 31, 2013 at 5:27 AM, Edward Tomasz Napierała <
>> trasz at freebsd.org> wrote:
>>
>>> Message written by Zaphod Beeblebrox <zbeeble at gmail.com> on
>>> 31 Aug 2013, at 00:49:
>>> > Because someone said that there would be no logging of underlying ATA
>>> > errors without verbose, I rebooted with verbose and tried the same make -j4
>>> > again... and here is the relatively similar core.txt.5
>>> >
>>> > https://uk.eicat.ca/owncloud/public.php?service=files&t=d99648ef5876b91c5957148445e60c87
>>> >
>>> > Looking at it, gmirror is dropping the same error and the underlying
>>> > hardware is not causing the error...
>>>
>>> Let me quote Konstantin:
>>>
>>> > It is either an exhaustion of the transient map, or a deadlock.
>>> > For the first, setting kern.geom.transient_map_retries to 0 could help.
>>> > For the second, the count of the transient buffers must be increased,
>>> > by kern.bio_transient_maxcnt loader tunable.
>>>
>>> Could you try both and tell which one of them fixed the problem? Thanks!
>>>
>>>
>>
>
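
For reference, the two settings discussed in this thread could be applied roughly as follows. This is a minimal sketch: the helper name loader_line is hypothetical, 720 is simply the value reported to work at the top of the thread, and it is assumed (not confirmed in the thread) that kern.geom.transient_map_retries can be changed at runtime with sysctl(8). kern.bio_transient_maxcnt is described in the thread as a loader tunable, so it is assumed to take effect only after a reboot.

```shell
#!/bin/sh
# Sketch only: loader_line is a hypothetical helper name.
# Prints the /boot/loader.conf line that sets the transient buffer count.
loader_line() {
    printf 'kern.bio_transient_maxcnt="%s"\n' "$1"
}

# Possible use on the affected machine, as root:
#   loader_line 720 >> /boot/loader.conf      # applies at next boot
#   sysctl kern.geom.transient_map_retries=0  # assumed to be writable at runtime
```

Note that in this thread setting the retries to 0 wedged the system, so the sysctl line is shown only to illustrate the syntax.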