r314708: panic: tdsendsignal: ksi on queue
Bryan Drewery
bdrewery at FreeBSD.org
Thu Mar 9 23:57:35 UTC 2017
On 3/9/2017 3:47 PM, Bryan Drewery wrote:
> On 3/9/2017 3:11 PM, Jilles Tjoelker wrote:
>> On Thu, Mar 09, 2017 at 04:46:46PM +0200, Konstantin Belousov wrote:
>>> Yes, there is a race, apparently, with the child zombie still not finishing
>>> sending the SIGCHLD to the parent and parent exiting. The following should
>>> fix the issue, but I do not think that reproducing the problem is easy.
>>
>>> diff --git a/sys/kern/kern_exit.c b/sys/kern/kern_exit.c
>>> index c524fe5df37..ba5ff84e9de 100644
>>> --- a/sys/kern/kern_exit.c
>>> +++ b/sys/kern/kern_exit.c
>>> @@ -189,6 +189,7 @@ exit1(struct thread *td, int rval, int signo)
>>> {
>>> struct proc *p, *nq, *q, *t;
>>> struct thread *tdt;
>>> + ksiginfo_t ksi;
>>>
>>> mtx_assert(&Giant, MA_NOTOWNED);
>>> KASSERT(rval == 0 || signo == 0, ("exit1 rv %d sig %d", rval, signo));
>>> @@ -456,7 +457,12 @@ exit1(struct thread *td, int rval, int signo)
>>> proc_reparent(q, q->p_reaper);
>>> if (q->p_state == PRS_ZOMBIE) {
>>> PROC_LOCK(q->p_reaper);
>>> - pksignal(q->p_reaper, SIGCHLD, q->p_ksi);
>>> + if (q->p_ksi != NULL) {
>>> + ksiginfo_init(&ksi);
>>> + ksiginfo_copy(q->p_ksi, &ksi);
>>> + }
>>> + pksignal(q->p_reaper, SIGCHLD, q->p_ksi !=
>>> + NULL ? &ksi : NULL);
>>> PROC_UNLOCK(q->p_reaper);
>>> }
>>> } else {
>
> I just got something weird with this patch that wasn't happening before:
>
> /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j
> exp-10amd64 -p commit -z test devel/ccache
> [poudriere runs and completes with exit status 0]
>> time: command terminated abnormally
>> 28.08 real 9.92 user 10.38 sys
>> 23464 maximum resident set size
>> 4996 average shared memory size
>> 88 average unshared data size
>> 127 average unshared stack size
>> 282705 page reclaims
>> 5623 page faults
>> 0 swaps
>> 2673 block input operations
>> 4836 block output operations
>> 33 messages sent
>> 0 messages received
>> 37 signals received
>> 11226 voluntary context switches
>> 780 involuntary context switches
>> zsh: alarm /usr/bin/time -l src/bin/poudriere -e /usr/local/etc testport -j exp-10amd64
> exit status: 142 (SIGALRM).
>
> I don't see time(1) using SIGALRM or proc reaper at all.
>
> Rerunning it, and trying other simpler test cases, does not produce the
> same result. It may be some race unrelated to this patch, dunno.
>
I'm consistently getting foreground processes getting the wrong signals
now. I'm removing this patch for now.
>
>>
>> This patch introduces a subtle correctness bug. A real SIGCHLD ksiginfo
>> should always be the zombie's p_ksi; otherwise, the siginfo may be lost
>> if there are too many signals pending for the target process or in the
>> system. If the siginfo is lost and the reaper normally passes si_pid to
>> waitpid() or similar (instead of passing WAIT_ANY or P_ALL), a zombie
>> will remain until the reaper terminates.
>>
>> Conceptually the siginfo is sent to one process at a time only, so the
>> bug is an artifact of the implementation. Perhaps the piece of code
>> added in r309886 can be moved or the ksiginfo can be removed from the
>> parent's queue.
>>
>> If such a fix is not possible, it may be better to send a bare SIGCHLD
>> (si_code is SI_KERNEL or 0, depending on how many signals are pending)
>> in this situation and document that reapers must use WAIT_ANY or P_ALL.
>> (However, compared to the pre-r309886 situation they can still use
>> SIGCHLD to get notified when to call waitpid() or similar.)
>>
>
>
--
Regards,
Bryan Drewery
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freebsd.org/pipermail/freebsd-current/attachments/20170309/e2922cf6/attachment.sig>
More information about the freebsd-current
mailing list