Help needed to identify golang fork / memory corruption issue on FreeBSD

Fri Mar 17 13:00:15 UTC 2017

On 17/03/2017 12:44, Konstantin Belousov wrote:
> On Fri, Mar 17, 2017 at 11:27:52AM +0000, Steven Hartland wrote:
>> On 17/03/2017 08:23, Konstantin Belousov wrote:
>>> On Fri, Mar 17, 2017 at 06:30:49AM +0000, Steven Hartland wrote:
>>>> Ok I think I've identified the cause.
>>>>
>>>> If an alternative signal stack is applied to a non-main thread and that
>>>> thread calls execve then the signal stack is not cleared.
>>>>
>>>> This results in all sorts of badness.
>>>>
>>>> Full details, including a small C reproduction case can be found here:
>>>> https://github.com/golang/go/issues/15658#issuecomment-287276856
>>>>
>>>> So looks like its kernel bug. If anyone has an ideas about that before I
>>>> look tomorrow that would be appreciated.
>>> Yes, there is definitely a kernel bug, which should be fixed by the patch
>>> below.
>>>
>>> Still, what I saw when I looked at the issue, is not quite resembling
>>> potential consequences of the bug.  Using wrong memory for signal stack
>>> would result either in much more significant memory corruption if the
>>> alt stack range is mapped and used for something unrelated, or in killed
>>> process on signal delivery, if the range is not mapped.  While I saw a
>>> systematic 'off by 0x10' in some gc structures.
>>>
>>> Anyway, patch for the issue you identified:
>>>
>>> diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
>>> index 29d5dd4b132..9bf3ba66f5c 100644
>>> --- a/sys/kern/kern_sig.c
>>> +++ b/sys/kern/kern_sig.c
>>> @@ -976,7 +976,6 @@ execsigs(struct proc *p)
>>>    	 * and are now ignored by default).
>>>    	 */
>>>    	PROC_LOCK_ASSERT(p, MA_OWNED);
>>> -	td = FIRST_THREAD_IN_PROC(p);
>>>    	ps = p->p_sigacts;
>>>    	mtx_lock(&ps->ps_mtx);
>>>    	while (SIGNOTEMPTY(ps->ps_sigcatch)) {
>>> @@ -1007,6 +1006,8 @@ execsigs(struct proc *p)
>>>    	 * Reset stack state to the user stack.
>>>    	 * Clear set of signals caught on the signal stack.
>>>    	 */
>>> +	td = curthread;
>>> +	MPASS(td->td_proc == p);
>>>    	td->td_sigstk.ss_flags = SS_DISABLE;
>>>    	td->td_sigstk.ss_size = 0;
>>>    	td->td_sigstk.ss_sp = 0;
>> Thanks Kostik, pretty obvious now looking at :)
>>
>> Testing here we've seen all sorts of corruption looking things, mainly
>> around random signals from SIGILL to SIGSEGV but also random kernel
>> messages including:
>> pid 4603 (test): sigreturn copying xfpustate failed
>> pid 5013 (test): sigreturn xfpusave_len = 0x44d9bb
>>
>> I'm currently running a test, but its looking good as the test case
>> usually crashes in a matter of seconds.
>>
>> Would you mind if I committed it?
> I am capable of committing the patches.
No problem, wouldn't ever suggest otherwise, just didn't want to add to 
your workload ;-)
>
>> I'm guessing given its nature this is something we'd want MFC'ed and
>> Errata's issued for all supported versions?
> MFC will be done for sure.  I am not so sure about EN, this is a routine
> bugfix.  For some reasons 10.3 errata might be indeed the only way to get
> this for 10.x users, but I do not see why bother re/so with 11.0.
My argument for doing an EN would for 11.0 as well as 10.3 be two fold:
1. It exposes other processes memory so could be considered as security 
issue?
2. Given its causing quite a bit of pain for golang users (random 
crashes), which is getting used more and more now, it would be good to 
get a fix out sooner rather than later and 11.1 is still over 4months off.

     Regards
     Steve