[Bug 283747] kernel panic after telegraf service restart

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 28 Mar 2025 18:06:03 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283747

--- Comment #47 from Gleb Smirnoff <glebius@FreeBSD.org> ---
Mike, my current hypothesis is that we have a 32-bit overflow in credential
reference counting.  The overflow happens when we reap a group of processes
and the reference counts of the group, summed together, overflow.  AFAIU,
telegraf will fork+exec arbitrary programs, which in turn can fork+exec more
programs.  While telegraf itself seems to do proper wait(2)-ing on zombies,
some external program may leak zombies and not exit itself.  Then, when
telegraf is restarted, this pack of zombies is reaped, and this is where the
overflow could be hit.
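
To illustrate the kind of arithmetic the hypothesis is about, here is a
minimal sketch.  It is not the kernel code and not the patch: the function
names and the numbers are made up for illustration, and since Python integers
never overflow, the 32-bit counter is emulated by masking.

```python
# Illustrative sketch only: this is NOT the FreeBSD kernel code.
# Python ints never overflow, so a 32-bit unsigned counter is emulated
# by masking the running total after every addition.

U32_MASK = 0xFFFFFFFF  # keep only the low 32 bits


def sum_refs_u32(ref_counts):
    """Sum reference counts into an emulated 32-bit unsigned accumulator."""
    total = 0
    for refs in ref_counts:
        total = (total + refs) & U32_MASK  # wraps around at 2**32
    return total


def sum_refs_wide(ref_counts):
    """Sum the same counts with no width limit, for comparison."""
    return sum(ref_counts)


if __name__ == "__main__":
    # Hypothetical numbers: 5 processes, each accounting for 2**30 references.
    group = [2**30] * 5
    print("wide sum:  ", sum_refs_wide(group))
    print("32-bit sum:", sum_refs_u32(group))
```

Each individual count here fits in 32 bits; only the sum over the whole group
does not, so the narrow accumulator ends up with a value much smaller than the
true total.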

This is fixed by attachment 258804.  I am not sure of my hypothesis, which is
why it is not yet committed to CURRENT.  However, everyone affected by the bug
is advised to use this patch, and let's see what happens.  We still have some
time before 14.3.  I will probably start the review process to get it into
CURRENT anyway.

With this info, you may have some idea of how to reproduce it.  I know you are
good at chasing bugs, Mike :) Sorry that it hit you, but I'm glad that you
joined the team chasing this bug.
