maintainer-feedback requested: [Bug 255445] lang/python 3.8/3.9 SIGSEV core dumps in libthr TrueNAS
bugzilla-noreply at freebsd.org
Tue Apr 27 18:41:24 UTC 2021
Bugzilla Automation <bugzilla at FreeBSD.org> has asked freebsd-python (Nobody)
<python at FreeBSD.org> for maintainer-feedback:
Bug 255445: lang/python 3.8/3.9 SIGSEV core dumps in libthr TrueNAS
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255445
--- Description ---
We are seeing the main middlewared process (Python) dump core for many
TrueNAS (previously FreeNAS) users, starting with our version 12.0 release.
Relevant OS information:
12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 f2858df162b(HEAD) TRUENAS amd64
Python versions that experience the core dump:
Python 3.8.7
Python 3.9.4
While initially researching this, I found a threading regression in
Python 3.8 on FreeBSD and was able to resolve that particular problem by
backporting these commits:
https://github.com/python/cpython/commit/4d96b4635aeff1b8ad41d41422ce808ce0b971c8
and
https://github.com/python/cpython/commit/9ad58acbe8b90b4d0f2d2e139e38bb5aa32b7fb6
The reason I backported those commits is that all of the core dumps I've
analyzed crash in the same spot (or very close to it). For example, here
are two backtraces showing a NULL pointer dereference.
Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 cond_signal_common (cond=<optimized out>) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:457
warning: Source file is more recent than executable.
457 mp = td->mutex_obj;
[Current thread is 1 (LWP 100733)]
(gdb) list
452 _sleepq_unlock(cvp);
453 return (0);
454 }
455
456 td = _sleepq_first(sq);
457 mp = td->mutex_obj;
458 cvp->__has_user_waiters = _sleepq_remove(sq, td);
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460 if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461 _thr_wake_all(curthread->defer_waiters,
(gdb) p *td
Cannot access memory at address 0x0
And another one:
Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 cond_signal_common (cond=<optimized out>) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:459
warning: Source file is more recent than executable.
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
[Current thread is 1 (LWP 101105)]
(gdb) list
454 }
455
456 td = _sleepq_first(sq);
457 mp = td->mutex_obj;
458 cvp->__has_user_waiters = _sleepq_remove(sq, td);
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460 if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461 _thr_wake_all(curthread->defer_waiters,
462 curthread->nwaiter_defer);
463 curthread->nwaiter_defer = 0;
(gdb) p *mp
Cannot access memory at address 0x0
I'm trying to write a program to stress-test threading (repeatedly tearing
down and recreating threads, etc.), but so far I've been unable to trigger
this particular problem. End users who have seen this core dump sometimes go
a month or more without a problem. I'm hoping someone more knowledgeable can
at least give me a pointer or help me figure this one out. I have a VM with
all the relevant core dumps available, so if someone needs remote access to
it to poke around, please let me know. You can reach me at
caleb [at] ixsystems.com
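
For anyone who wants to try something similar, here is a minimal sketch of
the kind of stress test described above. It is not the program used in this
report and is not known to reproduce the crash. The function name churn and
its parameters are hypothetical, and whether threading.Condition traffic
reaches the same libthr code path as the backtraces is an assumption.

```python
import threading


def churn(rounds=200, workers=8):
    """Create waiter threads, wake them all, and tear them down, repeatedly.

    Hypothetical helper: returns the number of rounds that completed.
    """
    completed = 0
    for _ in range(rounds):
        cond = threading.Condition()
        state = {"go": False, "woken": 0}

        def waiter():
            # Wait until the main thread flips the flag; the timeout only
            # guards against hanging forever if a wakeup is missed.
            with cond:
                while not state["go"]:
                    cond.wait(timeout=5.0)
                state["woken"] += 1

        threads = [threading.Thread(target=waiter) for _ in range(workers)]
        for t in threads:
            t.start()

        # Signal while some waiters may still be starting up, so thread
        # creation and notification overlap.
        with cond:
            state["go"] = True
            cond.notify_all()

        # Tear everything down before the next round.
        for t in threads:
            t.join()

        assert state["woken"] == workers
        completed += 1
    return completed


if __name__ == "__main__":
    print(churn())
```

Raising rounds and workers, or running several copies in parallel, makes the
churn heavier, though given how rarely the crash shows up in the field there
is no guarantee this ever hits it.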