[Bug 255445] lang/python 3.8/3.9 SIGSEGV core dumps in libthr TrueNAS
bugzilla-noreply at freebsd.org
Tue Apr 27 18:41:24 UTC 2021
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255445
Bug ID: 255445
Summary: lang/python 3.8/3.9 SIGSEGV core dumps in libthr TrueNAS
Product: Ports & Packages
Version: Latest
Hardware: amd64
OS: Any
Status: New
Keywords: crash
Severity: Affects Many People
Priority: ---
Component: Individual Port(s)
Assignee: python at FreeBSD.org
Reporter: yocalebo at gmail.com
Flags: maintainer-feedback?(python at FreeBSD.org)
We are seeing the main middlewared process (python) dump core for many
TrueNAS (previously FreeNAS) users, starting with our version 12.0 release.
Relevant OS information:
12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 f2858df162b(HEAD) TRUENAS amd64
Python versions that experience the core dump:
Python 3.8.7
Python 3.9.4
When initially researching this, I found a threading regression in Python 3.8
on FreeBSD and was able to resolve that particular problem by backporting
these commits:
https://github.com/python/cpython/commit/4d96b4635aeff1b8ad41d41422ce808ce0b971c8
and
https://github.com/python/cpython/commit/9ad58acbe8b90b4d0f2d2e139e38bb5aa32b7fb6
I backported those commits because all of the core dumps that I've analyzed
crash in the same spot (or very close to it). For example, here are two
backtraces showing a NULL pointer dereference:
Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 cond_signal_common (cond=<optimized out>) at
/truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:457
warning: Source file is more recent than executable.
457 mp = td->mutex_obj;
[Current thread is 1 (LWP 100733)]
(gdb) list
452 _sleepq_unlock(cvp);
453 return (0);
454 }
455
456 td = _sleepq_first(sq);
457 mp = td->mutex_obj;
458 cvp->__has_user_waiters = _sleepq_remove(sq, td);
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460 if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461 _thr_wake_all(curthread->defer_waiters,
(gdb) p *td
Cannot access memory at address 0x0
And another one:
Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 cond_signal_common (cond=<optimized out>) at
/truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:459
warning: Source file is more recent than executable.
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
[Current thread is 1 (LWP 101105)]
(gdb) list
454 }
455
456 td = _sleepq_first(sq);
457 mp = td->mutex_obj;
458 cvp->__has_user_waiters = _sleepq_remove(sq, td);
459 if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460 if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461 _thr_wake_all(curthread->defer_waiters,
462 curthread->nwaiter_defer);
463 curthread->nwaiter_defer = 0;
(gdb) p *mp
Cannot access memory at address 0x0
I'm trying to write a program to stress-test threading (repeatedly tearing
down and recreating threads, and so on), but I've been unsuccessful at
triggering this particular problem. The end users who have seen this core dump
sometimes go a month or more without a problem. I'm hoping someone more
knowledgeable can at least give me a pointer or help me figure this one out. I
have access to a VM that has all the relevant core dumps available, so if
someone needs remote access to it to poke around, please let me know. You can
reach me at caleb [at] ixsystems.com
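
For reference, here is a minimal sketch of the kind of stress program
described above. It is not the program actually used and it is not known to
reproduce the crash. The function name churn_threads and its parameters are
hypothetical, and it is only an assumption that waiting on and notifying a
threading.Condition from short-lived threads exercises the same libthr
condition-variable path seen in the backtraces.

```python
import threading


def churn_threads(rounds=50, workers=8):
    """Repeatedly create and tear down threads that block on a Condition.

    Hypothetical helper: returns the number of worker wakeups observed,
    which should equal rounds * workers when every thread ran to completion.
    """
    wakeups = 0
    counter_lock = threading.Lock()

    for _ in range(rounds):
        cond = threading.Condition()
        ready = 0      # workers that have reached the wait loop this round
        go = False     # set by the main thread to release the workers

        def worker():
            nonlocal ready, wakeups
            with cond:
                ready += 1
                cond.notify_all()   # let the main thread re-check `ready`
                while not go:
                    cond.wait()
            with counter_lock:
                wakeups += 1

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()

        with cond:
            # Wait until every worker is parked, then release them all.
            while ready < workers:
                cond.wait()
            go = True
            cond.notify_all()

        # Tear the threads down before starting the next round.
        for t in threads:
            t.join()

    return wakeups


if __name__ == "__main__":
    print(churn_threads(rounds=1000, workers=32))
```

Raising rounds and workers, or running several copies at once, increases
thread churn. Given that affected systems can run for a month or more between
crashes, a clean run of a program like this says little about whether the bug
is present.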