[Bug 275597] Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap'

From: <bugzilla-noreply_at_freebsd.org>
Date: Thu, 07 Dec 2023 10:55:41 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275597

            Bug ID: 275597
           Summary: Samba: smbd sometimes aborts by PANIC when 'vfs
                    objects = cap'
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: ports-bugs@FreeBSD.org
          Reporter: uratan@miomio.jp

Created attachment 246852
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246852&action=edit
dmesg output of evaluating 14.0R

I started evaluating FreeBSD 14.0-RELEASE (i386) for use as my home server,
and I have noticed that Samba's smbd often aborts abnormally, as below.
  +----------------------------
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.286037,  0] ../../lib/util/fault.c:172(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:  ===============================================================
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307281,  0] ../../lib/util/fault.c:176(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   INTERNAL ERROR: Signal 11: Segmentation fault in pid 10655 (4.16.11)
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307391,  0] ../../lib/util/fault.c:181(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307505,  0] ../../lib/util/fault.c:182(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:  ===============================================================
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307601,  0] ../../lib/util/fault.c:184(smb_panic_log)
  |Dec  4 13:06:58 oxygen smbd[10655]:   PANIC (pid 10655): Signal 11: Segmentation fault in 4.16.11
  |Dec  4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.370441,  0] ../../lib/util/fault.c:245(log_stack_trace)
  |Dec  4 13:06:58 oxygen smbd[10655]:   BACKTRACE:
  |Dec  4 13:06:58 oxygen smbd[10655]:    #0 log_stack_trace + 0x43 [ip=0x203e84d3] [sp=0xffbfd35c]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #1 smb_panic_log + 0x6c [ip=0x203e835c] [sp=0xffbfd96c]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #2 smb_panic + 0x1a [ip=0x203e86da] [sp=0xffbfd980]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #3 fault_setup + 0xae [ip=0x203e82ee] [sp=0xffbfd994]
  |Dec  4 13:06:58 oxygen smbd[10655]:    #4 _pthread_sigmask + 0x5c9 [ip=0x21162db9] [sp=0xffbfda2c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #5 _pthread_setschedparam + 0x969 [ip=0x21162289] [sp=0xffbfdd40]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #6 <unknown symbol> [ip=0xffbff004] [sp=0xffbfdd80]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #7 vfs_readdirname + 0x56 [ip=0x2019ea46] [sp=0xffbfe15c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #8 ReadDirName + 0x111 [ip=0x201396f1] [sp=0xffbfe18c]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #9 smbd_dirptr_get_entry + 0x161 [ip=0x20137f81] [sp=0xffbfe1c8]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #10 smbd_dirptr_lanman2_entry + 0x151 [ip=0x2016f231] [sp=0xffbfe350]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #11 smbd_smb2_request_process_query_directory + 0xd9b [ip=0x201dbadb] [sp=0xffbfe480]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #12 smbd_smb2_request_process_query_directory + 0x884 [ip=0x201db5c4] [sp=0xffbfe578]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #13 smbd_smb2_request_dispatch + 0xce4 [ip=0x201c4834] [sp=0xffbfe604]
  |Dec  4 13:06:59 oxygen smbd[10655]:    #14 smbd_smb2_process_negprot + 0x244a [ip=0x201c9cba] [sp=0xffbfe678]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #15 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe6cc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #16 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe6f4]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #17 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe740]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #18 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe768]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #19 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe784]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #20 smbd_process + 0x795 [ip=0x201b2205] [sp=0xffbfe798]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #21 main + 0x4670 [ip=0xe690] [sp=0xffbfe7fc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #22 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe8b4]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #23 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe8dc]
  |Dec  4 13:07:00 oxygen smbd[10655]:    #24 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe924]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #25 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe94c]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #26 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe968]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #27 main + 0x2ab9 [ip=0xcad9] [sp=0xffbfe97c]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #28 main + 0x15e9 [ip=0xb609] [sp=0xffbfe9a0]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #29 __libc_start1 + 0x155 [ip=0x205e38c5] [sp=0xffbfebcc]
  |Dec  4 13:07:01 oxygen smbd[10655]:    #30 _start + 0x36 [ip=0x9d06] [sp=0xffbfebf0]
  |Dec  4 13:07:01 oxygen smbd[10655]: [2023/12/04 13:07:01.547212,  0] ../../source3/lib/dumpcore.c:310(dump_core)
  |Dec  4 13:07:01 oxygen smbd[10655]:   unable to change to %N.core
  |Dec  4 13:07:01 oxygen smbd[10655]:   refusing to dump core
  +----------------------------

The various environments and configurations are as follows
 (hostname is 'oxygen', on the i386 architecture):
  +----------------------------
  |% uname -a
  |FreeBSD oxygen 14.0-RELEASE FreeBSD 14.0-RELEASE #3: Mon Nov 27 22:32:52 JST 2023     uratan@oxygen:/usr/src/sys/i386/compile/OXYGEN i386
  |
  |% pkg info | grep samba
  |samba416-4.16.11     Free SMB/CIFS and AD/DC server and client for Unix
  +----------------------------

  (essence of) /usr/local/etc/smb4.conf
  +----------------------------
  |[global]
  |   workgroup = METXXXXX
  |   server string = Samba Server
  |    local master = no
  |    security     = user
  |    map to guest = Bad User
  |    load printers = yes
  |    printing      = bsd
  |    printcap cache time = 0
  |    unix charset = cp932
  |    dos  charset = cp932
  |   server role = standalone server
  |    hosts allow = 10.
  |    guest account = nobody
  |   log file = /var/log/samba4/log.%m
  |   max log size = 50
  |   dns proxy = no
  |
  |[printers]
  |   comment = All Printers
  |   path = /var/spool/samba4
  |   browsable = no
  |   guest ok = no
  |   writeable = no
  |   printable = yes
  |
  |[maindish]
  |    comment = exported for PC
  |    path    = /home
  |   vfs objects = cap
  |    browseable = yes
  |    guest ok   = yes
  |    guest only = yes
  |    writeable  = yes
  |    create    mask = 0775
  |    directory mask = 0775
  +----------------------------
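
For context on what the module in question does: as I understand it, vfs_cap
stores file names on disk in CAP encoding, where each byte with the high bit
set (as produced by cp932 names) becomes a ':xx' hex escape, and decodes them
again when the directory is listed. A minimal sketch of that scheme follows;
this is my own illustration under that assumption, not Samba's actual
vfs_cap.c:

```python
# Illustrative sketch of CAP-style name encoding (NOT Samba's vfs_cap.c).
# Assumption: bytes >= 0x80 are escaped as ':xx'; everything else is literal.

def cap_encode(name: bytes) -> str:
    """Encode raw file-name bytes into a CAP-style ASCII string."""
    out = []
    for b in name:
        if b & 0x80:                  # high-bit byte, e.g. half of a cp932 kanji
            out.append(':%02x' % b)
        else:
            out.append(chr(b))
    return ''.join(out)

def cap_decode(name: str) -> bytes:
    """Decode a CAP-style string back into the original bytes.

    Kept minimal: assumes every ':' starts a well-formed ':xx' escape.
    """
    out = bytearray()
    i = 0
    while i < len(name):
        if name[i] == ':' and i + 3 <= len(name):
            out.append(int(name[i+1:i+3], 16))
            i += 3
        else:
            out.append(ord(name[i]))
            i += 1
    return bytes(out)

# A cp932 'あ' (0x82 0xa0) round-trips through the on-disk form ':82:a0'.
raw = 'あ'.encode('cp932')
print(cap_encode(raw))
print(cap_decode(cap_encode(raw)) == raw)
```

If this matches what the module really does, every ReadDirName() on a cap
share has to run names through a decode/encode pass like the one above, which
is why the crash site sitting right next to vfs_readdirname() is interesting.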

  (essence of) /etc/rc.conf
  +----------------------------
  |samba_server_enable="YES"
  |nmbd_enable="NO"
  |smbd_enable="YES"
  |winbindd_enable="NO"
  +----------------------------

See the attached file dmesg-today.txt for the miscellaneous hardware
environment.
note: the re0 driver has been replaced with RealTek's to avoid
'watchdog timeout' errors.
 (rtl_bsd_drv_v199.04.tgz from
  https://www.realtek.com/ja/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software )
This should have nothing to do with the Samba problem: when I used the
GENERIC kernel in the first phase, both the Samba PANIC and the
're0: watchdog timeout' messages appeared in the kernel log.
At first I suspected the watchdog timeout was the root cause,
but it apparently was not.

I am using Windows 7 (also the x86/32-bit version) as the test client,
and the Win32 native executable diff.exe from UnxUtils
in the reports that follow.
 (see https://unxutils.sourceforge.net/ for UnxUtils)
  +----------------------------
  |U:\> ver
  |Microsoft Windows [Version 6.1.7601]
  |
  |U:\> diff.exe --version
  |diff - GNU diffutils version 2.7
  |
  |U:\> net use
  |--------------------------------------------------------
  |OK  I:  \\silver\maindish  Microsoft Wi...  <== 10.2R server (samba36)
  |OK  J:  \\oxygen\maindish  Microsoft Wi...  <== 14.0R (samba416)
  |
  +----------------------------

Hostname 'silver' is my current home server,
running FreeBSD 10.2-RELEASE (i386).
  +----------------------------
  |% uname -a
  |FreeBSD silver 10.2-RELEASE FreeBSD 10.2-RELEASE #1: Sun Mar 12 09:07:49 JST 2017     uratan@silver:/usr/src/sys/i386/compile/OXYGEN  i386
  |
  |% pkg info | grep samba
  |samba36-3.6.25_1     Free SMB and CIFS client and server for Unix
  +----------------------------

 (10.2R? Win7? i386? The reason everything is so outdated)
 (is that I am an old man, also outdated, who strongly prefers stability.)

 - * - * -

First, the error situation.
 (Please note the difference in drive letter, i: or j:.)
"x:\win\Hardwares" contains: 1454 dirs, 9644 files, 144 GB in total.

The errors reported below are caused by the smbd PANIC abort.
The affected files/dirs differ on each run.
There is NO error saying that file contents differ.
  +----------------------------
  |U:\> diff -rq i:\win\Hardwares j:\win\Hardwares
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or
directory
  |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument
  |
  |U:\> diff -rq j:\win\Hardwares i:\win\Hardwares
  |diff: j:\win\Hardwares\ATerm\someDir_88: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or
directory
  |diff: j:\win\Hardwares\ScanSnap\someDir_6666: Invalid argument
  |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument
  +----------------------------

 - * - * -

I also tried another version of Samba, samba413-4.13.17_6.pkg,
but the situation was the same: smbd aborts, too.

I analyzed the PANIC abort logs left in /var/log/messages
from the beginning of the evaluation until today.
There are 54 logs in total across both versions of Samba.
See the summary of BACKTRACEs below: the intermediate flow differs
slightly, but the path to the PANIC abort looks almost the same.

The <unknown symbol> frame is very suspicious, and I think a failure
in ReadDirName() or vfs_readdirname() is not far removed from the
errors reported by diff.exe:
"Invalid argument" or "No such file or directory".
  -----------------------------------------------------------------------------
  |    25 times by samba416                  |    29 times by samba413
   ----25times----------------COMMON FLOW----------29times---------------------
  | _start + 0x36                            | _start + 0x36
  | __libc_start1 + 0x155                    | __libc_start1 + 0x155
  | main + 0x15e9                            | main + 0x199d
  | main + 0x2ab9                            | main + 0x2db9
  | _tevent_loop_wait + 0x1c                 | _tevent_loop_wait + 0x1c
  | tevent_common_loop_wait + 0x2f           | tevent_common_loop_wait + 0x2f
  | _tevent_loop_once + 0xcf                 | _tevent_loop_once + 0xcf
  | tevent_context_same_loop + 0xdd2         | tevent_context_same_loop + 0xdd2
  | tevent_common_invoke_fd_handler + 0x8b   | tevent_common_invoke_fd_handler + 0x8b
  | main + 0x4670                            | main + 0x4925
  | smbd_process + 0x795                     | smbd_process + 0x79b
  | _tevent_loop_wait + 0x1c                 | _tevent_loop_wait + 0x1c
  | tevent_common_loop_wait + 0x2f           | tevent_common_loop_wait + 0x2f
  | _tevent_loop_once + 0xcf                 | _tevent_loop_once + 0xcf
  | tevent_context_same_loop + 0xdd2         | tevent_context_same_loop + 0xdd2
  | tevent_common_invoke_fd_handler + 0x8b   | tevent_common_invoke_fd_handler + 0x8b
  | smbd_smb2_process_negprot + 0x244a       | smbd_smb2_process_negprot + 0x237a
  | smbd_smb2_request_dispatch + 0xXXX       | smbd_smb2_request_dispatch + 0xXXX
   ----15times----------------FLOW-CASE-A-----------6times---------------------
   A smbd_smb2_request_process_create         ( smbd_smb2_request_process_create
   A smbd_smb2_request_process_create         ( smbd_smb2_request_process_create
   A filename_convert                         ( filename_convert
   A                                          ( filename_convert
   A unix_convert        2                    ( unix_convert
   A get_real_filename_full_scan              ( get_real_filename_full_scan
   ----10times----------------FLOW-CASE-B----------23times---------------------
 B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory
 B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory
 B smbd_dirptr_lanman2_entry                ( smbd_dirptr_lanman2_entry
 B smbd_dirptr_get_entry                    ( smbd_dirptr_get_entry
 B                                          ( can_delete_directory_fsp
   ----25times----------------COMMON FLOW----------29times---------------------
  | ReadDirName + 0x111                      | ReadDirName + 0x10b
  | vfs_readdirname + 0x56                   | vfs_readdirname + 0x53
  | <unknown symbol>                         | <unknown symbol>
  | _pthread_setschedparam + 0x969           | _pthread_setschedparam + 0x969
  | _pthread_sigmask + 0x5c9                 | _pthread_sigmask + 0x5c9
  | fault_setup + 0xae                       | fault_setup + 0xae
  | smb_panic + 0x1a                         | smb_panic + 0x1a
  | smb_panic_log + 0x6c                     | smb_panic_log + 0x6c
  | log_stack_trace + 0x43                   | log_stack_trace + 0x43
  -----------------------------------------------------------------------------
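
The <unknown symbol> frame sitting between vfs_readdirname and the
fault-handler frames looks to me like a call through a stale or unset function
pointer. Purely as an illustration of that failure mode (a hypothetical
dispatch table of my own, not Samba's real VFS layer): the crash happens at the
indirect call itself, so the backtrace has no symbol for that frame.

```python
# Hypothetical illustration of calling through a bad dispatch-table slot.
# In C the indirect call would be a SIGSEGV with no resolvable symbol;
# in Python the same mistake surfaces as a TypeError instead.

base_vfs = {
    'open':    lambda path: 'opened ' + path,
    'readdir': lambda dirp: ['file1', 'file2'],   # normal handler
}

# A fictional module layered on top that leaves 'readdir' unset.
broken_vfs = dict(base_vfs)
broken_vfs['readdir'] = None                      # slot unset / clobbered

def vfs_readdirname(vfs, dirp):
    # The caller trusts the table and calls through it unconditionally,
    # analogous to an indirect call through a C function-pointer struct.
    return vfs['readdir'](dirp)

print(vfs_readdirname(base_vfs, '/home'))         # fine: ['file1', 'file2']
try:
    vfs_readdirname(broken_vfs, '/home')          # "crashes" at the indirect call
except TypeError as err:
    print('crashed:', err)
```

Whether that is really what happens inside smbd I cannot say; it just matches
the shape of the backtraces above.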

 - * - * -

To narrow down the factors, I added one more share,
for debugging, to /usr/local/etc/smb4.conf.
This share is the same as [maindish] except for the
"vfs objects" line (and the "comment").
  +----------------------------
  |[m2-test]
  |    comment = test
  |    path    = /home
  |;   vfs objects = cap
  |    browseable = yes
  |    guest ok   = yes
  |    guest only = yes
  |    writeable  = yes
  |    create    mask = 0775
  |    directory mask = 0775
  +----------------------------

Now the shares are seen as below from the Win7 client.
  +----------------------------
  |U:\> net use
  |--------------------------------------------------------
  |OK  I:  \\silver\maindish  Microsoft Wi...  <== 10.2R server (samba36)
  |OK  J:  \\oxygen\maindish  Microsoft Wi...  <== 14.0R (samba416, vfs object=cap)
  |OK  K:  \\oxygen\m2-test   Microsoft Wi...  <== 14.0R (samba416, NO vfs object)
  +----------------------------

See the result of the test below.
"x:\win\Hardwares\N3x50B" contains: 622 dirs, 3356 files, 3.1 GB in total.
(I chose an area that does not contain Japanese file names.)
 (Please note the difference in drive letter: i:, j: or k:.)

No error is reported when using drive k:,
so I think 'vfs objects = cap' holds the key to this bug.
I got the clear-cut response I was hoping for, and ...
the unfortunate fact that an error was also reported from drive i:!
"WAO NANTTE KOTTAI" (Japanese for "Oh my god!" or "What the hell!")
  +----------------------------
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_22: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_333: Invalid argument
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B
  |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument
**|diff: i:\win\Hardwares\N3x50B\someDir_22: Invalid argument
  |diff: j:\win\Hardwares\N3x50B\someDir_333\libmfxhw32.dll: No such file or directory
  |diff: j:\win\Hardwares\N3x50B\someDir_55555\igdusc32.dll: No such file or directory
  |
  |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B
  |U:\>
  +----------------------------

 - * - * -

I trusted my current home server silver (10.2R) because
it has worked well for about 8 years.
However, 74 smbd abort logs were found in /var/log/messages
and in the "Subject: silver daily security run output" mails to root.
Even so, I have never felt that my files were lost.
  +----------------------
  |2016-03-21  03:01:25 (smbd) on signal 6
    :             :
**|2023-12-05  22:21:02 silver kernel: pid 90073 (smbd), uid 0: exited on signal 6
  +----------------------

The samba36 configuration is almost the same as the samba416 one, because
smb4.conf inherited smb.conf from samba36, except mainly the following:
  +---------------------- smb4.conf
  |    security     = user
  |    map to guest = Bad User
  +---------------------- smb.conf for samba36 on silver
  |    security = share
  +----------------------

Also, silver runs on the same motherboard as oxygen.
 (except for the number of CPU cores: silver has 2, oxygen has 4)

 - * - * -

Aside from that, I ran the same test with
the charset configuration replaced by:
  +----------------------------
  |    unix charset = ascii
  |    dos  charset = ascii
  +----------------------------
The result was the same: smbd aborts only when "vfs objects = cap".

 - * - * -

CONCLUSION

I think smbd with the configuration 'vfs objects = cap' has had a
latent bug near vfs_readdirname() since version 3.xx, and it is
exposed when some timing condition is met.


The cap function is necessary for my home server.
Someone help me!

 - * - * -

Finally, I would like to state my impressions, etc.

(a) Why have I "never felt that my files were lost"?
  Because Explorer.exe seems to retry very persistently against
  any errors during a copy operation, in both src/dst directions.
   (So this problem does not become so critical because of this.)
      (Actually, it may not do any real harm to me either ... except)
      (that I am surprised whenever I look at /var/log/messages ...)

(b) This PANIC abort occurs rather frequently when:
  (b1) just after starting a copy from Explorer.exe
  (b2) issuing "Properties" from Explorer.exe
        (Unlike in the case of a copy, NO retries are made for)
        ("Properties", so it reports smaller capacities or file)
        (counts if the PANIC abort occurs while counting up.)

  In both cases, Explorer.exe scans only directories to gather
  information, without accessing the file contents themselves.
  I think the high density of repeated requests to the directories
  is the trigger of this problem.

  For example... like this:
  +----------------------------
  | initial_jobs();
  | while (1) {
  |   /* ready */
  |   wait_for_request();
  |   exec_the_request();
  |   answer_the_request();    /* the client receives its answer here */
  |   post_jobs_for_ready();   /* the next request may arrive here, */
  | }                          /* before we return to the ready state */
  +----------------------------
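
The loop above can be made concrete with a small single-threaded model (my own
sketch, not Samba code). The point it illustrates: because the answer is sent
before the post-processing step, a fast client can already have its next
request queued while the server is still cleaning up after the previous one.

```python
# Single-threaded model of the request loop sketched above (not Samba code).
# The answer is sent BEFORE post_jobs_for_ready(), so the next request can
# already be waiting while the server is still in its post-processing.

events = []
pending = ['QUERY_DIRECTORY #1', 'QUERY_DIRECTORY #2']  # #2 arrives instantly

def serve_one():
    req = pending.pop(0)
    events.append('exec ' + req)
    events.append('answer ' + req)           # the client is unblocked here
    if pending:                              # the next request is already queued...
        events.append('next request queued during post-jobs of ' + req)
    events.append('post-jobs after ' + req)  # ...while we are still cleaning up

serve_one()
serve_one()
for e in events:
    print(e)
```

In this model the second request is visible before the first request's
post-jobs finish, which is the window the reporter suspects.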

p.s.
 I noticed just NOW that the depth of the directory hierarchy might have
 something to do with the problem, so I have also attached a file,
   list-of-err-files.txt
 which lists the actual full path names where the errors occurred.
