[Bug 275597] Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap'
Date: Thu, 07 Dec 2023 10:55:41 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275597 Bug ID: 275597 Summary: Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap' Product: Ports & Packages Version: Latest Hardware: Any OS: Any Status: New Severity: Affects Only Me Priority: --- Component: Individual Port(s) Assignee: ports-bugs@FreeBSD.org Reporter: uratan@miomio.jp Created attachment 246852 --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=246852&action=edit dmesg output of evaluating 14.0R I started evaluating FreeBSD 14.0R(i386) for use as my home server. And I have noticed that Samba: smbd often aborts abnormally like below. +---------------------------- |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.286037, 0] ../../lib/util/fault.c:172(smb_panic_log) |Dec 4 13:06:58 oxygen smbd[10655]: =============================================================== |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307281, 0] ../../lib/util/fault.c:176(smb_panic_log) |Dec 4 13:06:58 oxygen smbd[10655]: INTERNAL ERROR: Signal 11: Segmentation fault in pid 10655 (4.16.11) |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307391, 0] ../../lib/util/fault.c:181(smb_panic_log) |Dec 4 13:06:58 oxygen smbd[10655]: If you are running a recent Samba version, and if you think this problem is not yet fixed in the latest versions, please consider reporting this bug, see https://wiki.samba.org/index.php/Bug_Reporting |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307505, 0] ../../lib/util/fault.c:182(smb_panic_log) |Dec 4 13:06:58 oxygen smbd[10655]: =============================================================== |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.307601, 0] ../../lib/util/fault.c:184(smb_panic_log) |Dec 4 13:06:58 oxygen smbd[10655]: PANIC (pid 10655): Signal 11: Segmentation fault in 4.16.11 |Dec 4 13:06:58 oxygen smbd[10655]: [2023/12/04 13:06:58.370441, 0] ../../lib/util/fault.c:245(log_stack_trace) |Dec 4 13:06:58 oxygen smbd[10655]: BACKTRACE: |Dec 4 13:06:58 oxygen smbd[10655]: #0 log_stack_trace + 0x43 [ip=0x203e84d3] [sp=0xffbfd35c] |Dec 4 13:06:58 oxygen smbd[10655]: #1 smb_panic_log + 0x6c [ip=0x203e835c] [sp=0xffbfd96c] |Dec 4 13:06:58 oxygen smbd[10655]: #2 smb_panic + 0x1a [ip=0x203e86da] [sp=0xffbfd980] |Dec 4 13:06:58 oxygen smbd[10655]: #3 fault_setup + 0xae [ip=0x203e82ee] [sp=0xffbfd994] |Dec 4 13:06:58 oxygen smbd[10655]: #4 _pthread_sigmask + 0x5c9 [ip=0x21162db9] [sp=0xffbfda2c] |Dec 4 13:06:59 oxygen smbd[10655]: #5 _pthread_setschedparam + 0x969 [ip=0x21162289] [sp=0xffbfdd40] |Dec 4 13:06:59 oxygen smbd[10655]: #6 <unknown symbol> [ip=0xffbff004] [sp=0xffbfdd80] |Dec 4 13:06:59 oxygen smbd[10655]: #7 vfs_readdirname + 0x56 [ip=0x2019ea46] [sp=0xffbfe15c] |Dec 4 13:06:59 oxygen smbd[10655]: #8 ReadDirName + 0x111 [ip=0x201396f1] [sp=0xffbfe18c] |Dec 4 13:06:59 oxygen smbd[10655]: #9 smbd_dirptr_get_entry + 0x161 [ip=0x20137f81] [sp=0xffbfe1c8] |Dec 4 13:06:59 oxygen smbd[10655]: #10 smbd_dirptr_lanman2_entry + 0x151 [ip=0x2016f231] [sp=0xffbfe350] |Dec 4 13:06:59 oxygen smbd[10655]: #11 smbd_smb2_request_process_query_directory + 0xd9b [ip=0x201dbadb] [sp=0xffbfe480] |Dec 4 13:06:59 oxygen smbd[10655]: #12 smbd_smb2_request_process_query_directory + 0x884 [ip=0x201db5c4] [sp=0xffbfe578] |Dec 4 13:06:59 oxygen smbd[10655]: #13 smbd_smb2_request_dispatch + 0xce4 [ip=0x201c4834] [sp=0xffbfe604] |Dec 4 13:06:59 oxygen smbd[10655]: #14 smbd_smb2_process_negprot + 0x244a [ip=0x201c9cba] [sp=0xffbfe678] |Dec 4 13:07:00 oxygen smbd[10655]: #15 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe6cc] |Dec 4 13:07:00 oxygen smbd[10655]: #16 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe6f4] |Dec 4 13:07:00 oxygen smbd[10655]: #17 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe740] |Dec 4 13:07:00 oxygen smbd[10655]: #18 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe768] |Dec 4 13:07:00 oxygen smbd[10655]: #19 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe784] |Dec 4 13:07:00 oxygen smbd[10655]: #20 smbd_process + 0x795 [ip=0x201b2205] [sp=0xffbfe798] |Dec 4 13:07:00 oxygen smbd[10655]: #21 main + 0x4670 [ip=0xe690] [sp=0xffbfe7fc] |Dec 4 13:07:00 oxygen smbd[10655]: #22 tevent_common_invoke_fd_handler + 0x8b [ip=0x2056b64b] [sp=0xffbfe8b4] |Dec 4 13:07:00 oxygen smbd[10655]: #23 tevent_context_same_loop + 0xdd2 [ip=0x2056e612] [sp=0xffbfe8dc] |Dec 4 13:07:00 oxygen smbd[10655]: #24 _tevent_loop_once + 0xcf [ip=0x2056a5af] [sp=0xffbfe924] |Dec 4 13:07:01 oxygen smbd[10655]: #25 tevent_common_loop_wait + 0x2f [ip=0x2056a7bf] [sp=0xffbfe94c] |Dec 4 13:07:01 oxygen smbd[10655]: #26 _tevent_loop_wait + 0x1c [ip=0x2056a84c] [sp=0xffbfe968] |Dec 4 13:07:01 oxygen smbd[10655]: #27 main + 0x2ab9 [ip=0xcad9] [sp=0xffbfe97c] |Dec 4 13:07:01 oxygen smbd[10655]: #28 main + 0x15e9 [ip=0xb609] [sp=0xffbfe9a0] |Dec 4 13:07:01 oxygen smbd[10655]: #29 __libc_start1 + 0x155 [ip=0x205e38c5] [sp=0xffbfebcc] |Dec 4 13:07:01 oxygen smbd[10655]: #30 _start + 0x36 [ip=0x9d06] [sp=0xffbfebf0] |Dec 4 13:07:01 oxygen smbd[10655]: [2023/12/04 13:07:01.547212, 0] ../../source3/lib/dumpcore.c:310(dump_core) |Dec 4 13:07:01 oxygen smbd[10655]: unable to change to %N.core |Dec 4 13:07:01 oxygen smbd[10655]: refusing to dump core +---------------------------- The various environments and configurations are as follows: (hostname is 'oxygen' and using i386 architecture) +---------------------------- |% uname -a |FreeBSD oxygen 14.0-RELEASE FreeBSD 14.0-RELEASE #3: Mon Nov 27 22:32:52 JST 2023 uratan@oxygen:/usr/src/sys/i386/compile/OXYGEN i386 | |% pkg info | grep samba |samba416-4.16.11 Free SMB/CIFS and AD/DC server and client for Unix +---------------------------- (essence of) /usr/local/etc/smb4.conf +---------------------------- |[global] | workgroup = METXXXXX | server string = Samba Server | local master = no | security = user | map to guest = Bad User | load printers = yes | printing = bsd | printcap cache time = 0 | unix charset = cp932 | dos charset = cp932 | server role = standalone server | hosts allow = 10. | guest account = nobody | log file = /var/log/samba4/log.%m | max log size = 50 | dns proxy = no | |[printers] | comment = All Printers | path = /var/spool/samba4 | browsable = no | guest ok = no | writeable = no | printable = yes | |[maindish] | comment = exported for PC | path = /home | vfs objects = cap | browseable = yes | guest ok = yes | guest only = yes | writeable = yes | create mask = 0775 | directory mask = 0775 +---------------------------- (essence of) /etc/rc.conf +---------------------------- |samba_server_enable="YES" |nmbd_enable="NO" |smbd_enable="YES" |winbindd_enable="NO" +---------------------------- Confirm attached file: dmesg-today.txt for misc hardware environments. note: re0 driver is replaced by RealTek's to avoid 'watchdog timeout'. (rtl_bsd_drv_v199.04.tgz from https://www.realtek.com/ja/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software ) It should have nothing to do with this Samba problem because, when using GENERIC kernel at first phase, samba PANIC and re0: watchdog timeout were both seen in kernel messages. I suspected watchdog timeout was the core cause at first, but it wasn't. (maybe) I am using windows7 (also x86/32bit version) as the test client, and using win32 native executable diff.exe from UnxUtils in the subsequent reports. (see https://unxutils.sourceforge.net/ for UnxUtils) +---------------------------- |U:\> ver |Microsoft Windows [Version 6.1.7601] | |U:\> diff.exe --version |diff - GNU diffutils version 2.7 | |U:\> net use |-------------------------------------------------------- |OK I: \\silver\maindish Microsoft Wi... <== 10.2R server (samba36) |OK J: \\oxygen\maindish Microsoft Wi... <== 14.0R (samba416) | +---------------------------- hostname 'silver' is my current home server, running with FreeBSD 10.2R (i386). +---------------------------- |% uname -a |FreeBSD silver 10.2-RELEASE FreeBSD 10.2-RELEASE #1: Sun Mar 12 09:07:49 JST 2017 uratan@silver:/usr/src/sys/i386/compile/OXYGEN i386 | |% pkg info | grep samba |samba36-3.6.25_1 Free SMB and CIFS client and server for Unix +---------------------------- (10.2R ? Win7 ? i386 ? The reason why various things are all outdated) (is because I am an old man also outdated who strongly likes stability.) - * - * - Show the error situation at first. (Please note the difference in drive letter i: or j:) "x:\win\Hardwares" has: 1454 dirs 9644 files 144GB in total. These errors reported are caused by smbd PANIC abort. The error files/dirs are different for each run. There is NO error that says the file contents are different. +---------------------------- |U:\> diff -rq i:\win\Hardwares j:\win\Hardwares |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or directory |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument | |U:\> diff -rq j:\win\Hardwares i:\win\Hardwares |diff: j:\win\Hardwares\ATerm\someDir_88: Invalid argument |diff: j:\win\Hardwares\N3x50B\someDir_4444\igdlh.cat: No such file or directory |diff: j:\win\Hardwares\ScanSnap\someDir_6666: Invalid argument |diff: j:\win\Hardwares\T100Chi\someDir_777: Invalid argument +---------------------------- - * - * - I also tried another version of samba: samba413-4.13.17_6.pkg but the situation was the same, smbd aborts, too. I analyzed the PANIC abort logs left in the /var/log/messages from beginning of the evaluation to today. There are 54 logs in total by both version of samba. See the summary of BACKTRACEs below, the intermediate flow is slightly different, but the flow to PANIC abort seems to be almost same. The <unknown symbol> is very suspicious and, I think, the failure of ReadDirName() or vfs_readdirname() are not so far, I think, from the errors reported by diff.exe: "Invalid argument" or "No such file or directory". ----------------------------------------------------------------------------- | 25 times by samba416 | 29 times by samba413 ----25times----------------COMMON FLOW----------29times--------------------- | _start + 0x36 | _start + 0x36 | __libc_start1 + 0x155 | __libc_start1 + 0x155 | main + 0x15e9 | main + 0x199d | main + 0x2ab9 | main + 0x2db9 | _tevent_loop_wait + 0x1c | _tevent_loop_wait + 0x1c | tevent_common_loop_wait + 0x2f | tevent_common_loop_wait + 0x2f | _tevent_loop_once + 0xcf | _tevent_loop_once + 0xcf | tevent_context_same_loop + 0xdd2 | tevent_context_same_loop + 0xdd2 | tevent_common_invoke_fd_handler + 0x8b | tevent_common_invoke_fd_handler + 0x8b | main + 0x4670 | main + 0x4925 | smbd_process + 0x795 | smbd_process + 0x79b | _tevent_loop_wait + 0x1c | _tevent_loop_wait + 0x1c | tevent_common_loop_wait + 0x2f | tevent_common_loop_wait + 0x2f | _tevent_loop_once + 0xcf | _tevent_loop_once + 0xcf | tevent_context_same_loop + 0xdd2 | tevent_context_same_loop + 0xdd2 | tevent_common_invoke_fd_handler + 0x8b | tevent_common_invoke_fd_handler + 0x8b | smbd_smb2_process_negprot + 0x244a | smbd_smb2_process_negprot + 0x237a | smbd_smb2_request_dispatch + 0xXXX | smbd_smb2_request_dispatch + 0xXXX ----15times----------------FLOW-CASE-A-----------6times--------------------- A smbd_smb2_request_process_create ( smbd_smb2_request_process_create A smbd_smb2_request_process_create ( smbd_smb2_request_process_create A filename_convert ( filename_convert A ( filename_convert A unix_convert 2 ( unix_convert A get_real_filename_full_scan ( get_real_filename_full_scan ----10times----------------FLOW-CASE-B----------23times--------------------- B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory B smbd_smb2_request_process_query_directory( smbd_smb2_request_process_query_directory B smbd_dirptr_lanman2_entry ( smbd_dirptr_lanman2_entry B smbd_dirptr_get_entry ( smbd_dirptr_get_entry B ( can_delete_directory_fsp ----25times----------------COMMON FLOW----------29times--------------------- | ReadDirName + 0x111 | ReadDirName + 0x10b | vfs_readdirname + 0x56 | vfs_readdirname + 0x53 | <unknown symbol> | <unknown symbol> | _pthread_setschedparam + 0x969 | _pthread_setschedparam + 0x969 | _pthread_sigmask + 0x5c9 | _pthread_sigmask + 0x5c9 | fault_setup + 0xae | fault_setup + 0xae | smb_panic + 0x1a | smb_panic + 0x1a | smb_panic_log + 0x6c | smb_panic_log + 0x6c | log_stack_trace + 0x43 | log_stack_trace + 0x43 ----------------------------------------------------------------------------- - * - * - To narrow down the factors, I added one more share for debug to /usr/local/etc/smb4.conf. This share is same as [maindish] except the configuration "vfs object". (and "comment") +---------------------------- |[m2-test] | comment = test | path = /home |; vfs objects = cap | browseable = yes | guest ok = yes | guest only = yes | writeable = yes | create mask = 0775 | directory mask = 0775 +---------------------------- Now they are seen like below from Win7 client. +---------------------------- |U:\> net use |-------------------------------------------------------- |OK I: \\silver\maindish Microsoft Wi... <== 10.2R server (samba36) |OK J: \\oxygen\maindish Microsoft Wi... <== 14.0R (samba416, vfs object=cap) |OK K: \\oxygen\m2-test Microsoft Wi... <== 14.0R (samba416, NO vfs object) +---------------------------- See the result of the test below. "x:\win\Hardwares\N3x50B" has: 622 dirs 3356 files 3.1GB in total. (I chose an area that did not contain Japanese file names.) (Please note the difference in drive letter i:, j: or k:) There, No error is reported when using drive k:. So 'vfs objects = cap' has the key of the BUG I think. I've gotten the obvious response and ... an unfortunate fact that a error reported from drive i: !!! "WAO NANTTE KOTTAI" (means "Oh my god !" or "what's the hell" in Japanese) +---------------------------- |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B |diff: j:\win\Hardwares\N3x50B\someDir_22: Invalid argument |diff: j:\win\Hardwares\N3x50B\someDir_333: Invalid argument | |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument | |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B | |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B | |U:\> diff -rq i:\win\Hardwares\N3x50B j:\win\Hardwares\N3x50B |diff: j:\win\Hardwares\N3x50B\someDir_1: Invalid argument **|diff: i:\win\Hardwares\N3x50B\someDir_22: Invalid argument |diff: j:\win\Hardwares\N3x50B\someDir_333\libmfxhw32.dll: No such file or directory |diff: j:\win\Hardwares\N3x50B\someDir_55555\igdusc32.dll: No such file or directory | |U:\> diff -rq i:\win\Hardwares\N3x50B k:\win\Hardwares\N3x50B |U:\> +---------------------------- - * - * - I trusted my current home server silver(10.2R) because it works well for about 8 years. However, 74 smbd abortion logs are found in the /var/log/messages and "Subject: silver daily security run output" mails to root. But I have never felt that my files were lost. +---------------------- |2016-03-21 03:01:25 (smbd) on signal 6 : : **|2023-12-05 22:21:02 silver kernel: pid 90073 (smbd), uid 0: exited on signal 6 +---------------------- The configuration of samba36 is almost same as samba416 because smb4.conf inherited smb.conf of smaba36, except mainly below: +---------------------- smb4.conf | security = user | map to guest = Bad User +---------------------- smb.conf for samba36 on silver | security = share +---------------------- Also, silver is running on same motherboard to oxygen. (except number of CPU core, silver is 2 and oxygen is 4) - * - * - Aside from that, I have issued same test with replacing charset configuration to: +---------------------------- | unix charset = ascii | dos charset = ascii +---------------------------- The result is same, smbd aborts only when "vfs objects = cap". - * - * - CONCLUSION I think... Samba smbd with configuration 'vfs objects = cap' must have a bug near vfs_readdirname() from version 3.xx potentially, and it is exposed when some timing condition is match. The cap function is necessary for for my home server. Someone help me ! - * - * - From here, I would like to state my feelings, etc at last. (a) Why "I have never felt that my files were lost." ? Because it seems that Explorer.exe retries so seriously against any errors in the copy operation both src/dst direction. (So this problem does not become so critical because of this.) (Actually, it may not do any real harm to me either....except) (I'll be surprised when I look at the /var/log/messages....) (b) This PANIC abort occurs rather high frequency when: (b1) just after starting copy from Expoloer.exe (b2) issuing "Properties" from Explorer.exe (It seems that, Unlike in the case of copy, NO retries are) (made while "Properties", so it reports less capacities or ) (number of files if the PANIC abort occurs while counting up) In both case, Explorer.exe scans only directories to gather infos without access to the file contents itself. High density of the repeated request to the directories is the trigger of this problem, I think. For example... like this... +---------------------------- | initial_jobs(); | while(1) { | /* ready */ | wait_for_request(); | exec_the_request(); | answer_the_request(); /* because the client gets answer at here */ | post_jobs_for_ready(); /* next request may be arrived while here */ | } /* before returning to ready... */ +---------------------------- p.s. I noticed just NOW, that the depth of the directory hierarchy might have something to do with the problem, so I have also attached a file: list-of-err-files.txt which has a list of the actual full path names where the error occurred. -- You are receiving this mail because: You are the assignee for the bug.