[Bug 260146] NFS client (13.0p5) to NFS server (12.2p6), hangs and unkillable processes

From: <bugzilla-noreply_at_freebsd.org>
Date: Tue, 30 Nov 2021 19:53:35 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260146

            Bug ID: 260146
           Summary: NFS client (13.0p5) to NFS server (12.2p6), hangs and
                    unkillable processes
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: pi@FreeBSD.org

NFS client (13.0p5) to NFS server (12.2p6), hangs and unkillable processes

When trying to pinpoint the processes involved using fstat, the fstat process
itself hangs and cannot be killed. It is stuck here:
mi_switch+0xc1 sleeplk+0xec lockmgr_slock_hard+0x382 nfs_lock+0x2c
vop_sigdefer+0x2b vn_fill_kinfo_vnode+0xd5 export_vnode_to_sb+0x84
kern_proc_filedesc_out+0x1ee sysctl_kern_proc_filedesc+0x7d
sysctl_root_handler_locked+0x91 sysctl_root+0x24c userland_sysctl+0x173
sys___sysctl+0x5f amd64_syscall+0x10c fast_syscall_common+0xf8 

nfsdumpstate on both server and client does not show anything.

- tcpdump sample between server (12.2p6) and client (13.0p5), taken on server:
20:38:29.912755 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6),
length 176)
    <client>.1015 > <server>.2049: Flags [P.], cksum 0x95b3 (correct), seq
1984:2108, ack 1857, win 4353, options [nop,nop,TS val 1900843486 ecr
3042085228], length 124: NFS request xid 1920691674 120 getattr fh
Unknown/ED995FFBDE3515680A00080000000000B3310E000000000000000000
        0x0000:  4500 00b0 0000 4000 4006 0b5b d447 c32d  E.....@.@..[.G.-
        0x0010:  d447 c330 03f7 0801 61be 799f 625c c9d9  .G.0....a.y.b\..
        0x0020:  8018 1101 95b3 0000 0101 080a 714c 91de  ............qL..
        0x0030:  b552 896c 8000 0078 727b 6dda 0000 0000  .R.l...xr{m.....
        0x0040:  0000 0002 0001 86a3 0000 0003 0000 0001  ................
        0x0050:  0000 0001 0000 0030 61a6 30b7 0000 0018  .......0a.0.....
        0x0060:  xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx  xxxxxxxxxxxxxxxx
        0x0070:  xxxx xxxx xxxx xxxx 0000 03e8 0000 03e8  xxxxxxxx........
        0x0080:  0000 0001 0000 0000 0000 0000 0000 0000  ................
        0x0090:  0000 001c ed99 5ffb de35 1568 0a00 0800  ......_..5.h....
        0x00a0:  0000 0000 b331 0e00 0000 0000 0000 0000  .....1..........
20:38:29.912762 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6),
length 168)
    <server>.2049 > <client>.1015: Flags [P.], cksum 0xce36 (correct), seq
1857:1973, ack 2108, win 29128, options [nop,nop,TS val 3042085228 ecr
1900843486], length 116: NFS reply xid 1920691674 reply ok 112 getattr DIR 755
ids 0/0 sz 15
        0x0000:  4500 00a8 0000 4000 4006 0b63 d447 c330  E.....@.@..c.G.0
        0x0010:  d447 c32d 0801 03f7 625c c9d9 61be 7a1b  .G.-....b\..a.z.
        0x0020:  8018 71c8 ce36 0000 0101 080a b552 896c  ..q..6.......R.l
        0x0030:  714c 91de 8000 0070 727b 6dda 0000 0001  qL.....pr{m.....
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0002 0000 01ed 0000 000f  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 000f  ................
        0x0070:  0000 0000 0000 3000 0000 0000 0000 0008  ......0.........
        0x0080:  0000 0000 fb5f 99ed 0000 0000 0000 0008  ....._..........
        0x0090:  5878 e8b4 39f5 8258 5824 2e46 0000 0000  Xx..9..XX$.F....
        0x00a0:  5878 e8b4 39f5 8a28                      Xx..9..(
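
As a cross-check of the capture above, the ONC RPC headers can be decoded by hand from the hex dump. The following is a minimal sketch and not part of the original report. It assumes the RPC record starts at offset 0x34 in each packet (20-byte IP header plus 32-byte TCP header including the timestamp option), and all variable names are illustrative.

```python
import struct

# Bytes 0x34..0x4f of the request packet (client -> server), copied from the dump.
request = bytes.fromhex(
    "80000078"  # record mark: last-fragment bit + fragment length
    "727b6dda"  # xid
    "00000000"  # message type (0 = CALL)
    "00000002"  # RPC version
    "000186a3"  # program number
    "00000003"  # program version
    "00000001"  # procedure number
)

# Bytes 0x34..0x3f of the reply packet (server -> client), copied from the dump.
reply = bytes.fromhex(
    "80000070"  # record mark
    "727b6dda"  # xid
    "00000001"  # message type (1 = REPLY)
)


def split_record_mark(word):
    """Split an RPC-over-TCP record mark into (is_last_fragment, length)."""
    return bool(word >> 31), word & 0x7FFFFFFF


# All fields are big-endian unsigned 32-bit integers (XDR encoding).
mark, xid, msg_type, rpcvers, prog, vers, proc = struct.unpack(">7I", request)
last, length = split_record_mark(mark)

reply_mark, reply_xid, reply_type = struct.unpack(">3I", reply)
reply_last, reply_len = split_record_mark(reply_mark)

print("request: xid", xid, "prog", prog, "vers", vers, "proc", proc, "len", length)
print("reply:   xid", reply_xid, "type", reply_type, "len", reply_len)
```

Program 100003 is NFS, and procedure 1 of version 3 is GETATTR, which agrees with what tcpdump printed. The request and the reply carry the same xid, which suggests this particular exchange completed normally at the RPC level.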

This condition has already survived an NFS client reboot. We'll reboot the NFS
server in approx. 12 hours, so if anyone has ideas about what we should try to
capture before then, please let us know.
