From nobody Fri Jan 20 14:26:39 2023 X-Original-To: freebsd-stable@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4Nz2B81LcWz2shVw for ; Fri, 20 Jan 2023 14:36:32 +0000 (UTC) (envelope-from pmc@citylink.dinoex.sub.org) Received: from uucp.dinoex.org (uucp.dinoex.org [IPv6:2a0b:f840::12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "uucp.dinoex.sub.de", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Nz2B656Lsz3vL3 for ; Fri, 20 Jan 2023 14:36:30 +0000 (UTC) (envelope-from pmc@citylink.dinoex.sub.org) Authentication-Results: mx1.freebsd.org; dkim=none; spf=pass (mx1.freebsd.org: domain of pmc@citylink.dinoex.sub.org designates 2a0b:f840::12 as permitted sender) smtp.mailfrom=pmc@citylink.dinoex.sub.org; dmarc=none Received: from uucp.dinoex.org (uucp.dinoex.org [IPv6:2a0b:f840:0:0:0:0:0:12]) by uucp.dinoex.org (8.17.1/8.17.1) with ESMTPS id 30KEa5Yn015293 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 20 Jan 2023 15:36:06 +0100 (CET) (envelope-from pmc@citylink.dinoex.sub.org) Received: (from uucp@localhost) by uucp.dinoex.org (8.17.1/8.17.1/Submit) with UUCP id 30KEa5wP015289; Fri, 20 Jan 2023 15:36:05 +0100 (CET) (envelope-from pmc@citylink.dinoex.sub.org) Received: from disp.intra.daemon.contact (disp-e.intra.daemon.contact [IPv6:fd00:0:0:0:0:0:0:112]) by admn.intra.daemon.contact (8.16.1/8.16.1) with ESMTPS id 30KESrf4039800 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=OK); Fri, 20 Jan 2023 15:28:54 +0100 (CET) (envelope-from pmc@citylink.dinoex.sub.org) Received: from disp.intra.daemon.contact (localhost [127.0.0.1]) by disp.intra.daemon.contact (8.16.1/8.16.1) with ESMTPS id 30KEQdLb037148 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO); Fri, 20 Jan 2023 15:26:40 +0100 (CET) (envelope-from pmc@citylink.dinoex.sub.org) Received: (from pmc@localhost) by disp.intra.daemon.contact (8.16.1/8.16.1/Submit) id 30KEQdfO037147; Fri, 20 Jan 2023 15:26:39 +0100 (CET) (envelope-from pmc@citylink.dinoex.sub.org) X-Authentication-Warning: disp.intra.daemon.contact: pmc set sender to pmc@citylink.dinoex.sub.org using -f Date: Fri, 20 Jan 2023 15:26:39 +0100 From: Peter To: Mateusz Guzik Cc: freebsd-stable@freebsd.org Subject: Re: close() taking minutes (ZFS related) Message-ID: References: List-Id: Production branch of FreeBSD source code List-Archive: https://lists.freebsd.org/archives/freebsd-stable List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-stable@freebsd.org X-BeenThere: freebsd-stable@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Milter: Spamilter (Reciever: uucp.dinoex.org; Sender-ip: 0:0:2a0b:f840::; Sender-helo: uucp.dinoex.org;) X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (uucp.dinoex.org [IPv6:2a0b:f840:0:0:0:0:0:12]); Fri, 20 Jan 2023 15:36:08 +0100 (CET) X-Spamd-Result: default: False [-3.29 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-0.99)[-0.994]; R_SPF_ALLOW(-0.20)[+mx]; MIME_GOOD(-0.10)[text/plain]; FREEMAIL_TO(0.00)[gmail.com]; MLMMJ_DEST(0.00)[freebsd-stable@freebsd.org]; R_DKIM_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; RCPT_COUNT_TWO(0.00)[2]; MIME_TRACE(0.00)[0:+]; ARC_NA(0.00)[]; DMARC_NA(0.00)[sub.org]; ASN(0.00)[asn:205376, ipnet:2a0b:f840::/32, country:DE]; RCVD_COUNT_FIVE(0.00)[5]; TO_MATCH_ENVRCPT_SOME(0.00)[]; FROM_HAS_DN(0.00)[]; HAS_XAW(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_TLS_LAST(0.00)[] X-Rspamd-Queue-Id: 4Nz2B656Lsz3vL3 X-Spamd-Bar: --- X-ThisMailContainsUnwantedMimeParts: N On Fri, Jan 20, 2023 at 12:05:20PM +0100, Mateusz Guzik wrote: ! On 1/20/23, Peter wrote: ! > ! > Does this look familiar with anybody? ("truss -dD ls -l something") ! > ! > 0.047760369 0.000017548 open("/etc/nsswitch.conf",O_RDONLY|O_CLOEXEC,0666) = ! > 4 (0x4) ! > 0.047845099 0.000011290 fstat(4,{ mode=-rw-r--r-- ! > ,inode=13886,size=272,blksize=4096 }) = 0 (0x0) ! > 0.047932166 0.000034182 read(4,"#\n# nsswitch.conf(5) - name ser"...,4096) = ! > 272 (0x110) ! > 0.048007308 0.000010661 read(4,0x801848000,4096) = 0 (0x0) ! > 45.827173802 45.779056767 close(4) = 0 (0x0) ! > ^^^^^^^^^^^^ ! > ... ! > 45.866272643 0.000020608 open("/etc/localtime",O_RDONLY,077) = 6 (0x6) ! > 45.866398048 0.000017731 fstat(6,{ mode=-r--r--r-- ! > ,inode=41572,size=2309,blksize=4096 }) = 0 (0x0) ! > 45.866505596 0.000054084 read(6,"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0"...,41448) ! > = 2309 (0x905) ! > 49.511709215 3.645026276 close(6) = 0 (0x0) ! > ^^^^^^^^^^^ ! > ! > (The longest I've seen was more than 5 minutes) ! > ! > There is otherwise nothing wrong with the machine, only I am running ! > a postgres table merge for some 10 mio. rows. And only during that the ! > effect happens, reproducibly. ! > ! ! This is most likely stalls happening on atime update. I agree, that is the most plausible location. ! Most likely the following will help: ! sysctl vfs.zfs.per_txg_dirty_frees_percent=30 ! ! see https://github.com/openzfs/zfs/issues/13932 for more information Thank You, but sadly that doesn't change it. I did now revert some file-locking and async in postgres to defaults. And I run truss from UFS, because otherwise it would already hang: root ~# truss -dD /j/admn/bin/ls -l /j/admn/etc/ And so I get things like this: ... 0.010328158 0.000009112 fstatat(AT_FDCWD,"newsyslog.conf",{ mode=-rw-r--r-- ,inode=13822,size=1708,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 0.010367043 0.000008568 fstatat(AT_FDCWD,"mail",{ mode=drwxr-xr-x ,inode=39,size=30,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 0.010405598 0.000008903 fstatat(AT_FDCWD,"apmd.conf",{ mode=-rw-r--r-- ,inode=44969,size=1240,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 16.746787754 16.736352005 fstatat(AT_FDCWD,"libmap.conf",{ mode=-rw-r--r-- ,inode=13980,size=47,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 16.746863876 0.000012441 fstatat(AT_FDCWD,"rc.conf.d",{ mode=drwxr-xr-x ,inode=51,size=2,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 16.746908593 0.000010568 fstatat(AT_FDCWD,"autofs",{ mode=drwxr-xr-x ,inode=31,size=9,blksize=4096 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) 16.746951173 0.000010509 fstatat(AT_FDCWD,"rc.initdiskless",{ mode=-rw-r--r-- ,inode=14262,size=13680,blksize=13824 },AT_SYMLINK_NOFOLLOW) = 0 (0x0) ... Is there an atime involved? (I tried to switch them of, but it seems that requires a remount to become effective) and 'ps' gives me the usual "zfs is busy waiting": UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 0 90738 90737 8 23 0 13396 3156 tx->tx_s DX+ 4 0:00.00 /j/admn/bin/ls -l /j/admn/etc/ # sysctl vfs.zfs.txg.timeout=1 vfs.zfs.txg.timeout: 15 -> 1 Doesn't change things noticeably. Do You have an idea where to look further? (dtrace should be working, but I don't know where to look)