From: Rick Macklem
Date: Sun, 3 Mar 2024 13:14:44 -0800
Subject: Re: 13-stable NFS server hang
To: Garrett Wollman
Cc: stable@freebsd.org
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable
In-Reply-To: <26084.2494.962383.278446@hergotha.csail.mit.edu>
References: <26078.50375.679881.64018@hergotha.csail.mit.edu>
 <26083.64612.717082.366639@hergotha.csail.mit.edu>
 <26084.2494.962383.278446@hergotha.csail.mit.edu>

On Sat, Mar 2, 2024 at 9:25 PM Garrett Wollman wrote:
>
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
>
> In case anyone is wondering why this is an issue, it's the combination
> of two factors:
>
> 1) vn_generic_copy_file_range() attempts to preserve holes in the
> source file.
Just FYI, when I first implemented the copy_file_range(2) syscall, the
consensus in the discussion was that this was a reasonable thing to do.
It is now less obvious for file systems that do compression, such as ZFS.
As it happens, ZFS no longer uses vn_generic_copy_file_range() when block
cloning is enabled, and I have no idea what block cloning does w.r.t.
preserving holes.

For file systems that do not compress, comparing va_size with va_bytes
should serve as a reasonable hint as to whether the file is sparse.
If the file is not sparse, vn_generic_copy_file_range() should not
bother doing SEEK_DATA/SEEK_HOLE.
(I had intended to write such a patch, but I cannot now remember whether
I did so. I'll take a look.)

Note that this patch would not affect ZFS, but could improve UFS
performance where vn_generic_copy_file_range() is used to do the copying.

rick

>
> 2) ZFS does automatic hole-punching on write for filesystems where
> compression is enabled.  It happens in the same code path as
> compression, checksum generation, and redundant-write suppression, and
> thus does not happen until the dirty blocks are about to be committed
> to disk.  So if the file is dirty, ZFS doesn't "know" whether there
> are holes, or where the then-extant holes are, until a sync has
> completed.
>
> While vn_generic_copy_file_range() has a flag to stop and return
> partial success after a second of copying, this flag does not affect
> sleeps internal to the filesystem, so zfs_holey() can sleep
> indefinitely and vn_generic_copy_file_range() can't do anything about
> it until the sync has already happened.
>
> -GAWollman
>
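
To illustrate the va_size versus va_bytes hint mentioned above, here is a
rough userland sketch. It is not the kernel patch and not code from
vn_generic_copy_file_range(). It assumes st_size and st_blocks from
fstat(2) as stand-ins for va_size and va_bytes, and it assumes st_blocks
counts 512-byte units, as it does on FreeBSD and Linux. The function names
file_looks_sparse() and data_bytes() are made up for this example.

```c
#define _GNU_SOURCE	/* exposes SEEK_DATA/SEEK_HOLE on Linux glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hint only: a file looks sparse when fewer bytes are allocated than
 * its logical size.  Assumes st_blocks is in 512-byte units.  On a
 * compressing file system (e.g. ZFS) this says little about holes.
 */
bool
file_looks_sparse(const struct stat *st)
{
	assert(st != NULL);
	return ((off_t)st->st_blocks * 512 < st->st_size);
}

/*
 * Return the number of bytes in data regions of fd, or -1 on error.
 * The SEEK_DATA/SEEK_HOLE walk is skipped entirely when the hint says
 * the file is not sparse.
 */
off_t
data_bytes(int fd)
{
	struct stat st;
	off_t data, hole, total = 0;

	if (fstat(fd, &st) != 0)
		return (-1);
	if (!file_looks_sparse(&st))
		return (st.st_size);	/* no hole scan needed */

	for (data = 0; data < st.st_size; data = hole) {
		data = lseek(fd, data, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* only a hole remains before EOF */
			return (-1);
		}
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return (-1);
		total += hole - data;
	}
	return (total);
}
```

The point of the early return is that a fully allocated file never pays
for the seek calls at all. Whether that is worthwhile in the kernel
depends on how expensive the SEEK_DATA/SEEK_HOLE operations are for the
file system in question.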