From: J David
Date: Fri, 27 Sep 2024 11:55:36 -0400
Subject: kernel: nfsrv_cache_session: no session IPaddr=10.0.0.8, check NFS clients for unique /etc/hostid's
To: FreeBSD FS
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

(Posting this separately because, due to timing and conditions, I'm reasonably sure it's unrelated to the other issue.)

While recovering from the problems earlier today, this was dominating the syslog on the NFS fileserver.

Sep 27 09:02:07 fs kernel: nfsrv_cache_session: no session IPaddr=10.0.0.8, check NFS clients for unique /etc/hostid's
Sep 27 09:02:38 fs syslogd: last message repeated 31 times
Sep 27 09:04:39 fs syslogd: last message repeated 121 times
Sep 27 09:14:40 fs syslogd: last message repeated 599 times
Sep 27 09:24:41 fs syslogd: last message repeated 599 times
Sep 27 09:34:43 fs syslogd: last message repeated 600 times
Sep 27 09:44:44 fs syslogd: last message repeated 600 times
Sep 27 09:54:45 fs syslogd: last message repeated 600 times
Sep 27 10:02:05 fs syslogd: last message repeated 439 times

That started during the incident. It looks like it started right about the time I rebooted 10.0.0.8 a second time (to switch it back to "nullfs mode"), with the server logging "last message repeated 600 times" every ten minutes. (I.e., once per second)

On the client side, it's spewing this with equal frequency:

Sep 27 14:50:01 worker8 kernel: Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's

It's just that one client machine out of 28. It happens regardless of whether jobs are run via nullfs or NFS.
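
The "once per second" figure can be checked directly from the syslogd repeat counts quoted above. The following is only a rough sketch: the function name repeat_rate is made up for illustration, and it assumes every line carries the usual "Mon DD HH:MM:SS" prefix and that all lines fall on the same day.

```sh
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and, for every
# "last message repeated N times" line, print N divided by the number of
# seconds since the previous line.
# Assumption: all timestamps are from the same day, so HH:MM:SS values
# can be compared as plain seconds-since-midnight.
repeat_rate() {
    awk '
    {
        # Field 3 is the HH:MM:SS timestamp in the traditional syslog format.
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (/last message repeated/ && prev != "" && now > prev) {
            # The count is the field just before the trailing word "times".
            n = $(NF - 1)
            printf "%s %d msgs in %d s = %.2f/s\n", $3, n, now - prev, n / (now - prev)
        }
        prev = now
    }'
}
```

Feeding it the relevant slice of the server log, for example with grep -A 8 nfsrv_cache_session /var/log/messages | repeat_rate (the log path is an assumption), prints one rate per repeat line.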

And I can absolutely guarantee that the /etc/hostid files are unique:

$ cluster -p -c job_runners uname -n | wc -l
28
$ cluster -p -c job_runners cat /etc/hostid | sort -u | wc -l
28
$ cluster -p -c job_runners sysctl kern.hostid | sort -u | wc -l
28

This continued happening every second, even hours after the incident. Everything else appeared to be running normally.

I spared that machine out of the cluster, waited for it to quiesce, and then manually unmounted its NFS mount to the server. Even so, these messages continued to be generated on both client and server.

Finally, I halted the client machine. It kept at it all the way down:

Uptime: 3h56m13s
Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
Initiate recovery. If server has not rebooted, check NFS clients for unique /etc/hostid's
uhub0: detached
acpi0: Powering system off

The messages stopped on the server after that, and did not recur once I restarted the client and returned it to service.

I don't know what's up with that, but it seems strange. Possibly something related to rebooting twice (~30 min apart) during a situation where not everything was working properly put NFS on that client machine into an unhappy state?
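
As a cross-check on the hostid counts above, listing any duplicated value directly is a little more telling than comparing line counts. This is a minimal sketch: dup_hostids is a made-up name, and it assumes each input line is a bare hostid value with no per-host prefix added by the tool that collected it.

```sh
#!/bin/sh
# Hypothetical helper: read one hostid value per line on stdin and print
# each value that occurs more than once. Empty output means all are unique.
# Assumption: lines contain only the value itself; a hostname prefix on
# each line would make every line look unique.
dup_hostids() {
    # uniq -d only reports adjacent duplicates, so sort first.
    sort | uniq -d
}
```

It would be used as cluster -p -c job_runners cat /etc/hostid | dup_hostids; with 28 distinct values it should print nothing.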