From: Rick Macklem
Date: Tue, 20 Feb 2024 14:17:34 -0800
Subject: Re: FreeBSD panics possibly caused by nfs clients
To: "Matthew L. Dailey"
Cc: "freebsd-current@freebsd.org"
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Tue, Feb 20, 2024 at 11:21 AM Matthew L. Dailey wrote:
>
> Hi all,
>
> I induced a panic on my CURRENT (20240215-d79b6b8ec267-268300) VM after
> about 24 hours. This is the one without any debugging, so it only
> confirms the fact that the panics we've been experiencing still exist in
> CURRENT. There was some disk issue that prevented the dump, so all I
> have is the panic, pasted below.
>
> The two test systems with full debugging are still running after a week
> and a half.
>
> > You might want to set
> > kern.kstack_pages=6
> > in /boot/loader.conf in these setups.
> >
> > I would normally expect double faults when a kernel stack is blown,
> > but maybe there is a reason that you do not see that for a blown kernel
> > stack. (The impact of increasing stack pages from 4->6 should be minimal.)
> >
> > rick
> Rick - I'm a little confused by the kstack_pages tunable and just want
> to clarify. Are you proposing that this might solve the panic issues
> we've been having, or that it will make the panics/dumps more useful by
> avoiding false positives?
Well, blowing the kernel stack would certainly corrupt variables.
I'll admit I would normally expect to see a "double fault", but there
may be some reason that is not happening in your case.

Note that your kernels with debugging have not crashed yet after
increasing the kernel stack size, so I figured it is worth a try.
(i.e., it might solve the panics?)

When you talked about random panics, I thought of a blown kernel stack,
but shelved the idea since you weren't reporting double faults.
(In the past, I have needed to move things off the stack after a patch
caused "random" problems, to fix the problem.)

> We've only ever seen that "double fault" once
> in over 100 observed panics, and that was only when we enabled just
> KASAN on a 14.0p4 system.
I'm not a VM guy, so I can't answer why a kernel stack violation
normally (always?) results in a double fault.

rick

>
> -Matt
>
>
> [85751] Fatal trap 12: page fault while in kernel mode
> [85751] cpuid = 3; apic id = 06
> [85751] fault virtual address = 0x4f0f760
> [85751] fault code = supervisor read data, page not present
> [85751] instruction pointer = 0x20:0xffffffff820022f7
> [85751] stack pointer = 0x28:0xfffffe010bdf8d50
> [85751] frame pointer = 0x28:0xfffffe010bdf8d80
> [85751] code segment = base 0x0, limit 0xfffff, type 0x1b
> [85751] = DPL 0, pres 1, long 1, def32 0, gran 1
> [85751] processor eflags = interrupt enabled, resume, IOPL = 0
> [85751] current process = 0 (z_wr_int_h_3)
> [85751] rdi: fffff802d1036900 rsi: fffff80416887300 rdx: fffff80416887380
> [85751] rcx: fffff802d1036908 r8: 0000000000000100 r9: 8013070f000700ff
> [85751] rax: 0000000004f0f748 rbx: fffff802d1036900 rbp: fffffe010bdf8d80
> [85751] r10: fffff80412c4f708 r11: 0000000000000000 r12: fffff8000944ed58
> [85751] r13: 0000000000000000 r14: 0000000004f0f748 r15: fffffe010caa9438
> [85751] trap number = 12
> [85751] panic: page fault
> [85751] cpuid = 3
> [85751] time = 1708451091
> [85751] KDB: stack backtrace:
> [85751] #0 0xffffffff80b9803d at kdb_backtrace+0x5d
> [85751] #1 0xffffffff80b4a8d5 at vpanic+0x135
> [85751] #2 0xffffffff80b4a793 at panic+0x43
> [85751] #3 0xffffffff81026b8f at trap_fatal+0x40f
> [85751] #4 0xffffffff81026bdf at trap_pfault+0x4f
> [85751] #5 0xffffffff80ffd9f8 at calltrap+0x8
> [85751] #6 0xffffffff81fea83b at dmu_sync_late_arrival_done+0x6b
> [85751] #7 0xffffffff8214a78e at zio_done+0xc6e
> [85751] #8 0xffffffff821442cc at zio_execute+0x3c
> [85751] #9 0xffffffff80bae402 at taskqueue_run_locked+0x182
> [85751] #10 0xffffffff80baf692 at taskqueue_thread_loop+0xc2
> [85751] #11 0xffffffff80b0484f at fork_exit+0x7f
> [85751] #12 0xffffffff80ffea5e at fork_trampoline+0xe
> [85751] Uptime: 23h49m11s
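
For anyone who wants to check the numbers in the panic output above, here
is a small Python sketch. It is not part of the original thread, and the
helper names (kstack_bytes, uptime_seconds) are made up for illustration.
It assumes 4 KiB pages, which holds on amd64 but should be checked on other
platforms. It does three things: works out what the suggested 4->6 change
to kern.kstack_pages means in bytes, confirms that the "[85751]" console
timestamp agrees with the reported uptime, and shows that the fault address
sits a small offset above the value held in rax and r14.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages (true on amd64)

# Values copied from the panic output quoted above.
FAULT_ADDR = 0x4F0F760
RAX = 0x0000000004F0F748
R14 = 0x0000000004F0F748
CONSOLE_TIMESTAMP = 85751  # the "[85751]" prefix on each panic line
UPTIME_LINE = "Uptime: 23h49m11s"


def kstack_bytes(pages, page_size=PAGE_SIZE):
    """Per-thread kernel stack size in bytes for a kern.kstack_pages value."""
    return pages * page_size


def uptime_seconds(line):
    """Convert an 'Uptime: [Nd][Nh][Nm]Ns' line to a number of seconds."""
    m = re.search(r"Uptime:\s*(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", line)
    if m is None:
        raise ValueError("no uptime found in: %r" % line)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # Stack size before and after the suggested tunable change.
    print("kstack 4 pages:", kstack_bytes(4), "bytes")
    print("kstack 6 pages:", kstack_bytes(6), "bytes")

    # The console timestamp should match the uptime at panic time.
    print("uptime in seconds:", uptime_seconds(UPTIME_LINE))
    print("matches console timestamp:",
          uptime_seconds(UPTIME_LINE) == CONSOLE_TIMESTAMP)

    # Distance between the faulting address and the value in rax/r14.
    print("fault address - rax:", hex(FAULT_ADDR - RAX))
```

Under these assumptions, going from 4 to 6 pages takes each kernel stack
from 16384 to 24576 bytes. The fault address is rax + 0x18 (r14 holds the
same value), which would be consistent with a read through a bad pointer
at a small structure offset. That only narrows down what the faulting
instruction was doing. It does not say whether a blown kernel stack or
something else produced the bad pointer.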