From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 263908] Something spawning many "sh" process, system no longer boots, in single user /var/log empty
Date: Wed, 11 May 2022 00:11:02 +0000
List-Id: Bug reports
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=263908

            Bug ID: 263908
           Summary: Something spawning many "sh" process, system no longer
                    boots, in single user /var/log empty
           Product: Base System
           Version: 13.1-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: misc
          Assignee: bugs@FreeBSD.org
          Reporter: greg@teamworkweb.com

Not sure how, or even if, I should report this. However, I figured I should
say something, since the process I am using to install and run 13.1-RC6 is
basically the same as what I had going with 13.0. But... with a serious issue!
All things being equal, the issues point to a flaw or difference in 13.1-RC6
compared to 13.0.

Did a fresh install of 13.1-RC6 on Sunday (05/08) evening. Ran into an issue
with the MFI driver (reported as bug 263906) but was able to work around it
with the MRSAS driver (which I intended to use anyway). Installed common
packages for benchmarks. Built a zpool using dRAID out of HDDs and a special
vdev using a 3x mirror of SSDs. Applied a mix of system tunables that had been
working reliably under 13.0 (can provide if requested). Started a test set of
back-to-back fio and iozone benchmarks.

Next morning went to check results. Found I could not run anything; I was
getting "No more processes" on my shell. Left it running, and later Monday
evening found I was able to run processes. But there were over 37,000
instances of "sh" running! Mostly in sleep.

I was able to pull /var/log/messages, and found:

May  9 20:11:00 freebsd kernel: maxproc limit exceeded by uid 2 (pid 21916); see tuning(7) and login.conf(5)

Results from top at the time:

last pid: 22684;  load averages:  0.26,  0.18,  0.11    up 0+22:20:59  20:15:46
37976 processes: 1 running, 37975 sleeping
CPU:  0.1% user,  0.0% nice,  6.0% system,  0.0% interrupt, 93.8% idle
Mem: 1112K Active, 19G Inact, 8491M Laundry, 2648M Wired, 40K Buf, 817M Free
ARC: 236M Total, 50M MFU, 108M MRU, 2067K Header, 75M Other
     90M Compressed, 222M Uncompressed, 2.46:1 Ratio
Swap: 8192M Total, 2784M Used, 5408M Free, 33% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
22684 root          1  20    0    72M    46M CPU1     1   0:16  85.79% top
25011 ntpd          1  20    0    21M  1724K select   3   0:02   0.00% ntpd
 8242 root          1  52    0    13M  2004K wait     1   0:01   0.00% sh

Did a reboot, and it has been all downhill from there. System will no longer
boot, at least not to a login prompt. It stalls at several points while
loading: after the usb driver loads, and after starting the network. I can
coax it along somewhat with ctrl-c/x/z; the last thing it will do is
"Starting devd".

Kernel seems to be running, as it will reboot if you hit ctrl-alt-del, or
power down if you tap the power button.

I can get into single user mode, but find /var/log is empty.

I let it sit for a while at one point, and it displayed a few lines over time
saying that it was killing off "sh" processes.

Because I had rebooted several times on the first night, right now I suspect
some stock ("out of the box") cron job is running and looping, creating all
the "sh" processes. But I don't have enough detail yet.
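
One way to test the looping-cron-job suspicion above is to tally the "sh" processes by parent PID, so the process that keeps spawning them stands out. The sketch below is only an illustration and not part of the original report: it assumes the process list was saved from `ps -axo pid,ppid,user,comm`, the function name `count_by_parent` is hypothetical, and the sample listing is made up for demonstration rather than taken from the affected machine.

```python
from collections import Counter


def count_by_parent(ps_text, command="sh"):
    """Count processes named `command`, grouped by (parent PID, user).

    Expects text in the column order of `ps -axo pid,ppid,user,comm`,
    with a single header line at the top.
    """
    counts = Counter()
    for line in ps_text.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        _pid, ppid, user, comm = fields
        if comm.strip() == command:
            counts[(int(ppid), user)] += 1
    return counts


# Hypothetical sample data, NOT output from the system in this report.
SAMPLE = """\
  PID  PPID USER     COMMAND
    1     0 root     init
  900     1 root     cron
  901   900 root     cron
 5001   901 root     sh
 5002   901 root     sh
 5010     1 www      sh
"""

if __name__ == "__main__":
    for (ppid, user), n in count_by_parent(SAMPLE).most_common():
        print(f"ppid={ppid} user={user} sh_count={n}")
```

If most of the "sh" processes share one parent, looking that parent PID up in the same listing shows whether it is cron or something else.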

Honestly, I am still figuring out how to get the root file system out of
read-only mode when booted single user. I want to comment out everything in
/etc/crontab and try booting, to see if one of these entries is the cause.
(Again, all "stock"; I didn't create any custom cron jobs yet.)

Because of the issues with the MFI driver, I did pull the LSI 9361 HBA out of
the server. I even destroyed the dRAID pool. Doesn't seem related; the issue
persists.

So why am I reporting this as a "bug", when I lack enough detail to confirm
the actual issue? Because every single step I did was the same as performed
under 13.0, on the same hardware, which had been 100% stable for 3+ months.
All things being equal, there is something "wrong" or "different" in 13.1-RC6
which is now broken / breaking my setup.

In the interest of helping rule this out as a flaw in RC6, I am willing to do
what I can to troubleshoot further. But honestly I would need more input as
to proper diagnostic steps. I do have a little more time to "play" with this
hardware before I have to select a version and put it into production. I was
holding out so I could run 13.1 when it goes to release. But if I cannot
figure this out I will roll back to 13.0 for production, since that was fully
stable.

Please let me know what other details to provide, and any suggestions for
troubleshooting or further diagnostics. Just looking to contribute to RC6
testing and determine if this is a bug or a "just me" problem.

Thanks!

-Greg-

-- 
You are receiving this mail because:
You are the assignee for the bug.
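
The crontab experiment described in the report (disable every job in /etc/crontab, then try booting) could be sketched as below. This is an illustration under assumptions, not the reporter's procedure. The function name `comment_out_jobs` and the `#DISABLED#` marker are hypothetical, the sample entries are made up, and a line is treated as a job only if it starts with a digit, `*`, or `@`. Python is not part of the FreeBSD base system, so the sketch would need to run wherever Python is available, for example against a copy of the file. The root file system must be writable before any edit can be saved; on a UFS root that is commonly done with `mount -u /`, but the report does not say which file system root uses.

```python
def comment_out_jobs(crontab_text, marker="#DISABLED# "):
    """Return crontab text with every job line commented out.

    A line counts as a job if its first non-blank character is a digit,
    '*' or '@' (an assumption that covers numeric schedules, '*/5'-style
    fields, and '@reboot'-style shortcuts). Comments, blank lines, and
    environment settings such as SHELL= and PATH= are left unchanged.
    """
    out = []
    for line in crontab_text.splitlines():
        stripped = line.lstrip()
        if stripped and (stripped[0].isdigit() or stripped[0] in "*@"):
            out.append(marker + line)  # keep the original text for easy undo
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample in /etc/crontab layout, NOT the reporter's file.
SAMPLE_CRONTAB = """\
# /etc/crontab style sample (hypothetical entries)
SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin

#minute hour mday month wday who command
*/5 * * * * root /usr/libexec/atrun
1 3 * * * root periodic daily
@reboot root /bin/echo hello
"""

if __name__ == "__main__":
    print(comment_out_jobs(SAMPLE_CRONTAB), end="")
```

Because the marker is only prefixed and nothing is deleted, removing the marker re-enables jobs one at a time, which helps narrow down which entry, if any, triggers the runaway "sh" processes.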