From nobody Tue Nov 01 22:09:09 2022 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4N241Q2MYPz4g9Bj; Tue, 1 Nov 2022 22:09:14 +0000 (UTC) (envelope-from rpokala@freebsd.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4N241Q1kKbz3S01; Tue, 1 Nov 2022 22:09:14 +0000 (UTC) (envelope-from rpokala@freebsd.org) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1667340554; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TfetLGDvZ8ch7QhCGpN+gAQaQuP149OhePmWZdjrc7c=; b=w6XMgU1nW9edRaWHthyVXw+YtFUnZeHYfXeHqvS+i4cLPIQLAdEzHyckyiH7aNofj+liGj 8NCUiluPC4yKiliDEhcfAzofk9eq2gmGsdpFN/IGKj51/r5fbwwuy22i8WFeC8fW7v8R3g rWfea2LFcxBMkUQepQDzK6mz3qOvvixs6LoH/U8GXgKTokHeQDxPHaC0GCZwAOoLs507Wr gKU01hZQ56eTP/3kacUAdUZjpcFsQq5dcoFvMyk8Mhegsew2juLx0rdJPkGwc6PMV8AEac ahJcjm15skVpXvY6nxMJKG8TYYeps+swSlJAsMJN0kB8iuuME37VcrDNI/lseA== Received: from [192.168.1.10] (unknown [98.42.164.217]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) (Authenticated sender: rpokala) by smtp.freebsd.org (Postfix) with ESMTPSA id 4N241P45WHz15dp; Tue, 1 Nov 2022 22:09:13 +0000 (UTC) (envelope-from rpokala@freebsd.org) User-Agent: Microsoft-MacOutlook/16.66.22102801 Date: Tue, 01 Nov 2022 15:09:09 -0700 Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible From: Ravi Pokala To: Chuck Silvers CC: , , Message-ID: Thread-Topic: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver requests when possible References: <202211011755.2A1HtMll099820@gitrepo.freebsd.org> <930FB7AD-A360-460E-9638-AEEDD85AF2DE@panasas.com> In-Reply-To: List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org Mime-version: 1.0 Content-type: text/plain; charset="UTF-8" Content-transfer-encoding: quoted-printable ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1667340554; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TfetLGDvZ8ch7QhCGpN+gAQaQuP149OhePmWZdjrc7c=; b=fe7Nssa/IbYOg884ARjxVCpsxMWRLHYBfqMa22/OpKBZvyvt1Ps7/r3nA8JS/STLX6Q8QB FlT8yj4YUH3Bp0NO8jsAwJvfOfM35GSd7wgCRrtuCjKcg2J7QsYhdFp5ZA8iO1XwLemiwt sVaKYHJdgwMoM956x4TDyvPL1cYU2OTehC2SOvAH/o0qFXXcsxSopvGMM8+X6VH2yWej/I 3wu/zsVORzgAxbtcjV5XjhLgk2oSfN2uqbr9QF6m6LjJ54FD8vPcnpTRi03FJj1mqXkBXd aryilsc+JTQboLuew0X8gfcBa7Nl/tcXtIyKlPKkeeswgD9ALY2INGi2M6uHRg== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1667340554; a=rsa-sha256; cv=none; b=nLbQuZaULl9+X8dGTT+TQkhp3hz/sNQV+nIAXCA6Ib12GghhB6puQnrurQkhjRFSTkNN7L nlszZEavSJ+x2N3W2gR2KsGV83TT2AyzVjp4FKRA6sGoasA+wSwPPV1PsmLfYetEFInE7z VzffADTZ3HX+vXyo6eHgwXNEj0ecHXAE7P+VS3QGDIcyhOhqQ0lv9wteZO+ewsGI9g3rKb KD/TaIJhPB7U2nh0K1Xzmt5948qkvppAbVHK7JXUNWgwpaFwUx7ypsuwR+7mAtOpuLfDnB +Kn3/3ahfCC9WOEXaxWmv7pf4eZaaZcyDhgzc9Nhhcyw3rsk1ew1BNjA9sRt1g== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N My understanding is that IPMB and KCS are different physical-layer interfac= es to the BMC, and only one of them is active on any given motherboard. So i= f the kernel is using KCS, then so is userland. This can be grepped out of /= var/run/dmesg.boot: ipmi0: port 0xca2,0xca3 on acpi0 ipmi0: KCS mode found at io 0xca2 on acpi ipmi0: IPMI device rev. 1, firmware rev. 1.73, version 2.0, device supp= ort mask 0xbf ipmi0: Number of channels 2 ipmi0: Attached watchdog ipmi0: Establishing power cycle handler -Ravi (rpokala@) =EF=BB=BF-----Original Message----- From: Chuck Silvers Date: 2022-11-01, Tuesday at 14:55 To: Ravi Pokala Cc: , , Subject: Re: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver request= s when possible Hi Ravi, the usual path (the IPMI_SYSTEM_INTERFACE_ADDR_TYPE case) for an "appli= cation" request (ie. one from userland) is: ipmi_polled_enqueue_request() sc->ipmi_enqueue_request() ipmi_ioctl() which uses the regular (low priority) queue. in the non-IPMI_SYSTEM_INTERFACE_ADDR_TYPE case (aka. the IPMB case), ipmi_ioctl() does issue two driver requests to set up the IPMB stuff be= fore then issuing the real application request with sc->ipmi_enqueue_request= () as above. I don't know much about this IPMB thing, it looks like a kind of pass-t= hrough interface for communicating with another device that sits behind the ma= in BMC IPMI controller. I suppose that it would make sense to change these two IPMB setup drive= r requests to use the regular (low priority) queue (since sleeping is ok in that p= ath), but I don't think I have any way to test that since our systems don't u= se this path. -Chuck On Tue, Nov 01, 2022 at 11:53:57AM -0700, Ravi Pokala wrote: > Hi Chuck, >=20 > As best as I can tell, the stack for handling a request from userland= (e.g. `ipmitool') looks like this: >=20 > ipmi_polled_enqueue_request_highpri() > kcs_driver_request_queue() /* assuming not panicked or dumping core *= / > kcs_driver_request() > sc->ipmi_driver_request() > ipmi_submit_driver_request() > ipmi_ioctl() >=20 > And the stack for handling kernel-internal requests (i.e. watchdog) i= s essentially the same, except for calling ipmi_submit_driver_request() dire= ctly rather than going through ipmi_ioctl(). Specifically, there doesn't see= m to be anything that would put the two different types of requests on two d= ifferent queues; it looks like everything goes onto the highpri queue. >=20 > What am I missing? >=20 > Thanks, >=20 > Ravi (rpokala@) >=20 > =EF=BB=BF-----Original Message----- > From: on behalf of Chuck Silvers <= chs@FreeBSD.org> > Date: 2022-11-01, Tuesday at 10:55 > To: , , = > Subject: git: f0f3e3e961d3 - main - ipmi: use a queue for kcs driver = requests when possible >=20 > The branch main has been updated by chs: >=20 > URL: https://cgit.FreeBSD.org/src/commit/?id=3Df0f3e3e961d311f1cd93= 8f1319385e7f454525f1 >=20 > commit f0f3e3e961d311f1cd938f1319385e7f454525f1 > Author: Chuck Silvers > AuthorDate: 2022-11-01 17:55:14 +0000 > Commit: Chuck Silvers > CommitDate: 2022-11-01 17:55:14 +0000 >=20 > ipmi: use a queue for kcs driver requests when possible >=20 > The ipmi watchdog pretimeout action can trigger unintentional= ly in > certain rare, complicated situations. What we have seen at N= etflix > is that the BMC can sometimes be sent a continuous stream of > writes to port 0x80, and due to what is a bug or misconfigura= tion > in the BMC software, this results in the BMC running out of m= emory, > becoming very slow to respond to KCS requests, and eventually= being > rebooted by its own internal watchdog. While that is going o= n in > the BMC, back in the host OS, a number of requests are pendin= g in > the ipmi request queue, and the kcs_loop thread is working on > processing these requests. All of the KCS accesses to proces= s > those requests are timing out and eventually failing because = the > BMC is responding very slowly or not at all, and the kcs_loop= thread > is holding the IPMI_IO_LOCK the whole time that is going on. > Meanwhile the watchdogd process in the host is trying to pat = the > BMC watchdog, and this process is sleeping waiting to get the > IPMI_IO_LOCK. It's not entirely clear why the watchdogd proc= ess > is sleeping for this lock, because the intention is that a th= read > holding the IPMI_IO_LOCK should not sleep and thus any thread > that wants the lock should just spin to wait for it. My best= guess > is that the kcs_loop thread is spinning waiting for the BMC t= o > respond for so long that it is eventually preempted, and duri= ng > the brief interval when the kcs_loop thread is not running, > the watchdogd thread notices that the lock holder is not runn= ing > and sleeps. When the kcs_loop thread eventually finishes pro= cessing > one request, it drops the IPMI_IO_LOCK and then immediately t= akes the > lock again so it can process the next request in the queue. > Because the watchdogd thread is sleeping at this point, the k= cs_loop > always wins the race to acquire the IPMI_IO_LOCK, thus starvi= ng > the watchdogd thread. The callout for the watchdog pretimeou= t > would be reset by the watchdogd thread after its request to t= he BMC > watchdog completes, but since that request never processed, t= he > pretimeout callout eventually fires, even though there is not= hing > actually wrong with the host. >=20 > To prevent this saga from unfolding: >=20 > - when kcs_driver_request() is called in a context where it = can sleep, > queue the request and let the worker thread process it rat= her than > trying to process in the original thread. > - add a new high-priority queue for driver requests, so that= the > watchdog patting requests will be processed as quickly as = possible > even if lots of application requests have already been que= ued. >=20 > With these two changes, the watchdog pretimeout action does n= ot trigger > even if the BMC is completely out to lunch for long periods o= f time > (as long as the watchdogd check command does not also get stu= ck). >=20 > Sponsored by: Netflix > Reviewed by: imp > Differential Revision: https://reviews.freebsd.org/D36555 > --- > sys/dev/ipmi/ipmi.c | 33 ++++++++++++++++++++++++++++++--- > sys/dev/ipmi/ipmi_kcs.c | 28 +++++++++++++++++++++++++++- > sys/dev/ipmi/ipmivars.h | 2 ++ > 3 files changed, 59 insertions(+), 4 deletions(-) >=20 > diff --git a/sys/dev/ipmi/ipmi.c b/sys/dev/ipmi/ipmi.c > index d79690d55c68..b8705a81627b 100644 > --- a/sys/dev/ipmi/ipmi.c > +++ b/sys/dev/ipmi/ipmi.c > @@ -208,6 +208,15 @@ ipmi_dtor(void *arg) > IPMI_LOCK(sc); > if (dev->ipmi_requests) { > /* Throw away any pending requests for this device. */ > + TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests_highpri, ir= _link, > + nreq) { > + if (req->ir_owner =3D=3D dev) { > + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, > + ir_link); > + dev->ipmi_requests--; > + ipmi_free_request(req); > + } > + } > TAILQ_FOREACH_SAFE(req, &sc->ipmi_pending_requests, ir_link, > nreq) { > if (req->ir_owner =3D=3D dev) { > @@ -579,13 +588,19 @@ ipmi_dequeue_request(struct ipmi_softc *sc) >=20 > IPMI_LOCK_ASSERT(sc); >=20 > - while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_req= uests)) > + while (!sc->ipmi_detaching && TAILQ_EMPTY(&sc->ipmi_pending_req= uests) && > + TAILQ_EMPTY(&sc->ipmi_pending_requests_highpri)) > cv_wait(&sc->ipmi_request_added, &sc->ipmi_requests_lock); > if (sc->ipmi_detaching) > return (NULL); >=20 > - req =3D TAILQ_FIRST(&sc->ipmi_pending_requests); > - TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); > + req =3D TAILQ_FIRST(&sc->ipmi_pending_requests_highpri); > + if (req !=3D NULL) > + TAILQ_REMOVE(&sc->ipmi_pending_requests_highpri, req, ir_link)= ; > + else { > + req =3D TAILQ_FIRST(&sc->ipmi_pending_requests); > + TAILQ_REMOVE(&sc->ipmi_pending_requests, req, ir_link); > + } > return (req); > } >=20 > @@ -601,6 +616,17 @@ ipmi_polled_enqueue_request(struct ipmi_soft= c *sc, struct ipmi_request *req) > return (0); > } >=20 > +int > +ipmi_polled_enqueue_request_highpri(struct ipmi_softc *sc, struc= t ipmi_request *req) > +{ > + > + IPMI_LOCK_ASSERT(sc); > + > + TAILQ_INSERT_TAIL(&sc->ipmi_pending_requests_highpri, req, ir_l= ink); > + cv_signal(&sc->ipmi_request_added); > + return (0); > +} > + > /* > * Watchdog event handler. > */ > @@ -817,6 +843,7 @@ ipmi_startup(void *arg) > mtx_init(&sc->ipmi_requests_lock, "ipmi requests", NULL, MTX_DE= F); > mtx_init(&sc->ipmi_io_lock, "ipmi io", NULL, MTX_DEF); > cv_init(&sc->ipmi_request_added, "ipmireq"); > + TAILQ_INIT(&sc->ipmi_pending_requests_highpri); > TAILQ_INIT(&sc->ipmi_pending_requests); >=20 > /* Initialize interface-dependent state. */ > diff --git a/sys/dev/ipmi/ipmi_kcs.c b/sys/dev/ipmi/ipmi_kcs.c > index df3b37614eb7..5908ec88e039 100644 > --- a/sys/dev/ipmi/ipmi_kcs.c > +++ b/sys/dev/ipmi/ipmi_kcs.c > @@ -32,6 +32,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > #include > #include > #include > @@ -490,7 +491,21 @@ kcs_startup(struct ipmi_softc *sc) > } >=20 > static int > -kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *r= eq, int timo) > +kcs_driver_request_queue(struct ipmi_softc *sc, struct ipmi_requ= est *req, int timo) > +{ > + int error; > + > + IPMI_LOCK(sc); > + ipmi_polled_enqueue_request_highpri(sc, req); > + error =3D msleep(req, &sc->ipmi_requests_lock, 0, "ipmireq", timo= ); > + if (error =3D=3D 0) > + error =3D req->ir_error; > + IPMI_UNLOCK(sc); > + return (error); > +} > + > +static int > +kcs_driver_request_poll(struct ipmi_softc *sc, struct ipmi_reque= st *req) > { > int i, ok; >=20 > @@ -504,6 +519,17 @@ kcs_driver_request(struct ipmi_softc *sc, st= ruct ipmi_request *req, int timo) > return (req->ir_error); > } >=20 > +static int > +kcs_driver_request(struct ipmi_softc *sc, struct ipmi_request *r= eq, int timo) > +{ > + > + if (KERNEL_PANICKED() || dumping) > + return (kcs_driver_request_poll(sc, req)); > + else > + return (kcs_driver_request_queue(sc, req, timo)); > +} > + > + > int > ipmi_kcs_attach(struct ipmi_softc *sc) > { > diff --git a/sys/dev/ipmi/ipmivars.h b/sys/dev/ipmi/ipmivars.h > index b0548ee3d7c3..1468e86be73b 100644 > --- a/sys/dev/ipmi/ipmivars.h > +++ b/sys/dev/ipmi/ipmivars.h > @@ -111,6 +111,7 @@ struct ipmi_softc { > uint8_t ipmi_dev_support; /* IPMI_ADS_* */ > struct cdev *ipmi_cdev; > TAILQ_HEAD(,ipmi_request) ipmi_pending_requests; > + TAILQ_HEAD(,ipmi_request) ipmi_pending_requests_highpri; > int ipmi_driver_requests_polled; > eventhandler_tag ipmi_power_cycle_tag; > eventhandler_tag ipmi_watchdog_tag; > @@ -237,6 +238,7 @@ void ipmi_complete_request(struct ipmi_softc = *, struct ipmi_request *); > struct ipmi_request *ipmi_dequeue_request(struct ipmi_softc *); > void ipmi_free_request(struct ipmi_request *); > int ipmi_polled_enqueue_request(struct ipmi_softc *, struct ipmi= _request *); > +int ipmi_polled_enqueue_request_highpri(struct ipmi_softc *, str= uct ipmi_request *); > int ipmi_submit_driver_request(struct ipmi_softc *, struct ipmi_= request *, > int); >=20 >=20