Kernel panic due to netback.c
Date: Mon, 20 Mar 2023 20:10:38 UTC
<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Hello,</div> <div> </div> <div>From time to time a kernel panic occurs. Setup: Xen-kernel-4.15, dom0 running FreeBSD 13.0-RELEASE.</div> <div> </div> <div>"Fatal trap 12: page fault while in kernel mode"</div> <div> </div> <div>I cannot reproduce it reliably, but it eventually happens. I have captured a stack trace (it is the same on every crash); the relevant part is:<br/> ..<br/> #9 xnb_txpkt2gnttab (pkt=&lt;optimized out&gt;, pkt@entry=0xfffffe00c49fdac8, mbufc=&lt;optimized out&gt;, mbufc@entry=0xfffff8002f958500, gnttab=gnttab@entry=0xfffffe019ae94a70,<br/> txb=txb@entry=0xfffffe019ae95480, otherend_id=6) at /usr/src/sys/dev/xen/netback/netback.c:1715<br/> #10 0xffffffff80a8d72a in xnb_recv (txb=0xfffffe019ae95480, otherend=6, mbufc=&lt;optimized out&gt;, ifnet=0xfffff80170f81000, gnttab=0xfffffe019ae94a70)<br/> at /usr/src/sys/dev/xen/netback/netback.c:1851<br/> #11 xnb_intr (arg=0xfffffe019ae94000) at /usr/src/sys/dev/xen/netback/netback.c:1446<br/> ..</div> <div> </div> <div>netback.c does not seem to have changed in a long time; the same lines are still present in 13.2-RC3.</div> <div> </div> <div>Relevant code around /usr/src/sys/dev/xen/netback/netback.c:1715:<br/> ..<br/> xnb_txpkt2gnttab(const struct xnb_pkt *pkt, struct mbuf *mbufc,<br/> ..<br/> while (size_remaining > 0) {<br/> const netif_tx_request_t *txq = RING_GET_REQUEST(txb, r_idx);<br/> const size_t mbuf_space = M_TRAILINGSPACE(mbuf) - m_ofs; /* PANIC happens here!
*/<br/> ..</div> <div> </div> <div>By analyzing the trace, I've concluded that mbuf is NULL, so the macro<br/> #define M_TRAILINGSPACE(m) ((m)->m_maxlen - (m)->m_len)<br/> dereferences a NULL pointer and triggers the panic.</div> <div> </div> <div>The only place mbuf can become NULL is inside this same loop, at line 1751 (mbuf = mbuf->m_next;).<br/> It cannot be NULL when the function is called, because xnb_recv checks for that before the call.</div> <div> </div> <div>The problem is definitely that the while condition tests only size_remaining, while the loop advances through the chain via mbuf->m_next: if the mbuf chain runs out before size_remaining reaches zero, mbuf becomes NULL and the next iteration dereferences it.</div> <div> </div> <div>So my questions are:<br/> 1) Would it be possible to add a call before the PANIC line (or before the mbuf = mbuf->m_next line) that dumps the offending packet to the error log or similar? The goal would be to find a way to reproduce this case reliably and understand its cause. If no such function exists, which variables would be relevant and helpful to log here?<br/> 2) How could this code be modified so that it drops the offending packet instead of panicking?</div> <div>A code snippet in xnb_recv caught my eye:<br/> if (*mbufc == NULL) {<br/> /*<br/> * Couldn't allocate mbufs. Respond and drop the packet. Do<br/> * not consume the requests<br/> */<br/> xnb_txpkt2rsp(&pkt, txb, 1);<br/> DPRINTF("xnb_intr: Couldn't allocate mbufs, num_consumed=%d\n",<br/> num_consumed);<br/> if_inc_counter(ifnet, IFCOUNTER_IQDROPS, 1);<br/> return ENOMEM;<br/> }</div> <div>Could the same approach be used in xnb_txpkt2gnttab to avoid the panic in this case as well?</div> <div><br/> Thank you!<br/> Janis Abens</div> <div> </div></div></body></html>
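Here is a minimal user-space sketch of the guard that question 2 asks about. It rests on a simplified chain model. `struct sketch_mbuf`, its fields and `sketch_fill_chain()` are hypothetical stand-ins. They are not FreeBSD's `struct mbuf` or the real `xnb_txpkt2gnttab()`, which also walks ring requests and builds grant-table entries. The sketch shows the loop checking for the end of the chain before dereferencing it. It then logs `size_remaining` and how many buffers were consumed (the kind of state question 1 asks about) and returns an error instead of faulting.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an mbuf: only what the loop needs. */
struct sketch_mbuf {
	struct sketch_mbuf *next;	/* next buffer in the chain, or NULL */
	size_t maxlen;			/* capacity of this buffer */
	size_t len;			/* bytes already in use */
};

/* Free space at the end of one buffer (same idea as M_TRAILINGSPACE). */
static size_t
sketch_trailingspace(const struct sketch_mbuf *m)
{
	return (m->maxlen - m->len);
}

/*
 * Spread size_remaining bytes over the chain, like the loop in the post,
 * but return EINVAL instead of dereferencing NULL when the chain ends
 * before all bytes have been placed. Returns 0 on success.
 */
static int
sketch_fill_chain(const struct sketch_mbuf *mbuf, size_t size_remaining)
{
	size_t m_ofs = 0;	/* offset into the current buffer's free space */
	unsigned int nbufs = 0;	/* buffers fully consumed, for the log line */

	while (size_remaining > 0) {
		if (mbuf == NULL) {
			/* Chain too short: log the mismatch, then drop. */
			fprintf(stderr,
			    "sketch: chain exhausted after %u buffers, "
			    "%zu bytes unplaced\n", nbufs, size_remaining);
			return (EINVAL);
		}
		const size_t mbuf_space = sketch_trailingspace(mbuf) - m_ofs;
		const size_t len = size_remaining < mbuf_space ?
		    size_remaining : mbuf_space;

		size_remaining -= len;
		m_ofs += len;
		if (m_ofs == sketch_trailingspace(mbuf)) {
			/* Buffer full: advance; a NULL next is caught above. */
			mbuf = mbuf->next;
			m_ofs = 0;
			nbufs++;
		}
	}
	return (0);
}
```

The sketch tests `mbuf == NULL` at the top of the loop rather than only after `mbuf = mbuf->next`, so an empty chain is covered too. In the kernel, `fprintf` would become `printf(9)` or `log(9)`. The error would still have to reach the caller so it can respond to the ring requests and count the drop, perhaps along the lines of the `xnb_txpkt2rsp()` / `if_inc_counter()` branch in `xnb_recv` quoted above. Whether that is safe at that point in the driver is not verified here.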