Re: kernel epoch crash in IPv4 multicast code
Date: Mon, 21 Mar 2022 15:11:44 UTC
Kristof wrote:
> On 18 Mar 2022, at 19:02, Mike Karels wrote:
> > It looks like the IPv4 multicast code has not been fully converted to
> > use epochs. I installed this week's snapshot of -current, configured
> > and started mrouted, and started rwhod -m. The system crashed shortly
> > thereafter with this:
> >
> > panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_output.c:343
> > cpuid = 15
> > time = 1647609865
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01b51a39d0
> > vpanic() at vpanic+0x17f/frame 0xfffffe01b51a3a20
> > panic() at panic+0x43/frame 0xfffffe01b51a3a80
> > ip_output() at ip_output+0x15f9/frame 0xfffffe01b51a3b80
> > phyint_send() at phyint_send+0x107/frame 0xfffffe01b51a3be0
> > ip_mdq() at ip_mdq+0x259/frame 0xfffffe01b51a3c60
> > X_ip_mrouter_set() at X_ip_mrouter_set+0x9e4/frame 0xfffffe01b51a3d30
> > sosetopt() at sosetopt+0xee/frame 0xfffffe01b51a3d80
> > kern_setsockopt() at kern_setsockopt+0xad/frame 0xfffffe01b51a3de0
> > sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe01b51a3e00
> > amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe01b51a3f30
> > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01b51a3f30
> > --- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x821b72dda, rsp = 0x8204c06f8, rbp = 0x8204c0750 ---
> > KDB: enter: panic
> >
> > The kgdb backtrace is appended.
> >
> > It looks like ip_mroute is protected in the forwarding path (it's called
> > from ip_input) and the output path, but not in the setup path from
> > setsockopt(). At least the MRT_ADD_MFC call needs to enter an epoch.
> > I tried adding epoch handling in add_mfc(), and that seems to work.
> > The alternative would be to do it in X_ip_mrouter_set() so it would cover
> > all the calls. Any opinions?
> >
> Your analysis looks reasonable.
> I think I'd suggest adding the NET_EPOCH_ENTER() calls in add_mfc().
> We already do that in add_vif(), so we'd be following existing choices.
>
> I'd also suggest adding NET_EPOCH_ASSERT() to everything which directly
> or indirectly calls ip_output(). That should help us catch other potential
> issues like this one.

Thanks. I had already added one assert; I added one in send_packet() as
well. For anyone interested, this is now in review:
https://reviews.freebsd.org/D34624.

		Mike

> Br,
> Kristof