[Bug 254303] Fatal trap 12: page fault while in kernel mode ((frr 7.5_1 + Freebsd 13 Beta3) zebra crashes server when routes are populated)

Fri Mar 26 23:46:30 UTC 2021

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254303

--- Comment #16 from Alexander V. Chernikov <melifaro at FreeBSD.org> ---
(In reply to Aleks from comment #15)
Thank you!

Short summary:

From the private core.5 you sent me:
* rtentry looks perfectly fine, but the nexthop pointer is (mostly) zeroed

* from the core2: failure to resolve nh_priv pointer
* from the original kgdb_backtrace: nhg has zero pointer to nh_ctl

So far it looks like we're removing the additional reference from the nexthop
group in some corner case scenario, which results in the group being freed,
with the rtentry still pointing to this group.

Re reproduction: I don't have 2 full-view peers, so I ended up duplicating the
feed from a single peer & introducing some delay, to mimic propagation delays.
So far I wasn't able to reproduce any panic.
Are there any additional specifics (e.g. links flapping) in the setup?

Is there any chance you could run
`stdbuf -o0 route -n monitor > zebra_log.txt`
at startup (or, actually, at the point in time when all peers are down) and
then bring up the first peer and then the second?
If you could also run something like
`while true; do date >> nhg.log ; netstat -4OnW >> nhg.log ; sleep 5; done`

and share both files along with the core backtrace, that would be awesome.
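
If it is easier, the two collectors above could be wrapped into one script,
roughly like this. It is only a sketch: the commands and the file names
(zebra_log.txt, nhg.log) are the ones from this message, while the snapshot
and collect helpers and the INTERVAL variable are made up for illustration.
It also assumes stdbuf is available in the PATH.

```sh
#!/bin/sh
# Sketch only: run both collectors together until interrupted.
# snapshot(), collect() and INTERVAL are illustrative names, not part of
# any existing tool.

INTERVAL="${INTERVAL:-5}"

# Append a timestamp, then the output of the given command, to a log file.
snapshot() {
    log="$1"; shift
    date >> "$log"
    "$@" >> "$log" 2>&1
}

collect() {
    # Stream routing socket messages, unbuffered, in the background.
    stdbuf -o0 route -n monitor > zebra_log.txt &
    monitor_pid=$!
    # Stop the background monitor when the script is interrupted.
    trap 'kill "$monitor_pid" 2>/dev/null; exit 0' INT TERM

    # Periodically append the netstat output requested above to nhg.log.
    while true; do
        snapshot nhg.log netstat -4OnW
        sleep "$INTERVAL"
    done
}

# Only start collecting when invoked as: sh collect.sh run
if [ "${1:-}" = "run" ]; then
    collect
fi
```

Start it as root with `sh collect.sh run` while all peers are down, bring the
peers up, and stop it with Ctrl-C after the panic or once the routes are
populated.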

If there is a possibility of getting access to the server, that would really
speed things up.
