[Bug 284743] System reproducably livelocks after a couple of hours in poudriere bulk -a
Date: Tue, 04 Mar 2025 00:11:49 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=284743

--- Comment #8 from Mitchell Horne <mhorne@freebsd.org> ---
I am going to take an informed guess that this might be a bug in OpenSBI.

The version provided by sysutils/opensbi sat at v1.4 for some time. A quick log
of the commits to the IPI code since that version yields one interesting
candidate:

commit be9752a071475ae1d9e58a2dfcb8e83185fb7ae5
Author: Samuel Holland <samuel.holland@sifive.com>
Date:   Fri Oct 25 11:59:46 2024 -0700

    lib: sbi_ipi: Make .ipi_clear always target the current hart

    All existing users of this operation target the current hart, and it
    seems unlikely that a future user will need to clear the pending IPI
    status of a remote hart. Simplify the logic by changing .ipi_clear (and
    its wrapper sbi_ipi_raw_clear()) to always operate on the current hart.

    This incidentally fixes a bug introduced in commit 78c667b6fc07 ("lib:
    sbi: Prefer hartindex over hartid in IPI framework"), which changed the
    .ipi_clear parameter from a hartid to a hart index, but failed to update
    the warm_init functions to match.

    Fixes: 78c667b6fc07 ("lib: sbi: Prefer hartindex over hartid in IPI framework")
    Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>

A bug in clearing the IPI status, when multiple harts are attempting an IPI
broadcast concurrently, might explain the livelock we are seeing. I did not
inspect the implementation to verify this.

Notably, the buggy commit was present in the v1.4 release, but this fix was
not.

I recently (last week) updated the sysutils/opensbi port to v1.6, and the
dependent u-boot ports were bumped. So, I suggest you update your firmware,
keep running things the usual way, and if the livelocks continue to manifest,
report back here.
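
To illustrate the class of bug the commit message describes (a hart ID passed where a hart index is expected), here is a toy model. It is not OpenSBI code: the hart ID layout, the `pending` array, and every function name below are made up for illustration, and it does not show that this is what actually causes the livelock.

```c
#include <assert.h>

#define NHARTS 4

/* ASSUMPTION: hypothetical hart IDs; the array position is the hart index. */
static const unsigned hartid_of_index[NHARTS] = { 1, 2, 3, 4 };

/* One pending-IPI flag per hart, addressed by hart index. */
static int pending[NHARTS];

/* Mark an IPI as pending on every hart. */
static void set_all_pending(void)
{
    for (unsigned i = 0; i < NHARTS; i++)
        pending[i] = 1;
}

/* Map a hart ID to its index; returns NHARTS if the ID is unknown. */
static unsigned index_of_hartid(unsigned hartid)
{
    for (unsigned i = 0; i < NHARTS; i++)
        if (hartid_of_index[i] == hartid)
            return i;
    return NHARTS;
}

/* Toy clear operation whose parameter is a hart index, not a hart ID. */
static void ipi_clear(unsigned hartindex)
{
    if (hartindex < NHARTS)
        pending[hartindex] = 0;
}

/*
 * Stale caller that still passes its hart ID.
 * Returns 1 if the caller's own pending flag ended up cleared, else 0.
 */
static int buggy_clear_self(unsigned my_hartid)
{
    unsigned self = index_of_hartid(my_hartid);

    assert(self < NHARTS);
    ipi_clear(my_hartid);       /* wrong: a hart ID used as an index */
    return pending[self] == 0;
}

/* Caller that converts its hart ID to an index before clearing. */
static int fixed_clear_self(unsigned my_hartid)
{
    unsigned self = index_of_hartid(my_hartid);

    assert(self < NHARTS);
    ipi_clear(self);
    return pending[self] == 0;
}
```

In this model, with all flags set, `buggy_clear_self(1)` leaves the caller's own flag (index 0) pending and instead clears the flag of the hart at index 1, while `fixed_clear_self(1)` clears only the caller's flag. The real fix took a different route: per the commit message, `.ipi_clear` no longer takes a target and always operates on the current hart.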