Hung kernel from sysv semaphore semu_list corruption
Ed Maste
emaste at phaedrus.sandvine.ca
Wed Mar 7 23:19:33 UTC 2007
Nightly tests on our 6.1-based installation using pgsql have resulted in
a number of kernel hangs, due to a corrupt semu_list (the list ended up
with a loop).
It seems there are a few holes in the locking in the semaphore code. The
issue we've encountered comes from semexit_myhook. It looks up the
process's list element, and the previous element's next pointer, while
holding SEMUNDO_LOCK, but only unlinks the element through that saved
pointer after dropping the lock. If the list changes in between, the
saved pointer is stale and the unlink can corrupt the list.
The fix below solves our current problem. Any comments?
--- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
@@ -1259,16 +1259,17 @@
 	struct proc *p;
 {
 	struct sem_undo *suptr;
-	struct sem_undo **supptr;
 
 	/*
 	 * Go through the chain of undo vectors looking for one
 	 * associated with this process.
 	 */
 	SEMUNDO_LOCK();
-	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
-		if (suptr->un_proc == p)
+	SLIST_FOREACH(suptr, &semu_list, un_next) {
+		if (suptr->un_proc == p) {
+			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
 			break;
+		}
 	}
 	SEMUNDO_UNLOCK();
 
@@ -1328,8 +1329,9 @@
 	 * Deallocate the undo vector.
 	 */
 	DPRINTF(("removing vector\n"));
+	SEMUNDO_LOCK();
 	suptr->un_proc = NULL;
-	*supptr = SLIST_NEXT(suptr, un_next);
+	SEMUNDO_UNLOCK();
 }
 
 static int
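
For anyone who wants to play with the pattern outside the kernel, here is
a rough userland sketch of what the patch does: the lookup and the unlink
happen in one critical section, so no list pointer is carried across an
unlock. Everything here is an assumption for illustration only: struct
undo, un_owner, undo_list, undo_mtx and the undo_* functions are made-up
names, a pthread mutex stands in for SEMUNDO_LOCK, and it assumes a
<sys/queue.h> that provides the SLIST macros. It is not the kernel code.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the kernel's struct sem_undo. */
struct undo {
	SLIST_ENTRY(undo) un_next;	/* list linkage */
	int un_owner;			/* stand-in for un_proc */
};

SLIST_HEAD(undo_head, undo);

static struct undo_head undo_list = SLIST_HEAD_INITIALIZER(undo_list);
/* Plays the role of SEMUNDO_LOCK()/SEMUNDO_UNLOCK() in this sketch. */
static pthread_mutex_t undo_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Link a caller-owned element at the head of the list. */
static void
undo_insert(struct undo *u)
{
	pthread_mutex_lock(&undo_mtx);
	SLIST_INSERT_HEAD(&undo_list, u, un_next);
	pthread_mutex_unlock(&undo_mtx);
}

/*
 * Find the element owned by `owner` and unlink it while still holding
 * the lock. Returns the unlinked element, or NULL if none matched.
 */
static struct undo *
undo_detach(int owner)
{
	struct undo *u;

	pthread_mutex_lock(&undo_mtx);
	SLIST_FOREACH(u, &undo_list, un_next) {
		if (u->un_owner == owner) {
			SLIST_REMOVE(&undo_list, u, undo, un_next);
			break;
		}
	}
	pthread_mutex_unlock(&undo_mtx);
	/* u is NULL here if the loop ran off the end of the list. */
	return (u);
}

/* Count the elements currently linked on the list. */
static int
undo_count(void)
{
	struct undo *u;
	int n = 0;

	pthread_mutex_lock(&undo_mtx);
	SLIST_FOREACH(u, &undo_list, un_next)
		n++;
	pthread_mutex_unlock(&undo_mtx);
	return (n);
}
```

Once undo_detach() returns, the element is no longer reachable from the
list, so the caller can work on it without the lock. Only fields that
other threads still inspect (un_proc in the real code) need the lock
again, which is what the second hunk does.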