Upperlimit for bwait()
- Reply: Poul-Henning Kamp: "Re: Upperlimit for bwait()"
- Reply: Warner Losh : "Re: Upperlimit for bwait()"
Date: Thu, 30 May 2024 05:53:35 UTC
Hello,

There have been a few incidents reported on Juniper devices running FreeBSD where buffer I/O operations sleep for more than 30 minutes. In theory this can happen because of faulty hardware or, on virtual platforms, a faulty connection between guest and host, filesystem corruption, too many outstanding buffer I/O operations, and/or a host that has stopped responding for some reason.

When that happens, because these buffer I/O writes hold a lock before going to sleep, the threads waiting for that lock starve for just as long. There is currently no upper limit on the sleep in bwait(). If the wait exceeds 30 minutes for a sleeping thread, or 15 minutes for a thread blocked on a turnstile, deadlkres panics the kernel on the assumption that there is a deadlock.

We could perhaps handle such lengthy buffer I/O operations gracefully by adding a timeout to bwait(), say 10 minutes. If a buffer I/O has not completed within a few minutes, it will probably never complete and/or is slowing down the entire system, so it is better to stop such faulty I/O operations.

For now, since we have seen these instances only with BIO operations, I have a patch that sets this value only from bufwait(). Please find the patch attached.

I am not sure whether 10 minutes is a good upper limit for every bwait() scenario. If it is, we could simply change the msleep() call in bwait() to use a 10-minute upper limit by default.

Please let me know whether this approach works for all use cases. If not, is there a better alternative? And is 10 minutes acceptable as a timeout?

Thanks and Regards,
Kumara