Software interrupt preemption problems
- Reply: Ryan Stone : "Re: Software interrupt preemption problems"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Tue, 28 Sep 2021 13:17:04 UTC
Hi, We have encountered a problem with the netisr system Context: The used scheduler is ULE We are using the netisr to poll packets on network cards: at each swi tick, we poll a given number of packets from the card (16 for the moment) and process them We run a network trafic of 400k packets per second through the machine and the network card does have 4096 packets in the buffer (around 10 ms of traffic) the sysctl kern.hz is set to 8000 the sysctl kern.sched.preemption is set to 1 the PREEMPTION option is enabled Problem: When the netisr doesn't have any work to do, the idle thread wakes up and takes the hand, allowing any userland process to execute Then, a userland process takes the hand, but because of the priority lending mechanism, the process might have taken the netisr priority when it was running on another CPU while the netisr was running (i.e.: the userland process might send some packets causing a concurrent lock access with the netisr) The problem appears on the next hz tick: the netisr asks for a schedule but because the current process "stole" the priority, the netisr can't preempt the process, and because the preemption isn't reevaluated until the userland process does a sysctl or the scheduler time slice has ended, the userland might keep the hand for a long time (a few tens of thousands of milliseconds until the time slice end), causing buffer exhaustion on the network card We added some debug with KTR and saw that the userland process loses his priority escalation after a while (a few hz ticks), making the netisr process the highest priority of the run queue, but because the scheduler doesn't reevaluate the preemption, the netisr doesn't preempt the userland process Fix proposal: We patched kern_intr to force the scheduler to reevalute the preemption when the swi should have asked for a schedule but is already in the run queue static int intr_event_schedule_thread(struct intr_event *ie, struct trapframe *frame) { struct intr_entropy entropy; struct intr_thread *it; struct thread *td; struct thread *ctd; /* * If no ithread or no handlers, then we have a stray interrupt. */ if (ie == NULL || CK_SLIST_EMPTY(&ie->ie_handlers) || ie->ie_thread == NULL) return (EINVAL); ctd = curthread; it = ie->ie_thread; td = it->it_thread; /* * If any of the handlers for this ithread claim to be good * sources of entropy, then gather some. */ if (ie->ie_hflags & IH_ENTROPY) { entropy.event = (uintptr_t)ie; entropy.td = ctd; random_harvest_queue(&entropy, sizeof(entropy), RANDOM_INTERRUPT); } KASSERT(td->td_proc != NULL, ("ithread %s has no process", ie->ie_name)); /* * Set it_need to tell the thread to keep running if it is already * running. Then, lock the thread and see if we actually need to * put it on the runqueue. * * Use store_rel to arrange that the store to ih_need in * swi_sched() is before the store to it_need and prepare for * transfer of this order to loads in the ithread. */ atomic_store_rel_int(&it->it_need, 1); thread_lock(td); if (TD_AWAITING_INTR(td)) { #ifdef HWPMC_HOOKS it->it_waiting = 0; if (PMC_HOOK_INSTALLED_ANY()) PMC_SOFT_CALL_INTR_HLPR(schedule, frame); #endif CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, td->td_proc->p_pid, td->td_name); TD_CLR_IWAIT(td); sched_add(td, SRQ_INTR); } else { #ifdef HWPMC_HOOKS it->it_waiting++; if (PMC_HOOK_INSTALLED_ANY() && (it->it_waiting >= intr_hwpmc_waiting_report_threshold)) PMC_SOFT_CALL_INTR_HLPR(waiting, frame); #endif CTR5(KTR_INTR, "%s: pid %d (%s): it_need %d, state %d", __func__, td->td_proc->p_pid, td->td_name, it->it_need, TD_GET_STATE(td)); + thread_lock(ctd); + sched_setpreempt(td); + thread_unlock(ctd); thread_unlock(td); } return (0); } We would like to know if this patch does look correct to you and if so, if we should make a PR Thanks