ULE locking mechanism

Tue Feb 11 19:49:47 UTC 2014

On Tuesday, January 28, 2014 8:07:08 am Jens Krieg wrote:
> Hello,
> 
> we are currently working on a project for our university. Our goal is to
> implement a simple round-robin scheduler for FreeBSD 9.2 on a single-core
> machine.
> So far we have removed most of the functionality of the ULE scheduler except
> the functions that are called from outside. The system successfully boots to
> userland with our RR scheduler managing threads in a list-based run queue.
> Further, it is possible to interact with the system using the shell.
> 
> The next step is to replace the locking mechanism of the ULE scheduler.
> Therefore, we replaced the scheduler-dependent thread_lock/thread_unlock
> functions with code that simply disables/enables interrupts. With this
> modification the kernel works fine until it reaches userland; then the
> system crashes.
> The error occurs in the init user process (init_main.c:start_init:685). We
> found out that the page fault is triggered while executing the subyte
> function for the first time. See the error description below (unfortunately
> not shown in the backtrace).
> We compared the ULE scheduler with our RR implementation, and it appears
> that the parameters passed to subyte as well as the register values are
> identical. We assume that whatever caused the error is related to the
> thread locking replacement.
> 
> Every time the kernel wants to modify thread data, the corresponding thread
> is locked to prevent any interference by other threads. Since we are using a
> single-core machine, why isn't it sufficient to simply disable interrupts
> while modifying thread data? Could you provide us with detailed information
> about the locking mechanism in FreeBSD and also answer the following
> questions, please.
> 
> What is the purpose of thread_lock/thread_unlock besides protecting thread
> data?
> How does the TDQ LOCK work and how is it related to a thread LOCK?
> 	- the thread LOCKs of all threads located in the run queue point to the
> 	  TDQ LOCK, and
> 	- the TDQ LOCK points to the currently running thread
> 	- on a context switch the current thread passes the TDQ LOCK to the newly
> 	  chosen thread
> 	- Could you explain the idea behind that locking concept, please?
> Any suggestions we should keep in mind for our own lock implementation?

thread_lock is quite intertwined with other locks.  E.g. when a thread is
blocked on a turnstile, thread_lock() for that thread locks the 'ts_lock'
spin mutex for that turnstile.  If you want to replace thread lock, you need
to change every lock that td_lock can point to so that it uses your new
primitive.  You'd
probably have an easier time just changing how mtx_lock_spin() works.  (In 
fact, if you just disable 'options SMP', the stock kernel turns 
mtx_lock_spin() into a function that just disables interrupts.)

For your core dump, the first step would be to use gdb to map that address to
a file and line.  For example, you can just do 'l *fork_exit+0x9d', or you can do
'l *<instruction pointer>' where you use the value from the trap message.
Looking at that can probably tell you why you panic'd.

-- 
John Baldwin