Code review: groundwork for SMP

Fri Jan 29 05:57:37 UTC 2010

I'm new to this list; I joined yesterday.  I work at RMI (now Netlogic)
and did part of our internal port of FreeBSD 6.4 to the XLR/XLS
processors.

> So on your systems threads share the TLB?  Wired TLB entries can't be
> pulled out (in the case of the kernel stack it's basically
> catastrophic for that to happen.)  A compromise if your TLB entries
> are really at a premium is to use a single large entry (using, say, a
> single 32k page) that contains both PCPU and the kernel stack, or a
> page which has pointers to pcpu data, the kernel stack, etc.  I seem
> to recall seeing a port of FreeBSD that used the same storage for the
> kernel stack and PCPU data, but I could be mistaken.

Our CPUs can be configured so that the 4 threads in a core share its
64 TLB entries.  You can also configure the threads so that each has
16 independent entries.  But 16 entries are too few for running
FreeBSD, so by default we used the shared TLB mode.

> There are other trade-offs available, of course.  If we don't use the
> gp for accessing small data, we can keep a pointer to the pcpu data of
> a CPU in gp whenever the kernel is running, and then PCPU accesses are
> just a matter of loading from offset+gp, which is very quick -- faster
> than the wired TLB entry mechanism, unless you use a virtual address
> for the pcpu in which case it can be painful.  As there are more
> things like VIMAGE, the amount of small global data in the kernel is
> going to fall and making gp a pcpu pointer makes more sense.  My old
> port used -G0 and I still disable use of the gp in my non-FreeBSD MIPS
> work -- I think NetBSD used to but I haven't noticed what FreeBSD does.

Again, on XLR processors there are per-thread scratch registers in
COP0, so our preferred way of doing this was to keep the per-CPU
pointer in one of these scratch registers.  We can also take the TLB
out of the picture by reserving a KSEG0 region at startup for the
per-CPU data and the kernel stack, since KSEG0 is unmapped.
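
To illustrate why KSEG0 avoids the TLB: on 32-bit MIPS it is a fixed,
unmapped (cached) window onto the low 512 MB of physical memory, so
the address conversion is plain arithmetic.  A minimal sketch follows;
the helper names are made up for illustration and are not taken from
our port or from the FreeBSD tree.

```c
#include <assert.h>
#include <stdint.h>

/* MIPS32 KSEG0: unmapped, cached window onto physical 0x00000000-0x1fffffff. */
#define KSEG0_BASE UINT32_C(0x80000000)
#define KSEG0_SIZE UINT32_C(0x20000000) /* 512 MB */

/* Hypothetical helper: physical address -> KSEG0 virtual address. */
static inline uint32_t
phys_to_kseg0(uint32_t pa)
{
	assert(pa < KSEG0_SIZE); /* only the low 512 MB is reachable */
	return (pa | KSEG0_BASE);
}

/* Hypothetical helper: KSEG0 virtual address -> physical address. */
static inline uint32_t
kseg0_to_phys(uint32_t va)
{
	assert(va >= KSEG0_BASE && va - KSEG0_BASE < KSEG0_SIZE);
	return (va - KSEG0_BASE);
}
```

Anything placed in this window (per-CPU data, kernel stacks) can be
touched without a TLB lookup, so it needs no wired entry.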

I agree with Randall here: the preferred way is to avoid wiring TLB
entries.  Can't we reserve some area for this at start-up and keep
the pointer behind a platform-specific macro?
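
Roughly what I have in mind, as a host-side sketch rather than real
kernel code.  Every name here (pcpu_sketch, boot_reserve,
PLATFORM_GET_PCPU, ...) is hypothetical.  A static array stands in
for the region reserved at start-up, and another array stands in for
the per-thread storage (a COP0 scratch register on XLR, or gp on
other platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAXCPU   32   /* assumed limit, for the sketch only */
#define SK_PCPU_SZ  4096 /* assumed size of the per-CPU area */
#define SK_STACK_SZ 8192 /* assumed kernel stack size */

struct pcpu_sketch {
	int   cpuid;
	void *kstack;
};

/* Stand-in for the start-up reservation (an unmapped region on hardware). */
static uint64_t boot_region[SK_MAXCPU * (SK_PCPU_SZ + SK_STACK_SZ) / 8];
static size_t boot_next; /* bytes handed out so far */

/* Bump allocator: hands out 8-byte aligned chunks, never frees. */
static void *
boot_reserve(size_t size)
{
	void *p;

	size = (size + 7) & ~(size_t)7;
	if (size > sizeof(boot_region) - boot_next)
		return (NULL);
	p = (unsigned char *)boot_region + boot_next;
	boot_next += size;
	return (p);
}

/* Stand-ins for the per-thread register and the currently running thread. */
static struct pcpu_sketch *scratch_reg[SK_MAXCPU];
static int current_cpu;

/* The platform hides where the pointer lives behind these macros. */
#define PLATFORM_SET_PCPU(p) (scratch_reg[current_cpu] = (p))
#define PLATFORM_GET_PCPU()  (scratch_reg[current_cpu])

/* Reserve the per-CPU area and stack for one CPU and publish the pointer. */
static struct pcpu_sketch *
cpu_bootstrap(int cpuid)
{
	struct pcpu_sketch *pc;
	void *stack;

	if (cpuid < 0 || cpuid >= SK_MAXCPU)
		return (NULL);
	pc = boot_reserve(SK_PCPU_SZ);
	stack = boot_reserve(SK_STACK_SZ);
	if (pc == NULL || stack == NULL)
		return (NULL);
	pc->cpuid = cpuid;
	pc->kstack = stack;
	current_cpu = cpuid;
	PLATFORM_SET_PCPU(pc);
	return (pc);
}
```

The machine-independent code would only ever use PLATFORM_GET_PCPU(),
so each platform is free to back it with a scratch register, gp, or
a wired TLB entry if it has nothing better.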

> More curiosity than anything (since I don't seem to be able to get an
> RMI system to develop on): if the threads are sharing the TLB, how are
> updates to TLB-related fields synchronized?  How do you atomically
> increase the wired count of the TLB?  How does 'tlbwr' work?  Do you
> have to use special instructions when you're sharing the TLB that are
> XLR-specific?

Each thread has its own COP0 registers, but they all update the core's
shared TLB, so no special TLB instructions are needed.

Regards,
JC.

--
C. Jayachandran    c.jayachandran at gmail.com