getaffinity/setaffinity and cpu sets.
Brooks Davis
brooks at freebsd.org
Sat Feb 23 21:35:19 UTC 2008
On Sat, Feb 23, 2008 at 11:21:33AM -1000, Jeff Roberson wrote:
>
> On Sat, 23 Feb 2008, Brooks Davis wrote:
>
>> On Fri, Feb 22, 2008 at 01:52:54PM -1000, Jeff Roberson wrote:
>>> On Fri, 22 Feb 2008, Brooks Davis wrote:
>>>
>>>> On Fri, Feb 22, 2008 at 12:34:13PM -1000, Jeff Roberson wrote:
>>>>>
>>>>> On Thu, 21 Feb 2008, Robert Watson wrote:
>>>>>
>>>>>> On Wed, 20 Feb 2008, Jeff Roberson wrote:
>>
>>>>>> - It would be nice to be able to use CPU sets in jail as well,
>>>>>> suggesting a hierarchal model with some sort of tagging so you know
>>>>>> what CPU sets were created in a jail such that you know whether they
>>>>>> can be changed in a jail. While I recognize this makes things a lot
>>>>>> more tricky, I think we should basically be planning more carefully
>>>>>> with respect to virtualization when we add new interfaces, since it's
>>>>>> a widely used feature, and the current set of "stragglers" unsupported
>>>>>> in Jail is growing rather than shrinking.
>>>>>
>>>>> I have implemented a hierarchical model. Each thread has a pointer to
>>>>> the cpuset that it's in. If it makes a local modification via
>>>>> setaffinity() it gets an anonymous cpuset that is a child of the set
>>>>> assigned to the process. This anonymous set will also be inherited
>>>>> across fork/thread creation.
>>>>>
>>>>> In this model presently there are nodes marked as root. To query the
>>>>> 'system' cpus available we walk up from the current node until we find
>>>>> a root. These are the 'system' set. A thread may not break out of its
>>>>> system set. A process may join the root set but it may not modify a
>>>>> root that is a parent. Jails would create a new root. A process outside
>>>>> of the jail can modify the set of processors in the jail but a process
>>>>> within the jail/root may not.
>>>>>
>>>>> The next level down from the root is the assigned set. The root may be
>>>>> an assigned set or this may be a subset of the root. Processes may
>>>>> create sets which are parented back to their root and may include any
>>>>> processors within their root. The mask of the assigned set is returned
>>>>> as 'available' processors.
>>>>>
>>>>> This gives a 1 to 3 level hierarchy: the root, an assigned set, and an
>>>>> anonymous set. Any of these but the root may be omitted. There is no
>>>>> current way for userland to create subsets of assigned sets to permit
>>>>> further nesting. I'm not sure I see value in it right now and it gives
>>>>> the possibility of unbounded tree depth.
>>>>>
>>>>> Anonymous sets are immutable as they are shared and changes only apply
>>>>> to the thread/pid in the WHICH argument and not others which have
>>>>> inherited from it. Anonymous sets have no id and may not be
>>>>> specifically manipulated via a setid. You must refer to the
>>>>> process/thread. From the administration point of view they don't exist.
>>>>>
>>>>> When a set is modified we walk down the children recursively and apply
>>>>> the new mask. This is done with a global set lock under which all
>>>>> modifications and tree operations are performed. The td_cpuset pointer
>>>>> is protected by thread_lock(), and the set it points to may be read
>>>>> without a lock. This gives the possibility for certain kinds of races
>>>>> but I believe they are all safe.
>>>>>
>>>>> Hopefully I explained that well enough for people to follow. I realize
>>>>> it's a lot of text but it's fairly simple bookkeeping code. This is
>>>>> all implemented and I'm debugging now.
>>>>
>>>> One place I'd like to implement CPU affinity is in the Sun Grid Engine
>>>> execution daemon. I think anonymous sets would not be sufficient there
>>>> because the model allows new tasks to be started on a particular node at
>>>> any time during a parallel job. I'd have to do some more digging in the
>>>> code to be entirely certain. I think the fewer limits we place on the
>>>> hierarchy, the better off we'll be unless there are compelling complexity
>>>> reasons to avoid them.
>>>
>>> With the anonymous set you can bind any thread to any cpu that is visible
>>> to it. How would this not work?
>>
>> I'm still trying to wrap my head around the anonymous sets. Is the idea
>> that once you are in an anonymous set, you can't expand it, or can you
>> expand out as far as the assigned set? I'd like for parallel jobs to
>> be allocated a set of cpus that they can't change, but still be able
>> to make their own decisions about thread affinity if they desire (for
>> example OpenMPI has some support for this so processes stay put and in
>> theory benefit from positive cache effects). If that's feasible in
>> this model, I'm happy with it. I think we should keep in mind that these
>> SGE execution daemons might be sitting inside jails. ;-)
>
> Ah, when I said the anonymous sets were immutable, that only means that
> they are copy-on-write. Because you can't know who shares a copy via fork
> or thread creation you must make a new set each time you write.
>
> I made the anonymous sets so that the parent would have a list of all
> derivative children sets so that modifications to the parent would be
> reflected in the child. This also means that the scheduler only has to
> look at one bitmap to determine the available cpus for a thread.
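To check my reading of the copy-on-write behaviour, here is a toy Python
model of it. This is only a sketch, not the kernel code: the names CpuSet,
Thread, restrict() and the ValueError are all made up for illustration, and
I am assuming that "apply the new mask" means each child set is clipped to
the intersection with its parent, and that a thread may widen its anonymous
set as far as the assigned set. Locking, set ids, reference counting and the
case where a child would end up with no cpus are left out.

```python
class CpuSet:
    """Toy cpuset node: a CPU mask plus parent/child links."""

    def __init__(self, mask, parent=None, anonymous=False):
        self.mask = frozenset(mask)
        self.parent = parent
        self.anonymous = anonymous
        self.children = []
        if parent is not None:
            # The parent keeps a list of all derived sets.
            parent.children.append(self)

    def restrict(self, new_mask):
        """Set a new mask, then walk down the children and clip each one."""
        self.mask = frozenset(new_mask)
        for child in self.children:
            child.restrict(child.mask & self.mask)


class Thread:
    def __init__(self, cpuset):
        self.cpuset = cpuset

    def fork(self):
        # The new thread shares the same set, anonymous or not.
        return Thread(self.cpuset)

    def setaffinity(self, mask):
        # Copy-on-write: never edit a possibly shared anonymous set;
        # always create a fresh child of the assigned set instead.
        base = self.cpuset.parent if self.cpuset.anonymous else self.cpuset
        mask = frozenset(mask)
        if not mask <= base.mask:
            raise ValueError("mask is not within the assigned set")
        self.cpuset = CpuSet(mask, parent=base, anonymous=True)


assigned = CpuSet({0, 1, 2, 3})
parent = Thread(assigned)
parent.setaffinity({0, 1})
child = parent.fork()         # shares the parent's anonymous set
child.setaffinity({1, 2})     # gets its own set; the parent is untouched
assigned.restrict({1, 2, 3})  # the change is reflected in both children
```

In this model each thread only ever needs the single mask in its own set,
which matches the point about the scheduler looking at one bitmap.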
I think the anonymous sets seem like a good idea. One solution to my
problem might be to make changing your current set to something that is
not a subset of your parent (or maybe your current set?) a privileged
operation.
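Roughly, the check I have in mind looks like the Python sketch below. The
function name and the privileged flag are hypothetical, and whether the
limit should be the parent's mask or the current set's mask is exactly the
open question above.

```python
def may_set_mask(new_mask, limit_mask, privileged=False):
    """Return True if the caller may switch to new_mask.

    limit_mask is the parent's mask (or possibly the current set's mask).
    Unprivileged callers may only pick a subset of it.
    """
    return privileged or frozenset(new_mask) <= frozenset(limit_mask)


job_cpus = {4, 5, 6, 7}                          # cpus handed to a parallel job
ok_narrow = may_set_mask({4, 5}, job_cpus)       # job pins itself within its cpus
ok_escape = may_set_mask({0, 4}, job_cpus)       # job tries to leave its cpus
ok_admin = may_set_mask({0, 4}, job_cpus, privileged=True)
```

A job could then still make its own thread affinity decisions inside the
cpus it was given, while only a privileged process could move it elsewhere.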
-- Brooks
More information about the freebsd-arch mailing list