Reducing vm page queue mutex contention
Suleiman Souhlal
ssouhlal at FreeBSD.org
Thu Feb 1 06:42:24 UTC 2007
Hello Alan,
Profiling shows that the vm page queue mutex is the most contended lock in
the kernel, maybe apart from sched_lock. It seems that this is in part
because this lock protects a lot of things: the page queues, pv entries, page
flags, the page hold count, the page wired count, and so on.
I came up with a possible plan to reduce contention on this lock,
concentrating on the amd64 pmap (although these changes should be applicable
to the other architectures as well):
- Make vm_page_flag_set/clear() just use atomic operations, to get rid of
the page queues lock dependency.
I'm still not entirely convinced this is safe; a sketch is below.
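A minimal sketch, assuming the flags field stays a u_short and that no
flag update relies on the queues lock for ordering:

	static __inline void
	vm_page_flag_set(vm_page_t m, unsigned short bits)
	{

		/* Atomic RMW replaces the page queues lock assertion. */
		atomic_set_short(&m->flags, bits);
	}

	static __inline void
	vm_page_flag_clear(vm_page_t m, unsigned short bits)
	{

		atomic_clear_short(&m->flags, bits);
	}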
- vm_page_hold() and vm_page_unhold() can be made to avoid acquiring the
queues lock in the common case.
I already have a patch for this, although it increases the size of
vm_page_t (I have some other ideas to reduce the size of vm_page_t, but
that's for another time):
http://people.freebsd.org/~ssouhlal/testing/vm_page_hold-20070131.diff
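The idea is roughly the following (a sketch only; it assumes hold_count is
widened to a u_int so that atomic_fetchadd_int() can be used, which would
account for part of the vm_page_t growth):

	void
	vm_page_hold(vm_page_t m)
	{

		atomic_add_int(&m->hold_count, 1);
	}

	void
	vm_page_unhold(vm_page_t m)
	{

		/* Only the 1 -> 0 transition needs the queues lock. */
		if (atomic_fetchadd_int(&m->hold_count, -1) == 1) {
			vm_page_lock_queues();
			/* Recheck: someone may have re-held the page. */
			if (m->hold_count == 0 && m->queue == PQ_HOLD)
				vm_page_free_toq(m);
			vm_page_unlock_queues();
		}
	}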
- Add a mutex pool for vm pages, to protect the pv entry lists.
I'm currently working on this; a sketch is at the end of this item.
My current approach makes struct pv_entry larger because it needs to store
a pointer to the pte in each pv_entry.
Another way that might be better is to move to per-object pv entries,
which is what Linux does. It would greatly reduce memory usage when
mapping large objects in a lot of processes, although it might be slower
for sparsely faulted objects mapped in a large number of processes.
This approach would be a lot of work, which is why I'm leaning towards
keeping per-page pv entries.
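To make the mutex pool idea concrete, something along these lines, using
the existing mtx_pool(9) facility (the pool and macro names here are made
up, and pv_ptep is the extra pte pointer mentioned above):

	static struct mtx_pool *vm_page_mtxpool;

	#define	VM_PAGE_PV_LOCK(m)	mtx_pool_lock(vm_page_mtxpool, (m))
	#define	VM_PAGE_PV_UNLOCK(m)	mtx_pool_unlock(vm_page_mtxpool, (m))

	typedef struct pv_entry {
		pmap_t		pv_pmap;
		vm_offset_t	pv_va;
		TAILQ_ENTRY(pv_entry)	pv_list;
		TAILQ_ENTRY(pv_entry)	pv_plist;
		pt_entry_t	*pv_ptep;	/* new: the pte for this mapping */
	} *pv_entry_t;

The pool itself could be created at boot with something like
mtx_pool_create("vm page pv", 128, MTX_DEF).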
- It should be possible to make vm_page->wired_count use atomic operations
instead of needing a lock, similar to what I did for hold_count.
This might be a bit tricky, but hopefully possible; a sketch is below.
Alternatively, we could use the mutex pool described above to protect it.
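A rough sketch of the wiring side (again assuming an int-sized count; the
0 <-> 1 transitions still need the queues lock to move the page on or off
the queues, and racing transitions in that window are the tricky part):

	void
	vm_page_wire(vm_page_t m)
	{

		if (atomic_fetchadd_int(&m->wired_count, 1) == 0) {
			/* First wiring: dequeueing still needs the lock. */
			vm_page_lock_queues();
			vm_pageq_remove(m);
			vm_page_unlock_queues();
		}
	}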
- We can change pmap_unuse_pt() and free_pv_entry() to just record the pages
they want to free in an array allocated by the caller.
The caller then frees those pages after it drops the pmap lock.
For example:
	struct pages_to_free {
		vm_page_t page[MAX_PAGES];	/* vm_page_t is already a pointer */
		int num_pages;
	};

	void
	pmap_remove(...)
	{
		struct pages_to_free pages;
		int i;

		pages.num_pages = 0;
		PMAP_LOCK(pmap);
		...
		pmap_unuse_pt(..., &pages);	/* records pages instead of freeing */
		...
		PMAP_UNLOCK(pmap);
		/* Now free the deferred pages, outside the pmap lock. */
		vm_page_lock_queues();
		for (i = 0; i < pages.num_pages; i++)
			vm_page_free(pages.page[i]);
		vm_page_unlock_queues();
	}
This way, pmap_remove() can run mostly without the page queues lock held.
- Once the above are done, it should be possible to make pmap_enter() run
mostly free of the page queues lock (see the sketch after this list) by:
- Pre-allocating a pv chunk early in pmap_enter, if there are no free
ones, so that we never have to allocate new chunks in pmap_insert_entry.
- Dropping the page queues lock immediately after the pmap_allocpte in
pmap_enter.
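Roughly, the shape would become something like this (a sketch only;
pv_chunk_prealloc() is a hypothetical helper, and error handling is
omitted):

	void
	pmap_enter(pmap_t pmap, vm_offset_t va, vm_page_t m,
	    vm_prot_t prot, boolean_t wired)
	{
		vm_page_t mpte;

		PMAP_LOCK(pmap);
		/*
		 * Reserve a pv chunk up front so that pmap_insert_entry()
		 * never has to allocate one while the queues lock is held.
		 */
		if (TAILQ_EMPTY(&pmap->pm_pvchunk))
			pv_chunk_prealloc(pmap);
		vm_page_lock_queues();
		mpte = pmap_allocpte(pmap, va, M_WAITOK);
		vm_page_unlock_queues();	/* dropped right after allocpte */
		...
		PMAP_UNLOCK(pmap);
	}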
Any thoughts/comments?
-- Suleiman