[Bug 272238] [PATCH] false sharing with pthread rw and spin locks leads to severe perf degradation


From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 26 Jun 2023 23:41:16 UTC

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272238

            Bug ID: 272238
           Summary: [PATCH] false sharing with pthread rw and spin locks
                    leads to severe perf degradation
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: threads@FreeBSD.org
          Reporter: greg@codeconcepts.com

Created attachment 243025
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=243025&action=edit
Patch to ameliorate false sharing with pthread rw and spin locks.

If an application allocates two or more pthread rwlocks (or spin locks) and
then heavily accesses them from different CPUs, it is quite likely that the
application will experience severe performance degradation due to false
sharing.
The problem is that these locks are small (36 and 32 bytes, respectively) and
allocated on the heap via jemalloc(3).  Depending upon the state of the
allocator, they may wind up within the same cache line or straddling adjacent
cache lines.
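
To make the layout problem concrete, here is a minimal sketch of a check for
whether two heap objects touch a common cache line.  The 64-byte line size and
the helper name shares_cache_line() are assumptions for illustration only;
neither comes from libthr or from the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; typical on amd64. */
#define ASSUMED_CACHE_LINE 64

/*
 * Hypothetical helper: return true if the byte ranges [a, a+alen) and
 * [b, b+blen) touch at least one common cache line.  Both lengths must
 * be non-zero.
 */
static inline bool
shares_cache_line(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
	uintptr_t a_first = a / ASSUMED_CACHE_LINE;
	uintptr_t a_last = (a + alen - 1) / ASSUMED_CACHE_LINE;
	uintptr_t b_first = b / ASSUMED_CACHE_LINE;
	uintptr_t b_last = (b + blen - 1) / ASSUMED_CACHE_LINE;

	/* Two closed intervals of line indices overlap. */
	return (a_first <= b_last && b_first <= a_last);
}
```

With two 36-byte objects placed back to back at 0x1000 and 0x1024, the check
reports a shared line; moving the second object to 0x1040 separates them.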

For example, if I initialize four rwlocks and hammer on them from four
different CPUs (one lock per CPU, such that each lock is always uncontended),
then on my dual E2697a 2.6GHz server I get about 10.52 million lock+inc+unlock
calls per second.

With the attached patch, which rounds up the allocations to CACHE_LINE_SIZE, I
get 47.68 million calls per second.  Similarly, for pthread spin locks I get
about 4.53 and 50.94 million calls per second, respectively.
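
The rounding idea can be sketched as follows.  This is not the attached patch,
which is not reproduced in this message; the helper names and the 64-byte
value used for the line size are assumptions, and the sketch additionally
requests cache-line alignment explicitly through C11 aligned_alloc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration; the real constant is per-architecture. */
#define ASSUMED_CACHE_LINE_SIZE 64

/* Round n up to the next multiple of the (power-of-two) line size. */
static inline size_t
round_up_to_cache_line(size_t n)
{
	return ((n + ASSUMED_CACHE_LINE_SIZE - 1) &
	    ~(size_t)(ASSUMED_CACHE_LINE_SIZE - 1));
}

/*
 * Hypothetical allocator wrapper: return zeroed storage that starts on a
 * cache-line boundary and occupies whole lines, so that no other
 * allocation can share a line with it.  Release with free().
 */
static inline void *
alloc_cache_line_padded(size_t size)
{
	size_t padded = round_up_to_cache_line(size);
	void *p = aligned_alloc(ASSUMED_CACHE_LINE_SIZE, padded);

	if (p != NULL)
		memset(p, 0, padded);
	return (p);
}
```

Under these assumptions, both the 36-byte and the 32-byte lock structures would
each occupy one full 64-byte line.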

Overall, I am seeing roughly a 4.5x improvement with pthread rwlocks, and an
11.2x improvement with pthread spin locks.

The patch is very simple and ignores adjacent cache line prefetch as seen on
amd64 hardware.

Developed and tested on:

FreeBSD sm1.cc.codeconcepts.com 14.0-CURRENT FreeBSD 14.0-CURRENT #4
n263748-b95d2237af40: Mon Jun 26 17:08:50 CDT 2023    
greg@sm1.cc.codeconcepts.com:/usr/obj/usr/src/amd64.amd64/sys/SM1 amd64

Passes the kyua test:

kyua test -k /usr/tests/lib/libthr/Kyuafile
