FreeBSD on Ryzen
Freddie Cash
fjwcash at gmail.com
Wed Mar 22 20:50:37 UTC 2017
On Wed, Mar 22, 2017 at 1:30 PM, Don Lewis <truckman at freebsd.org> wrote:
> I put together a Ryzen 1700X machine over the weekend and installed the
> 12.0-CURRENT r315413 snapshot on it a couple of days ago. The RAM is
> DDR4 2400.
>
> First impression is that it's pretty zippy. Compared to my previous
> fastest machine:
> CPU: AMD FX-8320E Eight-Core Processor (3210.84-MHz K8-class CPU)
> make -j8 buildworld using tmpfs is a bit more than 2x faster. Since the
> Ryzen has SMT, its eight cores look like 16 CPUs to FreeBSD, and I get
> almost a 2.6x speedup with -j16 as compared to my old machine.
>
> I do see that the reported total CPU time increases quite a bit at -j16
> (~19900u) as compared to -j8 (~13600u) so it is running into some
> hardware bottlenecks that are slowing down instruction execution. It
> could be the resources shared by both SMT threads that share each core,
> or it could be cache or memory bandwidth related. The Ryzen topology is
> a bit complicated. There are two groups of four cores, where each group
> of four cores shares half of the L3 cache, with a slowish interconnect
> bus between the groups. This probably causes some NUMA-like issues. I
> wonder if the ULE scheduler could be tweaked to handle this better.
>
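To put rough numbers on the -j8 vs -j16 comparison quoted above, here is a
quick back-of-the-envelope sketch. It only uses the figures from the quoted
message; the two speedups are rounded ("a bit more than 2x", "almost 2.6x"),
so treat the results as approximate, and the variable names are just for
illustration.

```python
# Figures taken from the quoted message. The speedups are approximate
# ("a bit more than 2x" and "almost 2.6x"), so the results are rough.
cpu_seconds_j8 = 13600.0   # reported user CPU time at -j8
cpu_seconds_j16 = 19900.0  # reported user CPU time at -j16
speedup_j8 = 2.0           # vs the FX-8320E, at -j8
speedup_j16 = 2.6          # vs the FX-8320E, at -j16

# How much more CPU time the same build consumed at -j16.
cpu_time_ratio = cpu_seconds_j16 / cpu_seconds_j8

# How much sooner the build finished at -j16 than at -j8.
wall_clock_gain = speedup_j16 / speedup_j8

# Average concurrency is CPU time / wall time, so its ratio between the
# two runs is the product of the two ratios above.
concurrency_ratio = cpu_time_ratio * wall_clock_gain

print(f"user CPU time, -j16 vs -j8:   {cpu_time_ratio:.2f}x")
print(f"wall-clock gain, -j16 vs -j8: {wall_clock_gain:.2f}x")
print(f"average busy threads ratio:   {concurrency_ratio:.2f}x")
```

So the same build burns roughly 46% more CPU seconds at -j16 but finishes
about 1.3x sooner, which is about what you'd expect if close to twice as many
hardware threads are busy and each one runs slower.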
The interconnect, aka Infinity Fabric, runs at the speed of the memory
controller, so if you put faster RAM into the system, the fabric runs
faster, and inter-CCX latency should drop to match.

Each core has a 2 MB slice of L3 cache (8 MB per CCX), but any core can
access data in the L3 cache of any other core. Latency for those requests
depends on whether it's within the same CCX (4-core cluster), or in the
other CCX (going across the Infinity Fabric).

There are a lot of finicky timing issues with L3 cache accesses, and with
thread migration (in-CCX vs across the fabric).
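To make the in-CCX vs across-the-fabric distinction concrete, here's a
minimal sketch that classifies a thread migration between two logical CPUs
on an 8-core, 16-thread part. The CPU numbering is an assumption made for
illustration (logical CPUs 2k and 2k+1 are the SMT threads of core k, cores
0-3 are one CCX, cores 4-7 the other); the real enumeration on a given
system may differ, and the function names are hypothetical.

```python
# ASSUMPTION: logical CPUs 2k and 2k+1 are the two SMT threads of core k,
# and cores 0-3 form CCX 0 while cores 4-7 form CCX 1. The numbering on a
# real system may differ; check the reported topology before relying on it.
THREADS_PER_CORE = 2
CORES_PER_CCX = 4


def locate(cpu):
    """Return (core, ccx) for a logical CPU under the assumed numbering."""
    core = cpu // THREADS_PER_CORE
    ccx = core // CORES_PER_CCX
    return core, ccx


def migration_kind(src, dst):
    """Classify a move from logical CPU src to logical CPU dst."""
    src_core, src_ccx = locate(src)
    dst_core, dst_ccx = locate(dst)
    if src_core == dst_core:
        return "same core"   # SMT sibling, shares the core's caches
    if src_ccx == dst_ccx:
        return "same CCX"    # stays within one 4-core cluster
    return "cross CCX"       # has to go across the Infinity Fabric


for src, dst in [(0, 1), (0, 6), (0, 8)]:
    print(f"CPU {src} -> CPU {dst}: {migration_kind(src, dst)}")
```

A topology-aware scheduler would presumably want to prefer the first two
kinds of move over the third, since only the last one crosses the fabric.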
This is a whole other level of NUMA fun. And it'll get even more fun when
the server version ships where you have 4 CCXes in a single CPU, with
multiple sockets on a motherboard, and Infinity Fabric joining everything
together. :)

I feel sorry for the scheduler devs who get to figure all this out. :D

Supposedly, the Linux folks have this mostly figured out in kernel 4.10,
but I'll wait for the benchmarks to believe it. There's a bunch up on
Phoronix ... but, well, it's Phoronix. :)
--
Freddie Cash
fjwcash at gmail.com
More information about the freebsd-amd64 mailing list