i386 4/4 change

Sat Mar 31 15:57:19 UTC 2018

On Sat, 31 Mar 2018, Konstantin Belousov wrote:

> the change to provide full 4G of address space for both kernel and
> user on i386 is ready to land.  The motivation for the work was to both
> mitigate Meltdown on i386, and to give more breathing space for still
> used 32bit architecture.  The patch was tested by Peter Holm, and I am
> satisfied with the code.
>
> If you use i386 with HEAD, I recommend you to apply the patch from
> https://reviews.freebsd.org/D14633
> and report any regressions before the commit, not after.  Unless
> a significant issue is reported, I plan to commit the change somewhere
> at Wed/Thu next week.
>
> Also I welcome patch comments and reviews.

It crashes at boot time in getmemsize() unless booted with loader, which
I don't want to use.

It is much slower, and I couldn't find an option to turn it off.

For makeworld, the system time is slightly more than doubled, the user
time is increased by 16%, and the real time is increased by 21%.

On amd64, turning off pti and not having ibrs gives almost no increase
in makeworld times relative to old versions, and pti only costs about
5% IIRC.

Makeworld is not very syscall-intensive.  netblast is very syscall-intensive,
and its throughput is down by a factor of 5 (660/136 = 4.9, 1331/242 = 5.5).

netblast 127.0.0.1 5001 5 10 (localhost, port 5001, 5-byte tinygrams for 10 s):
     537 kpps sent, 0 kpps dropped     # before this patch (CPU use 1.3)
     136 kpps sent, 0 kpps dropped     # after (CPU use 2.1)

(Pure software overheads.  It uses 1.6 times as much CPU to go 4 times
slower).

netblast 192.168.2.8 (low end PCI33 lem on low latency 1 Gbps LAN)
     275 kpps sent, 1045 kpps dropped  # before (CPU use 1.3)
     245 kpps sent, 0    kpps dropped  # after (CPU use 1.3)

(The hardware can't do anywhere near line rate of ~1500 kpps, so this
becomes a benchmark of syscalls and dropping packets.  The change makes
FreeBSD so slow that 8 CPUs at 4.08 GHz can't saturate a low end PCI33 NIC
(the hardware saturates at about 282 kpps for tx and about 400 kpps for
rx)).

netblast 192.168.2.8 (low end PCIe em on low latency 1 Gbps LAN)
    1316 kpps sent, 3 kpps dropped     # before (CPU use 1.6)
     243 kpps sent, 0 kpps dropped     # after (CPU use 1.2)

This is seriously slower for the most useful case.  It reduces a system
that could almost reach line rate using about 2 of 8 CPUs at 4 GHz to
one that is slower than with 1 CPU at 2 GHz (the latter saturates
in software at about 640 kpps in old versions of FreeBSD and at about
400 kpps in -current).

Initial debugging of the crash: it crashes on the first pmap_kenter()
in getmemsize().  I set debug.late_console to 0.  That works; without
it getmemsize() can't even be debugged, since it then runs before
console initialization and ddb entry with -d.

In getmemsize(), of course all the preload calls return 0 and smapbase is
NULL.  Then vm86 bios calls work and give basemem = 0x276.  Then
basemem_setup() is called and it returns. Then pmap_kenter() is called
and it crashes:

Stopped at      getmemsize+0xb3:        pushl   $0x1000
Stopped at      getmemsize+0xb8:        pushl   $0x1000
Stopped at      getmemsize+0xbd:        call    pmap_kenter
Stopped at      pmap_kenter:    pushl   %ebp
Stopped at      pmap_kenter+0x1:        movl    %esp,%ebp
Stopped at      pmap_kenter+0x3:        movl    0x8(%ebp),%eax
Stopped at      pmap_kenter+0x6:        shrl    $0xc,%eax
Stopped at      pmap_kenter+0x9:        movl    0xc(%ebp),%edx
Stopped at      pmap_kenter+0xc:        orl     $0x3,%edx
Stopped at      pmap_kenter+0xf:        movl    %edx,PTmap(,%eax,4)

The last instruction crashes because PTmap is not mapped at this point:

db> p/x $edx
     1003
db> p/x PTmap
ff800000
db> p/x $eax
        1
db> x/x PTmap
PTmap:KDB: reentering
KDB: stack backtrace:
   db_trace_self_wrapper(cec5cb,1420a04,c6de83,1420978,1,...) at db_trace_self_wrapper+0x24/frame 0x142095c
kdb_reenter(1420978,1,ff80003a,1420998,8f1419,...) at kdb_reenter+0x24/frame 0x1420968
trap(1420a10) at trap+0xa0/frame 0x1420a04
calltrap() at calltrap+0x8/frame 0x1420a04
--- trap 0xc, eip = 0xc5c394, esp = 0x1420a50, ebp = 0x1420a88 ---
db_read_bytes(ff800001,3,1420aa0) at db_read_bytes+0x29/frame 0x1420a88
db_get_value(ff800000,4,0,0,d2d304,...) at db_get_value+0x20/frame 0x1420ab4
db_examine(ff800000,1,ffffffff,1420b00) at db_examine+0x144/frame 0x1420ae4
db_command(cb1d99,1420be4,8f0f01,d1d28a,0,...) at db_command+0x20a/frame 0x1420b90
db_command_loop(d1d28a,0,1420bac,1420b9c,1420be4,...) at db_command_loop+0x55/frame 0x1420b9c
db_trap(a,ffff4ff0,1,1,80046,...) at db_trap+0xe1/frame 0x1420be4
kdb_trap(a,ffff4ff0,1420cc4) at kdb_trap+0xb1/frame 0x1420c10
trap(1420cc4) at trap+0x523/frame 0x1420cb8
calltrap() at calltrap+0x8/frame 0x1420cb8
--- trap 0xa, eip = 0xc65a4a, esp = 0x1420d04, ebp = 0x1420d04 ---
pmap_kenter(1000,1000,1429000,8efe13,0,...) at pmap_kenter+0xf/frame 0x1420d04
getmemsize(1,5a8807ff,ee,59a80097,ee,...) at getmemsize+0xc2/frame 0x1420fc4
init386(1428000) at init386+0x2bb/frame 0x1420ff4
btext() at btext+0x55
*** error reading from address ff800000 ***
--More--        KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper(cec5cb,1420ab4,8ee255,cb1923,ff800000,...) at db_trace_self_wrapper+0x24/frame 0x1420a7c
kdb_reenter(cb1923,ff800000,0) at kdb_reenter+0x24/frame 0x1420a88
db_get_value(ff800000,4,0,0,d2d304,...) at db_get_value+0x3a/frame 0x1420ab4
db_examine(ff800000,1,ffffffff,1420b00) at db_examine+0x144/frame 0x1420ae4
db_command(cb1d99,1420be4,8f0f01,d1d28a,0,...) at db_command+0x20a/frame 0x1420b90
db_command_loop(d1d28a,0,1420bac,1420b9c,1420be4,...) at db_command_loop+0x55/frame 0x1420b9c
db_trap(a,ffff4ff0,1,1,80046,...) at db_trap+0xe1/frame 0x1420be4
kdb_trap(a,ffff4ff0,1420cc4) at kdb_trap+0xb1/frame 0x1420c10
trap(1420cc4) at trap+0x523/frame 0x1420cb8
calltrap() at calltrap+0x8/frame 0x1420cb8
--- trap 0xa, eip = 0xc65a4a, esp = 0x1420d04, ebp = 0x1420d04 ---
pmap_kenter(1000,1000,1429000,8efe13,0,...) at pmap_kenter+0xf/frame 0x1420d04
getmemsize(1,5a8807ff,ee,59a80097,ee,...) at getmemsize+0xc2/frame 0x1420fc4
init386(1428000) at init386+0x2bb/frame 0x1420ff4
btext() at btext+0x55
db>

Bruce