From nobody Thu Aug 11 18:18:48 2022 X-Original-To: dev-commits-src-main@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4M3Znc4WGDz4ZM94 for ; Thu, 11 Aug 2022 18:19:00 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-vk1-xa32.google.com (mail-vk1-xa32.google.com [IPv6:2607:f8b0:4864:20::a32]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4M3Znb6WJyz4FQW for ; Thu, 11 Aug 2022 18:18:59 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-vk1-xa32.google.com with SMTP id x128so6313792vke.3 for ; Thu, 11 Aug 2022 11:18:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20210112.gappssmtp.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc; bh=fZ9xr8CWI2tf/ROC2K6VNq+eNTZsTDvhjM8PRaRqWBU=; b=EuVbsA4PKhanxa9Q2k9jdC7/XUE1okp/KFHuCkM1D8sIdXlx3e/QTbablm5q9GmjYT jomEKegpDfSeKl05fjmxdrVzzpuowAPOs8dP0pGuCL+RgiReikL9UzGmaVrP7p8mtV9A SCVZttUQEd5YjrtWBbyO7d07GtOR9EW0HidK+eEk3yfWtf2HEpSJuI0mH6bnsmlz4vXe 1vTsSFbnYb/fqh3RSn/VjT4jl3V0XSH+By4pfDrlHuSuIukC5qeHNh78RnE646z8DG53 51L0An34lVs03BDDLMOQIdoHqQ3otV56Qxkbivl/C7lvy32QOiV1p1i8QjuIbvO2I8P1 X79g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc; bh=fZ9xr8CWI2tf/ROC2K6VNq+eNTZsTDvhjM8PRaRqWBU=; b=6YAa3Zqujt5saem0OkaQ3zxrWfEBsxxVK77NTG4B3JBIL8jP5WONSlkoRJyRc7EbBf HMMUfsNpj2gQrDpTQnoSaAH3dDdBsRjGzrxhr+aAofHjf1cBlYHd5o1J9xi8AMZ+hhkB wCPNeGX1NaZXn0awjAWFdEUTSYx6U/xky3Vn+byKLGQLLLmbJd6J3kswrU7NiT16D4dJ 
+9mrJGBv2ZfWv3Y8k0mg1XCg+mHogkXntJ+M79LMQ20cQKjfxgkTV0GhzsoPHdoyrE2H KLszkfvslZxsSc2hf0tQDaYP2k9R5l7HbiOOpEEyev9g7uMwIF1alNcQy8lXc1njwPYZ yISw== X-Gm-Message-State: ACgBeo2EkDof/Zlf+nQrwhIKs1/1h1MN3wWy37DZY+hyQmIt7L09NWup YTircSSXDnmPwmSHl65jZqwFsC54J0z6laaUeyYqDA== X-Google-Smtp-Source: AA6agR4YeEE+5xLQMXWAitzUZXpXxX7A2b6pVPXTJjIHv47AdZKo3pL+SrjQ/xQzpSsQp7V4aQx/PKAQT9GPqcy+A6k= X-Received: by 2002:a1f:dac3:0:b0:377:8cb:4544 with SMTP id r186-20020a1fdac3000000b0037708cb4544mr216141vkg.7.1660241938996; Thu, 11 Aug 2022 11:18:58 -0700 (PDT) List-Id: Commit messages for the main branch of the src repository List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-dev-commits-src-main@freebsd.org X-BeenThere: dev-commits-src-main@freebsd.org MIME-Version: 1.0 References: <202208110331.27B3Va7M007335@gitrepo.freebsd.org> <727af1f3-7432-c038-a776-682ef161f6f9@FreeBSD.org> In-Reply-To: <727af1f3-7432-c038-a776-682ef161f6f9@FreeBSD.org> From: Warner Losh Date: Thu, 11 Aug 2022 12:18:48 -0600 Message-ID: Subject: Re: git: 39fdad34e220 - main - stand: impose 510,000 byte limit for /boot/loader and /boot/pxeldr To: John Baldwin Cc: Warner Losh , src-committers , "" , dev-commits-src-main@freebsd.org Content-Type: multipart/alternative; boundary="00000000000003f4d305e5fb3340" X-Rspamd-Queue-Id: 4M3Znb6WJyz4FQW X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bsdimp-com.20210112.gappssmtp.com header.s=20210112 header.b=EuVbsA4P; dmarc=none; spf=none (mx1.freebsd.org: domain of wlosh@bsdimp.com has no SPF policy when checking 2607:f8b0:4864:20::a32) smtp.mailfrom=wlosh@bsdimp.com X-Spamd-Result: default: False [-3.00 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; NEURAL_HAM_LONG(-1.00)[-1.000]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FORGED_SENDER(0.30)[imp@bsdimp.com,wlosh@bsdimp.com]; R_DKIM_ALLOW(-0.20)[bsdimp-com.20210112.gappssmtp.com:s=20210112]; 
MIME_GOOD(-0.10)[multipart/alternative,text/plain]; RCVD_IN_DNSWL_NONE(0.00)[2607:f8b0:4864:20::a32:from]; MLMMJ_DEST(0.00)[dev-commits-src-main@freebsd.org]; RCVD_TLS_LAST(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; MIME_TRACE(0.00)[0:+,1:+,2:~]; R_SPF_NA(0.00)[no SPF record]; ARC_NA(0.00)[]; TO_MATCH_ENVRCPT_SOME(0.00)[]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; TO_DN_SOME(0.00)[]; FROM_HAS_DN(0.00)[]; DKIM_TRACE(0.00)[bsdimp-com.20210112.gappssmtp.com:+]; PREVIOUSLY_DELIVERED(0.00)[dev-commits-src-main@freebsd.org]; RCPT_COUNT_FIVE(0.00)[5]; DMARC_NA(0.00)[bsdimp.com]; FROM_NEQ_ENVFROM(0.00)[imp@bsdimp.com,wlosh@bsdimp.com]
X-ThisMailContainsUnwantedMimeParts: N

--00000000000003f4d305e5fb3340
Content-Type: text/plain; charset="UTF-8"

On Thu, Aug 11, 2022 at 10:56 AM John Baldwin wrote:

> On 8/10/22 8:31 PM, Warner Losh wrote:
> > The branch main has been updated by imp:
> >
> > URL: https://cgit.FreeBSD.org/src/commit/?id=39fdad34e220c52a433e78f20c8c39412429014e
> >
> > commit 39fdad34e220c52a433e78f20c8c39412429014e
> > Author: Warner Losh
> > AuthorDate: 2022-08-11 03:19:01 +0000
> > Commit: Warner Losh
> > CommitDate: 2022-08-11 03:29:20 +0000
> >
> >     stand: impose 510,000 byte limit for /boot/loader and /boot/pxeldr
> >
> >     The BIOS method of booting imposes an absolute limit of 640k for the
> >     size of the program being run due to btx. In practice, this means that
> >     programs larger than about 500 KiB will fail in odd ways as the stack /
> >     heap will overflow.
>
> Technically the heap is now always above 1MB, the issue is the stack growing
> down and overwriting .bss.

Fair point. I realized that after I pushed as well... Some compilers I've
used in the past have 'stack overflow' checks, but that was done at the
userland/kernel boundary and likely would be hard to pull off here...

> >     Pick 510,000 as the cutoff line semi-arbitrarily. loader_lua is now
> >     almost too big and we want to break the build when it crosses this
> >     threshold. In my experience, below 500,000 always works, above 520,000
> >     always seems to fail with things getting bad somewhere between 512,000
> >     to 515,000. 510,000 is as close to the line as I think we can go, though
> >     experience may dictate we need to lower this in the future.
> >
> >     This is at best a stop-breakage until we have a better way to subset the
> >     boot loader for BIOS booting to allow better, more fine-tuned
> >     /boot/loaders for the many different environments they have to run
> >     in. This likely means we'll have a graphical loader that understands a
> >     few filesystems for installation, and a non-graphical loader that
> >     understands the most filesystems possible for everything else in the
> >     future. Our build infrastructure needs some work before we can do that,
> >     however.
> >
> >     At this late date, it likely isn't worth the effort to move parts of
> >     the loader into high memory. There are a number of assumptions about where
> >     the stack is, where buffers reside, etc. that are fulfilled when it lives
> >     in the first 640k that would need bounce buffers and/or other
> >     countermeasures if we were to split it up. All BIOS calls are done in 16-bit
> >     mode with SEG:OFF addresses, requiring them to be in the first 640k of
> >     RAM. And nearly all machines in the last decade can boot with UEFI
> >     (though there are some exceptions, so it isn't worth killing outright
> >     yet).
>
> Fully agree that we just want to keep the BIOS loader on a sufficient feature
> diet.

Yes.

> >     Sponsored by:           Netflix
> >     Reviewed by:            kevans
> >     Differential Revision:  https://reviews.freebsd.org/D36129
>
> You really want to apply the size check to loader.bin, not loader.  The memory
> layout down in the first 1MB for boot loaders is roughly:
>
> 0x0000: real-mode IDT
> 0x0400: BIOS data
> 0x7c00: where BIOS loads boot loaders such as /boot/mbr, etc.
> 0x1000: various BTX global data like GDT, TSS, IDT, stacks
> 0x9000: BTX kernel
> 0xa000: BTX client (loader.bin)
> 0xa0000: top of BTX client stack (though this can be a bit lower for cases like
>          PXE booting)
>
> The real size constraint is on the BTX client (loader.bin) and the fact that
> its text/data/bss plus stack need to fit into that 576k window (give or take).
> btxldr isn't stored in low memory, so its size isn't relevant, and BTX's code
> always takes up a full page even though it is much smaller.

Where does 576k come from? That's 589824 bytes, but 0xa0000 - 0xa000 is 614400
bytes. The delta is 24k (24576). My 'observed' value of about 515,000 is another
75k below that, suggesting we need 100k of stack? Can that be right? I knew
lua ate a lot of stack, but wow!

> In theory pxeboot's total size needs to fit into the window at 0x7c00 - 0xa0000,
> but in practice that limit is much larger since the size of pxeldr plus the BTX
> kernel is much smaller than 0xa000 - 0x7c00.
>
> I would say that you might want the PXE size to be even lower (maybe 4k or so?)
> than the "plain" disk loader as PXE ROMs grab some of the memory ending at
> 0xa0000 to use for data buffers.  I don't have a firm number I can recall of how
> much they grab, hence my guess of about 4k or so.

Hmmm... Good point on that. It likely is better to get well below the limit
than to necessarily check for a smaller pxeldr. Or put another way, I'd rather
we push the limit down all the time so we don't wind up in that awkward place
where /boot/loader fits, but /boot/pxeldr doesn't.

Warner

--00000000000003f4d305e5fb3340
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


On Thu, Aug 11, 2022 at 10:56 AM John Baldwin <jhb@freebsd.org> wrote:
On 8/10/22 8:31 PM, Warner Losh wrote:
> The branch main has been updated by imp:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=39fdad34e220c52a433e78f20c8c39412429014e
>
> commit 39fdad34e220c52a433e78f20c8c39412429014e
> Author:     Warner Losh <imp@FreeBSD.org>
> AuthorDate: 2022-08-11 03:19:01 +0000
> Commit:     Warner Losh <imp@FreeBSD.org>
> CommitDate: 2022-08-11 03:29:20 +0000
>
>     stand: impose 510,000 byte limit for /boot/loader and /boot/pxeldr
>
>     The BIOS method of booting imposes an absolute limit of 640k for the
>     size of the program being run due to btx. In practice, this means that
>     programs larger than about 500 KiB will fail in odd ways as the stack /
>     heap will overflow.

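Technically the heap is now always above 1MB, the issue is the stack growing
down and overwriting .bss.

Fair point. I realized that after I pushed as well... Some compilers I've
used in the past have 'stack overflow' checks, but that was done at the
userland/kernel boundary and likely would be hard to pull off here...
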
>     Pick 510,000 as the cutoff line semi-arbitrarily. loader_lua is now
>     almost too big and we want to break the build when it crosses this
>     threshold. In my experience, below 500,000 always works, above 520,000
>     always seems to fail with things getting bad somewhere between 512,000
>     to 515,000. 510,000 is as close to the line as I think we can go, though
>     experience may dictate we need to lower this in the future.
>
>     This is at best a stop-breakage until we have a better way to subset the
>     boot loader for BIOS booting to allow better, more fine-tuned
>     /boot/loaders for the many different environments they have to run
>     in. This likely means we'll have a graphical loader that understands a
>     few filesystems for installation, and a non-graphical loader that
>     understands the most filesystems possible for everything else in the
>     future. Our build infrastructure needs some work before we can do that,
>     however.
>
>     At this late date, it likely isn't worth the effort to move parts of
>     the loader into high memory. There are a number of assumptions about where
>     the stack is, where buffers reside, etc. that are fulfilled when it lives
>     in the first 640k that would need bounce buffers and/or other
>     countermeasures if we were to split it up. All BIOS calls are done in 16-bit
>     mode with SEG:OFF addresses, requiring them to be in the first 640k of
>     RAM. And nearly all machines in the last decade can boot with UEFI
>     (though there are some exceptions, so it isn't worth killing outright
>     yet).

Fully agree that we just want to keep the BIOS loader on a sufficient feature
diet.

Yes.

>     Sponsored by:           Netflix
>     Reviewed by:            kevans
>     Differential Revision:  https://reviews.freebsd.org/D36129

You really want to apply the size check to loader.bin, not loader.  The memory
layout down in the first 1MB for boot loaders is roughly:

0x0000: real-mode IDT
0x0400: BIOS data
0x7c00: where BIOS loads boot loaders such as /boot/mbr, etc.
0x1000: various BTX global data like GDT, TSS, IDT, stacks
0x9000: BTX kernel
0xa000: BTX client (loader.bin)
0xa0000: top of BTX client stack (though this can be a bit lower for cases like
         PXE booting)
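
The map above is easier to reason about when sorted by address with the distance to the next entry shown. A minimal sketch, assuming only the addresses listed in the message; the labels are shorthand and `with_gaps` is a hypothetical helper, not something from the FreeBSD tree:

```python
# Low-memory map as listed in the message (addresses only; labels are shorthand).
LAYOUT = [
    (0x0000, "real-mode IDT"),
    (0x0400, "BIOS data"),
    (0x7C00, "BIOS load address for /boot/mbr etc."),
    (0x1000, "BTX global data (GDT, TSS, IDT, stacks)"),
    (0x9000, "BTX kernel"),
    (0xA000, "BTX client (loader.bin)"),
    (0xA0000, "top of BTX client stack"),
]


def with_gaps(layout):
    """Return (address, bytes until next entry or None, label), sorted by address."""
    ordered = sorted(layout)
    rows = []
    for (addr, label), nxt in zip(ordered, ordered[1:] + [None]):
        gap = None if nxt is None else nxt[0] - addr
        rows.append((addr, gap, label))
    return rows


for addr, gap, label in with_gaps(LAYOUT):
    gap_text = "" if gap is None else f"{gap:>7} bytes to next"
    print(f"{addr:#09x}  {gap_text:<21} {label}")
```

The gap after an entry is only an upper bound on what can live there; it says nothing about how much of that space each piece actually uses.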

The real size constraint is on the BTX client (loader.bin) and the fact that
its text/data/bss plus stack need to fit into that 576k window (give or take).
btxldr isn't stored in low memory, so its size isn't relevant, and BTX's code
always takes up a full page even though it is much smaller.

Where does 576k come from? That's 589824 bytes, but 0xa0000 - 0xa000 is 614400
bytes. The delta is 24k (24576). My 'observed' value of about 515,000 is another
75k below that, suggesting we need 100k of stack? Can that be right? I knew
lua ate a lot of stack, but wow!
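
The arithmetic in that paragraph can be checked with a few lines. This is only a sketch: the constant names are made up for illustration, and treating everything between the observed failure size and the top of the window as stack is an assumption, not a measurement:

```python
KIB = 1024

CLIENT_BASE = 0xA000        # BTX client (loader.bin) base address
STACK_TOP = 0xA0000         # top of the BTX client stack
QUOTED_WINDOW = 576 * KIB   # the "576k" figure being questioned
OBSERVED_LIMIT = 515_000    # roughly where loaders were observed to start failing


def stack_budget():
    """Reproduce the byte counts discussed above."""
    window = STACK_TOP - CLIENT_BASE
    return {
        "window": window,                                 # 0xa0000 - 0xa000
        "quoted": QUOTED_WINDOW,                          # 576 KiB in bytes
        "delta": window - QUOTED_WINDOW,                  # the "24k"
        "below_quoted": QUOTED_WINDOW - OBSERVED_LIMIT,   # the "another 75k"
        "below_window": window - OBSERVED_LIMIT,          # the "100k of stack"
    }


for name, value in stack_budget().items():
    print(f"{name:>12}: {value:>7} bytes ({value / KIB:.1f} KiB)")
```

The last two figures come out at 74,824 and 99,400 bytes, so the "75k" and "100k" in the text are round numbers rather than exact values.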

In theory pxeboot's total size needs to fit into the window at 0x7c00 - 0xa0000,
but in practice that limit is much larger since the size of pxeldr plus the BTX
kernel is much smaller than 0xa000 - 0x7c00.

I would say that you might want the PXE size to be even lower (maybe 4k or so?)
than the "plain" disk loader as PXE ROMs grab some of the memory ending at
0xa0000 to use for data buffers.  I don't have a firm number I can recall of how
much they grab, hence my guess of about 4k or so.

Hmmm... Good point on that. It likely is better to get well below the limit
than to necessarily check for a smaller pxeldr. Or put another way, I'd rather
we push the limit down all the time so we don't wind up in that awkward place
where /boot/loader fits, but /boot/pxeldr doesn't.
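
A single shared limit for both images could look roughly like the sketch below. It is not the check from the commit: the helper names are hypothetical, and whether a file exactly at the limit should pass is an assumption here:

```python
import os

LOADER_SIZE_LIMIT = 510_000  # bytes; the cutoff named in the commit subject


def file_sizes(paths):
    """Map each path to its on-disk size in bytes."""
    return {path: os.path.getsize(path) for path in paths}


def over_limit(sizes, limit=LOADER_SIZE_LIMIT):
    """Return the names whose size exceeds the shared limit, sorted."""
    return sorted(name for name, size in sizes.items() if size > limit)


# Hypothetical sizes, for illustration only.
print(over_limit({"loader": 505_000, "pxeldr": 511_000}))
```

On a real system the input would come from something like `file_sizes(["/boot/loader", "/boot/pxeldr"])`. Which file to measure (loader or loader.bin) is the open question raised earlier in the thread, and the sketch does not settle it.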

Warner
--00000000000003f4d305e5fb3340--