on BIOS problems with disks larger than 2 TB

Mon Dec 26 23:22:38 UTC 2016

> On 12.08.2016 at 21:18, John Baldwin <jhb at FreeBSD.org> wrote:
> 
> On Tuesday, August 02, 2016 04:35:23 PM Andriy Gapon wrote:
>> 
>> There are some BIOSes out there that do not properly support disks
>> larger than 2TB and cause boot problems if there is any data required
>> for boot at offsets larger than 2 TB (TiB, rather).
>> 
>> The most typical victim is the ZFS boot if a boot pool includes disk
>> areas beyond 2TB, because a kernel, or zfsloader or any configuration
>> files required by the loader may end up in those "inaccessible" areas.
>> 
>> It's obvious why 2TiB is a magic value here:
>> 2^32 * 512 = 2^41 = 2 * 2^40
>> So the problem seems to happen when an LBA is treated as a 32-bit
>> integer (unsigned).
>> 
>> I happen to own one of affected systems and I have done some more
>> investigation.  As far as I can see, the only actual problem in my case
>> is that a disk size in 512b sectors is reported modulo 2^32 by INT 13h
>> AH=48h.  If I "fix up" the parameter, then everything else (i.e. actual
>> data reads) seems to work just fine after that.
>> 
>> I suspect that a large subclass of other problematic systems may have
>> exactly the same problem.
>> 
>> Does anyone have an idea about how we could auto-detect and
>> auto-correct that problem?
>> Would that be worth the trouble at all?  Given the gradual de-orbiting
>> of BIOS systems.
> 
> Hmm, I'm not sure how easy it is to handle this case (i.e. how do you know
> if an LBA beyond the size is really legit due to truncation vs coming from
> corrupted metadata).  Related is that tsoome's bcache stuff wants to know
> where the end of the disk is (to avoid reading off the end), so just
> ignoring the size is not easy.

Having just been bitten by this, an early indication that the BIOS is deficient would be most welcome.

I have two systems (Asus P6-P8H61E) whose BIOS seems to be limited to 2 TB.  For about two years, everything seemed to be fine, until the latest make world, when the new loader, kernel, and modules suddenly ended up too far back on the disk:
All buffers synced.
Uptime: 32d4h27m58s
re0: link state changed to DOWN
re0: link st/boot/config: -DhS115200

ZFS: i/o error - all block copies unavailable
Invalid format

FreeBSD/x86 boot
Default: tank/be/default:/boot/kernel/kernel
boot: 
ZFS: i/o error - all block copies unavailable
Invalid format

Of course, the systems are remote and I cannot easily get physical access to them.

Luckily, I did manage to load the old loader and kernel, and could bring the system up again, but I will need to try to update the BIOS on the machine, or even create a root ZFS pool that is far enough forward on the main disk.

If the BIOS limitation cannot be worked around, gptboot/gptzfsboot should at least try to read (for example) the backup GPT, which sits at the very end of the disk.  If that read fails, or the backup GPT lies beyond the size the BIOS reports, they could emit a warning that parts of the disk are not accessible through the BIOS, and that future boots might suddenly stop working.  If I had known that the BIOS had this problem when I was setting up these systems, I could have easily created a root pool and a separate data pool, instead of just a root pool.
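
A minimal sketch in C of what such a check might look like.  This is not taken from gptboot/gptzfsboot; the function names and the max_wraps parameter are made up for illustration.  It assumes that the only defect is the one Andriy describes (the sector count from INT 13h AH=48h is reported modulo 2^32) and that the primary GPT header can be read and trusted.  As John points out, a real implementation cannot easily tell truncation apart from corrupted metadata, so the guess is bounded and should at most trigger a warning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of 512-byte sectors addressable with a 32-bit LBA (2 TiB). */
#define LBA32_SECTORS	(UINT64_C(1) << 32)

/*
 * bios_sectors: total sector count as reported by INT 13h AH=48h.
 * gpt_alt_lba:  LBA of the backup GPT header as recorded in the primary
 *               GPT header (normally the last sector of the disk).
 *
 * Returns true if the GPT places its backup header at or beyond the end
 * of the disk as the BIOS sees it, i.e. the BIOS size looks too small.
 */
static bool
bios_size_looks_truncated(uint64_t bios_sectors, uint64_t gpt_alt_lba)
{
	return (gpt_alt_lba >= bios_sectors);
}

/*
 * Guess the real disk size under the assumption that the BIOS value is
 * the real size modulo 2^32: add 2^32 until the backup GPT header fits.
 * Gives up after max_wraps additions and returns 0, since an absurdly
 * large gpt_alt_lba more likely means corrupted metadata.
 */
static uint64_t
guess_real_sectors(uint64_t bios_sectors, uint64_t gpt_alt_lba,
    unsigned max_wraps)
{
	uint64_t size = bios_sectors;

	for (unsigned i = 0; i < max_wraps && size <= gpt_alt_lba; i++)
		size += LBA32_SECTORS;
	return (size > gpt_alt_lba ? size : 0);
}
```

For example, a disk with 5860533168 sectors (about 3 TB) would be reported as 1565565872 sectors by such a BIOS, while its primary GPT header records the backup header at LBA 5860533167.  The first function flags the mismatch, and the second arrives at 5860533168 after adding 2^32 once.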

Stefan

-- 
Stefan Bethke <stb at lassitu.de>   Fon +49 151 14070811