Getting ZFS pools back.
Willem Jan Withagen
wjw at digiware.nl
Tue May 1 21:25:44 UTC 2018
On 30/04/2018 12:37, Willem Jan Withagen wrote:
> On 29-4-2018 23:20, Willem Jan Withagen wrote:
>> On 29/04/2018 20:21, Warner Losh wrote:
>>>
>>>
>>> On Sun, Apr 29, 2018 at 11:57 AM, Jan Knepper <jan at digitaldaemon.com>
>>> wrote:
>>>
>>> On 04/29/2018 13:27, Willem Jan Withagen wrote:
>>>
>>> Trouble started when I installed (freebsd-update) 11.1 over a
>>> running 10.4. Which is sort of scary?
>>>
>>> This does sound 'scary', as I am planning to do this in the (near)
>>> future...
>>>
>>> Has anyone else experienced issues like this?
>>>
>>> Generally I do build the new system software on a running system,
>>> but then go to single user mode to perform the actual install.
>>>
>>> I have done many upgrades like that over 18 or so years and have never
>>> seen or heard of an issue like this.
>>>
>>>
>>> 11.x binaries aren't guaranteed to work with a 10.x kernel. So that's
>>> a bit of a problem. freebsd-update shouldn't have let you do that
>>> either.
>>>
>>> However, most 11.x binaries work well enough to at least bootstrap /
>>> fix problems if booted on a 10.x kernel due to targeted forward
>>> compatibility. You shouldn't count on it for long, but it generally
>>> won't totally brick your box. In the past, and I believe this is
>>> still true, they work well enough to compile and install a new kernel
>>> after pulling sources. The 10.x -> 11.x syscall changes are such that
>>> you should be fine. At least if you are on UFS.
>>
>> I have been doing this kind of thing for years and years, even
>> upgrading over NFS and such. Sometimes it flies a bit too close to the
>> sun and things burn, but it has never crashed this badly.
>>
>>> However, the ZFS ioctls and such are in the bag of 'don't
>>> specifically guarantee and also they change a lot' so that may be why
>>> you can't mount ZFS by UUID. I've not checked to see if there's
>>> specifically an issue here or not. The ZFS ABI is somewhat more
>>> fragile than other parts of the system, so you may have issues here.
>>>
>>> If all else fails, you may be able to PXE boot an 11 kernel, or boot
>>> off a USB memstick image to install a kernel.
>>
>> I tried replacing just about everything from the memstick, both in the
>> boot partition (first growing it to hold a gptzfsboot larger than 64K)
>> and in /boot. But the error never went away.
>>
>> I have never had ZFS die on me so badly that I could not get it back.
>>
>>> While we don't guarantee forward compatibility (running
>>> newer binaries on older kernels), we've generally built enough
>>> forward compat so that things work well enough to complete the
>>> upgrade. That's why you haven't hit an issue in 18 years of
>>> upgrading. However, the velocity of syscall additions has increased,
>>> and we've gone from fairly stable (stale?) ABIs for UFS to a more
>>> dynamic one for ZFS where backwards compat is a bit of a crap shoot
>>> and forward compat isn't really there at all. That's likely why
>>> you've hit a speed bump here.
>>
>> Come to think of it, I did not do this step with freebsd-update, since
>> I was not at an official release yet. I was going to 11.1-RELEASE, to
>> be able to start using freebsd-update.
>>
>> So I don't think I did exactly that... but I tried so much yesterday.
>> For systems that are not yet on freebsd-update, I normally do
>> installkernel, reboot, installworld, mergemaster, reboot.
>
> Right,
>
> The story gets even sadder .....
> I took the "spare" disk home and connected it to an older Supermicro
> server I had lying around for Ceph tests. And lo and behold, it just boots.
>
> So that system got upgraded: 10.2 -> 10.4 -> 11.1
> No complaints about anything.
>
> So now I'm inclined to point at older hardware with an old BIOS, which
> confused ZFS, or more precisely probably gptzfsboot.
>
> From dmidecode:
> System Information
> Manufacturer: Supermicro
> Product Name: H8SGL
> Version: 1234567890
> BIOS Information
> Vendor: American Megatrends Inc.
> Version: 3.5
> Release Date: 11/25/2013
> Address: 0xF0000
>
> We only have one of those, so further investigation and/or tinkering
> with that hardware will be impossible.
Today I found the messages below in the server's daily report:
+NMI ISA 3c, EISA ff
+NMI ISA 3c, EISA ff
+NMI ISA 3c, EISA ff
+NMI ... going to debugger
+NMI ... going to debugger
+NMI ISA 3c, EISA ff
+NMI ISA 2c, EISA ff
+NMI ... going to debugger
+NMI ... going to debugger
+NMI ISA 2c, EISA ff
+NMI ISA 3c, EISA ff
+NMI ... going to debugger
+NMI ... going to debugger
+NMI ... going to debugger
+NMI ISA 3c, EISA ff
+NMI ... going to debugger
Could these have anything to do with the problem I had trying to find
the pools?
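The "NMI ISA xx" values in the report above can be decoded bit by bit. The sketch below is a hypothetical helper, not part of FreeBSD. It ASSUMES that the printed ISA value is the raw byte read from legacy I/O port 0x61 and that the chipset uses the conventional PC/AT layout of that port: bits 7 and 6 report SERR#/parity and I/O channel check NMIs, and bits 3 and 2, when set, disable those two NMI sources. The names `PORT61_BITS`, `decode_isa_byte` and `decode_report` are illustrative only.

```python
import re

# ASSUMPTION: conventional PC/AT layout of legacy port 0x61 (NMI status/control).
PORT61_BITS = {
    7: "SERR#/parity NMI status",
    6: "IOCHK NMI status",
    5: "timer 2 output",
    4: "refresh toggle",
    3: "IOCHK NMI disabled",         # 1 = I/O channel check NMI masked
    2: "SERR#/parity NMI disabled",  # 1 = SERR#/parity NMI masked
    1: "speaker data enable",
    0: "timer 2 gate",
}

# Matches lines such as "+NMI ISA 3c, EISA ff" from the daily report.
NMI_RE = re.compile(r"NMI ISA ([0-9a-fA-F]+), EISA ([0-9a-fA-F]+)")


def decode_isa_byte(value: int) -> list[str]:
    """Return the names of the bits set in a port 0x61 byte, highest bit first."""
    return [PORT61_BITS[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_report(lines):
    """Yield (isa_byte, set_bit_names) for every 'NMI ISA' line; skip other lines."""
    for line in lines:
        m = NMI_RE.search(line)
        if m:
            isa = int(m.group(1), 16)
            yield isa, decode_isa_byte(isa)


report = [
    "+NMI ISA 3c, EISA ff",
    "+NMI ... going to debugger",
    "+NMI ISA 2c, EISA ff",
]
for isa, bits in decode_report(report):
    print(f"0x{isa:02x}: {', '.join(bits)}")
```

If that assumption holds, neither status bit (7 or 6) is set in 0x3c or 0x2c, so these bytes by themselves do not indicate a RAM parity or I/O channel check error, and the only difference between 3c and 2c is the refresh-toggle bit. The actual NMI source would have to be identified some other way.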
--WjW