Re: Does a failed separate ZIL disk mean the entire zpool is lost?
- In reply to: andy thomas : "Re: Does a failed separate ZIL disk mean the entire zpool is lost?"
Date: Tue, 10 Sep 2024 18:02:10 UTC
I don't think your data is gone!

Note that the "-m" option to ignore the log device and possibly lose a few
transactions does NOT require a mirrored ZIL:

     -m      Allows a pool to import when there is a missing log device.
             Recent transactions can be lost because the log device will be
             discarded.

It seems like it's absolutely worth trying.

Charles

> On Sep 10, 2024, at 6:35 AM, andy thomas <andy@time-domain.co.uk> wrote:
>
> Thank you, but I'm afraid I didn't use two mirrored ZIL devices since I
> didn't know this was possible at the time I set this server up (late 2017,
> before I was even aware of the 'FreeBSD Mastery: ZFS' book!), and there
> were no spare disk bays in the server's chassis to add another device; at
> the time, PCIe NVMe adapters were not available. For data resilience I
> relied on an identical mirror server in the same rack, linked via a
> 2 x 10Gbit/sec bonded point-to-point network link, but this server also
> failed in the data centre melt-down...
>
> It looks like the data is now lost, so I won't waste any more time trying
> to recover it - this incident will hopefully persuade my employer to heed
> advice given years ago regarding locating mirror servers in a different
> data centre linked by a fast multi-gigabit connection.
>
> Andy
>
> PS: the ZFS and Advanced ZFS books are truly excellent, by the way!
>
> On Mon, 9 Sep 2024, Allan Jude wrote:
>
>> As the last person mentioned, you should be able to import with the -m
>> flag and only lose about five seconds' worth of writes.
>>
>> The pool is already partially imported at boot by the other mechanisms;
>> you might need to disable that to prevent the partial import at boot, so
>> you can do the manual import.
>>
>> On 2024-09-09 12:20 p.m., infoomatic wrote:
>>> Did you use two mirrored ZIL devices?
>>>
>>> You can "zpool import -m", but you will probably be confronted with some
>>> errors - you will probably lose the data the ZIL has not committed, but
>>> most of the data in your pool should be there.
>>>
>>> On 09.09.24 17:51, andy thomas wrote:
>>>> A server I look after had a 65TB ZFS RAIDz1 pool with 8 x 8TB hard
>>>> disks plus one hot spare, and separate ZFS intent log (ZIL) and L2ARC
>>>> cache disks that used a pair of 256GB SSDs. This ran really well for
>>>> six years until two weeks ago, when the main cooling system in the data
>>>> centre where it was installed failed and the backup cooling system
>>>> failed to start up. The upshot was that the ZIL SSD went short-circuit
>>>> across its power connector, shorting out the server's PSUs and shutting
>>>> down the server.
>>>>
>>>> After replacing the failed SSD and verifying that all the spinning hard
>>>> disks and the cache SSD are undamaged, attempts to import the pool fail
>>>> with the following message:
>>>>
>>>> NAME      SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH   ALTROOT
>>>> clustor2     -      -     -        -         -     -    -      -  UNAVAIL        -
>>>>
>>>> Does this mean the pool's contents are now lost and unrecoverable?
>>>>
>>>> Andy
>>
>>
>
>
> ----------------------------
> Andy Thomas,
> Time Domain Systems
>
> Tel: +44 (0)7866 556626
> http://www.time-domain.co.uk
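
For reference, a rough sketch of the recovery sequence suggested above. The
pool name clustor2 comes from the original post; the -f flag, the log-device
GUID placeholder and the replacement device path are only illustrative guesses
and will depend on the actual setup:

     # Import the pool manually, discarding the missing log device
     # (make sure nothing has partially auto-imported it at boot first):
     zpool import -m clustor2

     # If ZFS complains the pool was last used by another system, -f may
     # also be needed:
     # zpool import -f -m clustor2

     # Once imported, find the GUID of the dead log vdev, remove it, and
     # optionally add the replacement SSD as a new log device:
     zpool status clustor2
     zpool remove clustor2 <guid-of-missing-log-device>
     zpool add clustor2 log /dev/ada9p1    # hypothetical device name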