zpool errors
Daniel Braniss
danny at cs.huji.ac.il
Thu Jul 11 13:18:11 UTC 2019
> On 11 Jul 2019, at 10:39, Daniel Braniss <danny at cs.huji.ac.il> wrote:
>
>
>
>> On 10 Jul 2019, at 20:23, Allan Jude <allanjude at freebsd.org> wrote:
>>
>> On 2019-07-10 11:37, Daniel Braniss wrote:
>>>
>>>
>>>> On 10 Jul 2019, at 18:24, Allan Jude <allanjude at freebsd.org> wrote:
>>>>
>>>> On 2019-07-10 10:48, Daniel Braniss wrote:
>>>>> hi,
>>>>> i got a degraded pool, but can’t make sense of the file name:
>>>>>
>>>>> protonew-2# zpool status -vx
>>>>> pool: h
>>>>> state: ONLINE
>>>>> status: One or more devices has experienced an error resulting in data
>>>>> corruption. Applications may be affected.
>>>>> action: Restore the file in question if possible. Otherwise restore the
>>>>> entire pool from backup.
>>>>> see: http://illumos.org/msg/ZFS-8000-8A
>>>>> scan: scrub repaired 6.50K in 17h30m with 0 errors on Wed Jul 10 12:06:14 2019
>>>>> config:
>>>>>
>>>>> NAME STATE READ WRITE CKSUM
>>>>> h ONLINE 0 0 14.4M
>>>>> gpt/r5/zfs ONLINE 0 0 57.5M
>>>>>
>>>>> errors: Permanent errors have been detected in the following files:
>>>>>
>>>>> <0x102>:<0x30723>
>>>>> <0x102>:<0x30726>
>>>>> <0x102>:<0x3062a>
>>>>> …
>>>>> <0x281>:<0x0>
>>>>> <0x6aa>:<0x305cd>
>>>>> <0xffffffffffffffff>:<0x305cd>
>>>>>
>>>>>
>>>>> any hints as to how I can identify these files?
>>>>>
>>>>> thanks,
>>>>> danny
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-hackers at freebsd.org mailing list
>>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>>>>>
>>>>
>>>> Once a file has been deleted, ZFS can have a hard time determining its
>>>> filename.
>>>>
>>>> It is inode 198186 (0x3062a) on dataset 0x102. The file has been
>>>> deleted, but still exists in at least one snapshot.
>>>>
>>>> However, 57 million checksum errors suggest there may be some other
>>>> problem. You might look for and resolve the problem with what appears to
>>>> be a raid5 that you have built your ZFS pool on top of. Then do 'zpool
>>>> clear' to reset the counters to zero, and 'zpool scrub' to try to read
>>>> everything again.
>>>>
>>>> --
>>>> Allan Jude
>>>>
>>> I don’t know when the first error was detected, and this host has been up for 367 days!
>>> I did a scrub but no change.
>>> i will remove old snapshots and see if it helps.
>>>
>>> is it possible to know at least which volume?
>>>
>>> thanks,
>>> danny
>>>
>>>
>>>
>>
>> zdb -ddddd h 0x102
>>
>> That should tell you which dataset that is.
>>
>> --
>> Allan Jude
>>
>
the above didn’t work for me, but
after removing old snapshots I reduced the problematic files to 1!
<0xffffffffffffffff>:<0x305cd>
which seems very odd: -1?
so now I removed more old snapshots, and started a new zpool scrub.
what still worries me is the fast-growing checksum count.
thanks,
danny
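
For reference, each entry in the error list above is a <dataset>:<object> pair written in hex, as Allan's reply describes (object 0x3062a on dataset 0x102). Below is a minimal Python sketch, not part of the original thread, that converts such entries to decimal and shows why 0xffffffffffffffff reads as -1 when taken as a signed 64-bit value. The helper names parse_error_entry and as_signed64 are made up for illustration, and the script says nothing about what ZFS itself means by an all-ones dataset id.

```python
import re

# Matches entries of the form <0x102>:<0x3062a> as printed by `zpool status -v`.
ENTRY_RE = re.compile(r"<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>")


def parse_error_entry(line):
    """Return (dataset_id, object_id) as integers, or None if the line has no entry."""
    m = ENTRY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement value."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    for entry in ("<0x102>:<0x3062a>", "<0xffffffffffffffff>:<0x305cd>"):
        dataset_id, object_id = parse_error_entry(entry)
        print(entry, "-> dataset", as_signed64(dataset_id), "object", object_id)
```

The decimal values are convenient when comparing against tools that print ids in decimal rather than hex.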
> firstly, thanks for your help!
> now, after doing a zpool clear, I notice that the CKSUM count is growing.
> the pool is on a raid controller raid5 (PERC from dell) which is showing
> that it’s correcting the errors (‘Corrected medium error during recovery on PD …’).
>
> so what can be the cause? btw, the FreeBSD is 10.3-stable.
>