Bad sector on a gstripe
Alban Hertroys
dalroi at solfertje.student.utwente.nl
Sat Feb 9 14:03:34 UTC 2008
Hi all,
I'm having trouble locating a bad sector on a gstriped file system.
Smartd has been nagging about this single bad sector for months now,
and there don't appear to be any new ones. It's about time I looked
into this...
I got so far that I know the sector number in the partition involved.
My attempts are detailed after the problem description. I tried
newfs-ing the filesystem (it's my /tmp, so there's nothing of relevance
on it), but newfs-ing doesn't seem to have marked the sector bad.
Is there anything wrong with: newfs -U -o time /dev/stripe/tmp ? I performed
that from single-user mode after umounting all file-systems.
I tried opening the filesystem with fsdb, but it can't open the
partition, only the striped file-system - how do I determine which
sector I'm dealing with on a striped fs? And how do I write to it to
have it marked as a bad sector?
I'm not sure whether this error means my disk is at the end of its
life. Smartd has been spamming me with this single error about the
same sector for months now (every half hour!), and it's only the
third error in the disk's SMART log. If I understand the docs of
smartmontools correctly, this could well be caused by the sector not
having been written to all this time, which seems plausible to me;
it's near the end of a mostly empty /tmp...
From the power-on lifetime it appears the disk has been running for
about two years already, and it's been on pretty much 24/7. Maybe it is
time to replace it (probably with a server-grade model).
Time for some data.
The disk is a:
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST3200822A
Serial Number: 3LJ020SJ
Firmware Version: 3.01
smartctl says:
Error 3 occurred at disk power-on lifetime: 18356 hours (764 days + 20 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 30 ed 61 40 Error: UNC at LBA = 0x0061ed30 = 6417712
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 20 1f ed 61 40 00 15:42:14.650 READ DMA EXT
25 00 40 9f e6 61 40 00 15:42:14.419 READ DMA EXT
25 00 40 df f1 61 40 00 15:42:14.293 READ DMA EXT
25 00 40 5f e6 61 40 00 15:42:14.049 READ DMA EXT
25 00 40 5f e9 61 40 00 15:42:13.795 READ DMA EXT
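As a cross-check on the LBA that smartctl reports, here is a minimal sketch of how the register dump above combines into a sector number. It assumes the usual 28-bit ATA layout (SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low nibble of DH bits 24-27); the function name is made up for illustration.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumption: standard LBA28 layout, where only the low four bits of
    the device/head register (DH) carry address bits.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the "registers were:" line of the error log.
lba = lba28_from_registers(sn=0x30, cl=0xED, ch=0x61, dh=0x40)
print(hex(lba), lba)  # 0x61ed30 6417712
```

The result matches the "UNC at LBA = 0x0061ed30 = 6417712" that smartctl prints.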
According to fdisk and bsdlabel that's on partition e of slice 1:
# fdisk -s /dev/ad0
/dev/ad0: 387621 cyl 16 hd 63 sec
Part Start Size Type Flags
1: 63 390716802 0xa5 0x80
So the bad sector is at 6417712 - 63 = 6417649 in /dev/ad0s1.
# bsdlabel /dev/ad0s1
# /dev/ad0s1:
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 524288 0 4.2BSD 2048 16384 32776
b: 4194304 524288 swap
c: 390716802 0 unused 0 0 # "raw" part, don't edit
d: 1048576 4718592 4.2BSD 2048 16384 8
e: 1048576 5767168 4.2BSD 2048 16384 8
f: 20971520 6815744 4.2BSD 2048 16384 28552
g: 362929538 27787264 4.2BSD 2048 16384 28552
So the bad sector is 6417649 - 5767168 = 650481 in partition
/dev/ad0s1e, at around 62% of its total size. This is where I started to
get lost...
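The two subtractions above can be sketched as a small helper with a bounds check. The numbers come from the fdisk and bsdlabel output; the function and constant names are only illustrative, and the sketch assumes the bsdlabel offsets are relative to the start of the slice (as they appear to be here, since partition a starts at 0).

```python
SLICE_START = 63          # start of slice 1, from fdisk (sectors)
PART_E_OFFSET = 5767168   # offset of partition e within the slice, from bsdlabel
PART_E_SIZE = 1048576     # size of partition e in sectors, from bsdlabel


def disk_lba_to_partition_sector(lba, slice_start, part_offset, part_size):
    """Translate a whole-disk LBA into a sector number inside one partition."""
    slice_sector = lba - slice_start          # position within ad0s1
    part_sector = slice_sector - part_offset  # position within ad0s1e
    if not 0 <= part_sector < part_size:
        raise ValueError("LBA does not fall inside this partition")
    return part_sector


sector = disk_lba_to_partition_sector(
    6417712, SLICE_START, PART_E_OFFSET, PART_E_SIZE
)
print(sector, f"{sector / PART_E_SIZE:.0%}")  # 650481 62%
```

The bounds check is what confirms that the sector really belongs to partition e and not to d or f.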
I set up partition ad0s1e to be used in /dev/stripe/tmp:
# gstripe list tmp
Geom name: tmp
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 4096
ID: 1982480573
Providers:
1. Name: stripe/tmp
Mediasize: 1073733632 (1.0G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ad0s1e
Mediasize: 536870912 (512M)
Sectorsize: 512
Mode: r1w1e2
Number: 0
2. Name: ad1s1e
Mediasize: 536870912 (512M)
Sectorsize: 512
Mode: r1w1e2
Number: 1
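To relate a sector on one consumer to a position on the striped device, here is a rough sketch of the mapping. It rests on assumptions not confirmed in this post: that gstripe interleaves stripes round-robin in consumer-number order, and that consumer data starts at sector 0 (with the metadata kept at the end of each consumer). The function name is hypothetical.

```python
SECTOR_SIZE = 512    # Sectorsize from gstripe list
STRIPE_SIZE = 4096   # Stripesize from gstripe list (bytes)
N_CONSUMERS = 2      # ad0s1e (Number: 0) and ad1s1e (Number: 1)


def consumer_to_stripe_sector(consumer_sector, consumer_number,
                              stripe_size=STRIPE_SIZE,
                              sector_size=SECTOR_SIZE,
                              n_consumers=N_CONSUMERS):
    """Map a sector on one consumer to a sector on the striped provider.

    Assumption: plain round-robin striping with no data offset.
    """
    sectors_per_stripe = stripe_size // sector_size
    # Which stripe on this consumer, and where inside that stripe.
    row, within = divmod(consumer_sector, sectors_per_stripe)
    # Stripes alternate between consumers, in consumer-number order.
    stripe_index = row * n_consumers + consumer_number
    return stripe_index * sectors_per_stripe + within


stripe_sector = consumer_to_stripe_sector(650481, consumer_number=0)
print(stripe_sector, stripe_sector * SECTOR_SIZE)  # 1300961 666092032
```

Under those assumptions, sector 650481 of ad0s1e would correspond to sector 1300961 (byte offset 666092032) of /dev/stripe/tmp, which is inside the provider's 1073733632-byte media size.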
I tried the following (using -r to prevent it from marking my
filesystems dirty while I was testing):
# fsdb -r /dev/ad0s1e
** /dev/ad0s1e (NO WRITE)
Cannot find file system superblock
LOOK FOR ALTERNATE SUPERBLOCKS? no
fsdb: cannot set up file system `/dev/ad0s1e'
Exit 1
and:
# fsdb -r /dev/stripe/tmp
** /dev/stripe/tmp (NO WRITE)
Examining file system `/dev/stripe/tmp'
Last Mounted on /tmp
current inode: directory
I=2 MODE=40777 SIZE=512
BTIME=Feb 9 12:01:18 2008 [0 nsec]
MTIME=Feb 9 12:54:41 2008 [0 nsec]
CTIME=Feb 9 12:54:41 2008 [0 nsec]
ATIME=Feb 9 13:23:07 2008 [0 nsec]
OWNER=root GRP=wheel LINKCNT=7 FLAGS=0 BLKCNT=4 GEN=7a46458d
fsdb (inum: 2)>
I figured the findblk command would give me the inode of the problem
area (although there won't be one if there are no files in that
sector I think?), but I'm dealing with sectors striped across two
disks... I have no idea which "block number" would be appropriate.
The disk containing the bad sector is apparently the first in the
stripe, that much I gathered.
So, how to continue?
Regards,
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
More information about the freebsd-questions
mailing list