zpool can't bring online disk2 ----I screwed up
Jose A. Lombera
jose at lajni.com
Mon Sep 24 05:50:32 UTC 2012
This is the error I got when I ran the failover script.
Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803)
Is there any patch I need to run to fix this issue?
From: Jose A. Lombera [mailto:jose at lajni.com]
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current at freebsd.org
Cc: freebsd-current at freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up
Every time I run this for any of disks 3, 4, 5, 6, 7, 8, 9, or 10, I get the error below.
Disks 1 and 2 do show up in /dev/hast.
[root at san2 /usr/home/jose]# hastctl role primary disk3
[root at san2 /usr/home/jose]#
I got this in the logs.
Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Please help.
Thanks.
From: Jose A. Lombera [mailto:jose at lajni.com]
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current at freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up
Please, someone help me!
I screwed up big time.
I was running
hastctl create disk2
But since I got some input/output errors, I decided to stop hastd with /etc/rc.d/hastd stop
But since it couldn’t stop disk1 and disk9, I killed it.
I restarted both servers.
And now /dev/hast shows nothing.
And the pool is lost.
I was able to create disk2.
I have restarted both servers, but the pool is not coming up.
Any suggestions? Please help. I know that the info is there, since I only ran “hastctl create disk2”; I haven’t done it for the other disks.
From: Jose A. Lombera [mailto:jose at lajni.com]
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current at freebsd.org
Subject: RE: zpool can't bring online disk2
Freddie,
Thanks for your great help; it now makes so much sense.
I still have a small problem, and I'm not sure if it is because hastd is running.
I can't initialize disk2 (hastctl create disk2).
This is what I did.
1. zpool offline tank /dev/dsk/hast/disk2
2. zpool status -x
[root at san /usr/home/jose]# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/dsk/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors
3. Removed the disk and inserted a new one.
4. Initialized it:
hastctl role init disk2
[root at san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root at san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.
I don't want to stop hastd since it will shut down the connection to my san.
Do you have any suggestions?
Thanks
--jose
-----Original Message-----
From: owner-freebsd-current at freebsd.org [mailto:owner-freebsd-current at freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi at gmail.com; freebsd-current at freebsd.org
Subject: RE: zpool can't bring online disk2
Since it's a HAST device, you have to initialise the disk via hastctl. Once that is done, the /dev/hast/disk2 GEOM device node will be created.
Then you can 'zpool replace' it.
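For illustration, the order described above (initialise the HAST resource first, then 'zpool replace') might look like the following dry-run sketch. It only prints the commands rather than running them. The function name hast_replace_plan is hypothetical, and the sketch assumes the commands are run on the primary node with the resource already defined in /etc/hast.conf; the pool name, resource name, and vdev GUID are the ones shown in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for replacing a failed disk that sits
# behind a HAST resource. Nothing is executed.
# Assumptions (not confirmed by the thread): run on the current primary node,
# the resource is defined in /etc/hast.conf, and the secondary is healthy.

hast_replace_plan() {
    pool=$1   # ZFS pool name, e.g. tank
    res=$2    # HAST resource name, e.g. disk2
    old=$3    # old vdev as shown by 'zpool status' (name or numeric GUID)

    echo "zpool offline $pool $old"
    echo "# physically swap the disk, then:"
    echo "hastctl role init $res"      # resource must not be in use by hastd
    echo "hastctl create $res"         # write fresh HAST metadata to the new disk
    echo "hastctl role primary $res"   # makes /dev/hast/$res appear
    echo "ls -l /dev/hast/$res"        # confirm the GEOM provider exists
    echo "zpool replace $pool $old hast/$res"
    echo "zpool status $pool"          # watch the resilver
}

# Example with the names from this thread:
hast_replace_plan tank disk2 11919832608590631234
```

Printing the plan first makes it easy to check each step against 'hastctl status' and 'zpool status' before running anything on a production pool.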
One step at a time. :) And you've skipped a few.
1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal
On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose at compufutura.com> wrote:
> Yanegomi,
>
> I tried that. As you can see below, FreeBSD doesn’t have the cfgadm
> utility that, according to
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html, is used
> to unconfigure the device. I looked in ports, but there is no utility
> like that.
>
> Pardon me, my knowledge is limited.
>
> Can you please type the command I will need? Or, if I need cfgadm, do I
> have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root at san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root at san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root at san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root at san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
>
> [root at san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root at san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
> [root at san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root at san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> From: Garrett Cooper <yanegomi at gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose at lajni.com>
> Cc: freebsd-current at freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose at lajni.com> wrote:
>
> Hello all,
>
> I hope someone can help me out with this.
>
> Recently disk2 went bad. I used
>
> zpool offline tank hast/disk2
>
> to bring the disk offline.
> Then I replaced it
>
> and used the command
>
> zpool online tank hast/disk2
>
> But the disk shows REMOVED.
>
> [root at san1 /usr/home/jose]# zpool status -v
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been removed by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> [root at san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root at san1 /usr/home/jose]#
>
> I can't bring it back online.
>
> Can you guys help me figure out what to do?
>
> This is a production server and I can't afford to bring the server down.
>
> I have already swapped 3 disks and I got the same result.
>
> Thank you guys in advance.
>
>
> You forgot to call zpool replace as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html .
> Cheers,
> -Garrett
>
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
>