Booting a Solaris 10 System from a Single Mirror Disk
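1. Overview
After a Solaris system reboots, one disk of a mirror may turn out to have a physical fault, or the metadb replicas or data on one disk may be lost. In either case the system drops into maintenance mode automatically during boot. This document describes how to boot the system in that situation.
The operating environment is as follows:
2. Preparation
1. Prepare a Solaris 10 disc for booting into single-user mode.
2.1. Confirm the disk failure
Check the warning messages printed at boot.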
ok boot
Sun Ultra 45 Workstation, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.21.2, 4096 MB memory installed, Serial #68148048.
Ethernet address 0:14:4f:f:db:50, Host ID: 840fdb50.
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@9/pci@0/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_147147-26 64-bit
Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights reserved.
WARNING: md: d102: (Unavailable) needs maintenance
Hostname: test-01
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any Read-only file system error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
Mar 23 17:56:03 svc.startd[9]: svc:/system/metainit:default: Method "/lib/svc/method/svc-metainit" failed with exit status 96.
Mar 23 17:56:03 svc.startd[9]: system/metainit:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Mar 23 17:56:05 svc.startd[9]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
Mar 23 17:56:05 svc.startd[9]: system/filesystem/usr:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run
Root password for system maintenance (control-d to bypass): // enter the root password
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode // the system enters maintenance mode automatically
Mar 23 17:59:19 su: 'su root' succeeded for root on /dev/console
Oracle Corporation SunOS 5.10 Generic Patch January 2005
#
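In maintenance mode, check the mirror status. Half of the submirrors need maintenance, all of them sit on the same disk, and the metadb replicas on that disk are in the unknown state. This confirms that disk c1t1d0 has failed; pull out the failed disk.
# metastat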
d130: Mirror
Submirror 0: d131
State: Okay
Submirror 1: d132
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 30722044 blocks (14 GB)
d131: Submirror of d130
State: Okay
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Okay Yes
d132: Submirror of d130
State: Needs maintenance
Invoke: metareplace d130 c1t1d0s3
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Maintenance Yes
d110: Mirror
Submirror 0: d111
State: Okay
Submirror 1: d112
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20482820 blocks (9.8 GB)
d111: Submirror of d110
State: Okay
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Okay Yes
d112: Submirror of d110
State: Needs maintenance
Invoke: metareplace d110 c1t1d0s1
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Maintenance Yes
d100: Mirror
Submirror 0: d101
State: Okay
Submirror 1: d102
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 71683312 blocks (34 GB)
d101: Submirror of d100
State: Okay
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Okay Yes
d102: Submirror of d100
State: Needs maintenance
Invoke: metareplace d100 c1t1d0s0
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Maintenance Yes
Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@n5000c50002f5d577
c1t0d0 Yes id1,sd@n5000c50002f54e2b
#
#
#
#
#
# metadb
flags first blk block count
a m p lu 16 8192 /dev/dsk/c1t0d0s7
a p l 8208 8192 /dev/dsk/c1t0d0s7
a p l 16400 8192 /dev/dsk/c1t0d0s7
M W p l 16 unknown /dev/dsk/c1t1d0s7
M W p l 8208 unknown /dev/dsk/c1t1d0s7
M W p l 16400 unknown /dev/dsk/c1t1d0s7
#
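The check described above (every failed component sits on one disk) can be sketched as a small filter over saved metastat output. This is only an illustration: failed_slices and failed_disks are hypothetical helper names, not Solaris Volume Manager commands, and the field positions are assumed from the component lines shown in this transcript. On Solaris 10 the awk call may need nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Sketch: summarize which disks hold components in the Maintenance state.
# Both functions read metastat output on standard input.

# Print every slice whose component line reports State = Maintenance.
# Component lines look like: c1t1d0s3 0 No Maintenance Yes
failed_slices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && $4 == "Maintenance" { print $1 }'
}

# Strip the slice suffix and print each affected disk once.
failed_disks() {
    failed_slices | sed 's/s[0-9]*$//' | sort -u
}

# On the live system (assumption: metastat is in PATH):
#   metastat | failed_disks
```

If the output is a single disk name, as with c1t1d0 here, the failure is confined to that disk.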
3. Solution 1 (modify the system file)
3.1. Enter the ok prompt
In maintenance mode, run init 0 to reach the ok prompt.
# init 0
# svc.startd: The system is coming down. Please wait.
svc.startd: 81 system services are now being stopped.
svc.startd: The system is down.
syncing file systems... done
Program terminated
ok
ok
3.2. Boot from the disc into single-user mode
Insert the disc into the drive and run boot cdrom -s to boot from it into single-user mode.
ok boot cdrom -s
Sun Ultra 45 Workstation, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.21.2, 4096 MB memory installed, Serial #68148048.
Ethernet address 0:14:4f:f:db:50, Host ID: 840fdb50.
Rebooting with command: boot cdrom -s
Boot device: /pci@1e,600000/pci@0/pci@1/pci@0/ide@1f/cdrom@0,0:f File and args: -s
SunOS Release 5.10 Version Generic_147440-01 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
WARNING: i2c_0 failed to add interrupt.
WARNING: i2c_0 operating in POLL MODE only
WARNING: i2c_1 failed to add interrupt.
WARNING: i2c_1 operating in POLL MODE only
WARNING: i2c_0 failed to add interrupt
WARNING: Failed to open device(/pci@1f,700000:devctl), rv(19)
WARNING: ppm_init_cb: ppm domain domain_pciegfx will be offline.
Booting to milestone "milestone/single-user:default".
Configuring devices.
Using RPC Bootparams for network configuration information.
Attempting to configure interface ce3...
Skipped interface ce3
Attempting to configure interface ce2...
Skipped interface ce2
Attempting to configure interface ce1...
Skipped interface ce1
Attempting to configure interface ce0...
Configured interface ce0
Attempting to configure interface bge1...
Skipped interface bge1
Attempting to configure interface bge0...
Skipped interface bge0
Requesting System Maintenance Mode
SINGLE USER MODE
#
3.3. Modify the /etc/system file
Use format to list the disks, find the slice that holds the root file system, and mount that slice on /a.
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@1e,600000/pci@0/pci@9/pci@0/scsi@1/sd@0,0
Specify disk (enter its number): 0
selecting c1t0d0
[disk formatted]
format> p
partition> p
Current partition table (original):
Total disk cylinders available: 65533 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 root wm 4685 - 21080 34.18GB (16396/0/0) 71683312
1 swap wu 0 - 4684 9.77GB (4685/0/0) 20482820
2 backup wm 0 - 65532 136.62GB (65533/0/0) 286510276
3 home wm 21081 - 28107 14.65GB (7027/0/0) 30722044
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 28108 - 28113 12.81MB (6/0/0) 26232
partition>
#
#
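# mount /dev/dsk/c1t0d0s0 /a // mount slice c1t0d0s0 on /a; the system may ask for an fsck of the file system first, and a damaged file system cannot be mounted
#
# TERM=vt100;export TERM; // set the terminal type in the current environment so that vi can be used
# cd /a/etc // change to /a/etc; do not edit /etc/system directly, since that file belongs to the environment booted from the disc
# cp system system.bak // back up the system file
# vi system // edit /a/etc/system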
* Begin MDD root info (do not edit)
rootdev:/pseudo/md@0:0,100,blk
* End MDD root info (do not edit)
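set md:mirrored_root_flag=1 // append this line at the end (the // note is an annotation, not part of the file); it lets the system boot when only half of the metadb replicas are valid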
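The same edit can be made without vi. The following is a minimal sketch, assuming the root slice is mounted on /a as above; add_mirrored_root_flag is a hypothetical helper name, and the check only recognizes the setting when it is written exactly as "set md:mirrored_root_flag" at the start of a line.

```shell
#!/bin/sh
# Sketch: append the flag to a system file once, keeping a backup copy.
# $1: path to the system file on the mounted root, e.g. /a/etc/system
add_mirrored_root_flag() {
    sysfile=$1
    # Skip the edit if an uncommented setting is already present.
    if grep '^set md:mirrored_root_flag' "$sysfile" >/dev/null 2>&1; then
        echo "already set in $sysfile"
        return 0
    fi
    # Back up first, as in the manual steps; stop if the copy fails.
    cp "$sysfile" "$sysfile.bak" || return 1
    echo 'set md:mirrored_root_flag=1' >> "$sysfile"
}

# Usage (assumption: root slice mounted on /a):
#   add_mirrored_root_flag /a/etc/system
```

Running it twice leaves a single copy of the line, which avoids piling up duplicate entries when the procedure is repeated.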
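3.4. Unmount /a
After editing /a/etc/system, unmount /a; leaving it mounted may cause mount problems.
# cd / // return to the root directory
# umount /a // unmount /a
3.5. Reboot and check the system status
Reboot the system and check its status.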
# init 6
syncing file systems... done
Program terminated
Sun Ultra 45 Workstation, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.21.2, 4096 MB memory installed, Serial #68148048.
Ethernet address 0:14:4f:f:db:50, Host ID: 840fdb50.
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@9/pci@0/scsi@1/disk@0,0 File and args:
SunOS Release 5.10 Version Generic_147147-26 64-bit
Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights reserved.
Hostname: test-01
test-01 console login: root
Password:
Last login: Mon Mar 24 11:54:36 on console
Mar 24 12:19:21 test-01 login: ROOT LOGIN /dev/console
Oracle Corporation SunOS 5.10 Generic Patch January 2005
#
#
# bash
bash-3.2# df -h
Filesystem size used avail capacity Mounted on
/dev/md/dsk/d100 34G 4.7G 29G 15% /
/devices 0K 0K 0K 0% /devices
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 13G 1.7M 13G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
sharefs 0K 0K 0K 0% /etc/dfs/sharetab
/platform/sun4u-us3/lib/libc_psr/libc_psr_hwcap1.so.1
34G 4.7G 29G 15% /platform/sun4u-us3/lib/libc_psr.so.1
/platform/sun4u-us3/lib/sparcv9/libc_psr/libc_psr_hwcap1.so.1
34G 4.7G 29G 15% /platform/sun4u-us3/lib/sparcv9/libc_psr.so.1
fd 0K 0K 0K 0% /dev/fd
swap 13G 40K 13G 1% /tmp
swap 13G 40K 13G 1% /var/run
/dev/md/dsk/d130 14G 15M 14G 1% /export/home
/vol/dev/dsk/c0t0d0/sol_10_811_sparc
2.1G 2.1G 0K 100% /cdrom/sol_10_811_sparc
bash-3.2#
bash-3.2#
bash-3.2# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
M W p l 16 unknown /dev/dsk/c1t1d0s7
M W p l 8208 unknown /dev/dsk/c1t1d0s7
M W p l 16400 unknown /dev/dsk/c1t1d0s7
bash-3.2# metastat
d130: Mirror
Submirror 0: d131
State: Okay
Submirror 1: d132
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 30722044 blocks (14 GB)
d131: Submirror of d130
State: Okay
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Okay Yes
d132: Submirror of d130
State: Needs maintenance
Invoke: metareplace d130 c1t1d0s3
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Maintenance Yes
d110: Mirror
Submirror 0: d111
State: Okay
Submirror 1: d112
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20482820 blocks (9.8 GB)
d111: Submirror of d110
State: Okay
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Okay Yes
d112: Submirror of d110
State: Needs maintenance
Invoke: metareplace d110 c1t1d0s1
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Maintenance Yes
d100: Mirror
Submirror 0: d101
State: Okay
Submirror 1: d102
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 71683312 blocks (34 GB)
d101: Submirror of d100
State: Okay
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Okay Yes
d102: Submirror of d100
State: Needs maintenance
Invoke: metareplace d100 c1t1d0s0
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Maintenance Yes
Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@n5000c50002f5d577
c1t0d0 Yes id1,sd@n5000c50002f54e2b
bash-3.2#
bash-3.2#
3.6. Replace the disk online and resynchronize the data
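The original gives no commands for this step. One possible sequence is sketched below as an assumption, not as the author's procedure: copy the partition table from the healthy disk, recreate the state database replicas on the new disk, and re-enable each failed component in place. print_resync_plan is a hypothetical helper that only prints the commands; it assumes the new disk goes into the same slot (c1t1d0), that slice 7 holds the replicas as in this document, and that the "Invoke:" hints in metastat name the components to repair. Review the printed commands before running any of them.

```shell
#!/bin/sh
# Sketch: print (do not run) a candidate command sequence for a replaced disk.
# Reads metastat output on standard input.
# $1: healthy disk, $2: replaced disk, $3: slice number that holds the metadb replicas
print_resync_plan() {
    good=$1
    new=$2
    dbslice=$3
    # Copy the partition table from the healthy disk to the new one.
    echo "prtvtoc /dev/rdsk/${good}s2 | fmthard -s - /dev/rdsk/${new}s2"
    # Drop stale replicas (only needed if metadb still lists them), then recreate three.
    echo "metadb -d ${new}s${dbslice}"
    echo "metadb -a -c 3 ${new}s${dbslice}"
    # Turn each "Invoke: metareplace <mirror> <slice>" hint into an in-place
    # re-enable (-e), which also starts the resync of that submirror.
    awk '$1 == "Invoke:" && $2 == "metareplace" { print "metareplace -e", $3, $4 }'
}

# Usage on the live system (assumption: metastat is in PATH):
#   metastat | print_resync_plan c1t0d0 c1t1d0 7
```

Printing instead of executing keeps the sketch safe to try, since fmthard and metadb -d are destructive when pointed at the wrong disk.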
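4. Solution 2 (delete the unknown metadb replicas)
4.1. Enter maintenance mode
When too few metadb replicas are valid, the system enters maintenance mode automatically.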
bash-3.2# init 6
bash-3.2# svc.startd: The system is coming down. Please wait.
svc.startd: 107 system services are now being stopped.
Mar 24 12:22:41 test-01 rpc.metad: Terminated
Mar 24 12:22:47 test-01 syslogd: going down on signal 15
svc.startd: The system is down.
syncing file systems... done
rebooting...
Sun Ultra 45 Workstation, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.21.2, 4096 MB memory installed, Serial #68148048.
Ethernet address 0:14:4f:f:db:50, Host ID: 840fdb50.
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@9/pci@0/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_147147-26 64-bit
Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights reserved.
Hostname: test-01
Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any Read-only file system error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.
Mar 24 12:24:55 svc.startd[9]: svc:/system/metainit:default: Method "/lib/svc/method/svc-metainit" failed with exit status 96.
Mar 24 12:24:55 svc.startd[9]: system/metainit:default misconfigured: transitioned to maintenance (see 'svcs -xv' for details)
Mar 24 12:24:57 svc.startd[9]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
Mar 24 12:24:57 svc.startd[9]: system/filesystem/usr:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
Requesting System Maintenance Mode
(See /lib/svc/share/README for more information.)
Console login service(s) cannot run
Root password for system maintenance (control-d to bypass):
single-user privilege assigned to /dev/console.
Entering System Maintenance Mode
Mar 24 12:25:00 su: 'su root' succeeded for root on /dev/console
Oracle Corporation SunOS 5.10 Generic Patch January 2005
#
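4.2. Check the system status and configuration
Files cannot be edited in maintenance mode, but the mirror status and the contents of the system file can still be viewed.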
# metastat
d130: Mirror
Submirror 0: d131
State: Okay
Submirror 1: d132
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 30722044 blocks (14 GB)
d131: Submirror of d130
State: Okay
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Okay Yes
d132: Submirror of d130
State: Needs maintenance
Invoke: metareplace d130 c1t1d0s3
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Maintenance Yes
d110: Mirror
Submirror 0: d111
State: Okay
Submirror 1: d112
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20482820 blocks (9.8 GB)
d111: Submirror of d110
State: Okay
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Okay Yes
d112: Submirror of d110
State: Needs maintenance
Invoke: metareplace d110 c1t1d0s1
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Maintenance Yes
d100: Mirror
Submirror 0: d101
State: Okay
Submirror 1: d102
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 71683312 blocks (34 GB)
d101: Submirror of d100
State: Okay
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Okay Yes
d102: Submirror of d100
State: Needs maintenance
Invoke: metareplace d100 c1t1d0s0
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Maintenance Yes
Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@n5000c50002f5d577
c1t0d0 Yes id1,sd@n5000c50002f54e2b
#
4.3. Delete the unknown metadb replicas
Delete the invalid metadb replicas.
# metadb
flags first blk block count
a m p lu 16 8192 /dev/dsk/c1t0d0s7
a p l 8208 8192 /dev/dsk/c1t0d0s7
a p l 16400 8192 /dev/dsk/c1t0d0s7
M W p l 16 unknown /dev/dsk/c1t1d0s7
M W p l 8208 unknown /dev/dsk/c1t1d0s7
M W p l 16400 unknown /dev/dsk/c1t1d0s7
#
#
# metadb -d /dev/dsk/c1t1d0s7
#
# metadb
flags first blk block count
a m p lu 16 8192 /dev/dsk/c1t0d0s7
a p l 8208 8192 /dev/dsk/c1t0d0s7
a p l 16400 8192 /dev/dsk/c1t0d0s7
#
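Picking the slice to pass to metadb -d can be sketched as a filter over metadb output. This is an illustration under assumptions: broken_replica_slices is a hypothetical helper name, and it treats a replica as broken when its block count column reads "unknown", as in the listing above. On Solaris 10 the awk call may need nawk or /usr/xpg4/bin/awk.

```shell
#!/bin/sh
# Sketch: list each slice that holds at least one replica with an
# "unknown" block count. Reads metadb output on standard input.
broken_replica_slices() {
    # The device path is the last field and the block count the one before it.
    awk '$NF ~ /^\/dev\/dsk\// && $(NF-1) == "unknown" { print $NF }' | sort -u
}

# Usage on the live system (assumption: metadb is in PATH):
#   metadb | broken_replica_slices
# Each printed slice is a candidate for: metadb -d <slice>
```

For the listing in this section it prints /dev/dsk/c1t1d0s7 once, even though three replicas live on that slice, because metadb -d removes all replicas on the slice it is given.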
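4.4. Reboot and check the system status
After the 3 invalid metadb replicas are deleted, the system finds only 3 replicas at boot, all of them valid, so it boots normally.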
#
# init 6
# svc.startd: The system is coming down. Please wait.
svc.startd: 81 system services are now being stopped.
svc.startd: The system is down.
syncing file systems... done
rebooting...
Sun Ultra 45 Workstation, No Keyboard
Copyright 2005 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.21.2, 4096 MB memory installed, Serial #68148048.
Ethernet address 0:14:4f:f:db:50, Host ID: 840fdb50.
Rebooting with command: boot
Boot device: /pci@1e,600000/pci@0/pci@9/pci@0/scsi@1/disk@0,0:a File and args:
SunOS Release 5.10 Version Generic_147147-26 64-bit
Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights reserved.
Hostname: test-01
test-01 console login: Mar 24 12:28:47 test-01 sendmail[531]: My unqualified host name (test-01) unknown; sleeping for retry
test-01 console login: root
Password:
Mar 24 12:29:04 test-01 login: ROOT LOGIN /dev/console
Last login: Mon Mar 24 12:19:21 on console
Oracle Corporation SunOS 5.10 Generic Patch January 2005
#
#
# bash
bash-3.2# metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
bash-3.2# metastat
d130: Mirror
Submirror 0: d131
State: Okay
Submirror 1: d132
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 30722044 blocks (14 GB)
d131: Submirror of d130
State: Okay
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s3 0 No Okay Yes
d132: Submirror of d130
State: Needs maintenance
Invoke: metareplace d130 c1t1d0s3
Size: 30722044 blocks (14 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s3 0 No Maintenance Yes
d110: Mirror
Submirror 0: d111
State: Okay
Submirror 1: d112
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 20482820 blocks (9.8 GB)
d111: Submirror of d110
State: Okay
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s1 0 No Okay Yes
d112: Submirror of d110
State: Needs maintenance
Invoke: metareplace d110 c1t1d0s1
Size: 20482820 blocks (9.8 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s1 0 No Maintenance Yes
d100: Mirror
Submirror 0: d101
State: Okay
Submirror 1: d102
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 71683312 blocks (34 GB)
d101: Submirror of d100
State: Okay
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Okay Yes
d102: Submirror of d100
State: Needs maintenance
Invoke: metareplace d100 c1t1d0s0
Size: 71683312 blocks (34 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Maintenance Yes
Device Relocation Information:
Device Reloc Device ID
c1t1d0 Yes id1,sd@n5000c50002f5d577
c1t0d0 Yes id1,sd@n5000c50002f54e2b
bash-3.2#
bash-3.2# metastat -p
d130 -m d131 d132 1
d131 1 1 c1t0d0s3
d132 1 1 c1t1d0s3
d110 -m d111 d112 1
d111 1 1 c1t0d0s1
d112 1 1 c1t1d0s1
d100 -m d101 d102 1
d101 1 1 c1t0d0s0
d102 1 1 c1t1d0s0
bash-3.2#
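5. Summary
5.1. metadb check at boot
A Solaris system with RAID 1 mirroring checks the metadb replicas at boot.
By default, the system boots normally only when more than 50% of the metadb replicas are valid; otherwise it enters maintenance mode automatically. If set md:mirrored_root_flag=1 is added to /etc/system, the system can still boot normally with exactly 50% of the replicas valid. There are therefore two ways to handle the failure of one disk:
1. Modify the system file and add set md:mirrored_root_flag=1.
2. Delete the invalid metadb replicas so that the valid replicas exceed 50% of the total.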
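The rule above can be written out as a small calculation. This is a sketch of the rule as stated in this summary, not of the actual kernel logic; can_boot is a hypothetical helper name. It uses arithmetic expansion, so it needs a POSIX shell such as ksh or bash rather than the Solaris 10 /bin/sh.

```shell
#!/bin/sh
# Sketch: decide whether the replica count allows a normal boot.
# $1: total replicas, $2: valid replicas, $3: 1 if mirrored_root_flag is set, else 0
can_boot() {
    total=$1
    valid=$2
    flag=$3
    if [ $((valid * 2)) -gt "$total" ]; then
        echo yes    # more than half of the replicas are valid
    elif [ "$flag" -eq 1 ] && [ "$valid" -gt 0 ] && [ $((valid * 2)) -eq "$total" ]; then
        echo yes    # exactly half are valid, which the flag permits
    else
        echo no
    fi
}

can_boot 6 3 0   # the failure in this document: 3 of 6 valid, no flag; prints no
can_boot 6 3 1   # Solution 1: flag added; prints yes
can_boot 3 3 0   # Solution 2: invalid replicas deleted; prints yes
```

Both solutions reach the same result by different routes: the first relaxes the threshold, the second shrinks the total so the surviving replicas form a majority.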