IBM XIV系列存储快速维护手册
XXXX公司IBM服务器和存储设备维护方案

目录
前言
项目背景和需求
一、服务方案制定原则
二、保修服务内容和标准
1) 故障排除
2) 定期巡检
3) 培训
4) 增值服务
服务标准
三、服务实施细则
1) 前期工作
2) 故障预防建议
3) 故障排除
4) AIX常用故障诊断技术
5) 巡检
6) 备份与恢复策略
7) 项目实施计划
8) 工作结果与工作报告
四、服务保障措施
1) 备件保障
2) 本地化服务
3) 组织和人员保障
4) 安全条款
5) 巡检
6) 服务监督
五、应急预案
1) 备机替换
2) 紧急调用工程师
3) 紧急调用备件
4) 紧急调用第三方资源
5) 远程诊断
六、本公司在本项目中的优势
1) 悠久的服务历史
2) 切实有效的服务保障方案

前言
非常感谢XXXX公司领导给予我公司的机会。我们怀着极大的热情,精心组织、精心设计XXXX公司IBM小型机和相关存储的保修方案,特提交此保修服务方案建议书,供领导和相关专家参阅。
项目背景和需求中国XXXX公司为了满足业务需要,采用了大量的IBM小型机和相关存储设备。
为了保证业务的可持续运行,需要有专业的厂商提供保证硬件环境7X24可用性的能力。
而作为上市公司,XXXX公司希望在保证满足质量要求的前提下有更合理的最具性价的服务方案和相关厂商。
一、服务方案制定原则
本方案主要针对XXXX公司的IBM服务器主机和相关的操作系统、数据库、系统软件,制定合理、科学的维保策略。
方案的制定遵循以下原则:●业务为中心:本项目的最终目标是保证业务系统的安全和可靠运行。
包括计算机系统的可靠运行和业务数据的安全保证,我们将动用一切有效的措施手段,力求业务系统万无一失,我们的目标是:“非正常性停机时间为零”。
ibmxiv UPS指导手册

摘要:
1. 引言
2. IBM XIV UPS系统简介
3. IBM XIV UPS系统组件
4. IBM XIV UPS系统配置与安装
5. IBM XIV UPS系统监控与维护
6. IBM XIV UPS系统故障排除
7. 总结

正文:
IBM XIV UPS(不间断电源系统)是IBM公司推出的一款高可靠性的电源解决方案,旨在为关键任务应用提供高效、稳定的电源保障。
本指导手册将详细介绍IBM XIV UPS 系统的相关知识和使用方法。
1.引言IBM XIV UPS 系统专为满足现代数据中心的严格需求而设计,具有高可靠性、高可用性和高灵活性等特点。
通过本手册,您可以了解IBM XIV UPS 系统的关键特性、应用场景以及优势。
2.IBM XIV UPS 系统简介IBM XIV UPS 系统采用先进的数字信号处理技术,可提供卓越的电力质量、强大的抗干扰能力和高效的能源利用率。
它适用于各种关键任务应用,如服务器、存储、网络设备等,以确保数据中心的持续运行。
3. IBM XIV UPS系统组件
IBM XIV UPS系统主要由以下几个组件组成:
- 输入/输出配电单元(PDU)
- 静态旁路开关(SBS)
- 电池模块(BM)
- 智能监控模块(IMM)
- 网络管理模块(NMM)
每个组件都具有特定的功能和性能,共同确保IBM XIV UPS系统的稳定运行。
4. IBM XIV UPS系统配置与安装
在配置和安装IBM XIV UPS系统时,请遵循以下步骤:
- 确认硬件和软件要求
- 安装输入/输出配电单元
- 安装静态旁路开关
- 安装电池模块
- 安装智能监控模块
- 安装网络管理模块
- 配置系统参数
正确安装和配置IBM XIV UPS系统是确保其正常运行的关键。
5.IBM XIV UPS 系统监控与维护为了保持IBM XIV UPS 系统的稳定运行,需要定期对其进行监控和维护。
具体措施包括:
- 检查系统运行状况
- 监控系统参数
- 维护电池模块
- 更新系统软件
通过有效的监控和维护,可以确保IBM XIV UPS系统在关键时刻发挥其应有的作用。
目录
一、报修
二、P570小型机维护
1) 开关机流程
2) 日常维护
3) 硬件诊断
三、DS4800存储维护
1) DS4800的开关机步骤
2) DS4800的日常维护
四、DS8100存储维护
1) 如何将DS8100关闭和加电
2) DS8100的日常维护
五、DS8300存储维护
1) 如何将DS8300关闭和加电
2) DS8300的日常维护

IBM小型机简易维护手册

一、报修
如果碰到硬件或者软件故障,请打IBM 800免费报修电话:
IBM硬件报修电话 8008106677
IBM软件报修电话 8008101818
报修前需要准备:
1) 机器序列号(如9113-550 10-593ED),如图所示
2) 客户单位
3) 客户联系人及电话
4) 机器所在城市
5) 问题描述
6) 相关日志

二、P570小型机维护
1) 开关机流程
1. 开机
A 无分区:
1) 检查电源是否插好。
2) 液晶面板出现"OK"字样,指示灯2秒钟闪烁一次,表示机器此时处在关机状态。
3)按下前面板上白色按钮后,主机会进入硬件自检和引导阶段;液晶面板会显示开机过程码,每一代码表示自检或引导的不同阶段,引导结束时,液晶面板代码消失,终端上有显示,进入AIX操作系统初始化,最后会出现登录提示。
4)如果主机长时间停留在某一代码上(大于20分钟),说明主机或操作系统有故障,请打IBM硬件保修电话8008106677,并提供相关代码。
B 有分区:
1) 检查电源是否插好。
2) 在HMC中看Service Management里面对应服务器的状态,应为Power off状态。
3) 选中对应的服务器,选中Power On,选项为Partition to Standby,点击OK。
4) 主机开始硬件自检,启动结束后,在HMC中看到对应的服务器为Standby状态。
5) 选中该主机的对应分区,点击"Active",启动分区。
2. 关机
A 无分区:
1) 停应用。
2) shutdown -F 停操作系统。如果机器全分区,液晶面板会显示停机过程码,最后出现"OK"字样,指示灯2秒钟闪烁一次。
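上面通过HMC图形界面完成的开关机流程,也可以在HMC命令行中完成。下面是一个示意脚本(chsysstate/lssyscfg是HMC自带的标准命令,但需在HMC上执行,此处全部注释掉以作说明;受管系统名p570-A和分区名lpar1均为假设的占位值):

```shell
# 示意:HMC命令行方式的开机/关机流程(命令需在HMC上执行,此处仅注释说明;
# MANAGED_SYS 与 LPAR 为假设的占位名)
MANAGED_SYS="p570-A"
LPAR="lpar1"
# lssyscfg -r sys -m "$MANAGED_SYS" -F state                        # 查看状态,开机前应为 Power Off
# chsysstate -r sys -m "$MANAGED_SYS" -o on                         # 加电,对应 "Partition to Standby"
# chsysstate -r lpar -m "$MANAGED_SYS" -n "$LPAR" -o on -f default  # 激活分区,对应点击 "Active"
# chsysstate -r sys -m "$MANAGED_SYS" -o off                        # 正常关机
echo "sequence prepared for $MANAGED_SYS/$LPAR"
```

实际使用时,请以所用HMC版本的命令手册为准核对参数。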
目录
IBM System x3650 M2 7947 型
3650M2前视图
X3650M2操作员信息面板
光通路诊断面板
后视图
IBM System x3650 M3 7945 型
X3650M3正视图
X3650M3操作员信息面板
光通路诊断面板
电源部分&指示灯
IBM System x3650 M4 7915 型
X3650M4正视图
操作员信息面板
X3650M4光通路诊断面板
服务器电源功能
IBM System x3500 M4 7383 型服务器
X3500M4正视图
X3500M4光通路诊断
X3500M4光通路诊断指示灯
X3500M4后视图

IBM System x3650 M2 7947 型
3650M2前视图
下图显示了服务器前部的控件、接口和硬盘驱动器托架。
硬盘驱动器活动指示灯:每个热插拔硬盘驱动器都具有一个活动指示灯。
当该指示灯闪烁时,表示该驱动器正在使用中。
硬盘驱动器状态指示灯:每个热插拔硬盘驱动器都具有一个状态指示灯。
当该指示灯点亮时,表示该驱动器发生了故障。
如果该指示灯缓慢闪烁(每秒闪烁一次),表示正在将该驱动器重新构建为RAID 配置的一部分。
当该指示灯快速闪烁(每秒闪烁三次)时,表示控制器正在识别该驱动器。
视频接口:将显示器连接到该接口。
可同时使用服务器前部和后部的视频接口。
USB 接口:这两个USB 接口可以连接USB 设备,如USB 鼠标、键盘或其他USB设备。
操作员信息面板:该面板包含控件、指示灯和接口。
有关操作员信息面板上的控件和指示灯的信息,请参阅第10 页的『操作员信息面板』。
机架释放滑锁:按下这些滑锁可以从机架上卸下服务器。
CD/DVD 弹出按钮:按该按钮可从CD-RW/DVD 驱动器中取出CD 或DVD。
建行IBM小型机日常维护第一篇:日常维护部分第二篇:故障处理部分第三篇:安图特公司技术支持第一篇日常维护部分目录第1章AIX系统管理日常工作(检查篇) (1)1.1 常用的命令 (1)1.2 语法介绍 (1)1.2.1 vmstat:检查存、CPU、进程状态 (1)1.2.2 sar:检查CPU、IO (2)1.2.3 PS:检查进程状态命令 (3)1.2.4 svmon:显示进程占用存 (3)1.2.5 iostat:显示磁盘IO (4)1.2.6 netstat, entstat:显示网卡信息 (4)1.2.7 no:显示tcpip参数设置 (5)1.2.8 其它命令 (5)第2章AIX系统管理日常工作(LV篇) (6)2.1 IBM AIX系统管理的日常工作 (6)2.1.1 开关机步骤 (6)2.1.2 用户组及用户管理 (6)2.1.3 文件系统维护 (6)2.1.4 系统日常管理 (7)2.1.5 系统备份 (7)2.1.6 定时清洗磁带机 (7)2.1.7 定时检查设备指示灯状态 (7)2.1.8 简单故障的判断 (7)2.1.9 熟悉ibm aix操作系统 (7)2.2 关于IBM AIX的逻辑卷管理 (7)2.3 LVM命令 (8)第3章AIX系统管理日常工作(关键参数检查篇) (10)3.1 AIO参数检查 (10)3.2 磁盘阵列QUEUE_DEPTH参数检查 (11)3.3 用户参数检查 (11)3.4 激活SSA F AST-W RITE C ACHE (12)3.5 IO参数设置 (12)3.6 SYNCD DAEMON的数据刷新频率 (12)3.7 检查系统硬盘的镜像 (12)第4章AIX系统管理日常工作(性能分析篇) (13)4.1 性能瓶颈定义 (13)4.2 性能围 (14)第5章AIX系统管理日常工作(SHUTDOWN篇) (14)5.1 概念 (14)5.2 关机命令 (14)第6章AIX系统管理日常工作(备份与恢复篇) (15)6.1 用SMIT备份 (15)6.2 手工备份 (15)6.3 恢复系统 (15)第7章HACMP的双机系统的管理和维护 (15)7.1 HACMP双机系统的启动 (15)7.2 HACMP双机系统的关闭 (16)7.3 察看双机系统的当前状态 (16)7.4 HACMP环境下的排错 (17)7.4.1 了解问题的存在 (17)7.4.2 判断问题的出处 (18)第1章AIX系统管理日常工作(检查篇)1.1常用的命令1.2语法介绍1.2.1vmstat:检查存、CPU、进程状态# vmstat 1 15kthr memory page faultscpu----- ----------- ------------------------------------ -----------r b avm fre re pi po fr sr cy in sy csus sy id wa1 0 28132 81277 0 0 0 0 0 0 132 375 67 65 1 342 0 28132 81277 0 0 0 0 0 0 127 338 131 99 0 02 0 28132 81277 0 0 0 0 0 0 132 316 131 99 0 02 0 28132 81277 0 0 0 0 0 0 120 317 99 0 0 02 0 28132 81277 0 0 0 0 0 0 146 316 127 99 0 02 0 28132 81277 0 0 0 0 0 0 130 317 125 99 0 02 0 28132 81277 0 0 0 0 0 0 316 127 99 0 0 02 0 28132 81277 0 0 0 0 0 0 129 317 124 99 0 02 0 28132 81277 0 0 0 0 0 0 304 127 99 0 0 0r:正在运行的进程b:被阻挡的进程avm:活动的虚存,单位4kbfre:自由列表,位4kbpo:页换出pi:页换入sy:系统占用CPUid:空闲CPUwa:等待的CPU1.2.2sar:检查CPU、IO例如:sar -u 1 30sar -P ALL 1 10语法:sar -[abckmqruvwyA] inteval repetition-b buffer 活动-c 系统调用-k 核进程统计.-m 消息及信号量活动-q 正在运行的队列数及等待队列数-r 页交换统计-u CPU利用-P CPU负载.1.2.3 PS:检查进程状态命令ps:显示当前SHELL重所有进程ps -ef :显示系统中所有进程,-f显示更详细信息ps -u oracle:显示oracle用户进程ps –emo 
THREAD:显示线程信息ps au;ps vg:按使用时间显示进程(最近一次调用)ps aux:按使用时间显示进程(进程启动)1.2.4 svmon:显示进程占用存svmon –G:显示系统占用存svmon -C command_name:显示某个用户进程占用存svmon -P pid显示某个进程占用存svmon –S:显示段占用存1.2.5iostat:显示磁盘IOtty: tin tout avg-cpu: % user % sys % idle %iowait0.0 4.0 0.9 1.3 95.4 2.5Disks: % tm_act Kbps tps Kb_read Kb_wrtnhdisk0 58.4 218.3 41.2 172 920hdisk1 16.8 85.6 21.4 428 0hdisk2 50.6 223.9 55.6 1100 20hdisk3 16.8 85.6 21.4 428 0hdisk4 0.0 0.0 0.0 0 0hdisk5 43.4 279.1 69.8 1396 0hdisk6 0.0 0.0 0.0 0 0hdisk7 16.4 27.2 20.2 0 136hdisk8 0.0 0.0 0.0 0 0hdisk9 9.4 156.0 11.4 0 780hdisk10 16.4 27.2 20.2 0 136cd0 0.0 0.0 0.0 0 01.2.6n etstat, entstat:显示网卡信息netstat en0:显示en0信息netstat –s:显示网络信息netstat -m显示网络 buffers.netstat -i显示网卡状态netstat -I en0 1显示eno网卡塞(1秒间隔)1.2.7n o:显示tcpip参数设置no –a:显示tcpip所有参数当前设置no -o tcp_keepalivetime=7200000设置tcp_keepalivetime等于3600000秒no -d 恢复默认值注:该方法在重启后失效1.2.8其它命令第2章AIX系统管理日常工作(LV篇)2.1IBM AIX系统管理的日常工作系统管理员对小型机系统的正确管理是系统稳定运行的保障,作为系统管理员应注意以下几个方面:2.1.1开关机步骤在系统管理员控制下进行正确的操作。
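上文第1章介绍的vmstat等检查命令,其输出可以用awk做简单的自动判断。下面是一个自包含的示意脚本(样例数据仿照上文vmstat的输出格式;90%的阈值为假设值,实际环境中可改为 `vmstat 1 15 | awk ...` 直接取实时数据):

```shell
# 示意:统计 vmstat 采样中 us+sy(用户+系统CPU)超过阈值(假设90%)的次数。
# 样例数据内嵌在脚本里,便于独立运行;末4列依次为 us sy id wa。
sample='r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 0 28132 81277 0 0 0 0 0 0 132 375 67 65 1 34 0
2 0 28132 81277 0 0 0 0 0 0 127 338 131 99 0 0 1'
busy=$(echo "$sample" | awk 'NR>1 && $(NF-3)+$(NF-2) > 90 { n++ } END { print n+0 }')
echo "busy samples: $busy"
```

同样的思路也可以用于wa(IO等待)列,以判断IO瓶颈。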
IBM XIV Storage System SoftwareHost System Attachment Guide for AIX 1.0.2.0Note:Before using this information and the product it supports,read the general information in“Notices”on page5.Third Edition(2009)The following paragraph does not apply to any country(or region)where such provisions are inconsistent with local law.INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION“AS IS”WITHOUT WARRANTY OF ANY KIND,EITHER EXPRESS OR IMPLIED,INCLUDING,BUT NOT LIMITED TO,THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.Some states(or regions)do not allow disclaimer of express or implied warranties in certain transactions;therefore,this statement may not apply to you.Order publications through your IBM representative or the IBM branch office serving your locality.©Copyright International Business Machines Corporation2009.US Government Users Restricted Rights–Use,duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.Host System Attachment Guide for AIXThe1.0.2.0version of the Host System Attachment for AIX is available forinstallation or upgrade.It is the only version that is supported by IBM XIV.Note:Do not install previous versions of this package.Standard IBM InstallationThe Host System Attachment for AIX is an IBM standard package:disk.fcp.2810.rte1.0.2.0You can install it via:v Smitty installv Invoking geninstallv Directly,via the installp commandThe XIV package depends on AIX fixes that must be installed prior to theinstallation.In addition,the IBM XIV installation package has to be configured,also prior to the installation.And the XIV package performs an importantpost-install function.For these reasons,be sure to follow the detailed installation procedure described inthe section’Detailed Installation Procedure’below.Installation and upgradeUse this package for a clean installation or to upgrade from previous versions.IBMsupports a connection by AIX hosts to IBM XIV Storage System only when 
thispackage is installed.Connections between AIX hosts to IBM XIV Storage System are supported onlywhen this package is installed.Removing this packageThis package can be removed,if support for connection to IBM-XIV StorageSystem is no longer required,though there is no requirement to do so.To remove the installed package:1.Upgrade to this version.2.Remove the installed package.3.Reboot immediately after a successful removal.Installing previous versionsDo not install previous versions of this package.Installing previous versions maycause problems which could render your server unbootable.©Copyright IBM Corp.20091Removal of customized parametersWhen being installed,the package removes any customized parameters for AIXhdisk devices corresponding to XIV disks.Configuration decision-load balancinground_robinBy default,the package configures multipathing to work in a round_robin modewith queue depth of1.Currently,queue depth must be1if you are multipathingin round_robin mode.The limitation of queue_depth=1in round_robin mode forXIV devices will be removed in a future version of AIX.failover_onlyAs setting the queue depth to1may incur a performance penalty,you may alterthe multipathing mode to failover_only and use a higher queue depth,such as32.Note:If you choose failover_only,then only one path will be used for any givenhdisk.Setting environment variablesTo make failover_only algorithm and a given queue depth the default,set thefollowing environment variables prior to package installation to:export XIV_DEF_ALGO=failover_onlyexport XIV_DEF_QD=32(or another value)Required eFixThe package installation requires certain IBM eFix to be installed prior to itsinstallation.Please see the following table.AIX version Required patchAIX5.3TL7and previous Not supportedAIX5.3TL7IZ28969AIX5.3TL8IZ28970AIX5.3TL9IZ28047AIX5.3TL9and later iFIX/APAR is not requiredAIX6.1TL0IZ28002AIX6.1TL1IZ28004AIX6.1TL2IZ28079AIX6.1TL2and later iFIX/APAR is not requiredA detailed installation 
procedure1.Check whether you have a regular AIX fix package installed for the2810.instfix-iv|grep"2810round".v If you have a regular fix package installed,go to step6below.v If you do not have a regular fix package installed,proceed to step2.2IBM XIV Storage System Software:Host System Attachment Guide for AIX1.0.2.02.Download either a regular fix(according to the table above)or the latestservice pack.3.Check whether you have an eFIX for XIV installedemgr-l|grep XIV.v If you don’t have an eFIX for XIV installed,go to step5below.v If you do have an eFIX for XIV installed,note the eFIX ID and proceed tostep4.4.Remove the eFIX with emgr-r-L(use the eFIX ID you got at step3above).***Do NOT reboot***.5.eFIX installation.Install the downloaded regular fix or a full service pack.Be sure installation is successful.***Do NOT reboot***.6.XIV package installation.Install the XIV package.Reboot immediately after the successful XIV package installation.Following the installationFollowing the package installation and reboot,XIV Fibre Channel disks are multipathed and seen as:IBM2810XIV Fibre Channel Disks.LUN0(if a real volume is not assigned to it)is not multipathed and is seen as: IBM2810XIV-LUN-0Fibre Channel Array Controller.This is a correct and supported configuration.If XIV disks are seen asMPIO Other FC SCSI Disk Drive,it means that the XIV package is not installed and you got some built-in AIX multipathing support for XIV introduced in AIX service packs released in August 2008(for example,AIX6.1TL2).Such a configuration is not supported by IBM XIV.IBM supports a connection by AIX hosts to IBM-XIV Storage System only when this package is installed.Host System Attachment Guide for AIX34IBM XIV Storage System Software:Host System Attachment Guide for AIX1.0.2.0NoticesThis information was developed for products and services offered in the U.S.A.IBM®may not offer the products,services,or features discussed in this documentin other countries.Consult your local IBM 
representative for information on theproducts and services currently available in your area.Any reference to an IBMproduct,program,or service is not intended to state or imply that only that IBMproduct,program,or service may be used.Any functionally equivalent product,program,or service that does not infringe any IBM intellectual property right maybe used instead.However,it is the user’s responsibility to evaluate and verify theoperation of any non-IBM product,program,or service.IBM may have patents or pending patent applications covering subject matterdescribed in this document.The furnishing of this document does not give youany license to these patents.You can send license inquiries,in writing,to:IBM Director of LicensingIBM CorporationNorth Castle DriveArmonk,NY10504-1785U.S.A.The following paragraph does not apply to the United Kingdom or any othercountry where such provisions are inconsistent with local law:INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THISPUBLICATIONS“AS IS”WITHOUT WARRANTY OF ANY KIND,EITHEREXPRESS OR IMPLIED,INCLUDING,BUT NOT LIMITED TO,THE IMPLIEDWARRANTIES OF NON-INFRINGEMENT,MERCHANTABILITY OR FITNESSFOR A PARTICULAR PURPOSE.Some states do not allow disclaimer of express orimplied warranties in certain transactions,therefore,this statement may not applyto you.This information could include technical inaccuracies or typographical errors.Changes are periodically made to the information herein;these changes will beincorporated in new editions of the publications.IBM may make improvementsand/or changes in the product(s)and/or program(s)described in this publicationat any time without notice.Any references in this information to non-IBM Web sites are provided forconvenience only and do not in any manner serve as an endorsement of those Websites.The materials at those Web sites are not part of the materials for this IBMproduct and use of those Web sites is at your own risk.IBM may use or distribute any of the information you supply in 
any way itbelieves appropriate without incurring any obligation to you.©Copyright IBM Corp.20095Part Number:GC27-2258-00 U.S.A。
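The detailed installation procedure above can be condensed into a short checklist script. Only the environment-variable step actually executes here; the AIX-only commands (instfix, emgr, installp) are left as comments, with package and fix names taken from the guide, and the exact grep patterns should be verified against your AIX level:

```shell
# Sketch of the XIV Host Attachment install flow for AIX described above.
# The AIX-specific commands are commented out; only the optional environment
# defaults (documented in the guide) are set and echoed here.
export XIV_DEF_ALGO=failover_only   # optional: avoid the queue_depth=1 round_robin limit
export XIV_DEF_QD=32                # optional: higher queue depth with failover_only
# 1. instfix -iv | grep "2810 round"      # regular fix present? if yes, go to step 6
# 3. emgr -l | grep XIV                   # XIV eFix installed? note its ID
# 4. emgr -r -L <ID>                      # remove the eFix -- do NOT reboot
# 5. install the regular fix or service pack -- do NOT reboot
# 6. installp -agXd . disk.fcp.2810.rte   # install the XIV package, then reboot immediately
echo "defaults: $XIV_DEF_ALGO qd=$XIV_DEF_QD"
```

After the reboot, XIV LUNs should appear as "IBM 2810XIV Fibre Channel Disk"; seeing "MPIO Other FC SCSI Disk Drive" instead means the package is not installed.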
I B M X I V®S t o r a g e S y s t e m Performance ReinventedW h i t e P a p e rSeptember 2008Copyright IBM Corporation 2008IBM, the IBM logo, , System Storage, XIV, and the XIV logo are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at /legal/copytrade.shtml .Other company, product, or service names may be trademarks or service marks of others.This document could include technical inaccuracies or typographical errors. IBM may not offer the products, services or features discussed in this document in other countries, and the product information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. Any statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. The information contained in this document is current as of the initial date of publication only and is subject to change without notice. All performance information was determined in a controlled environment. Actual results may vary. Performance information is provided “AS IS” and no warranties or guarantees are expressed or implied by IBM. Information concerning non-IBM products was obtained from the suppliers of their products, their published announcements or other publicly available sources. Questions on the capabilities of the non-IBM products should be addressed with the suppliers. 
IBM does not warrant that the information offered herein will meet your requirements or those of your distributors or customers. IBM provides this information “AS IS” without warranty. IBM disclaims all warranties, express or implied, including the implied warranties of noninfringement, merchantability and fitness for a particular purpose or noninfringement. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.*******************ContentsIntroduction (1)The XIV System: Architecture and Performance (1)Optimal Exploitation of All System Resources (1)Integrating Cache and Disk in Each Module (2)Huge CPU Power (2)High Performance without Management Effort (2)High Performance with Snapshots (2)Disk Mirroring vs. Parity-based Protection (3)Maintaining Performance Consistency through Failures (5)Traditional storage: Degradation during the Rebuild Process (5)Traditional storage: Degradation Due to Write-through Mode (5)The XIV System: Performance in the Field (6)Scenario #1: Write-intensive Database (6)Scenario #2: E-mail Appliances (6)Scenario #3: Voice-recording Application (7)Scenario #4: E-mail Server (7)Summary (7)IntroductionOne of the major requirements of any SAN administration team is to provide users and applications with adequate performance levels. This task becomes increasingly difficult with demands for high performance growing while budgets for storage systems, administration efforts, and power consumption are diminishing.This document describes how the IBM® XIV™ Storage System provides an outstanding and in many ways unprecedented solution to today's performance requirements. 
It does so by achieving the following:►Providing high performance through a massively parallelized architecture, optimal exploitation of all system components (including disks, CPUs, andswitches), and an innovative cache design.►Ensuring that performance levels are kept intact when adding storage capacity, adding volumes, deleting volumes, or resizing volumes.►Guaranteeing the same performance level, even throughout variations of the applications' access patterns►Providing high performance without any planning or administration efforts►Providing consistent performance levels even through hardware failures►Maintaining high performance even while using snapshotsThe XIV System: Architecture and Performance Optimal Exploitation of All System ResourcesEach logical volume in the XIV system is divided into multiple stripes of one megabyte. These stripes are spread over all the disks in the system, using a sophisticated pseudo-random distribution mechanism.This revolutionary approach ensures that:►All disks and modules are utilized equally, regardless of access patterns.Despite the fact that applications may access certain volumes more frequently than other volumes or access certain parts of a volume more frequently than other parts, the load on the disks and modules remains balanced perfectly.►Pseudo-random distribution ensures consistent load balancing even after adding, deleting, or resizing volumes, as well as after adding or removinghardwareIntegrating Cache and Disk in Each ModuleUnlike traditional storage systems, the XIV system’s design embeds the read/write cache in the same hardware module as the disks. This unique design aspect has several advantages:►Distributed Cache. The cache is implemented as a distributed cache, so that all cache units can concurrently serve host I/Os and perform cache-to-disk I/O. This ensures that cache never becomes a bottleneck. 
In contrast, traditional storage systems use a central memory architecture, which has significant overhead due to memory locking.►High Cache-to-Disk Bandwidth. Aggressive prefetching is enabled by the fact that cache-to-disk bandwidth is the internal bandwidth of a module, providingdozens of gigabytes per second for the whole system.►Powerful Cache Management. Its unique cache design enables the XIV system to read a large cache slot per each disk read, while managing least-recently-used statistics in small cache slots. This unique combination is made possible by the system’s huge processing power and high cache-to-disk bandwidth.Huge CPU PowerEach data module has its own quad-core processor, giving the XIV system dozens of CPU cores. The system uses this vast processing power to execute advanced caching algorithms that support small cache slots, enable powerful snapshot performance, and so on. The massive CPU power ensures high performance through high cache-hit rates and minimal snapshot overhead.High Performance without Management EffortUnlike other storage systems, the XIV system is fully virtualized. The user has no control over the allocation of volumes to physical drives. As a result, the XIV system's high performance is gained with no planning efforts. The user does not have to allocate volumes to specific disk drives or shelves, nor is there a need to reconsider these decisions when new volumes are required, new hardware is added, or application access patterns change.Instead, the XIV system always ensures optimal utilization of all resources in a way that is transparent to the hosts and storage administration team.High Performance with SnapshotsMany storage systems can provide the required performance levels as long as snapshots are not defined. This is because snapshot functionality was added to these systems long after their initial design. As soon as snapshots are defined, performance levels in many cases degrade to unacceptable levels. 
Some systems solve this problem by using full copies instead of differential snapshots.The XIV system has been designed from inception to support snapshots. Its combination of innovative replication algorithms and massive CPU and cache power keep the impact of snapshots on performance to a minimum. Specifically, it achieves this as follows:►The traditional copy-on-write technique is replaced by the more efficient redirect-on-write technique, eliminating unnecessary copies►Redirect-on-write is always performed within the same module where data is being copied between disks. This architecture provides a huge performanceboost compared with the traditional method of copying between modules.►Snapshot write overhead does not depend on the number of snapshots or volume size►Zero read overhead for volumes and snapshots►Zero overhead when writing in unformatted areasDisk Mirroring vs. Parity-based ProtectionToday’s storage administrators face the dilemma of deciding which protection scheme to choose for their data: mirroring or parity-based. The XIV system uses mirroring protection, in which each piece of data is written on two disks. 
When comparing the XIV system to other systems, keep in mind that the propose configurations of other systems often involve RAID-5 or even RAID-6 protections, which create several performance problems:►Each host write translates into two disk writes and two disk reads (or even three writes and three reads in RAID-6) compared to two disk writes in mirroring.►RAID-5/6-based rebuild time is much longer, hence extending the time of reduced performance due to disk rebuild whenever a disk fails.►With RAID-5/6, upon a rebuild, each read request to the failed area is served through multiple reads and computing an XOR, creating a huge performanceoverhead.The XIV system architecture is shown in the following diagram:Figure 1: XIV ArchitectureMaintaining Performance Consistency through FailuresIn many storage systems, even those considered tier-1, performance levels can degrade significantly upon a hardware failure. This is unacceptable in today's world, since a reduction in performance levels means, in many cases, downtime for the applications.This section shows how traditional architectures create performance degradation due to hardware problems and how the XIV system solves this problem.Traditional storage: Degradation during the Rebuild ProcessThe current, traditional storage implementation of redundancy involves a redundant disk group, either mirrored pairs of disks or RAID-5 disk groups. Each such group has a hot spare disk, which is used to rebuild the redundancy upon a failure.The enormous increase in disk capacity in recent years has not, unfortunately, been matched by an increase in disk bandwidth. As a result, disk rebuild time has increased to several hours, to as many as 15, depending on disk size and protection scheme. During this time, the system suffers from severe performance degradation due to the heavy I/O requirement of the rebuild process. 
Some systems offer a way to limit the resources allocated for a rebuild, thus ensuring more system performance, but wind up increasing rebuild time, thereby increasing exposure to double failure.The XIV system's disk failure protection scheme enables a distributed rebuild mechanism in which all disks participate. This ensures an extremely short rebuild time, 30 minutes for a 1 TB drive. Furthermore, the overhead of the rebuild process is minimal, since all disks participate in the rebuild and each disk only needs to rebuild a small portion. This ensures that performance levels at rebuild time remain intact. Another problem with a RAID-5 or RAID-6-based rebuild is that until the rebuild process is over, each request to read data from the failed disk must be served via multiple reads from all the disk groups and computing XOR. This creates a huge performance impact on serving read requests. The XIV system's mirrored protection ensures that even while a rebuild is in progress, read requests are served without any overhead.Traditional storage: Degradation Due to Write-through ModeModern redundant storage architectures require that each write command be written in two cache units before the host is acknowledged. Otherwise, a single failure in the cache module would create data loss. Furthermore, they require redundant protection of power supply to these cache units.Unfortunately, many storage architectures cannot guarantee protected cache after certain types of failures. A typical example is the failure of a cache module, which leaves the peer cache module exposed to a single failure. Another example is the failure of a UPS module, which makes the system vulnerable to power failures.The common solution to this problem is to use write-through mode, in which a host is acknowledged only after the information has been written to two disks and without using write-cache. 
This mode has a severe impact on performance and usually means a slowdown or stoppage of service to the application host. Unfortunately, it takes a technician’s visit to overcome such a problem.With the XIV system, write-through mode is never used. Even after the failure of a UPS unit or module, a write request is written to a cache in two different modules. The XIV System: Performance in the FieldThe performance of the XIV system has been proven in the field, demonstrating dramatic increases in comparison to other tier-1 storage systems. Several examples are given below.Scenario #1: Write-intensive DatabaseA leading bank was trying to contend with a performance-demanding application based on a 7 TB Oracle database with an extremely write-intensive I/O. The application practically failed when running on a leading tier-1 storage system. When migrated to another tier-1 storage system, equipped with 240 FC 146 GB 15K ROM drives, the application managed to provide an adequate performance level, but no more. Snapshots were not possible without compromising performance to unacceptable levels; as a result, backup procedures were complex and limited. Migrating the application to the XIV system gave the customer a dramatic increase in performance (for example, queries could now be performed in one-third of the time), while enabling the ongoing use of 28 differential snapshots. The gains were many: a much better response time to users, simplified physical backup procedures, and 28 levels of logical backup snapshots.Scenario #2: E-mail AppliancesTwo leading ISPs compared the XIV system against a well-known tier-1 system running POP e-mail storage for a group of e-mail appliances. 
The existing system required an independent interface card per each e-mail appliance, making the solution much more expensive and complex.The XIV system was able to handle five e-mail appliances on a single interface port, with no degradation in performance.Scenario #3: Voice-recording ApplicationA world leader in voice recording systems compared the XIV system with a system made up entirely of 146GB 15K RPM FC drives. The customer found that, with the XIV system, the same set of servers could support three times more clients (12,000 instead of 4,000), consequently reducing the total cost of the solution by an order of magnitude.Scenario #4: E-mail ServerA leading telecom company tested Microsoft® Exchange server performance on various storage systems and saw a striking gap between XIV and another leading tier-1 system. After sharing this information with that vendor’s top support engineers, the customer was told that since the Exchange metadata was spanned across only 18 disk drives, performance was limited. The customer asked the vendor to lay out the volume on more disk drives. The response was that doing so was technically impossible. This example illustrates how XIV’s ease of management provided real life high performance, while other vendors did not manage to exploit the full power of the physical components due to management limitations.SummaryAs presented above, the XIV system provides:►Unmatched performance levels, setting a new standard for SAN storage►High performance levels without manual planning or a configuration process►High performance levels that are consistently maintained, even upon hardware failure►Industry breakthrough: snapshots with high performance。
IBM XIV Series Storage Quick Maintenance Manual

Contents
I. Basic architecture and components of the XIV
II. Basic management of the XIV
III. Installation environment requirements for the XIV
IV. Installing and powering on the XIV
V. How to upgrade the XIV microcode
VI. How to collect XIV System Data (XRAY)
VII. How to replace a single DDM via the GUI
VIII. How to replace a module (Interface Module or Data Module) via the GUI
Appendix A: Reference documents

I. Basic architecture and components of the XIV
The IBM XIV Storage System 2810-A14 delivers high performance, scalability, and ease of management.
It also uses SATA disks, which further reduces the cost per unit of capacity.
An XIV system consists mainly of the following components:
1. Six host interface modules (Interface Modules), which provide the FC and iSCSI ports; each module also contains twelve 1TB SATA disks.
2. Nine data modules (Data Modules), each containing twelve 1TB SATA disks.
3. Three uninterruptible power supply (UPS) modules, which provide protection against power loss.
4. Two Ethernet switches, which provide data exchange between the modules.
5. One maintenance module (Maintenance Module).
6. One automatic transfer switch (ATS), which manages the redundancy of the external input power feeds.
7. One modem, connected to the maintenance module, which provides Call Home and remote support.
In addition, to simplify connections to external systems, the XIV concentrates all externally facing ports in a module called the Patch Panel. The Patch Panel is located at the top right of the rear of the rack. The port assignments are shown in the figure below.
II. Basic management of the XIV
The XIV is managed mainly through the XIV Storage Manager GUI. The GUI supports the following management functions:
1. XIV hardware maintenance and parts replacement
2. Logical partitioning of the XIV storage capacity
3. Configuration of XIV-to-host connectivity
4. System performance monitoring
5. Management of snapshots, copies, remote copies, and data migration
In addition, the XIV provides a command-line interface, XCLI; everything that can be done in the GUI can also be done through XCLI. The XCLI package is installed together with the GUI. If the GUI is not installed, XCLI can also be installed on its own.
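Because everything in the GUI is also available through XCLI, routine queries can be scripted. A minimal Python sketch of wrapping an XCLI call; the flag names (`-u`, `-p`, `-m`) and the `state_list` command are assumptions to verify against the XCLI reference for your microcode level:

```python
import subprocess

# Hypothetical sketch: check `xcli --help` on your management workstation
# for the exact flag names and command set of your XCLI version.
def build_xcli_command(mgmt_ip, user, password, command, *args):
    """Assemble an XCLI invocation as an argument list (no shell quoting needed)."""
    return ["xcli", "-u", user, "-p", password, "-m", mgmt_ip, command, *args]

def run_xcli(mgmt_ip, user, password, command, *args):
    """Run the command and return its stdout; raises on a non-zero exit code."""
    argv = build_xcli_command(mgmt_ip, user, password, command, *args)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Build (but do not run) a state query against the direct-attach address.
    print(build_xcli_command("14.10.202.250", "technician", "****", "state_list"))
```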
Below is the GUI interface:

III. Installation environment requirements for the XIV
1. Power requirements: the XIV uses two single-phase power feeds, 180-264 VAC, 50/60 Hz, 60 A maximum current.
2. Ambient temperature requirements:
Heat output: 15 modules: 26K BTU/hour; 6 modules: 10.5K BTU/hour.
Air circulation: cooling air enters at the front of the rack and is exhausted at the rear.
Airflow: 15 modules: 750 CFM; 6 modules: 300 CFM.
Operating environment: temperature: 10-35°C (50-95°F); relative humidity: 25-80%; maximum altitude: 2133 m (7000 ft).
3. Floor space and weight requirements:
Machine dimensions: 1991 (H) × 600 (W) × 1091 (D). Machine weight: 876 kg.
Cable routing: the XIV's external cabling can be routed from the top or the bottom of the rack. For bottom routing, the floor cutout is at the rear of the rack.
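The operating limits listed above (temperature 10-35°C, relative humidity 25-80%, altitude up to 2133 m, feed voltage 180-264 VAC) can be turned into a quick pre-installation site check. A Python sketch; the reading names are illustrative, not from the manual:

```python
# Limits taken from the environment requirements stated above.
LIMITS = {
    "temp_c": (10.0, 35.0),        # operating temperature, degC
    "humidity_pct": (25.0, 80.0),  # relative humidity, %
    "altitude_m": (0.0, 2133.0),   # maximum altitude, m
    "feed_vac": (180.0, 264.0),    # feed voltage, VAC
}

def check_site(readings):
    """Return a list of (name, value, low, high) for every out-of-range reading."""
    problems = []
    for name, (low, high) in LIMITS.items():
        value = readings[name]
        if not (low <= value <= high):
            problems.append((name, value, low, high))
    return problems

if __name__ == "__main__":
    site = {"temp_c": 22.0, "humidity_pct": 85.0, "altitude_m": 40.0, "feed_vac": 220.0}
    print(check_site(site))  # the humidity reading is out of range
```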
4. Installation location requirements:

IV. Installing and powering on the XIV
1. Preparing the XIV power feeds: the XIV power configuration usually takes one of two forms, either two 220 V 60 A feeds or four 220 V 30 A feeds.
Depending on the configuration, prepare the power cords according to the corresponding installation diagram below; the wiring sequences of the power cords are shown in the figures below.
2. Verify that the neutral-to-ground voltage is below 1 V and that the grounding resistance is below 1 ohm.
3. Connect the power cords to the corresponding ATS or UPS power inlets.
4. Set the three switches on the ATS to the "ON" position; the green power LED on the ATS should light up.
5. Remove the front panels of the three UPS units. Check whether the battery packs in each UPS are correctly cabled; one battery pack is left disconnected for shipping. Reconnect its cable and secure it.
6. Once all the battery-pack cables are connected, the fans at the front of the three UPS units will start spinning. If they do not, check the power cabling at the rear of the UPS units and check that the ATS switches are in the "ON" position.
7. Reinstall the UPS front panels.
8. Check the output breakers at the rear of the three UPS units; they should be in the "ON" position.
9. Check that the "Online/Bypass" switch at the rear of each of the three UPS units is set to "Online".
10. Press the "Test" button on the three UPS units. The whole XIV then powers up and runs its self-test. Once the XIV completes its power-on self-test normally, you can connect a laptop to the module 5 laptop port on the patch panel and carry out the logical installation and configuration through the GUI or XCLI.
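For the laptop connection mentioned above (and reused in later sections), the service network normally hands the laptop a DHCP address of the form 14.10.202.xx, with 14.10.202.1 as the manual fallback and 14.10.202.250 as the system's direct-attach address. A Python sketch of that address-selection logic; the helper names are mine:

```python
import ipaddress

SERVICE_NET = ipaddress.ip_network("14.10.202.0/24")  # XIV service subnet
STATIC_FALLBACK = "14.10.202.1"                       # manual laptop address
XIV_DIRECT_IP = "14.10.202.250"                       # address to ping/connect to

def choose_laptop_ip(dhcp_assigned):
    """Use the DHCP lease if it is on the service network, else the manual fallback."""
    if dhcp_assigned and ipaddress.ip_address(dhcp_assigned) in SERVICE_NET:
        return dhcp_assigned
    return STATIC_FALLBACK

if __name__ == "__main__":
    print(choose_laptop_ip("14.10.202.37"))  # keeps the DHCP lease
    print(choose_laptop_ip(None))            # falls back to 14.10.202.1
```

After setting the address, verify reachability of 14.10.202.250 with `ping`, as the manual instructs.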
V. How to upgrade the XIV microcode
XIV microcode upgrades are performed with a tool called the "IBM-XIV Technician Assistant utility". The XIV microcode files, the upgrade guides, and the IBM-XIV Technician Assistant utility can be downloaded from: https:///webapp/iwm/int/reg/pick.do?source=IIPxiv In addition, the putty and pscp tools, or WinSCP, are needed for the upgrade.
Putty download address: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
WinSCP download address: /eng/download.php
The microcode upgrade steps differ depending on the starting microcode level and the target level; they are not all the same. For the specific steps, refer to the upgrade guide for the relevant microcode version.
VI. How to collect XIV System Data (XRAY)
In complex situations where the problem cannot be identified immediately, the SSR needs to collect the XIV System Data and provide it to support for analysis. This System Data is similar to the DS8000 papkg; on the XIV it is called XRAY. The XRAY is generated and collected by the SSR from a laptop with XCLI installed.
1. Connect the laptop to the module 5 laptop port on the patch panel. DHCP normally assigns the laptop an address of the form 14.10.202.xx. If no address is assigned, manually set the laptop's address to 14.10.202.1. Use the "ping" command to check that the address 14.10.202.250 can be reached.
2. Run the collection program from the laptop's command line, i.e. run xray_collect IP to collect the XRAY, where IP is the direct-attach IP (i.e. 14.10.202.250).
Note: if the XIV microcode is 10.1 or later, use the xray_collect_v2.0.1.exe file; otherwise use the xray_collect.exe file. (xray_collect.exe and xray_collect_v2.0.1.exe are included in the package for the corresponding microcode level.)
When the XRAY collection completes, it is automatically offloaded to the laptop as a compressed archive, into the directory from which xray_collect.exe was run. The file name looks like system_xray_2810A147801224_2009-12-04-1441.tar.bz2. If the XRAY file fails to offload to the laptop properly, it can be offloaded through the GUI as described below. The XRAY file can be uploaded directly to the PFE website.
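Two details above lend themselves to scripting: choosing the collector binary by microcode level, and reading the machine identifier and timestamp out of the archive name. A Python sketch (the filename layout follows the example above; the field names are my own):

```python
from datetime import datetime

def pick_collector(microcode_version):
    """10.1 and later use xray_collect_v2.0.1.exe; earlier levels use xray_collect.exe."""
    major, minor = (int(p) for p in microcode_version.split(".")[:2])
    return "xray_collect_v2.0.1.exe" if (major, minor) >= (10, 1) else "xray_collect.exe"

def parse_xray_name(filename):
    """Split e.g. system_xray_2810A147801224_2009-12-04-1441.tar.bz2 into its parts."""
    stem = filename.removesuffix(".tar.bz2")
    _, _, machine, stamp = stem.split("_")
    return {
        "machine": machine,  # machine type/model plus serial, e.g. 2810A14...
        "taken_at": datetime.strptime(stamp, "%Y-%m-%d-%H%M"),
    }

if __name__ == "__main__":
    print(pick_collector("10.2"))
    print(parse_xray_name("system_xray_2810A147801224_2009-12-04-1441.tar.bz2"))
```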
Starting with XIV GUI version 2.4.1, the XRAY can also be collected through the GUI. Log in to the GUI as the technician user (password: teChn1cian) and select the XIV system from which the XRAY is to be collected. From the menu, choose "Help" -> "Support Logs…", then select the XRAY file to download and the directory in which to save it.
VII. How to replace a single DDM via the GUI
1. Connect the laptop to the module 5 laptop port on the patch panel. DHCP normally assigns the laptop an address of the form 14.10.202.xx. If no address is assigned, manually set the laptop's address to 14.10.202.1. Use the "ping" command to check that the address 14.10.202.250 can be reached.
2. Open the XIV Storage Manager GUI on the laptop and log in with user name technician and password teChn1cian. (Note that these are case-sensitive.)
3. Once logged in, select the XIV system and the DDM to be replaced, right-click, and choose "Phase out -> Failed".
4. The total XIV capacity can now be seen to have dropped from 79302 GB to 78907 GB, and the system state has changed from Full Redundancy to Redistributing. This means the system is redistributing the data; how long this takes depends on how much data was on the failed DDM.
5. The DDM's state has now changed to "Failed", and its Functioning field shows "no".
6. Pull out the failed DDM and insert the new one. If you now select the DDM in the GUI, its Functioning state has changed to "yes".
7. Select the DDM again and right-click. Choose "Test". When the test completes, the DDM's state changes to "Ready".
8. Right-click the DDM once more and choose "Phase in". The system starts "Redistributing" again.
9. When the "Redistributing" completes, the system returns to "Full Redundancy", and the total capacity returns to 79302 GB. The DDM replacement is complete.
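The phase-out / test / phase-in workflow above amounts to a small state machine. A Python sketch of the transitions as described in this section; the state and action names are my own labels, not internal product states beyond those quoted:

```python
# Allowed transitions in the DDM-replacement workflow described above:
# phase out -> swap the drive -> test -> phase in -> redistribution completes.
TRANSITIONS = {
    ("OK", "phase_out"): "Failed",
    ("Failed", "replace_drive"): "Replaced",
    ("Replaced", "test"): "Ready",
    ("Ready", "phase_in"): "Redistributing",
    ("Redistributing", "redistribution_done"): "OK",
}

def apply(state, action):
    """Return the next DDM state, rejecting out-of-order service actions."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"action {action!r} not allowed in state {state!r}")

if __name__ == "__main__":
    state = "OK"
    for action in ("phase_out", "replace_drive", "test", "phase_in", "redistribution_done"):
        state = apply(state, action)
    print(state)  # back to "OK": full redundancy restored
```

Encoding the order this way makes the key point of the procedure explicit: "Phase in" is only valid after the replacement drive has passed "Test".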
VIII. How to replace a module (Interface Module or Data Module) via the GUI
1. Connect the laptop to the module 5 laptop port on the patch panel. DHCP normally assigns the laptop an address of the form 14.10.202.xx. If no address is assigned, manually set the laptop's address to 14.10.202.1. Use the "ping" command to check that the address 14.10.202.250 can be reached.
2. Open the XIV Storage Manager GUI on the laptop and log in with user name technician and password teChn1cian. (Note that these are case-sensitive.)
3. Once logged in, select the XIV system and the module to be replaced, right-click, and choose "Phase out -> Failed".
4. The total XIV capacity can now be seen to have dropped from 79302 GB to 73392 GB, and the system state has changed from Full Redundancy to Redistributing. This means the system is redistributing the data; how long this takes depends on how much data was on the failed module.
5. Wait for the phase-out to complete; in the GUI the module turns from orange to red. The module's state has now changed to "Failed".
6. Following the steps in the Service Guide, remove the failed module from the rack, insert the replacement module into the rack, and move the DDMs from the original module into the replacement module. Then select the module in the GUI and right-click. Choose "Test". The new module starts Initializing.
7. When the Initializing completes, the module's state changes to "Ready".
8. Right-click the module once more and choose "Phase in".