当前位置：文档之家› Oracle-OS-Watcher-工具-使用详解

Oracle-OS-Watcher-工具-使用详解

Oracle OS Watcher 工具使用详解Oracle OS Watcher 工具使用详解

一.OSWatcher 说明OSWatcher 是Oracle 提供的一个用于操作系统监控的工具包，分Windows 和Linux 2个版本。

Linux 下的下载参考：OS Watcher Black Box UserGuide [ID 301137.1]
Windows平台下载：OSWatcher For Windows (OSWFW) User Guide [ID 433472.1]

也可以从我的CSDN 下载：
Oracle OS Watcher Tool
https://www.doczj.com/doc/e19328052.html,/detail/tianlesoftware/4049989

OSWatcher hasbeen renamed to OSWatcher Black Box to avoid confusion as there are many toolsin support with this same name. This version is not to be confused with theversion of OSWatcher that is shipped with Exadata.
--OSWatcher 在4.0 版本被重命名为OSWatcherBlack Box,已避免造成与同名工具的疑惑。

New in thisrelease (4.0.0) is a built-in analyzer which analyzes the data OSWbb collectsand provides information on system slowdowns, hangs and other OS performanceproblems.
--在最新的4.0.0 版本，添加了收集分析数据的功能，在系统slowdown，hang 或者其他性能问题时会提供相关的分析数据。

OS Watcher BlackBox Analyzer (OSWbba) is a graphing and analysis utility which comes bundledwithOSWbb v4.0.0and higher. OSWbba allows the userto graphically display data collected, generate reports containing these graphsand provides a built in analyzer to analyze the data and provide details on anyperformance problems it detects. The ability to graph and analyze thisinformation relieves the user of manually inspecting all the files.
-- OS Watcher Black Box Analyzer (OSWbba) 是一个绘图和分析工具，其捆绑在 OS Watcher Black Box（OSWbb）4.0中。

NOTE:OSWbbareplaces the utility OSWg. This was done to eliminate the confusion caused byhaving multiple tools in support named OSWatcher. OSWbba is only supported fordata collected by OSWbb and no other tool.
--OSWbba 替代了OSWg工具，已避免其和OSWatcher 工具的疑惑。OSWbba 仅仅用来支持OSWbb的数据收集，不做其他用途。

以上的说明感觉有点复杂，简单点说：
在OSW 4.0 之前是：OSWatcher 和 OSWg的关系。
OSW 4.0 后变成了： OSWbb 与 OSWbba 的关系。

这样避免造成名称上的疑惑。OSWbb收集数据，OSWbba 分析数据。

还有一个类似的工具OracleRDA:
OracleRDA(Remote Diagnostic Agent) 工具说明
https://www.doczj.com/doc/e19328052.html,/tianlesoftware/article/details/6758522

二．OS Watcher Black Box 安装配置MOS：OS Watcher Black Box User Guide [ID 301137.1]

2.1 OSWbb 说明 OS Watcher BlackBox (OSWbb) is a collection of UNIX shell scripts intended to collect andarchive operating system and network metrics to aid support in diagnosingperformance issues. OSWbb operates as a set of background processes on theserver and gathers OS data on a regular basis, invoking such Unix utilities asvmstat, netstat and iostat. OSWbb can be downloaded from this note. OSWbb isalso inclu

ded in the RAC-DDT script file, but is not installed by RAC-DDT. Formore information on RAC-DDT see <>. OSWbb is installed on each node wheredata is to be collected. Installation instructions for OSWbb are provided inthis user guide.
--OSWbb 是UNIX 脚本的集合，其用来收集和归档数据，从来来帮助定位问题。 OSWbb 操作可以设置为一个后台进程，然后规则的收集数据，其调用Unix 的工具，如vmstat，netstat和iostat。
OSWbb 包含了RAC-DDT脚本，但RAC-DDT 不包含OSWbb。

OSWbb consistsof a series of shell scripts. OSWatcher.sh is the main controlling executive,which spawns individual shell processes to collect specific kinds of data,using Unix operating system diagnostic utilities. Control is passed toindividually spawned operating system data collector processes, which in turncollect specific data, timestamp the data output, and append the data topre-generated and named files. Each data collector will have its own file,created and named by the File Manager process.
--OSWbb 包含一系列的shell 脚本。 OSWwaterch.sh 是总控制，其可以生成独立的shell 进程来收集不同的数据。
每个收集的信息都有自己独立的文件，文件名有时间戳。

Data collectionintervals are configurable by the user, but will be uniform for all datacollector processes for a single instance of the OSWbb tool. For example, ifOSWbb is configured to collect data once per minute, each spawned datacollector process will generate output for its respective metric, write data toits corresponding data file, then sleep for one minute (or other configuredinterval) and repeat. Because we are collecting data every minute, the filesgenerated by each spawned processes will contain 60 entries, one for eachminute during the previous hour. Each file will contain, at most, one hour ofdata. At the end of each hour, File Manager will wake up and copy the existingcurrent hour file to an archive location, then create a new current hour file.
--数据收集的间隔由用户配置，但对一个OSWbb 实例来说，其所有的收集进程的间隔时间是一样的。

The File Managerensures only the lastNhours of information are retained,whereNis a configurable integer defaulting to 48. File Manager willwake up once per hour to delete files older thanNhours. At anytime, the entire output file set will consist of one current hour file,plusNarchive files for each data collector process.
stopOSWbb.sh will terminate all processesassociated with OSWbb, and is the normal, graceful mechanism for stopping thetool's operation.
--File Manager 用来控制日志文件只保留最后N个小时的信息，这个N由用户配置，默认是48小时。File Manager 每隔一小时被唤醒一次，用来删除超过N小时的日志。

OSWbb invokesthese distinct operating system utilities, each as a distinct backgroundprocess, as data collectors. These utilities will be supported, or thei

requivalents, as available for each supported target platform.
--OSWbb 直接调用系统命令来收集信息，每个收集的信息都对应一个后台进程，这些命令包括:
(1)ps
(2)top
(3)mpstat
(4)iostat
(5)netstat
(6)traceroute
(7)vmstat

2.2 Supported PlatformsOSWbb is certified to run on the followingplatforms:
--OSWbb 支持如下平台：
(1)AIX
(2)Tru64
(3)Solaris
(4)HP-UX
(5)Linux

2.3 Gathering DiagnosticData2.3.1 Installing OSWbbOSWbb needs tobe installed on each node, one installation per node. OSWbb should be installedmanually by using the following procedure:
--OSWbb 需要在每个节点上安装。从MOS上下载的OSWbb 是tar 文件，使用如下命令对tar 文件进行解压缩，就会得到一个OSWbb的文件夹。
[root@rac1 u01]#tar xvfoswbb.tar

oswbb 文件夹包含了所有需要的文件。解压缩的过程就是OSWbb的过程，也就是说，OSWbb 不需要安装，直接解压缩即可。

2.3.2 Uninstalling OSWbbTo de-installOSWbb issue the following command on the oswbb directory.
--卸载OSWbb，使用rm 命令移除整个文件夹即可，命令如下：
[root@rac1 u01]#rm -rf oswbb

2.3.3 Setting up OSWbb Once OSWbb isinstalled, scripts have been provided to start and stop the OSWbb utility. WhenOSWbb is started for the first time it creates the archive subdirectory. Thearchive directory contains 7 subdirectories, one for each data collector. Datacollectors exist for top, vmstat, iostat, mpstat, netstat, ps and an optionalcollector for tracing private networks. To turn on data collection for privatenetworks the user must create an executable file in the oswbb directory https://www.doczj.com/doc/e19328052.html,. An example of what this file should look like is named https://www.doczj.com/doc/e19328052.html, with samples for each operating system: solaris, linux, aix, hp,etc. in the oswbb directory. This file can be edited and renamed https://www.doczj.com/doc/e19328052.html, ora new file named https://www.doczj.com/doc/e19328052.html, can be created. This file contains entries forrunning the traceroute command to verify RAC private networks.
--当OSWbb 安装完成之后，就可以使用start 和stop 脚本，在OSWbb第一次使用时，它会创建一些归档的子目录。这些归档目录包含7个子目录，每一个子目录对应一个收集数据。这7个目录分别对应：top, vmstat, iostat, mpstat, netstat, ps 和一个可选的traceprivate network。
要启动private network，必须先在oswbb目录下创建一个https://www.doczj.com/doc/e19328052.html,的可执行文件。这个文件里的内容可以是用来验证RAC private network的traceroute命令。

下面是Solaris平台下https://www.doczj.com/doc/e19328052.html,示例：
Example https://www.doczj.com/doc/e19328052.html, entry on Solaris:
traceroute -r -F node1traceroute -r -F node2

Where node1 andnode2 are 2 nodes in addition to the hostnode of a 3 node RAC cluster. If thefile https://www.doczj.com/doc/e19328052.html, does not exist or is not executable then no data will becollected and stored under the oswprvtnet directory.

OSWbb will needaccess to the OS utilities:top, vmstat, iostat, mpst

at,netstat,andtraceroute.These OS utilities need to be installon the system prior to running OSWbb. Execute permission on theseutilities need to be granted to the user of OSWbb.
--OSWbb 需要访问OS 命令，这些OS 命令需要在运行OSWbb之前安装好。

2.3.4 Starting OSWbb To start theOSWbb utility execute the startOSWbb.sh shell script from the directory whereOSWbb was installed. This script has 2 arguments which control the frequencythat data is collected and the number of hour's worth of data to archive.
--启动OSWbb 功能用startOSWbb.sh 脚本。这个脚本有2个脚本，其用来控制数据收集的频率和归档数据保留的时间。
ARG1 = snapshotinterval in seconds.ARG2 = the number of hours of archive data to store.

If you do notenter any arguments the script runs with default values of 30 and 48 meaningcollect data every 30 seconds and store the last 48 hours of data in archive files.
--如果没有在启动时没有指定这2个参数，那么默认情况是30秒收集一次，归档数据保留48个小时。

--示例一
Example 1:
./startOSWbb.sh 60 10

This would startthe tool and collect data at 60 second intervals and log the last 10 hours ofdata to archive files.
--这个命令每隔60秒收集一次，数据保留10个小时。

Example 2:
./startOSWbb.sh
NOTE: This woulduse the default values of 30, 48 and collect data at 30 second intervals andlog the last 48 hours of data to archive files.
--没有指定参数，使用默认值

Example 3:
nohup ./startOSWbb.sh 60 10 &
This would startthe tool, put the process in the background, enable to the tool to continuerunning after the session has been terminated, collect data at 60 secondintervals, and log the last 10 hours of data to archive files.
--使用nohup让脚本后台执行。更多内容，参考我的Blog：
Linux 前台和后台进程说明
https://www.doczj.com/doc/e19328052.html,/tianlesoftware/article/details/6165753

2.3.5 Stopping OSWbbTo stop theOSWbb utility execute the stopOSWbb.sh command from the directory where OSWbbwas installed. This terminates all the processes associated with the tool.
--停止OSWbb，使用stopOSWbb.sh 脚本即可。这个命令将终止所有相关的进程。

Example:
./stopOSWbb.sh

2.4 Diagnostic Data Output--OSWbb 数据内容说明
As stated above,when OSWbb is started for the first time it creates the archive subdirectoryunder the OSWbb installation directory. The archive directory contains 7subdirectories, one for each data collector. These directories are namedoswiostat, oswmpstat, oswnetstat, oswprvtnet, oswps, oswtop, and oswvmstat. Onefile per hour will be generated in each of the 7 OS utility subdirectories withthe exception of oswprvtnet which is dependent on having private networkstracing configured. A new file is created at the top of each hour during thetime that OSWbb is running. The file will be in the following format:
--在第一次运行OSWbb时，会在OSWbb安

装目录下创建7个子目录，分别对应7个不同的收集信息。这7个目录是：oswiostat, oswmpstat, oswnetstat, oswprvtnet,oswps, oswtop, and oswvmstat。在7个目录中，每个一小时生成一个归档文件，这里除了private networks，因为其启动与否决定相关参数是否配置。每个文件名的格式如下：
<node_name>_<OS_utility>_YY.MM.DD.HH24.dat

rac1:/u01/oswbb> cd archive
rac1:/u01/oswbb/archive> ls
oswiostatoswmeminfo oswmpstat oswnetstatoswprvtnet oswps oswslabinfooswtop oswvmstat
rac1:/u01/oswbb/archive> ll
total 36
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswiostat
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswmeminfo
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswmpstat
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswnetstat
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswprvtnet
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswps
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswslabinfo
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswtop
drwxr-xr-x. 2 oracle oinstall 4096 Mar 3 21:04 oswvmstat
rac1:/u01/oswbb/archive> cd oswiostat/
rac1:/u01/oswbb/archive/oswiostat> ls
rac1_iostat_12.03.03.2100.dat

Details about each type of data file can beviewed by clicking on the below links:
oswiostatoswmpstatoswnetstatoswprvtnetoswpsoswtoposwvmstat

2.4.1 oswiostat<node_name>_iostat_YY.MM.DD:HH24.dat

These files willcontain output from the 'iostat' command that is obtained and archive byOSWatcher Black Box at specified intervals. These files will only existif 'iostat' is installed on the OS and if the OSWbb user has privileges to runthe utility.

The iostatcommand is used for monitoring system input/output device loading by observingthe time the physical disks are active in relation to their average transfer rates.This information can be used to change system configuration to better balancethe input/output load between physical disks and adapters.
--iostat 命令可以监控系统的I/O.

The iostatutility is fairly standard across UNIX platforms, but really on useful for thoseplatforms that support extended disk statistics: AIX, Solaris and Linux. Alsoeach platform will have a slightly different version of the iostat utility. Youshould consult your operating system man pages for specifics. The sampleprovided below is for Solaris.

OSWbb runs theiostat utility at the specified interval and stores the data in the oswiostatsubdirectory under the archive directory. The data is stored in hourly archivefiles. Each entry in the file contains a timestamp prefixed by *** embedded inthe iostat output. Notice there are 3 entries for each timestamp. You shouldalways ignore the first entry as this entry is always invalid. The second andthird entry will be valid but the second entry will be 1 sec later than thetimestamp and the third entry will be 2 seconds later than the timestamp.

Sample iostat file produced by OSWbb

extended device statistics
r/s
w/s
kr/s
kw/s
wait
actv
wsvc_t
asvc_t
%w
%b
device
0.0
0.3
0.0
2.1
0.0
0.0
3.4
0.8
0
0
c0t0d0
0.0
2.1
0.1
12.9
0.0
0.0
0.6
0.4
0
0
c0t2d0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0
0
fd0
2.9
1.2
240.8
1.5
0.0
0.1
0.0
13.3
0
5
c1t0d0
1.1
0.8
18.0
8.8
0.0
0.0
0.1
5.9
0
1
c1t1d0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0
0
c0t1d0

2.4.1.1 Field Descriptions –字段描述The iostat output contains summaryinformation for all devices.
Field
Description
r/s
Shows the number of reads/second
w/s
Shows the number of writes/second
kr/s
Shows the number of kilobytes read/second
kw/s
Shows the number of kilobytes written/second
wait
Average number of transactions waiting for service (queue length)
actv
Average number of transactions actively being serviced
wsvc_t
Average service time in wait queue, in milliseconds
asvc_t
Average service time of active transactions, in milliseconds
%w
Percent of time there are transactions waiting for service
%b
Percent of time the disk is busy
device
Device name

2.4.1.2 What to look for – 关注的内容(1)Average service times greaterthan 20msec for long duration.
(2)High average wait times.

2.4.2 oswmpstat<node_name>_mpstat_YY.MM.DD:HH24.dat
These files willcontain output from the 'mpstat' command that is obtained and archive byOSWatcher Black Box at specified intervals. These files will only existif 'mpstat' is installed on the OS and if the OSWbb user has privileges to runthe utility.
The mpstat command collects and displays performancestatistics for all logical CPUs in the system.
--mpstat 命令收集和显示所有逻辑CPU的性能统计信息。

The mpstatutility is fairly standard across UNIX platforms. Each platform will have aslightly different version of the mpstat utility. You should consult youroperating system man pages for specifics. The sample provided below is forSolaris.
--每个平台都有不同版本的mpstat命令。

OSWbb runs thempstat utility at the specified interval and stores the data in the oswmpstatsubdirectory under the archive directory. The data is stored in hourly archivefiles. Each entry in the file contains a timestamp prefixed by *** embedded inthe mpstat output. Notice there are 3 entries for each timestamp. You shouldalways ignore the first entry as this entry is always invalid. The second andthird entry will be valid but the second entry will be 1 sec later than thetimestamp and the third

entry will be 2 seconds later than the timestamp.

Sample mpstat file produced by OSWbb
***Fri Jan 28 12:50:36 EST 2005
CPU
minf
mjf
xcal
intr
ithr
csw
icsw
migr
smtx
srw
syscl
usr
sys
wt
idl
0
0
0
0
483
383
118
1
0
0
0
64
0
0
0
100
0
1268
0
0
486
382
414
42
0
0
0
2902
8
24
0
68
0
4
0
0
479
379
144
3
0
0
0
96
0
0
0
100

2.4.2.1 Field Descriptions Field
Description
cpu
Processor ID
minf
Minor faults
mif
Major Faults
xcal
Processor cross-calls (when one CPU wakes up another by interrupting it).
intr
Interrupts
ithr
Interrupts as threads (except clock)
csw
Context switches
icsw
Involuntary context switches
migr
Thread migrations to another processor
smtx
Number of times a CPU failed to obtain a mutex
srw
Number of times a CPU failed to obtain a read/write lock on the first try
syscl
Number of system calls
usr
Percentage of CPU cycles spent on user processes
sys
Percentage of CPU cycles spent on system processes
wt
Percentage of CPU cycles spent waiting on event
idl
Percentage of unused CPU cycles or idle time when the CPU is basically doing nothing

2.4.2.2 What to look for (1)Involuntary context switches(this is probably the more relevant statistic when examining performanceissues.)
(2)Number of times a CPU failed toobtain a mutex. Values consistently greater than 200 per CPU causes system timeto increase.
(3)xcal is very important, showprocessor migration

2.4.3 oswnetstat<node_name>_netstat_YY.MM.DD:HH24.dat

These files willcontain output from the 'netstat' command that is obtained and archive byOSWatcher Black Box at specified intervals. These files will only existif 'netstat' is installed on the OS and if the OSWbb user has privileges to runthe utility.

Thenetstatcommanddisplays current TCP/IP network connections and protocol statistics.
--netstat 命令显示当前网络连接和协议的相关统计信息。

The netstatutility is standard across UNIX platforms. Each platform will have a slightlydifferent version of the netstat utility. You should consult your operatingsystem man pages for specifics. The sample provided below is for Solaris.

OSWbb runs thenetstat utility at the specified interval and stores the data in the oswnetstatsubdirectory under the archive directory. The data is stored in hourly archivefiles. Each entry in the file contains a timestamp prefixed by *** embedded inthe netstat output. Notice there are 3 entries for each timestamp. You shouldalways ignore the first entry as thi

s entry is always invalid. The second andthird entry will be valid but the second entry will be 1 sec later than thetimestamp and the third entry will be 2 seconds later than the timestamp.

The netstatutility has many command line flags, and the most commonly used to troubleshootRAC is "ia(n)" for the interface level output and "s" forthe protocol level statistics. The following are examples for the two differentcommand parameters.
--netstat命令有写命令标记，常用来解决RAC 问题的是 ian 三个。其含义如下：

The command line options "-ain"have these effects:
Option
Description
-a
The command output will use the logical names of the interface. It will also report the name of the IP address found through normal IP address resolution methods.
-i
This triggers the Interface specific statistics, the columns of which are outlined in table [bla-KR]
-n
This causes the output to use IP addresses instead of the resolved names

Example netstat file produced by OSWbb:
--输出示例，这里省略了大部分内容
Sample netstat file produced by OSWbb
***Fri Jan 28 12:50:36 EST 2005
Name
Mtu
Net/Dest
Address
Ipkts
Ierrs
Opkts
Oerrs
Collis
Queue
lo0
8232
127.0.0.0
127.0.0.1
296065
0
296065
0
0
0
eri0
1500
138.1.140.0
138.1.140.96

0
176244
2
191951
0

RAWIP

rawipInDatagrams
=
0

rawipInErrors
=
0

rawipInCksumErrs
=
0

rawipOutDatagrams
=
0

rawipOutErrors
=
0

UDP

udpInDatagrams
=
295719

udpInErrors
=
0

udpOutDatagrams
=
295671

udpOutErrors
=
0
TCP

tcpRtoAlgorithm
=
4

tcpRtoMin
=
400

tcpRtoMax
=
60000

tcpMaxConn
=
-1

IPv4
….

ipForwarding
=
2

ipDefaultTTL
=
255

ipInReceives
=
17858585

ipInHdrErrors
=
0
ICMPv4

icmpInMsgs
=
17624914

icmpInErrors
=
0

icmpInCksumErrs
=
0

icmpInUnknowns
=
0

icmpInDestUnreachs
=
72

icmpInTimeExcds
=
0

IGMP:

2490

messages received

0

messages received with too few bytes

0

messages received with bad checksum

2490

membership queries received

0

membership queries received with invalid field(s)

0

membership reports received

0

membership reports received with invalid field(s)

0

membership reports received for groups to which we belong

0

membership reports sent

2.4.3.1 Field Descriptions: The netstatoutput produced by OSWbb contains 2 sections. The first section containsinformation about all the network interfaces. The second section containsinformation about per-protocol statistics.
--netstat 输出包含两种类型，第一种包含所有network interfaces 的信息，第二种包含每个protocal 的统计信息。

（1）Section 1: Netstat -ain
Field
Description
name
Device name of interface
Mtu
Maximum transmission unit
Net
Network Segment Address
address
Network address of the device
ipkts
Input packets
Ierrs
Input errors
opkts
Output Packets
Oerrs
Output errors
collis
Collisions
queue
Number in the Queue

（2）Section 2: Protocol Statistics
The per-protocol statistics can be dividedinto several categories:
--per-protocal 统计信息可以分成如下几类：
(1)RAWIP (raw IP) packets
(2)TCP packets
(3)IPv4 packets
(4)ICMPv4 packets
(5)IPv6 packets
(6)ICMPv6 packets
(7)UDP packets
(8)IGMP packet

Each protocoltype has a specific set of measures associated with it. Network analysisrequires evaluation of these measurements on an individual level and alltogether to examine the overall health of the network communications.
The TCP protocolis used the most in Oracle database and applications. Some implementations forRAC use UDP for the interconnect protocol instead of TCP. The statistics cannotbe divided up on a per-interface basis, so these should be compared to the"-i" statistics above.

2.4.3.1 What to look for: --注意内容（1）Section 1
The informationin Section 1 will help diagnose network problems when there is connectivity butresponse is slow.
--这种类型的数据可以用来诊断网络连接正常，但是

反应慢的情况：

Values to look at:
(1)Collisions (Collis)
(2)Output packets (Opkts)
(3)Input errors (Ierrs)
(4)Input packets (Ipkts)

The above values will give information toworkout network collision rates as follows:
1）Network collision rate = Output collision / Output packets
For a switchednetwork, the collisions should be 0.1 percent or less (see theCisco web siteas a reference) of the output packets.Excessive collisions could lead to the switch port the interface is pluggedinto to segment, or pull itself off-line, amongst other switch-related issues.

2）For the input error statistics:
Input Error Rate = Ierrs / Ipkts.
If the inputerror rate is high (over 0.25 percent), the host is excessively droppingpackets. This could mean there is a mismatch of the duplex or speedsettings of the interface card and switch. It could also imply a failedpatch cable.
If ierrs oroerrs show an excessive amount of errors, more information can be found byexamination of the netstat -s output.
For Sun systems,further information about a specific interface can be found by using the"-k" option for netstat. The output will give fuller statistics forthe device, but this option is not mentioned in the netstat man page. Moreinformation can be found athttps://www.doczj.com/doc/e19328052.html,.

（2）Section 2
The informationin Section 2 contains the protocol statistics.
Many performanceproblems associated with the network involve the retransmission of the TCPpackets. For retransmission rate calculationsclick here.

To find the segment retransmission rate:
%segment-retrans=(tcpRetransSegs /tcpOutDataSegs) * 100

To find the byte retransmission rate:
%byte-retrans = ( tcpRetransBytes /tcpOutDataBytes ) * 100

Most networkanalyzers report TCP retransmissions as segments (frames) and not in bytes.

2.4.4 oswprvtnet<node_name>_prvtnet_YY.MM.DD:HH24.dat

These files willcontain output from the 'prvtnet' command that is obtained and archived byOSWatcher Black Box at specified intervals. These files will only existif 'prvtnet' is installed on the OS and if the OSWbb user has privileges to runthe utility.
--这个文件包含prvtnet 命令收集的信息。

Informationabout the status of RAC private networks should be collected. This requires theuser to manually add entries for these private networks into the https://www.doczj.com/doc/e19328052.html,file located in the base oswbb directory. Instructions on how to do this arecontained in the README file.
--如果是RAC private network的信息，那么需要在https://www.doczj.com/doc/e19328052.html, 文件里添加相关的内容。这个前面有示例。

OSWbb uses thetraceroute command to obtain the status of these private networks. Eachoperating system uses slightly different arguments to the traceroute command.Examples of the syntax to use for each operating system are contained in the sampleExample https://www.doczj.com/doc/e19328052.html, file located in the base oswbb directory. This will resultin the output appearing differently across UNIX platforms. OSWbb

runs https://www.doczj.com/doc/e19328052.html, file at the specified interval and stores the data in theoswprvtnet subdirectory under the archive directory. The data is stored inhourly archive files. Each entry in the file contains a timestamp prefixed by*** embedded in the top output.

Sample file produced by OSWbb
***Fri Jan 28 12:50:36 EST 2005
traceroute to https://www.doczj.com/doc/e19328052.html, (138.2.71.112): 1-30 hops (initial packetsize = 1500) https://www.doczj.com/doc/e19328052.html, (138.2.71.112)1.95ms 2.92 ms1.95 ms

2.4.4.1 What to Look For
Example 1: Interface is up andresponding:
traceroute toX.X.X.X, (X.X.X.X) 30 hops max, 1492 byte packets 1 X.X.X.X 1.015 ms 0.766 ms 0.755 ms

Example 2: Target interface is not ona directly connected network, so validate that the address is correct or theswitch it is plugged in is on the same VLAN (or other issue):
traceroute to X.X.X.X, (X.X.X.X) 30 hopsmax, 40 byte packetstraceroute: host X.X.X.X is not on a directly-attached network

Example 3: Network is unreachable:
traceroute to X.X.X.X, (X.X.X.X) 30 hopsmax, 40 byte packetsNetwork is unreachable

2.4.5 oswps<node_name>_ps_YY.MM.DD:HH24.dat
These files willcontain output from the 'ps' command that is obtained and archive by OSWatcherBlack Box at specified intervals. These files will only exist if 'ps' isinstalled on the OS and if the OSWbb user has privileges to run the utility.
--这个文件包含ps命令的输出信息。

The ps (processstate) command list all the processes currently running on the system andprovides information about CPU consumption, process state, priority of theprocess, etc. The ps command has a number of options to control which processesare displayed, and how the output is formatted. OSWbb runs the ps command withthe -elf option.

The ps commandis fairly standard across UNIX platforms Each platform will have a slightlydifferent version of the ps utility. You should consult your operating systemman pages for specifics. The sample provided below is for Solaris.

OSWbb runs theps command at the specified interval and stores the data in the oswps subdirectoryunder the archive directory. The data is stored in hourly archive files. Eachentry in the file contains a timestamp prefixed by *** embedded in the psoutput.

Sample ps file produced by OSWbb
***Wed Feb 2 09:26:54 EST 2005
F
S
UID
PID
PPID
C
PRI
NI
ADDR
SZ
WCHAN
STIME
TTY
TIME
CMD
19
T
root
0
0
0
0
SY
?
0

Jan 31
?
0:13
sched
8
S
root
1
0
0
41
20
?
107
?
Jan 31
?
0:00
/etc
19
S
root
2
0
0
0
SY
?
0
?
Jan 31
?
0:00
page
19
S
root
3
0
0
0
SY
?

0
?
Jan 31
?
0:50
fsflu
8
S
root
355
1
0
41
20
?
232
?
Jan 31
?
0:00
/usr/
8
S
root
297
296
0
41
20
?
379
?
Jan 31
?
0:00
htt_s
8
S
cedavis
391
381
0
89
20
?
301
?
Jan 31
?
0:00
/usr/

2.4.5.1 Field Descriptions Field
Description
f
Flags s State of the process
uid
The effective user ID number of the process
pid
The process ID of the process
ppid
The process ID of the parent process.
d
Processor utilization for scheduling (obsolete).
pri
The priority of the process.
ni
Nice value, used in priority computation.
addr
The memory address of the process.
sz
The total size of the process in virtual memory, including all mapped files and devices, in pages.
wchan
The address of an event for which the process is sleeping (if blank, the process is running).
stime
The starting time of the process, given in hours, minutes, and seconds.
tty
The controlling terminal for the process (the message ?, is printed when there is no controlling terminal).
time
The cumulative execution time for the process.
cmd
The command name process is executing.

2.4.5.2 What to look for The informationin the ps command will primarily be used as supporting information for RACdiagnostics. If for example, the status of a process prior to a system crashmay be important for root cause analysis. The amount of memory a process isconsuming is another example of how this data can be used.

2.4.6 oswtop<node_name>_top_YY.MM.DD:HH24.dat

These files willcontain output from the 'top' command that is obtained and archive by OSWatcherat specified intervals. These files will only exist if 'top' is installedon the OS and if the OSWbb user has privileges to run the utility.
--这个文件包含top命令的信息。

Top is a programthat will give continual reports about the state of the system, including a listof the top CPU using processes. Top has three primary design goals:
(1)provide an accurate snapshot ofthe system and process state,
(2)not be one of the top processesitself,
(3)be as portable as possible.

Each operatingsystem uses a different version of the UNIX utility top. This will result inthe top output appearing differently across UNIX platforms. You should consultyour operating system man pages for specifics. The sample provided below is forSolaris.

OSWbb runs thetop utility at the specified interval and stores the data in the oswtopsubdirectory under the archive directory. The data is stored in hourly archivefiles. Each entry in the file contains a timestamp prefixed by *** embedded inthe top output.

Sample top file produced by OSWbb

***Fri Jan 28 12:50:36 EST 2005 load averages: 0.11, 0.07, 0.06 12:50:36 136 processes: 133 sleeping, 2 running, 1 on cpu Memory: 2048M real, 1061M free, 542M swap in use, 1605M swap free
PID
USERNAME
THR
PRI
NICE
SIZE
RES
STATE
TIME
CPU
COMMAND
704
cedavis
16
49
0
346M
276M
sleep
222:33
3.51%
java
362
root
1
59
0
34M
75M
sleep
11:49
0.21%
Xsun
20675
cedavis
1
0
0
1584K
1064K
cpu
0:00
19%
top
20640
cedavis
1
0
0
1904K
1240K
sleep
0:00
0.14%
OSWatcher.sh
20657
cedavis
1
20
0
1904K
1240K
sleep
0:00
0.14%
oswsub.sh
16881
cedavis
1
59
0
199M
159K
sleep
23:04
0.10%
oracle
20671
cedavis
1
0
0
1904K
1240K
run
0:00
0.09%
oswsub.sh
20653
cedavis
1
0
0
1904K
1240K
sleep
0:00
0.09%
OSWatcherFM.sh
20665
cedavis
1
0
0
1904K
1240K
sleep
0:00
0.09%
oswsub.sh
20672
cedavis
1
0
0
1264K
1031K
sleep
0:00
0.09%
iostat
20659
cedavis
1
10
0
1904K
1240K
sleep
0:00
0.09%
oswsub.sh
20661
cedavis
1
30
0
1096K
880K
sleep
0:00
0.09%
vmstat
20668
cedavis
1
0
0
1904K
1240K
run
0:00
0.05%
oswsub.sh
20674
cedavis
1
0
0
968K
624K
sleep
0:00
0.05%
sleep
20663
cedavis
1
20
0
1080K
864K
sleep
0:00
0.05%
mpstat

2.4.6.1 Field Descriptions（1）load averages: 0.11, 0.07, 0.0612:50:36
This linedisplays the load averages over the last 1, 5 and 15 minutes as well as thesystem time. This is quite handy as top basically includes a timestamp alongwith the data capture.
Load average isdefined as the average number of processes in the run queue. A runnable Unixprocess is one that is available right now to consume CPU resources and is notblocked on I/O or on a system call. The higher the load average, the more workyour machine is doing.
The threenumbers are the average of the depth of the run queue over the last 1, 5, and15 minutes. In this example we can see that .11 processes were on the run queueon average over the last minute, .07 processes on average on the run queue overthe last 5 minutes, etc. It is important to determine what the average load ofthe system is through benc

hmarking and then look for deviations. A dramaticrise in the load average can indicate a serious performance problem.

（2）136 processes: 133 sleeping, 2running, 1 on cpu
This linedisplays the total number of processes running at the time of the last update.It also indicates how many Unix processes exist, how many are sleeping (blockedon I/O or a system call), how many are stopped (someone in a shell hassuspended it), and how many are actually assigned to a CPU. This last numberwill not be greater than the number of processors on the machine, and the valueshould also correlate to the machine's load average provided the load averageis less than the number of CPUs. Like load average, the total number ofprocesses on a healthy machine usually varies just a small amount over time.Suddenly having a significantly larger or smaller number of processes could bea warning sign.

（3）Memory: 2048M real, 1061M free,542M swap in use, 1605M swap free
The"Memory:" line is very important. It reflects how much real and swapmemory a computer has, and how much is free. "Real" memory is theamount of RAM installed in the system, a.k.a. the "physical" memory."Swap" is virtual memory stored on the machine's disk.
Once a computerruns out of physical memory, and starts using swap space, its performancedeteriorates dramatically. If you run out of swap, you'll likely crash yourprograms or the OS.

（4）Individual process fields
Field
Description
PID
Process ID of process
USERNAME
Username of process
THR
Process thread PRI Priority of process
NICE
Nice value of process
SIZE
Total size of a process, including code and data, plus the stack space in kilobytes
RES
Amount of physical memory used by the process
STATE
Current CPU state of process. The states can be S for sleeping, D for uninterrupted, R for running, T for stopped/traced, and Z for zombied
TIME
The CPU time that a process has used since it started
%CPU
The CPU time that a process has used since the last update
COMMAND
The task's command name

2.4.6.2 What to Look For(1)Large run queue. Large numberof processes waiting in the run queue may be an indication that your systemdoes not have sufficient CPU capacity.
(2)Process consuming lots of CPU.A process which is "hogging" CPU is always suspect. If this processis an oracle foreground process it's most likely running an expensive querythat should be tuned. Oracle background process should not hog CPU for longperiods of time.
(3)High load averages. Processesshould not be backed up on the run queue for extended periods of time.
(4)Low swap space. This is anindication you are running low on memory.

2.4.7 oswvmstat<node_name>_vmstat_YY.MM.DD:HH24.dat
These files willcontain output from the 'vmstat' command that is obtained and archive byOSWatcher Black Box at specified intervals. These files w

ill only existif 'vmstat' is installed on the OS and if the OSWbb user has privileges to runthe utility.
--这个文件包含vmstat 命令的内容。

The name vmstatcomes from "report virtual memory statistics". The vmstatutility does a bit more than this, though. In addition to reporting virtualmemory, vmstat reports certain kernel statistics about processes, disk, trap,and CPU activity.

The vmstatutility is fairly standard across UNIX platforms. Each platform will have aslightly different version of the vmstat utility. You should consult youroperating system man pages for specifics. The sample provided below is forSolaris.

OSWbb runs thevmstat utility at the specified interval and stores the data in the oswvmstatsubdirectory under the archive directory. The data is stored in hourly archivefiles. Each entry in the file contains a timestamp prefixed by *** embedded inthe vmstat output. Notice there are 3 entries for each timestamp. You shouldalways ignore the first entry as this entry is always invalid. The second andthird entry will be valid but the second entry will be 1 sec later than thetimestamp and the third entry will be 2 seconds later than the timestamp.

Sample vmstat file produced by OSWbb
***Fri Jan 28 12:50:36 EST 2005
procs
memory
page
disk
faults
cpu
r
b
w
swap
free
re
mf
pi
po
fr
de
sr
dd
f0
s0

in
sy
cs
us
sy
id
0
0
0
1761344
1246520
1
6
0
0
0
0
0
2
0
0
0
380
1364
900
4
1
95
0
0
0
1643920
1086776
331
1485
8
16
16
0
0
31
0
0
0
447
4966
1315
15
31
54
0
0
0
1643872
1086728
6
0
0
0
0
0
0
0
0
0
0
389
1472
932
0
0
100

2.4.7.1 Field DescriptionsThe vmstatoutput is actually broken up into six sections: procs, memory, page, disk,faults and CPU. Each section is outlined in the following table.

Field
Description
PROCS
r
Number of processes that are in a wait state and basically not doing anything but waiting to run
b
Number of processes that were in sleep mode and were interrupted since the last update
w
Number of processes that have been swapped out by mm and vm subsystems and have yet to run
MEMORY
swap
The amount of swap space currently available free The size of the free list
PAGE
re
page reclaims
mf
minor faults
pi
kilobytes paged in
po
kilobytes paged out
fr
kilobytes freed
de
anticipated short-term memory shortfall (Kbytes)
sr
pages scanned by clock algorithm

DISK
Bi
Disk blocks sent to disk devices in blocks per second
FAULTS
In
Interrupts per second, including the CPU clocks
Sy
System calls
Cs
Context switches per second within the kernel
CPU
Us
Percentage of CPU cycles spent on user processes
Sy
Percentage of CPU cycles spent on system processes
Id
Percentage of unused CPU cycles or idle time when the CPU is basically doing nothing

2.4.7.2 What to look for The followinginformation should be used as a guideline and not considered hard and fastrules. The information documented below comes from Adrian Cockcroft's book, SunPerformance Tuning. Other operating systems like HP and Linux may havedifferent thresholds.
(1)Large run queue. AdrianCockcroft defines anything over 4 processes per CPU on the run queue as thethreshold for CPU saturation. This is certainly a problem if this last for anylong period of time.
(2)CPU utilization. The amount oftime spent running system code should not exceed 30% especially if idle time isclose to 0%.
(3)A combination of large runqueue with no idle CPU is an indication the system has insufficient CPUcapacity.
(4)Memory bottlenecks aredetermined by the scan rate (sr) . The scan rate is the pages scanned by theclock algorithm per second. If the scan rate (sr) is continuously over 200pages per second then there is a memory shortage.
(5)Disk problems may be identifiedif the number of processes blocked exceeds the number of processes on runqueue.

三. 配置OS Watcher 自启动MOS：How To Start OSWatcher Black Box Every System Boot [ID 580513.1]

Oracle supportoften recommends that the OSWatcher Black Box(*) tool be run for an extendedperiod. Should the system reboot during this time, the systemadministrator must manually restart the OSWatcher, and allow it to run untilthe necessary data have been collected.
--OSW收集的信息越多，更有利用与系统的分析，所以我们可以设置OSW的自启动。

To automate thisprocedure, a simple shell script can be used. Care must be taken to avoidaccidentally overwriting the log data upon a restart. The scriptmust also ensure that the OSWatcher tool be run using the correct userprivileges.
--让OSW 自启动可以通过脚本来实现，但是要注意的问题就是要避免在故障启动后对原来日志的覆盖，因为这些数据对分析很重要，如果在OSW自动启
动时覆盖了这些历史数据，就不能帮助我们分析问题。

osw-service 包可以从MOS上下载，也可以从我的CSDN下载：
https://www.doczj.com/doc/e19328052.html,/detail/tianlesoftware/4109807

Theosw-service RPM package provides a script to run the OSWatcher at system boot,and to stop it down gracefully at system shutdown. It provides an"osw" service that can be controlled using the standardLinuxinit(1)script controls:
--osw-service RPM 包提供了脚本让系统重启时运行OSWaterch，并且在系统shutdown时gracefull

y的stop。这个包提供了一个osw的服务来控制linux init(1)脚本：

# /sbin/chkconfig osw on# /sbin/service osw start

Theosw-serviceRPMpackage is available as an attachment to this note. Download and installit as any other RPM package. A source RPM is provided for completeness.

[root@rac1 OS Watcher Tool]# rpm -ivhosw_service_0_0_2_1_noarch.rpm
Preparing...########################################### [100%]
1:osw-service########################################### [100%]

Before startingthe service, first change the settings inthe/etc/sysconfig/oswconfiguration file to fit your situation:
--安装好osw service 后，在启动之前，需要修改/etc/sysconfig/osw的配置，具体如下：

# Set OSWHOME to the directory where yourOSWatcher tools are installedOSWHOME=/u01/oswbb
# Set OSWINTERVAL to the number of secondsbetween collectionsOSWINTERVAL=60# Set OSRETENTION to the number of hours logs are to be retainedOSWRETENTION=1# Set OSUSER to the owner of the OSWHOME directoryOSWUSER=oracle

Once this is done, the command:--修改完毕就可以启动OSWatcher 自启动脚本：
# /sbin/service osw start

注意个问题：
[root@rac1 u01]# service osw start
Starting OSWatcher: bash: line 7:./startOSW.sh: No such file or directory
[FAILED]
因为OSWatcher 在4.0 以后做了修改，这里我们启动时报错，只需要将startOSWbb.sh 复制一份成startOSW.sh 就可以了。
rac1:/u01/oswbb> cp startOSWbb.sh startOSW.sh

will start the OSWatcher tool upon everyboot.
--之后每次系统重启，OSWatcher 都会自动启动。

The OSWatcherlogs will be stored in${OSWHOME}/archiveas normal. Whentheosw-serviceis started, anyprior${OSWHOME}/archivedirectory will be movedto${OSWHOME}/archive-<timestamp>first.
--OSWatcher 的log 存储在${OSWHOME}/archive目录下，当osw-service 启动时，任何之前的${OSWHOME}/archive 目录都会先被移到${OSWHOME}/archive-<timestamp>目录，然后启动，这样就避免了日志被覆盖的可能型。

四．OS Watcher Black Box Analyzer安装配置MOS：OS Watcher Black Box Analyzer User Guide [ID 461053.1]

我们用OSWatcher收集了数据存储到归档里，但是这些文件不利于分析，所以Oracle 提供了OSWbba工具，其可以分析OSWbb收集的数据并用图表展示出来。

OSWbba iswritten in java and requires as a minimum java version 1.4.2 or higher. OSWbbacan run on any Unix X Windows or PC Windows platform. An X Windows environmentis required because OSWbba uses Oracle Chartbuilder which requires it.
--OSWbba 是用java 写的，所以运行OSWbba 至少需要Java1.4.2 的版本。OSWbba 可以运行在任何平台下。

OSWbba parsesall the OSWbb vmstat, iostat and top utility log files contained in an archivedirectory. Once the data is parsed, the user is presented with a commandline menu which has options for both displaying graphs, creating binary giffiles of these graphs, generating an ht

ml report containing all the graphs withnarrative on what to look for, and new in this release, the ability toself-analyze the files OSWbb creates.
--OSWbb 通过vmstat，iostat等命令收集数据存放在归档目录里，OSWbba分析这些数据。数据分析之后，用户就可以通过命令行目录来提取这些数据，可以选择图表或者生成图形的gif 文件，亦或html报告。
也就是说，OSWbba 对OSWbb 收集的数据进行一个图形的展现。

OSWbba is certified to run on the followingplatforms:
--OSWbba 可以在一下平台运行：
(1)AIX
(2)Solaris
(3)HP-UX
(4)Linux
(5)Windows XP

2.1 Installing OSWbbaOSWbba requiresno installation. It comes shipped as a standalone java jar file with OSWbbv4.0.0 and higher.
--OSWbba 不需要安装，其是一个独立的java 包。

2.2 Starting OSWbba在启动OSWbba 工具之前，必须先安装java 1.4.2 或以上版本。当然如果安装过了Oracle，那么oracle 安装目录里也有java。
[root@rac1oswbb]# su - oracle
rac1:/home/oracle>java -version
java version"1.6.0_20"
OpenJDK RuntimeEnvironment (IcedTea6 1.9.7) (rhel-1.39.1.9.7.el6-x86_64)
OpenJDK 64-BitServer VM (build 19.0-b09, mixed mode)
--我这里安装的java 是1.6 的版本。

如果使用Oracle的Java，那么需要修改一下环境变量，在Path里添加Java的路径，如：
PATH=$ORACLE_HOME/jdk/bin:$PATH

rac1:/u02/app/oracle/product/11.2.0/db_1/jdk/bin>./java -version
java version "1.5.0_30"
Java(TM) 2 Runtime Environment, StandardEdition (build 1.5.0_30-b03)
Java HotSpot(TM) 64-Bit Server VM (build1.5.0_30-b03, mixed mode)
--我这里的oracle是11.2.0.3，其自带的java 版本是1.5.

运行OSWbba 需要用-i 参数指定input 目录, 这里的目录是OSWbb log归档的全路径。这个归档目录必须和OSWbb 的目录结构相同，其必须包含其他的子目录，如oswvmstat,oswiostat, oswps, oswtop, oswnetstat 等。

--注意这里显示图片需要条用X windows，所以我们要在图形窗口中执行：

[root@rac1 u02]# xhost +
access control disabled, clients canconnect from any host

然后执行如下命令：
rac1:/u01/oswbb> java -jar oswbba.jar -i/u01/oswbb/archive

Starting OSW Black Box Analyzer V4.0
OSWatcher Black Box Analyzer Written byOracle Center of Expertise
Copyright (c) 2012 by Oracle Corporation

Parsing Data. Please Wait...

Parsing file rac1_iostat_12.03.03.2200.dat...

Parsing file rac1_vmstat_12.03.03.2200.dat...

Parsing file rac1_top_12.03.03.2200.dat ...

Parsing Completed.

Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter 6 to Generate All CPU Gif Files
Enter 7 to Generate All Memory Gif Files
Enter 8 to Generate All Disk Gif Files

Enter L to Specify Alternate Location ofGif Directory
Enter T to

Specify Different Time Scale
Enter D to Return to Default Time Scale
Enter R to Remove Currently DisplayedGraphs
Enter P to Generate A Profile
Enter A to Analyze Data
Enter Q to Quit Program

Please Select an Option:2

这里按Q退出OSWbba。

相关分析的图形结果如下：

上面是在交互模式下进行，也可以使用命令行执行：
java -jar oswbba.jar -i <fully qualifiedpath name of an osw archive directory> -P <name> -L <name> -6 -7-8 -B <time> -E <time>
这里的参数，在上面有说明，6，7，8 是生成图片。

OSWbba parsesall the archive files in memory prior to generating graphs or performing ananalysis. If you have a large amount of files to parse you may need to allocatemore memory in the java heap. If you experience any error messages regardingout of memory such as https://www.doczj.com/doc/e19328052.html,ng.OutOfMemoryError, you may have to increase thesize of the java heap. To increase the size of the java heap use the -Xmx flag.
--OSWbba 解析所有的归档文件在内存中进行，然后生成图表，如果有大量的文件需要解析，可以指定java heap 大小。

$java -jar -Xmx512M OSWbba.jar -i /u01/oswbb/archive
Starting OSWbba V4.0.0OSWatcher Black Box Analyzer Written by Oracle Center of ExpertiseCopyright (c) 2012 by Oracle Corporation
Parsing Data. Please Wait...