3. Performance Tuning: CPU Performance Analysis (Part 3)



cpu: cpu number (only on a multi-processor system with the -M option);
%usr: user mode;
%sys: system mode;
%wio: idle with some process waiting for I/O (only block I/O, raw I/O, or VM pageins/swapins indicated);
%idle: otherwise idle;
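Analyzing the results

First, look at the %idle column. If it is close to zero, check the corresponding %wio column. If %wio is greater than 7, the system's disks or other I/O may be the problem, and further analysis is needed: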

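Use iostat to see how busy each disk is; for example, # iostat -t 5 2 samples every 5 seconds, 2 times in total;
Use sar -d to analyze the activity of each block device (disk, tape);
Use sar -b to analyze buffer cache activity;
Use sar -w to analyze process deactivation/reactivation and the switching activity of the system;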
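If %idle is very small and the corresponding %wio is also small, look at the %usr and %sys columns. A large %usr value means user processes are consuming a lot of CPU time; a large %sys value means the system is spending a lot of time in system mode. Further analysis is needed: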

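Use GlancePlus to examine the processes that consume the most CPU time one by one and find out why they use so much.
If %sys is large, use sar -c to break down the system calls and see what they are mainly doing. Also check for other bottlenecks: paging, for example, can drive %sys up as well. In that case, use sar -q to check the length of the run queue, and GlancePlus or vmstat to check memory usage.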
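The decision steps above can be sketched as a small POSIX shell function. This is only an illustration: the function name classify_cpu and the IDLE_LOW cutoff for "close to zero" are assumptions (the text gives no number for it), and comparing %usr with %sys directly is a simplification of "whichever is large". Only the %wio threshold of 7 comes from the text. Arguments are assumed to be integer percentages.

```shell
# Classify one `sar -u` sample following the rules described above.
# Usage: classify_cpu IDLE WIO USR SYS   (integer percentages)
IDLE_LOW=${IDLE_LOW:-10}   # assumed cutoff for "close to zero"; not from the text
WIO_HIGH=7                 # threshold taken from the text

classify_cpu() {
    idle=$1 wio=$2 usr=$3 sys=$4
    if [ "$idle" -ge "$IDLE_LOW" ]; then
        echo "ok"            # CPU still has idle time
    elif [ "$wio" -gt "$WIO_HIGH" ]; then
        echo "io-bound"      # follow up with iostat, sar -d, sar -b, sar -w
    elif [ "$usr" -ge "$sys" ]; then
        echo "user-bound"    # inspect the top CPU consumers, e.g. with GlancePlus
    else
        echo "system-bound"  # follow up with sar -c, sar -q, vmstat
    fi
}
```

For example, classify_cpu 2 15 40 43 reports an I/O-bound sample, because %idle is below the assumed cutoff and %wio exceeds 7.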

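Using sar to analyze the run queue length


Command forms for analyzing the run queue length with sar:

# sar -q: reports the data that sa1 collects periodically in the background;
# sar -q 5 100: samples every 5 seconds, 100 times in total;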
sar -q: Report average queue length while occupied, and percent of time occupied. On a multi-processor machine, if the -M option is used together with the -q option, the per-CPU run queue as well as the average run queue of all the processors are reported. If the -M option is not used, only the average run queue information of all the processors is reported:

cpu: cpu number (only on a multi-processor system with the -M option);
runq-sz: Average length of the run queue(s) of processes (in memory and runnable);
%runocc: The percentage of time the run queue(s) were occupied by processes (in memory and runnable);
swpq-sz: Average length of the swap queue of runnable processes (processes swapped out but ready to run);
%swpocc: The percentage of time the swap queue of runnable processes (processes swapped out but ready to run) was occupied.
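Analyzing the results:

The smaller these values, the better.

If runq-sz is greater than 4, or %swpocc is greater than 5, the system's CPU or memory may be the problem, and further analysis is needed:

Use sar -u to analyze CPU utilization;
Use sar -w to analyze process deactivation/reactivation and the switching activity of the system;
GlancePlus can also be used.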
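The two thresholds above can be checked with a short sketch like the following. The function name check_runq is hypothetical; awk is used for the comparison because runq-sz may be reported as a fractional value.

```shell
# Flag one `sar -q` sample using the thresholds given above:
# runq-sz > 4 or %swpocc > 5.
# Usage: check_runq RUNQ_SZ SWPOCC
check_runq() {
    awk -v runq="$1" -v swpocc="$2" 'BEGIN {
        # "+ 0" forces a numeric comparison
        if (runq + 0 > 4 || swpocc + 0 > 5)
            print "investigate"   # follow up with sar -u, sar -w or GlancePlus
        else
            print "ok"
    }'
}
```

For example, check_runq 4.5 0 flags the sample, while check_runq 4 5 does not, since both limits are strict.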

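Using sar to analyze system calls


Command forms for analyzing system calls with sar:

# sar -c: reports the data that sa1 collects periodically in the background;
# sar -c 5 100: samples every 5 seconds, 100 times in total;
sar -c: Report system calls: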

scall/s: Number of system calls of all types per second;
sread/s: Number of read() and/or readv() system calls per second;
swrit/s: Number of write() and/or writev() system calls per second;
fork/s: Number of fork() and/or vfork() system calls per second;
exec/s: Number of exec() system calls per second;
rchar/s: Number of characters transferred by read system calls (block devices only) per second;
wchar/s: Number of characters transferred by write system calls (block devices only) per second.
Analyzing the results:

If the scall/s value is very large, the reason for so many system calls must be analyzed carefully.
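One possible first step in that analysis is to see which of the reported call types dominate. The sketch below is an illustration, not a prescribed method: the function name scall_breakdown is hypothetical, and it simply expresses the sread/s, swrit/s, fork/s and exec/s columns as a share of scall/s, with the remainder grouped as "other".

```shell
# Break down one `sar -c` sample into shares of the total call rate.
# Usage: scall_breakdown SCALL SREAD SWRIT FORK EXEC   (per-second values)
scall_breakdown() {
    awk -v scall="$1" -v sread="$2" -v swrit="$3" \
        -v nfork="$4" -v nexec="$5" 'BEGIN {
        if (scall + 0 <= 0) { print "no system calls recorded"; exit }
        other = scall - sread - swrit - nfork - nexec
        printf "read=%.1f%% write=%.1f%% fork+exec=%.1f%% other=%.1f%%\n",
            100 * sread / scall, 100 * swrit / scall,
            100 * (nfork + nexec) / scall, 100 * other / scall
    }'
}
```

A large "other" share suggests that the bulk of the calls are of types sar -c does not list separately, so a process-level tool such as GlancePlus would be the next place to look.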
