Monitoring Linux – 2. sar

As I wrote before, SNMP is not very usable for monitoring, so I went with sar (System Activity Reporter) instead, a utility well known from Red Hat systems.

Installation:

yum install sysstat
systemctl start sysstat
systemctl enable sysstat

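Configuration:

// cron job that schedules the data collection:
cat /etc/cron.d/sysstat
// directory with the collected daily data files (saDD, where DD is the day of the month):
cd /var/log/sa/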
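
Since the data files are named after the day of the month, the path for a given day can be built in the shell. A minimal sketch, assuming the saDD naming used in this article; the helper name sa_file and the SA_DIR variable are made up for illustration and are not part of sysstat:

```shell
# Hypothetical helper: build the path of the daily data file for a given day.
# SA_DIR is an illustrative override; it defaults to /var/log/sa as used here.
sa_file() {
    day="$1"
    # drop one leading zero so that 08 and 09 are not misread as octal, then re-pad
    printf '%s/sa%02d\n' "${SA_DIR:-/var/log/sa}" "${day#0}"
}

sa_file 24               # prints /var/log/sa/sa24
sa_file "$(date +%d)"    # path of the file for today
```

The result can be handed straight to sar, for example: sar -u -f "$(sa_file 24)"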

Example:

man sar 😉 covers the basics
In short, the main parameters are -r (RAM), -n DEV (network), -u (CPU) and -d -p (disk).

// prints 3 basic reports, one second apart:
# sar 1 3
Linux 3.10.0-957.el7.x86_64 (feeder01)  09/30/2019      x86_64        (24 CPU)
03:22:24 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:22:25 PM     all      0.08      0.00      0.08      0.01      0.00     99.83
03:22:26 PM     all      0.08      0.00      0.04      0.00      0.00     99.87
03:22:27 PM     all      0.13      0.00      0.08      0.01      0.00     99.79
Average:        all      0.10      0.00      0.07      0.01      0.00     99.83
// a single column can be filtered out with awk:
# sar -u 1 3 | awk '{print $7}' | grep -v CPU
%iowait
0.01
0.01
0.00
0.01
# sar 1 3 | awk '{if($7 >= 0.01 ) print $7}'  | grep -v CPU 
 0.01
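
One thing to watch with a fixed $7: in the 12-hour output above, the Average: line has one field fewer than the timestamped lines, so $7 points at %steal there rather than %iowait. A possible workaround is to look the column up by its header name and count from the right edge of the line. The function name sar_col is hypothetical, and this is only a sketch tested against the output format shown above:

```shell
# Hypothetical helper (not part of sysstat): print one named column of sar output.
sar_col() {
    awk -v name="$1" '
        !found {
            # find the header line; remember the column position counted from the right
            for (i = 1; i <= NF; i++)
                if ($i == name) { off = NF - i; found = 1 }
            next
        }
        # data lines, including the Average: line; skip blanks and repeated headers
        NF > off && $(NF - off) != name { print $(NF - off) }
    '
}

# Demo on three lines of the output shown above
sar_col %iowait <<'EOF'
03:22:24 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
03:22:25 PM     all      0.08      0.00      0.08      0.01      0.00     99.83
Average:        all      0.10      0.00      0.07      0.01      0.00     99.83
EOF
# prints 0.01 twice: once for the sample line, once for the Average: line
```

On a live system the same function would be used as: sar -u 1 3 | sar_col %iowait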

The sysstat package also includes the sadf command, which can export the data neatly to a file (whatever follows the double dash is passed to sar as options, and /var/log/sa/sa24 means we are analyzing the 24th day of the month):

RAM:

sadf -d /var/log/sa/sa24 -- -r
hostname;interval;timestamp;kbmemfree;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty
server01;600;2019-09-24 08:30:01 UTC;91525908;40266352;30.55;6888;5086476;35104264;25.81;35259808;3693476;0
server01;-1;2019-09-24 08:30:50 UTC;LINUX-RESTART
    kbmemfree: amount of free memory available in kilobytes.
    kbmemused: amount of used memory in kilobytes. This does not take into account memory used by the kernel itself.
    %memused: percentage of used memory.
    kbbuffers: amount of memory used as buffers by the kernel in kilobytes.
    kbcached: amount of memory used to cache data by the kernel in kilobytes.
    kbcommit: amount of memory in kilobytes needed for current workload. This is an estimate of how much RAM/swap is needed to guarantee that there never is out of memory.
    %commit: percentage of memory needed for current workload in relation to the total amount of memory (RAM+swap). This number may be greater than 100% because the kernel usually overcommits memory.
    kbactive: amount of active memory in kilobytes (memory that has been used more recently and usually not reclaimed unless absolutely necessary).
    kbinact: amount of inactive memory in kilobytes (memory which has been less recently used. It is more eligible to be reclaimed for other purposes).
    kbdirty: amount of memory in kilobytes waiting to get written back to the disk.
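
The semicolon-separated export is easy to post-process with awk. The following sketch lists the samples whose %memused exceeds a given limit. The function name mem_over is hypothetical; the column is looked up by its header name, and the LINUX-RESTART marker rows are skipped because they have too few fields:

```shell
# Hypothetical helper: print "timestamp;%memused" for samples above a limit.
mem_over() {
    awk -F';' -v limit="$1" '
        # header row: locate the %memused column (a leading "# " is tolerated)
        $1 ~ /hostname$/ { for (i = 1; i <= NF; i++) if ($i == "%memused") col = i; next }
        # data rows only; LINUX-RESTART markers are shorter than col fields
        col && NF >= col && $col + 0 > limit + 0 { print $3 ";" $col }
    '
}

# Demo on the rows shown above
printf '%s\n' \
  'hostname;interval;timestamp;kbmemfree;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty' \
  'server01;600;2019-09-24 08:30:01 UTC;91525908;40266352;30.55;6888;5086476;35104264;25.81;35259808;3693476;0' \
  'server01;-1;2019-09-24 08:30:50 UTC;LINUX-RESTART' | mem_over 30
# prints: 2019-09-24 08:30:01 UTC;30.55
```

With real data: sadf -d /var/log/sa/sa24 -- -r | mem_over 80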

Network:

sadf -d /var/log/sa/sa24 -- -n DEV | grep bond
hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s
server01;600;2019-09-24 08:30:01 UTC;bond1;5.64;2.23;1.34;1.00;0.00;0.00;4.33
server01;600;2019-09-24 08:30:01 UTC;bond0;8.47;1.27;0.86;0.13;0.00;0.00;6.95
    IFACE: name of the network interface for which statistics are reported.
    rxpck/s: total number of packets received per second.
    txpck/s: total number of packets transmitted per second.
    rxkB/s: total number of kilobytes received per second.
    txkB/s: total number of kilobytes transmitted per second.
    rxcmp/s: number of compressed packets received per second.
    txcmp/s: number of compressed packets transmitted per second.
    rxmcst/s: number of multicast packets received per second.
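
With a whole day of samples it is often more useful to look at per-interface averages. A rough sketch: the function name net_avg is hypothetical, and the field positions (4 = IFACE, 7 = rxkB/s, 8 = txkB/s) are taken from the header shown above, so they are an assumption that may not hold for other sysstat versions:

```shell
# Hypothetical helper: average rxkB/s and txkB/s per interface.
net_avg() {
    awk -F';' '
        # data rows have a positive interval; header and LINUX-RESTART rows do not
        $2 + 0 > 0 && NF >= 8 { n[$4]++; rx[$4] += $7; tx[$4] += $8 }
        END { for (i in n) printf "%s;%.2f;%.2f\n", i, rx[i] / n[i], tx[i] / n[i] }
    ' | sort
}

# Demo on the two rows shown above
printf '%s\n' \
  'server01;600;2019-09-24 08:30:01 UTC;bond1;5.64;2.23;1.34;1.00;0.00;0.00;4.33' \
  'server01;600;2019-09-24 08:30:01 UTC;bond0;8.47;1.27;0.86;0.13;0.00;0.00;6.95' | net_avg
# prints:
# bond0;0.86;0.13
# bond1;1.34;1.00
```

With real data: sadf -d /var/log/sa/sa24 -- -n DEV | grep bond | net_avg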

CPU:

sadf -d /var/log/sa/sa24 -- -u
hostname;interval;timestamp;CPU;%user;%nice;%system;%iowait;%steal;%idle
server01;600;2019-09-24 08:30:01 UTC;-1;0.12;0.00;0.06;0.01;0.00;99.80
    %user: percentage of CPU utilisation that occurred while executing at the user level (application). Note that this field includes time spent running virtual processors.
    %nice: percentage of CPU utilisation that occurred while executing at the user level with nice priority.
    %system: percentage of CPU utilisation that occurred while executing at the system level (kernel). Note that this field includes time spent servicing hardware and software interrupts.
    %iowait: percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
    %steal: percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
    %idle: percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
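
A single "busy" figure is often handier than six columns. It can be derived as 100 minus %idle, which is the last field of each row. A minimal sketch, assuming the column layout shown above; the function name cpu_busy is hypothetical. Only the rows with CPU equal to -1 (all CPUs together) are used:

```shell
# Hypothetical helper: print "timestamp;busy%" where busy = 100 - %idle.
cpu_busy() {
    awk -F';' '
        # positive interval = data row; CPU -1 = summary over all CPUs
        $2 + 0 > 0 && $4 == -1 { printf "%s;%.2f\n", $3, 100 - $NF }
    '
}

# Demo on the row shown above
printf '%s\n' \
  'server01;600;2019-09-24 08:30:01 UTC;-1;0.12;0.00;0.06;0.01;0.00;99.80' | cpu_busy
# prints: 2019-09-24 08:30:01 UTC;0.20
```

With real data: sadf -d /var/log/sa/sa24 -- -u | cpu_busy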

Disk:

sadf -d /var/log/sa/sa24 -- -d -p | grep sd
hostname;interval;timestamp;DEV;tps;rd_sec/s;wr_sec/s;avgrq-sz;avgqu-sz;await;svctm;%util
server01;600;2019-09-24 08:30:01 UTC;sda;1.32;0.00;16.86;12.81;0.00;0.15;0.02;0.00
    tps: indicate the number of transfers per second that were issued to the device.
    rd_sec/s: number of sectors read from the device per second. The size of a sector is 512 bytes.
    wr_sec/s: number of sectors written to the device per second. The size of a sector is 512 bytes.
    avgrq-sz: the average size (in sectors) of the requests that were issued to the device.
    avgqu-sz: the average queue length of the requests that were issued to the device.
    await: the average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
    svctm: the average service time (in milliseconds) for I/O requests that were issued to the device. Deprecated, will be removed in a future sysstat version.
    %util: percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilisation for the device). Device saturation occurs when this value is close to 100%.
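
Sectors per second are not very intuitive, but with 512-byte sectors the conversion to kilobytes is simple: kB/s = sectors/s * 512 / 1024. A short sketch under the assumption that the columns are laid out as in the header above (4 = DEV, 6 = rd_sec/s, 7 = wr_sec/s); the function name disk_kb is hypothetical. Other sysstat versions may use different column names, so check the header first:

```shell
# Hypothetical helper: print "device;read kB/s;write kB/s" from sector rates.
disk_kb() {
    awk -F';' '
        # data rows only (positive interval); one sector = 512 bytes
        $2 + 0 > 0 && NF >= 7 { printf "%s;%.2f;%.2f\n", $4, $6 * 512 / 1024, $7 * 512 / 1024 }
    '
}

# Demo on the row shown above
printf '%s\n' \
  'server01;600;2019-09-24 08:30:01 UTC;sda;1.32;0.00;16.86;12.81;0.00;0.15;0.02;0.00' | disk_kb
# prints: sda;0.00;8.43
```

With real data: sadf -d /var/log/sa/sa24 -- -d -p | grep sd | disk_kb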
