As I wrote earlier, SNMP is not very usable for monitoring, so I went with the Red Hat utility sar (System Activity Reporter) instead; it comes in the sysstat package.
Installation:
yum install sysstat
systemctl start sysstat
systemctl enable sysstat
Configuration:
cat /etc/cron.d/sysstat
cd /var/log/sa/
Example:
man sar 😉 covers the basics
in short, the main parameters are -r (RAM), -n DEV (network), -u (CPU), -d -p (disk)
// shows 3 basic reports, one second apart:
# sar 1 3
Linux 3.10.0-957.el7.x86_64 (feeder01) 09/30/2019 x86_64 (24 CPU)
03:22:24 PM CPU %user %nice %system %iowait %steal %idle
03:22:25 PM all 0.08 0.00 0.08 0.01 0.00 99.83
03:22:26 PM all 0.08 0.00 0.04 0.00 0.00 99.87
03:22:27 PM all 0.13 0.00 0.08 0.01 0.00 99.79
Average: all 0.10 0.00 0.07 0.01 0.00 99.83
// a single column can be filtered out with awk:
# sar -u 1 3 | awk '{print $7}' | grep -v CPU
%iowait
0.01
0.01
0.00
0.01
# sar 1 3 | awk '{if($7 >= 0.01 ) print $7}' | grep -v CPU
0.01
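One caveat with a fixed `$7`: in the 12-hour layout shown above, the Average: row has one field fewer than the sample rows (no AM/PM field), so `$7` points at a different column there, and with a 24-hour time format every column shifts by one. A minimal sketch of picking the column by its header name instead; the function name `sar_column` is my own, not part of sysstat:

```shell
#!/bin/sh
# sar_column NAME (hypothetical helper, not part of sysstat):
# reads plain `sar` text output on stdin and prints the values of the column
# whose header equals NAME, skipping the banner, headers and the Average row.
sar_column() {
    awk -v name="$1" '
        # Header row: remember which field holds the requested column.
        { for (i = 1; i <= NF; i++) if ($i == name) { col = i; next } }
        # The summary row has a different field layout, so leave it out.
        $1 == "Average:" { next }
        # Data rows: print the remembered field once a header has been seen.
        col && NF >= col { print $col }
    '
}

# Usage (needs sysstat installed):
#   sar -u 1 3 | sar_column %iowait
```

Because the position is looked up from the header on every run, the same call should work regardless of the time format the system uses.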
The sysstat package also includes the sadf command, which exports the data in a form that is easy to save to a file (the option after the two dashes is passed on to sar, and /var/log/sa/sa24 means we are analyzing the 24th day of the month):
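Since the file name encodes the day of the month, the path can be built from a day number. A small sketch, assuming the default /var/log/sa/saDD naming used here; `sa_file` is a made-up helper name, and the commented usage line relies on GNU date:

```shell
#!/bin/sh
# sa_file DAY (hypothetical helper): print the sysstat data file for a day of
# the month, assuming the default /var/log/sa/saDD naming.
sa_file() {
    day=${1#0}                          # drop one leading zero so "08" is not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}

# Usage (needs sysstat and an existing data file; `date -d` is GNU date):
#   sadf -d "$(sa_file "$(date -d yesterday +%d)")" -- -u > cpu_yesterday.csv
```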
RAM:
sadf -d /var/log/sa/sa24 -- -r
hostname;interval;timestamp;kbmemfree;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty
server01;600;2019-09-24 08:30:01 UTC;91525908;40266352;30.55;6888;5086476;35104264;25.81;35259808;3693476;0
server01;-1;2019-09-24 08:30:50 UTC;LINUX-RESTART

kbmemfree: amount of free memory available in kilobytes.
kbmemused: amount of used memory in kilobytes. This does not take into account memory used by the kernel itself.
%memused: percentage of used memory.
kbbuffers: amount of memory used as buffers by the kernel in kilobytes.
kbcached: amount of memory used to cache data by the kernel in kilobytes.
kbcommit: amount of memory in kilobytes needed for current workload. This is an estimate of how much RAM/swap is needed to guarantee that there never is out of memory.
%commit: percentage of memory needed for current workload in relation to the total amount of memory (RAM+swap). This number may be greater than 100% because the kernel usually overcommits memory.
kbactive: amount of active memory in kilobytes (memory that has been used more recently and usually not reclaimed unless absolutely necessary).
kbinact: amount of inactive memory in kilobytes (memory which has been less recently used. It is more eligible to be reclaimed for other purposes).
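The semicolon-separated export is easy to post-process. As a rough sketch, this finds the sample with the highest %memused for the day and skips the LINUX-RESTART marker rows; `peak_memused` is a name I made up for illustration:

```shell
#!/bin/sh
# peak_memused (hypothetical helper): reads `sadf -d <file> -- -r` output on
# stdin and prints "timestamp;%memused" for the sample with the highest value.
peak_memused() {
    awk -F';' '
        # Header row (some sadf versions prefix it with "# "): find the column.
        /hostname;interval;timestamp/ {
            for (i = 1; i <= NF; i++) if ($i == "%memused") col = i
            next
        }
        # Ignore restart markers and anything that precedes the header.
        /LINUX-RESTART/ || !col { next }
        # Keep the largest value seen so far, together with its timestamp.
        !seen || $col + 0 > max + 0 { max = $col; ts = $3; seen = 1 }
        END { if (seen) print ts ";" max }
    '
}

# Usage (needs sysstat and an existing data file):
#   sadf -d /var/log/sa/sa24 -- -r | peak_memused
```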
Network:
sadf -d /var/log/sa/sa24 -- -n DEV | grep bond
hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s
server01;600;2019-09-24 08:30:01 UTC;bond1;5.64;2.23;1.34;1.00;0.00;0.00;4.33
server01;600;2019-09-24 08:30:01 UTC;bond0;8.47;1.27;0.86;0.13;0.00;0.00;6.95

IFACE: name of the network interface for which statistics are reported.
rxpck/s: total number of packets received per second.
txpck/s: total number of packets transmitted per second.
rxkB/s: total number of kilobytes received per second.
txkB/s: total number of kilobytes transmitted per second.
rxcmp/s: number of compressed packets received per second.
txcmp/s: number of compressed packets transmitted per second.
rxmcst/s: number of multicast packets received per second.
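Note that `grep bond` matches anywhere in the line, so it would also hit a hostname containing "bond". A sketch that filters on the IFACE field itself and averages rxkB/s and txkB/s per interface over all samples in the file; `iface_avg` is a hypothetical name and it expects a non-empty interface prefix:

```shell
#!/bin/sh
# iface_avg PREFIX (hypothetical helper): reads `sadf -d <file> -- -n DEV`
# output on stdin and prints "iface;avg rxkB/s;avg txkB/s" for every interface
# whose name starts with PREFIX (PREFIX must not be empty).
iface_avg() {
    awk -F';' -v prefix="$1" '
        # Header row: look up the columns by name.
        /hostname;interval;timestamp/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "IFACE")  c_if = i
                if ($i == "rxkB/s") c_rx = i
                if ($i == "txkB/s") c_tx = i
            }
            next
        }
        !c_if || !c_rx || !c_tx || /LINUX-RESTART/ { next }
        # Sum per interface, but only for names that start with the prefix.
        index($c_if, prefix) == 1 {
            n[$c_if]++; rx[$c_if] += $c_rx; tx[$c_if] += $c_tx
        }
        END {
            for (k in n) printf "%s;%.2f;%.2f\n", k, rx[k] / n[k], tx[k] / n[k]
        }
    ' | sort
}

# Usage (needs sysstat and an existing data file):
#   sadf -d /var/log/sa/sa24 -- -n DEV | iface_avg bond
```

The trailing `sort` is there only because awk does not guarantee the order of `for (k in n)`.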
CPU:
sadf -d /var/log/sa/sa24 -- -u
hostname;interval;timestamp;CPU;%user;%nice;%system;%iowait;%steal;%idle
server01;600;2019-09-24 08:30:01 UTC;-1;0.12;0.00;0.06;0.01;0.00;99.80

%user: percentage of CPU utilisation that occurred while executing at the user level (application). Note that this field includes time spent running virtual processors.
%nice: percentage of CPU utilisation that occurred while executing at the user level with nice priority.
%system: percentage of CPU utilisation that occurred while executing at the system level (kernel). Note that this field includes time spent servicing hardware and software interrupts.
%iowait: percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%steal: percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%idle: percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.
Disk:
sadf -d /var/log/sa/sa24 -- -d -p | grep sd
hostname;interval;timestamp;DEV;tps;rd_sec/s;wr_sec/s;avgrq-sz;avgqu-sz;await;svctm;%util
server01;600;2019-09-24 08:30:01 UTC;sda;1.32;0.00;16.86;12.81;0.00;0.15;0.02;0.00

tps: indicate the number of transfers per second that were issued to the device.
rd_sec/s: number of sectors read from the device. The size of a sector is 512 bytes.
wr_sec/s: number of sectors written to the device. The size of a sector is 512 bytes.
avgrq-sz: the average size (in sectors) of the requests that were issued to the device.
avgqu-sz: the average queue length of the requests that were issued to the device.
await: the average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
svctm: the average service time (in milliseconds) for I/O requests that were issued to the device. Deprecated, will be removed in a future sysstat version.
%util: percentage of CPU time during which I/O requests were issued to the device (bandwidth utilisation for the device). Device saturation occurs when this value is close to 100%.
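Sectors per second are not very readable, so here is a sketch that converts rd_sec/s and wr_sec/s to KiB/s using the 512-byte sector size stated above (for example 16.86 sectors/s × 512 / 1024 ≈ 8.43 KiB/s). The helper name `disk_kib` is made up, and it assumes the header contains rd_sec/s and wr_sec/s as in the output above; other sysstat versions may name these columns differently, in which case it prints nothing:

```shell
#!/bin/sh
# disk_kib (hypothetical helper): reads `sadf -d <file> -- -d -p` output on
# stdin and prints "timestamp;device;read KiB/s;write KiB/s".
disk_kib() {
    awk -F';' '
        # Header row: look up the columns by name.
        /hostname;interval;timestamp/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "DEV")      c_dev = i
                if ($i == "rd_sec/s") c_rd  = i
                if ($i == "wr_sec/s") c_wr  = i
            }
            next
        }
        # Skip restart markers, and everything if the expected columns are missing.
        !c_dev || !c_rd || !c_wr || /LINUX-RESTART/ { next }
        # One sector is 512 bytes, so sectors/s * 512 / 1024 gives KiB/s.
        { printf "%s;%s;%.2f;%.2f\n", $3, $c_dev, $c_rd * 512 / 1024, $c_wr * 512 / 1024 }
    '
}

# Usage (needs sysstat and an existing data file):
#   sadf -d /var/log/sa/sa24 -- -d -p | disk_kib
```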