TrumanWong

atop

A tool for monitoring Linux system resources and processes

Supplementary instructions

[Not a built-in program; installation required] atop records the running state of the system at a fixed interval. The collected data covers system resource usage (CPU, memory, disk and network) and process activity, and can be saved to disk as log files. After a problem occurs on a server, we can retrieve the corresponding atop log file and analyze it. atop is open source software; its source code and rpm installation packages are publicly available.

Syntax

atop (options) (parameters)

Description

ATOP line: shows the host name and the date and time of the sample

PRC line: shows the overall activity of processes

  • The sys and usr fields indicate the CPU time consumed by processes in kernel mode and user mode respectively
  • The #proc field indicates the total number of processes
  • The #zombie field indicates the number of zombie processes
  • The #exit field indicates the number of processes that exited during the atop sampling interval
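The zombie count can be cross-checked outside atop. The following is a minimal sketch, assuming a procps-style ps that accepts "-eo stat="; the helper name count_zombies is made up for illustration and is not part of atop.

```shell
# Hypothetical helper: count zombie entries in a list of ps state codes on stdin.
# Zombie processes have a state code that starts with "Z".
count_zombies() {
    # grep -c exits non-zero when the count is 0, so keep the status clean.
    grep -c '^Z' || true
}

# Print the current number of zombie processes, if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -eo stat= | count_zombies
fi
```

The number printed should be close to the #zombie field of a sample taken at the same moment, although the two are not read at exactly the same instant.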

CPU line: shows the usage of the CPU as a whole (that is, all cores treated as a single CPU resource). The CPU can be executing processes, handling interrupts, or idle. There are two kinds of idle state: the CPU is idle because active processes are waiting for disk I/O, or it is completely idle.

  • The sys and usr fields indicate the proportion of CPU time spent running processes in kernel mode and user mode respectively
  • The irq field indicates the proportion of time the CPU spent handling interrupts
  • The idle field indicates the proportion of time the CPU was completely idle
  • The wait field indicates the proportion of time the CPU was idle because processes were waiting for disk I/O

The values of the fields in the CPU line add up to N × 100%, where N is the number of CPU cores.
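To make the rule above concrete, the expected total can be derived from the core count. This is only a sketch: it assumes GNU coreutils nproc is installed, and the helper name cpu_line_total is hypothetical. nproc reports the cores available to the current process, which can be fewer than the cores atop sees.

```shell
# Hypothetical helper: expected sum of the CPU-line percentages for N cores.
cpu_line_total() {
    echo $(( $1 * 100 ))
}

# nproc (GNU coreutils) prints the number of available cores; fall back to 1.
cores=$(nproc 2>/dev/null || echo 1)
echo "cores=${cores}, expected CPU-line total: $(cpu_line_total "$cores")%"
```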

cpu line: shows the usage of a single CPU core. The fields have the same meaning as in the CPU line, and their values add up to 100%.

CPL line: shows the CPU load

  • avg1, avg5 and avg15 fields: the load average over the past 1, 5 and 15 minutes, that is, the average number of processes that were runnable or waiting for disk I/O
  • The csw field indicates the number of context switches
  • The intr field indicates the number of interrupts
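On Linux, the kernel publishes the same three load averages in /proc/loadavg, so they can be read without atop. A minimal sketch follows; the helper name parse_loadavg is made up for illustration.

```shell
# Hypothetical helper: split a /proc/loadavg-style line into its three averages.
parse_loadavg() {
    # Fields 1 to 3 are the 1-, 5- and 15-minute load averages.
    echo "$1" | awk '{ printf "avg1=%s avg5=%s avg15=%s\n", $1, $2, $3 }'
}

# Print the current values when /proc/loadavg is readable (Linux only).
if [ -r /proc/loadavg ]; then
    parse_loadavg "$(cat /proc/loadavg)"
fi
```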

MEM line: shows memory usage

  • The tot field indicates the total amount of physical memory
  • The free field indicates the amount of free memory
  • The cache field indicates the amount of memory used for the page cache
  • The buff field indicates the amount of memory used for file system metadata buffers
  • The slab field indicates the amount of memory used by kernel slab allocations
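These figures can be compared with the kernel's own accounting in /proc/meminfo. The sketch below assumes that tot, free, cache, buff and slab roughly correspond to the MemTotal, MemFree, Cached, Buffers and Slab keys; that mapping is an assumption for illustration, and the helper name mem_fields is hypothetical.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print the
# keys assumed to match the tot, free, cache, buff and slab fields (in kB).
mem_fields() {
    awk '
        $1 == "MemTotal:" { tot   = $2 }
        $1 == "MemFree:"  { free  = $2 }
        $1 == "Cached:"   { cache = $2 }
        $1 == "Buffers:"  { buff  = $2 }
        $1 == "Slab:"     { slab  = $2 }
        END { printf "tot=%s free=%s cache=%s buff=%s slab=%s (kB)\n", tot, free, cache, buff, slab }
    '
}

# Print the current values when /proc/meminfo is readable (Linux only).
if [ -r /proc/meminfo ]; then
    mem_fields < /proc/meminfo
fi
```

atop scales its output to larger units, so the numbers will match only after converting from kB.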

SWP line: shows swap space usage

  • The tot field indicates the total size of swap space
  • The free field indicates the amount of free swap space

PAG line: shows virtual memory paging activity

  • swin and swout fields: the number of memory pages swapped in and swapped out

DSK line: shows disk usage. Each disk device gets its own line; for example, if there is also an sdb device, an additional DSK line is shown for it.

  • sda field: the disk device name
  • busy field: the proportion of time the disk was busy
  • read and write fields: the number of read and write requests

NET lines: several NET lines show network activity, covering the transport layer (TCP and UDP), the IP layer, and each active network interface

  • The XXXi field indicates the number of packets received by each layer or active network interface
  • The XXXo field indicates the number of packets sent by each layer or active network interface

atop log

The samples taken at each point in time are combined into an atop log file, which can be viewed with the "atop -r XXX" command. How should atop log files be stored?

A reasonable scheme is:

  • Save one atop log file per day, recording that day's information
  • Name the log file in the format "atop_YYYYMMDD"
  • Set a retention period and automatically delete log files older than it
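The naming scheme above can be sketched in a few lines. The log directory /var/log/atop is an assumption (adjust it to wherever your logs are written), and the helper name atop_log_name is made up for illustration.

```shell
# Hypothetical helper: build a daily log name in the atop_YYYYMMDD format.
atop_log_name() {
    echo "atop_$1"
}

# Assumed log directory; override by setting LOGPATH before running.
LOGPATH=${LOGPATH:-/var/log/atop}
today_log="$LOGPATH/$(atop_log_name "$(date +%Y%m%d)")"
echo "today's log: $today_log"

# To browse the log interactively, uncomment the next line:
# command -v atop >/dev/null && [ -r "$today_log" ] && atop -r "$today_log"
```

When reading a log, atop also accepts -b hh:mm and -e hh:mm to limit the view to a time window, which helps when only the period around an incident matters.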

In fact, the atop developers already provide this scheme: the atop.daily script in the source directory implements it. In atop.daily, the sampling interval is changed through the INTERVAL variable (default 10 minutes), and the number of days logs are kept is changed through the value in the following command (default 28 days):

(sleep 3; find $LOGPATH -name 'atop_*' -mtime +28 -exec rm {} \; )&
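Before changing the retention value, it can help to preview which files the command would select. The sketch below runs the same find test in a throwaway directory, with -print in place of -exec rm. It assumes GNU coreutils, because touch -d with a relative date is a GNU extension, and the file names atop_old and atop_new are placeholders.

```shell
# Work in a temporary directory so no real logs are touched.
demo_dir=$(mktemp -d)

# Create one file with an old modification time and one fresh file.
touch -d '30 days ago' "$demo_dir/atop_old"
touch "$demo_dir/atop_new"

# Same selection as the cleanup command, but list the matches instead of deleting.
expired=$(find "$demo_dir" -name 'atop_*' -mtime +28 -print)
echo "would delete: $expired"

# Clean up the demonstration directory.
rm -r "$demo_dir"
```

Only the file older than 28 days is listed. On a real system, point find at the actual log directory and keep -print until the selection looks right.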

Finally, we edit the cron file so that the atop.daily script runs every day at midnight:

0 0 * * * root /etc/cron.daily/atop.daily
