Linux default checks and policies
Agent Client Collector provides the following default checks and policies for Linux Metrics monitoring.
Linux monitoring metrics checks
| Check | Metric Name | Resource | Description | Units | Featured Metric | Anomaly Detection |
|---|---|---|---|---|---|---|
| os.linux.metrics-process-usage | proc.acc.running | process-name | Number of processes running with this name(acc) | count | ||
| proc.acc.cpuPercent | process-name | Percentage of cpu taken by the process. | percent | |||
| proc.acc.memPercent | process-name | Percentage of memory taken by the process. | percent | |||
| os.linux.metrics-reboot-count-today | reboot.count.today | empty | Number of reboots performed today. | count | ||
| os.linux.metrics-system-cpu | cpu.total.user | total | Normal processes executing in user mode; cpu.total.user is the total of the cpuN.user metrics. | count | ||
| cpu.total.nice | total | Niced processes executing in user mode; cpu.total.nice is the total of the cpuN.nice metrics. | count | |||
| cpu.total.system | total | Time the CPU spent running the kernel; cpu.total.system is the total of the cpuN.system metrics. | sec | |||
| cpu.total.idle | total | Total time the CPU spent in an idle state; cpu.total.idle is the total of the cpuN.idle metrics. | sec | |||
| cpu.total.iowait | total | Total time the CPU spent waiting for I/O operations to complete; cpu.total.iowait is the total of the cpuN.iowait metrics. | sec | |||
| cpu.total.irq | total | Total time the processor spent handling interrupts; cpu.total.irq is the total of the cpuN.irq metrics. | sec | |||
| cpu.total.softirq | total | Time spent on servicing soft interrupt requests; cpu.total.softirq is the total of the cpuN.softirq metrics. | sec | |||
| cpu.total.steal | total | Total time the virtual CPU spent waiting for the hypervisor to service another virtual CPU. Only applies to virtual machines. | sec | |||
| cpu.total.guest | total | Total time the CPU spent running the virtual processor. Only applies to hypervisors. | sec | |||
| cpu.total.guest_nice | total | Total time the CPU spent running a niced guest OS; cpu.total.guest_nice is the total of the cpuN.guest_nice metrics. | sec | |||
| cpu.<cpu-core>.user | cpu-core | Time spent with normal processing in user mode. | sec | |||
| cpu.<cpu-core>.nice | cpu-core | Time spent with niced processes in user mode. | sec | |||
| cpu.<cpu-core>.system | cpu-core | Time spent running in kernel mode. | sec | |||
| cpu.<cpu-core>.idle | cpu-core | Time spent idle. | sec | |||
| cpu.<cpu-core>.iowait | cpu-core | Time spent waiting for I/O to complete. This is also considered idle time. | sec | |||
| cpu.<cpu-core>.irq | cpu-core | Time spent serving hardware interrupts. | sec | |||
| cpu.<cpu-core>.softirq | cpu-core | Time spent serving software interrupts. | sec | |||
| cpu.<cpu-core>.steal | cpu-core | Time stolen by other operating systems running in a virtual environment. | sec | |||
| cpu.<cpu-core>.guest | cpu-core | Time spent for running a virtual CPU or guest OS under the control of the kernel. | sec | |||
| cpu.<cpu-core>.guest_nice | cpu-core | Total time the CPU spent running as nice guest OS. | sec | |||
| cpu.intr | empty | Interrupts serviced since boot time. | sec | |||
| cpu.ctxt | empty | Total number of context switches across all CPUs. | count | |||
| cpu.btime | empty | The time the system booted | sec | |||
| cpu.processes | empty | The number of processes and threads created, which includes (but is not limited to) those created by calls to the fork() and clone() system calls. | count | |||
| cpu.procs_running | empty | The total number of processes running on all CPUs. | count | |||
| cpu.procs_blocked | empty | The number of processes currently blocked, waiting for I/O to complete. | count | |||
| cpu.cpu_count | empty | Number of CPUs on the system. | count | |||
| cpu.<cpu-core>.cores | cpu-core | The number of CPU cores | core count | |||
| os.linux.metrics-system-cpu-load | load_avg.one | empty | The average system load over one minute. | thread count | yes | yes |
| load_avg.five | empty | The average system load over five minutes. | thread count | yes | yes | |
| load_avg.fifteen | empty | The average system load over fifteen minutes. | thread count | yes | yes | |
| load_avg.norm.one | empty | The average system load over one minute normalized by the number of CPUs. | thread count | |||
| load_avg.norm.five | empty | The average system load over five minutes normalized by the number of CPUs. | thread count | |||
| load_avg.norm.fifteen | empty | The average system load over fifteen minutes normalized by the number of CPUs. | thread count | |||
| os.linux.metrics-system-cpu-percentage | cpu.avgutilization_percentage | empty | Average percentage of CPU used. | percent | ||
| cpu.user_percentage | empty | Percent of time total cpu was used by normal processes in user mode | percent | yes | yes | |
| cpu.nice_percentage | empty | Percent of time all cpus used by niced processes in user mode | percent | yes | yes | |
| cpu.system_percentage | empty | The percent of time the CPU spent running the kernel. | percent | yes | yes | |
| cpu.idle_percentage | empty | Percent of time all cpus were idle | percent | yes | yes | |
| cpu.iowait_percentage | empty | Percent of time all CPUs spent waiting for I/O to complete | percent | yes | yes | |
| cpu.irq_percentage | empty | Percent of time all cpus servicing interrupts | percent | yes | yes | |
| cpu.softirq_percentage | empty | Percent of time all CPUs spent servicing software interrupts | percent | yes | yes | |
| cpu.steal_percentage | empty | Percent of time all cpus serviced virtual hosts operating systems | percent | yes | yes | |
| cpu.guest_percentage | empty | Percent of time all cpus serviced guest operating system | percent | yes | yes | |
| os.linux.metrics-system-disk | disk.<disk-name>.reads | disk-name | Total number of reads completed successfully. | count | yes | yes |
| disk.<disk-name>.readsMerged | disk-name | Total number of reads merged | count | |||
| disk.<disk-name>.sectorsRead | disk-name | Total number of sectors read successfully. | count | |||
| disk.<disk-name>.readTime | disk-name | Total number of milliseconds spent by all reads. | millisec | |||
| disk.<disk-name>.writes | disk-name | Total number of writes completed successfully. | count | yes | yes | |
| disk.<disk-name>.writesMerged | disk-name | Total number of writes merged | count | |||
| disk.<disk-name>.sectorsWritten | disk-name | Total number of sectors written successfully. | count | |||
| disk.<disk-name>.writeTime | disk-name | Total number of milliseconds spent by all writes. | millisec | |||
| disk.<disk-name>.ioInProgress | disk-name | Total number of I/Os currently in progress | count | |||
| disk.<disk-name>.ioTime | disk-name | Total time spent doing I/Os | millisec | yes | yes | |
| disk.<disk-name>.ioTimeWeighted | disk-name | Weighted total time spent doing I/Os. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating. | millisec | |||
| os.linux.metrics-system-disk-capacity | disk.<file-system-name>.total | file-system-name | The total size of the file system. | byte | ||
| disk.<file-system-name>.used | file-system-name | The total amount of space allocated to existing files in the file system. | byte | |||
| disk.<file-system-name>.avail | file-system-name | The total amount of space available within the file system. | byte | |||
| disk.<file-system-name>.used_percentage | file-system-name | The percentage of the available space that is currently allocated to all files on the file system. | percent | |||
| disk.<file-system-name>.itotal | file-system-name | The total number of inodes on the file system. | count | |||
| disk.<file-system-name>.iused | file-system-name | The number of used inodes. | count | |||
| disk.<file-system-name>.iavail | file-system-name | The number of free (unused) inodes. | count | |||
| disk.<file-system-name>.iused_percentage | file-system-name | The percentage of used inodes. | percent | |||
| os.linux.metrics-system-disk-usage | disk_usage.<disk>.total | disk-name | Total amount of space on this disk | bytes | ||
| disk_usage.<disk>.used | disk-name | Total amount of space used in this disk | bytes | |||
| disk_usage.<disk>.avail | disk-name | Total amount of space available on this disk | bytes | |||
| disk_usage.<disk>.used_percentage | disk-name | The percentage of space used on this disk | percent | yes | yes | |
| os.linux.metrics-system-memory, os.linux.metrics-system-memory-percent | memory.total | empty | Total usable RAM. | KB | ||
| memory.free | empty | Total free RAM. | KB | |||
| memory.available | empty | An estimate of how much memory is available for starting new applications, without swapping. | KB | |||
| memory.buffers | empty | Temporary storage used for raw disk blocks. | KB | |||
| memory.cached | empty | In-memory cache for files read from disk (the page cache). Does not include mem_swapcached. | KB | |||
| memory.swapTotal | empty | Total amount of swap space available. | KB | yes | yes | |
| memory.swapFree | empty | Amount of swap space that is currently unused. | KB | yes | yes | |
| memory.dirty | empty | Memory which is waiting to get written back to the disk. | KB | |||
| memory.swapUsed | empty | The amount of swap space in use. | KB | yes | yes | |
| memory.used | empty | The amount of RAM in use. | KB | |||
| memory.usedWOBuffersCaches | empty | The amount of memory in use. | KB | |||
| memory.freeWOBuffersCaches | empty | Value of MemAvailable from /proc/meminfo if present, but falls back to adding free + buffered + cached memory if not. | KB | |||
| memory.swapUsedPercentage | empty | Percent of swap space used. | percent | |||
| memory_percent.free | empty | Percent of free RAM | percent | yes | yes | |
| memory_percent.available | empty | Percent of memory available | percent | yes | yes | |
| memory_percent.buffers | empty | Percent of memory used for raw disk blocks | percent | yes | yes | |
| memory_percent.cached | empty | Percent of memory used for in-memory cache for files read from disk | percent | yes | yes | |
| memory_percent.dirty | empty | Percent of memory waiting to get written back to the disk. | percent | yes | yes | |
| memory_percent.swapUsed | empty | Percent of swap space used. | percent | yes | yes | |
| memory_percent.usedWOBuffersCaches | empty | Percent of memory in use | percent | yes | yes | |
| memory_percent.freeWOBuffersCaches | empty | Percent of memory available | percent | yes | yes | |
| os.linux.metrics-system-uptime | system.uptime(sec) | empty | The amount of time the system has been working and available. | sec | ||
| os.linux.metrics-memory-vmstat | vmstat.nr_free_pages | empty | Pages that are currently unused by the system. | pages | ||
| vmstat.nr_alloc_batch | empty | Pages allocated to other domains due to insufficient memory in each domain of each NUMA node | pages | |||
| vmstat.nr_inactive_anon | empty | Memory pages in each domain of each NUMA node that have not been accessed for a long time | pages | |||
| vmstat.nr_active_anon | empty | Anonymous virtual memory pages that have been recently used | KB | |||
| vmstat.nr_inactive_file | empty | The memory page corresponding to the file that has not been accessed for a long time in each domain of each NUMA. | KB | |||
| vmstat.nr_active_file | empty | The memory pages corresponding to files that have been accessed recently. | pages | |||
| vmstat.nr_unevictable | empty | The number of pages in the unevictable (non-LRU) list | count | |||
| vmstat.nr_mlock | empty | Pages mapped into a VM_LOCKED VMA - are a class of unevictable pages. | pages | |||
| vmstat.nr_anon_pages | empty | Memory-mapped pages that are not part of a file. | pages | |||
| vmstat.nr_mapped | empty | The number of memory mapped pages. | count | |||
| vmstat.nr_file_pages | empty | |||||
| vmstat.nr_dirty | empty | Pages waiting to be written to disk | pages | |||
| vmstat.nr_writeback | empty | Pages currently being written to disk | pages | |||
| vmstat.nr_slab_reclaimable | empty | Pages from the kernel slab memory usage that can be reclaimed | pages | |||
| vmstat.nr_slab_unreclaimable | empty | Pages from the kernel slab memory usage that cannot be reclaimed | pages | |||
| vmstat.nr_page_table_pages | empty | Pages allocated to page tables | pages | |||
| vmstat.nr_kernel_stack | empty | Amount of memory allocated to kernel stacks. | KB | |||
| vmstat.nr_unstable | empty | The number of unstable pages in each domain of each NUMA node | count | |||
| vmstat.nr_bounce | empty | |||||
| vmstat.nr_vmscan_write | empty | The number of dirty pages written back during a scan of LRU(s) | count | |||
| vmstat.nr_vmscan_immediate_reclaim | empty | |||||
| vmstat.nr_writeback_temp | empty | |||||
| vmstat.nr_isolated_anon | empty | The number of anonymous memory pages isolated in each domain of each NUMA node | count | |||
| vmstat.nr_isolated_file | empty | The number of pages of file storage pages isolated in each domain of each NUMA node | count | |||
| vmstat.nr_shmem | empty | The number of shared memory pages | count | |||
| vmstat.nr_dirtied | empty | The number of dirty pages in each domain of each NUMA node | count | |||
| vmstat.nr_written | empty | |||||
| vmstat.numa_hit | empty | The number of pages that were successfully allocated to this node. | count | |||
| vmstat.numa_miss | empty | The number of pages that were allocated on this node because of low memory on the intended node. | count | |||
| vmstat.numa_foreign | empty | The number of pages initially intended for this node that were allocated to another node instead. | count | |||
| vmstat.numa_interleave | empty | The number of interleave policy pages successfully allocated to this node. | count | |||
| vmstat.numa_local | empty | The number of pages successfully allocated on this node, by a process on this node | count | |||
| vmstat.numa_other | empty | The number of pages allocated on this node, by a process on another node. | count | |||
| vmstat.workingset_refault | empty | |||||
| vmstat.workingset_activate | empty | |||||
| vmstat.workingset_nodereclaim | empty | |||||
| vmstat.nr_anon_transparent_hugepages | empty | |||||
| vmstat.nr_free_cma | empty | Free contiguous memory allocator (CMA) pages in each domain of each NUMA node | ||||
| vmstat.nr_dirty_threshold | empty | |||||
| vmstat.nr_dirty_background_threshold | empty | |||||
| vmstat.pgpgin | empty | The number of pages brought in from disk | count | |||
| vmstat.pgpgout | empty | The number of pages written out to disk | count | |||
| vmstat.pswpin | empty | The number of pages brought in from swap space | count | |||
| vmstat.pswpout | empty | The number of pages swapped out into swap space | count | |||
| vmstat.pgalloc_dma | empty | |||||
| vmstat.pgalloc_dma32 | empty | |||||
| vmstat.pgalloc_normal | empty | |||||
| vmstat.pgalloc_movable | empty | |||||
| vmstat.pgfree | empty | The number of pages freed since last boot | count | |||
| vmstat.pgactivate | empty | Number of page activations since last boot | count | |||
| vmstat.pgdeactivate | empty | Number of page deactivations since last boot | count | |||
| vmstat.pgfault | empty | Minor faults since last boot | pages | |||
| vmstat.pgmajfault | empty | Major faults since last boot | pages | |||
| vmstat.pglazyfreed | empty | |||||
| vmstat.pgrefill_dma | empty | |||||
| vmstat.pgrefill_dma32 | empty | |||||
| vmstat.pgrefill_normal | empty | Number of page refills since last boot | count | |||
| vmstat.pgrefill_movable | empty | |||||
| vmstat.pgsteal_kswapd_dma | empty | |||||
| vmstat.pgsteal_kswapd_dma32 | empty | |||||
| vmstat.pgsteal_kswapd_normal | empty | |||||
| vmstat.pgsteal_kswapd_movable | empty | |||||
| vmstat.pgsteal_direct_dma | empty | |||||
| vmstat.pgsteal_direct_dma32 | empty | |||||
| vmstat.pgsteal_direct_normal | empty | |||||
| vmstat.pgsteal_direct_movable | empty | |||||
| vmstat.pgscan_kswapd_dma | empty | |||||
| vmstat.pgscan_kswapd_dma32 | empty | |||||
| vmstat.pgscan_kswapd_normal | empty | Number of pages scanned by kswapd since boot | count | |||
| vmstat.pgscan_kswapd_movable | empty | |||||
| vmstat.pgscan_direct_dma | empty | |||||
| vmstat.pgscan_direct_dma32 | empty | |||||
| vmstat.pgscan_direct_normal | empty | Number of pages reclaimed since boot | count | |||
| vmstat.pgscan_direct_movable | empty | |||||
| vmstat.pgscan_direct_throttle | empty | |||||
| vmstat.zone_reclaim_failed | empty | |||||
| vmstat.pginodesteal | empty | |||||
| vmstat.slabs_scanned | empty | |||||
| vmstat.kswapd_inodesteal | empty | |||||
| vmstat.kswapd_low_wmark_hit_quickly | empty | |||||
| vmstat.kswapd_high_wmark_hit_quickly | empty | |||||
| vmstat.pageoutrun | empty | Number of times kswapd called page reclaim | count | |||
| vmstat.allocstall | empty | Number of times page reclaim was called directly (low memory) | count | |||
| vmstat.pgrotated | empty | |||||
| vmstat.drop_pagecache | empty | |||||
| vmstat.drop_slab | empty | |||||
| vmstat.numa_pte_updates | empty | |||||
| vmstat.numa_huge_pte_updates | empty | |||||
| vmstat.numa_hint_faults | empty | |||||
| vmstat.numa_hint_faults_local | empty | |||||
| vmstat.numa_pages_migrated | empty | |||||
| vmstat.pgmigrate_success | empty | |||||
| vmstat.pgmigrate_fail | empty | |||||
| vmstat.compact_migrate_scanned | empty | |||||
| vmstat.compact_free_scanned | empty | |||||
| vmstat.compact_isolated | empty | |||||
| vmstat.compact_stall | empty | The number of times a process stalls to run memory compaction so that a huge page is free for use. | count | |||
| vmstat.compact_fail | empty | The number of times the system tries to compact memory but failed. | count | |||
| vmstat.compact_success | empty | The number of times the system compacted memory and freed a huge page for use. | count | |||
| vmstat.htlb_buddy_alloc_success | empty | |||||
| vmstat.htlb_buddy_alloc_fail | empty | |||||
| vmstat.unevictable_pgs_culled | empty | |||||
| vmstat.unevictable_pgs_scanned | empty | |||||
| vmstat.unevictable_pgs_rescued | empty | |||||
| vmstat.unevictable_pgs_mlocked | empty | |||||
| vmstat.unevictable_pgs_munlocked | empty | |||||
| vmstat.unevictable_pgs_cleared | empty | |||||
| vmstat.unevictable_pgs_stranded | empty | |||||
| vmstat.thp_fault_alloc | empty | The number of huge pages successfully allocated to handle a page fault. | count | |||
| vmstat.thp_fault_fallback | empty | The number of times a page fault failed to allocate a huge page and fell back to using small pages. | count | |||
| vmstat.thp_collapse_alloc | empty | The number of times a range of pages was collapsed into one huge page and a new huge page was successfully allocated to store the data. | count | |||
| vmstat.thp_collapse_alloc_failed | empty | The number of times a range of pages was to be collapsed into one huge page but the allocation failed. | count | |||
| vmstat.thp_split | empty | The number of times a huge page was split into base pages | count | |||
| vmstat.thp_zero_page_alloc | empty | The number of successful allocations of a huge zero page | count | |||
| vmstat.thp_zero_page_alloc_failed | empty | The number of times the kernel failed to allocate a huge zero page and fell back to using small pages. | count | |||
| vmstat.balloon_inflate | empty | |||||
| vmstat.balloon_deflate | empty | |||||
| vmstat.balloon_migrate | empty | |||||
| os.linux.metrics-process-status | proc.<process>.VmSize | process-name | The total amount of virtual memory used by the process | KB | ||
| proc.<process>.VmRSS | process-name | The non-swapped physical memory a process has used | KB | |||
| proc.<process>.VmSwap | process-name | The total amount of swap space used. | KB |
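
Two descriptions in the table above imply a small calculation: memory.freeWOBuffersCaches uses MemAvailable from /proc/meminfo when present and otherwise adds free, buffered, and cached memory, and the load_avg.norm.* metrics divide a load average by the number of CPUs. The following is a minimal Python sketch of those two calculations, not the collector's actual implementation. The function names and the sample text are hypothetical and chosen only for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def free_without_buffers_caches(meminfo):
    """Prefer MemAvailable; otherwise fall back to free + buffers + cached."""
    if "MemAvailable" in meminfo:
        return meminfo["MemAvailable"]
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


def normalized_load(load_avg, cpu_count):
    """Divide a load average by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


# Hypothetical sample without a MemAvailable line, to exercise the fallback.
SAMPLE = """MemTotal:       8000000 kB
MemFree:        1000000 kB
Buffers:         200000 kB
Cached:         1800000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(free_without_buffers_caches(info))  # 3000000
    print(normalized_load(2.0, 4))            # 0.5
```

On a live Linux host, the same functions could be fed the contents of /proc/meminfo and the values of load_avg.one and cpu.cpu_count to cross-check what the checks report.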
Linux network monitoring checks
Note:
When upgrading from an earlier version, manually add the checks in this table to the Linux metrics policy.
| Type | Check | Description | Usage and usage example | Metrics collected | Featured Metric |
|---|---|---|---|---|---|
| Metric | os.linux.metrics-network-interface | Retrieves all network interface related metrics for Linux servers. | Usage: Usage example: | | yes |
| Metric | os.linux.metrics-netstat-tcp | Retrieves metrics on TCP socket states from netstat. Useful on high-traffic web or proxy servers with large numbers of short-lived TCP connections coming and going. | Usage: Usage example: | | no |
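
To illustrate what "metrics on TCP socket states" can look like, here is a minimal Python sketch that tallies sockets per state. It is an assumption-laden illustration, not the check's implementation: the check is described as reading from netstat, whereas this sketch parses text in the /proc/net/tcp format, and the function name and sample rows are hypothetical.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return dict(counts)


# Hypothetical sample: one listener, one established socket, two in TIME_WAIT.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000 00:00000000 00000000  1000        0 23456
   2: 0100007F:1F90 0100007F:C351 06 00000000:00000000 00:00000000 00000000     0        0 0
   3: 0100007F:1F90 0100007F:C352 06 00000000:00000000 00:00000000 00000000     0        0 0
"""

if __name__ == "__main__":
    print(count_tcp_states(SAMPLE))
```

A large TIME_WAIT count relative to ESTABLISHED is the kind of pattern the description above refers to on servers with many short-lived connections.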