
Using metrics alongside logs is an important part of building AIOPS capabilities. With ServiceNow ITOM Health or AIOPS, anomaly detection can play an important role in driving operational efficiency by reducing MTTD and MTTR.
Metrics can come from many sources. If you are using Microsoft Azure as a cloud provider, getting Azure Monitor Metrics into ServiceNow efficiently is important. This can be accomplished with a new capability that uses Agent Client Collector in proxy and multi-CI mode to collect metrics from the Azure Batch API. Up to 20 metrics for 50 CIs can now be collected in one API call. This greatly improves the efficiency of the mechanism, so significantly fewer ACC proxy agents are required to collect the Azure Monitor Metrics.
This is a guide to setting up Agent Client Collector (ACC-M) for gathering Metrics from Microsoft Azure Batch API.
Currently this feature is available for Azure Virtual Machines, Storage Accounts, Load Balancers, Application Gateway and Redis.
Prerequisites
- ServiceNow instance with ITOM Health or AIOPS installed
- Minimum Agent Client Collector Monitoring 3.10.4 (there are improvements in later versions)
- Service Operation Workspace installed
- MID Server is set up
- Azure Cloud Discovery has been run (we need to know which Azure resources to collect metrics for)
- Agent Client Collector running on a Linux VM (to follow this guide)
Setup
To set up the ACC proxy agent to collect Azure metrics from the Azure Batch API, the following steps need to be completed:
- Set up MID (not included in this guide, but see Reference below for some info)
- Set up Cloud Discovery (not included in this guide, but see Reference below for some info)
- Configure MID for ACC and Metric Intelligence (not included in this guide, but see Reference below for some info)
- Set up ACC on a VM (not included in this guide, but see Reference below for some info)
- Set up the ACC Policy for Azure Batch API collection
- View results
Azure credentials
In order for the ACC to read from the Azure Batch API it needs to have the right credentials. You may need to get these from your Cloud team or if you already have Azure Discovery running they may already be present.
Set up a credential of the type “Azure Service Principal” under Discovery -> Credentials.
Cloud provisioning and Governance-> Service Accounts
Currently there is no exact specification of the Role required in Azure for reading Metric data from the Batch API. I have used the Reader role.
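To check by hand that a Service Principal can authenticate before wiring it into a policy, a client-credentials token request can be assembled as in the sketch below. This is not a description of how ACC obtains its token internally (the guide does not say). The token URL and form fields are the standard Microsoft identity platform ones, the scope value is an assumption based on Microsoft's documentation for the metrics data plane, and `build_token_request` is a hypothetical helper name. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumption: audience of the Azure Monitor metrics data plane.
METRICS_SCOPE = "https://metrics.monitor.azure.com/.default"


def build_token_request(tenant_id, client_id, client_secret, scope=METRICS_SCOPE):
    """Return (url, form_body) for an OAuth2 client-credentials token request.

    Only builds the request; sending it (curl, Postman, urllib) is up to you.
    """
    for name, value in (("tenant_id", tenant_id),
                        ("client_id", client_id),
                        ("client_secret", client_secret)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body
```

The returned body is meant to be posted with the content type `application/x-www-form-urlencoded`. A successful response containing an access token only shows that the credentials are valid; it does not prove the assigned role can read metrics.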
Locating ACC Check for Metrics in Azure Monitor via Batch API
Review the check that runs: Agent Client Collector-> Check Definitions
Open the “Azure Metrics Collector” and review the check that runs on the ACC.
List of Policies
Agent Client Collector-> Policies
Added column “Multi-CI mode”. Note how Policy “Azure VM Metrics” is running on Proxy agent “Agent_AccLinuxMetricsClient”.
Each CI type has its own policy because the Azure Batch API can only be called with one CI type at a time.
Running ACC-M in proxy mode
When ACC is running in proxy mode it can use most of the resources of this VM for processing. This is different from ACC-M running checks for the local VM where resource consumption must be minimised. See the first comment in the blue box below:
Assigning the Policy to the Agent
To assign the Policy to a Proxy Agent, “Edit in Sandbox” and assign the ACC of your choice. At some point the agent should show up in “Agents”.
Now that the Policy is assigned to an ACC that will run it, we will configure which Azure VMs we will collect Metrics for.
If we Save, Activate and Publish the Policy, it will start executing. Let's see the results.
Please allow a few minutes for the first results to show up.
Service Operations Workspace-> AIOPS Dashboard ->Microsoft Azure Monitoring
Success, Nice!
Now how to drive anomaly detections from metrics is for another blog.
The rest of the document contains a lot of detail. If you just wanted to get it running, feel free to stop reading.
Deeper dive
High level design
The diagram details how the ServiceNow instance, MID Server, ACC and Azure components interact.
During configuration, two configuration files are created: a list of resources to collect metrics for, and a list of metrics to collect for those resources. They play an important role; see more details below.
Azure and Metrics
Both Azure and ServiceNow have agents. ServiceNow’s ACC-M agent can collect Guest OS metrics. Azure by default collects Metrics on VM Instance level.
Azure Monitor Agent:
https://learn.microsoft.com/nl-nl/azure/azure-monitor/agents/agents-overview
ACC-M | Local Guest OS metric collection by ACC checks into the ServiceNow platform via MID Server
Azure Monitor Agent | Local Guest OS metric collection into the Azure Monitor Metrics database
ACC-M Proxy | ACC reads the metric data from Azure Monitor on a CI-by-CI basis
ACC-M Proxy with Batch API support | ACC reads the metric data from Azure Monitor via the Azure Batch API. 50 CIs with 20 metrics max per API call.
Azure Batch API documentation
https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/metrics-index
Note: Azure documentation does not specify which metrics are supported by the Azure Batch API. You need to call the Azure Batch API (with Postman or curl) for an Azure object and its metrics to see whether it supports Batch API access.
Note 2: This functionality can quite easily be extended to other Azure services, provided they have Batch API support for their metrics. See the config files that need to be created for a new service below.
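For the trial call mentioned in the first note, the request can be put together as in the sketch below, which only builds the URL and JSON body for one batch call so they can be pasted into Postman or curl. The endpoint shape (regional host, `metrics:getBatch`, the `resourceids` body field) and the default `api-version` are assumptions taken from Microsoft's public REST reference as I understand it, so verify them against the current documentation. `build_batch_request` is a hypothetical helper, and the region, subscription and resource ID in the usage note are placeholders.

```python
import json
from urllib.parse import quote, urlencode

MAX_RESOURCES = 50  # per-call CI limit stated in this guide
MAX_METRICS = 20    # per-call metric limit stated in this guide


def build_batch_request(region, subscription_id, namespace, metric_names,
                        resource_ids, api_version="2023-10-01"):
    """Return (url, json_body) for one batch metrics call. Nothing is sent."""
    if not 1 <= len(resource_ids) <= MAX_RESOURCES:
        raise ValueError(f"need 1..{MAX_RESOURCES} resource IDs per call")
    if not 1 <= len(metric_names) <= MAX_METRICS:
        raise ValueError(f"need 1..{MAX_METRICS} metric names per call")
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({
        "api-version": api_version,
        "metricnamespace": namespace,
        "metricnames": ",".join(metric_names),
    }, quote_via=quote)
    url = (f"https://{region}.metrics.monitor.azure.com"
           f"/subscriptions/{subscription_id}/metrics:getBatch?{query}")
    body = json.dumps({"resourceids": list(resource_ids)})
    return url, body
```

For example, `build_batch_request("westeurope", "sub-1", "Microsoft.Compute/virtualMachines", ["Percentage CPU"], ["/subscriptions/sub-1/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm1"])` produces a request for one VM and one metric. Posting it with a bearer token and checking whether the response contains data or an error is one way to find out whether a resource type supports batch access.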
Configuration files
Two files are constructed by the check and passed to the proxy agent as input for the running check. acc_azure_static_vm_config.json contains the metrics being collected. The file starting with AzureVMMetrics_.... contains the objects (VMs for this check) that are added dynamically based on the filter in the policy. See examples below:
The files below can be changed manually if more, fewer or different metrics need to be collected, or to support Azure objects beyond those supported out of the box. A copy can be made, and the Check can be pointed at the updated file or at the copy, as appropriate.
The file below is dynamically created (and edited slightly for easy viewing):
The Check Instance is the place where the static config file and the credentials are configured to the Policy. Check parameters:
Check Secure Parameters:
Troubleshooting
Debugging and well-known failure modes
- Customers can enable the azure-metrics-collector check log by updating the check instance configuration inside each Azure policy.
The command should be updated to use the following flag:
azure-metrics-collector --nolog=false
- The check log is written directly to the agent log. Customers can collect it from the instance by grabbing the agent logs.
Possible points of failure:
- Wrong Azure Credentials (Fix: Get credentials that allow reading the metrics)
- No CIs in the CMDB (Fix: set up Discovery)
- The multi-CI mode configuration file script is failing
- The metrics configuration file has a corrupted format or contains incorrect metric definitions (metrics unsupported by the Azure Batch API)
- Your credentials cover only some of the resources (Fix: duplicate the policy, assign different credentials per policy, and filter each policy to the resources its credentials cover)
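A corrupted configuration file is the easiest of these failure modes to rule out. The sketch below only checks that a file's text is well-formed JSON with an object or array at the top level. It does not validate metric names or any ACC-specific schema, since that schema is not specified in this guide, and `check_config_text` is a hypothetical helper name.

```python
import json


def check_config_text(text):
    """Return None if text is well-formed JSON, else a short error message."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    # A bare number or string parses as JSON but is not a usable config file.
    if not isinstance(parsed, (dict, list)):
        return "expected a JSON object or array at the top level"
    return None
```

To use it on a downloaded file, read the file with `open(path, encoding="utf-8").read()` and pass the text in. A `None` result means the file parses, not that the metrics inside are supported by the Batch API.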
Important things to know
- CIs need to be in the CMDB, otherwise metric collection will not occur. Discovery needs to be run regularly (Cloud Discovery plus event-based discovery, or a Service Graph Connector).
- Not all resources in Azure support the Batch API. This is currently not documented by Microsoft (find out by trial and error).
- In ACC proxy mode the assumption is that ACC can use all resources on the host.
- One Batch API call returns multiple metrics (max 20) for multiple CIs (max 50), but all CIs need to be of the same type (that is how the Azure Batch API works).
- Azure allows an API request for only one Azure Metric location at a time. If the CIs are in 3 locations, that will result in 3 Batch API queries.
- Subscriptions are not relevant.
- Credentials are needed for initial setup (Tenant ID, Client ID, Secret key).
- A MID Server is required as it plays an important role in the Metric Intelligence data pipeline.
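The batching rules in the list above (one CI type and one location per call, at most 50 CIs and 20 metrics per call) can be illustrated with a small planning sketch. It is not the collector's actual implementation. The function names and the tuple layout are hypothetical, and the call count is a lower bound under the assumption that metrics beyond 20 need additional calls.

```python
from collections import defaultdict
from math import ceil

MAX_CIS_PER_CALL = 50
MAX_METRICS_PER_CALL = 20


def plan_batches(cis):
    """Group (name, ci_type, location) tuples into per-call batches.

    Returns a list of (ci_type, location, [names]) with at most 50 names each.
    """
    groups = defaultdict(list)
    for name, ci_type, location in cis:
        groups[(ci_type, location)].append(name)
    batches = []
    for (ci_type, location), names in sorted(groups.items()):
        for start in range(0, len(names), MAX_CIS_PER_CALL):
            chunk = names[start:start + MAX_CIS_PER_CALL]
            batches.append((ci_type, location, chunk))
    return batches


def calls_needed(cis, metric_count):
    """Lower bound on API calls per collection cycle."""
    metric_chunks = ceil(metric_count / MAX_METRICS_PER_CALL)
    return len(plan_batches(cis)) * metric_chunks
```

For instance, 120 VMs in one location, 10 VMs in a second location and 5 storage accounts in the first location give 3 + 1 + 1 = 5 batches, so 5 calls per cycle for up to 20 metrics and 10 calls for 21 to 40 metrics.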
Running in local mode
It is possible to run the azure-metrics-collector in local mode for debugging purposes when you log into the Linux VM running the ACC checks.
Load the two configuration files from the ServiceNow instance (Agent Client Collector -> Configuration Files) and put them in a directory named config-files. The directory needs to be created first. Sample content of these files is shown in the Configuration files section above.
The screenshot below shows how to run the azure-metrics-collector in local mode.
Content of the run.sh file for easy copy and paste:
#!/bin/bash
export AZURE_TENANT_ID='xxxxxxxx-9979-491e-8683-d8ced0850bad'
export AZURE_CLIENT_ID='xxxxxxxx-244c-4615-b73b-25967c0ded29'
export AZURE_CLIENT_SECRET='xxxxxxxxxxxxxxxxxxxxxxxxxxx'
/var/cache/servicenow/agent-client-collector/monitoring-plugin-azure-metrics-collector/bin/azure-metrics-collector --local -l info -c acc_azure_static_vm_config.json -r AzureVMMetrics_RL.json -i 60 -w 120
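Since local mode reads the credentials from environment variables, a quick check that all three are set can save a confusing failed run. The sketch below assumes the variable names exported in run.sh above are the ones the collector reads; `missing_credentials` is a hypothetical helper, not part of the tool.

```python
import os

# Variable names as exported in run.sh.
REQUIRED_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_credentials()` with no argument inspects the current process environment. An empty list means all three variables are present, although it says nothing about whether the values are valid in Azure.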
An example run with log level “info” can be seen below
Reference
Documentation links to setup MID, Cloud Discovery, ACC and Metric intelligence
I recommend setting up a Windows MID Server (2 CPUs, 4 GB) and a Linux ACC client (1 CPU, 1 GB) for functional testing.
Setup Windows MID server:
https://docs.servicenow.com/bundle/washingtondc-servicenow-platform/page/product/mid-server/concept/...
Create a Windows service account with "Log on as Service":
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0867669
Setup Agent Client Collector
Do not forget to open the inbound firewall port on the Windows MID server. For example:
It is sufficient to run ACC in basic discovery mode. If the ACC has reported itself correctly, you can move on. See the Agent Health Dashboard.
A performance test has been conducted with the following results.
Host OS - Linux
Host Spec - Ubuntu OS 20, 8 CPUs, 16GB RAM
Test Duration - 24 hrs
Policy Name- Azure VM Metrics (Linux Proxy Agent)
Check Name - Azure Metrics Collector
Number of VMs - 10K
Number of checks - 1
Number of metrics / minute - 350K
Network utilization – Tx (Agent -> MID) 16 MB/s
Network utilization – Rx (Agent <- MID) 18 MB/s
Memory consumption ~180 MB
CPU of all checks 0%
Process CPU utilization ~70%
Host CPU utilization ~95%
Check help page
./azure-metrics-collector -h
A tool to collect Azure metrics and forward them to the acc agent
Usage:
azure-metrics-collector [flags]
Flags:
-g, --agg string The list of aggregation types (comma separated) to retrieve. Examples: average,minimum,maximum,count,total (default "average")
-h, --help help for azure-metrics-collector
-i, --interval int Interval between metric collections in seconds (default 60)
-l, --ll string Provide log level. Possible values: debug, info, warn, error, fatal, trace (default "info")
--local Local mode. If true, Credentials will be collected from environment variables. If false, credentials will be collected from stdin
-c, --mc string Name of the config file contains namespace and list of metrics (default "acc_azure_static_config.json")
-p, --mp string metric prefix to be added to the metric name
-m, --mpr int Max number of metrics per request (Azure Default: 20) (default 20)
--nolog Skip logging to ACC. If true, the logs will not be sent to ACC. If false, the logs will be sent to ACC, default is true (default true)
-n, --npr int Number of parallel requests to Azure API (default 100)
-r, --rc string Name of the config file with the list of resources to collect metrics for (default "acc_azure_check_config.json")
-s, --sci int Sync resources config file interval in seconds (default 60)
--scv Skip certificate validation. If true, the certificate validation will be skipped. If false, the certificate validation will be enabled
-w, --sw int Sliding window in seconds to collect metrics from Azure Monitor (default 1)
-t, --timeout int Max number of seconds to wait for a response from Azure API (default 30)
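When experimenting with different flag combinations, it can help to assemble the command line programmatically so that typos in flags are caught early. The sketch below uses only flags that appear in the help output above. `build_command` is a hypothetical helper, and its default interval and sliding window mirror the run.sh example in this guide rather than the tool's own defaults.

```python
import shlex

# Log levels listed in the help output for -l/--ll.
LOG_LEVELS = {"debug", "info", "warn", "error", "fatal", "trace"}


def build_command(binary, metrics_config, resources_config,
                  interval=60, sliding_window=120, log_level="info", local=True):
    """Return the argument list for one azure-metrics-collector invocation."""
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {log_level}")
    if interval <= 0 or sliding_window <= 0:
        raise ValueError("interval and sliding_window must be positive")
    args = [binary]
    if local:
        args.append("--local")  # credentials come from environment variables
    args += ["-l", log_level,
             "-c", metrics_config,
             "-r", resources_config,
             "-i", str(interval),
             "-w", str(sliding_window)]
    return args
```

The list can be passed directly to `subprocess.run`, or turned into a copy-and-paste string with `shlex.join(args)`.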