Exploring Event Management

  • Release version: Xanadu
  • Updated August 1, 2024

    Explore Event Management to understand its overview, process flow, user roles, and benefits for comprehensive IT issue monitoring and resolution.

    Event Management provides comprehensive monitoring, analysis, and remediation of IT issues by effectively managing various components within an IT environment. These components include discovered services, application services, dynamic CI groups, and alert groups.
    • Discovered Services: A discovered service is a set of interrelated configuration items (CIs) from the Configuration Management Database (CMDB), identified through Service Mapping. It includes a service map with mapping relationships, an impact tree showing outage severity, active and related alerts, and CI properties. This service information is displayed on dashboards, the Alerts list, and the Events list.
    • Application Services: Created by selecting specific CIs, application services allow for targeted monitoring and management. For more details, refer to the Application Services documentation.
    • Dynamic CI Groups: These are collections of CIs grouped based on shared criteria, such as location. Dynamic CI groups help populate application services, simplifying management.
    • Alert Groups: Alert groups organize sets of alerts to streamline maintenance and management, making it easier to respond to IT issues efficiently.
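
The idea of grouping CIs by shared criteria can be illustrated with a minimal sketch. This is not how the platform implements dynamic CI groups; the `dynamic_ci_group` function and the dictionary-based CI records are hypothetical names chosen only for illustration.

```python
def dynamic_ci_group(cis, **criteria):
    """Return the CIs whose attributes match every given criterion.

    `cis` is a list of plain dictionaries standing in for CI records
    (an assumption for this sketch, not a CMDB API).
    """
    return [
        ci for ci in cis
        if all(ci.get(attr) == value for attr, value in criteria.items())
    ]


# Hypothetical CI records for illustration only.
EXAMPLE_CIS = [
    {"name": "web01", "location": "London", "os": "linux"},
    {"name": "web02", "location": "Paris", "os": "linux"},
    {"name": "db01", "location": "London", "os": "windows"},
]
```

Calling `dynamic_ci_group(EXAMPLE_CIS, location="London")` selects `web01` and `db01`. Because membership is computed from the criteria rather than listed by hand, a CI added later with a matching location would join the group automatically.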

    Process flow

    Event Management receives external events and generates alerts based on predefined rules. The MID Server polls external event tracking tools and sends data to Event Management for storage and processing. Events are stored in the Event [em_event] table, and alerts are created by matching event rules. Alerts are then transformed and enriched with additional content, accumulated if thresholds are met, and mapped to specific fields. The system searches for matching message keys to update existing alerts or create new ones, associating related events under a single alert. Alerts are bound to specific Configuration Items (CIs) for root cause analysis. For more information, see Event Management process flow.
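
The message key step described above can be sketched in a few lines. This is a simplified in-memory model, not the product's implementation: the `Alert` class, the `process_event` function, and the fallback list of fields used to derive a key are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical in-memory stand-in for an alert record."""
    message_key: str
    severity: int
    events: list = field(default_factory=list)


def message_key_for(event: dict) -> str:
    """Use the event's own message key if present, else derive one.

    The fallback field list is an assumption for this sketch.
    """
    if event.get("message_key"):
        return event["message_key"]
    parts = (event.get(name, "") for name in ("source", "node", "type", "resource"))
    return "_".join(parts)


def process_event(event: dict, alerts: dict) -> Alert:
    """Update the alert with a matching message key, or create a new one."""
    key = message_key_for(event)
    alert = alerts.get(key)
    if alert is None:
        # No existing alert for this key: create one.
        alert = Alert(message_key=key, severity=event["severity"])
        alerts[key] = alert
    else:
        # Matching alert found: the latest event updates its severity.
        alert.severity = event["severity"]
    # Either way, the event is associated with that single alert.
    alert.events.append(event)
    return alert
```

Passing two events with the same source, node, type, and resource through `process_event` with a shared `alerts` dictionary yields one alert holding both events, while an event from a different node produces a second alert.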

    Users

    Role title [name]: Description

    • Admin [evt_mgmt_admin]: Configures and sets up Event Management properties and rules.
      Note: Exercise caution with the evt_mgmt_admin role, as it can be elevated to the admin role. A user with the evt_mgmt_admin role has the ability to add and modify scripts that run on a global scope. Ensure proper access control. With this role, the user can create and/or update the following scripts:
      • Alert correlation rules
      • Alert management rules
      • Maintenance rules
      • Advanced scripts
      • Event field mapping
      • Pre- and post-binding scripts
    • Operator [evt_mgmt_operator]: Manages alerts, including closing and acknowledging them.
    • User [evt_mgmt_user]: Manages the lifecycle of alerts, including performing basic operations such as viewing and acknowledging them.

    Benefits

    • Rapid Issue Detection: Quickly identifies and highlights potential IT issues.
    • Efficient Alert Handling: Aggregates and correlates alerts for streamlined management.
    • Automated Actions: Initiates automatic remediation processes to speed up issue resolution.
    • Comprehensive Monitoring: Integrates with multiple tools for a complete system overview.
    • Root Cause Analysis: Offers tools to identify underlying causes of issues.
    • Customizable Rules: Tailors event and alert management rules to specific needs.
    • Reduced Downtime: Minimizes system downtime with prompt problem resolution.
    • Enhanced Visibility: Real-time dashboards offer insights into system health.
    • Cost Efficiency: Lowers operational costs by preventing prolonged issues.