Guide: Integrate OpsGenie and NetXMS

Started by graeChris, November 15, 2023, 09:45:56 PM


graeChris

Hello,
After extensive setup and testing, I have implemented an integration with OpsGenie using NetXMS server actions and the OpsGenie API. It is currently a one-way integration (NetXMS to OpsGenie); I may work on a two-way integration in the future.

Key Components:
  • OpsGenie API
  • Custom Scripts in Script Library
  • Event Processing Policy Rules

The most basic use of this is to create an OpsGenie alert when a node is down. This guide walks you through creating, acknowledging, and closing OpsGenie alerts from inside the NetXMS client. I won't go through the steps needed to set up an OpsGenie API integration, but the official guide for that is linked below.

OpsGenie API Setup
You can obtain an API key by following the instructions here:  https://support.atlassian.com/opsgenie/docs/create-a-default-api-integration/
Additional API Docs can be found here: https://docs.opsgenie.com/docs/alert-api

Create Alert
To create an alert in OpsGenie we will need to set up an event processing policy that will call a server action.

Server Action:
Name: OpsGenie Create Alert
Type: Execute command on management server
Options: None
Command:
C:\\Windows\\System32\\curl.exe -v -X POST https://api.opsgenie.com/v2/alerts -H "Content-Type: application/json" -H "Authorization: GenieKey XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" -d "{ \\"message\\": \\"%Z - %m - %n\\", \\"alias\\": \\"%N%g\\", \\"description\\":\\"Ip Address: %a \\n \\n GUID: %g \\n \\n Severity: %S \\n \\n Event Name: %N \\n Event Source Object Name: %n \\n Time: %t \\n Zone Name: %Z\\", \\"entity\\":\\"%n\\", \\"priority\\":\\"%[ConvertSeverity]\\" }"
Important Notes:
·        Message will be the title of the OpsGenie alert. In this example it contains the zone, the event message, and the object name for easy identification when receiving mobile alerts.
·        You MUST use \\ to escape the double quotes inside the JSON body or the request will not submit properly.
·        To be able to acknowledge and close OpsGenie alerts from inside NetXMS, you MUST use an alias that can be referenced later. We will use %N%g for our alias. This gives us a unique alias made of the event name plus the GUID of the source object.
·        In multi-tenant MSP scenarios you can include "Zone Name: %Z" in the command so that OpsGenie can route the alerts to the appropriate teams.
Macros used:
    ·        %n - Name of event source object. Name of interface when interface name is generated using macros.
    ·        %Z - Zone name of event source object.
    ·        %t - Event's timestamp in the form day-month-year hour:minute:second.
    ·        %g - Globally unique identifier (GUID) of event source object.
    ·        %N - Event's name.
    ·        %m - Event's message text.
    ·        %a - IP address of event source object.
    ·        %S - Event's severity code as text.
    ·        %[ConvertSeverity] - Value returned by the ConvertSeverity script from the script library (described below).

You can use additional macros for event processing: https://www.netxms.org/documentation/adminguide/event-processing.html#macros-for-event-processing
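
Before wiring the command into a server action, it can help to see what JSON body the macros expand to. The following is only a sketch in Python, not part of the NetXMS setup: the helper name build_create_payload and all sample values are made up for illustration, each argument simply stands in for one macro, and nothing is sent to OpsGenie.

```python
import json


def build_create_payload(zone, message, node_name, event_name, guid,
                         ip_address, severity_text, timestamp, priority):
    """Return the JSON body the Create Alert command is expected to send.

    Hypothetical helper: each argument stands in for one NetXMS macro
    (%Z, %m, %n, %N, %g, %a, %S, %t) or for %[ConvertSeverity].
    """
    description = (
        f"Ip Address: {ip_address} \n \n GUID: {guid} \n \n "
        f"Severity: {severity_text} \n \n Event Name: {event_name} \n "
        f"Event Source Object Name: {node_name} \n Time: {timestamp} \n "
        f"Zone Name: {zone}"
    )
    payload = {
        "message": f"{zone} - {message} - {node_name}",  # alert title
        "alias": f"{event_name}{guid}",                  # %N%g, reused for ack/close
        "description": description,
        "entity": node_name,
        "priority": priority,
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Sample values only; a real event supplies these through the macros.
    print(build_create_payload(
        "Default", "Node down", "core-sw-01", "SYS_NODE_DOWN",
        "00000000-0000-0000-0000-000000000001", "192.0.2.10",
        "Critical", "15-Nov-2023 21:45:56", "P1"))
```

Comparing this output with what curl -v prints is a quick way to spot a missing escape in the server action command.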

Custom Scripts:
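This script is necessary because the NetXMS and OpsGenie scales run in opposite directions: NetXMS severity codes go from 0 (Normal) up to 4 (Critical), while OpsGenie priorities go from P5 (lowest) to P1 (highest). You can create this script in the script library.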

Script ( ConvertSeverity )

sub main()
{
   /* Default to the lowest OpsGenie priority */
   prioritylevel = "P5";
   switch($event->severity)
   {
      case 0:   /* Normal */
         prioritylevel = "P5";
         break;
      case 1:   /* Warning */
         prioritylevel = "P4";
         break;
      case 2:   /* Minor */
         prioritylevel = "P3";
         break;
      case 3:   /* Major */
         prioritylevel = "P2";
         break;
      case 4:   /* Critical */
         prioritylevel = "P1";
         break;
   }
   return prioritylevel;
}
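
For readers who want to check the mapping outside NetXMS, here is the same conversion as a small Python lookup table. This is only an illustration of the logic, not a replacement for the NXSL script; the names SEVERITY_TO_PRIORITY and convert_severity are made up.

```python
# NetXMS severity code -> OpsGenie priority, mirroring ConvertSeverity.
SEVERITY_TO_PRIORITY = {
    0: "P5",  # Normal
    1: "P4",  # Warning
    2: "P3",  # Minor
    3: "P2",  # Major
    4: "P1",  # Critical
}


def convert_severity(severity):
    """Map a NetXMS severity code to an OpsGenie priority (default P5)."""
    return SEVERITY_TO_PRIORITY.get(severity, "P5")
```

For the five known codes the priority number is simply 5 minus the severity code, which is an easy way to remember the inversion.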





Event Processing Policy

Conditions:
Event Codes:
·        SYS_NODE_DOWN
Filtering Script:

if (IsAlerted@$node == "Yes" )
{
    return true;
}
else
{
     return false;
}


The filtering script checks for a custom attribute on the node called IsAlerted. If the custom attribute does not exist you may see script error alarms. This filter is optional and is used to prevent notification spam. You can set the attribute to "No" for your entire network and then set it to "Yes" only on the nodes you want to be alerted about.
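
The opt-in behaviour of this filter can be pictured with a few lines of Python. This is a sketch under assumptions: the function name should_alert is made up, custom attributes are modelled as a plain dictionary, and a missing attribute is treated as "do not alert" (in NetXMS itself a missing attribute may instead raise a script error, as noted above).

```python
def should_alert(custom_attributes):
    """Return True only when IsAlerted is set to exactly "Yes"."""
    return custom_attributes.get("IsAlerted") == "Yes"


if __name__ == "__main__":
    print(should_alert({"IsAlerted": "Yes"}))  # node opted in
    print(should_alert({"IsAlerted": "No"}))   # node opted out
    print(should_alert({}))                    # attribute missing
```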

Actions:
We will assign two actions to this Event Processing Policy rule. The first action creates an alarm in NetXMS. The second action executes our server action called OpsGenie Create Alert.
Alarm configuration:
    ·        Message: %m
    ·        Alarm key: NODE_DOWN_%i
    ·        Alarm severity: From event
    ·        Alarm timeout: 0
Comments:  Create Alarm and OpsGenie Alert when Node is Down

Make sure to save your EPP Rules.

To acknowledge or close your OpsGenie alert, we have to modify a built-in hook script so that it posts new events when an alarm changes state.

Script Name: Hook::AlarmStateChange
Code:

sub main()
{
   /* Node the events are posted to; 100 is the lookup key used in this setup */
   eventserver = FindNodeObject($node, 100);
   global alarmstate = $alarm->state;
   global sourceAlarm = $alarm->id;
   global eventname = $alarm->eventName;
   global sourceobj = FindObject($alarm->sourceObject);
   /* GUID of the source object; event name + GUID matches the OpsGenie alias (%N%g) */
   global nameobj = sourceobj->guid;

   switch(alarmstate)
   {
      /* Alarm State is Outstanding */
      case "0":
         break;
      /* Alarm State is Acknowledged */
      case "1":
         PostEvent(eventserver, "Xms_Alarm_Ack", "ACKALARM", eventname . nameobj);
         break;
      /* Alarm State is Resolved */
      case "2":
         PostEvent(eventserver, "Xms_Alarm_Resolve", "RSLVALARM", eventname . nameobj);
         break;
      /* Alarm State is Sticky Acknowledged */
      case "17":
         PostEvent(eventserver, "Xms_Alarm_StickyAck", "SACKALARM", eventname . nameobj);
         break;
   }
}


We also have to create these events in Configuration->Event templates.

Events Needed:
·        Xms_Alarm_Ack
    o    Message: ALARM ACK - Alarm: %1
    o    Write to log: TRUE
    o    Description: Event created when an Alarm is acknowledged
    o    Severity: Normal
·        Xms_Alarm_StickyAck
    o    Message: ALARM STICKY_ACK - Alarm: %1
    o    Write to log: TRUE
    o    Description: Event created when an Alarm is sticky acknowledged
    o    Severity: Normal
·        Xms_Alarm_Resolve
    o    Message: ALARM RESOLVED - Alarm: %1
    o    Write to log: TRUE
    o    Description: Event created when an Alarm has been resolved
    o    Severity: Normal
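
Taken together, the hook and the rules described in this guide reduce each alarm state change to one OpsGenie call on the alias built from the event name and the source object GUID. The Python sketch below only restates that mapping for clarity; the state codes are the ones used in the hook script, while the function names are made up for illustration.

```python
# Alarm state codes as they appear in the hook script.
STATE_OUTSTANDING = 0
STATE_ACKNOWLEDGED = 1
STATE_RESOLVED = 2
STATE_STICKY_ACKNOWLEDGED = 17

# Which OpsGenie operation each state change should end up triggering.
STATE_TO_ACTION = {
    STATE_ACKNOWLEDGED: "acknowledge",
    STATE_STICKY_ACKNOWLEDGED: "acknowledge",
    STATE_RESOLVED: "close",
}


def opsgenie_action_for_state(state):
    """Return "acknowledge", "close", or None when nothing is sent."""
    return STATE_TO_ACTION.get(state)


def build_alias(event_name, source_guid):
    """Rebuild the alias used at alert creation (%N%g): event name + GUID."""
    return f"{event_name}{source_guid}"
```

The important design point is that the alias is rebuilt from the alarm's own data, so no OpsGenie alert ID has to be stored in NetXMS.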


Acknowledge Alert
When an alarm is acknowledged or sticky acknowledged in NetXMS, send an API call via curl to OpsGenie to acknowledge the alert.

Server Action:
Name: OpsGenie Acknowledge Alarm
Type:  Execute command on management server
Command:
C:\\Windows\\System32\\curl.exe -v -X POST https://api.opsgenie.com/v2/alerts/%1/acknowledge\?identifierType=alias -H "Content-Type: application/json" -H "Authorization: GenieKey XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" -d "{ \\"user\\": \\"netXMS Admin\\", \\"source\\": \\"netXMS\\", \\"note\\":\\"Acknowledged via Alert API from netXMS \\n Alarm ID:%1 \\n Time: %t\\" }"
 
Macros Used:
     ·         %1 - First event parameter: the alarm's source event name + GUID of the source object (the OpsGenie alias).
     ·         %t - Event's timestamp in the form day-month-year hour:minute:second.
 
Event Processing Policy Rule
Conditions:
 Event Code:
     ·         Xms_Alarm_Ack
     ·         Xms_Alarm_StickyAck
Actions:
 Execute Server Action called OpsGenie Acknowledge Alarm

Comments: Acknowledge OpsGenie alert when NetXMS alarm is acknowledged


Close Alert
When an alarm is resolved in NetXMS, send an API call via curl to OpsGenie to close the alert.
Server Action:

Name: OpsGenie Close Alert
Type:  Execute command on management server
Command:
C:\\Windows\\System32\\curl.exe -v -X POST https://api.opsgenie.com/v2/alerts/%1/close\?identifierType=alias -H "Content-Type: application/json" -H "Authorization: GenieKey XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" -d "{ \\"user\\": \\"netXMS Admin\\", \\"source\\": \\"netXMS\\", \\"note\\":\\"Alert Closed via Alert API from netXMS \\n Alarm ID:%1 \\n Time: %t\\" }"
Macros Used:
·         %1 - First event parameter: the alarm's source event name + GUID of the source object (the OpsGenie alias).
·         %t - Event's timestamp in the form day-month-year hour:minute:second.
 
Event Processing Policy Rule
Conditions:
 Event Code:
     ·         Xms_Alarm_Resolve

Actions:
Execute Server Action called OpsGenie Close Alert

Comments: Close OpsGenie alerts for resolved Alarms
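
To test the acknowledge and close calls outside NetXMS, the two curl commands can be approximated with Python's standard library. This is a sketch, not what NetXMS runs: the helper name build_alias_request is made up, the API key is a placeholder, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.opsgenie.com/v2/alerts"


def build_alias_request(alias, action, api_key, note):
    """Build (but do not send) an acknowledge or close request by alias."""
    if action not in ("acknowledge", "close"):
        raise ValueError(f"unsupported action: {action}")
    # identifierType=alias tells OpsGenie the path segment is an alias,
    # not an alert ID.
    url = (f"{API_BASE}/{urllib.parse.quote(alias, safe='')}/{action}"
           "?identifierType=alias")
    body = json.dumps({"user": "netXMS Admin", "source": "netXMS", "note": note})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
    )


if __name__ == "__main__":
    req = build_alias_request(
        "SYS_NODE_DOWN00000000-0000-0000-0000-000000000001",
        "close", "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
        "Alert Closed via Alert API from netXMS")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send the request, so only do that with a real key and an alias you intend to change.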
 


Known issues/limitations:
·         Alert status change in OpsGenie does not change alarm status in NetXMS.
·         Custom OpsGenie routing rules may cause priority to be assigned improperly if priority tag is not used for severity calculation in OpsGenie.
 
Notes:
Creating additional custom attributes and filtering scripts can give better control over which events result in OpsGenie notifications. For example, servers and switches may be a higher priority than desktops or laptops.


I'll be updating this using correct syntax for NetXMS v5.0.