Problem Escalation setup issue

Started by sharpspro, August 17, 2015, 02:07:54 AM


sharpspro

I set up Problem Escalation following this YouTube video: https://www.youtube.com/watch?v=KVV6R1RwKjc

In my test scenario, when the service goes down it alerts Support after 1 minute, and if the alarm is not acknowledged within 1 hour it alerts the manager.

I got this working fine. The only problem is that the support team is not notified when the service comes back up. It terminates the alarm when it receives the APP_NORMAL event, but it does not notify Support unless I create a server action in the "APP back to normal" event processing policy. The issue with that is that if a service goes down and comes back up within the 1 minute, it still sends Support an "APP Back to Normal" alert, even though they were never notified that it was down.
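The behaviour being asked for comes down to one rule: a "back to normal" notice should go to Support only if the outage lasted long enough for the "down" alert to go out. The Python sketch below models that rule outside NetXMS. The `Outage` class, the `notifications` function and the message strings are hypothetical, and it assumes the 1-hour escalation is counted from the moment the service went down.

```python
from dataclasses import dataclass
from typing import List, Optional

NOTIFY_DELAY = 60       # seconds before Support is alerted (1 minute in the post)
ESCALATE_DELAY = 3600   # seconds before the manager is alerted if unacknowledged (1 hour)


@dataclass
class Outage:
    down_at: int                    # when the service went down, in seconds
    up_at: int                      # when the service came back up, in seconds
    acked_at: Optional[int] = None  # when the alarm was acknowledged, if ever


def notifications(outage: Outage) -> List[str]:
    """Messages the desired escalation would send for one outage, in order."""
    sent: List[str] = []
    duration = outage.up_at - outage.down_at
    if duration <= NOTIFY_DELAY:
        # Recovered within the delay: Support never heard about the outage,
        # so no "back to normal" message should go out either.
        return sent
    sent.append("support: service down")
    # Assumption: escalation time is measured from when the service went down.
    escalate_at = outage.down_at + ESCALATE_DELAY
    acked_in_time = outage.acked_at is not None and outage.acked_at < escalate_at
    if outage.up_at > escalate_at and not acked_in_time:
        sent.append("manager: service down")
    # Support was told about the outage, so it is also told about the recovery.
    sent.append("support: back to normal")
    return sent


if __name__ == "__main__":
    print(notifications(Outage(down_at=0, up_at=30)))    # short blip
    print(notifications(Outage(down_at=0, up_at=7200)))  # long, unacknowledged outage
```

The short blip produces no messages at all, which is exactly the case the current setup gets wrong.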

I attached a screen shot.



multix

Hi. I don't know if this is the easiest way, but here is something you can try, step by step:


1. Create a new event named INFORM_APP_NORMAL (note this event's ID).
2. Create a script named minutecheck.
    The script should look like this:



// Build the alarm key: "EO_SERVICE_0x" followed by the node ID as 8 hex digits
alarmKey = "EO_SERVICE_0x" . d2x($node->id, 8);

alarm = FindAlarmByKey(alarmKey);

if (alarm != null) // skip everything if there is no alarm for this node
{
   // Severity above 1 (Warning), i.e. Minor (2) or higher
   if (alarm->severity > 1)
   {
      // Replace 100000 (a placeholder value) with the numeric ID of the INFORM_APP_NORMAL event.
      // The strings after null are passed to the event as parameters %1, %2, %3, and so on;
      // pass real values here, otherwise the notification shows these literal strings.
      PostEvent($node, 100000, null, "param1", "param2", "param3");
   }

   // Terminate the alarm in every case, so the event processing policy does not need to.
   alarm->terminate();
}




3. Save the script.
4. Create a new action of type "Execute NXSL script", give it any name (you can use minutecheck again), and enter the script's name (in this example, minutecheck).
5. In the Event Processing Policy section, edit the rule for EO_SERVICE_NORMAL.
    Choose "Do not change Alarms" in the alarm section, and in the server actions section remove "Notify Support Analyst" and add the "minutecheck" action, so that minutecheck is the only action in the EO_SERVICE_NORMAL rule.
6. Create a new Event Processing Policy rule for the event you created (in this example, INFORM_APP_NORMAL).
    This rule needs only one server action: "Notify Support Analyst".
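The minutecheck script makes a decision that fits in a small table. The Python function below models that logic only. It is not NXSL and does not talk to NetXMS, and its name simply mirrors the script. It uses the standard NetXMS severity scale (0 Normal, 1 Warning, 2 Minor, 3 Major, 4 Critical). It also covers the case where FindAlarmByKey finds no alarm, where reading alarm->severity or calling alarm->terminate() on a null alarm would likely fail with a script error.

```python
from typing import Optional, Tuple

# NetXMS alarm severities: 0 Normal, 1 Warning, 2 Minor, 3 Major, 4 Critical
WARNING = 1


def minutecheck(alarm_severity: Optional[int]) -> Tuple[bool, bool]:
    """Model of the minutecheck decision for one EO_SERVICE_NORMAL event.

    alarm_severity is None when no alarm exists for the node's key.
    Returns (post_inform_app_normal, terminate_alarm).
    """
    if alarm_severity is None:
        # No alarm: nothing to notify and nothing to terminate.
        return (False, False)
    # Post INFORM_APP_NORMAL only for Minor (2) or higher.
    post_event = alarm_severity > WARNING
    # The alarm is terminated whether or not the event is posted.
    return (post_event, True)


if __name__ == "__main__":
    for severity in (None, 0, 1, 2, 3, 4):
        print(severity, minutecheck(severity))
```

Using severity as the signal assumes that the escalation in this setup raises the alarm to Minor or higher only once Support has actually been notified. If the alarm is created at Minor or above from the start, INFORM_APP_NORMAL would be posted even for outages shorter than a minute.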

That's all, I think.

sharpspro

Thanks a lot for the info. Unfortunately I have not tried it yet because of my schedule. I just wanted to thank you for your input, and I will update this thread with my results.

sharpspro

I tried your suggestions and ran into a couple of issues.

1. After the 1-minute wait, an email is sent out. When the service comes back up, an alert is sent and the alarm is cleared as it is supposed to be, BUT the email alert says
  • CLIENTSERVER01: param2
  and not the information I would normally get.

2. When the service goes down and comes back up before the 1-minute email alert is sent, another email is still sent showing
  • CLIENTSERVER01: param2
  even though no alert was ever sent saying it was down.
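The text "CLIENTSERVER01: param2" is consistent with the action's message template containing something like `%n: %2`. In NetXMS message templates, %n expands to the node name and %1, %2, ... to the event's parameters, and the suggested script posts INFORM_APP_NORMAL with the literal strings "param1", "param2", "param3". The actual template is not shown in the thread, so this is an assumption. The Python function below is a simplified, hypothetical model of that substitution. It handles only %n and %1 to %9 and is not NetXMS code.

```python
import re
from typing import List


def expand(template: str, node_name: str, params: List[str]) -> str:
    """Expand %n (node name) and %1..%9 (event parameters) in a template.

    Simplified model: real NetXMS supports many more macros.
    """
    def repl(match) -> str:
        code = match.group(1)
        if code == "n":
            return node_name
        index = int(code) - 1
        # Missing parameters expand to an empty string in this model.
        return params[index] if index < len(params) else ""

    return re.sub(r"%([n1-9])", repl, template)


if __name__ == "__main__":
    # The placeholder strings from the suggested script reproduce the reported text.
    print(expand("%n: %2", "CLIENTSERVER01", ["param1", "param2", "param3"]))
```

If this is what is happening, passing meaningful values to PostEvent instead of the placeholder strings should make the notification read normally.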

multix

Can you send the Event Processing Policy section and the threshold section of your DCIs, please?

sharpspro

Here you go. Thanks a lot for your time.

sharpspro

I am also getting a script error alarm. Please see the attached screenshot.