Don't create/update alarm when there's an outstanding alarm

Started by doki, February 24, 2015, 05:27:34 AM


doki

Hi,

I am currently monitoring our 71 Aruba APs, which are controlled by a master Aruba controller. I managed to get the total number of APs from the controller via SNMP, and created an alarm that emails me if the total is not equal to 71. The alarm is resolved, and I get another email, when the total is equal to 71 again. This works well so far.

The problem I've run into is this: if there is an outstanding alarm (e.g. 1 AP down) and another AP goes down before that alarm is resolved, no new alarm is created. Data collection detects that 2 APs are now down, but no alarm is generated because the first alarm is still outstanding.

Is there any way I can resolve this? Thanks in advance!

Victor Kirhenshtein

Hi,

You can set the threshold not on the absolute value but on the difference from the previous value, and generate an event if it is less than 0 (i.e. some APs went down). You can then achieve what you want with two thresholds: the diff threshold will generate a new alarm on each change and recover on the next poll, but you should ignore its recovery event. The other threshold can be set as it is now, but used only for the recovery event.
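The two-threshold idea can be sketched outside NetXMS as a small Python simulation. This is only an illustration of the logic, not NetXMS configuration: the event names `AP_DOWN` and `AP_RECOVERED` and the function `evaluate` are hypothetical, and the simulation simply never emits a recovery event for the diff threshold, which stands in for ignoring it.

```python
EXPECTED_APS = 71  # total number of APs, taken from the thread


def evaluate(samples, expected=EXPECTED_APS):
    """Return (poll_index, event_name, down_count) tuples for a series of polled AP counts.

    Event names are hypothetical; they only illustrate the two thresholds.
    """
    events = []
    previous = None
    for index, current in enumerate(samples):
        if previous is not None:
            # Threshold 1: difference from the previous value is below 0,
            # so at least one more AP went down -> raise a new alarm.
            if current - previous < 0:
                events.append((index, "AP_DOWN", expected - current))
            # Threshold 2: absolute value, used only for recovery once
            # the count is back at the expected total.
            if previous != expected and current == expected:
                events.append((index, "AP_RECOVERED", 0))
        previous = current
    return events


if __name__ == "__main__":
    for event in evaluate([71, 70, 69, 69, 70, 71]):
        print(event)
```

With the sample series above, a new `AP_DOWN` event is produced at each drop (71 to 70, then 70 to 69), nothing is produced while the count stays at 69 or partially recovers to 70, and a single `AP_RECOVERED` event is produced once the count is back at 71.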

Best regards,
Victor



doki

Thanks Victor for the reply. Now I have an idea of how to use "diff from previous" :) I can use it in my other monitoring too.

If my understanding is correct, is this what you mean? Please refer to the attached picture. This resolved the alert issue when an additional AP goes down :)

Another problem I'm thinking about: I return "$1" in my email alert (i.e. "Total of 1 AP(s) is down, Please login to aruba controller for more info"). With this new config, will I get a negative value? Or should I just create a transformation script?

In my existing config I just created a conversion script:

sub main()
{
   if ($1 >= 71)
      return 71;        // all APs are up: no event will be generated
   else
      return 71 - $1;   // event will be generated; returns the number of APs that went down
}

My threshold is last(1) != 71, which generates an alarm and returns the difference between the total number of APs and the current number of APs.

My event/email alert (%4 is the actual value):
Total of %4 AP(s) is down, Please login to aruba controller for more info



Victor Kirhenshtein

The problem with your transformation script is that it's not linear: it returns 71 when everything is OK and then 1 or more when some APs are down. So the diff will actually grow as more APs go down. I would recommend calling the DCI "number of down APs" and returning 0 when all APs are up. That way it will be consistent, and diff() > 0 will not be triggered when all APs come back online.

Best regards,
Victor

doki

That's the best approach, but the problem is that the OID for "number of down APs" was deprecated by Aruba, so I monitor the number of online APs instead, with a workaround transformation script that returns the number of down APs (the difference from the known total, which is 71). I know there could be a better approach to this, so I need expert advice :)

Victor Kirhenshtein

I mean you can keep monitoring the OID you are monitoring now, just name the DCI differently and use a transformation script to return 0 when all APs are up. Or you can delete the transformation script altogether and use a script only for adding the number of down APs to the message (via the %[] macro).

Best regards,
Victor

doki