scheduling Node Down mail Alerts for different devices

Started by The Duk, June 24, 2009, 03:49:02 PM


The Duk

Hi

I'm currently setting up version 0.2.26 on Windows 2003 and am having an issue with the System_Node_Down alert. I have about 30 servers that I need to be alerted about within 2 minutes if they go down. I also have about 100 remote sites using Check Point Edge devices. These are on standard ADSL, so they can go down from time to time, and I'd only need to be alerted if they were down for, let's say, 2 hours.

I have set PollCountForStatusChange to 2, which covers the servers, but I haven't been able to stop the remote sites from alerting after 2 minutes. I have looked at DCIs but am unsure what to do.

Is there a way around this? I really don't want to set up 2 instances of the server to monitor both.

Thanks in advance.

Brian

The Duk

So I really don't have a clue, but what I've done so far is:

1. Created a Server container and a WAN container.
2. Removed the WAN container from the SYS_NODE_DOWN event process (so that I don't get emailed 2 minutes after a site goes down).
3. Created a WAN template with a STATUS DCI whose threshold activates when 120 consecutive samples are greater than 0. This then mails me the alert.
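
The threshold in step 3 can be sketched roughly as follows. This is a plain Python illustration of the logic only, not NetXMS code. The function names `samples_needed` and `consecutive_trigger` are made up for the example, and the 60-second polling interval is an assumption: 120 samples only correspond to 2 hours if the STATUS DCI is collected once a minute.

```python
def samples_needed(window_seconds, poll_interval_seconds):
    """Number of consecutive samples that span the given time window."""
    return window_seconds // poll_interval_seconds


def consecutive_trigger(samples, count, predicate):
    """Return the index at which `predicate` has held for `count`
    consecutive samples, or None if that never happens."""
    run = 0
    for i, value in enumerate(samples):
        # Any sample that fails the predicate resets the run.
        run = run + 1 if predicate(value) else 0
        if run >= count:
            return i
    return None


# Assumed: one STATUS sample every 60 seconds, 2-hour window.
count = samples_needed(2 * 60 * 60, 60)

# A short ADSL outage (30 bad samples) followed by recovery.
short_outage = [0] * 10 + [4] * 30 + [0] * 10
# A long outage (150 bad samples in a row).
long_outage = [0] * 10 + [4] * 150

print(count)
print(consecutive_trigger(short_outage, count, lambda s: s > 0))
print(consecutive_trigger(long_outage, count, lambda s: s > 0))
```

The short outage never reaches the required run length, so no mail would be sent; the long outage triggers once the run is long enough.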

If anyone can think of a better way of doing this, please let me know; it's the best I could come up with so far.

Victor Kirhenshtein

Hi!

This is probably the best solution currently available, but it is a bit inaccurate: status can change not only because the node is down, but also if one of its interfaces goes down, or if there is an active alarm for that node (as a result of a threshold violation, for example). You should at least set the threshold to "equals 4" instead of "greater than 0", because 4 is the status code for CRITICAL, so you will not send a notification if the node, for example, goes to MINOR status due to an alarm.
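
To illustrate why "equals 4" is safer than "greater than 0", here is a small Python sketch of the two conditions applied to the same samples. It is not NetXMS code; `holds_for` is a hypothetical helper, and the value 2 for MINOR is an assumption made for the example (only 4 = CRITICAL is stated above).

```python
CRITICAL = 4  # status code for CRITICAL, as stated above
MINOR = 2     # assumed value for the example; any code above 0 other than 4 behaves the same


def holds_for(samples, count, predicate):
    """True if `predicate` holds for `count` consecutive samples."""
    run = 0
    for value in samples:
        run = run + 1 if predicate(value) else 0
        if run >= count:
            return True
    return False


def greater_than_zero(status):
    return status > 0


def equals_critical(status):
    return status == CRITICAL


# Node stays at MINOR for a long time because of an unrelated alarm.
minor_alarm = [MINOR] * 200
# Node stays at CRITICAL for a long time.
critical = [CRITICAL] * 200

print(holds_for(minor_alarm, 120, greater_than_zero))  # fires on a MINOR alarm
print(holds_for(minor_alarm, 120, equals_critical))    # stays quiet
print(holds_for(critical, 120, equals_critical))       # fires
```

With "greater than 0" a long-lived MINOR alarm is enough to send the mail; with "equals 4" only a sustained CRITICAL status does.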

You can also avoid false positives when a node's status is CRITICAL but the node is not down, though in a rather complicated way. Create a situation (in View -> Situations) and update it on each NODE_DOWN and NODE_UP event: for example, on NODE_DOWN set the "down" attribute to 1, and on NODE_UP set it to 0. Then add an additional check, using a script, to the notification rule to verify that the situation's "down" attribute is set to 1.
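
The idea can be modelled like this in Python. It only sketches the logic; it is not the script you would put into the rule, and the `Situation` class, `on_event` and `should_notify` are hypothetical names invented for the illustration.

```python
class Situation:
    """Hypothetical stand-in for a situation: named attributes kept per node."""

    def __init__(self):
        self._attrs = {}

    def set(self, node, name, value):
        self._attrs[(node, name)] = value

    def get(self, node, name, default=0):
        return self._attrs.get((node, name), default)


def on_event(situation, node, event):
    """Update the "down" attribute on NODE_DOWN / NODE_UP; ignore other events."""
    if event == "NODE_DOWN":
        situation.set(node, "down", 1)
    elif event == "NODE_UP":
        situation.set(node, "down", 0)


def should_notify(situation, node, threshold_active):
    """Notify only if the threshold fired AND the node is recorded as down."""
    return threshold_active and situation.get(node, "down") == 1


situation = Situation()

# Node "site-a" is CRITICAL because of an alarm, but never went down.
print(should_notify(situation, "site-a", threshold_active=True))

# Node "site-b" really went down, then the threshold fired.
on_event(situation, "site-b", "NODE_DOWN")
print(should_notify(situation, "site-b", threshold_active=True))

# After it comes back up, the flag is cleared again.
on_event(situation, "site-b", "NODE_UP")
print(should_notify(situation, "site-b", threshold_active=True))
```

The threshold alone decides how long to wait, while the "down" attribute decides whether the CRITICAL status was actually caused by the node being down.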

Best regards,
Victor