Node down - but for how long? And high latency notification

Started by millerpaint, January 27, 2014, 08:48:30 PM

millerpaint

Greetings,

Can someone explain how I can accomplish two things:

1) I am currently sending node down/up email notifications, which works great.  How can I add the number of hours/minutes/seconds that the node was actually down?  Of course, this would be based on when NetXMS saw the node go down and then come back up.

2) How can I trigger an email notification when a circuit is experiencing higher than normal latency - say over 75ms, for longer than 1 minute?

Any help is appreciated!


-Kevin C.

Victor Kirhenshtein

Hi!

1) On the SYS_NODE_DOWN event you can store the current time in a custom attribute on the node by executing a script like this:


SetCustomAttribute($node, "DowntimeStart", time());


Then, in the notification on SYS_NODE_UP, you can use a macro that calls a script, like this: %[GetDowntimeText]

and put the script (GetDowntimeText in our example) in the script library, like this:


return SecondsToUptime(time() - GetCustomAttribute($node, "DowntimeStart"));


2) You can use the PING subagent to do regular pings and set a threshold for rtt > 75. The ping subagent is described here: http://wiki.netxms.org/wiki/Subagent:Ping.
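As a rough sketch of the agent side, assuming the ping subagent is loaded from the agent configuration file: the address 10.0.0.1 and the name circuit1 below are placeholders, and the section and parameter names should be checked against the wiki page above for your agent version (older agents may write the section header as *PING).

```
# Load the ping subagent
SubAgent = ping.nsm

[PING]
# Ping timeout in milliseconds
Timeout = 1000
# Placeholder target, written as address:name
Target = 10.0.0.1:circuit1
```

With a target defined like this, a DCI on a parameter such as Icmp.AvgPingTime(circuit1) could carry the "> 75" threshold. To approximate "for longer than 1 minute", one option is to make the threshold require several consecutive samples rather than a single poll.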

Best regards,
Victor

Percy

Hi Millerpaint,

Can you please help me set up the DCI for node down? What should I use as the DCI parameter, and does it require a script?

Thank you

sperlm

There is no DCI for "node down". A DCI just collects the value of a monitored parameter over time, and can trigger actions based on thresholds on those values.

I'm not sure what you need, then, but for the record, here is how to recreate the "node down" setup discussed in this topic:

- in Script Library create script called "OnNodeDown" - SetCustomAttribute($node, "DowntimeStart", time());
- in Actions Configuration create Action "OnNodeDown" - Type: Execute NXSL Script, Script name: OnNodeDown
- in Event Processing Policy edit node down rule (usually no.1) and add Action - Server Action - OnNodeDown

This is just the part that "writes" the exact time when the node went down.

To calculate downtime and include it in the SYS_NODE_UP event:
- create a script "DownTime" - return SecondsToUptime(time() - GetCustomAttribute($node, "DowntimeStart"));
- add macro to desired place (action, event...) - %[DownTime]

There can be errors if the DownTime macro is called when no "DowntimeStart" value exists, but this should not normally happen (the script can be adjusted to handle these cases too).
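One possible adjustment is sketched below, assuming GetCustomAttribute returns null when the attribute has never been set on the node. The variable name start and the "unknown" placeholder text are only illustrative choices.

```
// DownTime: library script called through the %[DownTime] macro.
// Read the timestamp that was stored when the node went down.
start = GetCustomAttribute($node, "DowntimeStart");

// No stored timestamp: return a placeholder instead of doing
// arithmetic on a null value.
if (start == null)
   return "unknown";

// Otherwise format the elapsed seconds as readable text.
return SecondsToUptime(time() - start);
```

This way a SYS_NODE_UP notification for a node without a recorded "DowntimeStart" still goes out, just without a real downtime value in it.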

With regards,
Milan

Nikk

Hi,

This script was working for a long time, but now we are receiving an error:
"Script (DownTime) execution error: Error 5 in line 1: Invalid operation with NULL value"

Script:
return SecondsToUptime(time() - GetCustomAttribute($node, "DowntimeStart"));

Has something changed there, and this script doesn't work anymore?
Server version 2.0.1

Best regards,
Nikk

Victor Kirhenshtein

Hi,

Do you have the DowntimeStart attribute set on that node? Does it work if you run the following script on the node:


println GetCustomAttribute($node, "DowntimeStart");


via the "Execute server script" menu?

Best regards,
Victor

Nikk

Hi Victor,

Yes, that attribute is set and the script works as well. But since that day I haven't received this alarm anymore, and it seems to be working fine now.

I guess I will need to set up an event for this alarm and monitor it for a while.

Thank you!

Best regards,
Nikk