Alerts based on a custom attribute: is this right? And how do I set a repeat interval?

Started by Millenium7, February 13, 2020, 06:17:55 AM


Millenium7

I want to create some additional alerting based on custom attributes.
For example, I have a container to which I've assigned the custom attribute "UpDownCheck" with the value true, marked as inheritable, so all nodes inside this container have that custom attribute.
I want any node with that custom attribute to send a Slack alert. Here's what I've done so far:

Event Processing Policy -> New Event

Condition->Events = SYS_NODE_DOWN
Filtering Script = return GetCustomAttribute($node, "UpDownCheck");
Server Actions = SLACK Notification

I 'think' this is correct: for any node that is down and has that custom attribute, the filter should return true and the Slack notification should be posted, right?
However, I also need this to re-trigger every 2 hours, so that if the node is still down it sends another Slack alert.
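The rule can be modeled in a few lines of Python to make the intended logic explicit. This is a simplified sketch, not NetXMS code: Obj, get_custom_attribute and rule_matches are hypothetical names, attribute inheritance is modeled as a lookup up the container chain, and the filter is modeled as a strict comparison with the string "true" rather than relying on how NXSL converts the returned attribute value to a boolean (worth checking in the NXSL documentation).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Obj:
    """Hypothetical stand-in for a NetXMS container or node (not a real API)."""
    name: str
    parent: Optional["Obj"] = None
    # custom attribute name -> (value, inheritable)
    attrs: dict = field(default_factory=dict)


def get_custom_attribute(obj: Obj, name: str) -> Optional[str]:
    """Own value first, then the nearest inheritable value from an ancestor."""
    if name in obj.attrs:
        return obj.attrs[name][0]
    ancestor = obj.parent
    while ancestor is not None:
        value, inheritable = ancestor.attrs.get(name, (None, False))
        if inheritable:
            return value
        ancestor = ancestor.parent
    return None


def rule_matches(event_name: str, node: Obj) -> bool:
    """Event must be SYS_NODE_DOWN and UpDownCheck must resolve to "true"."""
    if event_name != "SYS_NODE_DOWN":
        return False
    value = get_custom_attribute(node, "UpDownCheck")
    return value is not None and value.lower() == "true"


# A container with an inheritable attribute, and a node inside it.
site = Obj("Site A", attrs={"UpDownCheck": ("true", True)})
router = Obj("router1", parent=site)
print(rule_matches("SYS_NODE_DOWN", router))  # True
print(rule_matches("SYS_NODE_UP", router))    # False
```

A node outside the container has no UpDownCheck value at all, so the model never notifies for it, which is the behaviour the rule is meant to have.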

As far as I can tell, to have the event repeat I would have to create a DCI, set a threshold on it, and set the threshold's 'repeat event' interval to 7200 seconds (2 hours).
But there's a problem: I can't see any way for a DCI to actually check for 'node down'. I can use the DCI Internal->Status, but there's no status code for 'down':
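As a rough illustration of what a 7200-second repeat interval would produce, here is a small Python model. It is an assumption-laden sketch, not NetXMS behaviour verified against the server: it assumes the threshold activates on the first poll after the node goes down and that the event is repeated on the first poll at least repeat_interval seconds after the previous alert, so the actual alert times also depend on the DCI polling interval. The function name alert_times is hypothetical.

```python
REPEAT_INTERVAL = 2 * 60 * 60  # 2 hours = 7200 seconds


def alert_times(down_at: int, up_at: int, poll_interval: int,
                repeat_interval: int = REPEAT_INTERVAL) -> list[int]:
    """Return the poll timestamps (seconds) at which an alert would be sent.

    Assumed model: every poll while the node is down re-checks the threshold;
    the first poll sends an alert, later polls resend only once at least
    repeat_interval seconds have passed since the previous alert.
    """
    sent: list[int] = []
    t = down_at
    while t < up_at:
        if not sent or t - sent[-1] >= repeat_interval:
            sent.append(t)
        t += poll_interval
    return sent


# Node down for 5 hours, polled every minute.
print(alert_times(0, 5 * 3600, poll_interval=60))  # [0, 7200, 14400]
```

With a coarse polling interval the repeats drift: polling every 7000 seconds gives alerts at 0 and 14000, because the poll at 7000 is still inside the 2-hour window.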
0 = Normal
1 = Warning
2 = Minor
3 = Major
4 = Critical
5 = Unknown
6 = Unmanaged
7 = Disabled
8 = Testing
A down node has status Critical (4), but other events can also set a node's status to 4 while it is still up, so I would get false alerts. I can't use that, and I can't see anything else to poll for a node's up/down state.
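To show why the status code alone isn't enough, here is a small Python model using the codes listed above. The three cases are hypothetical; the only input taken from the post is that a down node and an up node affected by some other critical event can both report status 4.

```python
from enum import IntEnum


class ObjectStatus(IntEnum):
    """Status codes as listed in the post above."""
    NORMAL = 0
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4
    UNKNOWN = 5
    UNMANAGED = 6
    DISABLED = 7
    TESTING = 8


def naive_down_check(status: int) -> bool:
    """What a threshold on the Status DCI can express: 'status is Critical'."""
    return status == ObjectStatus.CRITICAL


# Hypothetical cases: (description, node actually down?, reported status)
CASES = [
    ("node unreachable", True, ObjectStatus.CRITICAL),
    ("node up, other critical event", False, ObjectStatus.CRITICAL),
    ("node up, nothing wrong", False, ObjectStatus.NORMAL),
]

false_alerts = [desc for desc, is_down, status in CASES
                if naive_down_check(status) and not is_down]
print(false_alerts)  # ['node up, other critical event']
```

Both Critical cases look identical to the check, which is exactly the ambiguity described above.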

So what's the best way to go about this?

Millenium7

Bump

Any update on this? Specifically for 'node down', as I still can't find a way to detect that outside of the built-in node-down event.

I've managed to get alerts based on custom attributes working for other things, such as SNR or Ethernet speed.
For example, in the case of Ethernet speed, I want alerts for any core/distribution equipment in our network running at 100 Mbit/s, as 99% of the time that indicates a wire has gone bad. The other 1% are links I don't care about: edge networks with no more than about 20 Mbit/s of bandwidth, or something we inherited from a company acquisition. For those it doesn't matter, and I want to suppress the alerts.

So I create my DCIs as usual and set up thresholds like so:

DCI - Ethernet Speed (reported in megabits/sec)
Thresholds
1) script
if ($1 < GetCustomAttribute($node, "Target_EthSpeed")) return true;
activation event: slack alert (with relevant details)
repeat every 86400 seconds
2) script
if ($1 >= GetCustomAttribute($node, "Target_EthSpeed")) return true;
activation event: sys_threshold_rearmed
repeat every 86400 seconds
3) last polled value
<1000
activation event: send slack alert (with relevant details)
repeat every 86400 seconds

If the custom attribute Target_EthSpeed exists, the first two thresholds are checked and one of them will always match, so the third is never checked. That lets me add Target_EthSpeed = 100 to a node to accept 100 Mbit/s on it.
If the custom attribute doesn't exist, the first two conditions are effectively ignored and the third one applies, so anything below 1000 Mbit/s still triggers an alert.
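The evaluation order can be checked with a small Python model. It assumes, as described above, that thresholds are checked top to bottom and only the first match takes effect. SLACK_ALERT is a placeholder for the custom Slack event, repeat intervals are left out, and a missing Target_EthSpeed attribute is modeled as None so that the first two conditions never match (how NXSL itself compares a number with a missing attribute is not modeled here).

```python
from typing import Callable, Optional

# A condition receives the polled speed (Mbit/s) and the node's Target_EthSpeed
# custom attribute, or None when the attribute is missing.
Condition = Callable[[float, Optional[float]], bool]

# (threshold number, condition, activation event), checked top to bottom.
THRESHOLDS: list[tuple[int, Condition, str]] = [
    (1, lambda speed, target: target is not None and speed < target, "SLACK_ALERT"),
    (2, lambda speed, target: target is not None and speed >= target, "SYS_THRESHOLD_REARMED"),
    (3, lambda speed, target: speed < 1000, "SLACK_ALERT"),
]


def first_match(speed: float, target: Optional[float]) -> Optional[tuple[int, str]]:
    """Return (threshold number, event) of the first matching threshold, else None."""
    for number, condition, event in THRESHOLDS:
        if condition(speed, target):
            return number, event
    return None


print(first_match(100, 100))    # (2, 'SYS_THRESHOLD_REARMED'): 100 Mbit/s accepted
print(first_match(100, None))   # (3, 'SLACK_ALERT'): no attribute, default rule
print(first_match(1000, None))  # None: nothing matches
```

Whenever the attribute is set, thresholds 1 and 2 are complementary, so threshold 3 is only ever reached for nodes without Target_EthSpeed.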

If there is a cleaner or more appropriate method, I'm all ears. Either way, I still need 'node down' alerts set up. As far as I can tell there is no DCI for 'node down', only 'node status', which isn't suitable because that status code is shared with other events.