Threshold generating alarm, was working, now not working

Started by Millenium7, December 06, 2019, 07:09:20 AM


Millenium7

I don't know how to troubleshoot this. Alarms were working for a particular DCI, but now they are not.
The DCI shows it's over a threshold on the 'Last Values' page, but no alarm is generated anymore.
Maybe the alarms are stuck in some state that prevents new alarms from being generated for the DCI (I don't know, and I don't know how to check this).

Here's the way I've set it up:





Sch.Donat

Hi!

From the threshold configuration I assume your Resolve Temperature Alert event should clear the alarm. If that is the case, you should filter for it in your 'Terminate temperature alert alarms' EPP rule.

Best regards,
Sch. Donat

Millenium7

I tried disabling the event processing rule associated with the 'clear temperature alert' event and I still don't get alarms.

Note that I do get Slack notifications of high temperature, so the threshold is triggering. It's just not adding an alarm to the node.
I'm wondering if there's some internal timer related to 'repeat interval' that is stuck and never repeating, so the alarm is never generated again. Is there a place where I can see all pending timers and clear them, if that's the case?

Sch.Donat

You can try to recreate the EPP rule, but of course even if that helps, the problem may recur. You can check the alarm log (F8) to see whether the alarm was generated and then terminated, or was never generated at all. I can't answer your question about the internal timer, sorry.

Best regards,
Sch.Donat

Millenium7

Ok, so to troubleshoot this I've done two things.

The first thing I did was lower the 'repeat interval' to 5 seconds. This setting seems to override everything: even if the alarm is terminated, it will not occur again until that timer has expired. Setting it to 5 seconds lets me run a Force DCI Poll on that entry and have the alarm re-trigger. If it is set to 86400 (which I do want, but not for testing), it will not trigger again until 86400 seconds (1 day) have passed, whether or not you terminate or delete the alarm. You'd have to delete the node entirely and re-add it, or lower the timer temporarily.
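
For anyone reading along, the behaviour described above can be pictured with a small toy model. This is only a sketch of what was observed in this thread, not NetXMS source code: the class `Threshold`, the helper `events_fired`, and the values 80 and 70 are all made up for illustration. The assumption is that the threshold stays "active" while the value is over the limit, so a new event is only sent once the repeat interval has elapsed, no matter what happens to the alarm in the meantime.

```python
class Threshold:
    """Toy model of a threshold with a repeat interval (hypothetical, not NetXMS code)."""

    def __init__(self, limit, repeat_interval):
        self.limit = limit
        self.repeat_interval = repeat_interval  # seconds between repeated events
        self.active = False
        self.last_event_time = None

    def poll(self, value, now):
        """Return True if this poll should emit an activation event."""
        if value <= self.limit:
            # Value is back to normal: the threshold deactivates.
            self.active = False
            return False
        if not self.active:
            # First violation: emit the event and remember when.
            self.active = True
            self.last_event_time = now
            return True
        if now - self.last_event_time >= self.repeat_interval:
            # Still violated and the repeat interval has elapsed: emit again.
            self.last_event_time = now
            return True
        # Still violated, but inside the repeat interval: stay silent.
        return False


def events_fired(repeat_interval, poll_times, value=80, limit=70):
    """Return the poll times at which an activation event would be emitted."""
    threshold = Threshold(limit, repeat_interval)
    return [now for now in poll_times if threshold.poll(value, now)]
```

With polls at 0, 60 and 120 seconds and the value constantly over the limit, a repeat interval of 86400 gives an event only on the first poll, while a repeat interval of 5 gives one on every poll. Terminating the alarm changes nothing in this model because the alarm is not part of the threshold state.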

The second thing I did was change the terminate action to a 'resolve' action, so it keeps the alarm instead of deleting it. I'm finding that this rule triggers immediately and the alarm instantly resolves. When I disable that Event Processing Policy rule, the alarm stays in its intended state. So there is something wrong.
Maybe my alarm key syntax is wrong? TBH I find scripting incredibly frustrating and confusing at times due to the lack of simple troubleshooting tools. For instance, I cannot see what the alarm key is. Maybe there's some convoluted way to find it through the console and filter for it, but I can't just see it in the alarms tab. I'm thinking maybe TEMPALERT_%i_%<dciId> is wrong.
TBH I don't know exactly what %i and %<dciId> mean, or whether they are the correct variables to use. Maybe this is the problem.

What I want is one alarm per temperature alert. Right now I monitor 5 temperatures on a node and I want an individual alarm for each sensor/DCI, rather than having them grouped together (which is what happens if I just use TEMPALERT_%i).
If I need to use a different syntax I'm happy to, but I don't know what to put. I thought this would be correct: I'm guessing %i is the node and %<dciId> is the ID number of that particular DCI, so the key would be something like TEMPALERT_[NodeID]_[DCI]. That would create one alarm for each DCI, and so 5 different alarms if all 5 sensors had an issue, right?
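
To make the alarm key question concrete, here is a rough sketch of how such a key would expand. It is not the NetXMS implementation: `expand_key`, the node ID 0x2A and the DCI IDs 101 to 105 are invented for illustration. It assumes, as I understand the macro documentation, that %i expands to the source object ID in hexadecimal with a 0x prefix and that %<dciId> expands to the event parameter named dciId.

```python
import re


def expand_key(template, node_id, params):
    """Expand %i and %<name> in an alarm key template (hypothetical helper)."""

    def repl(match):
        if match.group(0) == "%i":
            # Assumed format: object ID as 0x-prefixed, 8-digit hexadecimal.
            return "0x%08X" % node_id
        # %<name>: look up the named event parameter.
        return str(params[match.group(1)])

    return re.sub(r"%<(\w+)>|%i", repl, template)


# Hypothetical node with five temperature DCIs.
NODE_ID = 0x2A
DCI_IDS = [101, 102, 103, 104, 105]

per_dci_keys = {
    expand_key("TEMPALERT_%i_%<dciId>", NODE_ID, {"dciId": dci}) for dci in DCI_IDS
}
per_node_keys = {
    expand_key("TEMPALERT_%i", NODE_ID, {"dciId": dci}) for dci in DCI_IDS
}
```

Under those assumptions the first template yields five distinct keys (one alarm per DCI), while TEMPALERT_%i alone yields a single key, which is why the alarms get grouped together. The terminate or resolve rule has to produce exactly the same key to find the matching alarm.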

Millenium7

/Facepalm

Ok, the terminate event processing policy rule had no events defined. I'm thinking maybe I renamed or deleted the event and recreated it at some point, and didn't re-apply it to the terminate EPP rule. Hence it was 'always' triggering, as it had no match criteria. I added the event and it's working.
However, I still think I needed to enter some number in 'repeat event' in the DCI configuration/threshold for each entry.

Anyway, I've learned a few things along the way, and it's working now.
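
In case it helps someone else, the mistake boils down to the following. This is only an illustrative sketch, not how NetXMS is written: `rule_matches` and the event names TEMP_ALERT_HIGH and TEMP_ALERT_CLEAR are made up. The assumption is that a rule with an empty event list is treated as "match any event".

```python
def rule_matches(rule_events, event_name):
    """Return True if a rule with the given event filter matches the event.

    Hypothetical model: an empty filter is treated as "match any event".
    """
    return not rule_events or event_name in rule_events


# Terminate rule with no events defined: it also matches the "high" event,
# so the alarm is closed as soon as it is raised.
broken_rule = []
# Terminate rule after re-adding the clear event to its filter.
fixed_rule = ["TEMP_ALERT_CLEAR"]
```

With the empty filter, `rule_matches(broken_rule, "TEMP_ALERT_HIGH")` is true, which would explain the alarm resolving instantly. With the clear event added back, the rule only fires on TEMP_ALERT_CLEAR.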