alarm_events table still holds an alert for a removed node

Started by jonasc, June 16, 2016, 11:06:47 AM


jonasc

Hi!

We had a node with an active critical alert and the alert was listed in the table alarm_events and also shown in the "alarms entire network" tab in the Console.
When I removed the node from the inventory the alert in "alarms entire network" was removed and all was fine.

However, the alarm_events table still holds the record as if the alert were still active.
My impression was that alarm_events used to contain only active alerts.

Can anyone shed some light on this?
Thanks in advance


jonasc

New information:

In Alarms [F8] I have one entry more than in [ALT]-[Shift]-[A].

That entry is referring to the source/node that previously was removed.
The extra entry in Alarms [F8] looks like this:

[Critical]   [1526]   Node down

It only references 1526 instead of the source name.
When I click on terminate it goes away.

I thought F8 and Alt-Shift-A would show the same things.

zshnet

Hi there,

Not sure if this is helpful to your issue, but alt-shift-a opens the alarm page for whatever node you've selected while F8 opens the alarm browser for all alarms. My suspicion is you were opening Alt-Shift-A while selecting a container for all items in your network, which would of course not contain the deleted node. Thus, when you click F8, an extra alarm appeared because it shows all alarms, regardless of the node. Also, in my experience, deleting or unmanaging a node does not remove all alarms relating to that node, so before/after deleting it you need to delete all alarms relating to that node.

Hope that helps!
Thanks,
Zach

Tatjana Dubrovica

There is a server configuration parameter that defines what to do with the events of a deleted object: "DeleteEventsOfDeletedObject". By default it is 1 (delete them). If it is set to 0, the events are not deleted, but you can't find them by node name (as the node is already deleted). You can find them by the object ID of the deleted object. The event source will be displayed as the object ID in square brackets (like this: "[100]").

Please check the DeleteEventsOfDeletedObject server configuration parameter.
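
To illustrate the point about sources being shown as a bracketed object ID, here is a minimal sketch of how one might list alarm records whose source object no longer exists. It runs against an in-memory SQLite database with a simplified, assumed schema: the table names alarm_events and object_properties follow the NetXMS database, but the column set shown here, the sample rows, and the helper list_alarm_sources are illustrative assumptions. Check the schema of your own server version before adapting the query.

```python
import sqlite3

# Assumed, simplified schema for illustration only. The real NetXMS tables
# have more columns, and names may differ between server versions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_properties (object_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE alarm_events (
        alarm_id INTEGER, event_id INTEGER, event_name TEXT,
        source_object_id INTEGER, message TEXT
    );
    -- Object 100 still exists; object 1526 has been deleted.
    INSERT INTO object_properties VALUES (100, 'core-switch');
    INSERT INTO alarm_events VALUES (1, 501, 'SYS_NODE_DOWN', 100, 'Node down');
    INSERT INTO alarm_events VALUES (2, 502, 'SYS_NODE_DOWN', 1526, 'Node down');
""")

def list_alarm_sources(conn):
    """Return (alarm_id, source_label, is_orphan) for every alarm_events row.

    Hypothetical helper: the LEFT JOIN keeps rows whose source object is
    gone, and COALESCE falls back to the bracketed object ID for them.
    """
    rows = conn.execute("""
        SELECT e.alarm_id,
               COALESCE(o.name, '[' || e.source_object_id || ']') AS source_label,
               o.object_id IS NULL AS is_orphan
        FROM alarm_events e
        LEFT JOIN object_properties o ON o.object_id = e.source_object_id
        ORDER BY e.alarm_id
    """).fetchall()
    return [(alarm_id, label, bool(orphan)) for alarm_id, label, orphan in rows]

for alarm_id, label, orphan in list_alarm_sources(conn):
    print(alarm_id, label, "orphaned" if orphan else "ok")
```

A dashboard that reads alarm_events directly could use the orphan flag from a query like this to hide or highlight records left behind by deleted nodes.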

jonasc

Quote from: zshnet on June 20, 2016, 09:13:46 PM
Hi there,

Not sure if this is helpful to your issue, but alt-shift-a opens the alarm page for whatever node you've selected while F8 opens the alarm browser for all alarms. My suspicion is you were opening Alt-Shift-A while selecting a container for all items in your network, which would of course not contain the deleted node. Thus, when you click F8, an extra alarm appeared because it shows all alarms, regardless of the node. Also, in my experience, deleting or unmanaging a node does not remove all alarms relating to that node, so before/after deleting it you need to delete all alarms relating to that node.

Hope that helps!
Thanks,
Zach

I wish it was that simple...
No, I was on the top level Entire Network, right click, Alarms.
By accident I went into F8 (Alarm browser) and found ID 1526 (which I recognized from the alarm_events table).
We have a dashboard where I publish active alerts from alarm_events, and we wondered why an alert for a deleted node was still active.

zshnet

Yes, that's what I'm saying. Since you deleted the node, it's not in the entire network anymore. When you select "Entire Network", you are only searching nodes that are still in your network. However, I would suggest taking Tatjana's advice and checking the server configuration, so that it doesn't happen again.

blairmc96

Sorry to reply to an older thread, but this is very much in line with the problem I'm having now.

I have some nodes that had active alarms when I deleted them, and I can clear those alarms out, but they always come back when I restart the core service on the server.

My alarm_events table shows what I see in F8 or ALT+SHIFT+A when I have entire network selected.

I think I've managed to delete the alarms, however, I get email alerts about these when I restart the service.

Is there another table that holds these orphaned alarms?

Thanks!

Tursiops

Can you confirm that DeleteEventsOfDeletedObject is set to 1?
Have you tried shutting NetXMS down and running "nxdbmgr check" on your server?
Just in case your problem is related to inconsistencies in your database.

blairmc96

Quote from: Tursiops on June 27, 2019, 01:33:12 AM
Can you confirm that DeleteEventsOfDeletedObject is set to 1?
Have you tried shutting NetXMS down and running "nxdbmgr check" on your server?
Just in case your problem is related to inconsistencies in your database.

I can confirm that DeleteEventsOfDeletedObject is set to 1.

I stopped the service and ran nxdbmgr check and got the following output.

NetXMS Database Manager Version 2.2.15 Build 9523 (2.2.15.2) (UNICODE)

Checking database (excluding collected data):
* Zone object properties                                               [PASSED]
* Node object properties                                               [PASSED]
* Node to subnet bindings                                              [PASSED]
* Interface object properties                                          [PASSED]
* Interface bindings                                                   [PASSED]
* Network service object properties                                    [PASSED]
* Network service bindings                                             [PASSED]
* Cluster object properties                                            [PASSED]
* Cluster member nodes                                                 [PASSED]
* Template to node mapping                                             [PASSED]
* Object properties                                                    [PASSED]
* Container membership                                                 [  94% ]
Container 2446 contains non-existing child 2455. Fix it? (Yes/No/All/Skip) A
* Container membership                                                 [FIXED ]
* Event processing policy                                              [PASSED]
* Network map links                                                    [PASSED]
* Data tables                                                          [PASSED]
* Raw DCI values table                                                 [  24% ]
Found raw value record for non-existing DCI [3458]. Delete it? (Yes/No/All/Skip) A
* Raw DCI values table                                                 [  61% ]
Found raw value record for non-existing DCI [5684]. Delete it? (Yes/No/All/Skip) Y
* Raw DCI values table                                                 [FIXED ]
* DCI thresholds                                                       [  46% ]
Found threshold configuration for non-existing DCI [3458]. Delete? (Yes/No/All/Skip) A
* DCI thresholds                                                       [FIXED ]
* Table DCI thresholds                                                 [PASSED]
4 errors was found, 4 errors was corrected
All errors in database was fixed
Commit changes? (Yes/No) Y
Committing changes...
Changes was successfully committed to database
Database check completed

C:\NetXMS\bin>


I ran the check again and it found no issues.  I then restarted the service, but still got the alert emails.  However, they all are just telling me these nodes are up, and just a few specific ones, not all of them.

I can understand if there was an active alarm, like a node down, and I did get one like that - but my node is actually down.  The others are all up with no issues, and it just decides it has to tell me that for these 5 nodes every time I restart the service.

Any help is appreciated!