Templates being removed from unreachable/down nodes

Started by Tursiops, September 13, 2016, 09:33:41 AM


Tursiops

Hi,

I have encountered this several times now:
- A system has all its DCIs applied via templates. The templates are configured to remove the DCIs if the node no longer matches a certain condition
- The device becomes unreachable for an extended period of time (hours or even days). Reasons can be internet connection problems, dead routers, the NetXMS agent not running for whatever reason, or a server being shut down "temporarily".
- Eventually the NetXMS server will remove the templates from the node.

I originally assumed that the server would not remove templates from an unreachable node, precisely because it is unreachable: the server cannot determine whether the condition that led to the template being applied is still valid. If I run a manual Configuration Poll in such a scenario, it tells me it is not going to poll because the node is unreachable, and then it goes ahead and wipes the template anyway. (That one just happened to me today.)
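
To make that expectation concrete, here is a minimal sketch in Python of the decision I assumed the server would make. It is not NetXMS code: the function `decide_template_action`, its parameters and the `Action` values are all hypothetical names used only to illustrate the logic.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"      # leave the current binding untouched
    APPLY = "apply"    # bind the template to the node
    REMOVE = "remove"  # unbind the template (and drop its DCIs)


def decide_template_action(reachable, currently_bound, filter_matches):
    """Hypothetical model of the expected auto-bind decision.

    reachable       -- True if the node answered the last poll
    currently_bound -- True if the template is applied right now
    filter_matches  -- result of the auto-bind condition, or None if it
                       could not be evaluated
    """
    # An unreachable node cannot confirm or refute the condition,
    # so the existing binding should be left exactly as it is.
    if not reachable or filter_matches is None:
        return Action.KEEP
    if filter_matches and not currently_bound:
        return Action.APPLY
    if not filter_matches and currently_bound:
        return Action.REMOVE
    return Action.KEEP
```

With this logic, `decide_template_action(False, True, False)` returns `Action.KEEP`, whereas what I am seeing looks like the removal branch being taken even though the node is unreachable.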

I am not sure how to configure the templates so that they are neither removed nor assigned while a node is unreachable.

Any ideas? Or is this a misconfiguration or bug of sorts?

Thanks

tomaskir

What version of NetXMS are you running?

I remember that in old versions (pre 1.2.17), auto-bind removal would remove the templates from nodes.
That was fixed, however.

Tursiops

We're running 2.0.5.

The templates are configured to be removed if auto-bind no longer applies, which is fine.
The issue is just that if the system is unreachable and NetXMS cannot detect if the template should still apply, it seems to remove it.

I'll see if I can run some more tests to confirm if this happens all the time or if something else is required to trigger this.
Maybe it is a race condition with an active Configuration Poll just as the system goes down.

The most recent one I ran into was a Cisco device that was offline for a weekend (remote, non-critical office, closed over the weekend). Basically all of its data was wiped, and collection started fresh on Monday after it was rebooted.