Modifying Templates with lots of nodes and DCIs

Started by Tursiops, July 01, 2016, 07:23:07 AM


Tursiops

Hi,

I'm not quite sure whether these figures actually qualify as "lots".

I am trying to modify a template that (now) has 21 DCIs (8 of them instance DCIs) and is currently applied to around 300-400 systems.
The idea was to remove four DCIs that are no longer required (i.e. it used to have 25 DCIs).

Almost as soon as I close the template's DCI configuration, NetXMS stops reacting to anything (it throws an error whenever I try to look at something), and then the GUI reports that the server is not responding and disconnects. On the server itself I can see that the NetXMS process has simply crashed.
After restarting NetXMS, the template has 21 DCIs, but the nodes it is applied to still have all 25. Removing the extra DCIs manually works, but is very tedious.

For testing, I set DebugLog to 6 and then simply renamed one of the DCIs in the template (I also tried a different template applied to a similar number of nodes).

The result is that the logs are full of entries like these:
Node::onDataCollectionChange[..]: executing data collection sync
Node::onDataCollectionChange[..]: executing data collection sync for SNMP proxy [..]
ApplyTemplateThread: template=[..] updateType=0 target=[..] removeDci=false
Apply 21 items from template "[..]" to target "[..]"
Applying DCO "[..]" to target "[..]"

The last line repeats once for each DCI.

After around 300 "executing data collection sync (for SNMP Proxy)" lines, logging simply stops and the NetXMS process is gone.
It looks as though the per-node updates are not put into a queue or worker pool to be worked through gradually; instead, the system seems to try to update everything at once and fails?
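To illustrate the queue/pool idea, here is a purely illustrative Python sketch of pushing per-node template updates through a bounded worker pool. It is not how netxmsd actually schedules its ApplyTemplateThread work; `apply_template_to_all` and `apply_fn` are hypothetical names standing in for "apply these template items to this node".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def apply_template_to_all(
    items: Sequence[str],
    targets: Sequence[str],
    apply_fn: Callable[[Sequence[str], str], R],
    max_workers: int = 4,
) -> List[R]:
    """Apply a template's items to every target node through a bounded pool.

    Hypothetical helper for illustration only. All jobs are submitted up
    front, but at most max_workers run at the same time; the rest wait in
    the executor's internal queue. Results come back in target order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator before the pool shuts down.
        return list(pool.map(lambda target: apply_fn(items, target), targets))
```

With 300-400 nodes, a pool of a handful of workers would process the nodes a few at a time instead of all at once, trading a slower rollout for bounded load on the server and database.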


Tursiops

It looks like my issues are related to previously reported bugs: server crashes when deleting instance DCIs, and templates applying DCIs twice.
I have updated the bug tracker accordingly and am happy to help in any way to resolve them.

Before finding that, I tried to start fresh with the template by making the auto-apply script return false. As a result, our NetXMS is currently pretty much unusable: it crashes within a minute of starting. Between crashes it manages to delete some of the DCIs in its deletion queue (I turned debug logging up to level 9 to see the queries).
I'm now running netxmsd in a while loop so that it restarts within 15 seconds of crashing. I don't expect to get any useful data out of this, but I do hope it will eventually catch up with the deletions... :|
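For anyone wanting to do the same, here is a rough Python equivalent of that restart loop. It is a minimal sketch, assuming netxmsd is started so that it stays in the foreground (check the command-line options for your version); `supervise`, `run_foreground` and the `max_runs`/`run`/`sleep` parameters are hypothetical names, the last two there only to make the loop testable.

```python
import subprocess
import time
from typing import Callable, List, Optional, Sequence


def run_foreground(cmd: Sequence[str]) -> int:
    """Start cmd, block until it exits, and return its exit code."""
    # On POSIX, a process killed by signal N reports returncode -N.
    return subprocess.run(list(cmd)).returncode


def supervise(
    cmd: Sequence[str],
    restart_delay: float = 15.0,
    max_runs: Optional[int] = None,
    run: Callable[[Sequence[str]], int] = run_foreground,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Restart cmd restart_delay seconds after every exit.

    Loops forever when max_runs is None; otherwise stops after max_runs
    runs and returns the exit codes that were observed.
    """
    exit_codes: List[int] = []
    while True:
        code = run(cmd)  # blocks until the process exits, crash or not
        exit_codes.append(code)
        if max_runs is not None and len(exit_codes) >= max_runs:
            return exit_codes
        # Like a plain shell while loop: restart regardless of the exit code.
        sleep(restart_delay)
```

Calling `supervise(["/path/to/netxmsd"])` (the path is a placeholder) keeps the server coming back after each crash. A segfault would show up in the collected exit codes as -11.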
If it still has issues after a day, I'll probably have to dig into the database to work out what I need to remove and do it manually.