ExternalParametersProvider race condition causes "Unsupported" alarms

Started by BillLortz, June 24, 2016, 09:29:08 PM


BillLortz

We rely extensively on the wonderful ExternalParametersProvider feature of NetXMS. Most of the values we collect this way are relatively static, so we often run the provider script only every 10 minutes.

Unfortunately, when a node reboots we see a race condition in which NetXMS declares many of the parameters that come from ExternalParametersProvider as unsupported. It then disables the corresponding DCI.

For example, suppose we have an ExternalParametersProvider that reads license information from a USB licensing Hasp plugged into the node. It might return a parameter "HaspID" and another parameter "HaspType". In the node's DCI table we create an entry that pulls the agent parameter "HaspID", and that entry works fine. But after a reboot, many of the nodes that reference it raise alarms and mark that DCI as unsupported. We then have to go to each node individually and re-enable the DCI manually. Sometimes a new alarm is generated a few minutes later and the entry is disabled again. Running a configuration poll and then re-enabling the DCI often solves the problem until the node's next reboot.

We usually match the DCI polling interval to the interval specified in the ExternalParametersProvider definition. For example, if the ExternalParametersProvider definition uses 600 seconds, we also set the DCI polling interval to 600 seconds.

It can be fairly painful after Microsoft Patch Tuesday, when 50 machines have rebooted: we have to go in, re-enable the DCI on each node, and re-poll it.

Is there something we need to change in our configuration to solve this issue?

We see this issue on different versions of NetXMS, for example on 1.2.17 and on 2.0.3.
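To make the race concrete, here is a toy model in Python. It is not NetXMS code; the class names, the `name=value` output format, and the Hasp values are all hypothetical placeholders. It only captures what the post describes: after a reboot the agent knows nothing about provider parameters until the provider has run once, so an early server poll gets "unsupported" and the server disables the DCI, which then stays disabled even after the value becomes available.

```python
from enum import Enum


class Reply(Enum):
    OK = "ok"
    UNSUPPORTED = "unsupported"


class ToyAgent:
    """Hypothetical agent model: provider parameters exist only after the provider has run."""

    def __init__(self):
        # After a reboot the cache is empty until the provider's first run finishes.
        self.cache = {}

    def run_provider(self, output):
        # Assumes the provider prints one name=value pair per line.
        for line in output.splitlines():
            name, sep, value = line.partition("=")
            if sep and name.strip():
                self.cache[name.strip()] = value.strip()

    def get(self, name):
        if name in self.cache:
            return Reply.OK, self.cache[name]
        # The agent cannot tell "not provided yet" from "does not exist".
        return Reply.UNSUPPORTED, None


class ToyDci:
    """Hypothetical server-side DCI that is disabled on the first UNSUPPORTED reply."""

    def __init__(self, name):
        self.name = name
        self.active = True

    def poll(self, agent):
        if not self.active:
            return None  # disabled DCIs are skipped until someone re-enables them
        status, _ = agent.get(self.name)
        if status is Reply.UNSUPPORTED:
            self.active = False
        return status


# Reboot scenario: the server polls before the provider has run once.
agent = ToyAgent()
dci = ToyDci("HaspID")
first = dci.poll(agent)
agent.run_provider("HaspID=12345\nHaspType=HL")  # placeholder values
second = dci.poll(agent)
print(first.value, dci.active, second)  # unsupported False None
```

Matching the two intervals does not help in this model, because the problem is the ordering of the first provider run and the first poll, not the steady-state rates.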

BillLortz

I would like to add that version 2.0.3 seems somewhat more forgiving of these errors and does not always mark the DCI as inactive, but I believe it still does sometimes.

Victor Kirhenshtein

Hi,

The proper solution requires changes on the agent side. I'll change the agent so that it saves parameter names in its local database; then, if a server request arrives before the provider's first run, the agent will respond with a collection error instead of "unsupported parameter".

Best regards,
Victor
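The agent-side change Victor describes can be sketched as a toy model in Python, using the standard `sqlite3` module as a stand-in for the agent's local database. This is a hypothetical illustration under stated assumptions, not the actual NetXMS implementation: the table name `provider_params`, the `name=value` provider output format, and the Hasp values are all placeholders. Parameter names survive a restart while values do not, so a request for a known name that has not been refreshed yet gets a transient collection error rather than "unsupported".

```python
import os
import sqlite3
import tempfile
from enum import Enum


class Reply(Enum):
    OK = "ok"
    UNSUPPORTED = "unsupported"
    COLLECTION_ERROR = "collection error"


class PersistentAgent:
    """Hypothetical agent that remembers provider parameter names across restarts."""

    def __init__(self, db_path):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS provider_params (name TEXT PRIMARY KEY)")
        self.db.commit()
        self.cache = {}  # values are lost on restart; names are not

    def run_provider(self, output):
        # Assumes the provider prints one name=value pair per line.
        for line in output.splitlines():
            name, sep, value = line.partition("=")
            name = name.strip()
            if sep and name:
                self.cache[name] = value.strip()
                self.db.execute(
                    "INSERT OR IGNORE INTO provider_params (name) VALUES (?)", (name,))
        self.db.commit()

    def known(self, name):
        row = self.db.execute(
            "SELECT 1 FROM provider_params WHERE name = ?", (name,)).fetchone()
        return row is not None

    def get(self, name):
        if name in self.cache:
            return Reply.OK, self.cache[name]
        if self.known(name):
            # Seen before, but the provider has not run since the restart.
            return Reply.COLLECTION_ERROR, None
        return Reply.UNSUPPORTED, None

    def close(self):
        self.db.close()


# Usage: run the provider once, "reboot", then poll before the next provider run.
db_path = os.path.join(tempfile.mkdtemp(), "agent.db")
agent = PersistentAgent(db_path)
agent.run_provider("HaspID=12345\nHaspType=HL")  # placeholder values
agent.close()

agent = PersistentAgent(db_path)  # simulated reboot: empty value cache
status, _ = agent.get("HaspID")
print(status.value)  # collection error
```

A server that treats a collection error as transient would simply retry on the next poll instead of disabling the DCI, which is the behaviour the reboot scenario needs.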