Catching and Handling Exceptions (SNMP timeout) in NXSL instance discovery

Started by blazarov, November 13, 2016, 11:18:12 AM


blazarov

Hi,
I have developed and been using an NXSL script for interface instance discovery over SNMP for more than a year now.
Although it works fine, it keeps raising alarms for script execution errors on the server node. I am pretty sure that in 99% of cases the cause of those errors is simply an SNMP timeout. Many of the monitored nodes have slow and unreliable connections, so this is expected and more or less inevitable.

So now I am looking for a way to enhance my script so that it catches and handles such problems and just quietly aborts until the next execution. I have looked in the documentation and on the forum, but couldn't find a solution.
Ideas anyone?


Here is my instance discovery script:
snmp = CreateSNMPTransport($node);
ifName = SNMPGetValue(snmp, ".1.3.6.1.2.1.2.2.1.2." . $1);
ifName .= " ";
ifName .= SNMPGetValue(snmp, ".1.3.6.1.2.1.31.1.1.1.18." . $1);
if (ifName ~= "Loopback.*") {
   return %(false, $1, ifName);
} else {
   return %(true, $1, ifName);
}


Screenshot of the alarms is attached.

Thanks in advance!

Victor Kirhenshtein

Hi,

just add appropriate error checking:


snmp = CreateSNMPTransport($node);
if (snmp == null)
   return false; // node does not support SNMP or transport cannot be created
ifName = SNMPGetValue(snmp, ".1.3.6.1.2.1.2.2.1.2." . $1);
if (ifName == null)
   return false; // cannot read from node
ifXName = SNMPGetValue(snmp, ".1.3.6.1.2.1.31.1.1.1.18." . $1);
if (ifXName != null)
{
   ifName .= " " . ifXName;
}
if (ifName ~= "Loopback.*") {
   return %(false, $1, ifName);
} else {
   return %(true, $1, ifName);
}


Best regards,
Victor

blazarov

Thanks Victor,
I understand the logic.

My understanding now is that when the instance discovery filter script returns false, that particular instance is skipped, or removed if it already exists. Is that true?
If so, does that mean that if a single SNMP error (e.g. a timeout) occurs during a regular instance discovery run and the script returns false for that instance, it will be deleted from the node's DCI table, just as happens now when I actually delete the interface from the router?

This would be problematic, because NetXMS currently deletes a DCI as soon as it detects its absence during instance discovery, resulting in permanent loss of history data, which is not acceptable for the business. I have already requested this in "Feature Requests": an option to keep DCIs in place (for example in UNSUPPORTED state) after they disappear from instance discovery.

I was thinking of somehow "breaking" the script in order to avoid any changes, instead of returning false, which would eventually result in deleting DCIs along with their history data. Am I on the right track?

Victor Kirhenshtein

Hi,

yes, if the script returns false, the instance will be deleted. Currently there is no way to stop instance discovery by aborting the filter script.

There is a feature request for adding a "grace period" for instances being deleted: https://dev.raden.solutions/issues/976. I think implementing it should solve your problem.

Best regards,
Victor
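
Since a false return deletes the instance and the filter script cannot be aborted, one possible stopgap is to retry the SNMP read inside the script before giving up. The following is only a sketch of that idea, not something proposed in this thread: getWithRetry is a hypothetical helper, the attempt count of 3 is arbitrary, and the sub syntax assumes the NXSL version current at the time of this thread. It makes a spurious false less likely on a flaky link but does not rule it out, and each extra attempt lengthens the discovery run.

```nxsl
// Hypothetical helper (not a built-in NXSL function): repeat an SNMP GET
// up to "attempts" times and return the first non-null answer.
sub getWithRetry(snmp, oid, attempts)
{
   for(i = 0; i < attempts; i++)
   {
      value = SNMPGetValue(snmp, oid);
      if (value != null)
         return value;
   }
   return null;   // every attempt failed
}

snmp = CreateSNMPTransport($node);
if (snmp == null)
   return false;   // node does not support SNMP or transport cannot be created

ifName = getWithRetry(snmp, ".1.3.6.1.2.1.2.2.1.2." . $1, 3);
if (ifName == null)
   return false;   // still unreadable after 3 attempts; the instance will be removed

// The second value is optional, so a failure here does not reject the instance
ifXName = getWithRetry(snmp, ".1.3.6.1.2.1.31.1.1.1.18." . $1, 3);
if (ifXName != null)
   ifName .= " " . ifXName;

if (ifName ~= "Loopback.*") {
   return %(false, $1, ifName);
} else {
   return %(true, $1, ifName);
}
```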