SNMPv3

Started by gmonk63, November 10, 2015, 01:45:38 AM


Dani@M3T

Thanks Victor for the patch and your work in general!!
Now I'm just waiting for a patch for the SNMP authentication failures (see the other thread). After that, I hope our SNMP problems in NetXMS will be gone.

Dani

Dani@M3T

We have a new problem with SNMP, maybe something similar to the points above.

We use NetXMS V2.0-RC2 compiled with the patched transport.cpp (I attached the patched file).

After a software update, some ZyXEL access points stop responding to SNMPv3 (SHA1, AES) after a while. I also checked with snmpwalk on the command line. I have to reboot the affected access points, but after a short while the same thing happens again. When I disable SNMP entirely for these nodes in NetXMS, SNMP on these APs stays stable (checked with snmpwalk on the command line).
What information do you need for further troubleshooting?

thanks
Dani

Victor Kirhenshtein

Hi,

How quickly does the device stop responding to SNMP? If you disable SNMP in NetXMS and run snmpwalk or snmpget from the command line in a loop, does it have the same effect?
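A minimal sketch of that loop test, written in Python around the Net-SNMP `snmpget` command-line tool. It assumes `snmpget` is installed and on the PATH; the helper names `build_snmpget_cmd` and `poll_until_failure`, the host, the user name, and the passphrases are hypothetical placeholders. It repeatedly polls sysUpTime.0 with SNMPv3 authPriv (SHA/AES) and reports the poll number at which the agent first fails to answer.

```python
import subprocess
import time

# sysUpTime.0, a scalar that practically every SNMP agent exposes.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def build_snmpget_cmd(host, user, auth_pass, priv_pass, oid=SYS_UPTIME_OID):
    """Build a Net-SNMP snmpget command line for SNMPv3 authPriv with SHA/AES."""
    return [
        "snmpget",
        "-v3",              # SNMP version 3
        "-l", "authPriv",   # authentication and privacy
        "-u", user,
        "-a", "SHA", "-A", auth_pass,
        "-x", "AES", "-X", priv_pass,
        "-t", "2",          # 2-second timeout per request
        "-r", "0",          # no retries, so a hung agent is noticed quickly
        host,
        oid,
    ]


def poll_until_failure(cmd, interval=1.0, max_polls=1000):
    """Run cmd repeatedly; return the 1-based number of the first failed poll
    (non-zero exit status or hung process), or None if every poll succeeds."""
    for n in range(1, max_polls + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        except subprocess.TimeoutExpired:
            return n
        if result.returncode != 0:
            return n
        time.sleep(interval)
    return None


# Example usage (hypothetical host and credentials, replace with the AP's values):
# cmd = build_snmpget_cmd("192.0.2.10", "netxms", "authSecret", "privSecret")
# print(poll_until_failure(cmd, interval=0.5))
```

Using a single scalar OID with no retries keeps each request cheap, so if the agent still dies under this load, the problem is unlikely to be specific to the polls NetXMS runs.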

Best regards,
Victor

Dani@M3T

Hi Victor

At first I thought it was only caused by configuration polls. But some APs 'crash' quite fast, after a few minutes, while on others it takes longer. I can start about 50 snmpwalks (about 20 OIDs each) from the command line in a short time without any problems. Disabling only configuration polls on these nodes is not enough; I have to disable SNMP completely for these nodes. With SNMP completely disabled, the SNMP service on these APs stays stable permanently.

thanks
Dani

Victor Kirhenshtein

Try disabling routing table and topology polls.

Best regards,
Victor

Dani@M3T

OK, I will try it on one AP.
Now only NetXMS agent, routing table, and topology polling are disabled; everything else is enabled again. I will check and report back.

Besides that, I also get SNMP_AUTH_FAILURE events on these nodes (since V2.0-RC2).

Dani@M3T

Hi Victor

The test AP has now been stable for 18 h. I have only tested with one of the affected APs, because for some of the other APs I do not have the credentials to reboot them; I have to contact my customer each time.
Do you have any idea what the cause might be?


thanks
Dani

gmonk63

Dani@M3T

I think you are still experiencing this issue: https://www.netxms.org/forum/configuration/snmp-credentials/ I don't think the patch fixed this particular error: even though SNMPv3 credentials now work, the server still sends some requests with "public" as the default community. I am seeing the same thing on my test switch. SNMPv3 and v2 work great now, but I still see about 100 SNMP auth failures a day. Our switches have a configurable threshold (I think the default is 10 failures in a minute) after which they stop responding, so I have increased it until this gets fixed.
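To illustrate why about 100 failures a day can still trip such a threshold, here is a hypothetical Python model of a sliding-window limit. The rule (block once 10 failures fall within 60 seconds) is taken from the rough default mentioned above and is an assumption, not the switch vendor's documented behavior; the class name `AuthFailureGuard` is likewise made up for illustration.

```python
from collections import deque


class AuthFailureGuard:
    """Hypothetical model: the device stops answering SNMP once `limit`
    authentication failures occur within any `window`-second span."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.failures = deque()  # timestamps (seconds) of recent failures
        self.blocked = False     # stays True once tripped, until a reset/reboot

    def record_failure(self, t):
        """Register a failure at time t; return whether the device is blocked."""
        self.failures.append(t)
        # Drop failures that have slid out of the time window.
        while self.failures and t - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.limit:
            self.blocked = True
        return self.blocked
```

In this model, 100 failures spread evenly over a day (one every 864 s) never trip the guard, while 10 failed requests sent 2 s apart during a single poll cycle do. The daily total matters less than how bursty the failures are.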

Dani@M3T

I know about the other issue, but I'm not so sure it is the same one. One test node, with only topology and routing table polling deactivated, has been stable for 24 h.

Victor Kirhenshtein

So it seems that topology or routing table polls cause the SNMP agent on the access points to hang. Most likely it's a bug in the SNMP agent. We can try to narrow it down further by re-enabling one of the polls. The topology poll usually makes more requests than the routing table poll.

Best regards,
Victor

Dani@M3T

Hi Victor

I ran these tests:
- re-enabled routing table poll, waited 20 h -> SNMP agent on node still replying
- re-enabled topology polling, waited about 20 h -> SNMP agent dead

But what can we do? Is it a problem in NetXMS or in the SNMP implementation on the node? If it is the latter, I need good arguments for their support ;-)

Edit:
A later test with topology poll disabled -> SNMP agent dead again
So the problem is not limited to the topology poll :-(

thanks
Dani

Dani@M3T

I will try opening a support case with the AP vendor.

Another problem on other SNMP-enabled nodes:
Suddenly no SNMP data can be polled, but the SNMP agents on these nodes are fine (tested with snmpwalk on the command line).
When I restart netxmsd, everything is fine again. This also happens with SNMPv3.

Dani@M3T

I found a hint about this problem in the local log file of the affected node. A system protection function blocks SNMP GET requests from the NetXMS server because some of its requests use a null community string (CVE-1999-0517). Of course, the communication settings of this node in NetXMS contain valid SNMPv3 credentials.
So maybe this problem has the same cause as the 'SNMP auth. failures'.

tomaskir

There have been a bunch of SNMP discovery fixes in the develop branch this weekend. Any chance you can build from develop and check whether the issues still persist?

Dani@M3T

Hi Tomas

I normally compile from the 'official' sources (currently V2.0-RC2) on Linux, but I have never built from the develop branch.
Do you mean compiling only the server from the hourly snapshot of the Git repository?

Thanks
Dani