Has anyone gotten SNMPv3 to work correctly? I've tried both 1.2.17 and 2.0-RC, and it seems to have issues in both versions. I'm querying a switch for sys.name, and I can pull that same information using the SnmpB client with no problems, but every time I try NetXMS I have problems. Is this a known problem?
Quote from: gmonk63 on November 10, 2015, 01:45:38 AM
Has anyone gotten SNMPv3 to work correctly? I've tried both 1.2.17 and 2.0-RC, and it seems to have issues in both versions. I'm querying a switch for sys.name, and I can pull that same information using the SnmpB client with no problems, but every time I try NetXMS I have problems. Is this a known problem?
I'm using 1.2.7 and we are querying quite a few systems with SNMP V3, no problems so far.
Are the authentication and encryption settings correct (method and password)?
All our devices use SNMPv3 and work without a problem (with both 1.2.17 and 2.0-RC2).
We have multiple NetXMS servers, together maybe 500 nodes, it works across all servers with different combinations of auth/priv, etc.
Hmm... Probably a vendor issue then? Are you using AuthPriv on your devices?
Thanks for your help.
OK, after doing some packet captures, there does seem to be a slight difference in the messages from NetXMS. After the initial report from the switch, which includes the engine ID, NetXMS fails to include the engine ID in the following GET request messages. I have verified that both SnmpB and the net-snmp command line from my laptop work, and all but NetXMS include the engine ID. Not sure if this is still a vendor issue or not, but the specific switch I am testing is a RuggedCom RS900.
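For readers following along, here is a minimal sketch of the engine ID discovery exchange described above, as I understand it from the SNMPv3 user-based security model: the manager first sends a request with an empty engine ID, the agent answers with a Report that carries its own engine ID, and every later request has to carry that ID. The `ToyAgent` and `get_sys_name` names are made up for illustration; this is a toy model and has nothing to do with the actual NetXMS or RuggedCom code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Message:
    kind: str                     # "get", "report" or "response"
    engine_id: bytes              # authoritative engine ID; empty during discovery
    value: Optional[str] = None   # payload of a successful response


class ToyAgent:
    """Hypothetical agent that answers only requests carrying its engine ID."""

    def __init__(self, engine_id: bytes, sys_name: str):
        self.engine_id = engine_id
        self.sys_name = sys_name

    def handle(self, request: Message) -> Message:
        if request.engine_id != self.engine_id:
            # Empty or unknown engine ID: send a Report with the real one.
            return Message("report", self.engine_id)
        return Message("response", self.engine_id, self.sys_name)


def get_sys_name(agent: ToyAgent, remember_engine_id: bool,
                 max_requests: int = 5) -> Tuple[Optional[str], int]:
    """Return (value, requests_sent) for a toy manager.

    With remember_engine_id=False the manager keeps sending an empty
    engine ID, which mimics the behaviour seen in the captures.
    """
    engine_id = b""
    for sent in range(1, max_requests + 1):
        reply = agent.handle(Message("get", engine_id))
        if reply.kind == "response":
            return reply.value, sent
        if remember_engine_id:
            engine_id = reply.engine_id  # learn the ID from the Report
    return None, max_requests
```

A manager that keeps the engine ID from the Report succeeds on the second request, while one that drops it only ever gets Reports back, which would also explain the repeated failed attempts counted by the switch.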
Which version of NetXMS are those captures from?
1.2.17. I also tried 2.0 without any luck, but I did not do any captures.
Could you please try the same captures with 2.0-RC2?
I remember there were some changes to SNMPv3 in the 2.0 branch that fixed some edge-cases... but I might be wrong.
(And I don't know whether any of those will apply to you.)
Thanks in advance!
Here you go. It seems worse with 2.0-RC2: I had to reboot my switch several times because it was continually retrying several times per second, and by default the RuggedComs will quit responding after so many failed attempts. I also attached the switch debug screen. I have also tried the 1.2.17 Linux version running on Ubuntu, thinking maybe it was a Windows thing.
Hi,
it could be a bug in the NetXMS SNMP library. Is it possible to get access to your switch over SNMP so we can test and fix it? Please send me a private message if that's possible.
Best regards,
Victor
Hi,
I've fixed the bug in the SNMP library, thanks for the access to the switch! The fix will be included in the 2.0 release. If you build NetXMS from sources, I can provide you with a patch.
Best regards,
Victor
Hi Victor
I would be interested in the source code patch. Is the problem with the SNMP auth. failures a separate thing?
Thanks
Dani
Could you provide me with the patch?
And by the way, thank you for the quick fix. You mentioned the fix will be released in 2.0. I've been wanting to move to the 2.0 RC but wasn't sure how stable things are and didn't want to have to migrate everything from 1.2. How far would you guess 2.0 is from being a stable release?
Hi,
the patch for src/snmp/libnxsnmp/transport.cpp is attached.
We plan to make the 2.0 release before December 10th.
Best regards,
Victor
Thanks Victor for the patch and your work in general!!
Now I'm only waiting for a patch for the SNMP authentication failures (other thread). Then I hope our SNMP problems in NetXMS are gone.
Dani
We have a new problem with SNMP, maybe something similar to the points above.
We use NetXMS V2.0-RC2 compiled with patched transport.cpp (I attached the patched one).
After a software update on some ZyXEL access points, these devices stop responding to SNMPv3 (SHA1, AES) after a while. I also checked with snmpwalk on the command line. I have to reboot the affected access points, but after a short while the same thing happens. When I disable all SNMP on these nodes in NetXMS, SNMP on these APs is stable (checked with snmpwalk on the command line).
Which information do you need for further troubleshooting?
thanks
Dani
Hi,
how fast does the device stop responding to SNMP? If you disable SNMP in NetXMS and run snmpwalk or snmpget from the command line in a loop, does it have the same effect?
Best regards,
Victor
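A minimal sketch of the "snmpget in a loop" test suggested above, assuming the net-snmp `snmpget` tool is installed and the AP uses authPriv with SHA and AES as described earlier in the thread. The host, user and passphrases are placeholders, and `poll_until_failure` is a hypothetical helper, not part of NetXMS or net-snmp.

```python
import subprocess
import time
from typing import Callable, List, Optional


def snmpget_command(host: str, user: str, auth_pass: str, priv_pass: str,
                    oid: str = "1.3.6.1.2.1.1.5.0") -> List[str]:
    """Build a net-snmp snmpget call (SNMPv3 authPriv, SHA + AES).

    The default OID is sysName.0; timeout is 2 s with no retries.
    """
    return ["snmpget", "-v3", "-l", "authPriv", "-u", user,
            "-a", "SHA", "-A", auth_pass, "-x", "AES", "-X", priv_pass,
            "-t", "2", "-r", "0", host, oid]


def run_once(cmd: List[str]) -> bool:
    """True if the command exits with status 0, i.e. the agent answered."""
    return subprocess.run(cmd, capture_output=True).returncode == 0


def poll_until_failure(probe: Callable[[], bool], max_polls: int,
                       interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Optional[int]:
    """Call probe() up to max_polls times.

    Returns the 1-based number of the first failed poll, or None if
    every poll succeeded.
    """
    for n in range(1, max_polls + 1):
        if not probe():
            return n
        sleep(interval)
    return None
```

With placeholder credentials it could be run as `poll_until_failure(lambda: run_once(snmpget_command("ap1.example", "user", "authpass", "privpass")), max_polls=3600)`. If the AP survives this but dies under NetXMS polling, the trigger is probably something specific in what NetXMS requests rather than the request rate alone.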
Hi Victor
At first I thought it only happened with configuration polls. But some APs 'crash' quite fast, after a few minutes, and I also saw APs where it takes longer. I can start about 50 snmpwalks from the command line in a short time without any problems (about 20 OIDs each). Disabling only the configuration polls on these nodes is not enough; I have to disable SNMP completely for these nodes. If SNMP is completely disabled, the SNMP service on these APs stays stable.
thanks
Dani
Try disabling the routing table and topology polls.
Best regards,
Victor
Ok I will try on one AP.
So only NetXMS agent, routing table, and topology polling are disabled; everything else is enabled again. I will check and report.
Besides, I also have SNMP_AUTH_FAILURE events on these nodes (since V2.0-RC2).
Hi Victor
The test AP has now been stable for 18h. I have only tested with one of the affected APs because for some of the other APs I do not have the credentials to reboot them; I have to contact my customer each time.
Do you have a suspicion?
thanks
Dani
Dani@M3T
I think you are still experiencing this issue: https://www.netxms.org/forum/configuration/snmp-credentials/ I don't think the patch fixed this particular error, where even though SNMPv3 credentials now work, the server is still sending out some requests with "public" as the default community. I am seeing the same thing on my test switch: SNMPv3 and v2 work great now, but I am still seeing SNMP auth failures, about 100 a day. On our switches there is a threshold that can be set (I think the default is 10 failures in a minute before the switch stops responding), which I increased until this gets fixed.
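To make the threshold behaviour concrete, here is a small sketch of a sliding-window failure counter. It assumes the switch counts failures within the last 60 seconds and locks out once it reaches 10, which is only my reading of the description above and not documented switch behaviour; the `FailureLockout` class is hypothetical.

```python
from collections import deque


class FailureLockout:
    """Hypothetical model: lock out after `limit` failures within `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.times = deque()  # timestamps of recent failures

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now`; return True if the limit is reached."""
        self.times.append(now)
        # Forget failures that have fallen out of the window.
        while self.times and now - self.times[0] >= self.window:
            self.times.popleft()
        return len(self.times) >= self.limit


def trips(failure_times, limit: int = 10, window: float = 60.0) -> bool:
    """True if any failure in the (sorted) sequence reaches the limit."""
    lockout = FailureLockout(limit, window)
    return any(lockout.record_failure(t) for t in failure_times)
```

Under this model, 100 failures spread evenly over a day would never reach the limit, but the same failures arriving in bursts (for example during a poll that retries quickly) could, so raising the threshold only helps if the bursts stay small.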
I know the other issue, but I'm not so sure it is the same. One test node with only topology and routing table polling deactivated has been stable for 24h.
So it seems that topology or routing table polls cause the SNMP agent in the access points to hang. Most likely it's a bug in the SNMP agent. We can try to narrow it down further by enabling one of the polls. A topology poll usually makes more requests than a routing table poll.
Best regards,
Victor
Hi Victor
I made these tests:
- re-enabled routing table poll, waited 20h -> SNMP agent on node still replying
- re-enabled topology poll, waited about 20h -> SNMP agent dead
But what can we do? Is it a problem in NetXMS or in the SNMP implementation on the node? If it is the second, I need good arguments for their support ;-)
edit:
A later test with the topology poll disabled -> SNMP agent dead again
So the problem is not tied to the topology poll :-(
thanks
Dani
I will try it with a support case at the vendor of the APs.
Another problem on other SNMP enabled nodes:
Suddenly no SNMP data can be polled, although the SNMP agent on these nodes is fine (tested with snmpwalk on the command line).
When I restart netxmsd, everything is fine again. This also happens with SNMPv3.
I found a hint about this problem in the local log file of the affected node. There is a system protection function which blocks the SNMP get requests from the NetXMS server because of requests with a null community string (CVE-1999-0517). The communication settings of this NetXMS node contain valid SNMPv3 credentials, of course.
So maybe this problem has the same cause as the 'SNMP auth. failures'.
There have been a bunch of SNMP discovery fixes in the develop branch this weekend. Any chance you can build from develop and check whether the issues still persist?
Hi Tomas
I normally compile from the 'official' sources (at the moment V2.0-RC2) on Linux, but I have never built from the develop branch.
Do you mean compiling only the server from the sources in the hourly snapshot of the Git repository?
Thanks
Dani
Yes, you can compile from the hourly snapshot of GIT repo and test with that.
The fixes were mostly to snmp in relation to agent proxying and to snmp discovery, but maybe you will see some difference.
If none of these help (so your issue is still present after testing these newest changes), we will know yours is a totally separate issue.
The develop branch is pretty stable at the moment (I run 2 medium sized installs ~150 nodes each on it atm.), but of course be advised that it is not released code and consider backups of db/server before deploying it :)
Thanks Tomas.
I tried to compile hourly snapshot from git repo. But I get this error:
make[3]: Entering directory `/usr/src/netxms-2.0-dev-2015-12-14/src/libstrophe'
CC libstrophe_la-auth.lo
In file included from ../../include/nms_common.h:78:0,
from auth.c:19:
../../include/netxms-version.h:27:30: fatal error: netxms-build-tag.h: No such file or directory
#include <netxms-build-tag.h>
^
compilation terminated.
make[3]: *** [libstrophe_la-auth.lo] Error 1
make[3]: Leaving directory `/usr/src/netxms-2.0-dev-2015-12-14/src/libstrophe'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/usr/src/netxms-2.0-dev-2015-12-14/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/netxms-2.0-dev-2015-12-14'
make: *** [all] Error 2
autoconf, flex, bison, libtool are installed.
What can I do?
The script that generates this header requires a git repo, not a snapshot. It's a bug.
However, if you clone the repo instead of using a snapshot, it will work.
The repository URL for cloning is https://git.netxms.org/public/netxms.git
P.S. Don't forget to switch the branch to "develop" (git checkout develop).
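The clone and branch switch described above can be sketched as a small helper. This is only a convenience wrapper around the two git commands; the destination directory name is an assumption, and the repository URL is the one quoted in this thread, which may have moved since.

```python
import subprocess
from typing import List

# URL as given in this thread; it may no longer be current.
REPO_URL = "https://git.netxms.org/public/netxms.git"


def checkout_commands(dest: str = "netxms",
                      branch: str = "develop") -> List[List[str]]:
    """Return the git commands that clone the repo and switch to `branch`."""
    return [
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", branch],
    ]


def fetch_develop(dest: str = "netxms") -> None:
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in checkout_commands(dest):
        subprocess.run(cmd, check=True)
```

After `fetch_develop()` the usual build steps from the source tree apply; they are left out here because the configure options depend on the installation.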
I've fixed that; snapshots should now work as well.
Thanks, Alex, for the fast fix.
Downloading the snapshot from the download page on the website and running './reconf', 'configure...', 'make', ... now works!
The SNMP_AUTH_FAILURE events are gone!
For the other problems I need more time to test.
Thanks for the fixes.
Dani
Is this also a solution for this topic?
https://www.netxms.org/forum/configuration/snmp-credentials/
I'm still hesitant to upgrade to 2.0 because I really don't want to have to reset over 200 switches due to the wrong community string being sent out repeatedly.
In my opinion, yes! (After 11h of testing.)