SNMP agent unreachable after network outage

Started by Dani@M3T, May 19, 2016, 08:52:13 PM

Dani@M3T

Hi

I have been seeing the following problem for some time:
We have several sites connected by VPN.
When a VPN connection goes down and comes back up, the NetXMS server reports the remote VPN gateway as 'snmp agent unreachable'. When I manually run a status poll on the remote node I get 'node is connected' but also 'snmp agent unreachable'.

My findings so far:

  • SNMP to all other nodes in the same remote sites is ok
  • RuntimeFlags is 0x1010 on the remote VPN-gateway node
  • snmpwalk for remote VPN-gateway on command line of NetXMS server is ok
  • nxsnmpwalk for remote VPN-gateway on command line of NetXMS server is ok too
  • restarting netxmsd fixes the problem temporarily
  • the remote VPN gateway and the local VPN gateways have VPN connectors configured in NetXMS
  • in the server debug log I found the following

[19-May-2016 18:28:50.789] [DEBUG] Node(vpn-gateway.domain.intern)->GetItemFromSNMP(.1.3.6.1.2.1.1.3.0): dwResult=17
[19-May-2016 18:28:50.789] [DEBUG] StatusPoll(vpn-gateway.domain.intern [951]): unable to get system uptime
[19-May-2016 18:28:50.789] [DEBUG] StatusPoll(vpn-gateway.domain.intern [951]): unable to get agent uptime
[19-May-2016 18:28:50.789] [DEBUG] StatusPoll(vpn-gateway.domain.intern [951]): unable to get system location
[19-May-2016 18:28:50.789] [DEBUG] Finished status poll for node vpn-gateway.domain.intern (ID: 951)
[19-May-2016 18:28:50.789] [DEBUG] ConfigReadStr: (cached) name=DeleteUnreachableNodesPeriod value="0"
...
[19-May-2016 18:30:07.540] [DEBUG] Node(vpn-gateway.domain.intern)->GetItemFromSNMP(.1.3.6.1.4.1.890.1.6.22.1.6.0): dwResult=4


What would be the best next steps for troubleshooting?

The NetXMS server is version 2.0.3 on Linux x64, built from source.

thanks
Dani

Victor Kirhenshtein

Hi,

Error 4 is a general communication error, and error 17 is an SNMP engine ID mismatch. Does the problematic gateway use SNMP version 3? Could you please capture the SNMP traffic between the NetXMS server and the gateway during an unsuccessful status poll?
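
To see which result codes show up for a node over time, the debug log can be filtered for GetItemFromSNMP lines. This is only a minimal sketch: the regular expression is modeled on the two log lines quoted above, the function name summarize is made up for illustration, and the lookup table covers only the two codes explained in this thread.

```python
import re

# Meanings as given in this thread; any other code is deliberately left unmapped.
RESULT_MEANINGS = {
    4: "general communication error",
    17: "SNMP engine ID mismatch",
}

# Pattern modeled on the debug log lines quoted in this thread (an assumption
# about the log format, not an official specification).
LINE_RE = re.compile(
    r"Node\((?P<node>[^)]+)\)->GetItemFromSNMP\((?P<oid>[^)]+)\): "
    r"dwResult=(?P<code>\d+)"
)


def summarize(lines):
    """Return (node, oid, code, meaning) for every GetItemFromSNMP line."""
    found = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a GetItemFromSNMP line
        code = int(match.group("code"))
        meaning = RESULT_MEANINGS.get(code, "not covered in this thread")
        found.append((match.group("node"), match.group("oid"), code, meaning))
    return found


if __name__ == "__main__":
    sample = [
        "[19-May-2016 18:28:50.789] [DEBUG] Node(vpn-gateway.domain.intern)"
        "->GetItemFromSNMP(.1.3.6.1.2.1.1.3.0): dwResult=17",
        "[19-May-2016 18:28:50.789] [DEBUG] StatusPoll(vpn-gateway.domain.intern"
        " [951]): unable to get system uptime",
    ]
    for node, oid, code, meaning in summarize(sample):
        print(f"{node} {oid} -> {code} ({meaning})")
```

Feeding the server log through summarize makes it easy to check whether code 17 starts appearing right after the VPN link comes back.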

Best regards,
Victor

Dani@M3T

Hi Victor

Yes, these are SNMPv3 devices.
I rebooted one of the remote gateways to trigger the problem and captured the SNMP traffic between the remote gateway and the NetXMS server.

I see exactly this exchange repeated many times:
Server to gateway: "encryptedPDU: privKey Unknown"
Gateway to server: "report 1.3.6.1.6.3.15.1.1.4.0"
(I can also send you the tcpdump file but not in the forum)

Maybe the engine ID changed when the remote gateway was rebooted. Unfortunately, I cannot set a static engine ID on these gateways.
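
For reference, the OID in the gateway's report belongs to the usmStats counter group defined in RFC 3414, and the sub-identifier 4 is usmStatsUnknownEngineIDs, which fits the engine ID theory. A small lookup like the following can decode such report OIDs from a capture. It is a sketch only; the function name describe_report is made up for illustration.

```python
# usmStats counters from RFC 3414 (SNMP-USER-BASED-SM-MIB), keyed by the
# sub-identifier that follows the usmStats base OID.
USM_STATS_BASE = "1.3.6.1.6.3.15.1.1"
USM_STATS = {
    1: "usmStatsUnsupportedSecLevels",
    2: "usmStatsNotInTimeWindows",
    3: "usmStatsUnknownUserNames",
    4: "usmStatsUnknownEngineIDs",
    5: "usmStatsWrongDigests",
    6: "usmStatsDecryptionErrors",
}


def describe_report(oid):
    """Return the usmStats counter name for a report OID, or None if unknown."""
    oid = oid.strip().lstrip(".")  # accept both ".1.3..." and "1.3..."
    prefix = USM_STATS_BASE + "."
    if not oid.startswith(prefix):
        return None
    first = oid[len(prefix):].split(".")[0]
    try:
        return USM_STATS.get(int(first))
    except ValueError:
        return None


if __name__ == "__main__":
    print(describe_report("1.3.6.1.6.3.15.1.1.4.0"))
```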

kind regards
Dani

Victor Kirhenshtein

Hi,

Could you please apply the attached patch and check whether the server then handles a gateway restart correctly?

Best regards,
Victor

Dani@M3T

Hi Victor

Your patch for resetting the engine ID fixes the problem. The first test is OK!
Will you include it in the 2.0.4 release?

Thanks!
Dani

Victor Kirhenshtein

Hi,

Yes, this patch will be included in the 2.0.4 release.

Best regards,
Victor