SNMP Stopped working ESXi 5.1

Started by nihplod, February 26, 2014, 08:23:31 PM


nihplod

Hi

I have migrated my ESXi network from a vSphere Standard Switch to a vSphere Distributed Switch with 2 x 10GbE uplinks with LACP.
After that, SNMP stopped working when I try to poll the node.


Running an snmpwalk from the machine running the NetXMS server
[root@netxms] [/var/log] snmpwalk -c public -v 2c 10.0.0.2 | more
iso.3.6.1.2.1.1.1.0 = STRING: "VMware ESXi 5.1.0 build-1021289 VMware, Inc. x86_64"
iso.3.6.1.2.1.1.2.0 = OID: iso.3.6.1.4.1.6876.4.1
iso.3.6.1.2.1.1.3.0 = Timeticks: (10246400) 1 day, 4:27:44.00
iso.3.6.1.2.1.1.4.0 = ""
iso.3.6.1.2.1.1.5.0 = STRING: "esxi.int.nihplod"
iso.3.6.1.2.1.1.6.0 = ""
iso.3.6.1.2.1.1.7.0 = INTEGER: 72
iso.3.6.1.2.1.1.8.0 = Timeticks: (0) 0:00:00.00


Running walk from another node on the network
[nihplod@godfather] [~] snmpwalk -c public -v 2c 10.0.0.2 | more
SNMPv2-MIB::sysDescr.0 = STRING: VMware ESXi 5.1.0 build-1021289 VMware, Inc. x86_64
SNMPv2-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.6876.4.1
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (10250000) 1 day, 4:28:20.00
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: esxi.int.nihplod
SNMPv2-MIB::sysLocation.0 = STRING:
SNMPv2-MIB::sysServices.0 = INTEGER: 72
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
SNMPv2-MIB::sysORID.1 = OID: SNMPv2-MIB::snmpMIB
SNMPv2-MIB::sysORID.2 = OID: IF-MIB::ifMIB
SNMPv2-MIB::sysORID.3 = OID: IP-MIB::ip
SNMPv2-MIB::sysORID.4 = OID: IP-MIB::ip.24
SNMPv2-MIB::sysORID.5 = OID: UDP-MIB::udp
SNMPv2-MIB::sysORID.6 = OID: TCP-MIB::tcp
SNMPv2-MIB::sysORID.7 = OID: SNMPv2-SMI::mib-2.47


Running walk from the machine running the NetXMS Management Console
C:\>SnmpWalk.exe -r:10.0.0.2
SnmpWalk v1.01 - Copyright (C) 2009 SnmpSoft Company
[ More useful network tools on http://www.snmpsoft.com ]

OID=.1.2.840.10006.300.43.1.3.0, Type=TimeTicks, Value=0:00:00.00
OID=.1.3.6.1.2.1.1.1.0, Type=OctetString, Value=VMware ESXi 5.1.0 build-1021289
VMware, Inc. x86_64
OID=.1.3.6.1.2.1.1.2.0, Type=OID, Value=1.3.6.1.4.1.6876.4.1
OID=.1.3.6.1.2.1.1.3.0, Type=TimeTicks, Value=1 day 4:29:10.00
OID=.1.3.6.1.2.1.1.4.0, Type=OctetString, Value=
OID=.1.3.6.1.2.1.1.5.0, Type=OctetString, Value=esxi.int.nihplod
OID=.1.3.6.1.2.1.1.6.0, Type=OctetString, Value=
OID=.1.3.6.1.2.1.1.7.0, Type=Integer, Value=72
OID=.1.3.6.1.2.1.1.8.0, Type=TimeTicks, Value=0:00:00.00
OID=.1.3.6.1.2.1.1.9.1.2.1, Type=OID, Value=1.3.6.1.6.3.1
OID=.1.3.6.1.2.1.1.9.1.2.2, Type=OID, Value=1.3.6.1.2.1.31


The config poll in NetXMS Management Console
[26.02.2014 19:13:16] **** Poll request sent to server ****
[26.02.2014 19:13:16] Poll request accepted
[26.02.2014 19:13:16] Starting configuration poll for node ESXi
[26.02.2014 19:13:16] Checking node's capabilities...
[26.02.2014 19:13:16]    Checking SNMP...
[26.02.2014 19:14:36] Capability check finished
[26.02.2014 19:14:36] Checking interface configuration...
[26.02.2014 19:14:36] Unable to get interface list from node
[26.02.2014 19:14:36]    Interface "unknown" is no longer exist
[26.02.2014 19:14:36] Interface configuration check finished
[26.02.2014 19:14:36] Checking node name
[26.02.2014 19:14:36] Node name is OK
[26.02.2014 19:14:36] Finished configuration poll for node ESXi
[26.02.2014 19:14:36] Node configuration was not changed after poll
[26.02.2014 19:14:36] **** Poll completed successfully ****


This is what the NetXMS log file tells me as soon as I try to do a poll.
[26-Feb-2014 19:13:19.137] [DEBUG] Starting configuration poll for node ESXi (ID: 1228)
[26-Feb-2014 19:13:19.137] [DEBUG] ConfPoll(ESXi): checking for NetXMS agent Flags={02000000} DynamicFlags={00000400}
[26-Feb-2014 19:13:19.138] [DEBUG] ConfPoll(ESXi): calling SnmpCheckCommSettings()
[26-Feb-2014 19:13:19.138] [DEBUG] SnmpCheckV3CommSettings: failed
[26-Feb-2014 19:13:19.139] [DEBUG] SnmpCheckCommSettings: trying version 1 community 'public'


I have tried SNMP versions 1, 2c and 3 with the same result, but it works when doing the snmpwalk. No idea why it does not respond.
SNMP to my switch works perfectly.

I am running version 1.2.9 of the server and console. I have tried deleting and re-adding the node. I also restarted the NetXMS server and rebooted the machine it runs on.
I haven't restarted the ESXi server, but I assume it should work since all other machines can snmpwalk it.

Any fancy clues?

Edit
I have upgraded the server to 1.2.12 and the issue is still there for me with the same output.

Victor Kirhenshtein

Hi!

Could you also get a tcpdump from the NetXMS server node and the other node? Also, is there any difference in network connectivity between the NetXMS server and the other node? It could also be an issue with LACP (or something related) causing packet duplication and/or loss. Did you try it with one uplink disconnected?

Best regards,
Victor

nihplod

Hi

I had a second vmk on another interface that it could pull SNMP from, so I tried moving the primary vmk to the backup NIC, but that did nothing. I could still pull SNMP from the secondary vmk.

So I removed the secondary vmk, and now SNMP works flawlessly again.

I tried to reproduce this issue

vmk0 on LACP
vmk1 on single NIC

This makes SNMP stop working for vmk0, but it works for vmk1.
Moving vmk0 to the single NIC together with vmk1 gives the same result: SNMP works for vmk1 but not for vmk0.

Removing vmk1 makes SNMP work instantly for vmk0.
After moving vmk0 to LACP, it still works perfectly.

So, no idea why it stops working for vmk0 when I have a vmk1 active on the same subnet, especially when an snmpwalk works from different machines and from the same machine hosting the NetXMS server.

If you would like more information, I could reproduce this issue and give you any dumps you require.
Otherwise, I am happy that it works again.

I just need to go through all the OIDs again because I removed the node over and over again just to make sure there weren't any weird issues :P

Victor Kirhenshtein

I suspect that when there are two adapters in the same subnet, ESXi chooses one to reply from, so the response packet comes from a different IP address and is ignored by the request sender. Capturing SNMP packets with tcpdump can confirm or refute this assumption.
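
As a lighter-weight check alongside tcpdump, a small script can report which address the reply to a single SNMP GET actually arrives from. This is only a sketch under assumptions: the helper names (ber, ber_int, build_get_sysdescr, reply_source) are made up for illustration, the request is a hand-encoded SNMPv2c GetRequest for sysDescr.0, and the community "public" and the address 10.0.0.2 are taken from the snmpwalk commands earlier in the thread.

```python
import socket


def ber(tag, payload):
    """Encode one BER TLV. Short-form length only (payload under 128 bytes)."""
    if len(payload) >= 128:
        raise ValueError("payload too long for this minimal encoder")
    return bytes([tag, len(payload)]) + payload


def ber_int(value):
    """Encode a non-negative integer as a minimal BER INTEGER."""
    length = value.bit_length() // 8 + 1
    return ber(0x02, value.to_bytes(length, "big"))


def build_get_sysdescr(community="public", request_id=1):
    """Build an SNMPv2c GetRequest for sysDescr.0 (1.3.6.1.2.1.1.1.0)."""
    oid = bytes([0x2B, 6, 1, 2, 1, 1, 1, 0])  # leading "1.3" packs into 0x2B
    varbind = ber(0x30, ber(0x06, oid) + ber(0x05, b""))  # OID + NULL value
    pdu = ber(
        0xA0,  # GetRequest PDU
        ber_int(request_id)
        + ber_int(0)  # error-status
        + ber_int(0)  # error-index
        + ber(0x30, varbind),  # varbind list with one entry
    )
    # version field 1 means SNMPv2c
    return ber(0x30, ber_int(1) + ber(0x04, community.encode()) + pdu)


def reply_source(host, community="public", port=161, timeout=2.0):
    """Send one GET to host and return the IP the reply came from, or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # Unconnected socket: a reply from any source address is delivered.
        sock.sendto(build_get_sysdescr(community), (host, port))
        try:
            _, (src_ip, _src_port) = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return src_ip
```

Calling reply_source("10.0.0.2") from the NetXMS server machine and comparing the result with the queried address would show whether ESXi answers from the other vmk's IP. A return value of None only means that no reply arrived within the timeout.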

Best regards,
Victor
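
To illustrate how a reply from an unexpected source can be silently lost, here is a loopback-only sketch of general UDP socket behaviour. Whether the poller actually filters replies this way is an assumption and is not established in this thread. The function run_demo is hypothetical, and a reply sent from a different port stands in for a reply sent from a different IP address. A connected UDP socket only delivers datagrams from the peer it connected to, while an unconnected one accepts a reply from any source.

```python
import socket


def run_demo(timeout=0.5):
    """Return (unconnected_got_reply, connected_got_reply) on loopback."""
    # The fake "agent" receives on one socket but answers from another,
    # standing in for a host that replies from a different address.
    listen = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listen.bind(("127.0.0.1", 0))
    listen.settimeout(timeout)
    other = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    other.bind(("127.0.0.1", 0))
    agent_addr = listen.getsockname()

    results = []
    try:
        for connect_first in (False, True):
            client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            client.settimeout(timeout)
            try:
                if connect_first:
                    # Connected: the OS filters on the peer address and port.
                    client.connect(agent_addr)
                    client.send(b"request")
                else:
                    # Unconnected: any datagram sent to our port is delivered.
                    client.sendto(b"request", agent_addr)
                _, client_addr = listen.recvfrom(1024)
                other.sendto(b"reply", client_addr)  # reply from "wrong" source
                try:
                    client.recvfrom(1024)
                    results.append(True)
                except socket.timeout:
                    results.append(False)
            finally:
                client.close()
    finally:
        listen.close()
        other.close()
    return tuple(results)
```

If the poller behaves like the connected client here, a response from the second vmk's address would look like a timeout, even though a tool using an unconnected socket would still see the answer.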