Dear NetXMS community,
The following issue has me stumped. I've got four Brocade VDX switches, all added to NetXMS. Two of them behave fine, responding to SNMP properly and returning their hostname and interface data, for example. The other two don't seem to respond to SNMP requests from NetXMS at all. One obvious difference is that the two switches that don't work carry some L3 config (they run VRRP interfaces), while the two that work fine have no L3 config and run purely as L2 switches. That shouldn't make a difference, and it may well be a coincidence, but maybe it's important. Here is what I have done so far and why I'm stuck:
- Tried a plain snmpwalk against all four switches from the (Windows) CLI of the NetXMS server. All four respond fine (and identically) to this, each returning the correct hostname, for example (the exact commands I used are shown after the log below).
- Tried the MIB explorer in NetXMS, selecting OID .1.3.6.1.2.1.1.5.0 (sysName in this MIB), for example, and walking that specific OID to read its value. This works for two of the switches, but not for the other two.
- At this point I concluded that the behavior points to something within NetXMS. I'm new to the platform, but I did some digging. I found the server CLI console and set the debug level to 6 (those console commands are also listed after the log). After that, I put one of the misbehaving switches into unmanaged mode and then back into managed mode to trigger SNMP discovery. I reverted the debug level right afterwards, as it causes our environment to collect a few MB of logs per minute. Here is the log output for one of the switches, secret-device-name. I can see the discovery being initiated but not returning much. In the end it just creates an "unknown" interface with the node's management IP on it, reachable via ICMP, and sets the device status to NORMAL based on that. I'm assuming this is default NetXMS behavior:
2020.11.09 16:25:11.429 *D* [poll.manager ] Data collection target secret-device-name [7660] queued for status poll
2020.11.09 16:25:11.429 *D* [poll.manager ] Node secret-device-name [7660] queued for ICMP poll
2020.11.09 16:25:11.429 *D* [poll.status ] Starting status poll for node secret-device-name (ID: 7660)
2020.11.09 16:25:11.445 *D* [poll.status ] StatusPoll(secret-device-name): allDown=false, statFlags=0x00000000
2020.11.09 16:25:15.226 *D* [event.corr ] CorrelateEvent: event SYS_NODE_UNMANAGED id 102178392 source secret-device-name [7660]
2020.11.09 16:25:15.226 *D* [event.proc ] EVENT SYS_NODE_UNMANAGED [12] (ID:102178392 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to UNMANAGED
2020.11.09 16:25:15.445 *D* [client.session.2 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.1 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.2 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.1 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.3 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.3 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.0 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.0 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:17.477 *D* [poll.status ] StatusPoll(secret-device-name [7660]): unable to get system uptime
2020.11.09 16:25:17.477 *D* [poll.status ] Finished status poll for node secret-device-name (ID: 7660)
2020.11.09 16:25:21.430 *D* [event.corr ] CorrelateEvent: event SYS_NODE_UNKNOWN id 102178403 source secret-device-name [7660]
2020.11.09 16:25:21.430 *D* [event.proc ] EVENT SYS_NODE_UNKNOWN [11] (ID:102178403 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to UNKNOWN
2020.11.09 16:25:21.633 *D* [client.session.2 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.1 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.3 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.0 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.2 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.1 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.3 ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.0 ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:07.025 *D* [obj.sync ] Object secret-device-name [7660] modified
2020.11.09 16:26:21.885 *D* [poll.manager ] Data collection target secret-device-name [7660] queued for status poll
2020.11.09 16:26:21.885 *D* [poll.manager ] Node secret-device-name [7660] queued for ICMP poll
2020.11.09 16:26:21.885 *D* [poll.status ] Starting status poll for node secret-device-name (ID: 7660)
2020.11.09 16:26:21.885 *D* [poll.status ] StatusPoll(secret-device-name): allDown=false, statFlags=0x00000000
2020.11.09 16:26:27.917 *D* [poll.status ] StatusPoll(secret-device-name [7660]): unable to get system uptime
2020.11.09 16:26:27.917 *D* [event.corr ] CorrelateEvent: event SYS_IF_UP id 102178496 source secret-device-name [7660]
2020.11.09 16:26:27.917 *D* [poll.status ] Finished status poll for node secret-device-name (ID: 7660)
2020.11.09 16:26:27.917 *D* [event.proc ] EVENT SYS_IF_UP [4] (ID:102178496 F:0x0001 S:0 TAGS:"") FROM secret-device-name: Interface "unknown" changed state to UP (IP Addr: x.x.x.x/24, IfIndex: 1)
2020.11.09 16:26:27.917 *D* [event.corr ] CorrelateEvent: event SYS_NODE_NORMAL id 102178509 source secret-device-name [7660]
2020.11.09 16:26:27.917 *D* [event.proc ] EVENT SYS_NODE_NORMAL [6] (ID:102178509 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to NORMAL
2020.11.09 16:26:28.120 *D* [client.session.2 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.2 ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.1 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.1 ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.3 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.3 ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.0 ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.0 ] Sending update for object secret-device-name [7660]
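
For reference, the snmpwalk test from the Windows CLI of the NetXMS server looked roughly like this (the community string and IP address below are placeholders, and I'm assuming SNMPv2c; the second command reads sysName.0 directly, the same OID I tried in the MIB explorer):

snmpwalk -v 2c -c public 192.0.2.10 1.3.6.1.2.1.1
snmpget -v 2c -c public 192.0.2.10 1.3.6.1.2.1.1.5.0

All four switches answer these the same way, returning their correct hostname.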
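
And this is roughly the server console sequence I used to capture the debug output above (if I have the netxmsd console syntax right; the last command drops the level back down afterwards):

debug 6
(put the switch into unmanaged mode and back into managed mode from the management client)
debug 0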
If anybody has any more clues about where to dig deeper to find the cause, I would be very grateful.