Dear NetXMS community,
The following issue has me stumped. I've got four Brocade VDX switches, all added to NetXMS. Two of them behave fine, responding to SNMP properly and returning their hostname and interface data, for example. The other two don't seem to reply to SNMP from NetXMS at all. One obvious difference is that the two switches that don't work hold some L3 config (they run VRRP interfaces), while the two that work fine have no L3 config, as they run purely as L2 switches. This shouldn't make a difference, and it may still be a coincidence, but maybe it's important. Here is what I have done so far and why I'm stuck:
- Tried a plain snmpwalk to all four switches from the (Windows) CLI of the NetXMS server. All four switches respond fine (and identically) to this, all of them returning the right hostname, for example
- Tried the MIB explorer, selecting for example OID .1.3.6.1.2.1.1.5.0 (sysName) and then walking that specific OID to read its value. This works for two of the switches, but not for the other two
- At this point I concluded that the behavior points to something within NetXMS. I'm new to the platform, but I did some digging. I found the CLI server console and set the debug level to 6. After that, I put one of the misbehaving switches in unmanaged mode, then back to managed mode, to trigger SNMP discovery. I reverted the debug level right after, as this causes our environment to collect a few MB of logs per minute. Here is the log output for one of the switches, secret-device-name. I can see the discovery being initiated, but it doesn't return much. In the end it just creates an unknown interface with the node mgmt IP on it, reachable via ICMP, and sets the device status to NORMAL based on that. I'm assuming this is default NetXMS behavior:
2020.11.09 16:25:11.429 *D* [poll.manager       ] Data collection target secret-device-name [7660] queued for status poll
2020.11.09 16:25:11.429 *D* [poll.manager       ] Node secret-device-name [7660] queued for ICMP poll
2020.11.09 16:25:11.429 *D* [poll.status        ] Starting status poll for node secret-device-name (ID: 7660)
2020.11.09 16:25:11.445 *D* [poll.status        ] StatusPoll(secret-device-name): allDown=false, statFlags=0x00000000
2020.11.09 16:25:15.226 *D* [event.corr         ] CorrelateEvent: event SYS_NODE_UNMANAGED id 102178392 source secret-device-name [7660]
2020.11.09 16:25:15.226 *D* [event.proc         ] EVENT SYS_NODE_UNMANAGED [12] (ID:102178392 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to UNMANAGED
2020.11.09 16:25:15.445 *D* [client.session.2   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.1   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.2   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.1   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.3   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.3   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.0   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:15.445 *D* [client.session.0   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:17.477 *D* [poll.status        ] StatusPoll(secret-device-name [7660]): unable to get system uptime
2020.11.09 16:25:17.477 *D* [poll.status        ] Finished status poll for node secret-device-name (ID: 7660)
2020.11.09 16:25:21.430 *D* [event.corr         ] CorrelateEvent: event SYS_NODE_UNKNOWN id 102178403 source secret-device-name [7660]
2020.11.09 16:25:21.430 *D* [event.proc         ] EVENT SYS_NODE_UNKNOWN [11] (ID:102178403 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to UNKNOWN
2020.11.09 16:25:21.633 *D* [client.session.2   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.1   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.3   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.0   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.2   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.1   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.3   ] Sending update for object secret-device-name [7660]
2020.11.09 16:25:21.633 *D* [client.session.0   ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:07.025 *D* [obj.sync           ] Object secret-device-name [7660] modified
2020.11.09 16:26:21.885 *D* [poll.manager       ] Data collection target secret-device-name [7660] queued for status poll
2020.11.09 16:26:21.885 *D* [poll.manager       ] Node secret-device-name [7660] queued for ICMP poll
2020.11.09 16:26:21.885 *D* [poll.status        ] Starting status poll for node secret-device-name (ID: 7660)
2020.11.09 16:26:21.885 *D* [poll.status        ] StatusPoll(secret-device-name): allDown=false, statFlags=0x00000000
2020.11.09 16:26:27.917 *D* [poll.status        ] StatusPoll(secret-device-name [7660]): unable to get system uptime
2020.11.09 16:26:27.917 *D* [event.corr         ] CorrelateEvent: event SYS_IF_UP id 102178496 source secret-device-name [7660]
2020.11.09 16:26:27.917 *D* [poll.status        ] Finished status poll for node secret-device-name (ID: 7660)
2020.11.09 16:26:27.917 *D* [event.proc         ] EVENT SYS_IF_UP [4] (ID:102178496 F:0x0001 S:0 TAGS:"") FROM secret-device-name: Interface "unknown" changed state to UP (IP Addr: x.x.x.x/24, IfIndex: 1)
2020.11.09 16:26:27.917 *D* [event.corr         ] CorrelateEvent: event SYS_NODE_NORMAL id 102178509 source secret-device-name [7660]
2020.11.09 16:26:27.917 *D* [event.proc         ] EVENT SYS_NODE_NORMAL [6] (ID:102178509 F:0x0001 S:0 TAGS:"NodeStatus") FROM secret-device-name: Node status changed to NORMAL
2020.11.09 16:26:28.120 *D* [client.session.2   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.2   ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.1   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.1   ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.3   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.3   ] Sending update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.0   ] Scheduling update for object secret-device-name [7660]
2020.11.09 16:26:28.120 *D* [client.session.0   ] Sending update for object secret-device-name [7660]
If anybody has any more clues about where to dig deeper to find the cause, I would be very grateful.
			
				Hi!
If you open Object Details -> Overview, there's a "Capabilities" column on the right-hand side. There is a property called "isSNMP". If it's "Yes", it means that NetXMS has detected that the device is SNMP capable and is querying it via SNMP. Capabilities are detected on a Full Configuration Poll, so if your problematic devices have "No" there, try doing a Full Configuration Poll.
P.S. 
In the NetXMS debug console it's possible to filter out some of the output by debug tag. E.g. you can issue
debug client.* 0
to turn off debug messages for the management console, and
debug client.* -1
will reset debugging for this tag to its default level.
			
			
			
				Dear Filipp,
Thank you for providing some additional information. I did see the Capabilities column; for the problematic devices, all capabilities say "No".
I just tried the Full Configuration Poll. It fails to detect any additional capabilities and reports at the end that nothing changed:
[10.11.2020 15:46:32] **** Poll request sent to server ****
[10.11.2020 15:46:33] Poll request accepted
[10.11.2020 15:46:36] Starting configuration poll for node secret-device-name
[10.11.2020 15:46:36] Checking node's capabilities...
[10.11.2020 15:46:36]    Checking NetXMS agent...
[10.11.2020 15:46:37]    Cannot connect to NetXMS agent (Connect failed)
[10.11.2020 15:46:37]    Checking SNMP...
[10.11.2020 15:46:55]    Checking EtherNet/IP...
[10.11.2020 15:46:56]    Cannot get device identity via EtherNet/IP (CONNECT FAILED)
[10.11.2020 15:46:56] Capability check finished
[10.11.2020 15:46:56] Checking interface configuration...
[10.11.2020 15:46:56] Unable to get interface list from node
[10.11.2020 15:46:56] Interface configuration check finished
[10.11.2020 15:46:56] Checking node name
[10.11.2020 15:46:56] Node name is OK
[10.11.2020 15:46:56] Updating general system hardware information
[10.11.2020 15:46:56] Finished configuration poll for node secret-device-name
[10.11.2020 15:46:56] Node configuration was not changed after poll
[10.11.2020 15:46:56] **** Poll completed successfully ****
Would it be worthwhile to inspect debug logs during the Full Configuration Poll?
			
There are 18 seconds between "Checking SNMP..." and the next line of this configuration poll, which means the SNMP communication step timed out. In other words, the device did not answer NetXMS.
- Check Communication -> SNMP settings in the properties of the node: are the version and community string correct?
- You can check with Wireshark or tcpdump which OIDs NetXMS requests during the configuration poll. E.g. for SNMP v2 a request for these OIDs is sent:
    1.3.6.1.2.1.1.2.0
    1.3.6.1.2.1.1.1.0
    1.3.6.1.4.1.35160.1.1.0
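If capturing on a Linux box, something along these lines would show the requests and whether any replies come back at all (the interface name and node address are placeholders):

```shell
# Placeholders: replace eth0 and 10.0.0.1 with your capture interface and node IP
tcpdump -ni eth0 "udp port 161 and host 10.0.0.1"
```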
Try running e.g.
snmpget -v 2c -c public node-ip-address 1.3.6.1.2.1.1.1.0
at the command line on the machine where NetXMS is running. Does it return any data?
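For completeness, the GetRequest that snmpget sends for 1.3.6.1.2.1.1.1.0 can be reproduced by hand. This is a minimal sketch using only the Python standard library; the community string is a placeholder, the function names are mine, and actually sending the datagram is left commented out:

```python
import socket

def ber_tlv(tag: int, payload: bytes) -> bytes:
    """Encode one BER TLV (short-form length is enough at these sizes)."""
    return bytes([tag, len(payload)]) + payload

def ber_int(value: int) -> bytes:
    # Sufficient for the small non-negative integers used here
    return ber_tlv(0x02, value.to_bytes(max(1, (value.bit_length() + 8) // 8), "big"))

def ber_oid(oid: str) -> bytes:
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = bytes([arcs[0] * 40 + arcs[1]])          # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:                                   # base-128 with continuation bits
            chunk.append((arc & 0x7F) | 0x80)
            arc >>= 7
        body += bytes(reversed(chunk))
    return ber_tlv(0x06, body)

def snmp_get_request(community: str, oid: str, request_id: int = 1) -> bytes:
    varbind = ber_tlv(0x30, ber_oid(oid) + ber_tlv(0x05, b""))       # OID + NULL value
    pdu = ber_tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)
                  + ber_tlv(0x30, varbind))                           # GetRequest-PDU
    return ber_tlv(0x30, ber_int(1)                                   # version 1 = v2c
                   + ber_tlv(0x04, community.encode()) + pdu)

packet = snmp_get_request("public", "1.3.6.1.2.1.1.1.0")
# The encoded OID appears verbatim in the datagram:
assert bytes([0x06, 0x08, 0x2B, 6, 1, 2, 1, 1, 1, 0]) in packet
# To actually send it (uncomment and set the node address):
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(packet, ("NODE_IP", 161))
```

Watching exactly these bytes go out (and whether anything comes back) in a packet capture is what distinguishes "device never answered" from "answer was discarded".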
			
			
			
				Dear Filipp,
Thanks once more for the pointers. I tried some manual SNMP GET requests via the CLI again, from our NetXMS server, and they succeeded. I tried the Full Configuration Poll once more, and it failed again.
Next I looked at both the NetXMS Full Configuration Poll and the manual SNMP GET requests in Wireshark. Filtering the pcap on the device's management IP, I saw the GET requests going out but no responses coming back, even for the manual requests that clearly succeed. Then it finally dawned on me: because these are L3 switches and the IP we use to manage them is in a different subnet, the switches are sourcing their SNMP responses from the interface IP they hold in the NetXMS subnet.
I need to solve this with some SNMP source-interface configuration on the switches.
That left one last mystery: why do the manual requests succeed while the NetXMS requests fail? It appears that snmpwalk and snmpget from Net-SNMP match responses to requests by request ID alone, even when the response comes from a different IP. NetXMS does not accept this, which is arguably cleaner and safer.
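The two matching policies can be illustrated with a small self-contained sketch (the function and variable names are mine, not NetXMS internals; a second local UDP socket stands in for the switch replying from a different source address):

```python
import socket

def matches_request(expected_peer, actual_peer, request_id, response_id, strict):
    """Decide whether a UDP 'response' belongs to our request.
    strict=True  -> NetXMS-style: the reply must come from the address we queried.
    strict=False -> Net-SNMP-style: a matching request ID is enough."""
    if strict and actual_peer != expected_peer:
        return False
    return request_id == response_id

# Simulate a device that answers from a different address than the one queried:
queried = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # address we send to
queried.bind(("127.0.0.1", 0))
responder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # address that replies
responder.bind(("127.0.0.1", 0))
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))

request_id = 42
client.sendto(request_id.to_bytes(4, "big"), queried.getsockname())
payload, _ = queried.recvfrom(64)                 # "device" receives the request...
responder.sendto(payload, client.getsockname())   # ...but replies from another socket

data, peer = client.recvfrom(64)
response_id = int.from_bytes(data, "big")

# Net-SNMP-style matching accepts the reply; NetXMS-style rejects it:
assert matches_request(queried.getsockname(), peer, request_id, response_id, strict=False)
assert not matches_request(queried.getsockname(), peer, request_id, response_id, strict=True)
for s in (queried, responder, client):
    s.close()
```

The strict check also protects against spoofed replies, which is presumably why NetXMS enforces it.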
Thanks again for helping me figure this out. This topic can be closed as the root cause is identified.