How to troubleshoot active network discovery?

Started by troffasky, June 16, 2022, 08:12:55 PM

troffasky

Debug level is currently set to 5.
I have a /24 subnet with multiple nodes that NetXMS has discovered previously. Two new hosts were added to this network, but I just cannot get NetXMS to discover them automatically.
"Active discovery - node 172.22.141.37 responded to ICMP probe via proxy" is logged, but that's it; nothing is logged about any further action on that specific IP.
If I add the node manually it works instantly, i.e. the host is configured with the correct SNMP community.
"Starting SNMP check on range" is logged for the IP range that covers that host, but nothing after that.
The log also mentions that NetXMS knows this host from another node's ARP cache.

Filipp Sudanov

Upgrade to the latest version - discovery debugging has been improved a bit recently.
Set debug level 6 for tag poll.discovery.
Try running discovery manually - you can create a small range that covers just that IP address and force discovery for that range.

troffasky

The active discovery interval is 1800s, passive is 900s. I last made changes to NetXMS around 18:00, then posted here. At 04:48 the next day, the node was discovered.

2022.06.17 04:48:44.046 *D* [event.proc         ] EVENT SYS_NODE_ADDED [1] at {0} (ID:12649088 F:0x0001 S:0 TAGS:"NewObjects") FROM SW6: Node added
2022.06.17 04:48:44.046 *D* [                   ] Name for node 118685 was resolved to SW6
2022.06.17 04:48:44.046 *D* [poll.conf          ] ConfPoll(SW6 [118685]): node name resolved


Doesn't make any sense! Similar story [probably] with the other switch on site that wasn't being discovered, but in that case, because I increased the log level, my logs only go back to 02:00 and it seems to have been added before then.
Version is 4.1.377.

troffasky

It doesn't seem to like a /32 netmask for this purpose. I get no error in the GUI when adding and scanning a test discovery target with a /32, but this is logged:


2022.07.01 14:00:08.686 *D* [poll.discovery     ] Invalid address range 10.127.10.3/32


[Yes, "Subnet" is ticked and not "Address range"].
Changed it to a range of one IP address and the discovery runs as expected.
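
For reference, the workaround above amounts to writing the /32 as a range whose start and end are the same address. Here is a minimal sketch using Python's standard `ipaddress` module; the function name `subnet_to_range` is made up for illustration and is not part of NetXMS:

```python
import ipaddress
from typing import Tuple


def subnet_to_range(cidr: str) -> Tuple[str, str]:
    """Return the (first, last) address of a subnet as strings.

    Hypothetical helper, not a NetXMS API. For a /32 both values are the
    same address, i.e. the one-address range that worked in the test above.
    Note that for larger subnets the result includes the network and
    broadcast addresses.
    """
    # strict=False accepts host addresses with a mask, e.g. 10.127.10.3/24
    net = ipaddress.ip_network(cidr, strict=False)
    return str(net[0]), str(net[-1])


if __name__ == "__main__":
    first, last = subnet_to_range("10.127.10.3/32")
    print(f"{first} - {last}")
```

The two values can then be entered as start and end with "Address range" selected instead of "Subnet".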

troffasky

Anyway. It still will not acquire this node, even though all the conditions appear to be right.


2022.07.01 14:12:52.785 *D* [poll.discovery     ] Starting active discovery check on range 10.127.10.3 - 10.127.10.3 (snmp=true tcp=true bs=1024 delay=0)
2022.07.01 14:12:54.291 *D* [poll.discovery     ] Active discovery - node 10.127.10.3 responded to ICMP probe
2022.07.01 14:12:54.291 *D* [poll.discovery     ] Checking address 10.127.10.3 in zone 0 (source: Active Discovery)
2022.07.01 14:12:54.292 *D* [poll.discovery     ] New node queued: 10.127.10.3/24
2022.07.01 14:12:54.292 *D* [poll.discovery     ] Starting SNMP check on range 10.127.10.3 - 10.127.10.3
2022.07.01 14:12:54.293 *D* [poll.discovery     ] NodePoller: processing address 10.127.10.3/24 in zone 0 (source type Active Discovery, source node [0])
2022.07.01 14:12:55.806 *D* [poll.discovery     ] Active discovery - node 10.127.10.3 responded to SNMP probe
2022.07.01 14:12:55.806 *D* [poll.discovery     ] Checking address 10.127.10.3 in zone 0 (source: Active Discovery)
2022.07.01 14:12:55.808 *D* [poll.discovery     ] Potential node 10.127.10.3 rejected (IP address already queued for polling)
2022.07.01 14:12:57.320 *D* [poll.discovery     ] Active discovery - node 10.127.10.3 responded to SNMP probe
2022.07.01 14:12:57.320 *D* [poll.discovery     ] Checking address 10.127.10.3 in zone 0 (source: Active Discovery)
2022.07.01 14:12:57.321 *D* [poll.discovery     ] Potential node 10.127.10.3 rejected (IP address already queued for polling)
2022.07.01 14:13:04.879 *D* [poll.discovery     ] Starting TCP check on range 10.127.10.3 - 10.127.10.3
2022.07.01 14:13:04.968 *D* [poll.discovery     ] Finished active discovery check on range 10.127.14.3 - 10.127.10.3
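
When the server log is busy, it can help to pull out only the lines for one debug tag that mention one address. Below is a rough sketch that assumes the log layout shown in this thread (timestamp, severity, bracketed tag, message); the function name `trace_address` and the log path in the usage comment are assumptions, not something NetXMS provides:

```python
import re
from typing import Iterable, List

# Matches the layout seen in this thread, e.g.
# "2022.07.01 14:12:54.291 *D* [poll.discovery     ] message text"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\*(?P<sev>\w)\* \[(?P<tag>[^\]]*)\] (?P<msg>.*)$"
)


def trace_address(lines: Iterable[str], address: str,
                  tag: str = "poll.discovery") -> List[str]:
    """Return "timestamp message" for lines with the given tag and address.

    Hypothetical helper for reading logs; not part of NetXMS.
    """
    # Guard the address on both sides so 10.127.10.3 does not match 10.127.10.37
    addr_re = re.compile(r"(?<![\d.])" + re.escape(address) + r"(?!\d)")
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # not in the expected layout
        if m.group("tag").strip() != tag:
            continue
        if addr_re.search(m.group("msg")):
            hits.append(f"{m.group('ts')} {m.group('msg')}")
    return hits


# Example usage (the path is an assumption; use your server's log file):
# with open("/var/log/netxmsd", encoding="utf-8", errors="replace") as fh:
#     for entry in trace_address(fh, "10.127.10.3"):
#         print(entry)
```

The idea is to see the whole sequence for one address in one place, so it is easier to spot where processing stops after "New node queued".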



'show queues':
Node discovery poller            : 887

So is that 887 nodes waiting to be added?


troffasky

Server.QueueSize.Current(NodeDiscoveryPoller) history over the last 30 days is always in the range 800-900. Does this indicate a resource issue on the server, or do I need to increase the number of pollers somewhere?

Victor Kirhenshtein

Hi!

This is the queue for processing discovered addresses. If you have discovery filters, some (or most) of the addresses could be filtered out and then re-discovered again and again. This can create a permanently non-empty discovery queue, but this is fine.

Best regards,
Victor

troffasky

The address filter is empty; I am just filtering on SNMP/SSH/agent protocol.
Following the hints in this thread
https://www.netxms.org/forum/general-support/discovery-through-proxy/15/
I increased ThreadPool.Discovery.BaseSize from 1 to 4 and enabled parallel processing, then restarted the service.
The expected node was then discovered by the next day [as with the other system, the log level is such that the relevant logs were rotated away, so I don't know exactly when].

I am going to assume that other things I wasn't specifically looking for have been discovered as well.