[SOLVED] Active Discovery Issue

Started by lweidig, August 02, 2021, 04:41:54 PM

lweidig

We are having some problems with active discovery and looked at the debug messages for anything that might help. Running at debug level 6, I see the nodes we are hoping to get discovered logged as "IP address already queued for polling". From the server console, show queues reports the Node discovery poller queue at 6,370 entries and growing! It does go down by 1 or 2 once in a while, but for the most part it is going up. The server is not loaded. Can this queue be cleared so discovery can start again? We are running the latest version, 3.9.156, but the issue was happening prior to that.

Filipp Sudanov

#1
I believe that queue would clear if you restart the server.

In the debug console,
show threads

might also give some information: what is the load for Discovery and Pollers?

You can try turning on
NetworkDiscovery.EnableParallelProcessing
NetworkDiscovery.MergeDuplicateNodes
This would enable parallel discovery processing.
Increasing ThreadPool.Discovery.MaxSize might also help if parallel processing is enabled.

lweidig

#2
Filipp,

Thanks for the ideas. We definitely did not have enough threads set up for discovery, and restarting the server did indeed clear that queue, but it just shot back up after the restart. We have now been able to get the queue to a point where it sits near 0, but we now see the issue that started us down this path. We KNOW there is a device (well, 6+ of them) that needs to be discovered but is not getting discovered. I am just going to focus on one, though. So we set debug to 6 and started an active scan of that subnet. We end up getting:

2021.08.03 07:35:44.589 *D* [poll.discovery     ] Active discovery - node 10.0.154.19 responded to ICMP ping
2021.08.03 07:35:44.589 *D* [poll.discovery     ] Checking address 10.0.154.19 in zone 0 (source: Active Discovery)
2021.08.03 07:35:44.589 *D* [poll.discovery     ] Potential node 10.0.154.19 rejected (IP address already known at node 10.0.154.19 [29153])


In the management console, a search by IP for that address only shows that it is connected to the switch and port where it really is, but no node with this information shows up in the management console. However, if I run the command show objects 10.0.154.19 from the server console, I can see the following:

Object ID 29153 "10.0.154.19"
    Class=Node Status=CRITICAL IsModified=0 IsDeleted=0
    Parents: <>
    Children: <>
    Last Change.....: 03.Aug.2021 07:56:29
    State flags........: 0x00000001
    Primary IP........: 10.0.154.19
    Primary Hostname...: ups.mydomain.com
    Capabilities.......: isSNMP=0 isAgent=0 isLocalMgmt=0
    SNMP ObjectId..:
    ICMP Polling......: ON
    ICMP statistics (PRI):
        RTT last....: 57 ms
        RTT min....: 56 ms
        RTT max....: 62 ms
        RTT average: 57 ms
        Packet loss...: 0


Providing this was a PAIN, seeing that the Server Console window does not appear to allow copying, so I hand-typed it all. So the system thinks the node is there, but it is NOWHERE to be found.


lweidig

The "fix" was to write a small NXSL script to delete these nodes, at which point they were rediscovered and added properly.
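
For anyone hitting the same thing, here is a rough sketch of how the affected object IDs could be collected from the debug log before deleting anything. This is not the NXSL script mentioned above, just a Python illustration. It assumes the rejection message looks exactly like the "Potential node ... rejected" line quoted earlier in this thread; the function name find_rejected_nodes and the sample list are made up for the example.

```python
import re

# Pattern based on the rejection line quoted earlier in this thread.
# Assumption: other server versions may word this message differently.
REJECTED = re.compile(
    r"Potential node (?P<ip>\S+) rejected "
    r"\(IP address already known at node (?P<name>.+?) \[(?P<id>\d+)\]\)"
)


def find_rejected_nodes(lines):
    """Return {object_id: (ip, node_name)} for every rejection line found."""
    found = {}
    for line in lines:
        match = REJECTED.search(line)
        if match:
            found[int(match.group("id"))] = (match.group("ip"), match.group("name"))
    return found


# Sample input: the three debug lines from the post above.
sample = [
    "2021.08.03 07:35:44.589 *D* [poll.discovery     ] Active discovery - node 10.0.154.19 responded to ICMP ping",
    "2021.08.03 07:35:44.589 *D* [poll.discovery     ] Checking address 10.0.154.19 in zone 0 (source: Active Discovery)",
    "2021.08.03 07:35:44.589 *D* [poll.discovery     ] Potential node 10.0.154.19 rejected (IP address already known at node 10.0.154.19 [29153])",
]

for object_id, (ip, name) in sorted(find_rejected_nodes(sample).items()):
    print(object_id, ip, name)
```

Each ID it prints can then be checked with show objects in the server console to confirm it really is one of the hidden nodes before it gets deleted.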