Will nodes be rediscovered by auto discovery?

Started by lweidig, July 13, 2017, 05:29:01 AM

lweidig

I had significantly changed the templates for a number of nodes and thought it would be a great idea to delete the nodes and let them get discovered again.  It has been past an ActiveDiscoveryInterval (900s) and they are not coming back.  Wondering if there is something I need to do to get them to come back, or if I will just need to add them all back manually.  This is on a 2.1 server.

Tursiops

Hi,

I'm seeing similar behaviour. As we are using a lot of proxies and zones, we have to rely on passive rather than active discovery.
It looks like discovery only runs once and then never again (server restarts do not have any effect).

If I remove something that was discovered, it does not come back later.
If I add new devices to the network or install an agent on a workstation, they are not discovered, even though they show up in the ARP tables of switches, have SNMP and/or NetXMS agents installed, and match the filter condition.

Cheers

Victor Kirhenshtein

Hi,

deleted nodes should be rediscovered unless something is stuck in the system. Could you try logging in as user "system" and check whether these nodes are still present somewhere, or use nxadm -c "show objects" to list all objects in the system and check?

Best regards,
Victor

lweidig

There is nothing for these nodes when logged in as system or in the output of nxadm -c "show objects".  Like the other poster, I have also tried restarting the server to see if they would get automatically discovered again, with no luck.

Tursiops

Hi,

I wiped a number of discovered nodes from the system last night, ran hkrun, and checked show objects before and after.
The nodes were definitely no longer in the list, but they have not been rediscovered yet (~10 hours later, with passive discovery meant to run every 15 minutes).

Discovery also doesn't seem to pick up network changes. It really behaves as if it only runs once. Is there some flag in the database that might be stuck?

Cheers

Victor Kirhenshtein

Hi,

very weird. Please check the server queues, and try running the server at debug level 6 for some time and check messages with the prefix DiscoveryPoller.

Best regards,
Victor

Tursiops

Hi,

The logs show a lot of "potential node x.x.x.x rejected (IP address already queued for polling)".
When I check for existing objects, I either can't find them or they are in a different zone from the node that's used for discovery.

Since the message mentions "queued for polling", I had a look at the queues, and Node Poller is at 40k+. Looks like that's our problem.
The value rarely decreases, and otherwise just keeps increasing. Quite possibly 40k is simply the number of IPs across our networks which NetXMS has discovered and wants to check.
I am not quite sure which poller setting to increase for this in the server config. Status? Discovery? Is there a NumberOfNodePollers configuration item?

Cheers

lweidig

For us all of the queues are empty:

netxmsd: show queues
Data collector                   : 0
DCI cache loader                 : 0
Database writer                  : 0
Database writer (IData)          : 0
Database writer (raw DCI values) : 0
Event processor                  : 0
Node poller                      : 0
Syslog processing                : 0
Syslog writer                    : 0


Here is all of the debug output for one of the nodes not being rediscovered:


[25-Jul-2017 07:43:35.010] [DEBUG] DiscoveryPoller(): checking potential node 10.0.140.1 at shf00-pdu-00:1
[25-Jul-2017 07:43:35.011] [DEBUG] DiscoveryPoller(): new node queued: 10.0.140.1/22
[25-Jul-2017 07:43:35.011] [DEBUG] NodePoller: processing node 10.0.140.1/22 in zone 0
[25-Jul-2017 07:43:35.011] [DEBUG] GetOldNodeWithNewIP: ip=10.0.140.1 mac=E4:8D:8C:25:49:78
[25-Jul-2017 07:43:35.012] [DEBUG] AcceptNewNode(10.0.140.1): auto filter, flags=0004
[25-Jul-2017 07:43:35.012] [DEBUG] AcceptNewNode(10.0.140.1): auto filter - checking range
[25-Jul-2017 07:43:35.012] [DEBUG] AcceptNewNode(10.0.140.1): auto filter - range check result is 0
[25-Jul-2017 07:43:35.124] [DEBUG] DiscoveryPoller(): checking potential node 10.0.140.1 at shf00-pdu-00:1
[25-Jul-2017 07:43:35.128] [DEBUG] DiscoveryPoller(): new node queued: 10.0.140.1/22
[25-Jul-2017 07:43:35.128] [DEBUG] NodePoller: processing node 10.0.140.1/22 in zone 0
[25-Jul-2017 07:43:35.128] [DEBUG] GetOldNodeWithNewIP: ip=10.0.140.1 mac=00:00:00:00:00:00
[25-Jul-2017 07:43:35.129] [DEBUG] AcceptNewNode(10.0.140.1): auto filter, flags=0004
[25-Jul-2017 07:43:35.129] [DEBUG] AcceptNewNode(10.0.140.1): auto filter - checking range
[25-Jul-2017 07:43:35.132] [DEBUG] AcceptNewNode(10.0.140.1): auto filter - range check result is 0


But the node still never gets added.  Also, both the Active Discovery Targets and the Address Filters sections of the Network Discovery configuration contain the address range for this IP.

Victor Kirhenshtein

Hi,

from the log it seems that the address range filter is not passed. Could you show how the address filter is configured?

Best regards,
Victor

lweidig

It is an address range of 10.0.140.1 - 10.0.140.30.  Maybe it is not catching the end IPs properly?

Actually, that is exactly it.  I changed my address range to 10.0.140.0 - 10.0.140.30 and it detected the router.  Either this needs fixing or we need to adjust all of our ranges.  Can you also tell whether the upper end of the range has the same issue?

Victor Kirhenshtein

It's a bug in the server - the first address of a range is always ignored. I just fixed it in the development branch (the fix will be included in the 2.1.1 patch release).
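The effect can be illustrated with a minimal sketch (hypothetical code, not the actual NetXMS implementation): an exclusive lower bound in the range comparison rejects exactly the first address of the configured range, which is why 10.0.140.1 failed the filter until the range was extended downward.

```python
# Minimal sketch of an off-by-one in an address range filter.
# Hypothetical function names; this is not the NetXMS source.
from ipaddress import IPv4Address

RANGE_START = IPv4Address("10.0.140.1")
RANGE_END = IPv4Address("10.0.140.30")

def range_check_buggy(addr: str) -> bool:
    # Exclusive lower bound: the first address of the range never passes.
    return RANGE_START < IPv4Address(addr) <= RANGE_END

def range_check_fixed(addr: str) -> bool:
    # Inclusive on both ends: boundary addresses pass the filter.
    return RANGE_START <= IPv4Address(addr) <= RANGE_END

assert range_check_buggy("10.0.140.1") is False  # node never discovered
assert range_check_fixed("10.0.140.1") is True
assert range_check_buggy("10.0.140.30") is True  # upper bound unaffected
```

In this sketch only the lower bound is exclusive, which would match the reported behaviour (the upper end of the range worked).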

Best regards,
Victor

Tursiops

Hi,

I believe I found the source of my problem as well.
A switch was sending syslog data to a proxy node that was incorrectly configured with Zone ID 0 (the default).
For every syslog message received, the logs showed NetXMS adding the same IP to the poller queue over and over again (I am not sure why it failed to detect that this IP was already in the queue in this particular instance).
That node also happened to generate a syslog message every few seconds, because another monitoring tool, set up and controlled by a third party, kept trying to connect to the switch using "public" (and failing, thus generating a log entry).

I fixed the Zone ID, and now NetXMS can properly link the incoming messages to the node; our queue is in single digits now. :)

That still leaves the question of how the IP could be added to the poller queue over and over.

Cheers

Victor Kirhenshtein

Hi,

there was a bug in the server - the check for an already queued address was not performed for addresses discovered from syslog messages or SNMP traps. I just fixed it in the development branch; the fix will be included in 2.1.1.
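A small sketch of that failure mode (hypothetical code, not the actual NetXMS queue implementation): if the "already queued" check is skipped for one discovery source, a single chatty device can re-enqueue the same address on every message and inflate the queue without bound.

```python
# Sketch of a poller queue with an "already queued" check that,
# pre-fix, was skipped for syslog/trap-sourced addresses.
# All names here are hypothetical illustrations.
from collections import deque

class NodePollerQueue:
    def __init__(self, check_syslog_sources: bool):
        self._queue = deque()
        self._queued = set()  # addresses currently awaiting a poll
        self._check_syslog = check_syslog_sources

    def enqueue(self, ip: str, source: str = "arp") -> bool:
        # The duplicate check runs for ARP-discovered addresses always,
        # but for syslog/trap sources only when the fix is enabled.
        if source not in ("syslog", "trap") or self._check_syslog:
            if ip in self._queued:
                return False  # "IP address already queued for polling"
        self._queue.append(ip)
        self._queued.add(ip)
        return True

# Pre-fix behaviour: every syslog message re-queues the same address.
buggy = NodePollerQueue(check_syslog_sources=False)
for _ in range(3):
    buggy.enqueue("10.0.1.5", source="syslog")
assert len(buggy._queue) == 3  # queue grows with every message

# Post-fix behaviour: the duplicate is suppressed.
fixed = NodePollerQueue(check_syslog_sources=True)
for _ in range(3):
    fixed.enqueue("10.0.1.5", source="syslog")
assert len(fixed._queue) == 1
```

This also matches the earlier observation of a 40k+ Node Poller queue fed by a device logging every few seconds.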

Best regards,
Victor