NetXMS Support Forum

English Support => General Support => Topic started by: Woody on December 31, 2019, 04:50:00 PM

Title: duplicate nodes problem | can't save Nodes
Post by: Woody on December 31, 2019, 04:50:00 PM
Hi there,
I moved NetXMS from an old server to two new servers. I made the move with the nxdbmgr migrate command, following these (https://wiki.netxms.org/wiki/How_to_migrate_to_another_database) instructions. The migration itself succeeded, but NetXMS always crashes after some time (approx. 30 - 60 minutes, sometimes longer). After that I can't log in to NetXMS anymore: the console stays stuck at "Objekte synchronisieren" (synchronizing objects) until the timeout.

(https://i.ibb.co/5x1D6YL/Screenshot-2019-12-31-Net-XMS-Management-Console.png)

(https://i.ibb.co/GTG9qkG/Screenshot-2019-12-31-Net-XMS-Management-Console-1.png)

However, if you are already logged in at the time of the crash, you are not kicked out; you stay logged in.

The following error messages can then be seen in the NetXMS log file:


2019.12.31 12:48:36.346 *E* [                   ] Thread "Poll Manager" does not respond to watchdog thread
2019.12.31 12:48:56.347 *E* [                   ] Thread "Syncer Thread" does not respond to watchdog thread


Information about my configuration:

There is a NetXMS server and a separate database server. In /etc/netxmsd.conf the database server is entered under DBServer=. Previously (with the old server) the NetXMS server and the database were on the same machine.

Old server:

New server:
NetXMS-Server:

NetXMS configuration file at /etc/netxmsd.conf:

## Logging
# Log file name
LogFile=/var/log/netxmsd

# Increase logging verbosity, 0 (only errors) to 9 (verbose debug)
DebugLevel=7

## Database configuration.
## Uncomment and setup ONE section.

## Option #1 - SQLite (for test installations only):
#DBDriver=sqlite.ddr
#DBName=/var/lib/netxms/netxms.db

## Option #2 - PostgreSQL (recommended):
#DBDriver=pgsql.ddr
#DBServer=127.0.0.1
#DBName=netxms
#DBLogin=netxms
#DBPassword=netxms

## Option #3 - MySQL:
DBDriver=mysql.ddr
DBServer=10.10.11.20
DBName=netxms
DBLogin=******
DBPassword=********************************

## Option #4 - Oracle:
#DBDriver=oracle.ddr
#DBServer=//127.0.0.1:1521/ORCL # Instant Client connection string or SID
#DBLogin=netxms
#DBPassword=netxms

## Option #5 - unixODBC/FreeTDS:
#DBDriver=odbc.ddr
#DBServer=NETXMS_DSN
#DBLogin=netxms
#DBPassword=netxms


MySQL-Server:

I hope someone knows what the problem is.
Thanks in advance!
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Filipp Sudanov on January 01, 2020, 03:53:55 AM
If you check the running processes on the NetXMS server, is netxmsd present in the process list at the moment it hangs? I mean, does it actually crash and terminate, or does it hang?
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Woody on January 01, 2020, 02:52:03 PM
Yes, it is present in the process list at the moment it hangs.


# ps -A | grep netx
21769 ?        08:01:05 netxmsd



# service netxmsd status
● netxmsd.service - NetXMS Server
   Loaded: loaded (/lib/systemd/system/netxmsd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-12-31 11:59:29 CET; 1 day 1h ago
  Process: 21762 ExecStart=/usr/bin/netxmsd -d (code=exited, status=0/SUCCESS)
Main PID: 21769 (netxmsd)
    Tasks: 498 (limit: 4915)
   CGroup: /system.slice/netxmsd.service
           └─21769 /usr/bin/netxmsd -d

Dez 31 11:59:29 netxms systemd[1]: Starting NetXMS Server...
Dez 31 11:59:29 netxms systemd[1]: netxmsd.service: Can't open PID file /var/run/netxmsd.pid (yet?) after start: No such file or directory
Dez 31 11:59:29 netxms systemd[1]: Started NetXMS Server.
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Filipp Sudanov on January 01, 2020, 09:58:23 PM
Ok, let's try to get some debug information with this script: https://github.com/netxms/netxms/blob/master/tools/capture_netxmsd_threads.sh

In order for this to work you should have gdb and all relevant netxms-*-dbg packages installed. Is netxms installed from packages?

The script will produce an output file in /tmp. Please attach it here.
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Woody on January 01, 2020, 10:46:57 PM
Yes, NetXMS was installed from packages. I used this (https://www.netxms.org/documentation/adminguide/installation.html#installing-on-debian-or-ubuntu) guide to install NetXMS. Before I started the script, I installed gdb and all relevant NetXMS packages with these commands:
# apt install netxms-*-dbg
# apt install gdb

And after that:
# ./capture_netxmsd_threads.sh

At the moment I executed the script, NetXMS was hanging.
In /tmp I found netxmsd-threads.21769.20200101-212952. I have attached this file here.
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Filipp Sudanov on January 02, 2020, 02:36:50 AM
I forgot to mention that you should wait for NetXMS to hang first and then launch the script. So can you please wait for it to hang on its own and then run the script?
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Woody on January 02, 2020, 12:38:01 PM
In my previous post I executed the script while NetXMS was hanging. I hadn't restarted the netxmsd service since the crash I mentioned in my first post:

2019.12.31 12:48:36.346 *E* [                   ] Thread "Poll Manager" does not respond to watchdog thread
2019.12.31 12:48:56.347 *E* [                   ] Thread "Syncer Thread" does not respond to watchdog thread

So NetXMS was still hanging at that point.

I don't know if this helps, but I did it anyway:
I have now restarted the netxmsd service and waited for NetXMS to hang. Then I launched the script. I have attached the output file here.
Log file:

2020.01.02 10:16:52.647 *E* [                   ] Thread "Poll Manager" does not respond to watchdog thread
2020.01.02 10:17:12.647 *E* [                   ] Thread "Syncer Thread" does not respond to watchdog thread


I also attached the log file here. As you can see, I launched the script 3 minutes after NetXMS hung. Don't be confused by the time in the filename; it is off by one hour because of the German time zone offset.
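
For readers who want to check their own log for the same symptom, here is a minimal Python sketch that extracts the watchdog errors from netxmsd log lines. It assumes the line layout seen in the excerpts in this thread (timestamp, *E* severity, bracketed tag, message); other versions may log differently. The function name find_watchdog_errors is made up for this example and is not part of NetXMS.

```python
import re
from datetime import datetime

# Pattern derived from the log excerpts in this thread (assumption:
# other NetXMS versions may format these lines differently).
WATCHDOG_RE = re.compile(
    r'^(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \*E\* \[[^\]]*\] '
    r'Thread "([^"]+)" does not respond to watchdog thread'
)


def find_watchdog_errors(lines):
    """Return (timestamp, thread_name) pairs for every watchdog error line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y.%m.%d %H:%M:%S.%f")
            hits.append((stamp, match.group(2)))
    return hits


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = [
        '2020.01.02 10:16:52.647 *E* [                   ] '
        'Thread "Poll Manager" does not respond to watchdog thread',
        '2020.01.02 10:17:12.647 *E* [                   ] '
        'Thread "Syncer Thread" does not respond to watchdog thread',
    ]
    for stamp, thread in find_watchdog_errors(sample):
        print(stamp.isoformat(), thread)
```

To scan a real file, pass an open file object instead of the sample list, for example find_watchdog_errors(open("/var/log/netxmsd")), using the LogFile path from the configuration in the first post.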
Title: Re: NetXMS crashes | Thread "Poll Manager" does not respond to watchdog thread
Post by: Woody on January 05, 2020, 02:37:52 PM
When NetXMS hangs it doesn't save any changes that I make.
Title: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Woody on January 07, 2020, 12:23:37 AM
I have found out that there is a difference between a full reboot and restarting only the netxmsd service.
For example, when I make some changes and run service netxmsd restart, my changes are still there.
But when I run reboot, my changes are gone.
And some changes are removed only after several reboots, which is a very big problem.
When I run # nxdbmgr check after a reboot I get this error: Container 9190 contains non-existing child 79719. Fix it? (Yes/No/All/Skip) yes
Now my saved nodes are gone.
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Victor Kirhenshtein on January 07, 2020, 12:27:02 PM
Yesterday we fixed a bug that causes a deadlock on object access. It could be the root cause of your issue as well. We will publish a new patch release for 3.1 today; please check whether it helps.

Best regards,
Victor
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Woody on January 07, 2020, 04:26:58 PM
Hi,
thanks for fixing the bug. I will try it as soon as I get the new patch.
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Woody on January 07, 2020, 07:20:05 PM
Could you please update packages?  :)
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Victor Kirhenshtein on January 07, 2020, 07:26:09 PM
The deb build is in progress and will be available within an hour or so.

Best regards,
Victor
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: Woody on January 08, 2020, 08:11:53 PM
Hello,
thank you for the update. I think the bug with "Thread "Poll Manager" does not respond to watchdog thread" and "Thread "Syncer Thread" does not respond to watchdog thread" has been fixed, because there has been no error in more than 8 hours of uptime now. But the other bug, which rainerh (https://www.netxms.org/forum/profile/?u=59258) also mentioned here (https://www.netxms.org/forum/configuration/cannot-delete-templates-group/), still exists. Here is a copy of his post:

Hello,
I want to delete 2 (old) Template Groups.
After reboot, the 2 groups are still available.
How can I delete these groups permanently?

Thank you

I have a new problem and I think the cause is the same as above:
When I create a new node, the new node works fine until I reboot the NetXMS server.
After the reboot the newly created node is deleted.


I attached the netxmsd log file here.
I then checked the database writer queues:

# nxadm -i
netxmsd: show queues

I got this output:

netxmsd: show queues
Data collector                   : 459
DCI cache loader                 : 0
Template updates                 : 0
Database writer                  : 0
Database writer (IData)          : 0
Database writer (raw DCI values) : 10321
Event processor                  : 0
Event log writer                 : 0
Poller                           : 0
Node discovery poller            : 0
Syslog processing                : 0
Syslog writer                    : 0
Scheduler                        : 0


When I executed this command multiple times, I noticed that the "Database writer (raw DCI values)" value climbs to approximately 12,000 - 20,000 and then quickly drops to approx. 1,000.


netxmsd: show queues
Data collector                   : 0
DCI cache loader                 : 0
Template updates                 : 0
Database writer                  : 0
Database writer (IData)          : 0
Database writer (raw DCI values) : 1263
Event processor                  : 0
Event log writer                 : 0
Poller                           : 0
Node discovery poller            : 0
Syslog processing                : 0
Syslog writer                    : 0
Scheduler                        : 0

netxmsd: show queues
Data collector                   : 0
DCI cache loader                 : 0
Template updates                 : 0
Database writer                  : 0
Database writer (IData)          : 73
Database writer (raw DCI values) : 14532
Event processor                  : 0
Event log writer                 : 0
Poller                           : 0
Node discovery poller            : 0
Syslog processing                : 0
Syslog writer                    : 0
Scheduler                        : 0
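
To follow these numbers over time without reading each dump by eye, the console output can be turned into a dictionary. The following is a rough Python sketch, assuming the "name : value" layout shown in the outputs above. The helper names parse_queues and queues_over, and the threshold of 1000, are invented for this example and say nothing about what NetXMS itself considers too high.

```python
def parse_queues(text):
    """Map queue name -> size from pasted `show queues` output."""
    queues = {}
    for line in text.splitlines():
        # Split on the last colon; the echoed "netxmsd: show queues"
        # prompt line has no numeric value and is skipped below.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        value = value.strip()
        if value.isdigit():
            queues[name.strip()] = int(value)
    return queues


def queues_over(queues, threshold):
    """Return only the queues whose size exceeds the given threshold."""
    return {name: size for name, size in queues.items() if size > threshold}


if __name__ == "__main__":
    # Excerpt of the first output above.
    sample = """netxmsd: show queues
Data collector                   : 459
Database writer                  : 0
Database writer (raw DCI values) : 10321
"""
    # The threshold is an arbitrary illustration value.
    print(queues_over(parse_queues(sample), 1000))
```

Running the parser on several dumps taken a few seconds apart makes it easier to see whether a queue keeps growing or, as described above, fills up and drains again.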
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: rainerh on January 08, 2020, 09:49:21 PM
Hello,

I found one of my problems.
When ImportConfigurationOnStartup is set to "Only missing elements" (the default) and I delete the template "Windows" or "Generic UNIX", it comes back on the next boot, because these templates belong to NetXMS and were not created by me.
If I change the value to "Never", they do not come back after a reboot.
Templates that I created myself can be deleted at any time.

Thank you

But one problem remains: when I create a new node and then reboot NetXMS, the node is deleted.
After the reboot I can still see the node in the Management Console, and after some seconds (about 30) it is removed automatically.
Title: Re: NetXMS hangs | can't save Nodes
Post by: Woody on January 09, 2020, 05:01:53 PM
This shows how my nodes get deleted:

1. I create a new Node
(https://i.ibb.co/rtSWjtR/Screenshot-2020-01-09-Net-XMS-admin-127-0-0-1.png) (https://ibb.co/LPWGFPX)

2. You see the node is there
(https://i.ibb.co/zFVhMP0/Screenshot-2020-01-09-Net-XMS-admin-127-0-0-1-1.png) (https://ibb.co/0Kyj7Yd)

3. Now I reboot the server
# reboot

4. For a short time the node is still there
(https://i.ibb.co/B6R6RWT/Screenshot-2020-01-09-Net-XMS-admin-127-0-0-1-2.png) (https://ibb.co/5jVjVps)

5. But after about 1 minute the node disappears
(https://i.ibb.co/3fYDzHR/Screenshot-2020-01-09-Net-XMS-admin-127-0-0-1-3.png) (https://ibb.co/28YDNBZ)

I attached the log of this test here.
You can click on the images for better quality.

I hope someone knows what the problem is.
Thanks in advance!
Title: Re: NetXMS hangs | can't save Nodes | Thread "Poll Manager" does not respond
Post by: rainerh on January 09, 2020, 08:30:54 PM
Hello

I have found the reason, but cannot work around it.
I have 2 identical networks, 192.168.11.0/24:
Router 1 at customer Voelk has 192.168.11.1/24 on interface X1
Router 2 at customer Koelbl has 192.168.11.1/24 on interface X5
Both have the same address and I cannot change it.

2020.01.09 17:43:56.175 *D* [poll.conf          ] Primary IP address 192.168.11.1 of node Voelk SonicWALL TZ-500 [1268] found on interface X5 of node Koelbl Service SonicWALL NSA 2650 [80201]
2020.01.09 17:43:56.175 *D* [poll.conf          ] Node Voelk SonicWALL TZ-500 [1268] is a duplicate of node Koelbl Service SonicWALL NSA 2650 [80201]
2020.01.09 17:43:56.175 *D* [poll.conf          ] Removing node Koelbl Service SonicWALL NSA 2650 [80201] as duplicate


How can I resolve this problem?

Thank you
Rainer
Title: Re: duplicate nodes problem | can't save Nodes
Post by: Victor Kirhenshtein on January 12, 2020, 11:42:59 PM
Hi,

One option is to use zones and put those routers into different zones. Another option is to mark the internal interfaces as "exclude from topology". The second option is easier if you are not interested in anything behind those interfaces.

Best regards,
Victor
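
Why zones help can be pictured with a small toy model in Python. This is only a conceptual sketch and not NetXMS code: it assumes that an address only has to be unique within a zone, so two nodes conflict when they share an address in the same zone. The Node type, the function find_address_conflicts, and the zone numbers 0, 1 and 2 are all made up for illustration; the node names and the address come from the log excerpt earlier in the thread.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    name: str
    zone: int                   # hypothetical zone identifier
    addresses: Tuple[str, ...]  # IP addresses found on the node's interfaces


def find_address_conflicts(nodes):
    """Return {(zone, ip): [node names]} for addresses shared within one zone."""
    owners = defaultdict(list)
    for node in nodes:
        for ip in set(node.addresses):
            owners[(node.zone, ip)].append(node.name)
    return {key: names for key, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    # Both routers in the same zone: the shared address is a conflict.
    same_zone = [
        Node("Voelk SonicWALL TZ-500", 0, ("192.168.11.1",)),
        Node("Koelbl Service SonicWALL NSA 2650", 0, ("192.168.11.1",)),
    ]
    print(find_address_conflicts(same_zone))

    # One zone per customer: the same address no longer collides.
    separate_zones = [
        Node("Voelk SonicWALL TZ-500", 1, ("192.168.11.1",)),
        Node("Koelbl Service SonicWALL NSA 2650", 2, ("192.168.11.1",)),
    ]
    print(find_address_conflicts(separate_zones))
```

In this model, moving each customer's router into its own zone changes the lookup key from the address alone to the pair of zone and address, which is why the overlapping 192.168.11.0/24 networks stop being treated as one.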
Title: Re: duplicate nodes problem | can't save Nodes
Post by: rainerh on January 13, 2020, 09:18:07 PM
Hello Victor,

I tried "exclude from topology" on only 1 router, but that did not work.
Then I built some zones. Now it seems to work fine and no node is deleted anymore.

Thank you very much
Rainer