NetXMS Support Forum

English Support => General Support => Topic started by: Hybo on November 21, 2012, 12:50:07 PM

Title: Cluster mode - Database is already locked by another NetXMS server instance
Post by: Hybo on November 21, 2012, 12:50:07 PM
Hi,

I have a problem with my NetXMS high-availability setup. I have two nodes, server1 (172.16.103.16) and server2 (172.16.103.17), running Pacemaker, Corosync and DRBD. The database engine is PostgreSQL, with the database on a shared disk.

When I take server1 down abruptly (for example, by switching the power off), NetXMS on server2 does not start and writes the following to the log:

[20-Nov-2012 17:49:16] Log file opened
[20-Nov-2012 17:49:16] Platform subagent "/opt/NetXMS/lib/libnsm_linux.so" successfully loaded
[20-Nov-2012 17:49:16] Database driver "/opt/NetXMS/lib/libnxddr_pgsql.so" loaded and initialized successfully
[20-Nov-2012 17:49:16] Database is already locked by another NetXMS server instance (IP address: 172.16.103.16, machine info: server1 Linux Release 2.6.34.7-0.7-xen)


database contents:

ServerID                          | 451B2B5D9290AB50                                    |          0 |                   1
DBLockPID                         | 8375                                                |          0 |                   0
DBLockInfo                        | server1 Linux Release 2.6.34.7-0.7-xen            |          0 |                   0
DBLockStatus                      | 172.16.103.16                                       |          0 |                   1


Is there any solution other than running "nxdbmgr check" before starting NetXMS on server2? I ask because this may cause problems in my HA solution.

Thanks,
Hybo
Title: Re: Cluster mode - Database is already locked by another NetXMS server instance
Post by: Victor Kirhenshtein on November 21, 2012, 04:02:37 PM
Hi!

Currently there is no solution other than running nxdbmgr before starting netxmsd. You can simplify this by adding the -e command line argument to netxmsd; netxmsd will then call nxdbmgr on startup. I've created a feature request to address this problem more correctly: https://www.radensolutions.com/chiliproject/issues/185. I will add an additional command line argument to specify the peer IP address. Then, if the database is locked by the peer, the server will check whether the peer is alive and, if it is not, will automatically remove the lock.
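
To make the planned behaviour concrete, here is a rough Python sketch of the decision logic only. It is not netxmsd code: the function names, the TCP connection probe used as the "is the peer alive" check, and the port parameter are all assumptions made for illustration, and the real implementation may work differently.

```python
import socket


def peer_is_alive(peer_ip, port, timeout=2.0):
    """Hypothetical liveness probe: True if a TCP connection to the peer succeeds.

    The port is an assumption; pass whichever TCP port the peer's
    server process listens on in your setup.
    """
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the peer as down.
        return False


def should_clear_lock(lock_owner_ip, peer_ip, is_alive):
    """Decide whether a database lock left by the peer may be removed.

    lock_owner_ip: IP address recorded in the lock (for example 172.16.103.16).
    peer_ip:       IP address of the configured cluster peer.
    is_alive:      callable taking an IP address and returning True or False.
    """
    if lock_owner_ip != peer_ip:
        # Locked by some other instance: never remove the lock automatically.
        return False
    # Locked by the peer: remove the lock only if the peer does not respond.
    return not is_alive(peer_ip)
```

With the addresses from this thread, server2 would call should_clear_lock("172.16.103.16", "172.16.103.16", lambda ip: peer_is_alive(ip, port)), with port set to whatever the peer listens on. Note that a failed probe only shows that the peer is unreachable, not that it is really down, so in a Pacemaker cluster proper fencing is still the stronger guarantee.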

Best regards,
Victor