Cluster mode - Database is already locked by another NetXMS server instance

Started by Hybo, November 21, 2012, 12:50:07 PM

Hybo

Hi,

I have a problem with my NetXMS high-availability setup. I have two nodes, server1 (172.16.103.16) and server2 (172.16.103.17), running Pacemaker, Corosync and DRBD. The database engine is PostgreSQL, with the database on a shared disk.

When I take server1 down abruptly (for example, by switching the power off), NetXMS on server2 does not start and writes this to the log:

[20-Nov-2012 17:49:16] Log file opened
[20-Nov-2012 17:49:16] Platform subagent "/opt/NetXMS/lib/libnsm_linux.so" successfully loaded
[20-Nov-2012 17:49:16] Database driver "/opt/NetXMS/lib/libnxddr_pgsql.so" loaded and initialized successfully
[20-Nov-2012 17:49:16] Database is already locked by another NetXMS server instance (IP address: 172.16.103.16, machine info: server1 Linux Release 2.6.34.7-0.7-xen)


Database contents:

ServerID                          | 451B2B5D9290AB50                                    |          0 |                   1
DBLockPID                         | 8375                                                |          0 |                   0
DBLockInfo                        | server1 Linux Release 2.6.34.7-0.7-xen              |          0 |                   0
DBLockStatus                      | 172.16.103.16                                       |          0 |                   1


Is there any solution other than running "nxdbmgr check" before starting NetXMS on server2? Running it every time may cause problems in my HA setup.

Thanks,
Hybo

Victor Kirhenshtein

Hi!

Currently there is no other solution than to run nxdbmgr before starting netxmsd. You can simplify this by adding the -e command line argument to netxmsd; netxmsd will then call nxdbmgr on startup. I've created a feature request to address this problem properly: https://www.radensolutions.com/chiliproject/issues/185. I will add an additional command line argument to specify the peer's IP address. Then, if the database is locked by the peer, the server will check whether the peer is alive and, if it is not, automatically remove the lock.
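
As a rough illustration of that planned check (this is not NetXMS code), the decision could look something like the Python sketch below. The function names, the TCP reachability probe and the port parameter are all assumptions made for the example. The only detail taken from this thread is that DBLockStatus holds the IP address of the lock holder.

```python
import socket


def peer_is_alive(peer_ip, port, timeout=2.0):
    """Hypothetical probe: True if a TCP connection to peer_ip:port succeeds."""
    try:
        with socket.create_connection((peer_ip, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def should_remove_lock(lock_status, peer_ip, is_alive):
    """Decide whether the database lock can be removed automatically.

    lock_status -- value of DBLockStatus (IP address of the lock holder)
    peer_ip     -- address of the other cluster node, given on the command line
    is_alive    -- callable taking an IP address and returning True or False
    """
    if lock_status != peer_ip:
        # Not locked by the configured peer: leave the lock alone.
        return False
    # Locked by the peer: remove the lock only if the peer does not respond.
    return not is_alive(peer_ip)


if __name__ == "__main__":
    # Values from the log above; the probe is stubbed to simulate a dead peer.
    print(should_remove_lock("172.16.103.16", "172.16.103.16", lambda ip: False))
```

A reachability probe like this cannot tell a powered-off peer from a network partition, so a real implementation would have to be more careful before removing the lock.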

Best regards,
Victor