nxdbmgr - Question and Failure Mode

Started by zshnet, July 20, 2016, 05:24:40 PM


zshnet

Hi all,

We had a very bad crash on our system, and luckily we could recover the database files.

I replaced the files in /var/lib/postgresql/version/main with the recovered database files. I could browse the data in the database, and nxdbmgr check ran without any errors.

However, when I started NetXMS, it just sat there issuing huge numbers of SELECT queries. I could not log in; I got "connection refused."

After a lot of troubleshooting, I discovered my error was similar to this:

ERROR: could not access status of transaction 4244329
DETAIL: could not open file "/var/lib/postgresql/9.3/main/pg_clog/0004"

I had to create each missing file filled with zeroes so that PostgreSQL could read it and carry on. Once I had fixed all of those errors (running pg_dump on a specific table told me which files were missing), I was able to run NetXMS without issue.
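For anyone hitting the same error: the missing segment's file name can be derived from the transaction ID in the message. Below is a minimal Python sketch, assuming PostgreSQL 9.x defaults (8 KB pages, 2 status bits per transaction, 32 pages per pg_clog segment; the directory was renamed pg_xact in PostgreSQL 10). The helper names are hypothetical. Stop the server and back up the data directory first. A zeroed segment records the transactions it covers as never committed, so rows written by those transactions may become invisible.

```python
import os

# Assumed PostgreSQL 9.x defaults; verify against your build before use.
BLCKSZ = 8192                                            # page size in bytes
XACTS_PER_BYTE = 4                                       # 2 status bits per transaction
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE                 # 32768
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = XACTS_PER_PAGE * PAGES_PER_SEGMENT   # 1048576
SEGMENT_BYTES = BLCKSZ * PAGES_PER_SEGMENT               # 262144 (256 KB)


def clog_segment_name(xid: int) -> str:
    """Return the pg_clog file name (4 hex digits) holding the status of xid."""
    segment = xid // XACTS_PER_SEGMENT
    return "%04X" % segment


def create_zeroed_segment(clog_dir: str, xid: int) -> str:
    """Create a zero-filled segment covering xid; refuse to overwrite a file."""
    path = os.path.join(clog_dir, clog_segment_name(xid))
    # Mode "xb" raises FileExistsError rather than clobbering an existing segment.
    with open(path, "xb") as f:
        f.write(b"\x00" * SEGMENT_BYTES)
    return path
```

Under these assumptions, clog_segment_name(4244329) returns "0004", which matches the file named in the error above. Run it as the postgres user so the new file gets the right owner.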

Is there any way for nxdbmgr to detect that sort of error? I don't know much about databases, but I think it would help if this kind of error were reported.

Finally, I'm curious what nxdbmgr means when it says a database is "Locked by the server." I could not find an answer by Googling, though it wasn't the most thorough search. Any suggestions?

Thanks,
Zach

Victor Kirhenshtein

Hi,

nxdbmgr can detect some logical errors in the database (mostly inconsistent or missing data), but it does not, for example, check that every table can be read. What you had appears to be badly corrupted database files, and that is beyond the scope of nxdbmgr checks.
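Since nxdbmgr does not cover this, one rough way to confirm that every table can be read is to dump the whole database and discard the output, which is essentially what the pg_dump run in the first post did. Below is a minimal Python sketch, assuming pg_dump is on PATH and can connect with the default credentials; the database name and the helper names are hypothetical.

```python
import os
import re
import subprocess

# Matches the file name in messages such as:
#   DETAIL:  Could not open file "pg_clog/0004": No such file or directory.
MISSING_FILE_RE = re.compile(r'could not open file "([^"]+)"', re.IGNORECASE)


def missing_files(pg_dump_stderr: str) -> list[str]:
    """Return the distinct unreadable file paths, in the order reported."""
    found: list[str] = []
    for path in MISSING_FILE_RE.findall(pg_dump_stderr):
        if path not in found:
            found.append(path)
    return found


def check_readable(dbname: str) -> tuple[bool, list[str]]:
    """Dump every table to the null device so each row is read once.

    Returns (succeeded, unreadable_files). A failure with no file listed
    means pg_dump failed for some other reason; read its stderr directly.
    """
    result = subprocess.run(
        ["pg_dump", "--dbname", dbname, "--file", os.devnull],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, missing_files(result.stderr)


if __name__ == "__main__":
    ok, files = check_readable("netxms")  # hypothetical database name
    print("readable" if ok else f"dump failed; missing files: {files}")
```

pg_dump stops at the first error it hits, so a badly damaged cluster usually needs several rounds of repair and rerun, as described in the first post.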

Best regards,
Victor

Tatjana Dubrovica

"Locked by the server." means that a lock flag is set in the config table of the NetXMS database. This happens in two situations: either the server was stopped incorrectly, or the NetXMS server is currently running. The flag prevents changes to the database while the server is running, because such changes could be unsafe.
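The lock described above behaves like an advisory flag: whoever sets it first owns the database, and a crash leaves it set because the step that clears it never runs. Below is a minimal Python sketch of that pattern, using an in-memory SQLite table as a stand-in. The table layout, the DBLockStatus variable name, and the helper functions are assumptions for illustration, not NetXMS's actual code.

```python
import sqlite3

# Hypothetical key/value config table; names and values are illustrative only.
LOCK_VAR = "DBLockStatus"
UNLOCKED = "UNLOCKED"


def init_db(conn: sqlite3.Connection) -> None:
    """Create the config table with the lock flag cleared."""
    conn.execute("CREATE TABLE config (var_name TEXT PRIMARY KEY, var_value TEXT)")
    conn.execute("INSERT INTO config VALUES (?, ?)", (LOCK_VAR, UNLOCKED))
    conn.commit()


def try_lock(conn: sqlite3.Connection, owner: str) -> bool:
    """Set the flag to owner only if it is currently clear; True on success."""
    cur = conn.execute(
        "UPDATE config SET var_value = ? WHERE var_name = ? AND var_value = ?",
        (owner, LOCK_VAR, UNLOCKED),
    )
    conn.commit()
    return cur.rowcount == 1


def unlock(conn: sqlite3.Connection) -> None:
    """Clear the flag; a crashed server never reaches this step."""
    conn.execute(
        "UPDATE config SET var_value = ? WHERE var_name = ?", (UNLOCKED, LOCK_VAR)
    )
    conn.commit()


def lock_owner(conn: sqlite3.Connection):
    """Return whoever holds the lock, or None if it is clear."""
    row = conn.execute(
        "SELECT var_value FROM config WHERE var_name = ?", (LOCK_VAR,)
    ).fetchone()
    return None if row[0] == UNLOCKED else row[0]
```

The conditional UPDATE makes acquiring the lock a single statement, so two processes cannot both see the flag clear and both set it. If the owner dies without calling unlock, lock_owner keeps reporting it, which is the "stopped incorrectly" case.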