v3 - SQL query failed - lock timeout & data loss

rgkordia · October 03, 2019, 12:57:25 AM

Upon arriving in the office this morning, I'm getting the following error repeating every ~50 seconds:

2019.10.03 08:54:42.739 *E* [db.driver ] SQL query failed (Query = "INSERT INTO idata_12102 (item_id,idata_timestamp,idata_value,raw_value) VALUES (64537,1570043031,'0','0')"): Lock wait timeout exceeded; try restarting transaction

Different values, but always table idata_12102. I'm using MariaDB (latest 10.1.41 on Windows) and I've run "SHOW OPEN TABLES WHERE in_use>0" which always shows the following:

Code Select


Database      Table            In_use        Name_locked
--------------------------------------------------------
netxms_db     idata_12102           2                  0

The server (build 2305) has been up around 18 hours or so. I attempted to shut down the server - it started to shut down but seemed to hang (and the lock message again continues to repeat every 50 seconds). After waiting around 5 minutes I decided to kill the server.

After the server was terminated, the in_use still remained at 2, then after a minute or so it reduced to 1. I waited around 5 minutes but the in_use count never reached 0.

I then shutdown the MariaDB server, which took around 8 minutes, during which the mysql process had high CPU and high disk IO, so I assume there was considerable uncommitted data that is either committing or rolling back. After restarting MariaDB, the in_use query returned no results. Admittedly I didn't check MySQL activity whilst I was waiting for NetXMSd to terminate, so possibly it was attempting to commit these transactions which I inadvertently interrupted.

I then restarted netxmsd and the in_use query returns no results (no active locks). From my graphs via the GUI console I can see that the past 2.5 hours of DCI values are missing, so I assume all those transactions were waiting on the lock to clear which never did and were rolled back. New DCI values are populating as normal.

After doing some investigation, I see from the windows event logs that this error relating to lock on idata_12102 started around the time where my data loss begins (8:08am). However, prior to that I see lock errors relating to different tables dating back to around 2am. I assume these lock issues eventually cleared.

Worth a mention is that since upgrading to v3 I had some errors about the DB connection pool being full so I increased DBConnectionPoolBaseSize to 30 and DBConnectionPoolMaxSize to 100. Although these errors still seem to persist looking at the logs.

Attached is an export of the windows event log for NetXMScore relating to the ~18 hours of runtime during which this lock event occurred.

Regards,
Richard

Victor Kirhenshtein · October 03, 2019, 01:30:56 PM

Hi,

is there some kind of deadlock logging in MariaDB? This can help if we can find out what transaction holds the lock. If it was not logged I think we have to wait for next occurrence and then if I understand correctly command SHOW ENGINE INNODB STATUS should show active transactions and their locks which can provide useful information.

Best regards,
Victor

rgkordia · October 03, 2019, 11:55:18 PM

Hi Victor. Thanks, I'll try doing that. I've turned on deadlock logging also, so hopefully I can capture something.

Regards,
Richard

NetXMS Support Forum

News:

v3 - SQL query failed - lock timeout & data loss

rgkordia

Victor Kirhenshtein

rgkordia