After migrating PostgreSQL to TimescaleDB the daemon service gets stuck on start

Started by mibox, July 02, 2020, 02:48:01 PM


mibox

Hello,

I am struggling to get 3.4.178 on Windows to work with PostgreSQL 12.1 + TimescaleDB running on CentOS 8.1.
I upgraded a working system from 3.2 to 3.4.
It still works with the "classic" DB.
The working DB has been migrated to the TimescaleDB type with no errors.
NetXMS 3.4.178 works if I point it at the "old" DB.
With the migrated DB the daemon service gets stuck on start.

Could you advise what to check, please.

Diagnostic log is attached.


Thanks
MI

Filipp Sudanov

1. The log file looks incomplete. It seems you first started the server with a low debug level, and it ended with
2020.07.01 16:06:19.834 *E* [                   ] Error initializing component locks table

Then you started with a higher debug level, but it did not end with the same error. Or did the server crash?

2. Please try to run
nxdbmgr check

What output does it produce?

mibox

Thanks for prompt reply.

nxdbmgr check reports no errors.
Both the classic and TS databases are clean.
I attached the complete output generated by your script.

BR
MI 

Filipp Sudanov

Ok, please set DebugLevel = 7 in the server configuration, start the server, wait until it crashes or exits, and share the log file.


mibox

With the TS DB the server neither crashes nor exits - the service simply remains in the "starting" state indefinitely.
The CPU load it generates during start is almost zero, and Postgres is idling as well.
I'll attach the log file created with -D 7, likely tomorrow.

Thanks for your support.

BR
MI

mibox

The logs are attached.
_net* is the file I found most surprising.
It seems the service did start eventually - after over 8 hours.

Regards
Marek

mibox

Update:

Following https://docs.timescale.com/latest/getting-started/configuring, which recommends
max_locks_per_transaction = 2 * num_chunks,
I increased max_locks_per_transaction from the default of 64 to 256.
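In case it helps others, here is a rough sketch of how to check the chunk count and raise the limit. Note that _timescaledb_catalog is TimescaleDB's internal schema, not a stable interface, so treat this as an assumption that may not hold across versions:

-- Count existing chunks across all hypertables (internal catalog)
SELECT count(*) FROM _timescaledb_catalog.chunk;

-- Raise the lock limit; this takes effect only after a PostgreSQL restart
ALTER SYSTEM SET max_locks_per_transaction = 256;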

The server service now starts in 4 minutes, however right after startup it produces about 10k failed SQL queries, and the number keeps growing (the current count exceeds 180k).

BTW, what value would you suggest setting?

The client starts and seems to work.

I wanted to attach the info file (2 MB in size) but I am getting a "413 Request Entity Too Large" error.

Regards
MI

Filipp Sudanov

Did you initialize the TimescaleDB database with nxdbmgr init prior to importing data into it?
Can you set DebugLevel = 9, run the server for 5-10 minutes and show the log?
An alternative approach would be to watch from the database side which queries it gets stuck on, as in the sketch below.
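For example, using the built-in pg_stat_activity view (standard PostgreSQL, nothing NetXMS-specific):

-- Show currently running statements, longest-running first
SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;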

mibox

I did run nxdbmgr init and selected timescale prior to migrating from the classic DB.
Before the migration I also ran nxdbmgr check to make sure the source was clean.
The migration took some 20 minutes but overall went smoothly, without any errors reported.
After the migration nxdbmgr check does not complain either.

Attached is the log file (.7z) captured with -D 9.

Thanks
MI

Filipp Sudanov

Can you connect to postgres and show output of these queries:

EXPLAIN ANALYZE SELECT max(event_id) FROM event_log

EXPLAIN ANALYZE SELECT max(alarm_id) FROM alarms

Filipp Sudanov

As for max_locks_per_transaction - I would first check general Postgres parameters, first of all probably shared_buffers.
Out-of-the-box Postgres settings are very conservative.
See e.g. https://www.percona.com/blog/2018/08/31/tuning-postgresql-database-parameters-to-optimize-performance/

mibox

"EXPLAIN ANALYZE SELECT max(event_id) FROM event_log;"
Message: Successfully run. Total query runtime: 3 min 42 secs. 26641 rows affected.
Output = attached in 7z file

For the second query: "EXPLAIN ANALYZE SELECT max(alarm_id) FROM alarms;"
Message:  Successfully run. Total query runtime: 9 secs 232 msec. 3211 rows affected.
Output: attached

In 7z I also included running postgresql.conf file.
After applying timescaleDB extension I run their tuning script and accepted the suggested changes.
It means postgresql.conf contains values alterred by tune script + my mod = increased "max_locks_per_transaction".
The original clasic netxms DB also served by the engine works fine with the settings.

Regards
MI

Victor Kirhenshtein

Hi,

it looks like we set up hypertables in the wrong way, adding unnecessary space partitioning. This causes the hypertable to have lots of chunks, so queries like SELECT max(event_id) ... take a lot of time, because each chunk has to be queried. For the next major release we have already implemented a conversion of the log table structure. You can wait for it (the likely release time is September) or you can try to convert your event_log and alarms tables manually using the following queries:


-- Rebuild alarms as a plain (non-partitioned) table
ALTER TABLE alarms RENAME TO old_alarms;
CREATE TABLE alarms (
   alarm_id integer not null,
   parent_alarm_id integer not null,
   alarm_state integer not null,
   hd_state integer not null,
   hd_ref varchar(63) null,
   creation_time integer not null,
   last_change_time integer not null,
   rule_guid varchar(36) null,
   source_object_id integer not null,
   zone_uin integer not null,
   source_event_code integer not null,
   source_event_id bigint not null,
   dci_id integer not null,
   message varchar(2000) null,
   original_severity integer not null,
   current_severity integer not null,
   repeat_count integer not null,
   alarm_key varchar(255) null,
   ack_by integer not null,
   resolved_by integer not null,
   term_by integer not null,
   timeout integer not null,
   timeout_event integer not null,
   ack_timeout integer not null,
   alarm_category_ids varchar(255) null,
   event_tags varchar(2000) null,
   rca_script_name varchar(255) null,
   impact varchar(1000) null,
   PRIMARY KEY(alarm_id));
CREATE INDEX idx_alarms_source_object_id ON alarms(source_object_id);
CREATE INDEX idx_alarms_last_change_time ON alarms(last_change_time);
INSERT INTO alarms (alarm_id,parent_alarm_id,alarm_state,hd_state,hd_ref,creation_time,last_change_time,rule_guid,source_object_id,zone_uin,source_event_code,source_event_id,dci_id,message,original_severity,current_severity,repeat_count,alarm_key,ack_by,resolved_by,term_by,timeout,timeout_event,ack_timeout,alarm_category_ids,event_tags,rca_script_name,impact) SELECT alarm_id,parent_alarm_id,alarm_state,hd_state,hd_ref,creation_time,last_change_time,rule_guid,source_object_id,zone_uin,source_event_code,source_event_id,dci_id,message,original_severity,current_severity,repeat_count,alarm_key,ack_by,resolved_by,term_by,timeout,timeout_event,ack_timeout,alarm_category_ids,event_tags,rca_script_name,impact FROM old_alarms;
DROP TABLE old_alarms CASCADE;

-- Rebuild event_log as a time-only hypertable (no space partitioning)
ALTER TABLE event_log RENAME TO old_event_log;
DROP INDEX IF EXISTS idx_event_log_event_timestamp;
DROP INDEX IF EXISTS idx_event_log_source;
DROP INDEX IF EXISTS idx_event_log_root_id;
CREATE TABLE event_log (
   event_id bigint not null,
   event_code integer not null,
   event_timestamp integer not null,
   origin integer not null,
   origin_timestamp integer not null,
   event_source integer not null,
   zone_uin integer not null,
   dci_id integer not null,
   event_severity integer not null,
   event_message varchar(2000) null,
   event_tags varchar(2000) null,
   root_event_id bigint not null,
   raw_data text null,
   PRIMARY KEY(event_id,event_timestamp));
CREATE INDEX idx_event_log_event_timestamp ON event_log(event_timestamp);
CREATE INDEX idx_event_log_source ON event_log(event_source);
CREATE INDEX idx_event_log_root_id ON event_log(root_event_id) WHERE root_event_id > 0;
SELECT create_hypertable('event_log', 'event_timestamp', chunk_time_interval => 86400);
INSERT INTO event_log (event_id,event_code,event_timestamp,origin,origin_timestamp,event_source,zone_uin,dci_id,event_severity,event_message,event_tags,root_event_id,raw_data) SELECT event_id,event_code,event_timestamp,origin,origin_timestamp,event_source,zone_uin,dci_id,event_severity,event_message,event_tags,root_event_id,raw_data FROM old_event_log;
DROP TABLE old_event_log CASCADE;


Please note that the data copy could take significant time. If you do not intend to keep the existing event log, you can just drop the event_log table and create a new one instead of copying the data.
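Once converted, you can re-run the diagnostic query from earlier in the thread; with day-sized chunks it should complete much faster:

EXPLAIN ANALYZE SELECT max(event_id) FROM event_log;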

Don't forget to make a database snapshot or backup before trying this!

Best regards,
Victor