NetXMS Support Forum

English Support => General Support => Topic started by: mibox on July 02, 2020, 02:48:01 PM

Title: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 02, 2020, 02:48:01 PM
Hello,

I am struggling to get 3.4.178 on Windows to work with PostgreSQL 12.1 + timescale running on Centos 8.1. 
I upgraded working system from 3.2 to 3.4.
It can work with "clasic" DB still.
Working db has been migrated to TimeScale type with no error.
Netxms 3.4.178 works if I direct it to use the "old" db.
With the migrated db the daemon service stucks on start.

Could you advise what to check, please.

Diagnostic log is attached.


Thanks
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Filipp Sudanov on July 02, 2020, 03:18:50 PM
1. The log file looks incomplete. Seems that you first started the server with low debug level, it ended with
2020.07.01 16:06:19.834 *E* [                   ] Error initializing component locks table

The you started with higher debug, but it did not end up with the same error. Or have the server crashed?

2. Please try to run
nxdbmgr check

What output does it produce?
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 02, 2020, 11:13:57 PM
Thanks for prompt reply.

nxdbmgr check - no errors.
Both clasic and TS are clean.
I attached the complete output generated by your script.

BR
MI 
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Filipp Sudanov on July 03, 2020, 11:36:33 AM
Ok, please set DebugLevel = 7 in server configuration, start the server, wait until it crashes or exits and share the log file.

Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 03, 2020, 12:08:13 PM
With TS db the server does not crash nor exits -  the service simply remains in "starting" state indefinitely.
CPU load it generates during start is almost zero, also postgres is idling. 
I'll attached the log file created with -D 7 likely tomorrow.

Thanks for your support.

BR
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 05, 2020, 09:55:11 AM
The logs are attached.
_net* is the file I found very surprising.
It seems the service got started eventually - after over 8h.

Regards
Marek
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 05, 2020, 03:38:35 PM
Update:

Following https://docs.timescale.com/latest/getting-started/configuring
max_locks_per_transaction = 2 * num_chunks
I increased max_locks_per_transaction from 64 (default) to 256.

The server service starts in 4min, however produces at right after about 10k failed SQL queries and the number keeps growing (current count exids 180k).

BTW what value you suggest to set.

Client starts and seems to work.

Wanted to attache info file (2MB in size ) but getting "413 Request Entity Too Large" error.

Regards
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 05, 2020, 03:42:52 PM
so just the log file is attached
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Filipp Sudanov on July 06, 2020, 07:50:34 PM
Did you initialize timescale DB with nxdbmgr init prior to importing data into it?
Can you set debuglevel = 9, run the server for 5-10 minutes and show the log?
Alternative approach would watch from the database side on what queries it's getting stuck.
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 07, 2020, 12:00:16 PM
I did run nxdbmgr init and selected timescale prior migrating from classic db.
Prior migration I also run nxdbmgr check to make sure the source is clean.
Migration took some 20min but overall went smooth without any error reported.
After migration nxdbmgr check does not complan also.

Attached is log file (.7z) captured with -D 9.   

Thanks
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Filipp Sudanov on July 07, 2020, 04:51:35 PM
Can you connect to postgres and show output of these queries:

EXPLAIN ANALYZE SELECT max(event_id) FROM event_log

EXPLAIN ANALYZE SELECT max(alarm_id) FROM alarms
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Filipp Sudanov on July 07, 2020, 06:22:27 PM
As for max_locks_per_transaction - I would first check general postgres parameters, first of all probably shared_buffer.
Out of the box Postgres settings are very conservative.
See e.g. https://www.percona.com/blog/2018/08/31/tuning-postgresql-database-parameters-to-optimize-performance/
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 08, 2020, 04:57:42 PM
"EXPLAIN ANALYZE SELECT max(event_id) FROM event_log;"
Message: Successfully run. Total query runtime: 3 min 42 secs. 26641 rows affected.
Output = attached in 7z file

For the second query: "EXPLAIN ANALYZE SELECT max(alarm_id) FROM alarms;"
Message:  Successfully run. Total query runtime: 9 secs 232 msec. 3211 rows affected.
Output: attached

In 7z I also included running postgresql.conf file.
After applying timescaleDB extension I run their tuning script and accepted the suggested changes.
It means postgresql.conf contains values alterred by tune script + my mod = increased "max_locks_per_transaction".
The original clasic netxms DB also served by the engine works fine with the settings.

Regards
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: mibox on July 15, 2020, 10:15:21 PM
Any tips what may be wrong with timescaleDB?

BR
MI
Title: Re: After migration of postgre to timescale the daemon service stucks on start
Post by: Victor Kirhenshtein on July 16, 2020, 11:44:43 AM
Hi,

it looks like we setup hypertables in a wrong way, adding unnecessary space partitioning. This causes hypertable to have lots of chunks, so queries like SELECT max(event_id)  ... took lot of time, because each chunk should be queried. For net major release we already implemented conversion of log tables structure. You can wait for it (likely release time is September) or you can try to convert your event_log and alarms tables manually using the following queries:


ALTER TABLE alarms RENAME TO old_alarms;
CREATE TABLE alarms (
   alarm_id integer not null,
   parent_alarm_id integer not null,
   alarm_state integer not null,
   hd_state integer not null,
   hd_ref varchar(63) null,
   creation_time integer not null,
   last_change_time integer not null,
   rule_guid varchar(36) null,
   source_object_id integer not null,
   zone_uin integer not null,
   source_event_code integer not null,
   source_event_id bigint not null,
   dci_id integer not null,
   message varchar(2000) null,
   original_severity integer not null,
   current_severity integer not null,
   repeat_count integer not null,
   alarm_key varchar(255) null,
   ack_by integer not null,
   resolved_by integer not null,
   term_by integer not null,
   timeout integer not null,
   timeout_event integer not null,
   ack_timeout integer not null,
   alarm_category_ids varchar(255) null,
   event_tags varchar(2000) null,
   rca_script_name varchar(255) null,
   impact varchar(1000) null,
   PRIMARY KEY(alarm_id));
CREATE INDEX idx_alarms_source_object_id ON alarms(source_object_id);
CREATE INDEX idx_alarms_last_change_time ON alarms(last_change_time);
INSERT INTO alarms (alarm_id,parent_alarm_id,alarm_state,hd_state,hd_ref,creation_time,last_change_time,rule_guid,source_object_id,zone_uin,source_event_code,source_event_id,dci_id,message,original_severity,current_severity,repeat_count,alarm_key,ack_by,resolved_by,term_by,timeout,timeout_event,ack_timeout,alarm_category_ids,event_tags,rca_script_name,impact) SELECT alarm_id,parent_alarm_id,alarm_state,hd_state,hd_ref,creation_time,last_change_time,rule_guid,source_object_id,zone_uin,source_event_code,source_event_id,dci_id,message,original_severity,current_severity,repeat_count,alarm_key,ack_by,resolved_by,term_by,timeout,timeout_event,ack_timeout,alarm_category_ids,event_tags,rca_script_name,impact FROM old_alarms;
DROP TABLE old_alarms CASCADE;

ALTER TABLE event_log RENAME TO old_event_log;
DROP INDEX IF EXISTS idx_event_log_event_timestamp;
DROP INDEX IF EXISTS idx_event_log_source;
DROP INDEX IF EXISTS idx_event_log_root_id;
CREATE TABLE event_log (
   event_id bigint not null,
   event_code integer not null,
   event_timestamp integer not null,
   origin integer not null,
   origin_timestamp integer not null,
   event_source integer not null,
   zone_uin integer not null,
   dci_id integer not null,
   event_severity integer not null,
   event_message varchar(2000) null,
   event_tags varchar(2000) null,
   root_event_id bigint not null,
   raw_data text null,
   PRIMARY KEY(event_id,event_timestamp));
CREATE INDEX idx_event_log_event_timestamp ON event_log(event_timestamp);
CREATE INDEX idx_event_log_source ON event_log(event_source);
CREATE INDEX idx_event_log_root_id ON event_log(root_event_id) WHERE root_event_id > 0;
SELECT create_hypertable('event_log', 'event_timestamp', chunk_time_interval => 86400);
INSERT INTO event_log (event_id,event_code,event_timestamp,origin,origin_timestamp,event_source,zone_uin,dci_id,event_severity,event_message,event_tags,root_event_id,raw_data) SELECT event_id,event_code,event_timestamp,origin,origin_timestamp,event_source,zone_uin,dci_id,event_severity,event_message,event_tags,root_event_id,raw_data FROM old_event_log;
DROP TABLE old_event_log CASCADE;


Please note that data copy could take significant time. If you are not intended to keep existing event log you can just drop event_log table and create new one.

Don't forget to make database snapshot or backup before trying this!

Best regards,
Victor