#16

General Support / Re: Valgrind log for 3.1.261-2; server crash

December 12, 2019, 08:34:54 PM

I believe we were simply running out of memory overall. In this dev environment netxmsd shares the host with PostgreSQL. My best guess is that netxms behavior changed in the latest release, resulting in more concurrent database queries. I dramatically lowered the number of database pools in use, and so far netxms has been stable with no reduction in performance. (I am monitoring threads via DCIs.)

#17

General Support / Re: Valgrind log for 3.1.261-2; server crash

December 12, 2019, 06:04:06 PM

Ran with 1000 threads. Exited. Output log contains only six lines, a splash message from Memcheck (as launched by valgrind).

#18

General Support / Netxms 3.x failing to reconnect to database

December 12, 2019, 06:00:55 PM

I'm running netxms 3.x with Postgresql 10 (timescaledb). Occasionally the database crashes and automatically restarts/recovers. However, netxmsd does not reconnect to the database. Are there any settings to tweak this? My understanding (reading old forum messages) is that netxms *should* reconnect, but it does not, and requires a full restart of netxmsd to reconnect to the db.

#19

General Support / Valgrind log for 3.1.261-2; server crash

December 11, 2019, 06:00:00 PM

Netxms server consuming RAM and crashing. Valgrind log attached.

Options:
valgrind --log-file=/home/jaustin/vg.log --leak-check=full --undef-value-errors=no netxmsd -D3

#20

General Support / Re: netxmsd 3.1 crashing on startup

December 04, 2019, 06:53:41 PM

2019.12.04 07:52:28.425 *D* [ ] Started topology poll for node <snip> [12506]

Thread 495 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7258a700 (LWP 25764)]
Node::reconcileWithDuplicateNode (this=this@entry=0x7fffb038a000, node=0x7fffa56ed000) at node.cpp:3512
3512 node.cpp: No such file or directory.
(gdb) bt full
#0 Node::reconcileWithDuplicateNode (this=this@entry=0x7fffb038a000, node=0x7fffa56ed000) at node.cpp:3512
i = 6
#1 0x00007ffff79d73c4 in Node::configurationPoll (this=0x7fffb038a000, poller=0x7fff7968bf80, session=0x0, rqId=0) at node.cpp:3230
duplicateNode = 0x7fffa56ed000
reason = L"Primary IP address 23.235.105.154 of node js1.northpointe01.mxu.acsalaska.net [43621] found on interface vlan.1997 of node js1.northpointe.cpe.lec.mgt [45588]", '\000' <repeats 152 times>...
dcr = <optimized out>
hypervisorType = L'\000' <repeats 31 times>
hypervisorInfo = L"Primary IP address 23.235.105.154 of node js1.northpointe01.mxu.acsalaska.net [43621] found on interface vlan.1997 of node js1.northpointe.cpe.lec.mgt [45588]", '\000' <repeats 97 times>
type = <optimized out>
oldCapabilities = <optimized out>
szBuffer = L'\000' <repeats 3494 times>...
modified = 0
__pollStartTime = 1575478327685
__pollState = 0x7fffb038a768
#2 0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry (this=0x7fffb038a000, poller=0x7fff7968bf80, session=0x0, rqId=0)
at dctarget.cpp:1560
No locals.
#3 0x00007ffff793f621 in __ThreadPoolExecute_Wrapper_1<DataCollectionTarget, PollerInfo*> (arg=0x7fff79691420)
at ../../../include/nms_threads.h:1346
wd = 0x7fff79691420
#4 0x00007ffff698d337 in ProcessSerializedRequests (data=0x7fff796913c0) at tp.cpp:466
rq = 0x7fff79691440
#5 0x00007ffff698d10e in WorkerThread (arg=0x7fff7a780390) at tp.cpp:186
rq = 0x7fff796913e0
waitTime = 256899072
p = 0x7fff7dc00000
q = 0x7fff7dc06000
threadName = "$POLLERS/WRK\000\000\000"
#6 0x00007ffff4ec04a4 in start_thread (arg=0x7fff7258a700) at pthread_create.c:456
__res = <optimized out>
pd = 0x7fff7258a700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735111800576, 7851966113022693045, 140736350368030, 140736350368031, 140735111536640, 3,
-7851694838165582155, -7851990439467594059}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#7 0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.
(gdb)

#21

General Support / Re: netxmsd 3.1 crashing on startup

December 04, 2019, 06:36:01 PM

From packages:

netxms-agent/stretch,now 3.1.242-2 amd64 [installed]
netxms-agent-java/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-base/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-client/stretch,now 3.1.242-2 amd64 [installed]
netxms-console/now 3.0.2258-1 amd64 [installed,local]
netxms-dbdrv-pgsql/stretch,now 3.1.242-2 amd64 [installed]
netxms-dbdrv-sqlite3/stretch,now 3.1.242-2 amd64 [installed,automatic]
netxms-release/now 1.5 all [installed,local]
netxms-server/stretch,now 3.1.242-2 amd64 [installed]

Segfaults whenever merge detection is enabled.

bt full:

Thread 523 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff7282e700 (LWP 19692)]
0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
(gdb) bt full
#0 0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#1 0x00007ffff79d73c4 in Node::configurationPoll(PollerInfo*, ClientSession*, unsigned int) ()
from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#2 0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry(PollerInfo*, ClientSession*, unsigned int) ()
from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#3 0x00007ffff793f621 in ?? () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
No symbol table info available.
#4 0x00007ffff698d337 in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
No symbol table info available.
#5 0x00007ffff698d10e in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
No symbol table info available.
#6 0x00007ffff4ec04a4 in start_thread (arg=0x7fff7282e700) at pthread_create.c:456
__res = <optimized out>
pd = 0x7fff7282e700
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140735114569472, 6749483827744397886, 140736350368030, 140736350368031,
140735114305536, 3, -6749736120991163842, -6749468538011329986}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0,
0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
pagesize_m1 = <optimized out>
sp = <optimized out>
freesize = <optimized out>
__PRETTY_FUNCTION__ = "start_thread"
#7 0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
No locals.
(gdb)

#22

General Support / netxmsd 3.1 crashing on startup

December 04, 2019, 05:59:16 PM

I'm getting a segfault on startup:

<snip>
DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.7.{instance} [242469]): instance "548" removed by filtering script
2019.12.04 06:56:55.606 *D* [ ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.8.{instance} [243303]): instance "507" name set to "0"
2019.12.04 06:56:55.606 *D* [ ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.8.{instance} [243303]): instance "507" removed by filtering script
2019.12.04 06:56:55.609 *D* [ ] DataCollectionTarget::doInstanceDiscovery(js2.jber7079.mxu.acsalaska.net [18199]): read 25 values
2019.12.04 06:56:55.610 *D* [ ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.1.{instance} [243281]): instance "514" name set to "

"
2019.12.04 06:56:55.610 *D* [ ] DCObject::filterInstanceList(.1.3.6.1.4.1.2636.3.60.1.1.1.1.1.{instance} [243281]): instance "514" removed by filtering script

Thread 372 "$POLLERS/WRK" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff74e2e700 (LWP 6249)]
0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
(gdb) bt
#0 0x00007ffff79bfbfe in Node::reconcileWithDuplicateNode(Node*) () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#1 0x00007ffff79d73c4 in Node::configurationPoll(PollerInfo*, ClientSession*, unsigned int) ()
from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#2 0x00007ffff796e3d2 in DataCollectionTarget::configurationPollWorkerEntry(PollerInfo*, ClientSession*, unsigned int) ()
from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#3 0x00007ffff793f621 in ?? () from /usr/lib/x86_64-linux-gnu/libnxcore.so.31
#4 0x00007ffff698d337 in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
#5 0x00007ffff698d10e in ?? () from /usr/lib/x86_64-linux-gnu/libnetxms.so.31
#6 0x00007ffff4ec04a4 in start_thread (arg=0x7fff74e2e700) at pthread_create.c:456
#7 0x00007ffff3a15d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

#23

General Support / Re: 3.0.2284 and TimescaleDB view 'idata'

October 03, 2019, 12:16:20 AM

I believe I've tracked the pool exhaustion down. It's DCIs that are querying idata, which we've already shown is slow. I can disable them temporarily, but they are ultimately essential.

netxms@netxmsts STATEMENT: SELECT max(idata_value::double precision) FROM idata WHERE item_id=$1 AND idata_timestamp BETWEEN $2 AND $3 AND idata_value~E'^\\d+(\\.\\d+)*$'
2019-10-02 13:13:24.901 AKDT [28860] netxms@netxmsts ERROR: out of shared memory
2019-10-02 13:13:24.901 AKDT [28860] netxms@netxmsts HINT: You might need to increase max_locks_per_transaction.
2019-10-02 13:13:25.043 AKDT [28099] netxms@netxmsts LOG: duration: 10728.421 ms bind <unnamed>: SELECT max(idata_value::double precision) FROM idata WHERE item_id=$1 AND idata_timestamp BETWEEN $2 AND $3 AND idata_value~E'^\\d+(\\.\\d+)*$'
2019-10-02 13:13:25.043 AKDT [28099] netxms@netxmsts DETAIL: parameters: $1 = '219502', $2 = '1570049993', $3 = '1570050293'
2019-10-02 13:13:26.104 AKDT [28108] netxms@netxmsts LOG: duration: 10927.128 ms bind <unnamed>: SELECT min(idata_value::double precision) FROM idata WHERE item_id=$1 AND idata_timestamp BETWEEN $2 AND $3 AND idata_value~E'^\\d+(\\.\\d+)*$'
2019-10-02 13:13:26.104 AKDT [28108] netxms@netxmsts DETAIL: parameters: $1 = '219506', $2 = '1570049993', $3 = '1570050293'
2019-10-02 13:14:31.489 AKDT [28107] netxms@netxmsts LOG: duration: 144850.468 ms statement: SELECT idata_value,idata_timestamp FROM idata WHERE item_id=222029 ORDER BY idata_timestamp DESC LIMIT 1
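
For triage, here is a minimal sketch (Python, standard library only) that pulls the "duration:" entries out of a PostgreSQL log like the one above and sorts them slowest first. It assumes the log line prefix seen in this excerpt. The names slow_queries and threshold_ms are made up for illustration and are not part of NetXMS or PostgreSQL.

```python
import re

# Matches "[pid] user@db LOG: duration: <ms> ms <rest of line>".
# Assumption: the log_line_prefix matches the excerpt above; adjust otherwise.
DURATION_RE = re.compile(
    r"\[(\d+)\]\s+\S+\s+LOG:\s+duration:\s+(\d+(?:\.\d+)?)\s+ms\s+(.*)$"
)


def slow_queries(lines, threshold_ms=1000.0):
    """Return (duration_ms, backend_pid, text) tuples, slowest first."""
    hits = []
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:
            continue  # not a duration entry (ERROR, HINT, DETAIL, ...)
        duration_ms = float(match.group(2))
        if duration_ms >= threshold_ms:
            hits.append((duration_ms, int(match.group(1)), match.group(3).strip()))
    return sorted(hits, reverse=True)


# Three lines copied from the log excerpt above.
sample = [
    r"2019-10-02 13:13:24.901 AKDT [28860] netxms@netxmsts ERROR: out of shared memory",
    r"2019-10-02 13:13:25.043 AKDT [28099] netxms@netxmsts LOG: duration: 10728.421 ms bind <unnamed>: SELECT max(idata_value::double precision) FROM idata WHERE item_id=$1 AND idata_timestamp BETWEEN $2 AND $3 AND idata_value~E'^\\d+(\\.\\d+)*$'",
    r"2019-10-02 13:14:31.489 AKDT [28107] netxms@netxmsts LOG: duration: 144850.468 ms statement: SELECT idata_value,idata_timestamp FROM idata WHERE item_id=222029 ORDER BY idata_timestamp DESC LIMIT 1",
]

for duration_ms, pid, text in slow_queries(sample):
    print(f"{duration_ms:>12.3f} ms  pid={pid}  {text[:60]}")
```

Run against the full log, this should make it easier to see whether the slow entries are dominated by the min/max queries on idata or by the "latest value" lookups.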

#24

General Support / Re: 3.0.2284 and TimescaleDB view 'idata'

October 03, 2019, 12:10:26 AM

And while we're at it, there are some bogus queries in the query log:
2019-10-02 13:07:00.206 AKDT [28100] netxms@netxmsts ERROR: cannot insert into view "tdata"
2019-10-02 13:07:00.206 AKDT [28100] netxms@netxmsts DETAIL: Views containing UNION, INTERSECT, or EXCEPT are not automatically updatable.
2019-10-02 13:07:00.206 AKDT [28100] netxms@netxmsts HINT: To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.
2019-10-02 13:07:00.206 AKDT [28100] netxms@netxmsts STATEMENT: INSERT INTO tdata (item_id,tdata_timestamp,tdata_value) VALUES ($1,$2,$3)
2019-10-02 13:07:25.950 AKDT [28102] netxms@netxmsts ERROR: cannot insert into view "tdata"
2019-10-02 13:07:25.950 AKDT [28102] netxms@netxmsts DETAIL: Views containing UNION, INTERSECT, or EXCEPT are not automatically updatable.
2019-10-02 13:07:25.950 AKDT [28102] netxms@netxmsts HINT: To enable inserting into the view, provide an INSTEAD OF INSERT trigger or an unconditional ON INSERT DO INSTEAD rule.
2019-10-02 13:07:25.950 AKDT [28102] netxms@netxmsts STATEMENT: INSERT INTO tdata (item_id,tdata_timestamp,tdata_value) VALUES ($1,$2,$3)

#25

General Support / Re: 3.0.2284 and TimescaleDB view 'idata'

October 03, 2019, 12:07:50 AM

FWIW, this may be related to what appears to be netxms effectively DoSing its own database:

2019.10.02 13:06:24.280 *D* [db.cpool ] Database connection pool exhausted (call from dcitem.cpp:1511)
2019.10.02 13:06:24.280 *D* [db.cpool ] Database connection pool exhausted (call from devdb.cpp:30)

Pool minimum is 20.

#26

General Support / Re: 3.0.2284 and TimescaleDB view 'idata'

October 02, 2019, 05:44:44 PM

Here are full EXPLAIN queries.

#27

General Support / 3.0.2284 and TimescaleDB view 'idata'

October 02, 2019, 01:54:47 AM

Upgraded to 3.0.2284 and ran database upgrade script. DCIs migrated successfully.

Database performance seems much worse:

2019.10.01 14:46:11.901 *E* [db.driver ] SQL query failed (Query = "SELECT idata_value,idata_timestamp FROM idata WHERE item_id=233275 ORDER BY idata_timestamp DESC LIMIT 1"): Internal error (call to PQsendQuery failed)
2019.10.01 14:46:11.901 *E* [db.driver ] SQL query failed (Query = "SELECT idata_value,idata_timestamp FROM idata WHERE item_id=233276 ORDER BY idata_timestamp DESC LIMIT 1"): Internal error (call to PQsendQuery failed)

Graphs load slowly, etc.

Running 'explain select' on the above failed queries results in:

Limit (cost=2080.96..2082.61 rows=1 width=14) (actual time=19.277..20.802 rows=1 loops=1)
-> Merge Append (cost=2080.96..31743.08 rows=18037 width=14) (actual time=19.128..19.128 rows=1 loops=1)
Sort Key: _hyper_7_5752_chunk.idata_timestamp DESC
-> Index Scan Backward using "5752_5752_idata_sc_default_pkey" on _hyper_7_5752_chunk (cost=0.29..13.67 rows=11 width=8) (actual time=0.012..0.012 rows=0 loops=1)
Index Cond: (item_id = 233258)
-> Index Scan Backward using "5755_5755_idata_sc_default_pkey" on _hyper_7_5755_chunk (cost=0.29..13.67 rows=11 width=8) (actual time=0.004..0.004 rows=0 loops=1)
Index Cond: (item_id = 233258)

and continuing on with index scan backward.

Note that running similar queries on the individual hypertables that make up the idata view is VERY fast.

#28

Announcements / Re: NetXMS 3.0 released

September 11, 2019, 10:49:38 PM

Excellent! Thanks for all the work.

#29

General Support / TimeScale and drop_chunks, DCI disk usage

August 14, 2019, 07:40:43 PM

Anyone else using timescaledb with postgres?

When I look at timescaledb maintenance, typically the drop_chunks function would be used to reclaim disk space. However, drop_chunks requires TIMESTAMP, TIMESTAMPTZ, or DATE column types. For 'idata' and 'tdata', for example, the timestamp column is actually an int4.

Any pointers on how to determine whether NetXMS is actually pruning old entries to reclaim disk space? It's certainly dropping DCIs correctly from an access standpoint -- queries only return the expected time ranges.
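
One way to check this from outside NetXMS is to count rows older than the expected retention window. Below is a minimal sketch in Python, assuming only what is stated above: that the timestamp columns are int4 values holding Unix epoch seconds. The helper names, the retention_days value, and the %s placeholder style (as used by drivers such as psycopg2) are assumptions for illustration. The code only builds the query and does not connect to a database.

```python
import time

SECONDS_PER_DAY = 86400


def retention_cutoff(retention_days, now=None):
    """Return the epoch second that marks the start of the retention window."""
    if now is None:
        now = int(time.time())
    return now - retention_days * SECONDS_PER_DAY


def stale_rows_query(table, ts_column, retention_days, now=None):
    """Build a parameterised query counting rows older than the window.

    Table and column names are interpolated directly into the SQL text,
    so pass only trusted identifiers (e.g. 'idata', 'idata_timestamp').
    """
    sql = f"SELECT count(*) FROM {table} WHERE {ts_column} < %s"
    return sql, (retention_cutoff(retention_days, now),)


# Hypothetical retention of 30 days; use whatever the DCIs are configured for.
sql, params = stale_rows_query("idata", "idata_timestamp", 30)
print(sql, params)
```

If the count stays at or near zero over time, old rows are being deleted. Whether the disk space is actually returned to the operating system is a separate question this check does not answer.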

#30

General Support / Data Table SELECT possible in Postgresql?

July 17, 2019, 08:47:27 PM

Does anyone have experience querying DC table data directly using Postgres? I'm working in Grafana, and having good success with non-table DCIs. For complicated reasons I need to be able to create custom queries rather than relying on NetXMS' built-in data export features. By parsing dctable.cpp, I believe I get the gist of how tables are constructed using dc_table_columns, with the cells themselves stored in tdata, but I'm not understanding how the tdata_value field actually works, or whether it's even possible to parse it outside of NetXMS itself.

Another option would be if the Grafana API plugin for NetXMS supported tables, but I'm not sure anyone's working on that.

NetXMS Support Forum

Messages - jermudgeon
