NetXMS Support Forum

English Support => General Support => Topic started by: fldiet on June 16, 2021, 11:29:49 AM

Title: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: fldiet on June 16, 2021, 11:29:49 AM
Hello,

Prerequisite information:
We are pushing data to NetXMS from the machine hosting NetXMS itself with a
series of nxpush calls - one for each target node.

Observation:
We are experiencing a node-specific issue, concerning a set of local nxpush
sessions timing out statistically for specific nodes only.  Those nodes are all
experiencing connectivity issues. There seems to be a correlation between
occurrences of timeouts and network issues.

Question:
At the event of a new nxpush session, are there any internal checks taking
place requiring connectivity of the node? Could a recurring process like a poll
get stuck at the connectivity issue and interfere?


Kind regards,

  Florian
Title: Re: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: Filipp Sudanov on June 16, 2021, 03:04:16 PM
Collection of server log would be beneficial here. Can you set debug level 6 for client.* tag (issue "debug client.* 6" in Tools->Server Console or you can set this in server configuration file).
nxpush is creating the same connection as nxmc, so sessions of both will be visible under this tag. Each new connecting gets a new session id which is visible in the debug tag, e.g. client.session.1234.

Then, when you encounter a nxpush nang, you should be able to find and grep out a corresponding session from the log file.
Title: Re: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: fldiet on June 18, 2021, 12:08:27 PM
I configured the debug level as requested, restarted netxmsd, set up a series of identical nxpush requests sequentially without artificial delay and observed the result of nxpush next to netxmsd's log.

Here are some basic observances: Here is a snippet of the logs in the neighbourhood of an isolated hang ( answering with CMD_KEEPALIVE to CMD_PUSH_DCI_DATA ), including the surrounding successful ones:

2021.06.16 15:30:01.735 *D* [client.session     ] Client session with ID 84 registered
2021.06.16 15:30:01.735 *D* [client.session.84  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:01.735 *D* [client.session.84  ] Server time zone: UTC+00UTC
2021.06.16 15:30:01.735 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:01.736 *D* [client.session.84  ] Received message CMD_LOGIN
2021.06.16 15:30:01.736 *D* [client.session.84  ] authentication type 0
2021.06.16 15:30:01.736 *D* [client.session.84  ] Protocol level compression is supported by client
2021.06.16 15:30:01.736 *D* [client.session.84  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:01.736 *D* [client.session.84  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:01.777 *D* [client.session.84  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:02.545 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.06.16 15:30:02.545 *D* [client.session.84  ] readSocket: connection closed
2021.06.16 15:30:02.545 *D* [client.session.84  ] Session closed
2021.06.16 15:30:02.545 *D* [client.session     ] Client session with ID 84 unregistered
2021.06.16 15:30:02.545 *D* [client.session.84  ] Socket closed
2021.06.16 15:30:02.545 *D* [client.session.84  ] Session object destroyed


2021.06.16 15:30:02.560 *D* [client.session     ] Client session with ID 84 registered
2021.06.16 15:30:02.560 *D* [client.session.84  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:02.560 *D* [client.session.84  ] Server time zone: UTC+00UTC
2021.06.16 15:30:02.560 *D* [client.session.84  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:02.561 *D* [client.session.84  ] Received message CMD_LOGIN
2021.06.16 15:30:02.561 *D* [client.session.84  ] authentication type 0
2021.06.16 15:30:02.561 *D* [client.session.84  ] Protocol level compression is supported by client
2021.06.16 15:30:02.561 *D* [client.session.84  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:02.561 *D* [client.session.84  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:02.602 *D* [client.session.84  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:05.416 *D* [client.session.84  ] Sending message CMD_KEEPALIVE (32 bytes)
2021.06.16 15:30:05.423 *D* [client.session.84  ] Sending message CMD_NOTIFY (48 bytes)
2021.06.16 15:30:05.495 *D* [client.session.84  ] Sending message CMD_NOTIFY (48 bytes)
2021.06.16 15:30:07.601 *D* [client.session.84  ] readSocket: connection closed
2021.06.16 15:30:07.602 *D* [client.session.84  ] Waiting for pending requests...


2021.06.16 15:30:07.627 *D* [client.session     ] Client session with ID 71 registered
2021.06.16 15:30:07.627 *D* [client.session.71  ] Received message CMD_GET_SERVER_INFO
2021.06.16 15:30:07.627 *D* [client.session.71  ] Server time zone: UTC+00UTC
2021.06.16 15:30:07.627 *D* [client.session.71  ] Sending message CMD_REQUEST_COMPLETED (616 bytes)
2021.06.16 15:30:07.628 *D* [client.session.71  ] Received message CMD_LOGIN
2021.06.16 15:30:07.628 *D* [client.session.71  ] authentication type 0
2021.06.16 15:30:07.628 *D* [client.session.71  ] Protocol level compression is supported by client
2021.06.16 15:30:07.628 *D* [client.session.71  ] User <ourUserForNxpush>@127.0.0.1 authenticated (language=en clientInfo="nxpush/3.8.382 (Linux 4.19.0-16-amd64; libnxcl 3.8.382)")
2021.06.16 15:30:07.628 *D* [client.session.71  ] Sending compressed message CMD_LOGIN_RESP (312 bytes)
2021.06.16 15:30:07.668 *D* [client.session.71  ] Received message CMD_PUSH_DCI_DATA
2021.06.16 15:30:07.989 *D* [client.session.71  ] Sending message CMD_REQUEST_COMPLETED (32 bytes)
2021.06.16 15:30:07.990 *D* [client.session.71  ] readSocket: connection closed
2021.06.16 15:30:07.990 *D* [client.session.71  ] Session closed
2021.06.16 15:30:07.990 *D* [client.session     ] Client session with ID 71 unregistered
2021.06.16 15:30:07.990 *D* [client.session.71  ] Socket closed
2021.06.16 15:30:07.990 *D* [client.session.71  ] Session object destroyed

Title: Re: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: Victor Kirhenshtein on June 18, 2021, 09:56:59 PM
Hi,

it looks that DCI data processing for some reason takes too long. Do you have any transformation scripts or script thresholds on those push DCIs?

Best regards,
Victor
Title: Re: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: fldiet on June 21, 2021, 04:52:17 PM
Hi,

Yes, there are some transformation scripts.  But those are limited in
complexity as well as number ( ~700 DCIs ) and should therefore be manageable.

There are no more than 30 *DCI*() invokes for said client in total.
Thresholds boil down to very basic diff(), last() and average() calls.
These DCIs and their threshold events do not trigger any further scripts.

---

Multiple other nodes under very similar circumstances ( they inherit from the
same DCI templates or more, and are being pushed to at the same frequency ) on
the very same server instance are handled perfectly fine.  We are talking ~5
afflicted and 50 unaffected nodes.

Here is a quick plot I did for navigating/cross-referencing netxmsd's logs for
hangs in my last post.
I added the data of an unaffected node ( taken briefly after ) comparing the
time nxpush took for a set of attempts to an affected nodes behaviour.
https://i.imgur.com/CUrEvin.png

Am I off track for assuming there should be an external variable connecting the affected nodes?


Kind regards,

  Florian
Title: Re: nxpush "Request timed out" on nodes with variable connectivity only.
Post by: Filipp Sudanov on June 22, 2021, 11:59:59 AM
Let's try to exclude the option that transformation scripts get randomly delayed. Try wrapping transformation scripts in something like
trace(0, $object->id . ":" . $dci->id . " tranformation script start");
...
trace(0, $object->id . ":" . $dci->id . " tranformation script end");


first parameter of trace() defines the log level. This way you should get some stats on the actual duration of the transformation scripts.