NetXMS sporadically stops collecting data

Started by adimitrov, September 28, 2016, 11:15:00 AM


adimitrov

Hello team,

We have an issue where we lose data for a specific customer. It is sporadic, and by our observations it happens at the same time every day. NetXMS simply stops collecting data and presenting it in the console, and after a while it recovers on its own. I have attached an excerpt from the logs (debug level 6) covering last night, when the issue occurred, and this morning, when it started sending data again.

Any help will be useful.

P.S. I have the full log files with the issue reproduced, if needed.


Victor Kirhenshtein

Hi,

Are there any messages in the log around those times related to the communication session with the server? Was the session disconnected and reconnected only in the morning?

Best regards,
Victor


adimitrov

Hello Victor,

Yes, a lot of these:

[28-Sep-2016 11:39:21.917] [DEBUG] [session:3] Session with 78.130.143.30 closed
[28-Sep-2016 11:39:21.918] [DEBUG] Incoming connection from 78.130.143.30
[28-Sep-2016 11:39:21.918] [DEBUG] Connection from 78.130.143.30 accepted

It happens every day, and not just in the morning: usually in the late evening (about 22:00) and in the morning (about 9:00-9:30).

Best regards,
Adrian

adimitrov

Hello Victor,

This is happening at the moment:

[03-Oct-2016 09:16:32.467] [DEBUG] [session:16] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.468] [DEBUG] [session:16] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.468] [DEBUG] [session:12] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.468] [DEBUG] [session:12] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.473] [DEBUG] [session:15] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.473] [DEBUG] [session:15] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.476] [DEBUG] [session:9] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.476] [DEBUG] [session:9] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.479] [DEBUG] [session:20] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.480] [DEBUG] [session:20] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.485] [DEBUG] [session:17] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.485] [DEBUG] [session:17] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.487] [DEBUG] [session:21] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.487] [DEBUG] [session:21] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.497] [DEBUG] [session:18] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.497] [DEBUG] [session:18] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.507] [DEBUG] [session:10] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.507] [DEBUG] [session:19] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.508] [DEBUG] [session:10] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.508] [DEBUG] [session:19] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.513] [DEBUG] [session:13] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.513] [DEBUG] [session:13] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.548] [DEBUG] [session:22] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.548] [DEBUG] [session:22] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:32.566] [DEBUG] [session:7] Message receiving error (MSGRECV_CLOSED)
[03-Oct-2016 09:16:32.566] [DEBUG] [session:7] Session with 78.130.143.30 closed
[03-Oct-2016 09:16:33.520] [DEBUG] [session:1] Received message CMD_GET_PARAMETER
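
To check whether the disconnects really cluster at the same times each day, the "Session with ... closed" lines can be counted per minute. The following is only a rough sketch: it assumes the agent log keeps exactly the line format pasted above, and the names `CLOSE_RE`, `closes_per_minute` and `SAMPLE` are made up for illustration. For a real run, pass the lines of the full agent log file instead of `SAMPLE`.

```python
import re
from collections import Counter

# Matches agent log lines of the form pasted in this thread, e.g.
# [03-Oct-2016 09:16:32.468] [DEBUG] [session:12] Session with 78.130.143.30 closed
# Group 1 is the timestamp truncated to the minute, group 2 is the peer address.
CLOSE_RE = re.compile(
    r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}):\d{2}\.\d{3}\] "
    r"\[DEBUG\] \[session:\d+\] Session with (\S+) closed"
)


def closes_per_minute(lines):
    """Count closed sessions per (minute, peer address)."""
    counts = Counter()
    for line in lines:
        match = CLOSE_RE.match(line.strip())
        if match:
            minute, peer = match.groups()
            counts[(minute, peer)] += 1
    return counts


# A few lines copied from the excerpts in this thread.
SAMPLE = [
    "[28-Sep-2016 11:39:21.917] [DEBUG] [session:3] Session with 78.130.143.30 closed",
    "[28-Sep-2016 11:39:21.918] [DEBUG] Incoming connection from 78.130.143.30",
    "[03-Oct-2016 09:16:32.467] [DEBUG] [session:16] Message receiving error (MSGRECV_CLOSED)",
    "[03-Oct-2016 09:16:32.468] [DEBUG] [session:16] Session with 78.130.143.30 closed",
    "[03-Oct-2016 09:16:32.468] [DEBUG] [session:12] Session with 78.130.143.30 closed",
]

if __name__ == "__main__":
    for (minute, peer), n in sorted(closes_per_minute(SAMPLE).items()):
        print(f"{minute}  {peer}  {n} session(s) closed")
```

A minute in which many sessions to the same address close at once, repeating at the same time of day, would support the daily pattern described earlier.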


It seems that the agent and the NetXMS server cannot connect to each other. Usually there is a lot of ReconciliationThread activity, but in this situation we see only this:

[03-Oct-2016 09:19:33.211] [DEBUG] ReconciliationThread: 35 records to be sent in bulk mode
[03-Oct-2016 09:19:33.248] [DEBUG] ReconciliationThread: 35 records sent
[03-Oct-2016 09:20:03.303] [DEBUG] ReconciliationThread: 93 records to be sent in bulk mode
[03-Oct-2016 09:20:03.411] [DEBUG] ReconciliationThread: 93 records sent
[03-Oct-2016 09:20:33.463] [DEBUG] ReconciliationThread: 31 records to be sent in bulk mode
[03-Oct-2016 09:20:33.486] [DEBUG] ReconciliationThread: 31 records sent

When the agent is working normally, many more records are sent. Because of that, we increased DataReconciliationBlockSize to 8000 to solve an issue in the past; DataReconciliationTimeout is set to 20000.
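
To put numbers on "a lot more records", the batch sizes can be pulled out of the ReconciliationThread lines and compared with the configured block size. This is only a sketch under the assumption that the log format matches the excerpt above; `SENT_RE`, `sent_batches`, `SAMPLE` and `BLOCK_SIZE` are illustrative names, and 8000 is simply the DataReconciliationBlockSize value mentioned in this post.

```python
import re

# Matches only the "N records sent" lines, not "N records to be sent in bulk mode".
SENT_RE = re.compile(
    r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\.\d{3}\] "
    r"\[DEBUG\] ReconciliationThread: (\d+) records sent$"
)

BLOCK_SIZE = 8000  # DataReconciliationBlockSize as stated in the post


def sent_batches(lines):
    """Return a list of (timestamp, record count), one per 'records sent' line."""
    batches = []
    for line in lines:
        match = SENT_RE.match(line.strip())
        if match:
            batches.append((match.group(1), int(match.group(2))))
    return batches


# The six lines from the excerpt above.
SAMPLE = [
    "[03-Oct-2016 09:19:33.211] [DEBUG] ReconciliationThread: 35 records to be sent in bulk mode",
    "[03-Oct-2016 09:19:33.248] [DEBUG] ReconciliationThread: 35 records sent",
    "[03-Oct-2016 09:20:03.303] [DEBUG] ReconciliationThread: 93 records to be sent in bulk mode",
    "[03-Oct-2016 09:20:03.411] [DEBUG] ReconciliationThread: 93 records sent",
    "[03-Oct-2016 09:20:33.463] [DEBUG] ReconciliationThread: 31 records to be sent in bulk mode",
    "[03-Oct-2016 09:20:33.486] [DEBUG] ReconciliationThread: 31 records sent",
]

if __name__ == "__main__":
    batches = sent_batches(SAMPLE)
    sizes = [count for _, count in batches]
    print(f"batches: {len(sizes)}, total records: {sum(sizes)}, largest: {max(sizes)}")
    print(f"largest batch as share of block size: {max(sizes) / BLOCK_SIZE:.2%}")
```

Running the same function over a period when the agent behaves normally would give a baseline to compare these small batches against.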

Best regards,
Adrian

Victor Kirhenshtein

MSGRECV_CLOSED means the session was closed by the server. You could check the server log to see if there is anything about connections to that agent.

Best regards,
Victor

adimitrov

Hello Victor,

I am looking at the server logs, but I am not sure which message to look for. I tried searching for the agent's IP address and for keywords such as "connection" and "disconnect", but with no success. I am also checking by timestamp, but I still can't find anything about a disconnection. From a certain point on, though, the server says:

    boot time set to 1472549691 from SNMP
    unable to get agent uptime
    unable to get system location
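
Since keyword searches came up empty, another option is to pull every server log line within a short window around the moment the agent logged MSGRECV_CLOSED and read them all. The sketch below assumes the server log uses the same "[DD-Mon-YYYY HH:MM:SS.mmm]" timestamp prefix as the agent log, which is not confirmed in this thread; `TS_RE`, `parse_ts` and `lines_near` are hypothetical helper names. Lines without a timestamp prefix are skipped.

```python
import re
from datetime import datetime, timedelta

# Assumed timestamp prefix: [DD-Mon-YYYY HH:MM:SS.mmm]
TS_RE = re.compile(
    r"^\[(\d{2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]"
)

# Explicit month table so parsing does not depend on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_ts(line):
    """Return the line's leading timestamp as a datetime, or None if absent."""
    match = TS_RE.match(line)
    if not match:
        return None
    day, mon, year, hour, minute, second, millis = match.groups()
    month = MONTHS.get(mon)
    if month is None:
        return None
    return datetime(int(year), month, int(day),
                    int(hour), int(minute), int(second), int(millis) * 1000)


def lines_near(lines, center, seconds=60, needle=None):
    """Return lines stamped within +/- `seconds` of `center`.

    If `needle` is given, keep only lines containing it (case-insensitive).
    """
    window = timedelta(seconds=seconds)
    selected = []
    for line in lines:
        ts = parse_ts(line.strip())
        if ts is None or abs(ts - center) > window:
            continue
        if needle is not None and needle.lower() not in line.lower():
            continue
        selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python near.py /path/to/server/log
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for hit in lines_near(log, datetime(2016, 10, 3, 9, 16, 32), seconds=120):
                print(hit)
```

The center time in the usage example is the moment of the session closes from the agent excerpt earlier in the thread; if the server and agent clocks differ, the window needs to be widened accordingly.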

To be honest, the issue keeps getting worse. I have also observed that more than one agent is affected.

I need help/guidance.

Best regards,
Adrian