Hi Victor,
I am using NetXMS 1.1.6. The server works normally most of the time, but sometimes I cannot connect to it with my own client, which I wrote with the Java API. When I then try to connect with the official console, it fails as well, saying "Unable to connect: Request timed out." Meanwhile, the server keeps successfully collecting DCI data from the agents and writing it to the database.
After I restart the server, both my own client and the official console can connect to it again. So I think the problem is probably not caused by my own client.
----------------------------------------------
So I debugged the server with GDB and found that the problem is caused by contention on a lock:
In client.cpp: ClientListener() ---> if (!RegisterSession(pSession))
When a ClientSession is created, it has to be registered, but "m_rwlockSessionListAccess" is already locked. So registration never completes, the client never gets a response, and it reports "Unable to connect: Request timed out."
Who holds "m_rwlockSessionListAccess" and never releases it?
I found that it is held in client.cpp --> NotifyClientSessions() --> m_pSessionList->notify(dwCode, dwData). Execution stops in this call and never continues, so "m_rwlockSessionListAccess" is never released.
Why does notify() block?
notify()-->sendMessage()-->SendEx()-->MutexLock(mutex, INFINITE)-->send(nSocket, ((char *)pBuff) + (nSize - nLeft), nLeft, nFlags);
The send() call blocks (I do not know why), so the lock "m_mutexSocketWrite" stays locked.
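A blocking send() returns only after all of its data has been copied into the kernel's socket send buffer. One common way it ends up waiting indefinitely is that the peer stops reading: the client process hangs, or the remote host disappears without closing the TCP connection, so the buffer fills and never drains. That is a plausible cause here, not something the GDB session confirmed. The C sketch below (assuming a POSIX system) uses a local socketpair() as a stand-in for the client connection, never reads from one end, and counts how many bytes the other end accepts before the kernel would block. The socket is put into non-blocking mode only so the demo stops with EAGAIN/EWOULDBLOCK at that point instead of hanging; `fill_until_would_block` is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: write to fds[0] until the kernel would block,
 * never reading from fds[1] (the "peer that stopped reading").
 * Returns the number of bytes the kernel accepted, or -1 if setup failed.
 * The errno seen when sending stopped is stored in *last_errno. */
static long fill_until_would_block(int *last_errno)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Non-blocking only so the demo stops here; a blocking send()
     * would sleep at this point, holding any lock its caller took. */
    int flags = fcntl(fds[0], F_GETFL, 0);
    fcntl(fds[0], F_SETFL, flags | O_NONBLOCK);

    char chunk[4096];
    memset(chunk, 'x', sizeof(chunk));
    long total = 0;
    for (;;) {
        ssize_t n = send(fds[0], chunk, sizeof(chunk), 0);
        if (n > 0) {
            total += n;      /* partial sends are counted too */
            continue;
        }
        *last_errno = errno; /* expected: EAGAIN or EWOULDBLOCK */
        break;
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Calling `fill_until_would_block(&err)` should return a positive byte count (the exact number depends on the platform's buffer sizes) with `err` set to EAGAIN or EWOULDBLOCK.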
In short:
send() blocks --> "m_mutexSocketWrite" is never released --> "m_rwlockSessionListAccess" is never released --> client: "Request timed out."
-------------------------------------------------------------------
The above is just my speculation, so please double-check it.
Finally, how should I solve this problem? My friend suggested making send() non-blocking (sockets are blocking by default), but I think that may cause other problems. Do you have a better idea?
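Switching to non-blocking mode does change the contract of send(), which is probably the "other problem" worth worrying about: it may write only part of the buffer, or fail immediately with EAGAIN/EWOULDBLOCK when the send buffer is full, so the caller has to loop and wait for the socket to become writable. Waiting with a timeout is what keeps a dead peer from holding a write mutex forever. Below is a minimal POSIX sketch; `set_nonblocking` and `send_all_timeout` are hypothetical helpers, not the NetXMS SetSocketNonBlocking()/SendEx() functions, whose internals are not shown in this thread. On Windows the equivalents would be ioctlsocket() with FIONBIO and select() or WSAPoll().

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch a socket to non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: send all of buf on a non-blocking socket.
 * Returns 0 on success, -1 on error. errno is ETIMEDOUT if the socket
 * did not become writable within timeout_ms during any single wait.
 * A production version would also suppress SIGPIPE for peers that have
 * closed (e.g. MSG_NOSIGNAL on Linux). */
static int send_all_timeout(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t left = len;
    while (left > 0) {
        ssize_t n = send(fd, buf + (len - left), left, 0);
        if (n > 0) {
            left -= (size_t)n; /* partial writes are normal here */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;         /* real error, e.g. connection reset */

        /* Send buffer full: wait for writability, but not forever. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

With a short message on an idle connection `send_all_timeout` returns 0; if the peer never reads and the data exceeds the buffer, it gives up after the timeout with errno set to ETIMEDOUT, so the caller can drop that one session instead of stalling every thread waiting on the same lock.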
Thanks.
Hi!
Good catch, and thanks for the detailed debugging! I'll make a fix for that in the next few days.
Best regards,
Victor
Once you have fixed the bug, could you explain in this thread how you fixed it?
Thanks ;D
Hi!
I have found a system where I was able to reproduce this problem. And yes, it was solved by switching to non-blocking sockets for client connections. I made the necessary changes in the SVN trunk, and version 1.2.2 will contain this fix. The changes are minimal; see the diff below:
Modified: trunk/src/server/core/client.cpp
===================================================================
--- trunk/src/server/core/client.cpp 2012-06-20 17:39:09 UTC (rev 6534)
+++ trunk/src/server/core/client.cpp 2012-06-20 20:25:43 UTC (rev 6535)
@@ -195,6 +195,7 @@
}
errorCount = 0; // Reset consecutive errors counter
+ SetSocketNonBlocking(sockClient);
// Create new session structure and threads
pSession = new ClientSession(sockClient, (struct sockaddr *)&servAddr);
@@ -287,6 +288,7 @@
}
errorCount = 0; // Reset consecutive errors counter
+ SetSocketNonBlocking(sockClient);
// Create new session structure and threads
pSession = new ClientSession(sockClient, (struct sockaddr *)&servAddr);
Modified: trunk/src/server/core/session.cpp
===================================================================
--- trunk/src/server/core/session.cpp 2012-06-20 17:39:09 UTC (rev 6534)
+++ trunk/src/server/core/session.cpp 2012-06-20 20:25:43 UTC (rev 6535)
@@ -393,7 +393,8 @@
{
if ((iErr = RecvNXCPMessageEx(m_hSocket, &pRawMsg, m_pMsgBuffer, &msgBufferSize,
&m_pCtx, (pDecryptionBuffer != NULL) ? &pDecryptionBuffer : NULL,
- INFINITE, MAX_MSG_SIZE)) <= 0) {
+ 900000, MAX_MSG_SIZE)) <= 0) // timeout 15 minutes
+ {
DebugPrintf(5, _T("RecvNXCPMessageEx failed (%d)"), iErr);
break;
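The second hunk replaces the INFINITE receive timeout with 900000 ms (15 minutes), so the receive loop shown in the diff eventually breaks out when a peer goes silent instead of waiting forever. A common way to bound a single receive on a POSIX socket is to poll() for readability first; the sketch below shows that idea. `recv_with_timeout` and its -2 return code are hypothetical and are not how RecvNXCPMessageEx() is implemented.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* 15 minutes, the value used in the diff above */
#define RECV_TIMEOUT_MS 900000

/* Hypothetical helper: wait for data using poll(), then recv().
 * Returns bytes read (> 0), 0 if the peer closed the connection,
 * -1 on error (errno set), or -2 if nothing arrived within timeout_ms.
 * An interrupted poll() restarts the full wait, so a real implementation
 * would track the remaining time instead. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc == 0)
        return -2; /* timed out: caller can end the session */
    if (rc < 0)
        return -1;

    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A session loop would call `recv_with_timeout(sock, buf, sizeof buf, RECV_TIMEOUT_MS)` and leave the loop on any result of 0 or less, which mirrors the `<= 0` check followed by `break` in the patched code.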
Best regards,
Victor