Client cannot connect to the server (may be a bug)

Started by 165432896, June 11, 2012, 02:01:53 PM

165432896

Hi victor,
    In NetXMS 1.1.6 the server works normally most of the time, but sometimes I cannot connect to it with my own client, which I wrote using the Java API. When I then try to connect with the official console, it also fails and says: "Unable to connect: Request timed out". Meanwhile the server keeps collecting DCI data from the agents and writing it to the database.
    After I restart the server, both my own client and the official console can connect again, so I think the problem is not caused by my own client.

----------------------------------------------
So I used GDB to debug it. I found that the problem is caused by contention on a lock:

In client.cpp: ClientListener() ---> if (!RegisterSession(pSession))
When a ClientSession is created, it has to be registered. But "m_rwlockSessionListAccess" is already locked, so the client never gets a response from the server and shows "Unable to connect: Request timed out".


Who takes "m_rwlockSessionListAccess" and does not release it?
I found that it happens in client.cpp --> NotifyClientSessions() --> m_pSessionList->notify(dwCode, dwData); execution stops inside this call and cannot continue, so "m_rwlockSessionListAccess" is never released.

Why does notify() block?
notify() --> sendMessage() --> SendEx() --> MutexLock(mutex, INFINITE) --> send(nSocket, ((char *)pBuff) + (nSize - nLeft), nLeft, nFlags);
send() blocks (I do not know why), so the lock "m_mutexSocketWrite" stays locked.

In a word:
send() blocks --> "m_mutexSocketWrite" is not released --> "m_rwlockSessionListAccess" is not released --> client: Request timed out.
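
To make the chain above concrete, here is a minimal standalone sketch (not NetXMS code). It assumes the notifier holds the session list lock as a reader while its send() is stuck, and that registering a new session needs the same lock as a writer. The names sessionListLock and canRegisterWhileNotifying are hypothetical, and the blocked send() is simulated with a sleep.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for m_rwlockSessionListAccess.
static std::shared_timed_mutex sessionListLock;

// Returns true if a new session could take the write lock within `wait`
// while a notifier thread holds the read lock for `sendDuration`.
bool canRegisterWhileNotifying(std::chrono::milliseconds sendDuration,
                               std::chrono::milliseconds wait)
{
    assert(sendDuration.count() >= 0 && wait.count() >= 0);

    std::promise<void> lockTaken;
    std::thread notifier([&] {
        // Notifier side: the list lock is held for the whole notification.
        std::shared_lock<std::shared_timed_mutex> listGuard(sessionListLock);
        lockTaken.set_value();
        std::this_thread::sleep_for(sendDuration);  // simulated blocked send()
    });
    lockTaken.get_future().wait();  // make sure the notifier owns the lock first

    bool registered = false;
    {
        // Listener side: registration gives up after `wait`, which is what
        // the connecting client experiences as "Request timed out".
        std::unique_lock<std::shared_timed_mutex> writeGuard(sessionListLock, wait);
        registered = writeGuard.owns_lock();
    }
    notifier.join();
    return registered;
}
```

With a long simulated send and a short wait the registration fails; once the send finishes quickly, the same call succeeds.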
-------------------------------------------------------------------

The above is just my speculation; you may need to check it again.
Finally, how should I solve this problem? My friend suggested making send() non-blocking (it is blocking by default), but I think that may cause other problems. Is there a better idea?
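
For reference, a rough sketch of what a non-blocking send with a bounded wait could look like on a POSIX system. This is only an illustration of the idea, not the NetXMS SendEx() code: the helpers setNonBlocking and sendWithTimeout are hypothetical names, and MSG_NOSIGNAL assumes Linux or another platform that defines it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: switch a descriptor to non-blocking mode.
bool setNonBlocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: send the whole buffer on a non-blocking socket.
// Whenever the socket is not writable, wait at most timeoutMs for it.
// Returns the number of bytes sent, or -1 on error or timeout.
ssize_t sendWithTimeout(int fd, const void *data, size_t size, int timeoutMs)
{
    assert(data != nullptr || size == 0);
    const char *p = static_cast<const char *>(data);
    size_t left = size;
    while (left > 0)
    {
        ssize_t n = send(fd, p, left, MSG_NOSIGNAL);
        if (n > 0)
        {
            p += n;
            left -= static_cast<size_t>(n);
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;  // interrupted by a signal, retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
            // Send buffer is full: wait for POLLOUT instead of blocking forever.
            struct pollfd pfd = { fd, POLLOUT, 0 };
            int rc = poll(&pfd, 1, timeoutMs);
            if (rc > 0 || (rc < 0 && errno == EINTR))
                continue;
            return -1;  // timeout (rc == 0) or poll error
        }
        return -1;  // any other send error
    }
    return static_cast<ssize_t>(size);
}
```

The point of the design is that a peer which stops reading can delay the caller by at most the timeout, so any lock held around the send is released in bounded time. The caller still has to decide what to do with the session after a failed send.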
Thanks.

Victor Kirhenshtein

Hi!

Good catch, thanks for the detailed debugging! I'll make a fix for that in the next few days.

Best regards,
Victor

165432896

After you fix the bug, I hope you can describe the fix in detail in this thread.
Thanks ;D

Victor Kirhenshtein

Hi!

I have found a system where I was able to reproduce this problem. And yes, it was solved by switching to non-blocking sockets for client connections. I made the necessary changes in svn trunk, and version 1.2.2 will contain this fix. The changes are minimal, see the diff below:


Modified: trunk/src/server/core/client.cpp
===================================================================
--- trunk/src/server/core/client.cpp  2012-06-20 17:39:09 UTC (rev 6534)
+++ trunk/src/server/core/client.cpp  2012-06-20 20:25:43 UTC (rev 6535)
@@ -195,6 +195,7 @@
       }

       errorCount = 0;     // Reset consecutive errors counter
+      SetSocketNonBlocking(sockClient);

       // Create new session structure and threads
       pSession = new ClientSession(sockClient, (struct sockaddr *)&servAddr);
@@ -287,6 +288,7 @@
       }

       errorCount = 0;     // Reset consecutive errors counter
+      SetSocketNonBlocking(sockClient);

       // Create new session structure and threads
       pSession = new ClientSession(sockClient, (struct sockaddr *)&servAddr);

Modified: trunk/src/server/core/session.cpp
===================================================================
--- trunk/src/server/core/session.cpp 2012-06-20 17:39:09 UTC (rev 6534)
+++ trunk/src/server/core/session.cpp 2012-06-20 20:25:43 UTC (rev 6535)
@@ -393,7 +393,8 @@
    {
       if ((iErr = RecvNXCPMessageEx(m_hSocket, &pRawMsg, m_pMsgBuffer, &msgBufferSize,
                                     &m_pCtx, (pDecryptionBuffer != NULL) ? &pDecryptionBuffer : NULL,
-                                    INFINITE, MAX_MSG_SIZE)) <= 0) {
+                                    900000, MAX_MSG_SIZE)) <= 0)  // timeout 15 minutes
+      {
          DebugPrintf(5, _T("RecvNXCPMessageEx failed (%d)"), iErr);
          break;
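
The session.cpp part of the diff replaces the INFINITE receive timeout with 900000 ms. As a rough illustration of what a receive with a finite timeout means on a POSIX socket, here is a small sketch. It is not the RecvNXCPMessageEx() implementation; recvWithTimeout and its return codes are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: wait up to timeoutMs for data, then read it.
// Returns bytes read (> 0), 0 if the peer closed the connection,
// -1 on error, -2 if nothing arrived before the timeout.
ssize_t recvWithTimeout(int fd, void *buf, size_t size, int timeoutMs)
{
    assert(buf != nullptr && size > 0);

    struct pollfd pfd = { fd, POLLIN, 0 };
    int rc;
    do
    {
        // Note: retrying after EINTR restarts the full timeout.
        rc = poll(&pfd, 1, timeoutMs);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0)
        return -2;  // timed out, caller can drop the idle session
    if (rc < 0)
        return -1;

    ssize_t n;
    do
    {
        n = recv(fd, buf, size, 0);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A caller would treat -2 like any other failure and leave its read loop, which matches the `<= 0` check followed by `break` in the diff.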



Best regards,
Victor