Over 32,000 connections from NetXMS server to DB in CLOSE_WAIT status

Started by Marcin, December 04, 2014, 11:02:53 AM

Marcin

Hi,

NetXMS was not monitored on our development system for several days. During that time the Oracle DB password expired:
[04-Dec-2014 09:31:17.278] [ERROR] Unable to establish connection with database (ORA-28002: the password will expire within 5 days)
This went unnoticed, but it was not the real problem.

We had a problem with backups of the development system because the Networker client was failing.
After analysis I found over 32,000 connections from netxmsd to the DB in CLOSE_WAIT status:

153.98.100.55.36421  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.35894  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.33549  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.65128  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.60114  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.53602  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.57421  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.33275  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
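
For anyone who wants to check for the same symptom, here is a minimal sketch that counts connections per TCP state from Solaris-style `netstat -an` output like the lines above. The function name `count_states` and the `remote_suffix` filter are my own illustration, and the parser assumes the seven-column layout shown in this post (local address, remote address, four numeric columns, state).

```python
from collections import Counter

def count_states(netstat_text, remote_suffix=None):
    """Count TCP states in netstat output with the 7-column layout shown above.

    remote_suffix (hypothetical helper parameter) keeps only lines whose
    remote address ends with that string, e.g. ".1525" for the DB listener.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: local, remote, swind, send-q, rwind, recv-q, state.
        # The isdigit() check skips header and separator lines.
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        remote, state = fields[1], fields[6]
        if remote_suffix and not remote.endswith(remote_suffix):
            continue
        counts[state] += 1
    return counts

sample = """\
153.98.100.55.36421  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
153.98.100.55.35894  153.98.100.57.1525   49152      0 49152      0 CLOSE_WAIT
"""
print(count_states(sample, remote_suffix=".1525"))
```

Feeding it the full netstat output instead of `sample` gives a quick total per state for connections to the DB port.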

NetXMS kept opening new connections until Solaris 10 reached its limit of open connections.
As a consequence, other applications on the host started to have problems as well.

I think NetXMS should enforce some internal limit on the number of connections, to avoid serious problems when the DB is unavailable or inaccessible for some time.
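
To make the suggestion concrete, here is a rough sketch of the kind of limit I mean: a bounded number of login attempts with a growing delay, where every failed attempt releases its handle before the next one starts. This is not NetXMS code. `connect_with_limit` and the `allocate`, `login` and `release` callables are hypothetical stand-ins for whatever the DB driver really does.

```python
import time

def connect_with_limit(allocate, login, release, max_attempts=5,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Try to log in at most max_attempts times.

    allocate/login/release are hypothetical callables standing in for the
    driver: allocate() returns a handle, login(handle) raises on failure,
    release(handle) frees the handle and its socket.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        handle = allocate()
        try:
            login(handle)
            return handle            # success: the caller now owns the handle
        except Exception:
            release(handle)          # failure: never leave the handle open
        if attempt < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)   # capped exponential backoff
    return None                      # give up instead of retrying forever

# Tiny simulation: every login fails, and no handle stays open afterwards.
open_handles = set()
ids = iter(range(1000))

def fake_allocate():
    h = next(ids)
    open_handles.add(h)
    return h

def fake_login(handle):
    raise ConnectionError("login rejected")

def fake_release(handle):
    open_handles.discard(handle)

result = connect_with_limit(fake_allocate, fake_login, fake_release,
                            max_attempts=3, sleep=lambda seconds: None)
print(result, len(open_handles))
```

The point of the design is that the number of sockets held at any moment stays at one, no matter how long the DB stays unreachable.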

Best regards,
Marcin

Victor Kirhenshtein

Hi,

This is quite strange, because the NetXMS server closes the Oracle session if it cannot log in and then opens a new one. In theory, closing the session in the Oracle client should close the underlying socket as well. Were those connections on the DB server side or on the NetXMS server side? I'll also check whether we close the Oracle session correctly.
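
For background: a TCP socket sits in CLOSE_WAIT when the remote peer has already closed its end and the local process has not yet called close() on its own socket. The small loopback sketch below illustrates that sequence with plain Python sockets. It is only an illustration of the TCP behaviour, not of the Oracle client, and all variable names are made up for the example.

```python
import socket

# Local listener playing the role of the remote peer (the "database").
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(1)

client = socket.create_connection(listener.getsockname())
server_side, _ = listener.accept()

server_side.close()                      # the peer closes first and sends FIN

data = client.recv(1024)                 # empty bytes signal that the peer closed
# From here until client.close() the OS keeps the client socket in CLOSE_WAIT.
still_open = client.fileno() != -1

client.close()                           # only this releases the local socket
closed = client.fileno() == -1

listener.close()
print(data, still_open, closed)
```

If the process never reaches the equivalent of `client.close()`, such sockets accumulate, which would match the netstat output posted above.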

Best regards,
Victor

Marcin


On the NetXMS server side.
Stopping the NetXMS server service took around 15 minutes.
During that time the number of connections in CLOSE_WAIT state slowly decreased.