Hello again!
So far, our network and its monitoring have been behaving quite normally, but one single server (the only one with the Oracle subagent enabled) has been doing strange things lately. Maybe these observations are somehow related to each other:
- The server dropped out of our DNS twice. Its IP address was still the same, but other nodes no longer knew its name. A restart solved the problem.
- That restart took almost 30 minutes. According to the event log, some (but not all) OracleConsole processes didn't start within 16 minutes. During that time you can't access the desktop (even via the VM console), but the databases are already reachable.
- Two weeks passed between the first and the second DNS dropout. In both cases, about a week beforehand, we could no longer connect via RDP as domain users, only as the local admin.
- When the server pings itself, it uses IPv6 by default, even though that protocol is disabled. When I tried to enable IPv6, it told me that I needed a network card for that. How the hell did I open up that RDP connection if there's no LAN interface?!
- One of the six databases (4 of them, including this one, are monitored using the Oracle.DBInfo.IsReachable parameter) worked fine until last week, but now it keeps crashing within hours. The errors vary; one of them said that all 150 available sessions were in use, which definitely wasn't caused by that many users.
- Right then we noticed that the nxagentd.exe process was using 2 GB of memory. After restarting the service, it went back to about 25 MB but started growing again (see attached screenshot). CPU utilisation is low, though (only occasionally rising to 1 or 2%). Unfortunately, I didn't monitor the agent before we noticed this.
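To put a number on the memory growth, a rough sketch like the following could be used. It fits a straight line through a few (hours since restart, memory in MB) samples and extrapolates when the process would reach 2 GB again. The function name and the sample values are made up for illustration; real samples would have to be read off Task Manager or a similar tool.

```python
def growth_rate_mb_per_hour(samples):
    """Least-squares slope of memory (MB) over time (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples have the same timestamp")
    return num / den


# Hypothetical samples: (hours since service restart, memory in MB)
samples = [(0.0, 25.0), (1.0, 35.0), (2.0, 45.0), (3.0, 55.0)]

rate = growth_rate_mb_per_hour(samples)
if rate > 0:
    # Hours from the last sample until 2 GB (2048 MB), assuming linear growth
    hours_left = (2048.0 - samples[-1][1]) / rate
    print(f"growth: {rate:.1f} MB/h, about {hours_left:.0f} h until 2 GB")
else:
    print("no growth in these samples")
```

A roughly constant slope over several days would point to a steady leak per polling cycle rather than a one-off spike.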
We're using Oracle 11.2.0.1.0 x64. This is the policy "oracle" (with the original names and passwords removed), which is applied only to that server:
<config>
  <agent>
    <subagent>oracle.nsm</subagent>
  </agent>
  <oracle>
    <databases>
      <database id="1">
        <id>111</id>
        <tnsname>111</tnsname>
        <username>111</username>
        <password>xxx</password>
      </database>
      <database id="2">
        <id>222</id>
        <tnsname>222</tnsname>
        <username>222</username>
        <password>xxx</password>
      </database>
      <database id="3">
        <id>333</id>
        <tnsname>333</tnsname>
        <username>333</username>
        <password>xxx</password>
      </database>
      <database id="4">
        <id>444</id>
        <tnsname>444</tnsname>
        <username>444</username>
        <password>xxx</password>
      </database>
      <database id="5">
        <id>555</id>
        <tnsname>555</tnsname>
        <username>555</username>
        <password>xxx</password>
      </database>
    </databases>
  </oracle>
</config>
The failing database is a test instance. I imported a current Data Pump export from the live system and ran an update afterwards. This has worked fine many times before, so I'm not sure where the errors are coming from now. It could be the Data Pump export/import, the update, the suspiciously huge nxagentd.exe, or just another strange error on a server that seems to have some severe problem.
Could the (sub-)agent be opening more and more sessions, growing linearly until it finally kills our database? (And why just this one?)
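One way to check this would be to look at who actually holds the sessions on the failing instance. The sketch below is only an illustration: the query uses Oracle's V$SESSION view (the monitoring user needs SELECT privilege on it), while the helper name and the sample rows are made up. In practice the rows would come from running the query in SQL*Plus or through a database driver.

```python
from collections import Counter

# Query to run against the failing instance; TYPE = 'USER' skips
# Oracle's own background processes.
SESSION_QUERY = (
    "SELECT username, program, machine "
    "FROM v$session WHERE type = 'USER'"
)


def count_sessions(rows):
    """Count sessions per (username, program, machine), most frequent first."""
    counts = Counter(
        (user or "(none)", program or "(unknown)", machine or "(unknown)")
        for user, program, machine in rows
    )
    return counts.most_common()


# Hypothetical rows standing in for the result of SESSION_QUERY
sample_rows = [
    ("MONITOR", "nxagentd.exe", "DBSERVER"),
    ("MONITOR", "nxagentd.exe", "DBSERVER"),
    ("APPUSER", "sqlplus.exe", "CLIENT01"),
    ("MONITOR", "nxagentd.exe", "DBSERVER"),
]

for (user, program, machine), n in count_sessions(sample_rows):
    print(f"{n:4d}  {user}  {program}  {machine}")
```

If the count for nxagentd.exe keeps climbing between two runs while nobody is working on the database, that would suggest the subagent is not releasing its sessions.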
Is there a way to tell where this huge amount of data comes from?
Thanks a lot to anyone who can help a bit here. I'm feeling a little uncomfortable about monitoring that server, as I fear the subagent is somehow interfering with that system. As long as it's just the test database, it's OK, but 4 of the 6 are our live databases.
Best regards from Germany