Messages - Tursiops

#316
Hi Victor,

In my case the nodes are the proxy nodes for a site and the SNMP traffic to them is tunneled through their own agent.
When this happens though, the server will happily collect SNMP data from the node without any problems. It just doesn't cancel/remove the SNMP unreachable alert.

This looks like it's related to https://track.radensolutions.com/issue/NX-1228?

Cheers
#317
Hi,

Looks like our NetXMS server decided to start segfaulting after the upgrade to 2.1-RC1.
Reading through the dump and not being a developer, I have no idea what the underlying cause is, so here goes:

*** Error in `netxmsd': malloc(): smallbin double linked list corrupted: 0x0000000035444410 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f6f596457e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x81d61)[0x7f6f5964fd61]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xba)[0x7f6f5965221a]
/usr/lib/x86_64-linux-gnu/libnetxms.so.2(_ZN11NXCPMessageC1EP12NXCP_MESSAGEi+0x227)[0x7f6f599c3b47]
/usr/lib/x86_64-linux-gnu/libnxsrv.so.2(_ZN15AgentConnection14receiverThreadEv+0x592)[0x7f6f59c20b72]
/usr/lib/x86_64-linux-gnu/libnxsrv.so.2(_ZN15AgentConnection21receiverThreadStarterEPv+0x9)[0x7f6f59c21039]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f6f57a2d6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6f596d482d]


Anyone else seeing similar crashes? Any idea what's causing them?

Cheers
#318
General Support / Re: agent-to-server connections
May 15, 2017, 01:31:04 AM
Hi Victor,

The node in question was (and is) completely unreachable to the server, unless the tunnel was working.
Inbound active server->agent connections are being rejected by the firewall in that network.

Cheers
#319
General Support / [Bug] Invalid Zone ID
May 12, 2017, 03:23:48 AM
Hi,

I still seem to be having a problem with moving nodes between zones.
I had described my problem before, but do not have a working solution yet other than deleting nodes and re-adding them (and I couldn't find my own, older post to reply to). Some of this is likely related to https://track.radensolutions.com/issue/NX-1148.
Except now I have issues where I am unable to move a node no matter how long I wait.
The error message used to be about IP address conflicts, now it just tells me "Invalid Zone ID".

And it seems to happen in other, more random, situations, so I am not sure if this is still the same issue.

--- EDIT --- The below part is indeed a different issue. --- EDIT ---
The previous issue generally happened when I wanted to change a setup like this:
- Node A sits in Zone Y. Node A is a server and NetXMS Proxy.
- Node B sits in Zone Z. Node B is a firewall/router.
- Both nodes are in the same private network and usually Node B is the default gateway for Node A.
- Node B is configured with its public IP address in NetXMS for SNMP monitoring.
- Node A is configured with Node B's public IP address in NetXMS for Agent monitoring. Node B has a port forward for 4700 to Node A's private IP.
The above setup works fine, no issues.

Now I want both Node A and Node B to be in the same zone. The main reason being the ability to use Node A as syslog proxy for the entire private network, including the firewall.

Trying to move these into the same zone leads to (or used to lead to) an IP address conflict. The only way I can resolve this is to delete Node B, move Node A into zone Z, then re-add Node B in zone Z. After that everything works fine. Except at that point I usually hit https://track.radensolutions.com/issue/NX-1148, so it can take a while before I can move Node A.

That was my previous issue (although it still is an issue for me).

By now, I do not receive an IP address conflict message. I receive an "Invalid Zone ID" error instead.
That doesn't go away after a couple of hours either and it survives a NetXMS server reboot and nxdbmgr check run.
All I can do now is to delete Node A as well and then re-add it.

In addition, this error has started appearing when trying to move random nodes between zones, even nodes which did not and do not have IP address conflicts. It also doesn't matter which zone I want to move them into; they always come back with "Invalid Zone ID".

Is this an issue in 2.1-M3?
Or do I have some database corruption somewhere which nxdbmgr check doesn't pick up?

Cheers
#320
General Support / Re: Geolocation & Map Questions
May 12, 2017, 03:00:05 AM
For #2 I was thinking about Google's API, either from location data collected via SNMP (that would actually be difficult considering a number of devices do not allow sufficient characters for a full address in their SNMP location) or from manually entered location data. But I guess that will require a license if you have too many nodes and thus trigger too many polls a day.
#321
General Support / Re: agent-to-server connections
May 11, 2017, 09:51:40 AM
Ok, found the final issue.

Using 0.0.0.0 as the agent IP did not work.
Using the system's actual IP address once the tunnel was bound did work.
#322
Hi,

The more nodes and DCIs we have, and of course the more alerts we have in the system, the slower the NetXMS Console is to respond when for example clicking on "Entire Network" or "Infrastructure Services" (or heavily populated subnets/containers). It seems as if the Console is pulling the data for all tabs at that moment already, instead of when I actually click on the relevant tab.

The delay is quite noticeable for us by now (i.e. around 4-6 seconds).

I suggest not loading the data until the relevant tab is clicked upon, as there is no need to introduce this delay unless one actually wants to look at that data.
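To illustrate what I mean by loading on demand, here is a minimal sketch in Python. It is not how the NetXMS Console is implemented; the `LazyTab` class and its `loader` callback are hypothetical names used only to show the idea of deferring the expensive fetch until a tab is first selected.

```python
from typing import Any, Callable


class LazyTab:
    """Hypothetical tab that fetches its data only when first selected."""

    def __init__(self, name: str, loader: Callable[[], Any]) -> None:
        self.name = name
        self._loader = loader   # expensive call, e.g. a server round trip
        self._data: Any = None
        self._loaded = False

    def select(self) -> Any:
        # Fetch on first selection only; later selections reuse the cached data.
        if not self._loaded:
            self._data = self._loader()
            self._loaded = True
        return self._data
```

With this pattern, clicking on a container only pays for the tab that is actually shown; the other tabs cost nothing until someone opens them.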

Cheers
#323
General Support / Re: Auto unbind
May 02, 2017, 12:44:13 AM
My bad on two parts:

  • The subnet is not something that will change, just because the node is down.
  • NetXMS will not run a configuration poll on a node that's down.

While you could add some code to your auto-bind rule to exclude nodes that are down, the fact that it won't be running another configuration poll while the node is down will be an issue.
Not sure if there's a safe way to work around that (e.g. a manual full configuration poll on a node that's known to be down is a sure way of losing data, so not something I'd call "safe").
#324
Hi,

We have quite a few systems with ICMP polling disabled and only SNMP or Agent enabled.
The only time I have seen the behaviour you describe is when the server can't reach the agent.
Have you confirmed that the server can contact the node on port 4700?
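If you want a quick way to test that from the server, a plain TCP connect is enough. The following is a minimal sketch in Python, run on the NetXMS server; the function name `agent_port_reachable` is my own, and it only checks that the port accepts a connection, not that the agent accepts your server.

```python
import socket


def agent_port_reachable(host: str, port: int = 4700, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example (replace with the node's address):
# agent_port_reachable("node.example.com")
```

A False result points at a firewall, routing or agent listener problem rather than at the polling settings.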

Cheers
#325
General Support / Re: Auto unbind
April 26, 2017, 03:32:21 AM
Hi,

If you tick the checkbox "Automatically unbind nodes from this container when they no longer passes the filter", it should remove them from the container once a node no longer matches the auto bind rule.
But as far as I recall, it will only do this on a configuration poll, i.e. it's not live.

Cheers
#326
General Support / Re: MIB explorer value table error
April 26, 2017, 03:28:22 AM
Hi,

The right-click on a Node MIB Explorer Walk works in 2.1-M3.

The MIB Walk Results if called from within a DCI are still broken.
While they no longer show Hex-STRING, they show the SNMP Type in the Value column as per the attached screenshot.

Cheers
#327
General Support / Re: agent-to-server connections
April 21, 2017, 07:13:26 AM
Hi again,

Hope this thread helps other people who might want to try the new experimental agent->server connection functionality.

After spending some more time on this, I guess it all depends on how you set up your CA.
I used the following guide: https://jamielinux.com/docs/openssl-certificate-authority/index.html
Now, that's where I managed to become confused between the Wiki article and the above guide.

The NetXMS Wiki says to link to the CA certificate and a combined Server certificate/key.
The latter is presumably meant to sign additional certificates, but my server certificate was not authorised to do that.
Instead, I had to put my Intermediate CA certificate and key in as "ServerCertificate" and now the tunnels are up, bound and showing in NetXMS.


What's still not working for me is actual data collection:
- Status polls return unknown status, configuration polls don't run due to node unreachable.
- DCIs all show <ERROR>
- The only exception is a full configuration poll, which happily detects the agent.

I wasn't quite sure if this was Zone related, so I moved the nodes in question into the Default zone "just in case", but it made no difference.
The communication IP is set to 0.0.0.0 as per Wiki.

Not sure what else I might possibly be missing at this stage?

Cheers
#328
General Support / Re: agent-to-server connections
April 21, 2017, 05:55:56 AM
Hi again,

I went through some of the code and think I have a better understanding of how nodes are bound to tunnels now.
Posting some of this information here in case it might be helpful to others (even though my tunnels still are not working due to TLS handshake problems).

The agent connects without a certificate and the server generates one for the agent. The certificate contains the agent's GUID as common name. So whenever it connects afterwards, the server knows which node it's talking to, so there is no need to store any node<->tunnel reference in the database. Which means all I need to do to "start fresh" is to delete the certificate, which by default (on Windows) is stored in %localappdata%\nxagentd\certificates. So for the standard Windows agent which is running as Local System that would be C:\Windows\System32\config\systemprofile\AppData\Local\nxagentd\certificates. Delete the certificates and the tunnel will show up as unbound on the server again.
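For anyone who wants to script the "start fresh" step, here is a rough Python sketch. It assumes the default Windows location described above; the helper names `agent_certificate_dir` and `clear_certificates` are made up for this example, and the delete defaults to a dry run so you can check what would be removed first.

```python
import ntpath
from pathlib import Path
from typing import List


def agent_certificate_dir(local_appdata: str) -> str:
    """Build <localappdata>\\nxagentd\\certificates using Windows path rules."""
    return ntpath.join(local_appdata, "nxagentd", "certificates")


def clear_certificates(cert_dir: str, dry_run: bool = True) -> List[Path]:
    """Return the files in cert_dir; delete them only when dry_run is False."""
    files = sorted(p for p in Path(cert_dir).iterdir() if p.is_file())
    if not dry_run:
        for p in files:
            p.unlink()
    return files


# For an agent running as Local System, the profile path from the post is:
# agent_certificate_dir(r"C:\Windows\System32\config\systemprofile\AppData\Local")
```

Run it with the default `dry_run=True` first, then pass `dry_run=False` once the listed files are the ones you expect.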

That basically answered my last two questions. It just doesn't tell me why the TLS handshake is failing.  :-\

Cheers
#329
General Support / Re: agent-to-server connections
April 21, 2017, 04:37:01 AM
Hi Victor,

I've been trying to get this to work, but it appears I'm getting something wrong with the certificates, leading to tunnels not working from the moment I bind them.

After setting up a CA on the NetXMS server, I configured NetXMS as per the Wiki article.
When a node connects, I see an unbound tunnel. I bind it and see the following in the logs:
[DEBUG] IssueCertificate: new certificate request (CN override: <NODE GUID>)
[DEBUG] IssueCertificate: new certificate issued successfully
[DEBUG] [TUN-1] Certificate issued

After that, the tunnel disappears and the client logs just keep telling me the TLS handshake is failing, and the server logs give me "verification error 20 (unable to get local issuer certificate)" and "verification error 2 (unable to get issuer certificate)". I translate that as "I don't trust your intermediate/CA cert, go away".
Next I installed the CA and Intermediate cert on the actual node, but that did not make any difference.
The certificate chain on the NetXMS server verifies ok and the server is not complaining about the certificates at startup either, but confirms they loaded ok.
Not sure if I am missing something with the certificate setup?

And another question, is there a way to "unbind" tunnels I set up for testing?
Once I bind them, I can see the tunnels neither as "Bound" nor as "Unbound" and cannot find a way to remove that link in the Console or in the database. Short of wiping the nodes and re-adding them, how do I remove those node<->tunnel links?

I also can't see where the client certificates are actually stored on the node?

Cheers
#330
Hi,

In step 5, change the data type to Unsigned Integer 64bit.
In step 7, set the Base SNMP OID as .1.3.6.1.2.1.2.2.1.1 (not .2).
In step 8, you will need this:
return %(true, $1, $2);
Note the % instead of the $ in your list.

Add a step after 8 to:
Click on Transformation and select Average delta per second. As you want Bytes per second, you don't need to add any further transformation (e.g. you could "return $1*8;" to return Bits per second instead). You may also want to implement some kind of transformation to avoid showing funky data caused by counter overflows (i.e. impossibly high values being shown). This part is something they are working on fixing with a new data type for this purpose, but for now, you'll simply have to filter the bad results out one way or another.
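To show the arithmetic behind the delta and the overflow filtering, here is a small Python sketch. It is an illustration only, not NXSL and not what the server does internally; the function name `bytes_per_second` and the `max_rate` cut-off (for example the link speed in bytes per second) are assumptions of mine.

```python
from typing import Optional

COUNTER64_MODULUS = 2 ** 64  # a 64-bit counter wraps back to 0 after 2**64 - 1


def bytes_per_second(prev: int, curr: int, elapsed: float,
                     max_rate: Optional[float] = None) -> Optional[float]:
    """Average delta per second between two counter samples.

    Returns None when the sample should be discarded.
    """
    if elapsed <= 0:
        return None
    delta = curr - prev
    if delta < 0:
        # Counter went backwards: treat it as a wrap. A device reset looks
        # the same here and produces an impossibly high value.
        delta += COUNTER64_MODULUS
    rate = delta / elapsed
    if max_rate is not None and rate > max_rate:
        return None  # filter out the bad result instead of showing it
    return rate


# Bits per second is just the byte rate times 8, like "return $1*8;".
```

For example, samples of 1000 and 4000 bytes taken 60 seconds apart give 50 bytes per second, while a counter that drops from 5000 to 100 after a device reset is discarded once `max_rate` is set to something realistic for the interface.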

Poll Configuration will apply the template and Instance Discovery will create the required DCIs afterwards.
You need to run both.

Cheers