Problem creating tunnel between server and agent

Started by dj, May 20, 2020, 03:57:40 PM


dj

Hello,

I'm trying to set up a tunnel between an agent and the NetXMS server, unfortunately without success.

Both server and agent are the latest version at the time of writing and are running on Windows. The server's IP is 192.168.10.6; the agent is at 192.168.10.70.

The server is running on Windows Server 2016. On the agent side I have tried Windows 7 x64 (physical machine) and Windows Server 2019 (VMware virtual machine).

Here's what I did:

- created a key (RSA 2048) and a CSR (including SAN) on a Linux machine using standard OpenSSL
- created a server cert from the CSR (on our Windows domain CA)
- copied the root cert, the server cert and the server key to the NetXMS server
- edited the server's .conf to add the file paths for key, cert and CA
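
For readers repeating the first step, the key and CSR creation could be scripted roughly as below. This is only a sketch: the exact command used in this thread was never posted, the file names are placeholders, the host name and IP are taken from elsewhere in the thread, and the `-addext` option assumes OpenSSL 1.1.1 or newer.

```python
import shlex


def build_csr_command(common_name, san_entries,
                      key_file="server.key", csr_file="server.csr"):
    """Return the argv list for an `openssl req` call that creates
    an RSA 2048 key and a CSR carrying a subjectAltName extension.

    key_file and csr_file are placeholder names, not paths from the thread.
    """
    san = ",".join(san_entries)
    return [
        "openssl", "req", "-new",
        "-newkey", "rsa:2048",               # generate a fresh 2048-bit RSA key
        "-nodes",                            # leave the private key unencrypted
        "-keyout", key_file,
        "-out", csr_file,
        "-subj", f"/CN={common_name}",
        "-addext", f"subjectAltName={san}",  # assumes OpenSSL 1.1.1+
    ]


# Host name and IP as they appear in this thread; adjust for your own server.
cmd = build_csr_command("netxms.tilsnet.de",
                        ["DNS:netxms.tilsnet.de", "IP:192.168.10.6"])
print(shlex.join(cmd))
```

The function only builds the command line; pass the list to `subprocess.run` on a machine with OpenSSL installed to actually create the files.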

After starting the test, the agent's log reports:

*D* [tunnel             ] Tunnel manager started
*D* [tunnel             ] 192.168.10.6: Cannot open file "C:\Windows\system32\config\systemprofile\AppData\Local\nxagentd\certificates\897A0B2402C28AD5A07199D8CDD5A37FD445FF87.crt" (No such file or directory)
*D* [tunnel             ] 192.168.10.6: Cannot open file "C:\Windows\system32\config\systemprofile\AppData\Local\nxagentd\certificates\0602060AA8C0000000000000000000000000.crt" (No such file or directory)
*D* [tunnel             ] 192.168.10.6: Server certificate subject is /C=DE/ST=NW/.....
*D* [tunnel             ] 192.168.10.6: Server certificate issuer is /DC=de/DC=.....
...
*I* [                   ] Tunnel with 192.168.10.6 established

The two "cannot open file" lines should be ok as there are no certs yet. The server's certificate subject and issuer are ok.

Checking the tunnel using the management console I see the tunnel as unbound.

I think that everything is fine until here.

Now I tried to create a node and bind the tunnel using the management console.

A key and a cert are created and copied to the agent computer, the files appear in c:\windows\system32\config....

The agent's log:

*D* [tunnel             ] 192.168.10.6: Resetting tunnel
*D* [tunnel             ] 192.168.10.6: Certificate and private key loaded
*D* [tunnel             ] 192.168.10.6: Server certificate subject is /C=DE/ST=NW/.....
*D* [tunnel             ] 192.168.10.6: Server certificate issuer is /DC=de/DC=.....
...
*D* [tunnel             ] 192.168.10.6: Receiver thread stopped (MSGRECV_COMM_FAILURE)
...
*W* [                   ] Tunnel with 192.168.10.6 closed
...
*D* [tunnel             ] 192.168.10.6: Cannot configure tunnel (request timeout)


The unbound tunnel disappears from the management console, but no bound tunnel appears in its place.

The server's log reports:

*D* [                   ] SocketListener/AgentTunnels: Incoming connection from 192.168.10.70
*D* [                   ] SocketListener/AgentTunnels: Connection from 192.168.10.70 accepted
*D* [agent.tunnel       ] SetupTunnel(192.168.10.70): TLS handshake failed (error:00000001:lib(0):func(0):reason(1))

Whatever I have tried - same results.

Does anyone here have an idea what I'm missing or doing wrong?

Thanks in advance for any help!

Regards
Detlev

tfines

Please post your agent config file.

What setting did you select on the server for how it handles new nodes?

dj

Thanks for your reply...

Quote from: tfines on May 20, 2020, 10:42:10 PM
Please post your agent config file.

ServerConnection = 192.168.10.6
ConfigIncludeDir = C:\NetXMS\etc\nxagentd.conf.d
LogFile = {syslog}
FileStore = C:\NetXMS\var
SubAgent = filemgr.nsm
SubAgent = ping.nsm
SubAgent = logwatch.nsm
SubAgent = wmi.nsm
DebugLevel = 6
LogFile = c:\netxms\log\agent.log

Quote from: tfines on May 20, 2020, 10:42:10 PM
What setting did you select on the server for how it handles new nodes?

New nodes need to be added manually. I have tried both binding to a manually added node and using "create node and bind". AgentTunnel.NewNodesContainer is left empty. The node appears in the root folder.

Best...
Detlev

Victor Kirhenshtein

Hi!

Which agent version are you using, and on what platform? We've seen this issue with Windows agents, and it was solved by rebuilding the agent with a newer OpenSSL.

Best regards,
Victor

dj

Quote from: Victor Kirhenshtein on May 25, 2020, 09:52:57 AM

Which agent version are you using, and on what platform? We've seen this issue with Windows agents, and it was solved by rebuilding the agent with a newer OpenSSL.


Hi Victor,

I tried the 3.3.314 Windows x64 agent on both Windows 7 x64 and Windows Server 2019 x64.

Regards
Detlev

Filipp Sudanov

Please try the most recent nxagent-3.3.330 version, adding EnableSSLTrace=yes to the agent configuration file and setting DebugLevel=7. Please share the agent log for the time when the problem occurs.

dj

Quote from: Filipp Sudanov on June 01, 2020, 05:56:45 PM
Please try the most recent nxagent-3.3.330 version, adding EnableSSLTrace=yes to the agent configuration file and setting DebugLevel=7. Please share the agent log for the time when the problem occurs.

Hi Filipp,

I have updated the agent as requested and started a clean attempt. Same result as before.

Please find the agent log attached.

Regards
Detlev

Victor Kirhenshtein

Hi,

All seems to be going well until this point:

2020.06.02 06:04:06.788 *D* [tunnel             ] 192.168.10.6: SSL_write error (bytes=-1 ssl_err=5 errno=2)

SSL error 5 means an underlying socket error, which is strange because the SSL negotiation seems to have completed successfully. Are you using server version 3.3.x as well? Can you check what was logged in the server log during that attempt?
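
As a reading aid for log lines like the one above, the `ssl_err` number can be mapped to the symbolic `SSL_get_error()` return codes from OpenSSL's `ssl.h`. The sketch below assumes that the agent logs the raw OpenSSL value in `ssl_err`, which matches the explanation given here; the helper name is made up for illustration, and `errno` is returned uninterpreted because its meaning depends on the platform.

```python
import re

# Return codes of SSL_get_error() as defined in OpenSSL's ssl.h.
SSL_ERRORS = {
    0: "SSL_ERROR_NONE",
    1: "SSL_ERROR_SSL",
    2: "SSL_ERROR_WANT_READ",
    3: "SSL_ERROR_WANT_WRITE",
    4: "SSL_ERROR_WANT_X509_LOOKUP",
    5: "SSL_ERROR_SYSCALL",
    6: "SSL_ERROR_ZERO_RETURN",
    7: "SSL_ERROR_WANT_CONNECT",
    8: "SSL_ERROR_WANT_ACCEPT",
}


def decode_ssl_error(log_line):
    """Return (ssl_error_name, errno) for an agent log line,
    or None if the line carries no ssl_err/errno pair."""
    m = re.search(r"ssl_err=(\d+)\s+errno=(\d+)", log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return SSL_ERRORS.get(code, f"unknown ({code})"), int(m.group(2))


line = ("2020.06.02 06:04:06.788 *D* [tunnel             ] "
        "192.168.10.6: SSL_write error (bytes=-1 ssl_err=5 errno=2)")
print(decode_ssl_error(line))
```

`SSL_ERROR_SYSCALL` indicates that the failure was reported by the I/O layer underneath TLS rather than by the TLS protocol itself.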

Best regards,
Victor

dj

Quote from: Victor Kirhenshtein on June 02, 2020, 11:32:09 AM
2020.06.02 06:04:06.788 *D* [tunnel             ] 192.168.10.6: SSL_write error (bytes=-1 ssl_err=5 errno=2)

SSL error 5 means an underlying socket error, which is strange because the SSL negotiation seems to have completed successfully. Are you using server version 3.3.x as well? Can you check what was logged in the server log during that attempt?

Hi,

Yes, the server is the latest 3.3 version.

I ran the same procedure again with debug level 7 on the server. I have cut the beginning and end of the log, but it still contains a lot of noise.

I think the most relevant line is the one at 12:19:10.069, where the handshake failed.

I have attached a zip file containing the server log.

Regards
Detlev

Victor Kirhenshtein

Please check that your server certificate has the CA constraint set to TRUE. You can do that by printing the certificate in text form with a command like this:

openssl x509 -text -noout -in server.crt

and looking for the "X509v3 Basic Constraints" section. For example, my test server's certificate looks like this (only the relevant part of the output):

        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:TRUE
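
If you want to automate this check, the text output can be searched with a small script. The sketch below assumes the output layout shown above, with the `CA:` value on the line following the section header; `basic_constraints` is a hypothetical helper, not an OpenSSL or NetXMS function.

```python
import re


def basic_constraints(x509_text):
    """Parse `openssl x509 -text` output.

    Returns (is_ca, is_critical), or None if the certificate has no
    X509v3 Basic Constraints section.
    """
    m = re.search(
        r"X509v3 Basic Constraints:[ \t]*(critical)?[ \t]*\r?\n"
        r"[ \t]*CA:(TRUE|FALSE)",
        x509_text,
    )
    if m is None:
        return None
    return m.group(2) == "TRUE", m.group(1) is not None


sample = (
    "        X509v3 extensions:\n"
    "            X509v3 Basic Constraints:\n"
    "                CA:TRUE\n"
)
print(basic_constraints(sample))
```

A result of `None` means the section is missing altogether, which is different from a certificate that explicitly states `CA:FALSE`.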


Best regards,
Victor

dj

Quote from: Victor Kirhenshtein on June 03, 2020, 04:24:57 PM
Please check that your server certificate has the CA constraint set to TRUE. You can do that by printing the certificate in text form with a command like this:

openssl x509 -text -noout -in server.crt

and looking for the "X509v3 Basic Constraints" section. For example, my test server's certificate looks like this (only the relevant part of the output):

        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:TRUE



Hi Victor,

there's no "X509v3 Basic Constraints" section in my certificate at all...

I will try to make a new cert during the next days and try again.

Best...
Detlev

dj

Hi Victor,

I have created a new certificate containing the Basic Constraints extension:

        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:TRUE


But the result is still the same when I try to bind the tunnel.

Excerpt from agent.log:

2020.06.05 06:02:02.529 *D* [tunnel             ] 192.168.10.6: Resetting tunnel
2020.06.05 06:02:02.532 *D* [tunnel             ] 192.168.10.6: Certificate and private key loaded
2020.06.05 06:02:02.532 *D* [ssl                ] SSL handshake start (before SSL initialization)
2020.06.05 06:02:02.532 *D* [ssl                ] SSL_connect: before SSL initialization
2020.06.05 06:02:02.532 *D* [ssl                ] SSL_connect: SSLv3/TLS write client hello
2020.06.05 06:02:02.532 *D* [ssl                ] SSL_connect: error in SSLv3/TLS write client hello
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS write client hello
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS read server hello
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: TLSv1.3 read encrypted extensions
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS read server certificate request
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS read server certificate
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: TLSv1.3 read server certificate verify
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS read finished
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS write change cipher spec
2020.06.05 06:02:02.534 *D* [ssl                ] SSL_connect: SSLv3/TLS write client certificate
2020.06.05 06:02:02.536 *D* [ssl                ] SSL_connect: SSLv3/TLS write certificate verify
2020.06.05 06:02:02.536 *D* [ssl                ] SSL_connect: SSLv3/TLS write finished
2020.06.05 06:02:02.536 *D* [ssl                ] SSL handshake done (SSL negotiation finished successfully)
2020.06.05 06:02:02.536 *D* [tunnel             ] 192.168.10.6: Server certificate subject is /C=DE/ST=NW/L=Bornheim/O=TILS GmbH/OU=IT/CN=netxms.tilsnet.de
2020.06.05 06:02:02.536 *D* [tunnel             ] 192.168.10.6: Server certificate issuer is /DC=de/DC=tilsnet/CN=tilsnet-CAROOT
2020.06.05 06:02:02.537 *D* [                   ] TlsMessageReceiver: SSL_read error (ssl_err=5 errno=0)
2020.06.05 06:02:02.537 *D* [tunnel             ] 192.168.10.6: Receiver thread stopped (MSGRECV_COMM_FAILURE)
2020.06.05 06:02:02.537 *D* [comm.vs.8          ] Requesting parameter "System.PlatformName"
2020.06.05 06:02:02.537 *W* [                   ] Tunnel with 192.168.10.6 closed
2020.06.05 06:02:02.537 *D* [comm.vs.8          ] GetParameterValue(): result is 0 (SUCCESS)
2020.06.05 06:02:02.537 *D* [comm.vs.8          ] Requesting parameter "System.UName"
2020.06.05 06:02:02.537 *D* [comm.vs.8          ] GetParameterValue(): result is 0 (SUCCESS)
2020.06.05 06:02:12.537 *D* [tunnel             ] 192.168.10.6: Cannot configure tunnel (request timeout)


Do you have any further ideas?

Thanks & regards
Detlev

Victor Kirhenshtein

What does the server-side log look like for that tunnel? Also, please try this agent version: https://netxms.org/download/releases/3.3/nxagent-3.3.350-x64.exe. It fixes the incorrect error code shown in the "SSL read error" message, so we can see the actual socket error code, which might provide a clue.
Another thought: could it be that you have some kind of DPI device between agent and server that detects the SSL handshake and blocks the connection for some reason?

Best regards,
Victor

dj

Quote from: Victor Kirhenshtein on June 05, 2020, 10:40:34 AM
What does the server-side log look like for that tunnel? Also, please try this agent version: https://netxms.org/download/releases/3.3/nxagent-3.3.350-x64.exe. It fixes the incorrect error code shown in the "SSL read error" message, so we can see the actual socket error code, which might provide a clue.
Another thought: could it be that you have some kind of DPI device between agent and server that detects the SSL handshake and blocks the connection for some reason?

Hi Victor,

I have updated the agent to 3.3.350 and tried again with debug level 7 on both the agent and the server.

Please refer to attached logs for the errors.

The NetXMS server has 4 network cards (one for each monitored subnet) so that traffic does not have to pass through the firewall. The network path is simple: Agent-PC -> HPE/Aruba-Switch -> ESXi Host -> Server VM.

I also tried to bind a tunnel between the agent on the server VM and the NetXMS server, so that the data does not leave the VM at all. In this configuration I see a second error in the agent log:

SSL_write error (bytes=-1 ssl_err=5 errno=2)
TlsMessageReceiver: SSL_read error (ssl_err=5 errno=0)

Thanks & regards
Detlev

StanHubble

Are the subnet masks the same on the agent and the server?
If they are then you may have a problem with a duplicate subnet if they are not 'zoned'.
Most tunnelling cannot work with the same subnet on each end unless it is in a 'bridge' mode.
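
A quick way to compare the two ends is Python's `ipaddress` module. This is a sketch under a loud assumption: the subnet masks were never posted in this thread, so the `/24` prefix below is a guess and should be replaced with the real masks from both machines.

```python
import ipaddress


def same_subnet(first_if, second_if):
    """Return True if both interface addresses (address/prefix)
    resolve to exactly the same network."""
    a = ipaddress.ip_interface(first_if)
    b = ipaddress.ip_interface(second_if)
    return a.network == b.network


# Addresses from this thread; the /24 prefix is an assumption.
print(same_subnet("192.168.10.6/24", "192.168.10.70/24"))
```

If the masks differ between the two machines, the function returns False even when the addresses look similar, which is the mismatch the question above is asking about.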

just my 2 cents.