SSH DCI Collection

Started by lweidig, November 13, 2021, 12:24:52 AM

lweidig

We have some scripts on Mikrotik routers that generate JSON string output, and we would like to start using that in NetXMS for monitoring.  However, every combination I try ends with the DCI in the << ERROR >> state.  I ran the agent at debug level 7 and this is what we get:

2021.11.12 16:16:01.638 *D* [comm.cs.3          ] Requesting metric "SSH.Command(10.0.xxx.xxx:22,"admin","NOTREALPW",":global inPool ""client"";/system script run dhcpPoolMon","",0)"
2021.11.12 16:16:01.638 *D* [ssh                ] AcquireSession: acquired existing session admin@10.0.xxx.xxx:22/2
2021.11.12 16:16:01.657 *D* [ssh                ] SSHSession::execute: read error: Remote channel is closed.
2021.11.12 16:16:01.657 *D* [ssh                ] SSH output is empty
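
For reference, the quoting visible in the requested metric above (string arguments wrapped in double quotes, embedded quotes doubled) can be reproduced with a small helper. This is only a sketch: `quote_arg` and `build_ssh_command_metric` are hypothetical names, the quoting rule is inferred from the log line rather than from NetXMS documentation, and the trailing `"",0` arguments seen in the log are left out because the thread does not say what they mean.

```python
def quote_arg(value: str) -> str:
    """Wrap an argument in double quotes, doubling any embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


def build_ssh_command_metric(host: str, port: int, user: str,
                             password: str, command: str) -> str:
    # Hypothetical helper: mirrors the first four arguments seen in the log.
    args = [
        f"{host}:{port}",      # host:port is not quoted in the log line
        quote_arg(user),
        quote_arg(password),
        quote_arg(command),
    ]
    return "SSH.Command(" + ",".join(args) + ")"


if __name__ == "__main__":
    print(build_ssh_command_metric(
        "10.0.0.1", 22, "admin", "NOTREALPW",
        ':global inPool "client";/system script run dhcpPoolMon',
    ))
```

Comparing the generated string with what the agent logs is a quick way to rule out a quoting mistake in the DCI parameter before looking at the SSH layer.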


I did a bunch of searching and currently have an ssh agent config file which looks like:

# cat /etc/nxagentd-ssh-config
HostKeyAlgorithms +ssh-dss
KexAlgorithms +diffie-hellman-group1-sha1


Like most others, I can of course SSH from the command line on this server to the Mikrotik just fine, using the exact same login and command, and get the desired results.  I need some assistance getting this going.  The server and agent are at 3.9.344, running on an Ubuntu 20.04.3 server.


Victor Kirhenshtein

Hi!

Could you please share the script you are using? I did some tests, and otherwise SSH to Mikrotiks seems to be working.

Best regards,
Victor

lweidig

Here is the Parameter we use for the DCI:

:global inPool "client";/system script run dhcpPoolMon

The script simply counts up the total number of leases available / in use in that DHCP pool and reports back with a JSON formatted response.  The script is:

:global inPool

:local ipUsed 0
:local ipTotal 0
:local nextPool $inPool
:local currPool

# Process the pool passed in and any next pools as one block
:do {
  :set currPool $nextPool
#  Each pool can have multiple ranges listed
  :foreach range in=[/ip pool get "$currPool" ranges] do={
    :local rangeString [:tostr  $range]
    :local ipStart [:toip [:pick $rangeString 0 [:find $rangeString "-"]]]
    :local ipEnd [:toip [:pick $rangeString ([:find $rangeString "-"] + 1) 31]]
    :set ipTotal ($ipTotal + $ipEnd - $ipStart + 1)
#   Read through leases and count ones in this pool
    /ip dhcp-server lease
    :foreach i in=[find] do={
      :local ipActive [:toip [get $i active-address]]
      :if (($ipActive >= $ipStart) && ($ipActive <= $ipEnd)) do={
        :set ipUsed ($ipUsed + 1)
      }
    }
  }
  :set nextPool [:tostr  [/ip pool get "$currPool" next-pool]]
} while ([:len $nextPool] > 1)

put "{total:$ipTotal, used:$ipUsed}"


That way pools can be expanded prior to being exhausted.  Thanks!
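
The counting logic in the script above (range size is end minus start plus one, and a lease counts as used when its active address falls inside a range) can be cross-checked off the router with a short Python sketch. The function name `pool_usage` and the sample addresses are made up for illustration. Note that the script's final `put` line emits keys without quotes, which a strict JSON parser rejects; the sketch emits strict JSON through `json.dumps` instead.

```python
import ipaddress
import json


def pool_usage(ranges, active_addresses):
    """Count total and used addresses across "start-end" IPv4 ranges."""
    total = 0
    used = 0
    actives = [int(ipaddress.IPv4Address(a)) for a in active_addresses]
    for r in ranges:
        start_s, end_s = r.split("-")
        start = int(ipaddress.IPv4Address(start_s.strip()))
        end = int(ipaddress.IPv4Address(end_s.strip()))
        # Same arithmetic as the RouterOS script: inclusive range size.
        total += end - start + 1
        # A lease is "used" when its active address lies inside the range.
        used += sum(1 for a in actives if start <= a <= end)
    return {"total": total, "used": used}


if __name__ == "__main__":
    # Illustrative data only: one 10-address range, one lease inside it.
    print(json.dumps(pool_usage(["10.0.0.10-10.0.0.19"],
                                ["10.0.0.12", "10.0.1.5"])))
    # prints {"total": 10, "used": 1}
```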

lweidig

@Victor - Wondering if you have had any opportunity to look into this.  We are running 3.9.420 and still having issues with this. 

Victor Kirhenshtein

Hi,

I just configured it on my home Mikrotik routers and it works, so I suspect it has something to do with the Mikrotik firmware version. Mine run 7.1.1 and 6.49.1.
I will also check if I can add additional diagnostics to the SSH subagent.

Best regards,
Victor

lweidig

Ours is running 6.48.6 which is in the long term channel where we like to be.  We might have one running 6.49.1 and I will look around, but we have not yet ventured into the 7.x.x line.  You are a brave man - though you did say it was at home.  I might upgrade my home one to that and see as well.  Is your server Ubuntu 20.04 as well? 

Could you post the SSH properties set up on the node, and also the General properties for the DCI?  I just keep thinking something is not right on my side.



Appreciate the assistance!

Victor Kirhenshtein

Hi,

I did some more careful testing and it seems to be my mistake: I managed to get the same error with agent 3.9, but it works fine if built from the master branch. So it looks like we already fixed something, I just have to find out what exactly :)

Best regards,
Victor

Victor Kirhenshtein

So, after more digging and experimenting, it looks like a bug in libssh (this one: https://github.com/ParallelSSH/ssh-python/issues/23, https://bugzilla.redhat.com/show_bug.cgi?id=1849069), said to be present in libssh versions up to 0.9.5. I've tested the agent with libssh 0.9.6 and it works stably, as expected. The master branch behaves a bit better with older libssh because of different timing (caused by internal structure changes), but still runs into this error periodically.
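
A quick way to flag hosts that may be affected is to compare the installed libssh version against the threshold mentioned above. This is a rough sketch: the function names are hypothetical, the 0.9.5 cutoff is taken from this post ("said to be present ... up to 0.9.5"), and a distribution may backport fixes without changing the upstream version, so the result is only a hint.

```python
def parse_version(text: str) -> tuple:
    """Turn '0.9.3' or a packaged '0.9.3-2ubuntu2.2' into (0, 9, 3)."""
    upstream = text.split("-")[0]          # drop any packaging suffix
    return tuple(int(part) for part in upstream.split("."))


def libssh_maybe_affected(version: str, last_affected: str = "0.9.5") -> bool:
    # Assumption: every version up to and including last_affected has the bug.
    return parse_version(version) <= parse_version(last_affected)


if __name__ == "__main__":
    for v in ("0.9.3", "0.9.5", "0.9.6"):
        print(v, libssh_maybe_affected(v))
```
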
Unfortunately, Ubuntu 20 comes with libssh 0.9.3, and I suppose they will not update it to 0.9.6 or newer. I will add a workaround for that bug (interpreting remote channel closure as success if some data was received beforehand), which should fix it for our purposes.
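
The workaround idea can be illustrated with a small read loop. This is not the agent's actual code (the agent is not written in Python); `ChannelClosedError` and the `read_chunk` callable are hypothetical stand-ins for the "Remote channel is closed" read error and the underlying channel read.

```python
class ChannelClosedError(Exception):
    """Hypothetical stand-in for the 'Remote channel is closed' read error."""


def read_command_output(read_chunk):
    """Collect command output; tolerate a channel-closed error after data.

    read_chunk is a hypothetical callable that returns bytes, returns b""
    at a clean end of stream, or raises ChannelClosedError.
    Returns a (success, data) tuple.
    """
    data = bytearray()
    while True:
        try:
            chunk = read_chunk()
        except ChannelClosedError:
            # Workaround: closure counts as success only if data already arrived.
            return (len(data) > 0, bytes(data))
        if not chunk:
            return (True, bytes(data))   # clean end of stream
        data.extend(chunk)


def make_reader(items):
    """Test helper: replay a fixed sequence of chunks and exceptions."""
    it = iter(items)

    def reader():
        item = next(it)
        if isinstance(item, Exception):
            raise item
        return item

    return reader
```

With this rule, a closure that arrives before any output (as in the log at the top of the thread, where the SSH output is empty) is still reported as a failure, while a closure that follows complete output is not.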

Best regards,
Victor

lweidig

I was hoping this was resolved in 4.0.2157, as I saw this in the release notes:

- Improved SSH subagent

However, the issue still persists with this release.

Victor Kirhenshtein

Hello,

I tried to reproduce this issue with the latest agent, but it works as expected for me. Are you sure the agent used as a proxy for the SSH connection was upgraded to 4.0.2157? If yes, please try to set the debug level to 8 and send me an excerpt from the log around the request for parameter SSH.Command.

Best regards,
Victor

lweidig

If I actually remove the agent and let it poll directly from the server with the node containing the credentials I seem to be getting the results now.  Will test this on a larger batch of devices, but hoping with that configuration setup it will work.  No agent involved at all.  Thanks for checking into this!

Victor Kirhenshtein

Actually, an agent is still involved in SSH polling. If an SSH proxy is not specified, the server will use its local agent as a proxy.

Best regards,
Victor