Messages - jermudgeon

#31
General Support / Re: Grafana and java errors
June 13, 2019, 06:53:34 PM
I narrowed the error down to the included json-20160212.jar (maven dependency?)

I replaced this with json-20180813.jar

json-20180813.jar works for non-table DCIs.
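
For anyone hitting the same NoSuchMethodError, a quick way to see which jar is supplying org.json.JSONArray is to scan the webapp's lib directory. This is only a rough sketch: the function name and the directory you point it at are my own assumptions, not anything shipped with NetXMS.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(lib_dir, class_name="org.json.JSONArray"):
    """Return sorted names of .jar files in lib_dir that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Not a readable archive; skip it
            continue
    return hits
```

If more than one jar turns up, or only the 2016 one does, that would be consistent with the error above. Call it with the path to the deployed webapp's WEB-INF/lib directory (wherever that lives on your Tomcat install).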
#32
General Support / Re: grafana tables
June 13, 2019, 05:31:37 PM
Is it still true that table DCIs are unsupported?

Here's the reason why I'm trying to use table DCI instead of instance discovery.

I'm dealing with a hardware vendor that has an unusual method of handling interfaces. It maintains a fixed interface table size (64) for PTMP radio operations. When a client radio attaches, it does so at the next available ifIndex.

In a situation where the client radio is detaching and re-attaching frequently, the ifIndex might change more rapidly than the server-InstancePollingInterval. I don't believe I can specify a custom InstancePollingInterval for a specific class of devices using a template.

So in order to track the stats of a client radio over time, instance discovery will not work as expected. Let's say I do instance discovery using MAC address, and the ifIndex of a given MAC address changes. A new DCI would be created and the old DCI would be queued for deletion. That would be incompatible with the goal of retaining statistics over a long period of time.

Our proposed workaround is simply to store the entire interface table, deleting rows where there is no client currently attached. This works as expected. We won't use NetXMS for visualization, however, as a given row/cell in the table can only be graphed (over time) by its row index, not by some chosen field (MAC address, in this instance).

Unfortunately, without the Grafana connector working for tables, I'm scratching my head over how to get the data back out of the NetXMS database and display it correctly.
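
To make the goal concrete, here is roughly the transformation I am after, written as a Python sketch. It assumes the stored table snapshots have already been read out somehow as (timestamp, rows) pairs; the column names "mac" and "rx_bytes" and the function name are placeholders of mine, not NetXMS names.

```python
from collections import defaultdict


def series_by_mac(snapshots, mac_column="mac", value_column="rx_bytes"):
    """Re-key table snapshots from row position to MAC address.

    snapshots: iterable of (timestamp, rows), where rows is a list of
    dicts, one per table row.
    Returns {mac: [(timestamp, value), ...]} ordered by timestamp.
    """
    series = defaultdict(list)
    for timestamp, rows in snapshots:
        for row in rows:
            mac = row.get(mac_column)
            if not mac:
                continue  # empty slot, no client attached
            series[mac].append((timestamp, row[value_column]))
    for points in series.values():
        points.sort(key=lambda p: p[0])
    return dict(series)


# The same client changes row position between polls, but its
# series stays continuous because it is keyed by MAC.
snapshots = [
    (100, [{"mac": "aa:aa", "rx_bytes": 10}, {"mac": "bb:bb", "rx_bytes": 5}]),
    (200, [{"mac": "bb:bb", "rx_bytes": 7}, {"mac": "aa:aa", "rx_bytes": 12}]),
]
by_mac = series_by_mac(snapshots)
```

The point is only that the MAC, not the row index, should identify the series; how to get the snapshots out of the database is still the open question.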

I'm open to other ideas -- and perhaps this is an opportunity for development sponsorship.
#33
General Support / Grafana and java errors
June 13, 2019, 03:27:56 AM
Core: 2.2.15-2
Web Svc: 2.2.15-2 running under Tomcat8

Grafana is installed and data collector successfully configured. Alarm queries function.

DCIs are visible in enumeration query, but actually trying to graph a DCI results in an error:
Grafana:
Object
xhrStatus:"complete"
request:Object
method:"GET"
url:"api/datasources/proxy/1/grafana/datacollection"
data:null
params:Object
interval:600000
from:_
to:_
targets:"[{"dci":{"name":"xxxxxxxxxx-removed","id":"190615","$$hashKey":"object:518"},"dciTarget":{"id":"20810","name":"xxxxx-removed"},"legend":"xxxxx-removed","refId":"A","type":"DCI"}]"
response:Object
description:"org.json.JSONArray.iterator()Ljava/util/Iterator;"
error:46


Tomcat:
org.netxms.websvc.WebSvcStatusService                   | Internal error
java.lang.NoSuchMethodError: org.json.JSONArray.iterator()Ljava/util/Iterator;

#34
Thanks, Victor.

I can confirm that this works for both base stations and clients:
[jaustin@jaustin applications]$ snmpwalk -v1 -c xxxxx <mgmt address> 1.3.6.1.4.1.4458.1000.1.1.6
iso.3.6.1.4.1.4458.1000.1.1.6.0 = IpAddress: <mgmt IPv4 address>
[jaustin@jaustin applications]$ snmpwalk -v1 -c xxxxx <mgmt address> 1.3.6.1.4.1.4458.1000.1.1.7
iso.3.6.1.4.1.4458.1000.1.1.7.0 = IpAddress: <mgmt IPv4 mask>

Would a Radwin driver fix the problem of the response to the ping3 driver? Currently I'm having to disable the ping3 driver, as I don't have any ping3 devices, and the Radwins reject the entire SNMP discovery query (currently three OIDs, two in mib2 and one from ping3 private, IIRC).
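
For what it's worth, the two values returned by those walks are enough to work out the subnet the device belongs to. A minimal Python sketch, using documentation addresses in place of the real ones (the function name is mine):

```python
import ipaddress


def subnet_from_snmp(ip, mask):
    """Return the IPv4 network containing ip, given a dotted-quad mask."""
    return ipaddress.ip_interface(f"{ip}/{mask}").network


# 192.0.2.10 and 255.255.255.0 stand in for the values behind
# 1.3.6.1.4.1.4458.1000.1.1.6 and 1.3.6.1.4.1.4458.1000.1.1.7
network = subnet_from_snmp("192.0.2.10", "255.255.255.0")
print(network)  # 192.0.2.0/24
```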
#35
Thanks, Victor; duplicate detection makes sense. For some reason I had thought that it was enabled by default. Will let you know if we still have issues.
#36

Version:
NetXMS Server Version 2.2.15 Build 9523 (d67b96f) (UNICODE)
NXCP: 4.48.1.18 (AES-256, Blowfish-256, 3DES, AES-128, Blowfish-128)
Built with: g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516


I have some Radwin HSU devices that have odd SNMP characteristics. (For example, the ping3 driver breaks discovery of Radwin devices.)

They don't report their management IP on the correct interface -- the management interface is discoverable but has a loopback IP on it rather than the correct management IP.

One consequence is that the correct network subnet is not created upon device discovery, as NetXMS has no way of determining the correct mask for the IP, and we are not currently discovering the parent router(s) that provide the gateway (and correct mask) for these devices.

Even though these discovered nodes do not appear in "Entire Network", I am able to bind the "missing" devices to a container using the following function:
/* Function used to find objects that are missing a parent subnet
 * in Entire Network. This should not be happening, but it does.
 */

sub BindUnboundNodes() {
    parents = GetNodeParents($node);

    parentContainers = 0;
    foreach(p : parents) {
        // we are only interested in subnets
        if (p->type == 1) {
            // if we find any subnet at all, this object is bound
            parentContainers++;
        }
    }

    toBind = parentContainers == 0;

    if (toBind) {
        trace(0, "Node '" . $node->name . "' has no parent subnet.");
    }

    return toBind;
}


Another odd behavior is that if I put the relevant discovery networks into a zone other than Default, devices fail to be added at all.

#37
General Support / Duplicate node detection vs. DNS?
June 05, 2019, 06:16:12 PM
Enabled server flags:
EnableZoning 1
UseDNSNameForDiscoveredNodes 1
SyncNodeNamesWithDNS 1
NetworkDiscovery.EnableParallelProcessing 1

Behavior:
SNMP-capable devices with multiple IP addresses in discovery networks are discovered multiple times. What we want is for the first discovered IP to be the primary host name.
Reverse DNS for each IP is unique.
For example,
ip1.node.some.domain <- primary, wanted
ip2.node.some.domain <- not wanted
ip3.node.some.domain <- not wanted

Shouldn't sysOID duplicate detection prevent the creation of duplicate nodes?
#38
General Support / Re: DCI deletion failure
May 16, 2019, 05:26:49 PM
Thank you, Victor. I may build from source... but is there a timetable or release schedule for 2.2.15?
#39
I'm seeing this same problem in 2.2.14. Thread loads appear normal.

Scenario to reproduce:

1. Create templated DCIs (~80k DCIs in this instance).
2. Disable DCI templates.
3. 80k alarms are created upon disabling DCIs.
4. Terminate alarms (4096 at a time).

I can prevent the client from crashing by immediately closing the alarm window after terminating 4096 alarms. The job will still time out, but the alarms will be terminated.
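
To give a sense of scale, this is the batching arithmetic for step 4, as a small Python sketch. It only splits a list of IDs into chunks; it does not call any NetXMS API, and the helper name is made up.

```python
def batches(items, size=4096):
    """Yield consecutive slices of items, each at most size long."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 80,000 alarms in chunks of 4096
chunks = list(batches(range(80000)))
print(len(chunks), len(chunks[-1]))  # 20 2176
```

So clearing ~80k alarms this way takes 20 rounds of terminate-and-close-the-window.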
#40
General Support / DCI deletion failure
May 14, 2019, 10:12:51 PM
Testing DCI templating with Postgres+timescale.

Deleting some objects results in a corrupted SQL call:

https://paste.ee/p/VRduy
#41
General Support / Hook::CreateSubnet
May 14, 2019, 07:16:38 PM
I'm testing Hook::CreateSubnet functionality in 2.2.14 HEAD.

- Is there documentation of the NXSL 'Subnet' class?

- I note that the local server node is automatically discovered and added to the node database. Multiple instances (!) of one of the IPv4 subnets get created. (In this case, there are multiple IPv4 addresses on the server in that subnet.) Is the subnet creation logic not checking for a pre-existing subnet when multiple addresses exist on the same interface?
#42
I have been seeing some strange behavior for a few days. Database performance seems fine, running on flash. Attempting to handle/terminate/resolve more than a few alarm entries at once pegs a CPU core in netxmsd.

netxmsd: show dbstats
SQL query counters:
   Total .......... 2061537
   SELECT ......... 861140
   Non-SELECT ..... 1200397
   Long running ... 0
   Failed ......... 0
Background writer requests:
   DCI data ....... 20263
   DCI raw data ... 20262
   Others ......... 49


netxmsd: show msgwq
0 active queues
Housekeeper thread state is RUNNING


Show pollers shows about half and half in cleanup and awaiting execution.

netxmsd: show queues
Data collector                   : 0
DCI cache loader                 : 0
Template updates                 : 0
Database writer                  : 0
Database writer (IData)          : 0
Database writer (raw DCI values) : 0
Event processor                  : 0
Event log writer                 : 0
Poller                           : 0
Node discovery poller            : 0
Syslog processing                : 0
Syslog writer                    : 0
Scheduler                        : 0



Show stats will time out while the CPU core is pegged.

netxmsd: show watchdog
Thread                                           Interval Status
----------------------------------------------------------------------------
Item Poller                                      10       Running
Syncer Thread                                    30       Sleeping
Poll Manager                                     5        Sleeping
Ad hoc scheduler                                 5        Sleeping
Recurrent scheduler                              5        Sleeping



Stopping the netxmsd process and repairing the DB will resolve the stuck CPU temporarily.

In the logs, I see a lot of messages saying that "Poll Manager" does not respond to watchdog thread.
Is there anything further to check?
#43
General Support / Re: Radwin discovery failing
May 13, 2019, 05:58:16 PM
I will provide a tcpdump as soon as I can. I'm running into an odd situation elsewhere and am likely going to start the db again from scratch first.
#44
General Support / Re: Template DCI disappearance?
May 13, 2019, 05:57:07 PM
Thanks, Victor.
#45
General Support / Re: Radwin discovery failing
May 07, 2019, 01:20:57 AM
I also tried manually adding nodes. Even with manual binding of SNMP string and version, discovery tries all configured SNMP community strings. This is surprising.

Interface discovery consequently fails due to the response (ping3 driver):

14:17:45.602526 IP aaaa.35312 > bbbb.snmp:  C="xxxx" GetRequest(59)  system.sysObjectID.0 system.sysDescr.0 E:35160.1.1.0
14:17:45.630758 IP bbbb.snmp > aaaa.35312:  C="xxxx" GetResponse(59)  genErr@3 system.sysObjectID.0= system.sysDescr.0= E:35160.1.1.0=