Topics - rgkordia

#1
Upon arriving in the office this morning, I'm getting the following error repeating every ~50 seconds:

2019.10.03 08:54:42.739 *E* [db.driver          ] SQL query failed (Query = "INSERT INTO idata_12102 (item_id,idata_timestamp,idata_value,raw_value) VALUES (64537,1570043031,'0','0')"): Lock wait timeout exceeded; try restarting transaction

Different values, but always table idata_12102.  I'm using MariaDB (latest 10.1.41 on Windows) and I've run "SHOW OPEN TABLES WHERE in_use>0" which always shows the following:


Database      Table            In_use        Name_locked
--------------------------------------------------------
netxms_db     idata_12102           2                  0


The server (build 2305) has been up around 18 hours or so.  I attempted to shut down the server - it started to shut down but seemed to hang (and the lock message again continues to repeat every 50 seconds).  After waiting around 5 minutes I decided to kill the server.

After the server was terminated, the in_use still remained at 2, then after a minute or so it reduced to 1.  I waited around 5 minutes but the in_use count never reached 0. 

I then shut down the MariaDB server, which took around 8 minutes, during which the mysql process had high CPU and high disk IO, so I assume there was considerable uncommitted data that was either being committed or rolled back.  After restarting MariaDB, the in_use query returned no results.  Admittedly I didn't check MySQL activity whilst I was waiting for netxmsd to terminate, so possibly it was attempting to commit these transactions, which I inadvertently interrupted.

I then restarted netxmsd and the in_use query returns no results (no active locks).  From my graphs via the GUI console I can see that the past 2.5 hours of DCI values are missing, so I assume all those transactions were waiting on the lock to clear which never did and were rolled back.  New DCI values are populating as normal.

After doing some investigation, I see from the windows event logs that this error relating to lock on idata_12102 started around the time where my data loss begins (8:08am).  However, prior to that I see lock errors relating to different tables dating back to around 2am.  I assume these lock issues eventually cleared.
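
To see which tables the lock errors hit and how often, the log lines can be tallied by table name. The following is a minimal sketch, assuming the log line format shown in the error above; the function name `count_lock_timeouts` and the regular expression are illustrative and not part of NetXMS.

```python
import re
from collections import Counter

# Matches the table name in a failed INSERT that ended in a lock wait timeout.
# The pattern is an assumption based on the single log line quoted in this post.
LOCK_ERROR = re.compile(
    r'SQL query failed \(Query = "INSERT INTO (?P<table>\w+) .*?"\): '
    r'Lock wait timeout exceeded'
)

def count_lock_timeouts(lines):
    """Return a Counter mapping table name to the number of lock-timeout errors."""
    counts = Counter()
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            counts[match.group("table")] += 1
    return counts

# The log line from this post, used as sample input.
sample = [
    '2019.10.03 08:54:42.739 *E* [db.driver          ] SQL query failed '
    '(Query = "INSERT INTO idata_12102 (item_id,idata_timestamp,idata_value,raw_value) '
    'VALUES (64537,1570043031,\'0\',\'0\')"): Lock wait timeout exceeded; '
    'try restarting transaction',
]

for table, hits in count_lock_timeouts(sample).items():
    print(table, hits)
```

Fed the exported event log line by line, this would show whether the earlier errors from around 2am were spread across many tables or concentrated on a few.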

It's worth mentioning that since upgrading to v3 I had some errors about the DB connection pool being full, so I increased DBConnectionPoolBaseSize to 30 and DBConnectionPoolMaxSize to 100.  Looking at the logs, though, these errors still seem to persist.

Attached is an export of the windows event log for NetXMScore relating to the ~18 hours of runtime during which this lock event occurred.

Regards,
Richard
#2
General Support / v3 - Console bug editing dashboards
October 02, 2019, 02:28:02 AM
When I open a dashboard in Edit Mode, make some changes, but decide not to save, the changes persist until I close and reopen the console.

Console build 2284.
#3
Hi.

I've just upgraded to v3 (2305).  I'm having a problem accessing some devices in the console, and it appears to be the ones with large numbers of interfaces or DCI's.

I don't recall the issue with v2, although I did add a few more DCI's since upgrading to v3 which could have tipped it over some limit.

Symptoms:

When I right click on a node, I sometimes get the hourglass for ~30s and then nothing happens.  If I right click again I get a corrupted popup menu about a third of the normal size, mostly blank, with a few populated items.  If I click on another (smaller) device it seems to clear the problem and I can then right click successfully on the bigger device.  However, when I attempt to access the DCI configuration, the DCI tab pops up but the panel is completely blank.  Not even a blank table, just a grey window with no widgets.

The device in question has 350 interfaces, and I would estimate (can't find out exactly) around 4,000 DCI's.

Sometimes when this condition occurs, the console starts acting oddly, and I can't access other things like server console, or edit a dashboard (get blank grey tab).

This is running locally on the server (Windows).  6 CPU, 8GB RAM (mostly dedicated to MariaDB cache).  Also tried console from a remote PC and same issue.  Console is 2284.

Rich
#4
General Support / netxmsd seg-fault
September 14, 2018, 10:53:56 AM
Fairly new install, 2.2.8 on Debian 9.3.

Recently enabled the ping subagent, and configured a few Average Ping and Packet Loss monitors based on a template to around 8 nodes.

Sep 14 19:49:38 netxms1 kernel: [337446.741930] ItemPoller[1969]: segfault at 52e ip 000000000000052e sp 00007f74c70ca3e8 error 14 in netxmsd[564b357d2000+93000]

Anything else I can provide?
#5
Hi,

I have a few years' history in my NetXMS and I now want to apply some different calculations on my past data before it gets graphed.  For example, I'm taking Input/Output readings every 60 seconds on a particular interface and I have around 2 years' worth of data.  I now want to average over 15 and 60 minutes (so loading of the graph doesn't timeout) and also calculate 95th percentile.
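
The two calculations described here can be sketched outside NetXMS as follows. This is a minimal illustration, assuming the history is available as `(timestamp, value)` pairs with Unix timestamps; the function names are hypothetical, and the 95th percentile uses the nearest-rank method, which is one of several common definitions.

```python
from collections import defaultdict

def bucket_averages(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket_start.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]

def percentile_95(values):
    """95th percentile by the nearest-rank method."""
    if not values:
        raise ValueError("percentile_95 needs at least one value")
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# 60-second readings averaged into 15-minute (900 s) buckets.
readings = [(0, 10), (60, 20), (900, 30)]
print(bucket_averages(readings, 900))
print(percentile_95([v for _, v in readings]))
```

Passing `3600` as `bucket_seconds` gives the 60-minute averages instead.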

I can create scripts to perform the transform/calculation, and I'm thinking I need to write my recalculated data to a separate DCI but I'm unsure how to do this.  I see the PushDCIData function which seems to almost achieve what I want, but I would also need to push a timestamp with the data.

Or am I looking at this the wrong way?

Thanks,
Richard
#6
General Support / Cannot access $dci from script
August 22, 2018, 08:30:47 AM
Hi,

I want to read the DCI's name from a script, and I've seen other posts referencing the $dci->name (or similar) variables.  But when I attempt to access $dci my script was crashing.  Very basically, I was trying to execute trace(1, "dciname=".$dci->name);

To test this I created a very simple script in the script library:

sub main()
{
    if ($dci == null)
        trace(1, "dci is null");
    else
        trace(1, "dci is not null");
}

I then created a DCI on an existing router node, set the type to "script" and selected the above script.

In my logs it shows "dci is null".

How can I get the DCI's name from within my script?

I'm running v2.1 on Windows.

Thanks,
Richard
#7
Hi,

I have an issue when trying to graph a period of time over 1 week long.  I collect DCI's per minute and when I attempt to graph more than 1 week I get timeouts from the client:

    * Cannot get value for DCI routername:"Input Bandwidth on GigabitEthernet1/0/1" (Request timed out)

I'm running MariaDB 10.1 on Windows Server 2012 R2 and NetXMS 2.0.4 all on one box.

I migrated (~3 months ago) from MSSQL 2010, where the DB was off-box on our corporate cluster.  I didn't have the issue with that setup.

This graph has 4 series (in/out for primary/secondary link) so I appreciate there is a lot of data.  What I want to know is:

a) Is there a way to increase the timeout in the client?
b) Is there a performance tweak I can do with MySQL/MariaDB that will resolve these issues?

Sometimes if the period is not too long it gets the data after a refresh, presumably due to cache, but longer periods just refuse to graph.

Thanks,
Richard
#8
Is there a way to refresh the description that was generated during the instance discovery script? 

We've updated our interface descriptions on the switches and I want them to be reflected in the DCI's that were discovered a long time back, but I don't want to delete the DCI's and lose their history.

Running version 1.2.17.

Rich
#9
General Support / Reconnect DB after failure
February 02, 2016, 02:55:41 AM
Hi guys,

I have a 1.2.17 installation that has been running successfully for about a year now.  This weekend our DB server had a non-critical failure which recovered after some minor maintenance.  I saw in the NetXMS console that the queues for Database writer, IData and Raw DCI values were high (~10m combined).  After the DB was back up I saw the Database writer queue gradually reduce to zero, then the database writer (raw DCI values) queue reduce to zero, but the IData queue kept increasing.  At this point I was still missing DCI data for the weekend, which I had hoped would catch up.

I left it for a while but in the end I restarted the NetXMS core service and everything is now working fine, but I lost this weekend's worth of data.  IData queue is now zero again.

Is there any way to force a DB reconnect / resync without restarting the NetXMS core (and thus losing the data)?

I'm using MSSQL 2010 for the DB and Windows 2008R2 for the NetXMS server.

Thanks,
Richard
#10
Hi,

I've been using 1.2.17 for a number of months now - fantastic product and generally working very well.

One issue - I notice that we get a lot of SNMP AUTH events from our Cisco 6509 switches every so often.  Data collects fine via SNMP for these devices, so I did a packet capture and found that there are some SNMP get-next requests for OID 1.3.6.1.2.1.17.4.3.1.1 that are using an incorrect community.  NXMS is sending our normal community, but suffixed with some extra characters.

eg: If our community were "public" the above mentioned requests would use a community of "public@301" on some requests, "public@302" on others.
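
The `community@number` pattern matches what Cisco documents as community string indexing, where the per-VLAN bridge table is queried by appending `@<vlan-id>` to the community. Whether that is what is happening here is an assumption. A minimal sketch for splitting the communities seen in a capture into base and suffix (function names are hypothetical):

```python
def split_indexed_community(community):
    """Split 'public@301' into ('public', 301).

    Returns (community, None) when there is no numeric '@' suffix.
    """
    base, sep, suffix = community.rpartition("@")
    if sep and base and suffix.isdigit():
        return base, int(suffix)
    return community, None

def indexed_communities(base, vlan_ids):
    """Build the suffixed communities for a list of VLAN ids."""
    return [f"{base}@{vlan}" for vlan in vlan_ids]

for seen in ["public", "public@301", "public@302"]:
    print(seen, split_indexed_community(seen))
```

If the suffixes line up with VLAN ids configured on the 6509, that would point to per-VLAN FDB polling rather than a corrupted community string.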

It seems this happens on a schedule and can also be triggered by doing a topology poll.  The Layer 2 switch forwarding database for these devices is blank in NXMS.

Only happens on our 6509.  Our other Cisco switches are fine and show valid layer 2 fdb's.

A bug or config issue?

Thanks,
Richard
#11
General Support / Interface names in instance discovery
January 15, 2015, 01:23:07 AM
Hi guys,

I've seen a number of posts around this topic, but I'm having trouble getting this to work. 

Problem: when I create a DCI that uses instance discovery (eg: ifInOctets) I can't seem to get the interface description into the name of the DCI.

I've tried using the %{script:xxxxx} approach in the DCI name, but I can't find out how to pass {instance} as a parameter to the script and $dci appears to not be set in the script.
I've tried using the return %(true,"<interface_name>") method in the instance filter, but this breaks the {instance} value for the OID.

Any help on a working solution?

Here's an example of my current DCI config:

Description: Input Bandwidth (bps) on {instance}
SNMP OID: .1.3.6.1.2.1.2.2.1.10.{instance}
Transformation: Delta/sec, and $1 * 8 in the script
Instance discovery: SNMP walk of OID .1.3.6.1.2.1.2.2.1.10
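
For reference, the arithmetic behind that transformation (delta per second, then multiplied by 8 to go from octets to bits) can be sketched as below. This is an illustration only, not how NetXMS implements it; the function name is hypothetical, and the single-wrap handling assumes a 32-bit counter, which is what ifInOctets is defined as.

```python
def bits_per_second(prev_octets, curr_octets, interval_seconds, counter_bits=32):
    """Convert two octet-counter readings into bits per second.

    Assumes at most one counter wrap between the two readings.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        # Counter wrapped past its maximum between polls.
        delta += 2 ** counter_bits
    return delta / interval_seconds * 8

# 6000 octets received over a 60-second polling interval.
print(bits_per_second(1000, 7000, 60))
```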

Thanks,
Richard
#12
Hi,

When I add an F5 BIG-IP node, the IP addresses of the interfaces are not discovered.  When I do a walk of .1.3.6.1.2.1.4.20 I can see all the relevant IP's, but they are not discovered automatically by NetXMS.  SNMP traps (which the BIG-IP originates from a different source IP than the primary IP) are not being registered to the correct object, and I can't seem to manually add the IP's to the interfaces.

Is there anything I can tune to force the discovery of the IP's? 

Regards,
Richard