Messages - jermudgeon

#1
Thanks Filipp.
#2
The grep pattern could be more specific, for example 'bin/netxmsd'. As written, any process that has a netxms log file open (a tail, for instance) also passes the grep and causes the script to fail later on.
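
To illustrate the difference between a loose match and a narrower one, here is a minimal Python sketch. The script being discussed is not shown in this thread, so the sample process lines and the helper name `find_server_pids` are hypothetical; only the 'bin/netxmsd' pattern comes from the post above.

```python
def find_server_pids(ps_lines, pattern="bin/netxmsd"):
    """Return PIDs whose executable path (first word of the command) ends with pattern."""
    pids = []
    for line in ps_lines:
        parts = line.split(None, 1)  # expected shape: "PID COMMAND ARGS..."
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip header lines or anything malformed
        pid, command = parts
        argv0 = command.split()[0]
        if argv0.endswith(pattern):
            pids.append(int(pid))
    return pids


# Hypothetical process listing (PID followed by the full command line).
sample = [
    "1201 /opt/netxms/bin/netxmsd -d",
    "1377 tail -f /var/log/netxms/netxmsd.log",
    "1402 less /var/log/netxmsd",
]

# Loose match: anything mentioning "netxms" anywhere on the line.
loose = [int(line.split()[0]) for line in sample if "netxms" in line]
# Narrow match: only processes whose executable is .../bin/netxmsd.
strict = find_server_pids(sample)

print(loose)
print(strict)
```

The loose match also picks up the tail and less processes, which is the failure mode described above; matching on the executable path leaves only the server process.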

#3
I have 82421 collectible DCIs and an additional 25552 that are missing a corresponding node.

nxdbmgr check does not find and purge these.

There do not appear to be any leftover data in the corresponding idata tables.

-- count DCIs whose node_id has no matching row in object_properties
SELECT count(*)
   FROM items i
   LEFT JOIN object_properties p ON i.node_id = p.object_id
   WHERE p.name IS NULL;
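
For anyone who wants to try the anti-join pattern on toy data, here is a small self-contained Python sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the two tables are cut down to the columns used in the query above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the real tables: only the joined columns are kept.
conn.executescript("""
    CREATE TABLE object_properties (object_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, node_id INTEGER);
    INSERT INTO object_properties VALUES (1, 'router-a'), (2, 'switch-b');
    INSERT INTO items VALUES (10, 1), (11, 1), (12, 2), (13, 99), (14, 100);
""")

# Same shape as the query above: count DCIs with no matching object row.
orphan_count = conn.execute("""
    SELECT count(*)
      FROM items i
      LEFT JOIN object_properties p ON i.node_id = p.object_id
     WHERE p.name IS NULL
""").fetchone()[0]

# Listing the orphans; testing the join key avoids counting objects
# that exist but happen to have a NULL name.
orphan_ids = [row[0] for row in conn.execute("""
    SELECT i.item_id
      FROM items i
      LEFT JOIN object_properties p ON i.node_id = p.object_id
     WHERE p.object_id IS NULL
     ORDER BY i.item_id
""")]

print(orphan_count, orphan_ids)
```

Items 13 and 14 point at node IDs that do not exist in the toy object table, so they are the ones reported.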
#4
Thanks, Victor.

[jaustin@jaustin ~]$ nxdbmgr background-upgrade
NetXMS Database Manager Version 3.3.285 Build 3.3-285-gfe2e9b646f (UNICODE)

No pending background upgrades
#5
Postgres backend is functioning, but netxms reports 'Server has lost connection with backend database'.

sh dbcp
netxmsd: sh dbcp
0x7f8745e44780 04.May.2020 17:42:14 dbwrite.cpp:457
0x7f873cbd5d80 04.May.2020 17:42:15 dcitem.cpp:1309
0x7f873cbd5960 04.May.2020 17:42:05 syncer.cpp:238
3 database connections in use

Stuck since yesterday.

sh q

netxmsd: sh q
Data collector                   : 0
DCI cache loader                 : 46673
Template updates                 : 0
Database writer                  : 5
Database writer (IData)          : 3354870
Database writer (raw DCI values) : 28133
Event processor                  : 1334
Event log writer                 : 0
Poller                           : 1319
Node discovery poller            : 1187
Syslog processing                : 0
Syslog writer                    : 0
Scheduler                        : 0


dbcp reset doesn't do much:

netxmsd: dbcp reset
Resetting database connection pool
Database connection pool reset completed
netxmsd: sh dbcp
0x7f8745e44780 04.May.2020 17:42:14 dbwrite.cpp:457
0x7f873cbd5d80 04.May.2020 17:42:15 dcitem.cpp:1309
0x7f873cbd5960 04.May.2020 17:42:05 syncer.cpp:238
3 database connections in use

#6
General Support / 3.3 timescale upgrade procedure
May 05, 2020, 06:53:08 PM
               
WARNING: Background upgrades pending. Please run nxdbmgr background-upgrade when possible.
[jaustin@jaustin systems]$ nxdbmgr background-upgrade                                     
NetXMS Database Manager Version 3.3.285 Build 3.3-285-gfe2e9b646f (UNICODE)
                                                                           
Running background upgrade procedure for version 33.6
Converting table idata_sc_default                   
Converting table idata_sc_7     
Converting table idata_sc_30                                                   
Converting table idata_sc_90                                             
Converting table idata_sc_180                                   
Converting table idata_sc_other                                                                                                                                           
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.

----

I restarted the background-upgrade process and got this:

WARNING: Background upgrades pending. Please run nxdbmgr background-upgrade when possible.
[jaustin@jaustin systems]$ nxdbmgr background-upgrade
NetXMS Database Manager Version 3.3.285 Build 3.3-285-gfe2e9b646f (UNICODE)

Running background upgrade procedure for version 33.6
Converting table idata_sc_default
SQL query failed (42P01 ERROR:  relation "v33_5_idata_sc_default" does not exist
LINE 1: ...stamp(idata_timestamp),idata_value,raw_value FROM v33_5_idat...
                                                             ^):
INSERT INTO idata_sc_default (item_id,idata_timestamp,idata_value,raw_value) SELECT item_id,to_timestamp(idata_timestamp),idata_value,raw_value FROM v33_5_idata_sc_default
Background upgrade procedure for version 33.6 failed


----

So apparently the upgrade process can't handle a resume, as it breaks on the tables that have already been migrated. (It had already dropped successful tables.)

I manually completed the inserts from the remaining tables using ON CONFLICT DO NOTHING, and dropped the remaining v33_5 tables.

However, the same problem exists -- I can't complete the upgrade because the v33_5 tables no longer exist.
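
To make the resume problem concrete, here is a rough Python sketch of a per-table copy that can be re-run after an interruption. It is not how nxdbmgr works internally. It uses in-memory SQLite as a stand-in for PostgreSQL, so `INSERT OR IGNORE` plays the role of `ON CONFLICT DO NOTHING`. The table names and the `v33_5_` prefix come from the output above; the primary key, the sample rows and the helper names are assumptions, and the `to_timestamp()` conversion is left out.

```python
import sqlite3


def table_exists(conn, name):
    """True if a table with this name exists (SQLite catalog lookup)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def migrate_table(conn, target, prefix="v33_5_"):
    """Copy rows from the old table into target, then drop the old table.

    Returns False without touching anything if the old table is already gone,
    which is what makes a second run safe.
    """
    source = prefix + target
    if not table_exists(conn, source):
        return False  # converted on an earlier run
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"INSERT OR IGNORE INTO {target} "
            "(item_id, idata_timestamp, idata_value, raw_value) "
            f"SELECT item_id, idata_timestamp, idata_value, raw_value FROM {source}"
        )
        conn.execute(f"DROP TABLE {source}")
    return True


conn = sqlite3.connect(":memory:")
# Assumed layout: a composite key is needed for duplicate rows to be ignored.
ddl = ("(item_id INTEGER, idata_timestamp INTEGER, idata_value TEXT, "
       "raw_value TEXT, PRIMARY KEY (item_id, idata_timestamp))")
tables = ("idata_sc_default", "idata_sc_7")
for t in tables:
    conn.execute(f"CREATE TABLE {t} {ddl}")
# idata_sc_default: already converted, old table dropped before the crash.
conn.execute("INSERT INTO idata_sc_default VALUES (1, 100, '5', '5')")
# idata_sc_7: old table still present, one row already copied.
conn.execute(f"CREATE TABLE v33_5_idata_sc_7 {ddl}")
conn.execute("INSERT INTO v33_5_idata_sc_7 VALUES (2, 100, '7', '7'), (2, 101, '8', '8')")
conn.execute("INSERT INTO idata_sc_7 VALUES (2, 100, '7', '7')")
conn.commit()

first_run = {t: migrate_table(conn, t) for t in tables}
second_run = {t: migrate_table(conn, t) for t in tables}
count_7 = conn.execute("SELECT count(*) FROM idata_sc_7").fetchone()[0]
print(first_run, second_run, count_7)
```

In this toy run the already-converted table is skipped, the partially copied one is completed without duplicate rows, and a second pass is a no-op.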
#7
Announcements / Re: NetXMS 3.3 released
April 30, 2020, 06:39:26 PM
 :)
#8
General Support / Rate of deleted objects
March 23, 2020, 05:25:57 PM
Is there a way to speed up deletion of objects? syncer.cpp appears to be deleting objects one at a time.

In this particular case, we're removing many individual interfaces prior to deletion of the parent node. The time to delete each interface is approximately 1 second, and then the next interface is queued for deletion.

Is there no way to batch the deletions so they happen faster?
#9
Thanks again, Filipp. That's quite interesting; I'm seeing an average of 10x more objects per node than in your examples. This could be due to device types, of course. I will see if I can determine relative counts for different types of objects.
#10
Thanks Filipp — that's very helpful. Do you have object counts for the three deployments? I'm having more issues with high object counts than with high DCI counts.

I am currently testing in the ~100k DCI range, and that's gone well for a number of months — working with TimeScaleDB.

However, I began adding nodes with many interfaces (without adding significant DCIs), and as my object count rose past 1,000,000 I began seeing escalating CPU usage that didn't seem to scale linearly, even with topology and route table scanning turned off.

In addition, deleting objects is persistently slow, but the behavior isn't yet clear enough to engage support.
#11
General Support / Large-scale NetXMS deployments?
March 17, 2020, 08:10:13 PM
Given the lack of a comprehensive NetXMS scaling guide, I'm interested in talking with people who run NetXMS in any of the following:

1) Deployments with horizontal load balancing of polling, discovery, etc.
2) Deployments with large devices — hundreds of interfaces, thousands of MAC addresses
3) Deployments with at least 5000 nodes, scaling up to 20k-30k nodes

In particular, I'm looking for solid scaling values for:

A) netxmsd RAM usage per node
B) average nodes / poller (thread pool POLLERS) (in the context of avg DCI per node)

I'm having issues around the 100k DCI mark, but it doesn't appear to be the DCIs themselves that are the bottleneck.

TIA
Jeremy
#12
General Support / Re: Radwin discovery failing
March 09, 2020, 10:57:25 PM
Saw that, Victor -- you rock! Thanks.
#13
General Support / Re: Radwin discovery failing
March 03, 2020, 08:29:03 PM
Here is the promised capture file. (Regular SNMP polling stripped out.) It shows the response to the discovery packet when ping3 is enabled, and the response when ping3 is disabled.
#14
General Support / Re: Netxms 3.1.261 and device discovery
December 24, 2019, 05:56:22 PM
Thank you, Filipp. Has anyone written such a script yet that you know of?
#15
General Support / Netxms 3.1.261 and device discovery
December 12, 2019, 10:11:19 PM
I have a large class of devices (perhaps mostly Cisco?) that are failing discovery in an odd way.

1) Devices are detected with isSNMP=Yes, GENERIC driver, and added to db
2) Manual inspection shows that devices were added with SNMP 'public' community
3) Devices respond to an snmpwalk using 'public' with the following:
iso.3.6.1.2.1 = No more variables left in this MIB View (It is past the end of the MIB tree)
Note that this is a different response than with an invalid string; with an invalid string, queries just time out. With this (ACLed) string, 'public' simply has no allowed views.
4) Devices (and discovery) are configured with a *different* SNMP string which does actually work via a walk, but not via discovery.

Is there a way to change discovery behavior to try 'public' *last*? There doesn't appear to be an order in the SNMP Configuration that's relevant.

Is there a way to batch change configured SNMP communities on nodes? Batch 'properties' change doesn't seem to exist in the UI. Better yet, can I do this with NXSL? I'm not seeing an attribute that lets me check or set the SNMP community.

Thanks