Topics - scomoletti

#1
I've tried nxagent-3.0.2258-linux-x86_64-static.tar.gz, nxagent-3.0.2357-linux-x86_64-static.tar.gz, and nxagent-3.0.2258-linux-x86_64-static.tar.gz, all with the same results: a configuration poll crashes the agent 90% of the time. On the one occasion the poll succeeded, the agent stayed running and polled DCIs correctly as long as I did not run another configuration poll, but it crashed immediately when I clicked on the 'User Sessions' tab.

The server side is NetXMS 3.0.2258 on CentOS 7.5.1804 and has been stable, without issue.

abrt output with core and backtrace is attached.

Full configuration poll output follows:
[07.11.2019 13:00:03] **** Poll request sent to server ****
[07.11.2019 13:00:03] Poll request accepted
[07.11.2019 13:00:03] Starting configuration poll for node nj-prod-analytics
[07.11.2019 13:00:03] Capability reset
[07.11.2019 13:00:03] Checking node's capabilities...
[07.11.2019 13:00:03]    Checking NetXMS agent...
[07.11.2019 13:00:03]    Connectivity with NetXMS agent restored
[07.11.2019 13:00:03]    NetXMS agent version changed to 3.0.2357
[07.11.2019 13:00:03]    Platform name changed to Linux-x86_64
[07.11.2019 13:00:03]    System description changed to Linux nj-prod-analytics-1 3.10.0-327.28.2.el7.x86_64 #1 SMP Wed Aug 3 11:11:39 UTC 2016 x86_64
[07.11.2019 13:00:15] Capability check finished
[07.11.2019 13:00:15] Checking interface configuration...
[07.11.2019 13:00:15] Unable to get interface list from node
[07.11.2019 13:00:15]    Interface "unknown" is no longer exist
[07.11.2019 13:00:15] Interface configuration check finished
[07.11.2019 13:00:15] Checking node name
[07.11.2019 13:00:15] Node name is OK
[07.11.2019 13:00:15] Reading list of installed software packages
[07.11.2019 13:00:15] Unable to get information about installed software packages
[07.11.2019 13:00:15] Reading list of installed hardware components
[07.11.2019 13:00:15] Cannot read hardware component information
[07.11.2019 13:00:15] Finished configuration poll for node nj-prod-analytics
[07.11.2019 13:00:15] Node configuration was changed after poll
[07.11.2019 13:00:15] **** Poll completed successfully ****


And debug logs from server at the time of the crash:
2019.11.07 18:00:03.226 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (4) from agent at 10.172.102.228
2019.11.07 18:00:03.226 *D* [client.session.2   ] Sending message CMD_POLLING_INFO (96 bytes)
2019.11.07 18:00:03.226 *D* [client.session.2   ] Message dump:
  ** 000000: 005A5000000000600000084D00000002 .ZP....`...M....
  ** 000010: 0000001C000000000000001700000000 ................
  ** 000020: 0000006C070000000000002D2020204E ...l.......-   N
  ** 000030: 6574584D53206167656E742076657273 etXMS agent vers
  ** 000040: 696F6E206368616E67656420746F2033 ion changed to 3
  ** 000050: 2E302E323335370D0A00000000000000 .0.2357.........
  ** code=0x005A (CMD_POLLING_INFO) version=5 flags=0x0000 id=2125 size=96 numFields=2
  ** 000000: [    28] INT32       23
  ** 000010: [   108] UTF8-STRING "   NetXMS agent version changed to 3.0.2357^M
"

2019.11.07 18:00:03.226 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER (5) to agent at 10.172.102.228
2019.11.07 18:00:03.226 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (5) from agent at 10.172.102.228
2019.11.07 18:00:03.226 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER (6) to agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (6) from agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [client.session.2   ] Sending message CMD_POLLING_INFO (88 bytes)
2019.11.07 18:00:03.227 *D* [client.session.2   ] Message dump:
  ** 000000: 005A5000000000580000084D00000002 .ZP....X...M....
  ** 000010: 0000001C000000000000001700000000 ................
  ** 000020: 0000006C070000000000002A20202050 ...l.......*   P
  ** 000030: 6C6174666F726D206E616D6520636861 latform name cha
  ** 000040: 6E67656420746F204C696E75782D7838 nged to Linux-x8
  ** 000050: 365F36340D0A0000 6_64....
  ** code=0x005A (CMD_POLLING_INFO) version=5 flags=0x0000 id=2125 size=88 numFields=2
  ** 000000: [    28] INT32       23
  ** 000010: [   108] UTF8-STRING "   Platform name changed to Linux-x86_64^M
"

2019.11.07 18:00:03.227 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER (7) to agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (7) from agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER (8) to agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (8) from agent at 10.172.102.228
2019.11.07 18:00:03.227 *D* [client.session.2   ] Sending compressed message CMD_POLLING_INFO (168 bytes)
2019.11.07 18:00:03.227 *D* [client.session.2   ] Message dump:
  ** 000000: 005A5040000000A80000084D00000002 .ZP@.......M....
  ** 000010: 000000B078DA2D8EB10AC2300044B309 ....x.-....0.D..
  ** 000020: 82B38BC381734293485BDDC45541A8E2 .....sB.H[..UA..
  ** 000030: 28A109355293D2A4D08EFEB985781CBC (..5R........x..
  ** 000040: E5711C21644352D67FB68BC42F806A0A .q.!dCR...../.j.
  ** 000050: D17CA04DA87BDB45EB1DEA97728DD188 .|.M.{.E....r...
  ** 000060: 1E67EB8611EE4DBBDE6BAA9C6AA768EB .g....M..k..j.h.
  ** 000070: 403924E319CBA81405132513CCB4051B @9$.......%.....
  ** 000080: CBFC99EFB0E5A82E573CE689E3D04082 ........W<....@.
  ** 000090: F3C35CB9C7FD7682C8788E64AE96F389 ..\...v..x.d....
  ** 0000A0: 1F816E2622000000 ..n&"...
  ** code=0x005A (CMD_POLLING_INFO) version=5 flags=0x0040 id=2125 size=168 numFields=2
  ** 000000: [    28] INT32       23
  ** 000010: [   108] UTF8-STRING "   System description changed to Linux nj-prod-analytics-1 3.10.0-327.28.2.el7.x86_64 #1 SMP Wed Aug 3 11:11:39 UTC 2016 x86_64^M
"

2019.11.07 18:00:03.228 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER (9) to agent at 10.172.102.228
2019.11.07 18:00:03.228 *D* [agent.conn.48      ] Received message CMD_REQUEST_COMPLETED (9) from agent at 10.172.102.228
2019.11.07 18:00:03.228 *D* [agent.conn.48      ] Sending message CMD_GET_PARAMETER_LIST (10) to agent at 10.172.102.228
2019.11.07 18:00:03.229 *D* [                   ] SQL request queued: DELETE FROM alarm_events WHERE alarm_id=67
2019.11.07 18:00:03.229 *D* [db.cpool           ] Handle 0x7f17f4021c20 acquired (call from dbwrite.cpp:257)
2019.11.07 18:00:03.230 *D* [db.cpool           ] Handle 0x7f17f40284c0 released
2019.11.07 18:00:03.230 *D* [event.proc         ] Event 251 with code 32 passed event processing policy
2019.11.07 18:00:03.230 *D* [                   ] NetObj::expandText(sourceObject=179 template='Node status changed to WARNING' alarm=0 event=252)
2019.11.07 18:00:03.230 *D* [event.corr         ] CorrelateEvent: event SYS_NODE_WARNING id 252 source nj-prod-analytics [179]
2019.11.07 18:00:03.230 *D* [db.cpool           ] Handle 0x7f17f4001220 acquired (call from evproc.cpp:113)
2019.11.07 18:00:03.230 *D* [event.corr         ] CorrelateEvent: finished, rootId=0
2019.11.07 18:00:03.230 *D* [event.proc         ] EVENT SYS_NODE_WARNING [7] (ID:252 F:0x0001 S:1 TAGS:"NodeStatus") FROM nj-prod-analytics: Node status changed to WARNING
2019.11.07 18:00:03.230 *D* [event.policy       ] EPP: processing event 252
2019.11.07 18:00:03.230 *D* [event.proc         ] Event 252 with code 7 passed event processing policy
2019.11.07 18:00:03.231 *D* [db.cpool           ] Handle 0x7f17f4021c20 released
2019.11.07 18:00:03.234 *D* [event.proc         ] EventLogger: DBExecute: id=251,code=32
2019.11.07 18:00:03.237 *D* [event.proc         ] EventLogger: DBExecute: id=252,code=7
2019.11.07 18:00:03.385 *D* [agent.conn.48      ] AgentConnection::ReceiverThread(): communication channel shutdown
2019.11.07 18:00:03.385 *D* [agent.conn.48      ] Receiver loop terminated
2019.11.07 18:00:03.385 *D* [agent.conn.47      ] AgentConnection::ReceiverThread(): communication channel shutdown
2019.11.07 18:00:03.385 *D* [agent.conn.47      ] Receiver loop terminated
2019.11.07 18:00:03.385 *D* [agent.conn.47      ] Closing communication channel
2019.11.07 18:00:03.385 *D* [agent.conn.47      ] Receiver thread stopped
2019.11.07 18:00:03.385 *D* [agent.conn.48      ] Closing communication channel
2019.11.07 18:00:03.385 *D* [agent.conn.48      ] Receiver thread stopped

#2
I have a small NetXMS instance running in Docker, used primarily for third-party EMS integration and alarm consolidation. It has 19 nodes, each with one table DCI that executes a local script on the server to collect output into the table. It worked fine for about a month, until we had a disk space issue. We couldn't identify which process was at fault, but restarting netxmsd and MariaDB recovered 15G of disk. We did not clean up any files manually.

After the restart MariaDB repaired itself, and netxmsd started without issue and ran for several hours, after which we started having problems again. The initial symptom was being unable to log in via either the nxmc web client or the full client. The logs showed no errors. nxadm from the shell worked fine and had no issues with any of the show commands. I set debug to 9, but all I saw in the log was ItemPoller calling queueitems for the 19 nodes once per second, followed by agent.conn sending/receiving 7 messages for DCI_DATA. Show watchdog indicated that both Syncer Thread and Poller Manager were not responding. Item poller was running, and the ad hoc/recurrent schedulers were sleeping as normal. I wasn't able to find anything else useful to indicate what caused it to hang. A restart of netxmsd corrected the issue.

I'm thinking there may be some DB damage that caused housekeeping to hang. I have rescheduled housekeeping to run at a time when I'll be around to watch it. Does anyone have ideas on how to troubleshoot this one better? My next step, if the problem continues, is to export the configs and rebuild the DB. I'm not worried about the history or anything, but I'd much rather know what's going on before I blow everything away.
#3
I have a new install of 2.2.15 on CentOS 7.5 running in a Docker container. It is the same architecture I use everywhere, aside from the NetXMS version. It ran fine at first while we imported our standard templates, events, EPP, traps, etc. About 12 hours later I went to edit one of the EPP rules and got a 'Cannot open event processing policy: Request timed out' error when trying to open the window in both the web UI and the full client. I have 4 other 2.2.15 instances where this does not happen. There are some differences in the templates and EPP rules, but the other systems have more of them than this one, which only has a subset of the rules.

I unmanaged everything in infrastructure (it only had the NetXMS server itself) and set debug to 9, but couldn't find anything in the logs that I was able to distinguish from the normal log noise.

Any ideas how to proceed? Are there further debug options, or a way to purge the rules without having access to the config options? I was still able to export/import the rules without issue, and the file structure looks correctly formed.

Thanks,

Steve

#4
I would love to see agent and log parser policies included in the configuration import/export utilities.
#5
Feature Requests / debug tags for trace in nxsl
June 24, 2019, 03:32:39 PM
It would be nice to have debug tags in the NXSL trace function, similar to nxlog_debug_tag in src/libnetxms/log.cpp. I've got a lot of custom NXSL on some large servers, and it is getting near impossible to debug because of the message volume: setting the server debug level affects everything.
#6
I have a Linux guest running a vendor-developed custom distribution. I'm not sure which Linux base they used for it. I can start the agent just fine with no errors, discover the node, and poll configuration. Process list and active sessions from the right-click Tools->Info menu work and return data. Standard subagents are listed as expected (using the 2.2.12 static Linux build from netxms.org). It all looks good at first glance, but it appears that no data is logged on poll. Timestamps update each poll interval, as they should, under the 'Last Values' tab, but all the values are empty. Has anyone ever run across this behavior before? I have replicated it on 2.2.7 and 2.2.12, and will try 2.2.15 tomorrow.

Debug on the agent side looks as if it is responding correctly. I see the values in the log for memory/disk/CPU etc. as I expect, but they never appear on the server side. There are no errors in any logs, all the DCIs are enabled, and none are listed as unsupported. I'm limited in how far I can dig with the vendor on their custom Linux image, but I do have root access to a dev instance I can dig around on if anyone has ideas.

Thanks and regards
#7
I have a new install of 2.2.7 which I'm attempting to get working with my older MS 2008R2 AD server.

With the original source it successfully connects, retrieves all the objects, and adds and updates all users. It fails on all groups with:
netxms:2018.07.23 16:24:11.188 *D* LDAPConnection::fillLists(): Found dn: CN=Admins,OU=Groups,DC=MY,DC=COMPANY,DC=COM
netxms:2018.07.23 16:24:11.188 *D* LDAPConnection::fillLists(): Unknown object is not added: dn: CN=Admins,OU=Groups,DC=MY,DC=COMPANY,DC=COM, login name: (null), full name: (null), description: (null)

I noticed that line 619 of ldap.cpp checks whether objectClass matches LdapGroupClass from the server config, but also whether m_loginName is defined. This seemed off to me: groups have no login name attribute, at least in my AD. I removed that check and it did add the groups, but it now crashes the entire application before it reaches UpdateLDAPGroup in userdb.cpp, I believe. I'm still digging through that. It does update users but never reaches the groups. The total numbers of users added and updated are roughly what I expected for my server.

The crash logs are:
Jul 24 17:13:18 netxms-test kernel: $MAIN/WRK[19997]: segfault at 0 ip 00007f93a0e3aec4 sp 00007f93944cab18 error 4 in libnetxms.so.2.0.0[7f93a0dec000+6c000]
Jul 24 17:13:18 netxms-test systemd: netxmsd.service: main process exited, code=killed, status=11/SEGV
Jul 24 17:13:18 netxms-test systemd: Unit netxmsd.service entered failed state.
Jul 24 17:13:18 netxms-test systemd: netxmsd.service failed.
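
As a side note on reading the kernel line above: the faulting instruction's offset inside libnetxms.so.2.0.0 can be worked out from the `ip` value and the mapping base in the square brackets. The following is a minimal Python sketch; the regular expression and the helper name `module_offset` are illustrative assumptions, written only against the message format shown in this log.

```python
import re

# Matches the "segfault at ... ip ... sp ... error N in module[base+size]" form
# of the kernel message quoted above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<module>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def module_offset(line: str) -> tuple[str, int]:
    """Return (module name, offset of the faulting instruction in its mapping)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    return m["module"], int(m["ip"], 16) - int(m["base"], 16)

line = (
    "Jul 24 17:13:18 netxms-test kernel: $MAIN/WRK[19997]: segfault at 0 "
    "ip 00007f93a0e3aec4 sp 00007f93944cab18 error 4 in "
    "libnetxms.so.2.0.0[7f93a0dec000+6c000]"
)
module, offset = module_offset(line)
print(f"{module} +{offset:#x}")  # libnetxms.so.2.0.0 +0x4eec4
```

With a build of the library that has symbols, an offset like this can typically be handed to a tool such as addr2line or gdb to find the function involved. The "segfault at 0" with error 4 indicates a user-mode read from address 0, i.e. a NULL pointer dereference inside libnetxms.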

So now the question: does anyone have LDAP working against an MS 2008R2 AD server with 2.2.4+?
#8
Is there a way to perform a scripted/silent install of the Windows agent? I have Ansible handling *NIX, but the help menu for the Windows agent does not make it clear how to provide the configuration that is prompted for during the install.

Thanks and regards