Topics - Tursiops

#21
Hi,

I've been testing passive network discovery via proxy/zones and so far it is working quite well.
However, when it detects a node that has neither an agent on it nor SNMP enabled to query for the hostname, it adds the node just by the IP address.
I can see in the logs that it is trying to resolve the IP to a hostname, but this appears to happen on the NetXMS server itself, not the proxy node of the zone through which the discovery happened?
This can lead to a rather large number of not very useful nodes ending up in the system.

For now I just wrote a hook that sets any node with neither an agent nor SNMP to unmanaged, so those systems are ignored.
But it would be nice for mapping purposes, if they would show with their correct hostnames as per local DNS.
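To illustrate the two pieces discussed above, here is a minimal Python sketch. It is not NetXMS code and not the actual hook (which is not shown in this post). `should_manage` mirrors the hook's rule, and `resolve_name` takes the resolver as a parameter, so the reverse lookup could in principle be run wherever the zone's local DNS is reachable. All function names are hypothetical.

```python
import socket
from typing import Callable, Optional


def system_resolver(ip: str) -> str:
    # Reverse lookup using the DNS configuration of the machine running this code.
    return socket.gethostbyaddr(ip)[0]


def resolve_name(ip: str, resolver: Callable[[str], str] = system_resolver) -> Optional[str]:
    """Return the hostname for ip, or None if the lookup fails."""
    try:
        return resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None


def should_manage(has_agent: bool, has_snmp: bool) -> bool:
    # Same rule as the hook described above: no agent and no SNMP means unmanaged.
    return has_agent or has_snmp
```

Passing a different resolver (for example one that asks the proxy node) would be the part that needs server-side support.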

Cheers
#22
Hi,

Looks like our NetXMS server decided to start segfaulting after the upgrade to 2.1-RC1.
Reading through the dump and not being a developer, I have no idea what the underlying cause is, so here goes:

*** Error in `netxmsd': malloc(): smallbin double linked list corrupted: 0x0000000035444410 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f6f596457e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x81d61)[0x7f6f5964fd61]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xba)[0x7f6f5965221a]
/usr/lib/x86_64-linux-gnu/libnetxms.so.2(_ZN11NXCPMessageC1EP12NXCP_MESSAGEi+0x227)[0x7f6f599c3b47]
/usr/lib/x86_64-linux-gnu/libnxsrv.so.2(_ZN15AgentConnection14receiverThreadEv+0x592)[0x7f6f59c20b72]
/usr/lib/x86_64-linux-gnu/libnxsrv.so.2(_ZN15AgentConnection21receiverThreadStarterEPv+0x9)[0x7f6f59c21039]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f6f57a2d6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6f596d482d]


Anyone else seeing similar crashes? Any idea what's causing them?

Cheers
#23
General Support / [Bug] Invalid Zone ID
May 12, 2017, 03:23:48 AM
Hi,

I still seem to be having a problem with moving nodes between zones.
I had described my problem before, but do not have a working solution yet other than deleting nodes and re-adding them (and I couldn't find my own, older post to reply to). Some of this is likely related to https://track.radensolutions.com/issue/NX-1148.
Except now I have issues where I am unable to move a node no matter how long I wait.
The error message used to be about IP address conflicts, now it just tells me "Invalid Zone ID".

And it seems to happen in other, more random, situations, so I am not sure if this is still the same issue.

--- EDIT --- The below part is indeed a different issue. --- EDIT ---
The previous issue generally happened when I wanted to change a setup like this:
- Node A sits in Zone Y. Node A is a server and NetXMS Proxy.
- Node B sits in Zone Z. Node B is a firewall/router.
- Both nodes are in the same private network and usually Node B is the default gateway for Node A.
- Node B is configured with its public IP address in NetXMS for SNMP monitoring.
- Node A is configured with Node B's public IP address in NetXMS for Agent monitoring. Node B has a port forward for 4700 to Node A's private IP.
The above setup works fine, no issues.

Now I want both Node A and Node B to be in the same zone. The main reason being the ability to use Node A as syslog proxy for the entire private network, including the firewall.

Trying to move these into the same zone leads to (or used to lead to) an IP address conflict. The only way I can resolve this is to delete Node B, move Node A into zone Z, then re-add Node B in zone Z. After that everything works fine. Except at that point I usually hit https://track.radensolutions.com/issue/NX-1148, so it can take a while before I can move Node A.
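The conflict described above can be pictured as a lookup keyed by zone and IP address. The Python sketch below is only an illustration of that idea, not how the NetXMS server actually stores addresses. The function name, the index structure and the example address 203.0.113.10 are all assumptions.

```python
from typing import Dict, Iterable, Tuple


def find_conflicts(
    index: Dict[Tuple[str, str], str],
    node: str,
    ips: Iterable[str],
    target_zone: str,
) -> Dict[str, str]:
    """Return {ip: owning node} for addresses already used by another node in target_zone."""
    conflicts = {}
    for ip in ips:
        owner = index.get((target_zone, ip))
        if owner is not None and owner != node:
            conflicts[ip] = owner
    return conflicts


# Node A (zone Y) and Node B (zone Z) are both configured with the same public IP.
index = {
    ("Y", "203.0.113.10"): "Node A",
    ("Z", "203.0.113.10"): "Node B",
}
conflicts = find_conflicts(index, "Node A", ["203.0.113.10"], "Z")
print(conflicts)
```

With the same address in two different zones there is no clash. Moving Node A into zone Z makes the key collide with Node B, which is why Node B has to be deleted first.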

That was my previous issue (although it still is an issue for me).

By now, I do not receive an IP address conflict message. I receive an "Invalid Zone ID" error instead.
That doesn't go away after a couple of hours either and it survives a NetXMS server reboot and nxdbmgr check run.
All I can do now is to delete Node A as well and then re-add it.

In addition, this error has started appearing when trying to move random nodes which did and do not have IP address conflicts between zones. It also doesn't matter into which zone I want to move them - they always come back with "Invalid Zone ID".

Is this an issue in 2.1-M3?
Or do I have some database corruption somewhere which nxdbmgr check doesn't pick up?

Cheers
#24
Hi,

The more nodes and DCIs we have, and of course the more alerts we have in the system, the slower the NetXMS Console is to respond when for example clicking on "Entire Network" or "Infrastructure Services" (or heavily populated subnets/containers). It seems as if the Console is pulling the data for all tabs at that moment already, instead of when I actually click on the relevant tab.

The delay is quite noticeable for us by now (i.e. around 4-6 seconds).

I suggest not loading the data until the relevant tab is clicked upon, as there is no need to introduce this delay unless one actually wants to look at that data.
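The suggestion amounts to lazy loading with a cache. A minimal Python sketch of the pattern follows. It is not console code, and the class and tab names are made up for illustration.

```python
from typing import Callable, Dict, List


class LazyTabs:
    """Loads the data for a tab only the first time that tab is opened."""

    def __init__(self, loaders: Dict[str, Callable[[], List[str]]]):
        self._loaders = loaders
        self._cache: Dict[str, List[str]] = {}
        self.load_count = 0  # number of real (expensive) loads performed

    def open(self, tab: str) -> List[str]:
        if tab not in self._cache:
            self._cache[tab] = self._loaders[tab]()
            self.load_count += 1
        return self._cache[tab]


tabs = LazyTabs({
    "Alarms": lambda: ["alarm 1", "alarm 2"],
    "Last Values": lambda: ["cpu=12", "mem=48"],
})
# Selecting the object creates the tabs, but nothing has been loaded yet.
loads_after_select = tabs.load_count
tabs.open("Alarms")
tabs.open("Alarms")  # second click is served from the cache
```

A real implementation would also need to invalidate the cache when the underlying data changes.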

Cheers
#25
General Support / agent-to-server connections
April 06, 2017, 01:06:01 AM
Hi,

This little line in the Changelog got me all excited: "Experimental agent-to-server connections (agent tunnels)"
But how do I experiment with these experimental connections? :D

Cheers
#26
Hi,

I seem to be having an issue where instances just disappear.
The reason appears to be that if more than one polling method is enabled, unless all of them fail, the node is not considered to be down.

If the node isn't down, it'll happily run Instance Discovery.
Of course if the agent or SNMP do not respond (but the other does, or ICMP may be working and part of polling), then instance discovery will fail to discover any instances.
That's when it seems to wipe any existing instances.

Has anyone else encountered this?
If that's not just some oddity at our end, maybe instance discovery should only run for agent related items when the agent is considered to be up, same for SNMP. If the agent or SNMP do not respond when instance discovery starts, the results should probably be discarded, to avoid losing all history on existing instances?
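The proposed behaviour boils down to telling "the source answered with an empty list" apart from "the source did not answer". A minimal Python sketch of that rule, with a hypothetical function name and `None` standing in for a failed poll:

```python
from typing import Optional, Set


def update_instances(existing: Set[str], discovered: Optional[Set[str]]) -> Set[str]:
    """Return the instance set to keep after one instance discovery run.

    discovered is None when the agent or SNMP did not respond at all.
    """
    if discovered is None:
        # No answer: discard the run and keep what we had, history included.
        return set(existing)
    # A real answer (even an empty one) replaces the previous set.
    return set(discovered)


existing = {"eth0", "eth1"}
print(update_instances(existing, None))
print(update_instances(existing, {"eth0"}))
```

This is a sketch of the suggestion only. How the server currently represents a failed discovery is not something I know.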

Cheers,
Tursiops
#27
Hi,

I did not need to use the MIB Browser in a while as most of our templates are just chugging along nicely.
The other day I had a new device type to add and when I tried to use the MIB Browser the result was that every single OID in a full SNMP walk returned "Hex-STRING".
I then went and tried walking other devices which have working templates and SNMP DCIs applied to them: same result.

From what I read, Hex-STRING is what NetXMS shows when there are non-printable characters in the result.
I used the MIB Browser extensively in the past to create templates, yet now none of the dozen devices I tested return anything but Hex-STRING - in the MIB Browser only. DCIs happily return correct data.

I would have last used the MIB Browser while running 2.1-M1, so I am guessing this to be related to 2.1-M2.
This happens in the console as well as the web interface.
Has anyone else seen the same issue?

Cheers,
Tursiops
#28
Feature Requests / ExternalParameter as another user
March 20, 2017, 11:30:20 PM
Hi,

I was wondering if it would be possible to add a feature where one can select which user an ExternalParameter is run as.

To give an example, I use a PowerShell script to monitor DFS Replication Backlog on Windows. The script works fine, but it cannot run as "SYSTEM". It requires a user with administrative privileges to run it.
Obviously I would not want to run the NetXMS service as such a user.
Storing credentials on the system being queried does not seem like a good idea.
Setting up a specific user and reconfiguring DCOM and other parts of the system to use a locked down account makes setting up monitoring a very manual process.
Same with setting up a scheduled task on the system (which is what I'm doing right now) to collect the data, dump it into a file and then use NetXMS to read the result.
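For reference, the file-based workaround has one weak spot: if the scheduled task stops running, the old value keeps being reported. A small Python sketch of a reader that treats a stale file as "no data" is below. It is not the script in use here, and the function name, file layout (a single value in a UTF-8 text file) and maximum age are assumptions.

```python
import os
import time
from typing import Optional


def read_cached_value(path: str, max_age_seconds: float, now: Optional[float] = None) -> Optional[str]:
    """Return the file's content, or None if the file is missing or too old."""
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return None  # the scheduled task has never written the file
    if now is None:
        now = time.time()
    if now - modified > max_age_seconds:
        return None  # the scheduled task has stopped updating the file
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

A wrapper like this would typically be what the ExternalParameter calls instead of reading the file directly.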

Being able to call the ExternalParameter with an additional username/password which could be configured inside the DCI would allow this to "just work" straight from a template (except for then manually configuring the credentials per system as required - not sure if that would break the Template/DCI relationship).

Cheers,
Tursiops
#29
General Support / netxmsd segfaults
March 06, 2017, 07:07:18 AM
Hi,

On Friday evening, our NetXMS server started segfaulting. I can't pin down what's causing it, but I basically can't run it for more than maybe 15-20 minutes before it crashes again.
I tried to follow the instructions given at https://wiki.netxms.org/wiki/Running_NetXMS_under_debugger to obtain a backtrace, but the result is that the netxmsd service stops responding and so does gdb, i.e. it never returns to the gdb prompt and I can't get a trace. I basically have to open another session and kill the gdb process itself. If I kill netxmsd, gdb is no longer attached to the process and I can't get a trace either.

Anything else I can try to get that elusive backtrace?

Cheers
#30
General Support / NetXMS Event Log Monitoring
February 22, 2017, 02:45:53 AM
Hi,

I've started testing Windows Event Log monitoring via NetXMS.
Created a basic logwatch file looking for two particular backup related events.
In general, this works ok - except sometimes the agent will just crash when a matching event is found.
The matching event is not recorded on the NetXMS server, but I can see the message in the Windows Event Log.
The agent at that point requires a manual restart, which works just fine.
I've automated the restart part and can confirm that any such crash occurs at the same time as one of the two backup messages is logged.
The same event an hour later then works fine.

I have been unable to find a pattern here, it seems completely random. An agent can run for days or just hours.
If the logwatch syntax was at fault, it should either not work or crash every time - but it doesn't.

Not sure if anyone else has seen something like this?

Cheers
#31
General Support / Geolocation & Map Questions
February 03, 2017, 04:43:55 AM
Hi,

I've started putting some location data into NetXMS and also installed an Android agent to test Geolocation History.
The server is currently running 2.1-M1.

Using this left me with a few questions:
- When I add containers to a map (I've added locations to containers as they represent entire sites including multiple devices) and right-click on a container in the map, NetXMS allows me to do some very basic things like "Status Map" or "Software Inventory". I can't do Summary Tables, Event Log or Syslog. However, if I do in fact select any of the available options first, the second time around I suddenly have all other options available to me as well. This was reproducible, presumably a bug?

- Is it possible for NetXMS to automatically obtain the longitude/latitude of a device from a given address rather than having to rely on a GPS receiver or manually entering the data? Guess using the Java API would work for this (having it integrated would of course be a nice touch)?

- If I do have multiple containers (or nodes) at the same location, I can only see one in the map. I can't hover and maybe select one of the other ones or click through them. I can't even see that there is more than one container/node at the location. Not sure if that's something that could be implemented?

Cheers
#32
General Support / DefaultDCIPollingInterval ignored?
January 24, 2017, 04:12:10 AM
Hi,

Using 2.1-M1, I noticed that DCIs which are set to poll at the default interval are polling every 5 minutes - except our DefaultDCIPollingInterval is configured as 60 seconds.

Originally I assumed this was due to the rather high poller queue, so delays were expected, but after fixing that, the DCI results started coming in exactly every five minutes (as opposed to 5-6 minutes).

Is that some (new?) lower limit on DCI pollers?

EDIT: This is causing some issues for us, as alerts which used to be triggered after 60 threshold violations are now no longer alerting after an hour, but after 5 hours. Changing the DefaultDCIPollingInterval to 120 doesn't seem to have made a difference - all default DCIs are still checking every 5 minutes.
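The arithmetic behind the 1 hour versus 5 hours, as a quick Python check (the function name is just for illustration):

```python
def seconds_until_alert(violations_required: int, polling_interval_seconds: int) -> int:
    # An alert fires only after this many consecutive polls have violated the threshold.
    return violations_required * polling_interval_seconds


print(seconds_until_alert(60, 60) / 3600)   # 1.0 hour at the configured 60 s interval
print(seconds_until_alert(60, 300) / 3600)  # 5.0 hours at the observed 5 minute interval
```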

Cheers
#33
Feature Requests / syslog table Index
January 22, 2017, 11:15:06 PM
Hi,

I noticed that when I do a search in Syslog, it appears reasonably fast.
But if I select a device, right-click and select Syslog it can take forever to load anything.

So I had a look at the database and found the table only has an index on the msg_timestamp (and msg_id) column, but not the source_object_id one.
Of course I could add one manually (and did for now, which sped things up significantly), but that might cause issues with future NetXMS database upgrades down the track?
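To show the effect of such an index without touching a production database, here is a Python sketch using an in-memory SQLite table as a stand-in. The real NetXMS schema and database engine differ: only `msg_id`, `msg_timestamp` and `source_object_id` are taken from the post, while `msg_text` and both index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE syslog ("
    " msg_id INTEGER PRIMARY KEY,"
    " msg_timestamp INTEGER NOT NULL,"
    " source_object_id INTEGER NOT NULL,"
    " msg_text TEXT)"
)
# Starting point described above: only the timestamp is indexed.
conn.execute("CREATE INDEX idx_syslog_timestamp ON syslog (msg_timestamp)")


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's plan for the per-node syslog lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM syslog WHERE source_object_id = ?", (42,)
    ).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the plan text


plan_before = query_plan(conn)
conn.execute("CREATE INDEX idx_syslog_source ON syslog (source_object_id)")
plan_after = query_plan(conn)

print(plan_before)
print(plan_after)
```

Before the extra index the plan is a full table scan. Afterwards it searches via `idx_syslog_source`, which matches the speed-up seen on the real system.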

Cheers
#34
General Support / Syslog Monitor - Filter
December 16, 2016, 06:07:20 AM
Hi,

I did some searching to confirm this, but it appears the Filter in the Syslog Monitor (NetXMS Management Console) is very basic in that it does not accept regular expressions or boolean things like "hostname AND deny"?
I know I can do this and a lot more in the actual syslog View, but sometimes just being able to filter things out on the fly as they come through is all that's required (regex would be a winner), as opposed to having to re-execute a search query over and over while looking at the logs as they come in.
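As an illustration of the kind of on-the-fly filtering meant here, a short Python sketch with both variants: a regular expression, and a plain "all of these terms" match for the `hostname AND deny` case. The function names and sample messages are invented.

```python
import re
from typing import Iterable, List


def filter_regex(lines: Iterable[str], pattern: str) -> List[str]:
    """Keep the lines that match the regular expression (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line for line in lines if regex.search(line)]


def filter_all_terms(lines: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Keep the lines that contain every term, i.e. 'term1 AND term2'."""
    lowered = [term.lower() for term in terms]
    return [line for line in lines if all(term in line.lower() for term in lowered)]


sample = [
    "fw01 kernel: DENY tcp 10.0.0.5 -> 10.0.0.9",
    "fw01 kernel: ACCEPT tcp 10.0.0.5 -> 10.0.0.9",
    "sw02 link up on port 3",
]
print(filter_all_terms(sample, ["fw01", "deny"]))
print(filter_regex(sample, r"^fw\d+ .*\b(deny|drop)\b"))
```

Both calls keep only the first sample line.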

Did I miss something in the documentation or is this currently not possible?

Cheers
#35
General Support / Source Host in Syslog Proxy in 2.1-M1
December 14, 2016, 12:21:16 AM
Hi,

Reading that 2.1 includes a Syslog Proxy, I just had to give this a spin. :)

Doing this, I encountered an issue with the server matching the messages to the correct node. I am using zoning, which probably plays a part in this.

Node A is a router at a site.
Node B is the proxy node.
The site has a single public IP address, so to connect the proxy node I have to create a port forward on that IP address.
That also means I cannot add Node A and Node B into the same zone (IP conflict). Node B is therefore in the Default zone, while Node A is in that site's zone.

I reconfigured Node B to act as Syslog proxy and reconfigured Node A to send syslog messages to Node B.

The result was that the messages were linked to Node C - a completely unrelated router which is sitting in the default zone and happens to have the same internal IP address as Node A.

Based on the above, my guess was that any syslog messages coming in from or proxied through Node B are automatically placed in the Default zone and then matched as per the server's SyslogNodeMatchingPolicy (in my case 0, i.e. IP, then hostname - but being in the wrong zone, the order would not matter).

So I moved Node A into its own zone and changed its IP in NetXMS to its public IP, syslog was reconfigured to send directly to the NetXMS server.
Node B was moved into the site's zone.

That should fix all other devices on the network sending syslog through Node B (haven't tested this yet) while the router's syslog goes straight to the NetXMS server.

Next problem now: the server has two systems with the same public IP address and can't tell where the messages are actually coming from.
I changed the SyslogNodeMatchingPolicy from 0 to 1 and restarted the NetXMS server, but that made no difference. Clearly the hostname matching isn't working in this case. I am not even sure which hostname it's comparing? The Object or the Primary host name? Changing the Object name made no difference.
I need to use an FQDN for the Primary host name to be able to query the router, but the router in question only sends the hostname in the syslog message. If I put an FQDN in as hostname on the router itself, it appears to ignore everything from the first "." onwards when it adds it to syslog messages. Other devices do not even allow hostnames longer than maybe 16 characters. Looks like I've hit a dead end?

Is there a way to set up "rules" to handle assigning syslog messages to devices?
How do other users handle this?

Maybe a future solution would be for NetXMS to ignore the actual IP/hostname presented for data collection and only use the interface IP addresses for IP conflict, topology and syslog checks, considering that the IP used to query the proxy node is not actually on the proxy node?
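The idea in the last paragraph can be sketched as matching on the proxy's zone plus the sender's interface address, instead of the address used to reach the node. The Python below is a sketch of that proposal only, not of how NetXMS currently matches syslog sources. The class, function and zone names and the address 192.168.1.1 are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    zone: str
    interface_ips: FrozenSet[str]


def match_syslog_source(nodes: Iterable[Node], proxy_zone: str, source_ip: str) -> Optional[str]:
    """Find the sender among the nodes of the zone the syslog proxy belongs to."""
    for node in nodes:
        if node.zone == proxy_zone and source_ip in node.interface_ips:
            return node.name
    return None


nodes = [
    Node("Node A", "Site", frozenset({"192.168.1.1"})),     # the site router
    Node("Node C", "Default", frozenset({"192.168.1.1"})),  # unrelated router, same internal IP
]
print(match_syslog_source(nodes, "Site", "192.168.1.1"))
```

With the proxy's zone taken into account, the message is attributed to Node A rather than Node C.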

Cheers
#36
General Support / Email alerts in 2.1-M1 not working?
December 07, 2016, 08:24:26 AM
Hi,

Email alerts on our system stopped working after we upgraded to NetXMS 2.1-M1 (which required an upgrade to Ubuntu 16 for JDK8 and was followed by an upgrade of Postgres to 9.5 - so plenty of changes).

The NetXMS configuration was pretty basic and meant to send emails through localhost. Manually sending via telnet works without a problem.
I increased debug logging on the server, monitored Postfix logs as well as NetXMS and found that Postfix does not even see a connection attempt from NetXMS.
NetXMS on the other hand gives me this (email address/server name/DCI data replaced):

[07-Dec-2016 17:13:48.650] [DEBUG] *actions* Executing action 4 ([Email] Standard Alert) of type SEND EMAIL
[07-Dec-2016 17:13:48.650] [DEBUG] *actions* Sending mail to EMAIL_ADDRESS: ""DCI_DESCRIPTION" is in state "NORMAL" - (Parameter: DCI_PARAMETER)"
[07-Dec-2016 17:13:48.655] [DEBUG] SMTP(0x7f5d6c2a3390): Failed to send e-mail, remaining retries: 4
[07-Dec-2016 17:13:48.656] [DEBUG] SMTP(0x7f5d6c2a3390): Failed to send e-mail, remaining retries: 3
[07-Dec-2016 17:13:48.656] [DEBUG] SMTP(0x7f5d6c2a3390): Failed to send e-mail, remaining retries: 2
[07-Dec-2016 17:13:48.656] [DEBUG] SMTP(0x7f5d6c2a3390): Failed to send e-mail, remaining retries: 1
[07-Dec-2016 17:13:48.657] [DEBUG] SMTP(0x7f5d6c2a3390): Failed to send e-mail, remaining retries: 0
[07-Dec-2016 17:13:48.657] [DEBUG] EVENT SYS_SMTP_FAILURE [22] (ID:4307970 F:0x0001 S:1 TAG:"") FROM SERVER_HOSTNAME: Unable to send e-mail to <EMAIL_ADDRESS>: Unable to resolve SMTP server name


I am not sure what there is to resolve for localhost or the IP 127.0.0.1 (I tested that as well). But either way, localhost does resolve locally without problems and telnet to localhost on port 25 works fine.
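One more check that can be run under the same account as netxmsd, to rule out a resolver problem specific to that environment. This is a generic Python sketch using the standard `socket` module. It says nothing about how NetXMS itself resolves the SMTP server name, and the function name is made up.

```python
import socket


def can_resolve(host: str, port: int = 25) -> bool:
    """Return True if the local resolver yields at least one address for host."""
    try:
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        return False


print(can_resolve("127.0.0.1"))
print(can_resolve("localhost"))
```

If both succeed for the service account, the resolver is probably fine and the failure is more likely inside the server's own SMTP code or configuration.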

Has anyone else run into this?
#37
General Support / NetXMS 2.1 Windows Agent - No vmgr.nsm?
December 01, 2016, 02:02:23 PM
Hi,

I thought I'd give the Hypervisor subagent a spin on a Hyper-V server, but then noticed that vmgr.nsm isn't actually in the (2.1-M1) agent package?
Where can I get this from?

Cheers
#38
Hi,

Would it be possible to implement something like a field to enter date/time (or a field for date and a 24h slider for time) to basically go back in time and show all "Last Values" for a node as they were at that time? That would make it possible to just select a random point in time and see the status of everything on that node at that moment as NetXMS saw it.
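In terms of the collected data, the request is: for every DCI, take the newest sample whose timestamp is not later than the chosen moment. A Python sketch of that lookup, assuming each DCI's history is a list of `(timestamp, value)` pairs sorted by timestamp (the data layout and names are assumptions):

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Sample = Tuple[int, str]  # (unix timestamp, value)


def value_at(samples: List[Sample], when: int) -> Optional[str]:
    """Return the newest value collected at or before `when`, or None if there is none."""
    timestamps = [timestamp for timestamp, _ in samples]
    position = bisect_right(timestamps, when)
    return samples[position - 1][1] if position else None


def last_values_at(history: Dict[str, List[Sample]], when: int) -> Dict[str, Optional[str]]:
    return {dci: value_at(samples, when) for dci, samples in history.items()}


history = {
    "CPU usage": [(100, "12"), (160, "35"), (220, "80")],
    "Free memory": [(130, "2048")],
}
print(last_values_at(history, 200))
```

For the moment `200` this gives `35` for CPU usage and `2048` for free memory.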

Cheers
#39
Hi,

I have encountered this several times now:
- A system has all its DCIs applied via templates. The templates are configured to remove the DCIs if the node no longer matches a certain condition
- The device becomes unreachable for an extended period of time (as in hours or even days). Reasons can be internet connection problems, dead routers, the NetXMS agent not running for whatever reason or a server being shut down "temporarily".
- Eventually the NetXMS server will remove the templates from the node.

I originally assumed that the server would not remove the templates from an unreachable node for that reason: it is unreachable. It therefore cannot determine if the condition that led to the template being applied is still valid or not. If I try to run a manual Configuration Poll in such a scenario, it actually tells me it is not going to poll, because the node is unreachable - and then it goes ahead and wipes the template anyway. (That one just happened to me today)

I am not sure how I can configure the templates so they are neither removed nor assigned when a node is actually unreachable?

Any ideas? Or is this a misconfiguration or bug of sorts?
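The behaviour I expected can be written down as a small decision table. The Python below is only a sketch of that expectation, not of what the server does. The enum and function names are hypothetical.

```python
from enum import Enum


class TemplateAction(Enum):
    APPLY = "apply"
    REMOVE = "remove"
    KEEP = "keep"


def decide(reachable: bool, matches_condition: bool, currently_applied: bool) -> TemplateAction:
    """Decide what to do with an auto-applied template during a configuration poll."""
    if not reachable:
        # The condition cannot be evaluated reliably, so change nothing.
        return TemplateAction.KEEP
    if matches_condition and not currently_applied:
        return TemplateAction.APPLY
    if not matches_condition and currently_applied:
        return TemplateAction.REMOVE
    return TemplateAction.KEEP


# Unreachable node that currently has the template: expected result is KEEP.
print(decide(reachable=False, matches_condition=False, currently_applied=True))
```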

Thanks
#40
Hi,

We are using events to send emails to another system which parses the emails based on a number of rules and puts it all into a ticketing system.
As we are setting up rules, we sometimes find that some alerts are not parsed or ticketed properly. That's obviously not a NetXMS problem. :)

However, I'd like to be able to "force" an alert to be resent. When I try to just resolve the alert (we tend to sticky acknowledge them to avoid being spammed while we look at things) and then terminate it, the system will still not send a new alert out. I would have thought that when an alert is terminated while the underlying condition has not actually changed, a new alert would be triggered. But that does not seem to be the case.

I can clear the collected data for the DCI in question, but the "Threshold" column will still show me the threshold that was triggered originally. And when new data comes in that still matches that, it doesn't change nor does it send another alert. Is there any way to "clear" that Threshold column for an individual DCI to ensure a new alert goes out?

Any ideas?