Concepts

Architecture overview

The system has three-tier architecture: the information is collected by monitoring agents (either our own high-performance agents or SNMP agents) and delivered to monitoring server for processing and storage. Network administrator can access collected data using cross-platform Destkop Management Client, Web Management Client or Management application for Android. Desktop and Web clients have almost the same functionality and the same user interface.

_images/architecture_scheme.png

Objects

All monitored network infrastructure is represented as a set of objects in NetXMS monitoring system. Each object represents one physical or logical entity (e.g. host or network interface), or group of them (e.g. subnet, container). Objects are organized into hierarchical structure. Each object has it’s own access rights. Access rights are applied hierarchically on all children of object. For example if Read access right is granted to a user on a Container, then user has Read right on all objects that this Container contains.

Every object has set of attributes; some of them exist for all objects (like id and name or status), while other depend on object class – for example, only Node objects have attribute SNMP community string. In addition to the above mentioned attributes, it’s possible to define custom attributes. This can be done by user in the Management Client, from NXSL script or by external application via NetXMS API.

NetXMS has seven top level objects – Entire Network, Service Root (named “Infrastructure Services” after system installation), Template Root, Asset Root, Network Map Root, Dashboard Root and Business Service Root. These objects serve as an abstract root for an appropriate object tree. All top level objects have only one editable attribute – name.

Object Class

Description

Valid Child Objects

Entire Network

Abstract object representing root of IP topology tree. All zone and subnet objects located under it. System can have only one object of this class.

  • Zone (if zoning enabled)

  • Subnet (if zoning disabled)

Zone

Object representing group of (usually interconnected) IP networks without overlapping addresses. Contains appropriate subnet objects.

  • Subnet

Subnet

Object representing IP subnet. Typically objects of this class are created automatically by the system to reflect system’s knowledge of IP topology. The system places Node objects inside an appropriate Subnet object based on an interface configuration. Subnet objects have only one editable attribute - Name.

  • Node

Service Root

Abstract object representing root of your infrastructure service tree. System can have only one object of this class. After system installation it is named “Infrastructure Services”.

  • Cluster

  • Chassis

  • Condition

  • Container

  • Node

  • Sensor

  • Subnet

  • Rack

Container

Grouping object which can contain any type of objects that Service Root can contain. With help of container objects you can build object’s tree which represents logical hierarchy of IT services in your organization.

  • Cluster

  • Chassis

  • Condition

  • Container

  • Node

  • Sensor

  • Subnet

  • Rack

Cluster

Pseudo-object defining any process: technological or logical that aggregates information from several separate nodes. See Cluster monitoring for more information.

  • Node

Rack

Object representing a rack. It has the same purpose as container, but allows to configure visual representation of equipment installed in a rack.

  • Node

  • Chassis

Chassis

Object representing a chassis, e.g. a blade server enclosure. Chassis can be configured as a part of a rack.

  • Node

Condition

Object representing complicated condition – like “cpu on node1 is overloaded and node2 is down for more than 10 minutes”. Conditions may represent more complicated status checks because each condition can have a script attached. Interval for evaluation of condition status is configured in Server Configuration Variables as ConditionPollingInterval with default value 60 seconds.

Node

Object representing physical host or network device (such as a router or network switch). These objects can be created either manually by administrator or automatically during network discovery process. They have a lot of attributes controlling all aspects of interaction between NetXMS server and managed node. For example, the attributes specify what data must be collected, how node status must be checked, which protocol versions to use, etc. Node objects contain one or more interface objects. The system creates interface objects automatically during configuration polls.

  • Interface

  • Access point

  • Network Service

  • VPN Connector

Interface

Interface objects represent network interfaces of managed computers and devices. These objects created automatically by the system during configuration polls or can be created manually by user.

Access point

Object representing wireless network access point. A node can have several access points, e.g. 2.4Ghz and 5Ghz, or in case of thin wireless access points managed by a central controller. These objects are created automatically by the system.

Network Service

Object representing network service running on a node (like http or ssh), which is accessible online (via TCP IP). Network Service objects are always created manually. Currently, the system works with the following protocols - HTTP, POP3, SMTP, Telnet, SSH and Custom protocol type.

VPN Connector

Object representing VPN tunnel endpoint, is used for interfaceless tunnels (like ipsec). Such objects can be created to add VPN tunnels to network topology known to NetXMS server. VPN Connector objects are created manually. In case if there is a VPN connection linking two different networks open between two firewalls that are added to the system as objects, a user can create a VPN Connector object on each of the firewall objects and link one to another. The network topology will now show that those two networks are connected and the system will take this condition into account during problem analysis and event correlation.

Template Root

Abstract object representing root of your template tree.

  • Template

  • Template Group

Template Group

Grouping object which can contain templates or other template groups.

  • Template

  • Template Group

Template

Data collection template. See Data collection section for more information about templates.

  • Mobile Device

  • Node

Asset Root

Abstract object representing root of hardware asset management tree.

  • Asset

  • Asset group

Asset Group

Grouping object which can contain assets or other asset group.

  • Asset

  • Asset group

Asset

Hardware management asset

Network Map Root

Abstract object representing root of your network map tree.

  • Network Map

  • Network Map Group

Network Map Group

Grouping object which can contain network maps or other network map groups groups.

  • Network Map

  • Network Map Group

Network Map

Network map.

Dashboard Root

Abstract object representing root of your dashboard tree.

  • Dashboard

  • Dashboard Group

Dashboard Group

Grouping object which can contain dashboards or other dashboard group

  • Dashboard

  • Dashboard Group

Dashboard

Dashboard. Can contain other dashboards.

  • Dashboard

Business Service Root

Abstract object representing root of your business service tree. System can have only one object of this class.

  • Business Service

  • Business Service Prototype

Business Service

Object representing single business service. Can contain other business services or business service prototypes.

  • Business Service

  • Business Service Prototype

Business Service Prototype

Prototype from which business service objects are automatically populated.

Object status

Each object has a status. Status of an object calculated based on:

  • Polling results

  • Status of child objects (e.g. interfaces of node, nodes under container)

  • Active alarms, associated with the object (after an alarm is resolved or terminated, it no longer affects object status)

  • Value of status DCIs (DCI that has Use this DCI for node status calculation property enabled)

There are multiple options for status calculation, see Status calculation for more information.

For some object classes, like Report or Template, status is irrelevant. Status for such objects is always Normal. Object’s status can be one of the following:

Nr.

Status

Description

0

NORMAL Normal

Object is in normal state.

1

WARNING Warning

Warning(s) exist for the object.

2

MINOR Minor

Minor problem(s) exist for the object.

3

MAJOR Major

Major problem(s) exist for the object.

4

CRITICAL Critical

Critical problem(s) exist for the object.

5

UNKNOWN Unknown

Object’s status is unknown to the management server.

6

UNMANAGED Unmanaged

Object is set to “unmanaged” state.

7

DISABLED Disabled

Object is administratively disabled (only applicable to interface objects).

8

TESTING Testing

Object is in testing state (only applicable to interface objects).

Unmanaged status

Objects can be unmanaged. In this status object is not polled, DCIs are not collected, no data is updated about object. This status can be used to store data about an object that is temporary or permanently unavailable or not managed.

Maintenance mode

This is special status, that’s why it is not included in above status list. This status prevents event processing for specific node. While this node in maintenance mode is still polled and DCI data is still collected, but no event is generated.

Event Processing

NetXMS is event based monitoring system. Events can come from different sources (polling processes (status, configuration, discovery, and data collection), SNMP traps, and directly from external applications via client library). All events all are forwarded to NetXMS Event Queue.

NetXMS Event Processor can process events from Event Queue in either sequential or parallel mode. In sequential mode events are processed one-by-one. Parallel processing mode allows to process events in several parallel threads, thus increasing processing performance. See Event processing for more information.

Events in the Event Queue are processed according to rules defined in Event Processing Policy. As a result of event processing, preconfigured actions can be executed, and/or event can be shown up as alarm.

Usually alarm represents something that needs attention of network administrators or network control center operators, for example low free disk space on a server. NetXMS provides one centralized location, the Alarm Browser, where alarms are visible. It can be configured which events should be considered important enough to show up as alarm.

_images/event_flow.png

Event flow inside the monitoring system

Polling

For some type of objects NetXMS server start gathering status and configuration information as soon as they are added to the system. These object types are: nodes, access points, conditions, clusters, business services, zones (if a zone has more then one proxy, proxy health check is being performed). This process called polling. There are multiple polling types, each having specific execution intervals (set by server configuration variables). In the end of polling process hook script is being executed.

Type

Purpose

Interval server configuration variable

Hook script

Status

Determine current status of an object

Objects.StatusPollingInterval

Hook::StatusPoll

Configuration

Determine current configuration of an object (list of interfaces, supported protocols, etc.) By default executes auto bind scripts for templates and containers, use “Objects.AutobindOnConfigurationPoll” server configuration variable to disable.

Objects.ConfigurationPollingInterval

Hook::ConfigurationPoll

Configuration (full)

Same as usual configuration poll but resets previously detected capabilities and detects them again. (can only be executed manually)

Interface Names

Updates names of the interfaces. This operation also happens during Configuration Poll. (can only be executed manually)

Topology

Gather information related to network link layer topology

Topology.PollingInterval

Hook::TopologyPoll

Routing

Gather information about IP routing (cannot be executed manually)

Topology.RoutingTableUpdateInterval

ICMP

Ping nodes and gather response time statistics (cannot be executed manually)

ICMP.PollingInterval

Instance Discovery

Perform Instance Discovery to add/remove DCIs

DataCollection.InstancePollingInterval

Hook::InstancePoll

Automatic Binding

Checks and bind or unbind Containers, Templates and Context Dashboards to nodes according to auto-bind script.

Objects.AutobindPollingInterval

Network Discovery

Searches for new nodes by polling information about neighbor IP addresses from known nodes

NetworkDiscovery.PassiveDiscovery.Interval

Hook::DiscoveryPoll

Polling intervals can be set for specific objects by adding a custom attribute named SysConfig:nnn, where nnn is the name of server configuration variable e.g.: SysConfig:Objects.ConfigurationPollingInterval.

Data Collection

From each node NetXMS can collect one or more metrics which can be either single-value (e.g. “CPU.Usage”), list (e.g. “FileSystem.MountPoints”) or table (e.g. “FileSystem.Volumes”). When new data sample is collected, it’s value is checked against configured thresholds. This documentation use term Data Collection Item (DCI) to describe configuration of metric collection schedule, retention, and thresholds.

Metrics can be collected from multiple data sources:

Source

Description

Internal

Data generated inside NetXMS server process (server statistics, etc.)

NetXMS Agent

Data is collected from NetXMS agent, which should be installed on target node. Server collect data from agent based on schedule.

SNMP

SNMP transport will be used. Server collect data based on schedule.

Web service

Data is objained from JSON, XML, or plain text retrieved via HTTP

Push

Values are pushed by external system (using nxpush or API) or from NXSL script.

Windows Performance counters

Data is collected via NetXMS agent running on Windows machine.

Script

Value is generated by NXSL script. Script should be stored in Script Library.

SSH

Data is obtained from output of ssh command executed through SSH connection.

MQTT

Data is obtained by subcribing to MQTT broker topics.

Network Device Driver

Some SNMP drivers (NET-SNMP, RITTAL as of NetXMS v. 3.8) provide parameters for data collection. E.g. NET-SNMP provides information about storage this way.

Discovery

Network discovery

NetXMS can detect new devices and servers on the network and automatically create node objects for them. Two modes are available – passive and active.

In passive mode server will use only non-intrusive methods by querying ARP and routing tables from known nodes. Tables from the server running NetXMS are used as seed for passive discovery.

In active mode in addition to passive scan methods configured address ranges are periodically scanned using ICMP echo requests.

NetXMS can also use SNMP trap and syslog messages as seed for discovery.

Instance discovery

NetXMS can create metrics names for Data Collection Item automatically. Instance discovery collects information about node instances like disk mountpoints, device list, etc. and automatically creates or removes DCIs with obtained data.

Security

All communications are encrypted using either AES-256, AES-128, or Blowfish and authenticated. As additional security measure, administrator can restrict list of allowed ciphers.

Agent authenticate incoming connections using IP white list and optional preshared key.

User passwords (if internal database is used) as hashed with salt with SHA-256.

All shared secrets and passwords stored in the system can be obfuscated to prevent snooping.