Acceptable values for server queues and their dependencies

Started by Lukas, March 04, 2014, 02:59:42 PM


Lukas

Hi,

Are there any known or recommended values for the DCIs monitored on the management node? Are any of the following parameters worth watching and having a threshold alert defined?

So far I was able to come up with the following "rules", but I am not sure if they are correct. Some data is also missing:
- "Configuration poller queue for last minute" -- depends on server configuration value NumberOfConfigurationPollers
- "Status poller queue for last minute" -- depends on server configuration value NumberOfStatusPollers, which should be about 1/10th of the number of nodes.
- "Average time to queue DCI for polling for last minute" -- will increase if DCI polling takes more time. The rule should be that (average time * number of DCIs) / NumberOfDataCollectors < DCI polling interval
- "Database writer's request queue (other queries) for last minute" -- this and the following two are influenced by the NumberOfDatabaseWriters server configuration value, but I am not sure when this value needs to be increased.
- "Database writer's request queue (DCI data) for last minute"
- "Database writer's request queue for last minute"
- "Data collection poller's request queue for last minute" -- probably depends on NumberOfDataCollectors and "Average time to queue DCI ..."
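The rule proposed above for "Average time to queue DCI for polling for last minute" can be written as a quick check. This is only a sketch of that proposed rule, not a confirmed sizing formula: it assumes the average time and the polling interval are both in seconds, and the function and parameter names are hypothetical, not part of NetXMS.

```python
def collectors_keep_up(avg_time_seconds, dci_count, data_collectors,
                       polling_interval_seconds):
    """Check the proposed rule:
    (average time * number of DCIs) / NumberOfDataCollectors < polling interval.

    All names here are illustrative; units are assumed to be seconds.
    """
    if data_collectors <= 0:
        raise ValueError("data_collectors must be positive")
    # Estimated time for all collectors together to get through every DCI once.
    time_per_cycle = avg_time_seconds * dci_count / data_collectors
    return time_per_cycle < polling_interval_seconds


if __name__ == "__main__":
    # Made-up numbers: 10000 DCIs, 0.05 s each, 60 s polling interval.
    print(collectors_keep_up(0.05, 10000, 10, 60))
    print(collectors_keep_up(0.05, 10000, 5, 60))
```

With these made-up numbers, 10 collectors would need about 50 seconds per cycle and pass the check, while 5 collectors would need about 100 seconds and fail it.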

Can someone add more info from larger deployments about acceptable values?

Thanks and Regards,
Lukas

Victor Kirhenshtein

Hi!

In general, all those values should be kept around zero, or on some stable level. It is normal to have occasional spikes on any of them, but the average over 10-15 minutes should be low and not growing over time. Persistently high or growing values for those DCIs could indicate:

"Configuration poller queue for last minute" - there are not enough configuration pollers, or configuration polls take an unusually long time. One common reason is timeouts because some communications are blocked by firewalls. It is recommended to disable agent and/or SNMP polling for nodes where they are not needed. Since 1.2.13 it is also possible to disable the Check Point SNMP agent check globally - this could speed up configuration polling.

"Status poller queue for last minute" - there are not enough status pollers, or the status polling interval should be increased.

"Average time to queue DCI for polling for last minute" - usually caused by excessive waits on different internal locks. Sometimes this is a sign of a bug in the server.

"Database writer's request queue for last minute" (all 3 variants) - indicates that there are more outstanding SQL INSERT or UPDATE requests than the database can handle. If the database itself can handle more load but the database connection is the bottleneck, increasing the number of database writers will help.

"Data collection poller's request queue for last minute" - the server cannot collect data fast enough. This may happen when monitored nodes respond too slowly, or the data collection interval is set too low.
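The general guideline above (occasional spikes are fine, but the 10-15 minute average should be low and not growing) could be turned into an alert check roughly like this. It is a minimal sketch, not anything built into the server: the function name, the way the window is split in half to detect growth, and both default limits are assumptions that would need tuning per installation.

```python
from statistics import mean


def queue_looks_healthy(samples, max_average=5.0, max_growth=1.0):
    """Evaluate one queue DCI over a 10-15 minute window.

    samples: per-minute values, oldest first.
    max_average, max_growth: arbitrary placeholder limits, not recommended values.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    half = len(samples) // 2
    older, newer = samples[:half], samples[half:]
    # A single spike barely moves the window average, so it is tolerated.
    average_ok = mean(samples) <= max_average
    # Growth is estimated by comparing the newer half of the window to the older half.
    not_growing = mean(newer) - mean(older) <= max_growth
    return average_ok and not_growing


if __name__ == "__main__":
    print(queue_looks_healthy([0, 0, 0, 12, 0, 0, 0, 0, 0, 0]))  # one spike
    print(queue_looks_healthy([0, 0, 0, 0, 0, 2, 3, 4, 5, 6]))   # steady growth
```

A queue that sits at a constant non-zero value passes as long as it stays under the average limit, which matches the "some stable level" case.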

Best regards,
Victor


Lukas

Thank you, Victor, for the detailed explanation. It's very helpful.