
about housekeeper

Started by multix, June 25, 2015, 03:33:09 AM

multix

Hi.
If I understand correctly, the housekeeper deletes DCI records that are older than the retention time.

I have a big system and want to record nearly all available information about it, so there are many DCIs and the polling frequency is high.

My NetXMS installation works well, but after 02:00 (the housekeeper's start time) the system's DB write performance becomes very slow. When I look at the queues, I see a lot of data in the Database Writers sections (IData, Raw Data and others). Sometimes this queue goes down quickly, but sometimes it does not reach 0 until nearly 20:00. Until the queues drain to 0, my dashboards do not display completely.

I am using PostgreSQL on Linux, and the PostgreSQL tuning is OK. My server is an HP ProLiant DL380 Gen6 with 2 CPUs, 32 GB RAM and an HP 410i RAID card with 256 MB of cache. I am using 4 HDDs in RAID. The machine runs a virtualization platform with only one VM (the NetXMS server); sometimes I try risky changes, and in a VM I can quickly back up and restore. :)
Looking at the disk performance of the machine, it can do about 120 MB/s, but actual usage is about 5.5 MB/s.

Is this problem caused by the housekeeper, by my hardware, or by something else?
Does anybody have any ideas?

Thanks.

Alex Kirhenshtein

What version of NetXMS are you using? This was a known bottleneck, and we did a major optimisation in 2.0-M3.

multix

My server version is 2.0-M5.

Alex Kirhenshtein

Can you please enable slow query logging (>1 sec, for example) and post the EXPLAIN output here (or send it to [email protected])?
If it's a DELETE query (most likely), you can run it in a transaction and roll it back:
BEGIN;
EXPLAIN ANALYZE DELETE FROM idata_...;
ROLLBACK;
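
For reference, a minimal sketch of enabling slow query logging in postgresql.conf (the 1000 ms threshold below is just an example of ">1 sec"; the log_line_prefix is optional):

# postgresql.conf: log any statement that runs longer than 1000 ms
log_min_duration_statement = 1000
# optional: prefix log lines with timestamp and process id
log_line_prefix = '%t [%p] '

-- reload the configuration without restarting the server
SELECT pg_reload_conf();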

multix

Here are some of them:

LOG:  duration: 38916.787 ms  statement: DELETE FROM idata_4087 WHERE (item_id=4715 AND idata_timestamp<1420069290) OR (item_id=4716 AND idata_timestamp<1420069290) OR (item_id=4835 AND idata_timestamp<1420069290) OR (item_id=799 AND idata_timestamp<1433029290) OR (item_id=4830 AND idata_timestamp<1420069290)
LOG:  duration: 25737.620 ms  statement: DELETE FROM idata_4090 WHERE (item_id=4699 AND idata_timestamp<1420069329) OR (item_id=4698 AND idata_timestamp<1420069329) OR (item_id=4909 AND idata_timestamp<1420069329) OR (item_id=4904 AND idata_timestamp<1420069329) OR (item_id=800 AND idata_timestamp<1433029329)
LOG:  duration: 30253.926 ms  statement: DELETE FROM idata_4093 WHERE (item_id=4752 AND idata_timestamp<1420069355) OR (item_id=4751 AND idata_timestamp<1420069355) OR (item_id=4881 AND idata_timestamp<1420069355) OR (item_id=4888 AND idata_timestamp<1420069355) OR (item_id=801 AND idata_timestamp<1433029355)
LOG:  duration: 25967.450 ms  statement: DELETE FROM idata_4094 WHERE (item_id=4621 AND idata_timestamp<1420069385) OR (item_id=4622 AND idata_timestamp<1420069385) OR (item_id=4900 AND idata_timestamp<1420069385) OR (item_id=4890 AND idata_timestamp<1420069385) OR (item_id=802 AND idata_timestamp<1433029385)
LOG:  duration: 31979.821 ms  statement: DELETE FROM idata_4095 WHERE (item_id=4675 AND idata_timestamp<1420069411) OR (item_id=4674 AND idata_timestamp<1420069411) OR (item_id=4878 AND idata_timestamp<1420069411) OR (item_id=4867 AND idata_timestamp<1420069411) OR (item_id=803 AND idata_timestamp<1433029411)
LOG:  duration: 11051.021 ms  statement: DELETE FROM idata_4103 WHERE (item_id=2013 AND idata_timestamp<1420069443) OR (item_id=2082 AND idata_timestamp<1420069443) OR (item_id=804 AND idata_timestamp<1433029443) OR (item_id=2050 AND idata_timestamp<1420069443) OR (item_id=2012 AND idata_timestamp<1420069443)
LOG:  duration: 6661.263 ms  statement: COMMIT
LOG:  duration: 7484.775 ms  statement: COMMIT
LOG:  duration: 6031.884 ms  statement: COMMIT
LOG:  duration: 3175.104 ms  statement: UPDATE thresholds SET current_state=0 WHERE threshold_id=46785
LOG:  duration: 36858.129 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 3999.100 ms  statement: COMMIT
LOG:  duration: 1374.970 ms  statement: COMMIT
LOG:  duration: 1874.300 ms  statement: COMMIT
LOG:  duration: 2345.613 ms  statement: UPDATE thresholds SET current_state=0 WHERE threshold_id=46783
LOG:  duration: 2345.286 ms  statement: UPDATE thresholds SET current_state=0 WHERE threshold_id=49659
LOG:  duration: 2345.726 ms  statement: UPDATE thresholds SET current_state=0 WHERE threshold_id=49047
LOG:  duration: 26901.621 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 2021.212 ms  statement: COMMIT
LOG:  duration: 17929.845 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 2733.314 ms  statement: COMMIT
LOG:  duration: 2649.563 ms  statement: COMMIT
LOG:  duration: 2898.573 ms  statement: COMMIT
LOG:  duration: 77500.642 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 7623.949 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621614,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 33038.037 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 9872.400 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621614,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086
LOG:  duration: 22095.260 ms  statement: UPDATE alarms SET alarm_state=2,ack_by=0,term_by=0,last_change_time=1435621607,current_severity=3,repeat_count=564,hd_state=0,hd_ref='',timeout=0,timeout_event=43,message='In_142_EOBC0/0__EO0/0 veri alma hatasi olustu',resolved_by=0, ack_timeout=0 WHERE alarm_id=86086

When I try EXPLAIN ANALYZE, pgsql tells me "query result with .... rows discarded" (usually 15), so I could see nothing.
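
A sketch of one way to keep the plan output, assuming the query is run from psql: \o redirects output to a file, the table name and values are taken from the first slow-log line above, and the whole thing is rolled back so nothing is actually deleted.

\o plan.txt
BEGIN;
EXPLAIN ANALYZE DELETE FROM idata_4087 WHERE item_id=4715 AND idata_timestamp<1420069290;
ROLLBACK;
\o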



Alex Kirhenshtein

Even simple queries are taking ages (e.g. 10 seconds for update by PK).
What PostgreSQL version are you using? Please share your postgresql.conf.
Do you have a battery installed on the 410i? Is it in write-back or write-through mode?
What are your read-ahead settings for the drives? Please also show the output of "iostat -x 5 5".
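
A quick sketch of commands that would answer those questions (hpacucli is assumed here; depending on the HP tool version it may be called hpssacli or ssacli instead):

# controller details, including cache mode and battery status
hpacucli ctrl all show detail
# block device read-ahead, in 512-byte sectors
blockdev --getra /dev/sda
# the same setting expressed in KB
cat /sys/block/sda/queue/read_ahead_kb
# extended I/O statistics, 5-second intervals, 5 samples
iostat -x 5 5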

multix

The 410i card has a battery.
I double-checked the write cache on the 410i card, and it is definitely enabled.
My PostgreSQL version is 9.2.10.
iostat output at 22:05 (Turkey time):
Linux 3.10.0-229.4.2.el7.x86_64 (netxms)        07/01/2015      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.42    0.00    7.44    5.63    0.00   81.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.15    14.33   32.14   95.79   916.54  1585.75    39.12     1.76   14.24   25.56   10.45   4.40  56.33
dm-0              0.00     0.00    0.00    0.01     0.00     0.03     8.00     0.00  337.80   41.81  383.36   7.20   0.01
dm-1              0.00     0.00   32.30  110.13   916.52  1585.69    35.14     3.81   26.08   29.87   24.97   3.96  56.36

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.19    0.00    2.87    0.17    0.00   93.77

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              12.60     5.60    4.80  179.80    70.40  1769.60    19.93     0.34    1.84   11.38    1.59   0.35   6.42
dm-0              0.00     0.00   14.40    0.00    57.60     0.00     8.00     0.13    9.22    9.22    0.00   1.15   1.66
dm-1              0.00     0.00    3.00  185.40    12.80  1769.60    18.92     0.33    1.76   12.67    1.58   0.25   4.72

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.99    0.00    4.78    0.05    0.00   89.18

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.20   17.00   51.80   172.00   672.00    24.53     0.25    3.57   13.75    0.23   1.08   7.42
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00   17.00   53.00   172.00   672.00    24.11     0.25    3.51   13.76    0.22   1.05   7.38

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.14    0.00    2.72    0.02    0.00   94.12

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    58.20    0.80   98.80     3.20  1088.00    21.91     0.08    0.79    8.75    0.72   0.12   1.24
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.80  157.00     3.20  1088.00    13.83     0.13    0.82    8.75    0.78   0.08   1.26

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.81    0.00    5.43    0.05    0.00   87.71

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00    43.40    0.40  169.60     1.60  2665.10    31.37     0.23    1.38   18.00    1.34   0.14   2.46
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.40  212.60     1.60  2664.80    25.04     0.31    1.47   18.00    1.44   0.11   2.44


I have attached my postgresql.conf.

I was using MySQL before, and this problem occurred almost all the time, so I switched to PostgreSQL. I don't know much about PostgreSQL tuning and tried to find information on the internet; I found the http://pgtune.leopard.in.ua page and tuned the database with the values from that site. There was no problem for about 2-3 months, and now it has appeared again.
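
For context, pgtune-style settings for a dedicated PostgreSQL 9.2 server with about 32 GB of RAM look roughly like the sketch below; the numbers are illustrative only, not necessarily what pgtune produced or what is in the attached file.

# postgresql.conf (PostgreSQL 9.2, ~32 GB RAM, illustrative values)
shared_buffers = 8GB
effective_cache_size = 24GB
work_mem = 32MB
maintenance_work_mem = 1GB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100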


I will post updated iostat output while the queues are growing.

02:35 am
[root@netxms ~]# iostat -x 5 5
Linux 3.10.0-229.4.2.el7.x86_64 (netxms)        07/02/2015      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.41    0.00    7.34    5.53    0.00   81.72

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.16    14.41   33.02   96.21   932.96  1590.23    39.05     1.80   14.38   26.43   10.24   4.30  55.51
dm-0              0.00     0.00    0.00    0.01     0.01     0.03     8.00     0.00  331.62   37.99  383.36   7.09   0.01
dm-1              0.00     0.00   33.19  110.63   932.94  1590.18    35.09     3.80   25.79   30.53   24.37   3.86  55.54

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.90    0.00    4.56    7.86    0.00   81.68

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               8.80     4.80  667.40   99.20 21368.00  2549.20    62.40    39.54   51.76   59.03    2.89   1.30 100.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00  673.60  104.00 21347.20  2549.20    61.46    40.31   52.02   59.57    3.12   1.29 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.03    0.00    3.10    8.34    0.00   84.54

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               1.20    68.80  313.80  259.20 10504.00  2950.90    46.96    12.79   22.02   36.72    4.21   1.75 100.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00  315.20  328.00 10828.80  2950.90    42.85    12.96   19.94   36.84    3.69   1.55 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.97    0.00    4.71    8.18    0.00   81.14

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               2.40    23.40  260.00  271.00 10802.40  2856.00    51.44    19.58   35.37   65.30    6.65   1.88 100.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00  287.60  294.40 11120.80  2856.00    48.03    20.09   33.06   59.49    7.24   1.72 100.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.72    0.00    2.93    8.22    0.00   85.13

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              11.40     5.40  656.20   52.60 28537.60  2595.20    87.85    41.57   59.58   62.26   26.21   1.41  99.96
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00  644.20   58.00 28371.20  2595.20    88.20    42.56   61.57   64.59   28.00   1.42  99.96

I attached a picture of the VMware disk performance. This morning the queue did not increase too much.

multix

I noticed that I had not configured autovacuum in PostgreSQL. I enabled autovacuuming, and now performance is much better.
If anything changes, I will report back.
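
For reference, a sketch of the autovacuum knobs involved in postgresql.conf (all of these exist in 9.2; the values shown are the defaults or common choices, not necessarily what I used):

# postgresql.conf: autovacuum settings
autovacuum = on
log_autovacuum_min_duration = 0        # log every autovacuum run
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2   # vacuum when ~20% of a table is dead tuples
autovacuum_analyze_scale_factor = 0.1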