Agent reports "High CPU utilization" when running inside LXC

Started by Borgso, February 26, 2020, 08:46:04 AM

Borgso

We have a few nodes inside LXC on Proxmox.
They get stuck on "High CPU utilization (100.000000%)" once the CPU load hits that value, and the reading does not drop below 100% until we shut down the container and start it again.

Looking at the load average, the containers don't have much work to do.
How does NetXMS agent measure the CPU load?

16 CPU cores 32 Threads
@proxmox:~$ w
07:39:43 up 15 days,  9:28,  1 user,  load average: 0.59, 0.64, 0.72

1 vCPU
@lxc01:~$ w
07:36:43 up 1 day, 12:02,  1 user,  load average: 0.64, 0.73, 0.77

1 vCPU
@lxc02:~$ w
07:38:35 up 1 day, 13:18,  2 users,  load average: 0.59, 0.64, 0.73



Filipp Sudanov

Looks interesting. Does it happen for all LXC containers on Proxmox or only for some of them? What is the Proxmox version, and which systems are running inside the containers?

Borgso

Latest Proxmox 6.1, up to date with apt.
Containers are Ubuntu 18.04.4, also up to date with apt.

It does not happen with all LXC containers in the cluster.
2 out of 3 containers on the same Proxmox node were in this state when I reported it.
It happens randomly, but the reading seems to get stuck after the containers have been under heavy load.

Filipp Sudanov

NetXMS takes CPU load information from /proc/stat. We read this file once per second, calculate the delta of each counter, and divide each delta by the sum of all deltas.
Please run the command below on a container that has gotten into that state and share the output:

n=1; while [ $n -le 60 ]; do grep cpu0 /proc/stat; sleep 1; n=$((n+1)); done

Then restart the container and collect these stats again so we have some data for comparison.
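
For anyone who wants to check the samples by hand, the calculation described above (take two /proc/stat readings, compute per-counter deltas, divide by their sum) can be sketched roughly as follows. This is only an illustration, not the NetXMS agent's actual code: the helper name cpu_busy_pct is hypothetical, and two choices are assumptions made for the sketch: only the first eight counters (user through steal) are summed, and idle plus iowait is treated as non-busy time.

```shell
#!/bin/sh
# Sketch only: CPU utilization from two /proc/stat samples of the same cpu line.
# cpu_busy_pct is a hypothetical helper, not part of NetXMS.
#
# Usage: cpu_busy_pct "<earlier cpuN line>" "<later cpuN line>"
# Counters after the label: user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        # First line: remember the earlier counter values (fields 2..9).
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
        # Second line: per-counter deltas and their sum.
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            # No counter moved between the samples: nothing to divide by.
            if (total <= 0) { print "n/a"; exit }
            idle = d[5] + d[6]    # assumption: idle + iowait count as not busy
            printf "%.1f\n", 100 * (total - idle) / total
        }'
}

# Example on a live Linux system (commented out so the file can be sourced):
#   a=$(grep '^cpu0 ' /proc/stat); sleep 1; b=$(grep '^cpu0 ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

With this formula, a result of 100.0 means the idle and iowait counters did not advance between the two samples while at least one other counter did, so comparing the idle column in the stuck and restarted samples is a quick first check.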