Why you should worry if a CPU-utilization-based policy does not work for you

Imagine you are migrating a legacy system from the data center to the cloud to make it scalable.

If containerization is not an option, you will try cloud-native autoscaling. If the basic policies and metrics allow you to scale in and out effectively, all good. But what if they don't?

I have gone through such a situation and ran into numerous issues while autoscaling different backend services (written in Python, Groovy, Java, and PHP), and I learned some lessons I'd like to share.

In this post, I describe why you should worry if a CPU-utilization-based policy does not work for you.

With proper utilization of instance resources, high load will usually push CPU utilization above 70%. That is why it is commonly recommended to scale the system out when CPU utilization reaches 50%.
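
On AWS this recommendation maps naturally to a target-tracking scaling policy. Below is a minimal sketch using boto3; it assumes an existing Auto Scaling group, and the group and policy names are placeholders:

```python
import boto3

# Minimal sketch: keep the Auto Scaling group's average CPU utilization around 50%.
# "my-asg" and "cpu-target-50" are placeholder names.
autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```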

Here are several typical resource-utilization scenarios, depending on the application type:

- CPU-critical applications: obviously, a higher workload results in higher CPU utilization, since CPU is the main resource they use;
- Memory-critical applications: for an application with automatic garbage collection (e.g. JVM-based), once it runs short of memory it will run garbage collection more often, which increases CPU consumption;
- Multi-threaded applications: the more threads are started, the more CPU is needed to manage and process them (see the sketch after this list).
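
As a rough way to observe the last effect, you can sample a process's thread count together with its CPU usage and watch them grow together under load. This is only a sketch, assuming the psutil package is available:

```python
import psutil

# Sample thread count and CPU usage of a process (here: the current one)
# to correlate thread growth with CPU consumption under load.
proc = psutil.Process()

for _ in range(10):
    cpu = proc.cpu_percent(interval=1)  # CPU% of this process over the last second
    print(f"threads={proc.num_threads()} cpu={cpu:.1f}%")
```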

So if an instance is failing health checks while its CPU consumption never reaches 70%, the cause may be an application or configuration restriction that blocks it from using all available resources effectively.

A common example is a Redis instance on a machine with several CPU cores, e.g. 4 cores, that has performance problems while CPU usage stays around 25%. This happens because Redis is single-threaded and can use only one core, while the CPU utilization metric is averaged across all cores: (100% + 0% + 0% + 0%) / 4 = 25%.
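To confirm this kind of skew, look at per-core utilization rather than the aggregate. A minimal sketch with psutil (assuming it is installed on the instance):

```python
import psutil

# Per-core utilization over a one-second window vs. the averaged figure.
per_core = psutil.cpu_percent(interval=1, percpu=True)
average = sum(per_core) / len(per_core)

print("per core:", per_core)  # e.g. [99.0, 1.0, 0.5, 0.5] for a busy single-threaded process
print("average:", average)    # the ~25% the monitoring dashboard reports
```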

Another case is a server configuration that restricts the number of threads. If at some point the application receives more requests than it has threads to serve, it cannot respond to the load balancer's health check in time and gets removed from the pool.
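As an illustration, here is what such a restriction can look like for a Python service running under Gunicorn; the values are made up and not a recommendation:

```python
# gunicorn.conf.py -- illustrative values only
workers = 2   # worker processes
threads = 4   # threads per worker: at most 2 * 4 = 8 requests served concurrently
# When all 8 slots are busy, new requests -- including the load balancer's
# health check -- wait in the listen backlog and may time out.
```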

Another important case is the CPU credit limit: one hour of heavy load can consume all the credits the instance accrues over 24 hours. Once the credits are used up, CPU consumption is throttled down to a baseline of around 5%.

My advice: in case of any unusual application behavior under high load that you did not experience in the data center, check the CPU credit limit and the block storage I/O limit. Keep in mind that the CPU credit limit applies to the general-purpose burstable instance types (the T family) by default.
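
A quick way to check whether credits are the problem is to look at the CPUCreditBalance metric in CloudWatch. A sketch with boto3 (the instance ID is a placeholder):

```python
from datetime import datetime, timedelta, timezone

import boto3

# Fetch the hourly CPU credit balance of a burstable (T-family) instance for
# the last 24 hours; a balance near zero explains the ~5% CPU ceiling above.
cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder ID
    StartTime=datetime.now(timezone.utc) - timedelta(hours=24),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```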

So whenever an instance is observed dying before it reaches significant CPU utilization, it is worth checking whether one of the situations described above is taking place, and then deciding whether to improve the application configuration or to change the instance type so that instances are used more effectively.

Sources & More details:

AWS documentation