Cloud-Native autoscaling in AWS

Imagine the situation: you have a legacy system that has been partially migrated to multiple services and is used by a relatively small number of concurrent users. One day stakeholders decide to create an open API and expose the system to external clients, with a load significantly higher than the existing system can handle, and it has to scale to handle even more in the future.

Of course, the better solution would be to containerize the services and use an orchestrator such as Kubernetes, but what if this is not an option for business reasons (there's no time to migrate to containers, we want it to run tomorrow)? This article is intended to help those who may end up in a similar situation, and it covers some aspects of AWS cloud-native autoscaling for multi-service applications.

I have been through such a situation and tried different autoscaling policies and metrics, and I learned some lessons I'd like to share.

In this post, I describe the autoscaling policies available in AWS and the basic metrics for those policies. I will also highlight some issues that can make the basic policies inapplicable, so that policies based on custom metrics are needed.

To add a scaling policy to an Auto Scaling group in the AWS Console, go to EC2->Auto Scaling and select the group you want to scale.

There are three possible scaling policy types:

  • Target Tracking Policy
  • Simple Policy
  • Scaling policy with steps

A Target Tracking policy tries to keep the selected metric close to the target value by adding or removing instances. In the console it can be applied only to the following predefined metrics:

  • ALB Request Count Per Target
  • Average CPU Utilization
  • Average Network In
  • Average Network Out
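The same kind of policy can also be created programmatically. Below is a minimal sketch using boto3's `put_scaling_policy`; the group name, the policy name suffix and the 50% target are illustrative assumptions, not values from my setup. The builder is kept separate from the AWS call so the request can be inspected without credentials.

```python
# Predefined metric types accepted by target tracking on Auto Scaling groups.
PREDEFINED_METRICS = {
    "ASGAverageCPUUtilization",
    "ASGAverageNetworkIn",
    "ASGAverageNetworkOut",
    "ALBRequestCountPerTarget",  # additionally requires a ResourceLabel (not shown here)
}


def target_tracking_policy(group_name, metric_type="ASGAverageCPUUtilization",
                           target_value=50.0):
    """Build keyword arguments for the Auto Scaling PutScalingPolicy call."""
    if metric_type not in PREDEFINED_METRICS:
        raise ValueError(f"unknown predefined metric: {metric_type}")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-target-tracking",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
            "TargetValue": float(target_value),
        },
    }


def apply_policy(params):
    """Send the policy to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without AWS access
    return boto3.client("autoscaling").put_scaling_policy(**params)
```

Calling `apply_policy(target_tracking_policy("my-asg"))` would ask AWS to keep the average CPU of the hypothetical group `my-asg` around 50%.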

The Simple policy and the Scaling policy with steps are based on alarms, which you can create either on the same page as the policy or in CloudWatch. An alarm created from the scaling policy page is limited to the following metrics:

  • CPU Utilization
  • Disk Reads/Writes
  • Network In/Out

In CloudWatch you can create an alarm based on any metric, including custom ones. For the Simple policy, you specify how many instances should be added or removed when a certain alarm fires. In the Step policy you can specify several steps, which add or remove a different number of instances depending on the metric value, e.g. add twice as many instances if the metric is twice the limit set in the alarm. Even more customized policies can be created using Lambda functions.
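To make the step logic concrete, here is a hedged sketch of a Step policy and its alarm as boto3 request parameters. Step bounds are expressed relative to the alarm threshold. The threshold, the two steps, the period and all names are assumptions chosen for illustration; `instances_to_add` is a local helper that only imitates how a step is selected and is not an AWS API.

```python
def step_scaling_policy(group_name, threshold):
    """Add 1 instance for a moderate breach, 2 once the metric is twice the threshold."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-step-scale-out",  # hypothetical name
        "PolicyType": "StepScaling",
        "AdjustmentType": "ChangeInCapacity",
        "MetricAggregationType": "Average",
        "StepAdjustments": [
            # Bounds are offsets from the alarm threshold, not absolute values.
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": float(threshold),
             "ScalingAdjustment": 1},
            {"MetricIntervalLowerBound": float(threshold),
             "ScalingAdjustment": 2},
        ],
    }


def alarm_params(alarm_name, namespace, metric_name, group_name, threshold):
    """Keyword arguments for CloudWatch PutMetricAlarm (without AlarmActions)."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 60,            # seconds; assumed value
        "EvaluationPeriods": 2,  # assumed value
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


def instances_to_add(policy, threshold, metric_value):
    """Local illustration: which step matches a given metric value."""
    breach = metric_value - threshold
    for step in policy["StepAdjustments"]:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:  # lower bound inclusive, upper exclusive
            return step["ScalingAdjustment"]
    return 0


def apply(policy, alarm):
    """Create the policy, then the alarm that triggers it; needs boto3 and credentials."""
    import boto3
    arn = boto3.client("autoscaling").put_scaling_policy(**policy)["PolicyARN"]
    boto3.client("cloudwatch").put_metric_alarm(AlarmActions=[arn], **alarm)
```

With a threshold of 100, a metric value of 150 falls into the first step (one instance), while 200 or more falls into the second (two instances).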

If a Target Tracking, Simple or Step policy works well for you, it is better to use it. There might be situations when they don't, though, for instance when the metrics on which the alarms are based (CPU Utilization, Disk Reads/Writes, Network In/Out) do not reflect the real load of the application.

In my case, the application stopped responding without reaching any significant level of CPU usage, and the number of connections to it kept growing, causing a chain reaction that took the whole system down. And because cloud-native autoscaling was used for some of the instances, it took up to 5 minutes to recover, so this caused 15 to 30 minutes of downtime.

In such a situation, you need to find out what is going wrong with the application and use a more sophisticated approach based on custom metrics.
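As an illustration of what a custom metric could look like, the sketch below builds a CloudWatch `put_metric_data` request for the number of open connections, the signal that was growing in my case. The namespace `MyApp`, the metric name `ActiveConnections` and the way the count is obtained are all hypothetical; how you measure connections depends on your application.

```python
def connection_metric(group_name, active_connections, namespace="MyApp"):
    """Build keyword arguments for CloudWatch PutMetricData."""
    if namespace.startswith("AWS/"):
        # Namespaces beginning with "AWS/" are reserved for AWS services.
        raise ValueError("custom namespaces must not start with 'AWS/'")
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": "ActiveConnections",  # hypothetical metric name
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
            "Value": float(active_connections),
            "Unit": "Count",
        }],
    }


def publish(payload):
    """Send the data point to CloudWatch; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without AWS access
    boto3.client("cloudwatch").put_metric_data(**payload)
```

Each instance could call `publish(connection_metric("my-asg", count))` periodically, for example from a cron job, and an alarm on the average of this metric can then drive a Simple or Step policy.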

Sources & More details:

AWS documentation