In this post we’ll take a look at one of the key benefits of cloud computing: scalability. We’ll explore the different scaling options available for your cloud-based workloads, and then we’ll take a look at specific services in AWS that can help you to achieve scalability for your AWS-hosted applications.
Whether you’re starting out with a single instance and hoping to grow, or you have a globally distributed application, you always need to factor scalability into your application architecture.
Most businesses are looking to grow, so you need to make sure that your application infrastructure can grow in line with customer demand. Growth is only half of it, though: you may also want to shrink resources again during quieter periods to keep infrastructure costs in line with your usage patterns.
Cloud Computing Scalability
Scalability is one of the key benefits of cloud computing. Having access to seemingly limitless resources does to some extent take away the headache of how to scale your application infrastructure in line with demand.
However, you need to ensure that your application is designed to leverage the cloud infrastructure in the most efficient way possible in order to allow your infrastructure to grow and shrink along with your business requirements.
Let’s first take a look at the different ways in which a cloud-hosted application can scale: vertically, horizontally or diagonally.
Vertical scaling, also known as ‘scaling up’, is simply adding resources to your server to cope with increased demand. This could be CPU cores, additional RAM, extending disk volumes, etc.
No changes are made to the application code and no additional servers are added; you are just making the server you have more powerful. Or, in the case of scaling back down again, less powerful.
This is a very commonly used scaling method, but there is a hard limit to how far you can scale: the largest cloud instance you can use.
Nowadays, you can get some pretty huge instances with lots of cores and terabytes of RAM, but you are still using a single instance, which is a single point of failure for your application. You will also usually need some downtime to scale up or scale down, typically a short stop and restart of the server.
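To make the ceiling on vertical scaling concrete, here is a minimal Python sketch that picks the smallest server size able to meet a requirement. The size names and the `CATALOGUE` list are hypothetical and purely illustrative; they are not real AWS instance types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceSize:
    name: str
    vcpus: int
    ram_gib: int


# Hypothetical catalogue, ordered smallest to largest (illustrative only).
CATALOGUE = [
    InstanceSize("small", 2, 4),
    InstanceSize("medium", 4, 8),
    InstanceSize("large", 8, 16),
    InstanceSize("xlarge", 16, 32),
]


def pick_size(required_vcpus: int, required_ram_gib: int) -> Optional[InstanceSize]:
    """Return the smallest size meeting both requirements, or None."""
    for size in CATALOGUE:
        if size.vcpus >= required_vcpus and size.ram_gib >= required_ram_gib:
            return size
    # Demand exceeds the largest size: vertical scaling has run out of road.
    return None
```

When `pick_size` returns `None`, no bigger box is available, and that is the point at which you have to look at scaling out instead.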
Horizontal scaling, also known as ‘scaling out’, is adding infrastructure to the application. Horizontal scaling requires your application to be broken into ‘tiers’ or ‘microservices’ and is therefore more complex and costly than vertical scaling, but with the benefit of almost limitless scaling.
Consider a simple three-tier web application, with web, application logic and database tiers. As the load on the site increases, the first part of the application to take the load will be the web tier.
This can therefore be scaled independently of the app logic and database tiers, by simply adding additional web servers and then load balancing the traffic across them.
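As a rough illustration of adding web servers and balancing traffic across them, the sketch below uses simple round-robin rotation in Python. The `RoundRobinBalancer` class and the server names are assumptions made for this example; a real load balancer also handles health checks, connection draining and much more.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation so each gets an equal share of requests."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def add_server(self, server):
        # Scaling out: add a server and restart the rotation to include it.
        self._servers.append(server)
        self._rotation = cycle(self._servers)

    def route(self):
        # Return the server that should receive the next request.
        return next(self._rotation)


balancer = RoundRobinBalancer(["web-1", "web-2"])
first_four = [balancer.route() for _ in range(4)]

balancer.add_server("web-3")
share = Counter(balancer.route() for _ in range(6))
```

After `web-3` joins, each server receives a third of the requests, without any change to the application logic or database tiers.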
Diagonal scaling is a combination of using both horizontal and vertical scaling in the same application. Take again the example of our three-tier web application.
While it may be simple to add web servers to cope with additional web traffic, it may not be possible to split the application logic over multiple servers. In that scenario, you could apply vertical scaling to the application logic tier, while still utilizing horizontal scaling for the web tier.
So, now that we understand a little more about the different types of cloud computing scalability, let’s look at what services AWS has available to facilitate scaling in the AWS cloud.
Vertical Scaling in AWS
Let’s take a look at some of the limits of vertical scaling in some of the more commonly used AWS services.
EC2 instances are virtual servers in the AWS cloud. AWS has a huge range of EC2 instances for different workload types. The largest Compute Optimized EC2 instance (c5d.metal) has 96 vCPUs. The largest Memory Optimized instance (u-24tb1.metal) has 24TB (yes, Terabytes) of RAM! That’s some pretty serious compute.
EBS volumes are the block storage volumes which can be attached to EC2 instances. A single General Purpose SSD (gp2) EBS volume can scale to 16TB and 16,000 IOPS. A Provisioned IOPS (io1) EBS volume can scale to 16TB and 64,000 IOPS. Crazy performance for a single volume.
EFS (Elastic File System) is a shared storage volume that can be mounted via NFS to a Linux operating system, enabling multiple instances to see the same disk volume. With EFS storage you only pay for what you consume, but an EFS volume is virtually infinitely scalable.
In fact, in our monitoring system our customer EFS volumes show as having 8 exabytes of storage available! I have not checked to see what it would actually cost to store 8 exabytes in EFS—although I think you would probably want to put some of that data elsewhere, such as S3!
S3 Object Storage
Amazon S3 (Simple Storage Service) is the AWS Object Storage Solution—think documents, videos, audio files, etc. Again, S3 is almost infinitely scalable and you only pay for what you store in it. A single object can be up to 5TB in size, and there is NO LIMIT to the number of objects that you can store in a bucket.
That’s a very bold claim, and I guess one that nobody really wants to test the limit of, as that would be a very expensive experiment!
Horizontal Scaling in AWS
AWS has some crazy vertical scaling limits, but in order for your application to have truly limitless scaling, it needs to be able to scale horizontally. Here are some of the AWS services that facilitate horizontal scaling.
Application Load Balancer
Application load balancers operate at layer 7 of the OSI model—the application layer. They are application aware and can load balance HTTP and HTTPS traffic. You can create advanced request routing to distribute load to specific EC2 instances.
Application load balancers do have a few default limits set, some of which can be raised on request. The default limits include 1,000 targets per ALB, 50 listeners per ALB, 50 ALBs per region and 3,000 target groups per region. So, although there are limits, they are pretty generous.
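For a feel of how generous those defaults are, here is a small back-of-the-envelope calculation in Python. It uses only the targets-per-ALB and ALBs-per-region figures quoted above and ignores every other quota, so treat it as a rough sketch rather than a capacity planning tool. The function names are made up for this example.

```python
import math

# Default limits as quoted in this post; they can change and can be raised.
DEFAULT_TARGETS_PER_ALB = 1000
DEFAULT_ALBS_PER_REGION = 50


def albs_needed(total_targets, targets_per_alb=DEFAULT_TARGETS_PER_ALB):
    """Number of ALBs required to register the given number of targets."""
    return math.ceil(total_targets / targets_per_alb)


def fits_default_quotas(total_targets):
    """True if the targets fit within the default ALB count for one region."""
    return albs_needed(total_targets) <= DEFAULT_ALBS_PER_REGION
```

On these numbers alone, a single region could hold 50,000 targets before you would need to ask for an increase.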
Network Load Balancer
Network load balancers operate at layer 4 of the OSI model—the transport layer. They can load balance tens of millions of requests per second at very low latency. Again, you can have 50 NLBs per region and 3,000 target groups per region by default.
Regions and Availability Zones
Regions and availability zones enable you to horizontally scale your application across datacenters and geographies to ensure resilience and proximity to users. An availability zone consists of one or more datacenters within a geographic region, and each availability zone is physically isolated from the others in terms of power, network and security.
When scaling horizontally, it is best practice to spread workloads across multiple availability zones to mitigate the risk of hardware or facility failure. A region is a geographic area containing two or more availability zones. At the time of writing, AWS had 76 availability zones across 24 geographic regions.
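The idea of spreading a workload across availability zones can be sketched in a few lines of Python. The `spread_across_zones` helper and the zone names below are illustrative assumptions; in practice a service such as an autoscaling group does this balancing for you.

```python
def spread_across_zones(instance_count, zones):
    """Distribute instances as evenly as possible across the given zones."""
    if not zones:
        raise ValueError("at least one zone is required")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones each take one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def worst_case_survivors(placement):
    """Instances still running if the most heavily loaded zone fails."""
    return sum(placement.values()) - max(placement.values())


placement = spread_across_zones(7, ["eu-west-1a", "eu-west-1b", "eu-west-1c"])
```

With seven instances across three zones, losing any one zone still leaves at least four instances serving traffic.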
Scaling an application across multiple regions can help ensure the best low latency experience for users of the application.
Autoscaling Groups
Autoscaling groups enable fleets of EC2 instances to grow and shrink in line with application traffic or demand. An autoscaling group is built from a launch configuration, which describes the instances to launch (for example the machine image and instance type), and is usually placed behind a load balancer.
The autoscaling group itself defines the minimum and maximum number of EC2 instances in the group, and its scaling policies define the metrics which trigger the launch of new instances. The triggers can be based on CPU load, network traffic in or out, or the number of load balancer requests per target, while instance health checks are used to replace instances that have failed.
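To show how a metric such as average CPU load can drive the size of the group, here is a simplified Python sketch of a proportional, target-tracking style calculation. It is an approximation for illustration only, not the exact algorithm AWS uses, and the function name and parameters are assumptions.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size, max_size):
    """Scale the group in proportion to how far the metric is from its target.

    The result is clamped to the group's minimum and maximum size.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    proposed = math.ceil(current * metric_value / target_value)
    return max(min_size, min(max_size, proposed))
```

For example, four instances running at 90% CPU against a 60% target would suggest six instances, while a very quiet period would shrink the group no further than its minimum size.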
By default, AWS allows you to create 200 launch configurations and 200 scaling groups per region. Again, increases to these limits can be requested. Don’t forget, the default quota for EC2 instances is only 20 per region, so it’s likely you’ll hit this scaling limit first.
Elastic Beanstalk
Elastic Beanstalk enables you to create simple web applications that scale automatically without you having to think about any of the underlying infrastructure, such as load balancers, EC2 instances and databases.
Elastic Beanstalk supports web apps built for platforms including PHP, Ruby, Java on Tomcat and .NET on IIS. Simply upload your application code and Elastic Beanstalk takes care of the rest!
Elastic Container Service (ECS)
Amazon ECS is a fully-managed container orchestration service. Containerized applications lend themselves very well to horizontal scaling. By default, ECS can manage 10,000 clusters per region, with 1,000 services per cluster.
AWS Scalability vs. Elasticity: What’s the Difference?
In the context of Amazon Web Services (AWS), scalability and elasticity are both important features that enable customers to adjust their computing resources to meet changing demand.
But, while they are similar in concept, there is a key difference between the two. Elasticity is a more automated and dynamic process, where resources are adjusted in real-time based on demand. With scalability, adding or removing resources in response to changing demand may require more human intervention.
So, there you have it, just some of the reasons why we love AWS scalability. We’ve not touched on diagonal scaling in AWS—suffice to say, you can leverage a combination of both the horizontal and vertical scaling options as outlined above.
And, of course, this is not an exhaustive list—we have not touched on the scalability of AWS databases such as Amazon Aurora or DynamoDB, nor have we considered the scalability of serverless services such as AWS Lambda. Perhaps more on that in a later post!