Processing 5 Million Events for $7 – Serverless Cloud AWS Part 1
This blog post is the first in a three-part series in which we share insights from migrating a legacy application to the AWS cloud using a serverless architecture.
The Business Need
We were faced with replacing a legacy service that was available to over a million customers via mobile interfaces. Because many of these customers used USSD, we had very strict latency limits: a USSD session times out if there is no response within 15 seconds. We expected little documentation; in reality, there was no usable documentation at all. Most of the information on hand had become obsolete over the past few years as undocumented changes were made. We therefore had to interpret the business needs ourselves and rely on our experience to define additional non-functional requirements.
Mobile Traffic – Big spikes in data
As users wake up and start transacting on their mobile phones, traffic rises very steeply in the morning. Volumes climb from a few hundred to over 12,000 transactions per hour within a couple of hours, which requires processing capacity to double roughly every twenty minutes.
To complicate capacity management further, there are peak days on which traffic is double what we would expect on a normal day. For most of the month we need to process about 12,000 transactions per hour, with reserve capacity for 20,000. The peak-load day also coincides with month-end reporting, so reporting has to run on the day of highest customer demand.
If we had taken a typical capacity-management approach with dedicated resources, we would have used less than half of the provisioned capacity even when the system was busy, and the resources would have sat idle for roughly another 12 hours a day. A serverless architecture was the clear choice: it scales automatically and almost instantly, whereas waiting minutes for new resources to spin up could cause cascading failures.
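As a quick sanity check on these figures, the number of doublings needed to go from one load level to another is log2(end / start). The sketch below assumes a starting load of 300 transactions per hour (a stand-in for "a few hundred") and a fixed 20-minute doubling time; both values are illustrative rather than measured.

```python
import math


def doublings_needed(start_per_hour: float, end_per_hour: float) -> float:
    """Number of times capacity must double to go from start to end load."""
    return math.log2(end_per_hour / start_per_hour)


def ramp_minutes(start_per_hour: float, end_per_hour: float,
                 doubling_minutes: float = 20.0) -> float:
    """Time to reach end load if capacity doubles every `doubling_minutes`."""
    return doublings_needed(start_per_hour, end_per_hour) * doubling_minutes


# Assumption: ~300 transactions/hour before the morning ramp ("a few hundred").
start, end = 300, 12_000
print(f"{doublings_needed(start, end):.2f} doublings, "
      f"about {ramp_minutes(start, end):.0f} minutes")
```

Under these assumptions the ramp takes about 5.3 doublings, or roughly 106 minutes, which is consistent with "a couple of hours".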
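To put rough numbers on the utilization argument, the sketch below sizes a dedicated fleet for peak days (twice the normal busy-hour load of 12,000 transactions per hour) and measures how much of it a normal day uses. The profile of 12 busy hours at 12,000 per hour and 12 quiet hours at 300 per hour is a simplified, hypothetical traffic shape, not measured data.

```python
# Hypothetical, simplified daily profile (transactions per hour).
BUSY_LOAD = 12_000   # normal busy-hour load from the text
QUIET_LOAD = 300     # assumed overnight load ("a few hundred")
hourly_load = [QUIET_LOAD] * 12 + [BUSY_LOAD] * 12

# Dedicated capacity must cover peak days, when traffic doubles.
provisioned = 2 * BUSY_LOAD


def average_utilization(load_per_hour: list[int], capacity: int) -> float:
    """Mean fraction of provisioned capacity actually used over the day."""
    return sum(load_per_hour) / (len(load_per_hour) * capacity)


busy_util = BUSY_LOAD / provisioned
daily_util = average_utilization(hourly_load, provisioned)
print(f"busy-hour utilization: {busy_util:.0%}, daily average: {daily_util:.0%}")
```

With this profile, a fleet sized for peak days runs at about 50% during busy hours on a normal day and about 26% averaged over the whole day.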
AWS Serverless Architecture
AWS Lambda
To handle the customer-driven transactions, we chose a serverless implementation on AWS Lambda. Lambda is often described as Function as a Service (FaaS), because the code executes in stateless containers that AWS manages. Lambda functions are event-triggered and AWS manages the number of concurrent executions, so we only had to write the code and AWS would scale it. Under the Lambda pricing model we pay only for requests and execution time, so periods of low demand cost very little and idle time costs nothing.
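For readers new to Lambda, the sketch below shows the general shape of an event-triggered Python function. It assumes the function sits behind an API Gateway proxy integration, which passes the HTTP request body as a string in `event["body"]` and expects a response containing `statusCode` and `body`. The `sessionId` and `input` fields are hypothetical placeholders, not the real USSD payload.

```python
import json


def lambda_handler(event, context):
    """Entry point Lambda invokes for each event (assumed API Gateway proxy request)."""
    # Hypothetical request fields; a real USSD gateway payload will differ.
    body = json.loads(event.get("body") or "{}")
    session_id = body.get("sessionId", "unknown")
    user_input = body.get("input", "")

    reply = f"Session {session_id}: you entered '{user_input}'"

    # API Gateway proxy integrations expect statusCode/headers/body in the response.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": reply}),
    }


if __name__ == "__main__":
    fake_event = {"body": json.dumps({"sessionId": "abc123", "input": "1"})}
    print(lambda_handler(fake_event, None))
```

The handler holds no state between calls, which is what lets AWS run as many copies in parallel as the incoming traffic requires.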
AWS Lambda Cold Starts in a VPC
We needed the Lambda functions to run inside a Virtual Private Cloud (VPC) for added security, which meant dealing with Lambda “cold starts” as well as the additional latency introduced by the VPC.
There are horror stories of the VPC alone adding 10 seconds in test cases (remember that our total transaction limit was 15 seconds), so we knew from the start that we had to optimize our Lambda functions as much as possible to minimize timeouts. Because we were migrating a legacy application that was failing under peak load, we were also under pressure to migrate as soon as possible.
AWS Lambda can scale to hundreds of concurrent executions, but we also had to meet the transaction latency constraint (a maximum of 15 seconds per session), so we had to be sure we could manage cold starts effectively. Drawing on our knowledge of AWS Lambda and our experience in building efficient software, we implemented an innovative optimization strategy that reduced the number of timeout errors by 99%. To top it off, with a bit of clever configuration in API Gateway, we were able to integrate the handful of remaining timeouts smoothly into the USSD flow, eliminating dropped USSD sessions entirely and ensuring a great user experience.
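A common, general-purpose way to soften cold starts is to create expensive resources once per execution environment, at module scope, so that only the first (cold) invocation pays the setup cost and warm invocations reuse it. This is a widely used pattern, not a description of the specific optimization strategy we implemented. `ExpensiveClient` is a hypothetical stand-in for something like a database connection or SDK client.

```python
import time


class ExpensiveClient:
    """Hypothetical stand-in for a slow-to-create client (DB, HTTP, SDK)."""
    instances_created = 0

    def __init__(self) -> None:
        ExpensiveClient.instances_created += 1
        time.sleep(0.01)  # simulate slow setup (TLS handshake, config load, ...)

    def lookup(self, key: str) -> str:
        return f"value-for-{key}"


# Module-scope cache: created on the cold start of an execution environment
# and reused by every warm invocation that lands on the same environment.
_client = None


def get_client() -> ExpensiveClient:
    global _client
    if _client is None:
        _client = ExpensiveClient()
    return _client


def lambda_handler(event, context):
    client = get_client()
    return {"result": client.lookup(event.get("key", "default"))}


if __name__ == "__main__":
    for key in ("a", "b", "c"):
        print(lambda_handler({"key": key}, None))
    print("clients created:", ExpensiveClient.instances_created)  # 1
```

Initializing lazily inside `get_client()` rather than directly at import time keeps the first request's work explicit; either variant confines the setup cost to cold starts.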
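One way to keep a slow request from turning into a dropped session is to give the work a time budget and fall back to a holding message when the budget runs out. The sketch below does this inside the function using the Lambda context's `get_remaining_time_in_millis()`. It is an illustrative alternative, not the API Gateway configuration described above, and it assumes the function timeout has been set to fit within the 15-second USSD window. The fallback text, the 1-second safety margin, `process_transaction` and `FakeContext` are all hypothetical.

```python
import concurrent.futures
import time

# Shared pool: a `with` block would wait for a timed-out task on exit,
# which would defeat the purpose of the time budget.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

SAFETY_MARGIN_MS = 1_000  # assumed headroom to build and return the reply
FALLBACK_MESSAGE = "We are still processing your request. Please dial again shortly."


def process_transaction(event):
    """Hypothetical business logic; may call a slow third-party API."""
    time.sleep(event.get("simulated_delay_s", 0))
    return "Transaction complete."


def lambda_handler(event, context):
    # Time left before Lambda's own timeout, minus a safety margin, in seconds.
    budget_s = max(0, context.get_remaining_time_in_millis() - SAFETY_MARGIN_MS) / 1000
    future = _executor.submit(process_transaction, event)
    try:
        message = future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        message = FALLBACK_MESSAGE
    return {"statusCode": 200, "body": message}


class FakeContext:
    """Minimal local stand-in for the Lambda context object."""

    def __init__(self, remaining_ms: int) -> None:
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self.remaining_ms


if __name__ == "__main__":
    print(lambda_handler({}, FakeContext(14_000)))
    print(lambda_handler({"simulated_delay_s": 0.5}, FakeContext(1_200)))
```

The timed-out task is not cancelled, so the underlying transaction logic should be safe to retry (idempotent) if the user dials again.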
AWS Lambda Cost Optimization
AWS Lambda costs are calculated in gigabyte-seconds (GB-seconds), so a cost-effective Lambda implementation needs code that uses CPU and memory efficiently. We spent some time tuning the memory setting, because it involves a trade-off: Lambda allocates CPU in proportion to the configured memory, so more memory can shorten execution time, but beyond a certain point the extra memory no longer reduces execution time and only increases the cost.
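The trade-off can be made concrete by comparing GB-seconds per invocation across memory settings. The durations below are hypothetical benchmark figures chosen to illustrate the pattern, not measurements from our system.

```python
# Hypothetical benchmark: average billed duration (ms) for each memory size (MB).
measured_ms = {128: 2400, 256: 1150, 512: 600, 1024: 560}


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billable compute for one invocation: memory (GB) x duration (s)."""
    return (memory_mb / 1024) * (duration_ms / 1000)


cost_per_invocation = {mb: gb_seconds(mb, ms) for mb, ms in measured_ms.items()}
best_mb = min(cost_per_invocation, key=cost_per_invocation.get)

for mb, gbs in sorted(cost_per_invocation.items()):
    print(f"{mb:>5} MB: {measured_ms[mb]:>5} ms -> {gbs:.4f} GB-s")
print(f"Cheapest setting: {best_mb} MB")
```

In this made-up data, 256 MB is cheapest: moving to 512 MB roughly halves the duration but doubles the memory, so the cost per invocation is slightly higher, and 1024 MB barely helps while roughly doubling the cost compared with 256 MB.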
Lambda Pricing Details
Requests – first 1 million requests per month free, then $0.20 per 1 million requests ($0.0000002 per request)
Duration – $0.00001667 for every GB-second
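Putting the two prices together, a monthly bill is roughly the billable requests times the per-request price plus the billable GB-seconds times the per-GB-second price. The sketch below assumes AWS's standard monthly free tier of 1 million requests and 400,000 GB-seconds (verify against current pricing for your account and region); the 128 MB memory setting in the example is purely illustrative.

```python
PRICE_PER_REQUEST = 0.0000002      # $0.20 per 1 million requests
PRICE_PER_GB_SECOND = 0.00001667   # duration price per GB-second

# Assumed monthly free-tier allowances; verify against current AWS pricing.
FREE_REQUESTS = 1_000_000
FREE_GB_SECONDS = 400_000


def monthly_lambda_cost(requests: int, avg_duration_s: float, memory_mb: int) -> float:
    """Estimate a monthly Lambda bill in USD (requests + compute, after free tier)."""
    gb_seconds = requests * avg_duration_s * (memory_mb / 1024)
    request_cost = max(0, requests - FREE_REQUESTS) * PRICE_PER_REQUEST
    compute_cost = max(0.0, gb_seconds - FREE_GB_SECONDS) * PRICE_PER_GB_SECOND
    return request_cost + compute_cost


# Illustrative inputs: 5.5 million requests, 1 s average duration, 128 MB (assumed).
print(f"${monthly_lambda_cost(5_500_000, 1.0, 128):.2f}")
```

With these illustrative inputs the estimate comes to about $5.69. Once usage is past the free tier, the compute portion grows linearly with both memory size and average duration.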
AWS Serverless Results
In production, we achieve an average Lambda function latency of 1 second (on average, every seventh transaction includes a call to a third-party API). On a normal day, the number of concurrent Lambda executions scales on demand from 2 to 18, far below the default AWS Lambda limit of 1,000 concurrent executions per account per region.
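Little's law (average concurrency = arrival rate × average duration) offers a quick way to relate these figures. The sketch applies it to the normal busy-hour load of 12,000 transactions per hour and the 1-second average latency, plus a doubled load for peak days. It yields an hourly average; instantaneous concurrency fluctuates around that value as requests bunch together, so it complements the observed range of 2 to 18 rather than predicting it.

```python
def average_concurrency(transactions_per_hour: float, avg_duration_s: float) -> float:
    """Little's law: L = arrival rate (per second) x time in system (seconds)."""
    return transactions_per_hour / 3600 * avg_duration_s


DEFAULT_CONCURRENCY_LIMIT = 1_000  # default per-account, per-region limit

for label, load in [("normal busy hour", 12_000), ("peak-day busy hour", 24_000)]:
    avg = average_concurrency(load, 1.0)
    share = avg / DEFAULT_CONCURRENCY_LIMIT
    print(f"{label}: ~{avg:.1f} concurrent executions ({share:.1%} of the default limit)")
```

Even the peak-day average of roughly 6.7 concurrent executions uses well under 1% of the default limit.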
The Result
We process between 5 and 6 million transactions a month, and the biggest monthly AWS Lambda bill we have paid so far was $8.
Our core transaction-processing costs are very stable and predictable, and they increase almost linearly as we add load to the platform. The software we replaced suffered increased latency under peak load; by contrast, our serverless platform achieves lower latency during peak demand.
If you would like to know how we can help you migrate a legacy application to the AWS cloud, please contact us.