In this post I’m going to introduce Amazon’s Elastic Beanstalk PHP environment as a platform for Magento. In particular I’ll cover the mechanics and economics of hosting Magento on it, along with its benefits and limitations as a platform. The goal is an auto-pilot environment that provides high availability and scalability.
But first, the background. At World Wide Access we’ve always run our own EC2 instances, ELBs, database servers and memcached, and we scale up instance sizes or counts manually when required. When we started using AWS in 2008, Elastic Beanstalk was not yet on the scene, so we had no choice but to do it that way. Now we do have a choice and, prompted by some downtime in the last week, I’ve gathered some thoughts on migrating to a fully auto-pilot setup. This post collects my notes on Elastic Beanstalk and Magento, with git for deployment. I’ll add a more detailed setup guide and some benchmarks in a future post; this one will be a bit more abstract, so go make yourself a cuppa.
About Amazon’s Elastic Beanstalk
Beanstalk brings together various parts of Amazon’s infrastructure (EC2 servers, auto-scaling, load balancing and high availability) to give your applications an automated environment with flexible server sizes and instance counts that make growing easy. You can do everything Elastic Beanstalk does by combining the separate parts yourself, but this is much easier, trust me.
The case for scalable environments (and cloud computing in general) is pretty well understood these days. But to reiterate: if you have 100 customers 95% of the time, and 1000 customers on the day you send out your monthly newsletter, guess how much server capacity you have to pay for all month in a traditional hosting operation? That’s right, enough to handle 1000 customers. With cloud-based solutions, you can ramp up the number of servers when you need them and stop paying for them when you don’t. It also means you can distribute your servers across several data centers (availability zones) to reduce the chance of an outage.
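To put rough numbers on the newsletter example, here’s a small worked calculation. It assumes, hypothetically, that one server handles 100 customers and that a month is 720 hours; the function names are just for illustration.

```python
# Worked example of the newsletter scenario. Assumptions (hypothetical):
# one server handles 100 customers, and a month is 720 hours.
import math

HOURS_PER_MONTH = 720
CUSTOMERS_PER_SERVER = 100  # hypothetical capacity per server


def servers_needed(customers: int) -> int:
    """Smallest whole number of servers that can handle this many customers."""
    return math.ceil(customers / CUSTOMERS_PER_SERVER)


def fixed_server_hours(peak_customers: int) -> int:
    """Traditional hosting: provision for the peak and pay for it all month."""
    return servers_needed(peak_customers) * HOURS_PER_MONTH


def elastic_server_hours(load_profile: list[tuple[float, int]]) -> float:
    """Cloud hosting: pay only for the servers running in each slice of the month.

    load_profile is a list of (fraction_of_month, customers) pairs.
    """
    return sum(fraction * HOURS_PER_MONTH * servers_needed(customers)
               for fraction, customers in load_profile)


profile = [(0.95, 100), (0.05, 1000)]  # quiet most of the month, busy 5% of it
fixed = fixed_server_hours(peak_customers=1000)
elastic = elastic_server_hours(profile)
print(f"fixed: {fixed} server-hours, elastic: {elastic:.0f} server-hours")
```

Under those assumptions, fixed provisioning pays for 7,200 server-hours while the elastic profile uses about 1,044, roughly a seventh of the capacity bill.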
So this sounds great, right? What’s the catch?
Elastic Beanstalk forces you to build your app in a specific way and to adopt certain development practices. They’re actually good things to do, but if you’re not currently set up to operate that way, it can be painful. The first is that to use Elastic Beanstalk effectively you need to use git for deployment. You should have version control anyway, but if you don’t, this can be a burden. It also means you cannot make changes on the server directly; again, you shouldn’t do that, but it happens to the best of us. In short, Elastic Beanstalk requires you to be very disciplined in your development.
Elastic Beanstalk (or any cluster, really) also requires that your application not rely on the filesystem to share data between nodes. Data can instead be shared through the database and Memcached, and thankfully Magento is reasonably well set up to run within those limitations. Admittedly these limitations can be removed by using custom AMIs, but I’m mainly interested in keeping this solution as low-contact as possible, and doing software upgrades on an AMI is not my idea of fun.
Beanstalk also doesn’t hide the AWS complexity from you completely. You’ll still need to manage backups, a CDN, Memcached node discovery, email sending (via a service, not your own servers) and a bunch of other things that are outside the scope of this introductory post but worth bearing in mind.
Magento on the Beanstalk
In this section I’ll cover running Magento on Elastic Beanstalk. I won’t get into the nitty-gritty here (we’ll save that for a follow-up post); this just covers the higher-level considerations. It’s based solely on my initial experiments with Beanstalk, so any input from others is very welcome.
My goal with this setup is to keep the environment and the Magento install as vanilla as possible, so that the environment doesn’t create a maintenance burden. The default AMIs are managed by AWS, so they’re perfect candidates.
In a normal Magento install, caching and sessions are handled by the filesystem, data is stored in the database, and media such as product images are stored within the Magento installation (/media). That definitely won’t work on Beanstalk. Each instance is isolated and they don’t share a disk, so we need to do something about the cache, sessions and media.
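Here’s a toy model of the session problem, assuming a round-robin load balancer with no sticky sessions: a customer’s cart lives in a per-node session store, so the next request, landing on a different instance, can’t see it. Swapping the per-node stores for one shared store (standing in for memcached) fixes it. `Node`, `simulate` and the SKU and session values are hypothetical names for illustration.

```python
# Toy model: why per-node filesystem sessions break behind a load balancer.
# Each Node stands in for one Beanstalk instance. Its session store is either
# private to the node (like Magento's default file sessions) or shared by all
# nodes (standing in for memcached). All names are illustrative only.
from itertools import cycle


class Node:
    def __init__(self, name: str, session_store: dict):
        self.name = name
        self.sessions = session_store

    def add_to_cart(self, session_id: str, sku: str) -> None:
        self.sessions.setdefault(session_id, []).append(sku)

    def view_cart(self, session_id: str) -> list:
        return self.sessions.get(session_id, [])


def simulate(shared: bool) -> list:
    shared_store: dict = {}
    nodes = [Node(f"node{i}", shared_store if shared else {}) for i in range(2)]
    balancer = cycle(nodes)  # round-robin, no sticky sessions

    next(balancer).add_to_cart("abc123", "SKU-1")  # request 1 lands on node0
    return next(balancer).view_cart("abc123")      # request 2 lands on node1


print("local disk:", simulate(shared=False))   # the cart appears empty
print("shared store:", simulate(shared=True))  # the cart survives the hop
```

Sticky sessions on the ELB can hide this, but the session is still lost when auto-scaling terminates the instance that holds it, which is why a shared store is the safer choice.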
Caching and sessions can be shifted to a separate memcached server (or, in high-availability setups, a cluster of memcached servers), with the database server acting as a slow second-level cache that provides tag support. For media, Magento’s built-in ability to store media in the database and serve it from there lets us push any media uploaded through the admin backend into the database, where every node can share it. Other media, template and theme changes get deployed to the filesystem of each node via git, where they’ll be picked up by the CDN.
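The slow second level exists because memcached can’t find entries by tag, while Magento clears cache entries by tag (for example when a product is saved). Below is a simplified model of that arrangement, assuming writes go to both layers and the database layer keeps the tag index. It illustrates the idea only and is not Magento’s actual implementation; the class name and tag strings are hypothetical.

```python
# Simplified two-level cache: a fast layer with no tag support (standing in
# for memcached) in front of a slow layer that records tags (standing in for
# the database). Illustrates the idea only; not Magento's implementation.
from collections import defaultdict


class TwoLevelCache:
    def __init__(self):
        self.fast: dict[str, str] = {}                      # memcached: key -> value
        self.slow: dict[str, str] = {}                      # database: key -> value
        self.tags: dict[str, set[str]] = defaultdict(set)  # database: tag -> keys

    def save(self, key: str, value: str, tags: tuple[str, ...] = ()) -> None:
        # Write to both layers; only the slow layer knows about tags.
        self.fast[key] = value
        self.slow[key] = value
        for tag in tags:
            self.tags[tag].add(key)

    def load(self, key: str):
        if key in self.fast:
            return self.fast[key]
        value = self.slow.get(key)
        if value is not None:
            self.fast[key] = value  # re-warm the fast layer, e.g. after a memcached restart
        return value

    def clean_tag(self, tag: str) -> None:
        # The tag index lives in the slow layer, so it drives invalidation
        # of both layers.
        for key in self.tags.pop(tag, set()):
            self.fast.pop(key, None)
            self.slow.pop(key, None)


cache = TwoLevelCache()
cache.save("product_42_html", "<div>Widget</div>", tags=("catalog_product_42",))
cache.fast.clear()                     # simulate a memcached node being replaced
print(cache.load("product_42_html"))   # served from the slow layer
cache.clean_tag("catalog_product_42")  # product saved in the admin: drop its cache
print(cache.load("product_42_html"))   # None
```

The trade-off is that every write also hits the database, but a lost or replaced memcached node no longer means a cold cache or stale tagged entries.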
Again, depending on the store size, you can put the database-stored media on a completely separate database server. Also, once a CDN is in front of the store, the database itself shouldn’t be serving many image requests; they’ll be served from the network’s edge locations.
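To see why the database shouldn’t be doing much image serving once a CDN is in place, here’s a toy simulation. It assumes each edge location fetches an image from the origin once and then serves its cached copy (cache expiry is ignored); the edge count, catalogue size and request volume are hypothetical.

```python
# Toy model of a CDN in front of database-backed media. Each edge location
# fetches an image from the origin (Magento reading it from the database) on
# its first request, then serves its cached copy. All sizes are hypothetical
# and cache expiry is ignored.
import random


def simulate_cdn(requests: int, edges: int, images: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    edge_caches = [set() for _ in range(edges)]
    origin_fetches = 0
    for _ in range(requests):
        edge = rng.randrange(edges)
        image = rng.randrange(images)
        if image not in edge_caches[edge]:
            origin_fetches += 1  # cache miss: the request reaches Magento and the database
            edge_caches[edge].add(image)
    return origin_fetches


fetches = simulate_cdn(requests=100_000, edges=10, images=500)
print(f"{fetches} of 100000 image requests reached the database")
```

Without expiry, the database serves at most one request per image per edge location (5,000 here), however many shoppers view the images.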
There are other considerations here, such as alternative caching methods (e.g. Redis), front-end page caching (e.g. Varnish), and general store maintenance such as scheduling and running cron, checking log files in a clustered environment and monitoring server health. That’s all outside the scope of this post; I’ll cover it in more detail in a follow-up.
Economics of the AWS Cloud for Magento
If nobody on your team (you included) has an opinion on bash vs zsh and vi vs emacs, don’t read any further in this section. In addition to paying for servers, which I’ll outline below, you’re going to need someone on staff to run this setup for you. If you don’t have someone to do that, or can’t get access to someone, stick with one of the expert Magento hosts. This is not a one-click managed solution (yet; I’m sure Amazon is heading in that direction), so you’ll need someone on call who understands how it works and can diagnose and fix issues.
Still with me? OK, so how much will this cost you, on top of the expertise to run it? Let’s see.
I’m going to run through three rough pricing scenarios, from a very small store to a higher-capacity cluster. I don’t have benchmarks to quantify these levels in this post, but I’ll add them in a follow-up. If my early findings turn out to be way off base, I’ll update the scenarios below to reflect it.
Please note these are back-of-an-envelope calculations. Don’t mortgage the house on these numbers; do your own calculations and testing!
Web servers: these are the workhorses, running the PHP code and generating the pages.
| Scenario | Instances | Monthly cost |
|---|---|---|
| Small | 1 small, 24×7 | $45 |
| Medium | 1 medium 24×7, plus 2 medium for 50% of the month | $210 |
| Large | 2 xlarge 24×7, plus 4 xlarge for 50% of the month | $1670 |
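The “50% of the month” rows are easiest to reason about in instance-hours. This sketch converts each web-server row into instance-hours and the effective hourly rate its monthly figure implies, which you can compare against current EC2 pricing when doing your own calculations. It assumes a 720-hour month and that the extra instances run for exactly half of it; the dictionary layout is just for illustration.

```python
# Convert the web-server rows into instance-hours and the per-instance hourly
# rate each monthly figure implies. Assumes a 720-hour month and that
# "50% of the month" means the extra instances run for half the hours.
HOURS_PER_MONTH = 720

# scenario: (always-on instances, burst instances, burst fraction, monthly $)
web_tier = {
    "small": (1, 0, 0.0, 45),
    "medium": (1, 2, 0.5, 210),
    "large": (2, 4, 0.5, 1670),
}


def instance_hours(always_on: int, burst: int, burst_fraction: float) -> float:
    """Total instance-hours billed in a month for one scenario."""
    return (always_on + burst * burst_fraction) * HOURS_PER_MONTH


for name, (always_on, burst, fraction, monthly) in web_tier.items():
    hours = instance_hours(always_on, burst, fraction)
    print(f"{name}: {hours:.0f} instance-hours, ~${monthly / hours:.3f} per instance-hour")
```

The medium scenario, for example, is 1,440 medium instance-hours a month: the same as running two mediums around the clock.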
Memcached: this varies with how much you need to cache and whether you want redundancy.
| Scenario | Instances | Monthly cost |
|---|---|---|
| Small | 1 micro, 24×7 | $16 |
| Medium | 2 small, 24×7 | $108 |
| Large | 2 medium, 24×7 | $223 |
Database: Magento can be unkind to a database server. As with memcached, these costs will vary depending on storage requirements and redundancy.
| Scenario | Instances | Monthly cost |
|---|---|---|
| Small | 1 small, 24×7 | $65 |
| Medium | 1 medium, 24×7 | $130 |
| Large | 1 xlarge, 24×7 | $525 |
Other items: Bandwidth for CDN and ELB, EBS storage, Amazon SES, extra servers for test environments…
I’m including this as a reminder that these costs exist, but they’ll depend on your store. If I were plucking a number out of thin air, I’d say add 15% to your base monthly costs as a buffer for these sorts of extras, though that depends massively on your traffic profile and site content.
Rough totals, not including the 15% buffer from above.
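Summing the web, memcached and database tables gives about $126/month for the small scenario, $448 for medium and $2,418 for large. The sketch below does that sum and applies the suggested 15% buffer; the figures are the back-of-an-envelope ones from the tables, not current AWS prices.

```python
# Rough monthly totals: sum the web, memcached and database tables, then apply
# the 15% buffer suggested for bandwidth, EBS, SES and other extras.
costs = {
    "small": {"web": 45, "memcached": 16, "database": 65},
    "medium": {"web": 210, "memcached": 108, "database": 130},
    "large": {"web": 1670, "memcached": 223, "database": 525},
}
BUFFER = 0.15


def monthly_total(components: dict[str, int], buffer: float = 0.0) -> float:
    """Sum of component costs, optionally inflated by a buffer fraction."""
    return round(sum(components.values()) * (1 + buffer), 2)


for scenario, components in costs.items():
    base = monthly_total(components)
    buffered = monthly_total(components, BUFFER)
    print(f"{scenario}: ${base:,.0f} base, ${buffered:,.2f} with buffer")
```

With the buffer, that comes to roughly $145, $515 and $2,781 per month.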
Without the benchmarks it’s hard to know whether that’s good value. Certainly a good Magento host will be cheaper than $100/month for a small store, and it’d be fully managed and likely high-performance. The anticipation is killing me!
1) You can reduce these costs by reserving instances if you know you’ll need a certain capacity for a specific amount of time. Use these figures as a guide only.
2) I’ve used US region pricing.
3) If you want more redundancy you’ll need to add extra servers/availability zones.
4) I’m ignoring front-end caching for now; we’ll look at that in the benchmark post later.
(not really even) Benchmarks
While setting up a test environment, I couldn’t resist doing a little performance testing, albeit on a very, very small scale. I mainly wanted to check that the auto-scaling thresholds worked, and to see the impact of the gruntier EC2 instance types. I’m not even going to allow myself to call these benchmarks. I ran siege against an (almost) empty vanilla Magento install, with small cache and DB nodes.
Here’s what I saw.
3 medium instances: 20 concurrent users, 8.5 transactions/s at 2.3 s average response time
2 xlarge instances: 20 concurrent users, 15 transactions/s at 1.3 s average response time
A big increase in performance from the bigger hardware. Can’t wait to fully test these scenarios.
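A quick sanity check on these numbers is Little’s law: the average number of requests in flight equals throughput multiplied by average response time. Assuming siege’s 20 users pause only briefly between requests, both runs should land near 20. The dictionary keys are just labels for the two runs above.

```python
# Sanity-check the siege numbers with Little's law: average requests in
# flight = throughput x average response time. With 20 simulated users and
# little pause between their requests, both runs should come out near 20.
runs = {
    "3 x medium": {"throughput": 8.5, "response_time": 2.3},
    "2 x xlarge": {"throughput": 15.0, "response_time": 1.3},
}


def littles_law_concurrency(throughput: float, response_time: float) -> float:
    """L = lambda * W: the average number of in-flight requests."""
    return throughput * response_time


for name, r in runs.items():
    print(f"{name}: ~{littles_law_concurrency(**r):.1f} concurrent requests")

speedup = runs["2 x xlarge"]["throughput"] / runs["3 x medium"]["throughput"]
print(f"throughput gain: {speedup:.2f}x")
```

Both runs come out at roughly 19.5, so the two measurements hang together, and the two xlarge instances delivered about 1.76 times the throughput of the three mediums.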
Phew! I’ll add a blog post outlining the basics of a Beanstalk setup and run some thorough benchmarks against different server sizes/instance counts to get some idea of the flexibility and scalability of this solution.
PS: If anyone wants to use a copy of their store as a guinea pig on this platform, let me know your product/order counts and if suitable, I’ll replicate it for testing.