Back in June last year I wrote about creating a WordPress cluster on Amazon’s EC2. In this post I’ll run through a couple of problems I’ve experienced with that cluster, and how I solved them with Amazon’s Auto Scaling service.
The problems with the cluster
A couple of things about the cluster were not ideal. I’ve been putting up with them for far too long, but finally set aside some time this afternoon to fix them.
1) The price of micro-sized spot instances sometimes spikes to crazy levels.
This has meant that although micro instances are cheap while they’re running, when the price spikes they all die off and leave the cluster vulnerable. Unfortunately, setting up auto-scaling in combination with spot-priced micro instances would require coding up a hybrid solution with shell scripts, and although I enjoy tinkering with this setup, I can’t justify that much effort when an out-of-the-box solution exists: Auto Scaling.
2) Even when using on-demand instances, they just die sometimes
I had set up a CloudWatch alarm to email when the number of healthy instances drops below my minimum level (2 instances). This normally means the cluster is getting a bit weak, and more often than not shortly after getting this email, I get my monitoring email to say the site is down. This was happening infrequently enough for me to tolerate, but frequently enough to be a hassle – I’d have to keep stopping the unhealthy instance and firing up a new one once every 7-10 days.
So… with those two pain points nagging me, I present the next iteration of the micro instance WordPress cluster: the self-healing, self-scaling cluster, all thanks to Amazon’s Auto Scaling and CloudWatch services.
Setting up Auto Scaling
It’s actually not too hard to set all this up. Here are roughly four steps that should do it.
0) Make sure you have the right AWS tools installed and set up.
You’ll need the Auto Scaling tools and, if you haven’t got them already, the latest EC2 tools. The EC2 tools are not strictly needed, but it’s worth setting them both up.
Chuck them in some sensible location like ~/bin/AWS/ and then, if you’re a bash user, a ~/.bashrc with these lines in it will help:
export KEY_HOME=/Users/you/bin/AWS/your-aws-certs
export EC2_PRIVATE_KEY=$KEY_HOME/pk-ABCDEFG123746293642325354.pem
export EC2_CERT=$KEY_HOME/cert-ABCDEFG123746293642325354.pem
export EC2_HOME=/Users/you/bin/AWS/ec2-api-tools-1.5.2.4/
export JAVA_HOME=/Library/Java/Home
export AWS_AUTO_SCALING_HOME=/Users/you/bin/AWS/AutoScaling-1.0.49.0/
export PATH=$EC2_HOME/bin:$AWS_AUTO_SCALING_HOME/bin:$PATH
You should see your current instances by running ec2-describe-instances – if you do, then everything appears to be in order.
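If ec2-describe-instances complains instead, a quick check along these lines might help narrow down which setting is missing. This is only a sketch: check_aws_env is a made-up helper name, and it only looks at the variables from the ~/.bashrc lines above, not at whether the paths they point to are valid.

```shell
# check_aws_env is a hypothetical helper, not part of the AWS tools.
# It reports any of the expected environment variables that are unset or empty.
check_aws_env() {
    missing=0
    for name in EC2_PRIVATE_KEY EC2_CERT EC2_HOME JAVA_HOME AWS_AUTO_SCALING_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}

# Example: only try the EC2 tools once the environment looks complete.
# check_aws_env && ec2-describe-instances
```

The function returns non-zero when anything is missing, so it can be chained in front of the real command.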
1) Create the Auto Scaling launch configuration and group
as-create-launch-config $YOUR_CONFIG_NAME --image-id ami-123456 --instance-type t1.micro --group $YOUR_SECURITY_GROUP --monitoring-disabled
Note: I use --monitoring-disabled because I want the basic monitoring, not the premium detailed monitoring. If you’re a real cloud high-roller, splash out on detailed monitoring by using the flag --monitoring-enabled
as-create-auto-scaling-group $YOUR_GROUP_NAME --availability-zones us-east-1a --launch-configuration $YOUR_CONFIG_NAME --desired-capacity 2 --min-size 2 --max-size 4 --load-balancers $ELB_NAME --health-check-type ELB --grace-period $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY
Note: $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY can be whatever suits your servers and application. I use 5 minutes (300 seconds) and it seems fine so far.
In my example group above I have a max of 4, a min of 2 and a desired capacity of 2. Auto Scaling will ensure your instance count remains within those parameters. Desired capacity means the number of instances you normally have running.
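The relationship between the three numbers is min <= desired <= max. As a minimal sketch, a pre-flight check before running the group command could look like this (valid_capacity is a hypothetical helper name, not something the Auto Scaling tools provide):

```shell
# valid_capacity MIN DESIRED MAX
# Succeeds only when MIN <= DESIRED <= MAX.
valid_capacity() {
    min=$1
    desired=$2
    max=$3
    [ "$min" -le "$desired" ] && [ "$desired" -le "$max" ]
}

# The numbers from the example group above: min 2, desired 2, max 4.
if valid_capacity 2 2 4; then
    echo "capacity settings are consistent"
fi
```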
Your numbers will be different depending on your traffic requirements. I suggest you set a maximum above your desired capacity, so that you can add an alarm (see step 3 below) that increases your instance count when you get featured on reddit or slashdot…
2) Create the policy for scaling up
as-put-scaling-policy $YOUR_POLICY_NAME --auto-scaling-group $YOUR_GROUP_NAME --adjustment=1 --type ChangeInCapacity
This policy will add 1 extra instance, but there are other policy types that can do things like ensure a specific number of instances. The --help options for the command line tools will guide you well on this; they’re very useful and I like them.
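For illustration, here is roughly what two of the other policy types look like. This is a sketch rather than something from the setup described here: the policy name variables are placeholders, and the numbers are arbitrary. Check --help on your installed version before relying on it.

```shell
# Placeholder names: $YOUR_EXACT_POLICY_NAME and $YOUR_PERCENT_POLICY_NAME
# are made up for this example.

# ExactCapacity: set the group to exactly 3 instances, whatever it runs now.
as-put-scaling-policy "$YOUR_EXACT_POLICY_NAME" --auto-scaling-group "$YOUR_GROUP_NAME" --adjustment=3 --type ExactCapacity

# PercentChangeInCapacity: grow the group by 50% of its current size.
as-put-scaling-policy "$YOUR_PERCENT_POLICY_NAME" --auto-scaling-group "$YOUR_GROUP_NAME" --adjustment=50 --type PercentChangeInCapacity
```

Whichever type you pick, the group’s min and max sizes still cap the result.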
3) Create the actual alarms that invoke the policy in 2)
This is easiest in the web-based CloudWatch console, which steps you through a wizard. I suggest attaching the policy and also adding a notification, so that you get an email when the autoscaling happens. That way you’ll know if it’s running amok.
Here’s my setup with 2 alarms, one for self healing, and one for ramping up for big traffic:
Alarm: Unhealthy
Threshold: HealthyHostCount < 2 for 5 minutes
Actions:
in ALARM state –
Use policy “AddInstances (Add 1 instance)” for group “your-group”
Send message to topic “addingInstance” (you@gmail.com)
Alarm: HighTraffic
Threshold: RequestCount >= 100 for 5 minutes
Actions:
in ALARM state –
Use policy “AddInstances (Add 1 instance)” for group “your-group”
Send message to topic “HighTraffic” (you@gmail.com)
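If you would rather script it than click through the wizard, the Unhealthy alarm could probably be created with something like the following. Treat it as a sketch under several assumptions: it needs the separate CloudWatch command line tools installed, $POLICY_ARN is a placeholder for the policy ARN that as-put-scaling-policy prints, and the Average statistic with a single 5 minute period is a guess at what the console wizard sets up.

```shell
# Assumes the CloudWatch command line tools are on your PATH.
# $POLICY_ARN is a placeholder for the ARN printed by as-put-scaling-policy.
# Statistic and period are assumptions, not taken from the console setup.
mon-put-metric-alarm Unhealthy \
    --namespace "AWS/ELB" \
    --metric-name HealthyHostCount \
    --dimensions "LoadBalancerName=$ELB_NAME" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --comparison-operator LessThanThreshold \
    --threshold 2 \
    --alarm-actions "$POLICY_ARN"
```

The email notification would be an extra SNS topic ARN in --alarm-actions, separated from the policy ARN by a comma.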
4) Test your handiwork
Choose your favorite way of breaking a server. This will probably be sufficient on most Debian systems:
sudo /etc/init.d/apache2 stop
Within 5-10 minutes Auto Scaling should have kicked in, started up a new instance and killed your failing one.
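Rather than refreshing the console while you wait, a small polling loop can watch for the recovery. This is a generic sketch: wait_for is a made-up helper, and the curl check and $YOUR_ELB_HOSTNAME in the usage comment are assumptions about how you would test your own site.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or ATTEMPTS run out.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: check the site through the load balancer every 30 seconds,
# for up to 10 minutes ($YOUR_ELB_HOSTNAME is a placeholder).
# wait_for 20 30 curl -fsS -o /dev/null "http://$YOUR_ELB_HOSTNAME/"
```

Running as-describe-scaling-activities --auto-scaling-group $YOUR_GROUP_NAME alongside it should show the replacement instance being launched.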
I hope it goes without saying, don’t do this on your mission critical production server…
There are a couple of cool side effects of doing this the AWS way, rather than rolling your own shell scripts to start/stop instances and add them to the ELB:
- Auto Scaling will terminate unhealthy instances and remove them from the load balancer cluster automatically.
- Auto Scaling will add newly created instances to the cluster automatically.
I’ll update this article if I find any problems with the setup I have described here. If you try this on your own cluster, please let me know your results.
Update 31 Jan 2012: So one obvious problem is that it increases instances and never decreases them, doh!
You’ll want something like this. Give it a name like removeInstance:
as-put-scaling-policy $YOUR_POLICY_NAME --auto-scaling-group $YOUR_GROUP_NAME --adjustment=-1 --type ChangeInCapacity
Then set an alarm for ‘normal’ traffic and have it reduce the instances.
Another issue I found is that the minimum number of instances in the group should also be your desired capacity (probably, your setup may vary). To do that, run as-update-auto-scaling-group like so:
as-update-auto-scaling-group $YOUR_GROUP_NAME --availability-zones us-east-1a --launch-configuration $YOUR_CONFIG_NAME --desired-capacity 2 --min-size 2 --max-size 4 --load-balancers $ELB_NAME --health-check-type ELB --grace-period $MAX_TIME_IN_SECONDS_IT_TAKES_TO_BE_HEALTHY
I have also updated the original group creation command above to reflect the minimum = desired capacity change.
Hi! I'm Ashley Schroder, a
Could this work for magento? I guess you would need to split the db and the web servers on to different instances? And does autoscaling require any server configuration? How does AWS know how to configure the new instance? E.g. With linux users, ftp passwords etc… This sounds a whole lot easier than paying for rightscale and building servertemplates… What about the ELB, is that set up automatically? How would it distribute traffic, round-robin? Sorry for all the questions!!!
Yes, it can totally work with Magento. It uses an existing machine image (AMI) to be the template for each server that stops and starts.
Hi Ashley,
I’ve been following your blog for awhile, always good stuff! So we are currently trying to load balance our Magento site but have run into problems with a good Admin solution that will allow us to propagate images to all nodes. I don’t know if you’ve actually tried this yet, but if you set up a load balancer and give it a domain name, ie load.com, and access the Admin panel through load.com/admin, then you never know which node you might randomly be using for admin access. That said, setting up a new item through a random node and uploading an image means only that node gets an image. We have other issues with server side files not propagating, but that’s a custom extension–similar problem, nonetheless. Any ideas how to put the images in one place? S3 bucket? I’ve seen One Pica, but that doesn’t solve problems with shared writeable files as well as images… and I’m not even sure it works that well for images in a load balanced or auto scaled environment. On the other hand, and this relates to another post of yours, we are looking at using MageMojo instead of AWS to hopefully reduce the monthly hosting bill and avoid the need to load balance at all. Thanks in advance for your insight!
Hi, I’d consider mounting an NFS point and symlinking var/media (and related folders) so that all the servers share the same images (and even cached versions of those images).
Once an image is loaded the first time by your CDN (if you use one) then it won’t hit the cluster again for the image.