How To: Recover from Failed Amazon EC2 Instances (and fail they will)
Published by boris February 7th, 2011 in aws, technology
One of the things that’s not immediately obvious about Amazon EC2 instances is that they can fail. In fact, Amazon says:
It’s inevitable that EC2 instances will fail, and you need to plan for it. An instance failure isn’t a problem if your application is designed to handle it.
The EC2 forums are littered with posts from users whose EC2 instances have become unresponsive and cannot be stopped or restarted. Instances can get “stuck” in the “stopping” state for 24 hours or more. Amazon generally recommends issuing a forced stop via the client tools’ “ec2-stop-instances --force” command, but this doesn’t seem to work in most cases.
Luckily, Eric Hammond wrote a post about how to move EC2 instances to new hardware if such a problem were to occur (as it did to me). Eric’s solution relies on the client tools under Linux.
It turns out that it’s possible to replicate these steps directly in the Amazon panel and quickly recover from a failed instance. I recommend everyone follow these steps to prepare for a failure scenario:
- In the “Instances” panel: create a new instance using the same AMI as your production instance. This is your backup instance. “Stop” the instance after it is created. (Amazon does not charge instance hours for stopped instances, though you still pay for any EBS storage attached to them.)
- In “Volumes”: detach and then delete the drive that was created as part of this new instance.
- Still in “Volumes”: create a snapshot of your production drive.
- Go to the “Snapshots” section of the panel, select your new snapshot and choose “create volume from snapshot.” Be sure to choose the same availability zone as your backup instance, since a volume can only be attached to an instance in its own zone. I’ve seen some caching issues here, so if you don’t see your snapshot when selecting this menu, refresh the page.
- Go back to “Volumes” and choose “attach volume” on your new available volume. Choose your stopped backup instance and enter the same device name as your original volume (visible under “attachment information” for the volume).
- Go ahead and start your backup instance. It should be an exact copy of your production instance as of the snapshot.
- Sleep better at night.
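
For anyone who would rather script this than click through the panel, the steps above can be sketched in Python. This is only a rough outline under a few assumptions that are not part of the original walkthrough: it expects an `ec2` object that behaves like the `boto3` EC2 client (`boto3.client("ec2")`), it assumes the backup instance already exists and has been stopped (the first step), and the function names `prepare_backup` and `fail_over` and all IDs shown are made up for illustration.

```python
def prepare_backup(ec2, prod_volume_id, backup_instance_id,
                   backup_root_volume_id, device, zone):
    """Give a stopped backup instance a copy of the production volume.

    `ec2` is assumed to behave like boto3.client("ec2"). `device` is the
    device name of the original volume and `zone` is the availability
    zone of the backup instance.
    """
    # Make sure the backup instance is fully stopped before touching its disk.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[backup_instance_id])

    # Detach and delete the volume that was created with the backup instance.
    ec2.detach_volume(VolumeId=backup_root_volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[backup_root_volume_id])
    ec2.delete_volume(VolumeId=backup_root_volume_id)

    # Snapshot the production volume and wait for the snapshot to finish.
    snapshot_id = ec2.create_snapshot(
        VolumeId=prod_volume_id,
        Description="backup of " + prod_volume_id,
    )["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Create a new volume from the snapshot in the backup instance's zone.
    new_volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # Attach the new volume under the same device name as the original.
    ec2.attach_volume(
        VolumeId=new_volume_id,
        InstanceId=backup_instance_id,
        Device=device,
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[new_volume_id])
    return new_volume_id


def fail_over(ec2, backup_instance_id):
    """Start the prepared backup instance and wait until it is running."""
    ec2.start_instances(InstanceIds=[backup_instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[backup_instance_id])


# Example usage (hypothetical IDs, requires boto3 and AWS credentials):
#
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   prepare_backup(ec2, "vol-PROD", "i-BACKUP", "vol-BACKUPROOT",
#                  "/dev/sda1", "us-east-1a")
#   fail_over(ec2, "i-BACKUP")   # only when production actually fails
```

Passing the client in as an argument keeps the sketch easy to dry-run against a stub before pointing it at a real account. Re-running the snapshot and volume steps on a schedule would keep the backup closer to the current state of production.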