Dealing With Business Emergencies
15 Feb 2008
Fun morning for me today. I use Amazon's S3 service to store various files, including the static media for most of my websites. This morning there was a major outage across the AWS farm. After noticing the errors, I began to scramble to figure out what was going on. Obviously the issue was on the remote side (Amazon), so I had to sit and wait it out.
The outage seemed to last about 3 hours (for me; others are still reporting issues). While S3 is an awesome service and very affordable, this shows that nothing is failproof. Even with this outage, their service falls within their 99.9% SLA.
These things happen
I've spent a lot of time on the other side of this fence: trying to figure out what is causing a major outage, dealing with pissed users and keeping the bosses away long enough to get my job done. I understand that "these things happen," but how do you convey that to a customer whose service is affected as a result?
It's also difficult to accept when you are losing money as a result of an outage such as this. Luckily I keep backups of all my sites, so if I needed to I could upload the static content to my own servers and change the templates to point at it. But that is a lot of work, and money will still be lost during the downtime.
How to deal?
There really is no need for huffing and puffing in the beginning, especially since the service is still operating under their 99.9% uptime SLA. Once it begins operating outside of that SLA, that is when the yelling may start. Obviously the situation will be different for everyone. There are a lot of startups who depend on Amazon's AWS services right now.
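Before the yelling starts, it helps to know how much downtime a 99.9% figure actually allows. Here is a rough sketch of the arithmetic. The function name and the two measurement windows are my own assumptions for illustration; the real SLA document defines how availability is measured and what counts as downtime.

```python
def downtime_budget_hours(uptime_percent, window_hours):
    """Hours of downtime allowed in a window at a given uptime percentage."""
    return window_hours * (1 - uptime_percent / 100)


OUTAGE_HOURS = 3  # rough length of the outage described in this post

# Assumed measurement windows; check the actual SLA for the real one.
windows = [("30-day month", 30 * 24), ("365-day year", 365 * 24)]

for label, window_hours in windows:
    budget = downtime_budget_hours(99.9, window_hours)
    status = "within" if OUTAGE_HOURS <= budget else "over"
    print(f"{label}: {budget:.2f} hours allowed, "
          f"a {OUTAGE_HOURS} hour outage is {status} budget")
```

The budget shrinks a lot as the window gets shorter, so whether a given outage stays inside the SLA depends heavily on the window and on how the provider counts downtime.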
You have to make a rational decision about when it is time to quit the service. Do you have a backup plan? If not, you should. If your history with the service in question is a very positive one, then forgive them for their "bad" and continue on. If they have a bad or unstable history, then look into an alternative and take your business elsewhere.
A lot of times switching companies/services is easier said than done. It will probably come down to which scenario loses more money: putting up with the bad service, or the man-hours required to switch. In the long run, it's almost always smarter to dump the bad service.
In the case of Amazon, I've been very happy with their service, and this outage today wasn't that big of an issue, though I'm sure many would disagree.
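If you want to put rough numbers on that comparison, a back-of-the-envelope sketch like the one below can help. Every figure and function name here is hypothetical and made up for illustration; plug in your own outage history, revenue and labor costs.

```python
def cost_of_staying(outage_hours_per_year, revenue_lost_per_hour, years):
    """Estimated revenue lost to outages if you keep the current service."""
    return outage_hours_per_year * revenue_lost_per_hour * years


def cost_of_switching(man_hours, hourly_rate):
    """Estimated one-time labor cost of moving to another service."""
    return man_hours * hourly_rate


# Hypothetical inputs, not real data.
staying = cost_of_staying(outage_hours_per_year=20,
                          revenue_lost_per_hour=150,
                          years=2)
switching = cost_of_switching(man_hours=80, hourly_rate=60)

cheaper = "switching" if switching < staying else "staying"
print(f"staying: ${staying}, switching: ${switching}, cheaper: {cheaper}")
```

This ignores things like the new provider's own outages and the goodwill you lose with customers, so treat the result as a starting point rather than an answer.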