About this site

Welcome to the official Tumblr of Cloudability.

We cover your *aas.

PaaS, IaaS, SaaS, we've got you covered. Because the cloud is infinitely scalable but your dollars are not.

Three cloud ops lessons you should learn before your next outage « Cloudability by Jeremy Wagner-Kaiser
It’s a quiet Thursday night in Portland. It’s about 9:45 PM. Your humble author is preparing to meet a friend for drinks.
These rather promising plans are rudely interrupted when the on-call engineer informs me that our alerting systems are doing their best Christmas tree impression and don’t seem to be stopping. Hilarity ensues.
When the dust settled and the systems were purring along again, it was time to look back and draw a few lessons.
The first lesson of cloud ops is simple: things break.
This doesn’t sound all that bad. It’s how and when things break that gets rough.
The recent AWS outage is a perfect example. No instances were rebooted by the service going down, but that wasn’t the problem. The problem was the cascading failures.
The AWS outage meant that Heroku failed. That meant our app fell over. Moreover, it meant that many of the services we depend on fell over too. Some of them were responsible for logging, monitoring, or exception handling. While those were important, they weren’t critical to continued operation.
No, the real problem was that our Redis service died. The Redis service was used to connect our tightly secured backend boxes to our frontend. When AWS and Heroku came back, our Redis service didn’t. This caused an interesting array of internal errors and we learned quite a bit from it.
Chiefly, though, we learned to always approach our systems as if something can break. Because it can, and it will.
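To make "assume it can break" a bit more concrete, here is a minimal sketch in Python of one way to wrap a call to a dependency. It is an illustration under stated assumptions, not what we actually run: the helper name `call_with_retry`, its parameters, and the choice to catch `ConnectionError` are all hypothetical. The idea is that a non-critical dependency (logging, monitoring) gets a fallback so the app keeps going, while a critical one (like the Redis link between backend and frontend) retries with backoff and then fails loudly.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff on ConnectionError.

    Hypothetical helper: if every attempt fails, use `fallback` when one is
    given (non-critical dependency), otherwise raise (critical dependency).
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            # Wait 0.5s, 1s, 2s, ... between attempts, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        # Non-critical path: degrade gracefully instead of taking the app down.
        return fallback()
    # Critical path: surface the failure so alerting can catch it.
    raise RuntimeError("dependency unavailable") from last_error
```

A logging call might pass `fallback=lambda: None` so a dead log service is simply skipped, while the Redis connection would pass no fallback and let the error surface.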
The second lesson of cloud ops is also simple: things change.
[Keep reading on our blog!]

Tags: cloud, cloud computing, geek, portland, startup, tech, cloudability

Source: blog.cloudability.com