
What is an SLO?

It means that you should work carefully and SLOwly…

Nah, I’m just kidding; that’s not what it means at all. It actually stands for Service Level Objective, but what does that even mean? Is it like an SLA? What’s an SLA? Is that like an SLI? What the hell is an SLI…?

Don’t sweat any of it, as this is the first part in an upcoming mini-series on what the hell all of the SL(insert letter)s really are. Let’s dive in!



An SLO represents a level of service that a business intends to meet for its customers. In particular, it is an objective, a goal, or a benchmark. It is the target the company has set for itself, and it is the mark that customers and clients will come to expect. So what goes into an SLO?

Defining an SLO can be done in a number of ways. Some of the easier ways to define and set an SLO directly relate to technology. For example, a company such as AWS may set an objective of having its services up and running 99.99% of the time. That is their objective and goal. It is what they work towards maintaining at all times.


If AWS has an outage (let’s say the power goes out somewhere and their system goes down for a couple of hours), they would no longer be meeting their objective of being up 99.99% of the time. This would let the AWS team know they need to invest in ways to mitigate such outages, like routing traffic to a different data center.
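To put a number on that, here is a minimal sketch of the arithmetic. The 30-day measurement window and the two-hour outage are my own assumptions for illustration (real providers define their own windows), and `availability_percent` is a made-up helper, not anything AWS publishes.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the share of the window the service was up, as a percentage."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Assumption: a 30-day window, measured in minutes.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60

# Assumption: "a couple of hours" means a single 120-minute outage.
OUTAGE_MINUTES = 120

# The objective used as the example in this post.
SLO_PERCENT = 99.99

measured = availability_percent(MINUTES_PER_30_DAY_MONTH, OUTAGE_MINUTES)
slo_met = measured >= SLO_PERCENT

print(f"Measured availability: {measured:.4f}%")
print(f"SLO of {SLO_PERCENT}% met: {slo_met}")
```

Under those assumptions the month comes out at roughly 99.72% availability, which is well short of 99.99%.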

AWS just so happens to provide an SLA (Service Level Agreement) that states some of their SLOs. You can view it here: Amazon Compute Service Level Agreement. An SLA is merely the agreement between AWS and their customers so that, if AWS is not meeting the stated service level, it provides credit in return for the service it failed to deliver. Think of it as a way of saying, “Hey, we’re sorry we didn’t do what we said we were going to do. Here’s a refund.”

Obviously, missing their SLOs and having to offer up credits is not something AWS wants to do, which is why you’ll notice there is rarely a service outage for AWS. It does, however, let customers and clients know that AWS is committed to providing top-tier service. I wonder if that has anything to do with why they are so widely used… 😉


If you take a peek at the AWS SLA link above, you can see that they don’t actually target having their systems up and running 100% of the time. Why is that? The reality is that 100% is not realistic.

Consider the following example. In a single day there are 1,440 minutes, and let’s say there is one tiny, minor, little hiccup in the internet. Let’s say it’s so tiny that it doesn’t even take up a full second. Instead it takes up milliseconds… like… 0.0144 seconds. That little blip would cause AWS to miss their 100% mark. Perfection is the enemy of progress. Remember that.

Instead, most services aim for a target that’s more achievable. In some cases it can be 99.999%, and in other cases it can be 80% (think of an internal service that provides customer data back to AWS; it’s not a critical system, so if it fails 20% of the time, it’s not the end of the world). The point is that an objective is set and the company strives to achieve it.
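If you want to see how much wiggle room each of those targets actually leaves, here is a rough sketch. It assumes the target is measured over a single day (86,400 seconds), which is my simplification and not how any particular provider measures it. The helper names `allowed_downtime_seconds` and `meets_target` are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 1,440 minutes * 60 seconds


def allowed_downtime_seconds(target_percent: float,
                             window_seconds: float = SECONDS_PER_DAY) -> float:
    """Return how many seconds of downtime a target tolerates in the window."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return window_seconds * (100.0 - target_percent) / 100.0


def meets_target(downtime_seconds: float, target_percent: float) -> bool:
    """True if the observed downtime fits inside the target's daily budget."""
    return downtime_seconds <= allowed_downtime_seconds(target_percent)


# The targets mentioned in this post.
for target in (100.0, 99.999, 99.99, 80.0):
    budget = allowed_downtime_seconds(target)
    print(f"{target}% leaves {budget:.3f} seconds of downtime per day")

# The tiny blip from the example above.
BLIP_SECONDS = 0.0144
print("Blip fits a 100% target:", meets_target(BLIP_SECONDS, 100.0))
print("Blip fits a 99.999% target:", meets_target(BLIP_SECONDS, 99.999))
```

With a one-day window, 99.999% leaves a bit under one second of downtime, 99.99% leaves about 8.6 seconds, and 80% leaves 4.8 hours. A 100% target leaves zero, which is why even the 0.0144-second blip blows it.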


Now, I know we dove in a little deep there and the turns got twisty. That tends to happen when you start talking SL(insert letter here)s because there is no hard-and-fast right way, BUT there are some best practices, and I’ll continue this series and dive in a little deeper each time.

Hopefully you learned a little bit about what an SLO is and how it relates to the service a company is aiming to achieve for its customers. I recommend taking a look at another SLA, this one from Google, to help paint the picture: Google Compute Engine Service Level Agreement. (Remember, the SLA is the agreement between company and customer; the SLO is the actual target the company is aiming for, the 99.99%.)

Read the next article in the series: What is an SLI?