
What Is An SLI?

In a previous post, I covered what an SLO is: What is an SLO?



To recap briefly, an SLO is a Service Level Objective: a target that we aim for when providing a service to customers. It lets us draw a line in the sand for how well we are operating.

Once we know where we want to aim with our SLO, we have to dig deeper to find out whether we are actually getting there. While the SLO defines our objective, we need a measuring stick to truly know whether we are meeting it. Enter the SLI.

An SLI, or Service Level Indicator, is that measuring stick. But how does it actually work? I think the best way to understand the relationship between an SLO and an SLI is with an example.


Let’s say we’re at a carnival and happen to stroll by a few games. One of them is a basketball game: make as many baskets as you can in 1 minute. Well, there’s your objective right there: make baskets.

But it’s not quite an SLO yet. To be an SLO, it needs a target, so let’s be really aggressive in this example and say that we want to make 99% of our shots. Two 9’s of bucketability.

photo cred: Markus Spiske

Now that we have our SLO (99% of buckets made), we need an SLI: something to measure how well we are doing. We could, of course, count the shots made by hand, but let’s say we’re a single carnival vendor running 3 or 4 booths. Or go even bigger: let’s say we’re a multi-million dollar company trying to put an SLO on system response times. You gonna count those by hand?

Me either.

So let’s automate this by installing a little flipper inside the basket, so that every time a shot goes in, the flipper gets pressed down and the score on the game goes up by one. That little flipper is our SLI. It is our tool for measuring how well we are doing: how close we are to two 9’s of bucketability.
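To make the flipper idea concrete, here is a minimal Python sketch of a counter-based SLI. One assumption beyond the story: to turn made baskets into a percentage we also need the number of attempts, so the hypothetical `BasketSLI` class counts both.

```python
from dataclasses import dataclass


@dataclass
class BasketSLI:
    """Hypothetical 'flipper': counts every shot taken and every shot made."""
    attempts: int = 0
    made: int = 0

    def record_shot(self, went_in: bool) -> None:
        # Every shot counts toward the total; only baskets press the flipper.
        self.attempts += 1
        if went_in:
            self.made += 1

    def ratio(self) -> float:
        # Fraction of shots made; 0.0 before any shot so we never divide by zero.
        return self.made / self.attempts if self.attempts else 0.0


sli = BasketSLI()
for went_in in [True, True, False, True]:
    sli.record_shot(went_in)
print(f"{sli.made}/{sli.attempts} = {sli.ratio():.0%}")  # prints 3/4 = 75%
```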

Now let’s bring in a player. Our player comes in, starts shooting, plays quite a few games, and ends up making 194 out of 200 shots (ballin’), which translates into a 97% shot rate. Since our SLO is 99% of buckets made, this falls below our SLO.

Our SLI, or indicator, should then alert or notify us that we are not meeting our objective and may need to make changes. So let’s send our player out to training for a few days and improve the system that makes buckets, so to speak.

Once our player returns, we line them back up and have them start shooting again. This time around they make 198 out of 200 shots, nailing the 99% SLO!


We’ve just used our SLI to track how many shots were made and to notify us when we fell below our SLO. We used our SLO to recognize that our player needed more training, and after that training we were able to reuse our SLI to confirm the improvement. Buckets!
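Here is a rough sketch of that before-and-after check, assuming the SLI is simply made / attempts and the SLO target is 0.99. The `meets_slo` helper is hypothetical, not part of any monitoring tool.

```python
SLO_TARGET = 0.99  # two 9's of bucketability


def meets_slo(made: int, attempts: int, target: float = SLO_TARGET) -> bool:
    """Return True when the SLI (made / attempts) is at or above the target."""
    return made / attempts >= target


sessions = [("before training", 194, 200), ("after training", 198, 200)]
for label, made, attempts in sessions:
    status = "meets SLO" if meets_slo(made, attempts) else "ALERT: below SLO"
    print(f"{label}: {made / attempts:.0%} ({status})")
```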

While this example was very simple, in the world of technology, where counting goes from 200 buckets to 2 million web requests over an hour, an automated SLI becomes critical to properly measuring the level of service your system is providing to your customers.
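At that scale the same counting idea applies, just per request. A minimal sketch, assuming a hypothetical response-time SLI defined as the fraction of requests completed within 300 ms (the threshold is an illustrative choice, not a recommendation):

```python
from collections.abc import Iterable

THRESHOLD_MS = 300.0  # hypothetical "fast enough" cutoff for a response


def latency_sli(latencies_ms: Iterable[float], threshold_ms: float = THRESHOLD_MS) -> float:
    """Fraction of requests that finished within threshold_ms, computed in one pass."""
    total = good = 0
    for ms in latencies_ms:
        total += 1  # every request counts toward the total
        if ms <= threshold_ms:
            good += 1  # like the flipper: only "good" requests bump this counter
    # Assumption: with no traffic nothing has failed, so report 1.0.
    return good / total if total else 1.0


# Simulated hour: 2,000,000 requests, every 200th one slow (10,000 slow in total).
hour = (850.0 if i % 200 == 0 else 120.0 for i in range(2_000_000))
sli = latency_sli(hour)
print(f"{sli:.2%} of requests were fast enough")
```

Counting in a single pass, rather than storing every latency, mirrors the flipper: the SLI only ever needs two running totals.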

The only thing worse than not providing great service with your products is not knowing the service you’re providing at all.


What is an SLO?

It means that you should work carefully and SLOwly…

Nah, I’m just kidding that’s not what it means at all. It actually stands for Service Level Objective, but what does that even mean? Is it like an SLA? What’s an SLA? Is that like an SLI? What the hell is an SLI…?

Don’t sweat any of it; this is the first part of an upcoming mini-series on what the hell all of the SL(insert letter)’s really are. Let’s dive in!



An SLO represents a level of service that a business intends to meet for its customers. In particular, it is an objective, a goal, or a benchmark. It is the target the company has set to aim for and reach, and it is the mark its customers and clients will come to expect. So what goes into an SLO?

Defining an SLO can be done in a number of ways. Some of the easiest SLOs to define and set relate directly to technology. For example, a company such as AWS may set an objective of having its services up and running 99.99% of the time. That is its objective and goal: the level it works to maintain at all times.

photo cred: Christian Wiediger

If AWS has an outage (let’s say the power goes out somewhere) and its system goes down for a couple of hours, it would no longer be meeting its objective of being up 99.99% of the time. This would tell the AWS team that it needs to invest in ways to mitigate such outages, like routing traffic to a different data center.
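Some quick arithmetic shows why a couple of hours is enough to miss the mark. This sketch assumes availability is measured over a 30-day window; real SLAs define their own measurement periods, and the `availability` helper is hypothetical.

```python
MINUTES_PER_WINDOW = 30 * 24 * 60  # assumed 30-day measurement window: 43,200 minutes


def availability(downtime_minutes: float, window_minutes: float = MINUTES_PER_WINDOW) -> float:
    """Fraction of the window during which the service was up."""
    return (window_minutes - downtime_minutes) / window_minutes


target = 0.9999
allowed = (1 - target) * MINUTES_PER_WINDOW  # downtime the target tolerates (~4.32 min)
measured = availability(2 * 60)  # a two-hour outage
print(f"allowed downtime: {allowed:.2f} min, measured availability: {measured:.4%}")
print("objective met" if measured >= target else "objective missed")
```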

AWS happens to provide an SLA (Service Level Agreement) that states some of its SLOs; you can view it here: Amazon Compute Service Level Agreement. An SLA is simply the agreement between AWS and its customers that, if AWS does not meet its SLO, it will provide credit in return for the service it failed to deliver. Think of it as a way of saying, “Hey, we’re sorry we didn’t do what we said we were going to do. Here’s a refund.”

Obviously, missing its SLOs and having to offer up credits is not something AWS wants to do, which is why you’ll notice that AWS service outages are rare. The SLA does, however, let customers and clients know that AWS is committed to providing top-tier service. I wonder if that has anything to do with why they are so widely used… 😉


If you take a peek at the AWS SLA linked above, you can see that AWS doesn’t actually target having its systems up and running 100% of the time. Why is that? The reality is that 100% is not achievable.

Consider the following example: a single day has 1,440 minutes, and let’s say there is one tiny, minor, little hiccup in the internet. It’s so tiny that it doesn’t even take up a full second. Instead, it lasts mere milliseconds… like… 0.0144 seconds. That little blip alone would cause AWS to miss a 100% mark, no matter how flawless the rest of the day was. Perfection is the enemy of progress. Remember that.
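The same arithmetic applied to that 14.4 ms blip, as a quick Python check:

```python
SECONDS_PER_DAY = 1440 * 60  # 1,440 minutes in a day = 86,400 seconds
blip_seconds = 0.0144  # the tiny hiccup from the example

uptime = (SECONDS_PER_DAY - blip_seconds) / SECONDS_PER_DAY
print(f"uptime for the day: {uptime:.7%}")
print(uptime >= 1.0)  # prints False: any downtime at all misses a 100% target
```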

Instead, most services aim for a more attainable target. In some cases it can be 99.999%, and in other cases it can be 80% (think of an internal service that provides customer data back to AWS; it’s not a critical system, so if it fails 20% of the time, it’s not the end of the world). The point is that an objective is set and the company strives to achieve it.
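One way to compare targets like these is to translate each into the downtime it tolerates. A small sketch, assuming a one-day window; `allowed_downtime_minutes` is a hypothetical helper:

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target: float, window_minutes: float = MINUTES_PER_DAY) -> float:
    """Downtime a target tolerates within the window (its 'error budget')."""
    return (1 - target) * window_minutes


for target in (0.99999, 0.9999, 0.80):
    minutes = allowed_downtime_minutes(target)
    print(f"{target * 100:g}% -> {minutes:.4f} min/day ({minutes * 60:.2f} s)")
```

At 99.999% the whole day’s allowance is well under a second, while the 80% internal service could be down for hours and still meet its objective.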


Now, I know we dove in a little deep there and the turns got twisty. That tends to happen when you start talking about SL(insert letter here)’s, because there is no single right way to do it. BUT there are some best practices, and I’ll continue this series and dive in a little deeper each time.

Hopefully you learned a little bit about what an SLO is and how it relates to the level of service a company aims to provide its customers. I recommend taking a look at another SLA, this one from Google, to help paint the picture (remember: the SLA is the agreement between company and customer, while the SLO is the actual target the company is aiming for, such as the 99.99%): Google Compute Engine Service Level Agreement.

Read the next article in the series: What is an SLI?