So after my little experiment with Predicting Stock Prices with Python Machine Learning I came across the AWS Deep Racer League. A machine learning racing league built around the concept of a little single camera car that uses Reinforcement Learning to navigate a race track. I’ve decide to sign up and Compete in upcoming the October race and share what I learn along the way.
Let me give a little bit of background on how the Deep Racer works and the various pieces. There are two ways to train your little race car, virtually and physically. If you visit the Deep Racer site you can purchase a 1/18th scale race car that you can then build physical tracks for, train, and race on. There are also various models you can purchase with a couple of different input sensors so you can do things like teach the car to avoid obstacles or actually race the track against other little cars.
The good news is you don’t actually have to buy one of these models to race or even compete. You can do all of the training and racing virtually via the AWS console. You also get enough resources to start building your first model and training it for free!
Now let’s get into how this actually works. What AWS has done is built a system that does most of the heavy complex machine learning for you. They offer you the servers and compute power to run the simulations and even give you video feedback on how your model is navigating the track. It becomes incredibly simple to get setup and running and you can follow this guide to get started.
When you get setup you’ll be asked to build a reward function. A reward function contains the programming logic you’ll use to tell your little race car if it’s doing it’s job correctly. The model starts out pretty dumb. It basically will do nothing or drive randomly, forward, backwards, zig-zags…. until you give it some incentive to follow the track correctly.
This is where the reward function comes in. In the function you provide the metrics for how the model should operate and you reward it when it does the job correctly. For example, you might reward the model for following the center line of the race track.
On each iteration of the function you’ll be handed some arguments on the model’s current status. Then you’ll do some fancy work to say, “okay, if the car is dead center reward it with 1 point but if it’s too far to the left or the right only give it a half point and then if it’s off the track give it zero.”
The model will then run the track going around, trying out different strategies and using that reward function to try and earn a higher reward. On each lap it will start to learn that staying in the middle of the track offers the highest, most valuable reward.
At first this seems pretty easy… until you start to realize a couple of things. The model truly does test out all types of various options which means it may very well go in reverse and still think it’s doing a good job because you’ve only rewarded it for staying on the center line. The model won’t even take into account speed in this example because you’re not offering a reward for it.
As you start to compete you’ll find that other racers are getting even more complex like finding ways to identify how to train the model to take the inside lane of a race track or how to block an upcoming model that is attempting to pass it.
On the surface, the concepts and tools are extremely easy to pick up and get started with, however the competition and depth of this race are incredible. I’m looking forward to building some complex models over the next couple of weeks and sharing my results. If you’re interested in a side hobby check out the AWS Deep Racer League and maybe we can race against each other!