
Predicting Stock Prices with Machine Learning in Python: Part I

Over the last few weeks I’ve been keying away at building an application that can analyze stock data and use Python machine learning libraries to predict stock prices.

This is the first part of a series diving into machine learning and building this application. I’ve uploaded the entire project thus far to my personal GitHub repo at: https://github.com/Howard-Joshua-R/investor

I invite anyone and everyone to take a look at the project, fork it, add to it, point out where I’m doing something stupid, and build it with me! If you help, you’re more than welcome to use it to your own advantage.

photo cred: Shahadat Rahman

For this first post, I’ll walk through what I’ve built so far and how the meat and potatoes work.

If you drill into the directories and open the ‘spiders’ folder, you’ll find the ‘lstm.py’ file. This particular spider uses Scrapy and an LSTM model to predict the price of any stock ticker you pass to it. Let’s take a look at the first piece of this tool, the scraper:

    def start_requests(self):

        ticker = getattr(self, 'ticker', None)
        if ticker is None:
            raise ValueError('Please provide a ticker symbol!')

        # quiet the noisy loggers from libraries we use later on
        logging.getLogger('matplotlib').setLevel(logging.WARNING)
        logging.getLogger('tensorflow').setLevel(logging.WARNING)

        # the API key lives in an environment variable, not in the code
        apikey = os.getenv('alphavantage_apikey')
        url = ('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY'
               '&symbol={0}&apikey={1}&outputsize=full').format(ticker, apikey)

        yield scrapy.Request(url, self.parse)

This first function uses Scrapy to reach out to Alpha Vantage and pull down stock information in JSON format. Alpha Vantage provides fantastic stock data on open and close prices going back a decade or longer. All it requires is that you register with them to obtain an API key. Best part, it’s free!
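For reference, the TIME_SERIES_DAILY payload looks roughly like the sketch below. This is abbreviated and from memory (the values are placeholders), so check the Alpha Vantage documentation for the exact field names:

    # Approximate shape of the TIME_SERIES_DAILY response (values are placeholders)
    response_shape = {
        'Meta Data': {
            '2. Symbol': 'TSLA',
            '3. Last Refreshed': '2021-01-15',
        },
        'Time Series (Daily)': {
            '2021-01-15': {
                '1. open': '826.00',
                '2. high': '850.00',
                '3. low': '820.00',
                '4. close': '845.00',
                '5. volume': '38000000',
            },
            # ...one entry per trading day, newest first
        },
    }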

Now let’s break down each piece.

def start_requests(self):

    ticker = getattr(self, 'ticker', None)
    if ticker is None:
        raise ValueError('Please provide a ticker symbol!')

Here we define our first function, ‘start_requests(self)’, which tells Scrapy where our spider begins. From there we grab the ‘ticker’ argument, which tells the spider what stock data to collect. I’ve currently tested TSLA = Tesla, AMZN = Amazon, and TGT = Target. Simply providing the ticker in the ‘main.py’ file is enough to set the target ticker. The final two lines simply validate that you’ve passed in the ticker argument.
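For reference, Scrapy turns spider arguments into attributes on the spider instance, which is exactly what getattr(self, 'ticker', None) picks up. Here’s a rough sketch of the two usual ways to supply it (the spider name ‘lstm’ and the exact contents of main.py are assumptions here, not copied from the repo):

    # From the command line, -a arguments become spider attributes:
    #   scrapy crawl lstm -a ticker=TSLA

    # Or from a driver script such as main.py:
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl('lstm', ticker='TSLA')  # keyword args become attributes on the spider
    process.start()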

photo cred: Sigmund

The next two lines suppress noisy logs from two of the libraries we’ll use later when building the prediction model. Matplotlib is used to plot points on a graph, and TensorFlow is used to help us implement the LSTM model.

logging.getLogger('matplotlib').setLevel(logging.WARNING) 
logging.getLogger('tensorflow').setLevel(logging.WARNING)  

The following lines set our Alpha Vantage API key and the URL we’re going to hit for our stock data. You’ll want to store your Alpha Vantage API key in your machine’s environment variables under the name ‘alphavantage_apikey’.

apikey = os.getenv('alphavantage_apikey')
url = ('https://www.alphavantage.co/query?function=TIME_SERIES_DAILY'
       '&symbol={0}&apikey={1}&outputsize=full').format(ticker, apikey)
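One thing to watch: if that environment variable isn’t set, os.getenv returns None and the request fails with a confusing Alpha Vantage error. A small guard like this (my suggestion, not something in the repo) makes the failure obvious:

    apikey = os.getenv('alphavantage_apikey')
    if apikey is None:
        raise ValueError('Set the alphavantage_apikey environment variable '
                         'to your Alpha Vantage API key.')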

The final piece kicks off the request. We hand Scrapy our newly built URL, which contains our target ticker and API key, along with our parse function; once the response comes back, Scrapy passes it to parse.

yield scrapy.Request(url, self.parse) 
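The parse method in lstm.py does the real work and is the subject of the next post, but as a minimal sketch of what the callback receives (assuming the Alpha Vantage field names shown earlier):

    import json

    def parse(self, response):
        # response.text is the raw JSON body returned by Alpha Vantage
        data = json.loads(response.text)
        daily = data.get('Time Series (Daily)', {})
        self.logger.info('Downloaded %d daily records for %s', len(daily), self.ticker)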

So far this piece uses a Scrapy spider to reach out to Alpha Vantage and download stock data in JSON format. In the next post I’ll dive into parsing that data and building the machine learning model.

In the meantime, feel free to jump over to my GitHub repo and read through the comments in the lstm.py file. I’ve attempted to include as many notes as I could and left some open-ended questions as well. If you have any feedback I’d be more than happy to discuss! If you’re feeling brave and want to submit your own pull request, please do!