Friday, January 11, 2013

Predicting the Stock Market

I've got a new business venture called "StockCast.it". We provide a new way of analyzing the news of thousands of different publicly traded stocks. I was drawn to this project for the usual reasons (read: money), and also because the technology is incredibly interesting. I'm going to explain the design philosophy of this project, and why I think it works.

We all know that people who predict the future professionally are overwhelmingly bad at it. Financial analysts, political consultants and pundits, sports commentators, and so on usually do no better than chance. When people do have a good track record, it's usually over a fairly short time, and they're subject to dramatic reversals. Myron Scholes, who won the Nobel Prize for his work on the Black-Scholes model, co-founded the hedge fund "Long-Term Capital Management". Unfortunately, "long-term" wasn't a very good description. Although it had a good start, the fund lost $4.6 billion in a few months. That's over four hundred dollars per second for four months. Oops! How quickly could you shovel twenty-dollar bills into a furnace? I think I could do it that quickly, but only if I didn't need to rest.
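
For anyone who wants to check the furnace arithmetic, here is a quick back-of-the-envelope sketch. The only assumption is that "four months" means 120 days; the loss figure is the one quoted above.

```python
# Back-of-the-envelope check of the burn rate.
LOSS_DOLLARS = 4.6e9            # loss figure quoted in the post
DAYS = 4 * 30                   # assumption: "four months" taken as 120 days
SECONDS = DAYS * 24 * 60 * 60   # 10,368,000 seconds

dollars_per_second = LOSS_DOLLARS / SECONDS
twenties_per_second = dollars_per_second / 20

print(f"{dollars_per_second:.0f} dollars per second")
print(f"{twenties_per_second:.1f} twenty-dollar bills per second")
```

That works out to roughly 22 twenty-dollar bills every second, around the clock.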

Myron Scholes is smart. Very smart. So are lots of other people with equally abysmal (although not quite as spectacular) track records. What are they doing wrong?

In my opinion, the answer is that they're outsmarting themselves by doing a few things incorrectly:

  • They factor in too many variables. If you're trying to predict a system, and you do so by accounting for a huge number of distinct variables, you'll be guaranteed to fit the data. But then, when you try to extrapolate from your model, it won't work. This is a common phenomenon called "overfitting". It consists in trying so hard to fit the past data that the model becomes overly complex and loses any predictive power.
  • They make excuses for their failures. You can easily go back over your predictions, and find reasons for all the bad ones. It's human nature to say, "My method was okay, but there were special one-time circumstances that caused that specific prediction to go wrong."
  • They lose track of the common-sense justification for their predictions. As the model becomes more complicated, whatever plausible justification you once had gets lost. That's a sign that you're now failing to pay attention to the most important features of the system, and that you're getting swamped by small, accidental features that will probably turn out to be misleading over the long run.
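
To make the overfitting point concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the "true" trend, the hand-picked noise values, and the two models (a straight line with two parameters, and a polynomial with one parameter per data point). It has nothing to do with real market data; it only shows how a model that fits the past perfectly can extrapolate badly.

```python
# Toy illustration of overfitting. Every number below is invented.

def fit_line(xs, ys):
    """Least-squares straight line (two parameters). Returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_through_every_point(xs, ys):
    """Lagrange interpolation: one parameter per data point, so the
    resulting polynomial reproduces the past data exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def true_trend(x):
    """The hidden 'real' behavior of the toy system (an assumption)."""
    return 2.0 * x + 1.0


def worst_error(model, xs, ys):
    """Largest absolute miss of the model over the given points."""
    return max(abs(model(x) - y) for x, y in zip(xs, ys))


# Eight noisy observations of the past, then three future points to predict.
noise = [0.3, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.2]
past_x = list(range(8))
past_y = [true_trend(x) + e for x, e in zip(past_x, noise)]
future_x = [8, 9, 10]
future_y = [true_trend(x) for x in future_x]

simple = fit_line(past_x, past_y)
flexible = fit_through_every_point(past_x, past_y)

simple_past = worst_error(simple, past_x, past_y)
flexible_past = worst_error(flexible, past_x, past_y)
simple_future = worst_error(simple, future_x, future_y)
flexible_future = worst_error(flexible, future_x, future_y)

print("worst miss on past data:   line", simple_past, "| polynomial", flexible_past)
print("worst miss on future data: line", simple_future, "| polynomial", flexible_future)
```

The polynomial never misses on the past data, because it was built to pass through every point. On the three future points it misses by a wide margin, while the humble straight line stays close.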


Financial markets are complicated, and I'm not an economist. So I tend to think about markets in an overly simple way. As it turns out, there happens to be a well-known family of economic models that operate with the same assumptions. Sometimes, they're called "heterogeneous investor models". In these models, there are two kinds of investors. One kind is the Warren Buffett type of investor. This type of investor does a lot of research into a company's fundamentals. They don't particularly care what the price of the company's stock happens to be right now, or how it's moving. And they pay no attention at all to the small day-to-day (or second-to-second) price movements. They look for companies that are doing the right things and have solid fundamentals, and then they buy and hold for the long term.

The other type of investor is the exact opposite. This investor doesn't care about the company's fundamentals, or even what industry it's in. They look at technical data and try to predict what direction the price will move over the short run. "Day traders" are often like this. They have lots of charts and numbers and try to get ahead of short-term momentum and swings in the market. They buy and sell the same stocks very quickly.

By this way of looking at the market, these two types of investor determine the prices of stocks. When the price of a stock has moved far away from the company's fundamentals, we have the day traders to blame. When the price corrects, and is in line with fundamentals, that's due to the Buffett-style investors. What these models predict is that when the day traders have moved the price of a stock too far from its fundamentals, we expect a price correction. So if you see a stock whose price is too high or too low, you can make an informed prediction for how it will move in the near- to medium-run.

Of course, if it were that easy to tell when the price of a stock was "too high" or "too low", investing would be easy. Ideally, what you'd do if you wanted to make such a judgment would be to read everything you possibly could about every company. If you could learn every detail about everything that happened to a company (or entire sector of an economy), you could determine whether you should be optimistic or pessimistic about it. Then, you could tell if the price was too high, too low, or just right.

This isn't revolutionary, by any stretch of the imagination. Investors talk about "market sentiment" all the time, and they have lots of ways of measuring it. Some people try to aggregate the judgments of large numbers of professional investors, newsletters, or news stories. Others look at technical indicators like the number of long vs. short positions outstanding, or the number of put vs. call options. There are hundreds of these measures.

What we do is a little different. We collect thousands of news stories every day for thousands of different companies that are publicly traded. Our computers analyze the text and determine the overall mood of the news stories. Right now, we have about a quarter of a million news stories in our database, and our machines have assigned scores to all of them. This allows us to graph a stock's mood over time. When we find that there's a significant difference between the mood and the price, we can make a prediction that the price will move in the general direction of the mood over the next few days. When the price is high but the mood is negative, we conclude that the day traders have pushed the price too high, and that it's due to come back down. When the price is low but the mood is very positive, we predict that the price will go up.

Unlike most financial projections, our method is based on only two variables: price and mood. Of course, calculating the mood is not easy -- that's our "secret sauce" and I won't be describing it here (or anywhere). When the price and the mood are too far apart, we predict that they'll come together soon.
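
Here is a rough sketch of what a two-variable divergence rule could look like in Python. To be clear, this is not our production code. The mood scores are simply taken as given inputs, and the z-score normalization, the function names, and the threshold of 1.0 are all assumptions made up for this illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Rescale a series to mean 0 and standard deviation 1, so that
    price (in dollars) and mood (a score) can be compared directly."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def divergence_signal(prices, moods, threshold=1.0):
    """Return 'up', 'down', or None for the most recent day.

    'up'   : mood is well above price, so price is expected to rise.
    'down' : mood is well below price, so price is expected to fall.
    The threshold is a hypothetical tuning knob, not a published value.
    """
    if len(prices) != len(moods) or len(prices) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    gap = zscores(moods)[-1] - zscores(prices)[-1]
    if gap > threshold:
        return "up"
    if gap < -threshold:
        return "down"
    return None


# Made-up numbers: the price drifts down while the news mood improves.
prices = [100, 99, 98, 96, 94, 92]
moods = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5]
print(divergence_signal(prices, moods))
```

Putting both series on the same scale is what lets a single threshold stand in for "too far apart". Any other way of making price and mood comparable would serve the same purpose.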

Now the question is how to validate this method. For one thing, you'd expect that overall, the price of a stock and the mood of its news stories would correlate. So as a first test, we can pick a stock and graph its mood vs. its price. In this chart, we've picked Goldman Sachs. The red line is the mood; the blue line is the price of its shares.

Over and over again, this is what we find. There is an obvious correlation between price and mood. Of course, there's a lot of noise in these systems, so the correlation isn't ever perfect. But they track each other nicely, the vast majority of the time. This is particularly striking when you keep in mind that the mood was calculated using only the text of news stories, automatically analyzed by computer. No information about price or any other financial data was used to generate that line. So the fact that it correlates with the price is very strong evidence that we're measuring something real.
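
If you'd rather put a number on "track each other" than eyeball a chart, the standard tool is the Pearson correlation coefficient. The sketch below is a plain-Python version with made-up price and mood series; the function names and the data are assumptions for illustration, not values from our database.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def daily_changes(series):
    """Day-over-day differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up daily series, for illustration only.
price = [100.0, 101.5, 101.0, 103.0, 104.5, 104.0, 106.0]
mood = [0.10, 0.18, 0.15, 0.30, 0.35, 0.31, 0.42]

print("levels: ", round(pearson(price, mood), 2))
print("changes:", round(pearson(daily_changes(price), daily_changes(mood)), 2))
```

The second number is the stricter test. Two series that both happen to drift upward will show a high correlation in their levels by coincidence, so it's worth checking that the day-to-day changes line up too.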

And what happens when the blue line and the red line come apart? We predict that the price will move in the direction of the mood. Our data indicates that this happens on average about two or three days after their distance has reached a particular threshold. We put our predictions online (and we send out free daily emails to anyone who wants to sign up). So far, we are about 90% accurate when we predict that the price will go up, and about 60% accurate when we predict that it will go down. Why we're so much better for the former is an interesting question that we're exploring now.
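
Scoring predictions separately by direction is simple bookkeeping. Here is a minimal sketch; the record format (pairs of predicted and actual direction) and the sample records are invented for this example and are not our real track record.

```python
def hit_rates(records):
    """Compute accuracy separately for 'up' and 'down' predictions.

    records: iterable of (predicted, actual) pairs, each 'up' or 'down'.
    Returns a dict mapping direction to hit rate, or None if that
    direction was never predicted.
    """
    counts = {"up": [0, 0], "down": [0, 0]}  # direction -> [hits, total]
    for predicted, actual in records:
        if predicted not in counts:
            raise ValueError(f"unknown direction: {predicted!r}")
        counts[predicted][1] += 1
        if predicted == actual:
            counts[predicted][0] += 1
    return {d: (hits / total if total else None)
            for d, (hits, total) in counts.items()}


# Invented sample: four 'up' calls (three right), two 'down' calls (one right).
records = [
    ("up", "up"), ("up", "up"), ("up", "down"), ("up", "up"),
    ("down", "down"), ("down", "up"),
]
print(hit_rates(records))
```

Keeping the two directions apart matters because a single blended accuracy figure would hide exactly the kind of gap between "up" and "down" calls described above.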

For the curious, here are a few more charts.

Netflix is a good example of a chart that shows a strong correlation between price and sentiment.

The obvious feature of the Netflix data is the big jump in sentiment in early December, which correlated with an increase in stock price. A quick news search shows that Netflix acquired the rights to distribute Disney content on December 4th, so this jump makes perfect sense.

Here's a chart in which the price has come apart from the sentiment in an interesting way.



At the time I write this, there's a lot of debate out there about whether Apple is a good value or not. The price (as you can see by looking at the blue line) has been falling for a while now. But interestingly, the news about Apple has been trending positive. In our data set, we usually see the price move in the direction of the mood at this point. So we'd predict that Apple is due for an upward price movement soon, probably within the next week. We'll see how this prediction turns out.