Chapter 09 – The Text Frontier – AI, IA, and the New Research

Hunting Investment Alpha and Trading Alpha from Online News, Social Media, and Rumors

Alpha hunters are always looking for new territory. When a strategy becomes known and used by too many players, the collective market impact of getting in and getting out will squeeze out all the profit juice, and only the lowest-cost transactors (large sell-side firms and hedge funds) will be able to use it. The pack needs to move on.

The Web is promising new territory, but while it is full of information it is also something of a pain to deal with, so a case can be made that there’s an economic rent (alpha) to be earned by doing a good job at using the Web effectively. This was suggested at the end of Chapter 4, “Where Does Alpha Come From?” This chapter gets into the specifics, showing real examples relating textual patterns and events (not just individual documents) to excess returns.

Bill Gross, of the PIMCO investment management company, described equity valuation as “that mysterious fragile flower where price is part perception, part valuation, and part hope or lack thereof.” (1) An old Wall Street proverb says, more tersely, “Stocks are stories, bonds are mathematics.”

Sources of Investment News and Securities Trading Rumors

This has enough truth in it that looking for the right stories is a worthwhile activity. With hundreds of billions of pages available on the surface Web (the portion covered by search engines), and even more information stashed in proprietary databases and other deep Web locations, there are plenty of places to look. It makes sense in looking at these to break them into vaguely more manageable categories. A useful way to do this is to consider four broad classifications:

1. News. This is the old standby, and we all know this when we see it. It is often called the mainstream media (MSM). It is written by reporters, edited by editors, and published by more or less reputable sources. News was once exclusively disseminated on paper, radio, and television, and later via expensive dedicated electronic feeds. It is now ubiquitous on the Web, and news vendors are trying to move upscale, with tagged news that is more amenable to machine understanding using intelligence amplification (IA) and artificial intelligence (AI) tools. These deluxe feeds go for deluxe prices, tens of thousands of dollars per month.

2. Pre-news. Pre-news is the raw material reporters read before they write news. It comes from primary sources, the originators themselves: the Securities and Exchange Commission (SEC), court documents, and other government agencies. Not every reporter knows Deep Throat, but they all talk with people who might have something newsworthy to say. In pre-Web days, primary source information was much harder to come by, so we were far more dependent on reporters and established news organizations to find it for us. Today, in yet another instance of disintermediation by the Internet, many information middlemen have been eliminated.

3. Rumors. Here is content with a slightly to dramatically lower pedigree than reputable, signed news reporting, or the primary source material that goes into MSM news. Internet advertising has created a means to monetize spreading rumors, and spawned a new segment of the information industry. Some blogs and web sites are driven entirely by rumors, with little or no regard for truth. Others have much higher standards, closer to the highest-minded bastions of reputation-driven journalism, and may be spawned by those news organizations as they evolve. Others have an “all Britney, all the time, except for shark attacks” attitude, but keep people coming back by breaking an occasional legitimate true story overlooked by the mainstream media.

4. Social media. The barriers to entry at the low end of the “news” business on the Web are vanishingly small. Anyone can send spam, create a blog, or post on message boards for stocks or other topics. A great deal of this is genuinely useful (think of the product reviews on Amazon), and some is just noise. On stock message boards, there have been CEOs who reveal valuable information; but for the most part, the typical posting still reads like it came from some guy sitting around in his underwear in Albania at 3 a.m., on vodka number nine. A great deal of research has gone into trying to sort out the legitimate sources from the louts. Some seems to have promise, at least in identifying future volatility. But you may be better off looking for the words of the prophets on the subway walls.

The first two items on this list are the subject of this chapter; the last two are the subject of the next one.

Extracting Investment Information from Wall Street News and Rumors

This chapter reviews research and ideas relating to extracting investable information from news and pre-news sources. A recurring theme is molecular search: the idea of looking for patterns and changes in groups of documents, rather than just characterizing atoms of information, the individual documents or stories we find as the result of conventional search engine queries. The choice of molecules and atoms instead of the usual “forest and trees” metaphor is not just some fancy science talk; it’s because there is only one basic relationship between trees and forests: spread out a bunch of the first to make the second. Molecules made from atoms have infinite variety and complexity, as do the relationships we can infer across groups of documents.
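
One way to make the molecular idea concrete is to stop scoring stories one at a time and instead watch how the volume of stories about a firm changes from day to day. The Python sketch below is only an illustration under stated assumptions, not a method taken from the research reviewed here: the (day, firm) input format, the function names, and the two-standard-deviation threshold are all hypothetical choices.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def daily_mention_counts(stories):
    """Count stories per firm per day.

    `stories` is an iterable of (day, firm) pairs; this input format
    is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for day, firm in stories:
        counts[firm][day] += 1
    return counts


def unusual_days(counts, days, threshold=2.0):
    """Flag (firm, day, count) where the day's story count sits more than
    `threshold` standard deviations above that firm's own average."""
    flagged = []
    for firm, per_day in counts.items():
        series = [per_day.get(day, 0) for day in days]
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            continue  # perfectly flat coverage: nothing stands out
        for day, n in zip(days, series):
            if (n - mu) / sigma > threshold:
                flagged.append((firm, day, n))
    return flagged


# Toy data: ACME gets one story a day, except for a burst of eight on day 7.
days = [f"d{i}" for i in range(1, 11)]
stories = [(d, "ACME") for d in days] + [("d7", "ACME")] * 7
print(unusual_days(daily_mention_counts(stories), days))
```

The point of the design is that no single story is judged at all; the signal is the change in the group. The same skeleton could track other group-level patterns, such as shifts in tone or in which firms are mentioned together.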

Ten Pounds of Financial News in a Five-Pound Bag

The tagline in the corner of the paper version of the New York Times is “All the News That’s Fit to Print.” In fact, this was never true. The size of the paper is limited by many factors: press speeds, cost, and the limits of physical delivery. Editors have to pick and choose. This is true for all paper and ink publications. The Wall Street Journal index of companies mentioned rarely includes more than 300 names. But on the same day, Web sources will have news on thousands of firms. International and specialized news sources used to be costly and difficult to come by. Now they are as accessible as the local paper.

News is a time-honored source for investment information, and there is more of it than ever before, more than a person can handle. With the relentless march of technology to the beat of Moore’s law, previously impractical computationally intense approaches to natural language can be used to parse, categorize, and understand the onslaught of news. Reporters help the process along by tagging story elements at their point of origin. They inject some valuable wetware into the mix of hardware and software involved in the modern production, dissemination, and consumption of news. There is a great deal of commercial effort in this area, applying language and Web technologies to gather, filter, and rank individual news items by type, sentiment, or intensity. Some of these tools are available to try on the Web. (2)

Stock Market Manipulations — Accidental and Intentional Stock Manipulations

The purest of efficient market purists once claimed that all news was already incorporated in prices. Someone always knew the news before you did, so there was no point in paying attention to it. This is another case of someone having to pick up that $100 bill on the sidewalk first, but those hundreds do get picked up pretty fast.

An example in the fall of 2008 shows how truly unexpected news can impact prices dramatically. At 1:37 a.m. EDT on Sunday, September 7, 2008, Google’s newsbots picked up a 2002 story about United Airlines possibly filing for bankruptcy. Apparently, activity at 1:36 a.m. on the web site of the Orlando Sentinel caused an old story to resurface on the list of “most viewed stories.” In Orlando, in the middle of the night, with Mickey sound asleep and Gatorland closed, a single viewing of the story was enough to do this, and attract the attention of the newsbot, one of many search agent programs that populate Google’s news database. In a cascade of errors, the story was picked up by a person, who, failing to notice that the date on the story was six years gone, put it on Bloomberg, which then set off a chain reaction on services that monitor Bloomberg news. This remarkable ability of the Internet to disseminate “news” resulted in the stock of United’s parent, UAL Corporation, dropping 76 percent in six minutes, with a huge spike in volume, as seen in Figure 9.1.

Figure 9.1 UAL Corporation on September 8, 2008. Old news rises from the news crypt. Source: Google Finance.

This looks like (at the very least) an accidental manipulation. The SEC announced an investigation into the incident by the end of the week. Trades made during the period when the price dropped were not reversed. This example, though based on what turned out to be false news, underscores the point about time acceleration of the effects of news on markets. This is an update on the “time isn’t what it used to be” lesson seen in comparing pre- and postmodern Web-era market reaction to earnings surprise news, shown in Chapter 4.

In fact, it didn’t take long for another major manipulation based on false news to occur. A legitimate news story followed a falsely planted one that had hammered Apple stock down by 5.4 percent, less than a month after the UAL presumed accident: CNN’s plunge into online citizen journalism backfired yesterday when the cable-news outlet posted what turned out to be a bogus report claiming that Apple Inc. Chief Executive Officer Steve Jobs had suffered a heart attack. Apple shares fell as much as 5.4 percent after the post on CNN’s …

All notes for this chapter about Investment Alpha and Trading Alpha from Wall Street Rumors, News, and Social Media:

1. PIMCO December 2008 Market Commentary.

2. Relegence was an early first-wave company that did this. It was acquired by AOL, but remains in the news machine business. Newcomers in 2007 and 2008 include Skygrid and StockMood. […] aggregates a wide range of services.

3. James Callan, “CNN’s Citizen Journalism Goes ‘Awry’ with False Report on Jobs,” Bloomberg News, October 4, 2008.

4. Paul Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy, “More Than Words: Quantifying Language (in News) to Measure Firms’ Fundamentals,” Journal of Finance 63 (June 2008): 1437–1467. (An earlier working version is available at the Social Science Research Network.)

5. General Inquirer is found at

6. Event study charts group similar events together that are actually spread out in time. In this case, the vertical line in the middle of the chart is the day the story appeared; the region to the left shows the basis point (hundredths of percent) price changes prior to publication; and the region to the right shows price changes afterward.

7. “Abnormal returns” often refer to returns in excess of the market over the period in question. This study used a slightly fancier definition of abnormal based on the widely used Fama-French three-factor model, a more modest version of the multifactor “Barr’s better betas” approach described in Chapter 4. In addition to broad market moves, it adjusts for large-capitalization and small-capitalization companies, and for the value/growth style of the stocks, measured by book-to-price ratio.

8. “Mining of Concurrent Text and Time Series,” by Victor Lavrenko, Matt Schmill, Dawn Lawrie, Paul Ogilvie, David Jensen, and James Allan, Department of Computer Science, University of Massachusetts – Amherst, 2001.

9. “Technical Interface and Operational Specification for Public Dissemination Subscribers,” TRW/SEC specification by Craig Odell (TRW), May 3, 2001.

10. “Do Stock Market Investors Understand the Risk Sentiment of Corporate Annual Reports?” by Feng Li, University of Michigan, April 2006. Available at the Social Science Research Network (paper number 898181).

11. Zhen Deng, Baruch Lev, and Francis Narin, “Science and Technology as Predictors of Stock Performance,” Financial Analysts Journal 55, no. 3 (May/June 1999): 20–32.

12. Majestic Research specializes in this sort of thing.
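
The event-study layout described in note 6 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the procedure used in the study cited in note 4: abnormal returns here are simply returns in excess of the market, the plainer definition mentioned in note 7, rather than the Fama-French three-factor adjustment, and the function names and input layout are hypothetical.

```python
def abnormal_returns_bp(stock_returns, market_returns):
    """Market-adjusted abnormal returns in basis points (hundredths of a percent).

    Inputs are daily returns as decimals (0.01 means 1 percent). This is the
    simple excess-over-market definition, not the three-factor model.
    """
    return [(s - m) * 10_000 for s, m in zip(stock_returns, market_returns)]


def event_study(abnormal_by_event, event_indices, window):
    """Average cumulative abnormal return around a set of events.

    abnormal_by_event: one daily abnormal-return series per event
    event_indices:     position of the story day within each series
    window:            number of days shown on each side of the story day

    Returns (offsets, averages); offset 0 is the day the story appeared.
    """
    offsets = list(range(-window, window + 1))
    paths = []
    for series, t0 in zip(abnormal_by_event, event_indices):
        if t0 - window < 0 or t0 + window >= len(series):
            continue  # skip events that lack a full window of data
        total, path = 0.0, []
        for k in offsets:
            total += series[t0 + k]
            path.append(total)
        paths.append(path)
    if not paths:
        return offsets, []
    averages = [sum(p[i] for p in paths) / len(paths) for i in range(len(offsets))]
    return offsets, averages


# Two toy events, already expressed in basis points, each with the story on day index 2.
offsets, averages = event_study(
    [[0, 10, -50, 5, 0], [0, 0, -30, -5, 0]], [2, 2], window=1
)
print(list(zip(offsets, averages)))
```

Lining every event up on its own day zero is what lets events that are spread out in calendar time be averaged into a single chart.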

by David Leinweber