A measure of the bullishness or bearishness of the language used in media coverage of a given stock on a given day. Scores range from -5 (extremely negative coverage) to +5 (extremely positive coverage); a score of 0 indicates an absence of articles for that day.

Alternatively, scores can be expressed on a 0 to 10 scale, where 0 still indicates an absence of articles for that day, 1 to 3 a negative score, 4 to 7 a neutral score, and 8 to 10 a positive score.
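A minimal Python sketch of the 0 to 10 banding, assuming integer scores. The function name `band` and the label strings are illustrative, not part of a FinSentS API.

```python
def band(score: int) -> str:
    """Map a 0-10 sentiment score to the bands described above."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score == 0:
        return "no articles"  # 0 means no coverage, not neutral coverage
    if score <= 3:
        return "negative"
    if score <= 7:
        return "neutral"
    return "positive"


print([band(s) for s in (0, 2, 5, 9)])  # ['no articles', 'negative', 'neutral', 'positive']
```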

The overall sentiment score is computed as a weighted average of sentiment scores over news titles, headers and company-specific phrases. A significant correlation between news sentiment and stock prices is observed across most listings.

We can also provide sentiment scores on a 1 to 100 scale on demand.

The scores are weighted averages of the individual scores of the news articles, blog posts and tweets published during the day. Upon request, we can provide in-depth details on our computation.

Put simply, we process textual financial data using data mining and text mining methods. Our engine performs the following steps to generate a sentiment score:

  1. Filtering and selection of relevant information per topic
  2. Computation of relevance score per article
  3. Computation of sentiment score per article
  4. Computation of the weighted average score for each topic
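The four steps above can be sketched in Python as follows. Everything here is an illustrative assumption: the toy lexicon, the mention-count relevance rule and the `Article` type are hypothetical, and the engine's actual filtering, relevance and sentiment models are not described in this document. The point is the shape of the pipeline: filter, score relevance, score sentiment, then take a relevance-weighted average per topic.

```python
from dataclasses import dataclass, field

# Hypothetical toy lexicon; a production engine would use a calibrated dictionary.
POSITIVE = {"beat", "growth", "upgrade", "record"}
NEGATIVE = {"miss", "loss", "downgrade", "lawsuit"}


@dataclass
class Article:
    text: str
    topics: set = field(default_factory=set)  # topics tagged upstream


def is_relevant(article: Article, topic: str) -> bool:
    """Step 1: keep only articles tagged with the topic."""
    return topic in article.topics


def relevance(article: Article, topic: str) -> float:
    """Step 2: hypothetical relevance in [0, 1], saturating at 3 mentions."""
    mentions = article.text.lower().split().count(topic.lower())
    return min(1.0, mentions / 3)


def sentiment(article: Article) -> float:
    """Step 3: lexicon polarity scaled to -5..+5 (0.0 if no lexicon words)."""
    words = article.text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else 5 * (pos - neg) / (pos + neg)


def topic_score(articles: list, topic: str) -> float:
    """Step 4: relevance-weighted average of article sentiment for one topic."""
    selected = [a for a in articles if is_relevant(a, topic)]
    weights = [relevance(a, topic) for a in selected]
    total = sum(weights)
    if total == 0:
        return 0.0  # mirrors the convention that 0 means no (relevant) coverage
    return sum(w * sentiment(a) for w, a in zip(weights, selected)) / total


news = [
    Article("Apple beat estimates as Apple growth hits record", {"Apple"}),
    Article("Apple faces lawsuit", {"Apple"}),
    Article("Oil loss widens", {"Oil"}),
]
print(topic_score(news, "Apple"))
```

The first Apple article mentions the company twice, so it carries twice the weight of the second one in the final average.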

Sentiment scores are generated in real time. In a benchmark, the engine processed over a million tuples per second per node.

Sentiment scores are computed at various levels. We draw information from the headline, header, body and even the footer, and assign a different (customisable) weight to each section.
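A minimal sketch of section-level weighting, assuming each section (headline, header, body, footer) has already been given its own sentiment score. The default weights below are hypothetical placeholders, not the engine's values. Passing a different dictionary mimics the customisable weights, and missing sections (an article with no footer, say) are skipped with the remaining weights renormalised.

```python
# Hypothetical default weights; the engine's actual (customisable) values are not given here.
DEFAULT_WEIGHTS = {"headline": 0.4, "header": 0.3, "body": 0.25, "footer": 0.05}


def article_score(section_scores: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-section sentiment scores for one article.

    Sections absent from `section_scores` are skipped and the remaining
    weights are renormalised, so a missing footer does not drag the score to 0.
    """
    used = {s: w for s, w in weights.items() if s in section_scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("none of the weighted sections are present")
    return sum(w * section_scores[s] for s, w in used.items()) / total


print(article_score({"headline": 4.0, "body": 1.0}))
print(article_score({"headline": 2.0, "footer": -2.0}, {"headline": 1.0, "footer": 1.0}))
```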

Three main methods can be provided:

  1. A statistical approach (mainly relying on positive/negative word counts).

The statistical approach itself is not binary: it is model-based and can be easily calibrated.

Below is a non-exhaustive list of the outputs:

  • Count of positive words
  • Count of negative words
  • Weight of the sentence positions
  • Relevance score
  • Volume of news available
  • Buzz score

It still allows for word-sense disambiguation. This method is fast and scalable (especially across languages), but lacks the precision that the grammar-based approach can achieve.
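To make the word-count outputs concrete, here is a minimal sketch of a statistical scorer that reports positive and negative word counts and a sentence-position-weighted polarity. The word lists and the `decay` parameter (sentence i gets weight decay**i, so earlier sentences count more) are hypothetical stand-ins for the calibrated model; relevance, volume and buzz are left out.

```python
import re

# Hypothetical toy word lists standing in for a calibrated financial dictionary.
POSITIVE = {"gain", "gains", "strong", "upgrade", "beat"}
NEGATIVE = {"loss", "losses", "weak", "downgrade", "miss"}


def statistical_features(text: str, decay: float = 0.8) -> dict:
    """Word-count features plus a sentence-position-weighted polarity.

    Sentence i gets weight decay**i, so earlier sentences count more.
    Sentences with no dictionary words contribute 0 but still carry weight.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    pos_total = neg_total = 0
    weighted = total_weight = 0.0
    for i, sentence in enumerate(sentences):
        words = re.findall(r"[a-z]+", sentence.lower())
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        pos_total += pos
        neg_total += neg
        weight = decay ** i
        total_weight += weight
        if pos + neg:
            weighted += weight * (pos - neg) / (pos + neg)
    return {
        "positive_words": pos_total,
        "negative_words": neg_total,
        "position_weighted_polarity": weighted / total_weight if total_weight else 0.0,
    }


print(statistical_features("Strong gains this quarter. Analysts warn of a downgrade."))
```

Calibrating `decay` (or the word lists) changes the output without changing the method, which is the sense in which the approach is model-based rather than a binary positive/negative count.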

  2. A grammar-based approach.

By far the largest selection of technologies for exploiting grammar in sentiment analysis comes from HMM- or CRF-type sequence modeling, which is therefore a major component of this approach. This type of machine learning uses syntactic and other features, encoded as binary-valued functions, to learn to label windows of text.
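HMM- and CRF-type models label a sequence jointly, so the label of one word depends on its neighbours. A CRF needs a training loop that does not fit in a short sketch, so the example below uses the simpler HMM with hand-set, hypothetical probabilities and decodes the most likely sentiment state per word with the Viterbi algorithm. It uses word identities only, not the syntactic binary feature functions a CRF would add.

```python
import math

STATES = ("NEG", "NEU", "POS")
# Hypothetical hand-set parameters; a real model would estimate them from labeled text.
START = {"NEG": 0.2, "NEU": 0.6, "POS": 0.2}
TRANS = {  # P(next state | current state)
    "NEG": {"NEG": 0.6, "NEU": 0.3, "POS": 0.1},
    "NEU": {"NEG": 0.2, "NEU": 0.6, "POS": 0.2},
    "POS": {"NEG": 0.1, "NEU": 0.3, "POS": 0.6},
}
EMIT = {  # P(word | state) for a tiny vocabulary
    "NEG": {"plunge": 0.4, "weak": 0.3},
    "NEU": {"shares": 0.3, "the": 0.3},
    "POS": {"surge": 0.4, "strong": 0.3},
}
UNSEEN = 0.01  # smoothing probability for words not listed under a state


def log_emit(state: str, word: str) -> float:
    return math.log(EMIT[state].get(word, UNSEEN))


def viterbi(words: list) -> list:
    """Return the most likely state sequence (log-space Viterbi decoding)."""
    if not words:
        return []
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (math.log(START[s]) + log_emit(s, words[0]), [s]) for s in STATES}
    for word in words[1:]:
        best = {
            s: max(
                (score + math.log(TRANS[p][s]) + log_emit(s, word), path + [s])
                for p, (score, path) in best.items()
            )
            for s in STATES
        }
    return max(best.values())[1]


print(viterbi(["shares", "surge"]))  # ['NEU', 'POS']
```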

  3. A machine learning / deep learning based approach.

The deep learning model builds up a representation of whole sentences based on their structure. It computes sentiment from how words compose the meaning of longer phrases, so the model is not as easily fooled as earlier approaches.
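The idea of composing phrase meaning from sentence structure can be illustrated without a neural network. The sketch below is a hand-written, rule-based stand-in, not the deep learning model itself: it scores a binary parse tree bottom-up, letting negators flip and intensifiers scale the phrase they attach to. The lexicon, modifier values and tree bracketing are hypothetical. Unlike a flat word count, it is not fooled by "not very good".

```python
# Hypothetical word polarities and modifiers; a neural model would learn these.
LEXICON = {"good": 1.0, "strong": 1.0, "bad": -1.0, "weak": -1.0}
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very": 1.5}


def compose(tree) -> float:
    """Score a binary parse tree bottom-up.

    A leaf is a word (str); an inner node is a (left, right) tuple. A negator
    on the left flips the sign of its sibling phrase, an intensifier scales it,
    and any other pair adds its two children's scores.
    """
    if isinstance(tree, str):
        return LEXICON.get(tree, 0.0)
    left, right = tree
    if isinstance(left, str) and left in NEGATORS:
        return -compose(right)
    if isinstance(left, str) and left in INTENSIFIERS:
        return INTENSIFIERS[left] * compose(right)
    return compose(left) + compose(right)


def bag_of_words(sentence: str) -> float:
    """Flat word count for comparison: ignores structure entirely."""
    return sum(LEXICON.get(w, 0.0) for w in sentence.split())


# "results were not very good", bracketed as results (were (not (very good)))
tree = ("results", ("were", ("not", ("very", "good"))))
print(compose(tree), bag_of_words("results were not very good"))  # -1.5 1.0
```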

For English content, we use a customised version of the Harvard dictionary and Natural Language Processing software from the Stanford NLP Group.

In order to manage this issue, we modify the engine slightly:

  • Processing financial content only
  • Detecting synonyms, acronyms and other nicknames of entities, topics and people
  • Identifying the relevant corpus for each entity, topic and person. This can be calibrated using machine learning (for instance, to recognize documents about a given sector or industry)
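Synonym and acronym detection can be approximated with an alias table that maps every known name of an entity to one canonical identifier. This is a minimal sketch; the `ALIASES` entries and the `resolve_entities` function are hypothetical. Matching is case-sensitive and on word boundaries, and longer aliases are tried first so that "Apple Inc" is not counted as a bare "Apple".

```python
import re

# Hypothetical alias table; in practice it would be curated and extended per entity.
ALIASES = {
    "AAPL": "Apple Inc.",
    "Apple": "Apple Inc.",
    "Apple Inc": "Apple Inc.",
    "MSFT": "Microsoft Corp",
    "Microsoft": "Microsoft Corp",
}

# Longest aliases first, so the regex alternation prefers "Apple Inc" over "Apple".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)


def resolve_entities(text: str) -> dict:
    """Count mentions of each canonical entity, whichever alias was used."""
    counts = {}
    for match in _PATTERN.finditer(text):
        entity = ALIASES[match.group(1)]
        counts[entity] = counts.get(entity, 0) + 1
    return counts


print(resolve_entities("AAPL rose after Apple Inc reported results; Microsoft was flat."))
```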

News analytics can react more sharply than spot prices, which could make them an effective way to identify market signals.

Economic research analysts, journalists and bloggers use various reports to follow trending topics: essentially, the topics that will increase their sales or, for websites, their number of visitors.

Because many small and medium-sized companies do not have daily news coverage, this could explain their sensitivity to news. It also gives rise to a particular news analytic that helps detect market patterns, called “Buzz”.
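The document does not say how Buzz is computed. One hypothetical way to capture unusual news activity is a z-score of today's article count against a trailing window. The `min_sd` floor is an assumption aimed at the case described above: a small company with no coverage for weeks has a standard deviation of zero, and the floor keeps its score finite when articles suddenly appear.

```python
from statistics import mean, pstdev


def buzz_score(daily_counts: list, window: int = 20, min_sd: float = 1.0) -> float:
    """Hypothetical buzz measure: z-score of the latest day's article count
    against the `window` days before it.

    `min_sd` floors the standard deviation so that a stock with almost no
    coverage (a flat history of zeros) still gets a finite score.
    """
    if len(daily_counts) < window + 1:
        raise ValueError("need at least window + 1 days of counts")
    history = daily_counts[-window - 1:-1]
    today = daily_counts[-1]
    return (today - mean(history)) / max(pstdev(history), min_sd)


print(buzz_score([0] * 20 + [4]))       # quiet small cap suddenly covered
print(buzz_score([2, 4] * 10 + [10]))   # regularly covered stock with a spike
```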

Content of buzz activities can be provided on demand.

Yes, we have more than 15 years of historical data. For personal analysis, daily sentiment can be exported in Excel format directly from the website.

  • Investors / Traders / Brokers / Sales
  • Economic research cells
  • Market and Credit Risk Managers
  • Compliance cells

At the macro level, we aid in real time decision making for the following groups of people:

  • Hedge Fund owners
  • Asset Management companies
  • Financial Institutions (Central Banks, Investment Banks, Private Banks)
  • Market Infrastructure (Exchanges, Depositaries)
  • Market Data providers

FinSentS computes sentiment scores for:

  1. 50,000+ stocks and major global stock indices
  • 15,000+ North American stocks
  • 8,000+ European stocks
  • 4,000+ Japanese stocks
  • 14,000+ Asian stocks excluding Japan
  • Chinese stocks: http://tushare.org/
  • Bahasa stocks:
  • 3,000+ Australia / New Zealand stocks
  • 1,000+ South American stocks
  2. Commodities

More than 181 kinds of commodities, including:

  • Base and precious metals
  • Softs
  • Energy
  3. Forex and currencies

Around 400 different kinds of forex and currencies, including:

  • AUD
  • EUR
  • GBP
  • HKD
  • INR
  • JPY
  • NOK
  • NZD
  • SEK
  • SGD
  • USD
  4. Interest rates
  5. Islamic banking financial products
  6. Political events

More than 250 topic names, such as:

  • Syria War
  • Elections
  • Shut Down
  • Management appointments
  7. Miscellaneous topics
  • Regulation (Dodd Frank, EMIR, FATCA, Tobin)

Yes. The engine is designed to be truly multilingual, even though at the entry price FinSentS processes only English; Bahasa, for instance, could be added. We will also soon be adding new languages such as Mandarin, German, French, Spanish, Japanese and Korean, and we are open to suggestions for other languages to explore.

For our Financial News and Sentiment Screener, FinSentS, we use:

  1. Newspaper websites (350+ sources for sentiment processing and millions of sources for news tracking)

e.g. 4-Traders, Equities.com, CNBC, Bloomberg, Businessweek, Street Insider.

  2. Financial blogs (100 sources)

e.g. ZeroHedge, Washington Post, Paul Krugman, Naked Capitalism.

  3. Twitter company accounts (the 500 companies in the S&P 500 index)

e.g. Apple Inc., Microsoft Corp, Exxon Mobil.

  4. Multilanguage sources

Chinese

e.g. Sina Weibo, Shanghai Stock Exchange, Shenzhen Stock Exchange.

Bahasa

We already include private sources such as economic research papers and Bloomberg’s news feed. Additional private sources of information can be processed on demand.

Additional data and feeds can be added from:

  1. Premium sources, such as Bloomberg, Reuters or Dow Jones
  2. Social media, such as Weibo, Twitter and StockTwits
  3. Chat rooms, company and analyst research, reports, SEC and global filings, broker research, conference calls, investor relations presentations, social media, real-time news and press releases

Voice and video can also be explored on a project basis.

Our system can be configured to assign different weights in the sentiment computation to different corporate actions, such as earnings announcements, M&A and dividends.

This is generally reviewed during the calibration process, where we can apply various numerical models and machine learning techniques.

The quality control of InfoTrie’s data set (news, blogs and social media) relies on the following fundamental building blocks:

  1. Sources are primarily handpicked from a selection of:
  • Websites (free press, financial blogs, regulator websites, exchange websites)
  • Dedicated economic research from investment banks
  • Dedicated news feeds (“Bloomberg real time content”, “Dow Jones real time content”)
  • Reliable Twitter accounts or important hashtags
  2. Automatic relevance computation and noise filtering
  • “Relevance” scores are computed automatically and in real time for any topic or entity in news, tweets or blogs.
  • Multiple additional “noise filters” are in place to ensure data quality (for instance, on Twitter hashtags).
  3. Topic classification
  • News article topics are automatically labeled from their titles and contents using machine learning algorithms.
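The text says topics are labeled from titles and contents with machine learning, without naming the algorithm. As a stand-in, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on a hypothetical four-document training set; the class name, topic labels and texts are illustrative only. Title and body are simply concatenated before classification.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    return text.lower().split()


class NaiveBayesTopics:
    """Multinomial Naive Bayes topic classifier with add-one smoothing."""

    def fit(self, docs: list, labels: list) -> "NaiveBayesTopics":
        self.doc_counts = Counter(labels)        # documents per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def log_posterior(self, doc: str, label: str) -> float:
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (self.totals[label] + len(self.vocab)))
        return score

    def predict(self, doc: str) -> str:
        return max(self.doc_counts, key=lambda lab: self.log_posterior(doc, lab))


# Toy training set; labels and texts are illustrative only.
model = NaiveBayesTopics().fit(
    [
        "central bank raises interest rates",
        "fed cuts rates amid inflation",
        "company announces merger deal",
        "firm agrees acquisition deal",
    ],
    ["rates", "rates", "m&a", "m&a"],
)
title, body = "Bank considers rates hike", "Analysts expect a move in June."
print(model.predict(title + " " + body))
```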

Additional datasets (chat, forums, text, analyst and company reports, transcripts, PDF, Excel, recorded phone conversations, video, etc.) can be considered if required.