Top Data Providers: China Equities

The biggest Chinese public companies and the top data providers that help investors find an edge.

We identified the 17 largest equities in China, based on market cap, for which alternative data exists. After interviewing most alternative data providers that cover these names, we compiled a list of the key providers for each company. This article was originally published on Integrity Research.

Email us at data@alternativedata.org with any questions.

Top China Data Sources:

Data Provider Evaluation Criteria:

  1. Buy side feedback: Anecdotes from fundamental buy side investors who have experience using these datasets.
  2. Data source type: Does the data source and analysis closely reflect and clarify company performance, narratives, or key metrics?
  3. Accuracy: How accurate have these providers been historically?
  4. Ease of use: Do the providers have raw data or do they also do their own QA and analysis in-house?

We also mapped all the alternative data providers that have data on these companies in the landscape below.

China Data Landscape (vFFF)




The Download 07 – 13 New Datasets, Point72 Invests in Database Startup



  • Point72 leads $25mm Series A for database startup, FaunaDB. (PRnewswire)
    • Coatue invested in data science infrastructure company, Domino Data Lab, earlier this year. (WSJ, login required)
  • App Annie, an app usage data provider, expands Chinese consumer analytics offering. (Reuters)
  • YipitData has combined web data with new email receipt data for GRUB, enabling KPI accuracy and insights not possible with either dataset alone.
  • YipitData launched a dataset on Best Inc. (BSTI), which IPO’d last Wednesday.
  • 1010data launched a Suburban Shopper Panel (SSP), providing data on consumer behavior and demographics of rural and suburban shoppers. (Business Wire)
  • Prattle, a public sentiment analysis provider, launched an analytics platform of corporate earnings calls. (Prattle Blog)
  • QuadAnalytix and Mobee merged into Wiser Solutions to provide consumer/retail data from online and offline sources. (Business Wire)
  • VisibleAlpha, an analytics platform, partnered with Thomson Reuters. (Reuters)
  • M Science’s 3rd party email receipt panel is 3.5mm and it now has access to an EU credit card panel. (from Learn2Quant Hong Kong)
  • Sandalwood employs 3 main sources of data: UnionPay credit card panel, a partnership with JD.com, and web scraping for Tmall brand data. (from Learn2Quant Hong Kong)


  • Prosper – consumer survey data on 296 tickers (US) and 126 tickers (China), distributed through Consumer Edge Research.
  • Granular.ai – satellite data on industrial sectors and emerging economies.
  • Selbourne Research – data on payments ecosystem.
  • Geotab – fleet GPS tracking and fleet management data.
  • Predata – analytics platform based on market/risk signals and event predictions.
  • Statistical Surveys – data on recreational vehicles.
  • IndexMath – index for predicting UK stock market trends. 
  • 74% of hedge funds plan to increase spending on alternative data, based on a survey of 50 hedge funds by Greenwich Associates and Arcadia Data. (full report for purchase at Greenwich Associates) Other report highlights:
  • Market size for alternative data estimated between $183 – $200mm, and projected to double in 4 years. (Value Walk, Quartz)
  • Use of alternative data could lead to a revenue uplift of 15% for asset managers, while additionally cutting costs by another 15%, according to Quinlan and Associates. (The Street)
  • Sentieo, a financial data platform, developed a short thesis on Darden (DRI) based on sub-brand level restaurant unit counts and implied growth rates. (Futures Magazine)
  • State Street launches Quantextual Idea Lab, a quant-based research management platform. (FINalternatives)
  • Private equity and corporates see opportunity in satellite, GPS, IoT, and economic alternative data. (International Business Times)

The Download 06: 14 New Datasets, 3 New Jobs, 21 Data Success Stories

Monday, September 11, 2017

  • PROME – customized web scraping services.
  • Datavore – data analytics and visualization platform.
  • Dataiku – data science platform, recently raised $28mm. (PRNewswire)
  • ExtractAlpha – licenses quant models from social, web, and market sources.
  • SensorTower – app usage and ad performance.
  • Inferess – converts news feed into event-driven analytics.
  • JWN Energy – oil and gas data repository.
  • Seer Aerospace – aircraft usage patterns and reliability datasets.
  • Epsilon – large panel of consumer data and insights.
  • RVIA – recreational vehicle data and trends.
  • WallStreetHorizon – corporate event related datasets. 
  • Eagle Alpha, a data aggregator, published 20 case studies of successful quant and discretionary alternative data applications for multiple data sources. The 80-page paper provides a thorough evaluation of the alternative data space for both quant and discretionary investors. (Eagle Alpha, pdf)
  • Sentiment provider, Prattle, correctly predicted various Fed and international central bank decisions. Quinlan & Associates published a 50-page report on the use of alternative data for alpha generation, including a case study on Prattle. (Quinlan & Associates, pdf)
  • Anonymity concerns in geolocation data continue. (Financial News)
  • Nasdaq accelerates expansion into data analytics, acquiring asset manager research platform eVestment for $705mm. (Nasdaq, Reuters)

Verified List of Alternative Data Providers

The hardest part about alternative data can very well be identifying all of the different data sources across hundreds of providers. The number of providers has skyrocketed in recent years, making it more complicated to evaluate the entire data universe.

Number of Alternative Data Providers

After thousands of conversations with investors, experts, and data providers, we have compiled the stack of the top 100 alternative data providers in the institutional investment space.


Alternative Data Stack

The stack focuses on providers used by fundamental investors. It excludes market data, economic/macro data, and market news/industry publications. Each provider’s position is intended to reflect the firm’s product positioning relative to institutional investors. Data providers in the clusters toward the top are focused on data analysis and extracting insights from alternative data. Clusters positioned toward the bottom are more focused on data collection and quality assurance, and tend not to be consumed directly by fundamental analysts and PMs, but rather go through data brokers, the sell-side, or internal data teams for analysis.

We’ve published a public database of alternative data providers, which includes these and many other providers. Click on the provider name to be taken to their website. We will regularly update information as it becomes available. If you would like to be added to our database, or you would like to speak with us, email pablo@alternativedata.org.



Download 04: Web Scraping Ruling, New Geo-Location Dataset, and New Features

A federal court ruled against LinkedIn, confirming that a startup can scrape its publicly available data – a potentially precedent-setting ruling in favor of web scraping based analytics. (Ars Technica, WSJ)

We launched an events and stats page on AlternativeData.org

New geo-location data vendor Thasos has launched data products on malls, dollar stores, public hospitals, casinos, and lumberyards:
“To test the impact of its data, Thasos released a report earlier this month on never-before-tracked foot traffic at Real Estate Investment Trusts that invest in shopping malls. The report, released before second quarter earnings, was accurate in predicting the selloff that occurred after the announcements.” (Institutional Investor)

Millennials are 2.5x more likely to use alternative data than non-Millennial equity analysts (AlternativeData.org)

As supply chains embrace Blockchain, global trade data will become public (International Business Times)

“How to Become a Data Scientist” – a high-quality guide written for anyone hoping to get into the field (Medium)



The Ultimate Guide to Selling Data to Hedge Funds

Does your company have data that is valuable to investors?

This guide outlines steps for any data owner to work directly with hedge funds to monetize their data.

In short: hedge funds are a complex and opaque industry, but if you master the key players and product challenges, there’s no market in the world with a higher ASP (average selling price) and a faster sales cycle – the ingredients of a valuable subscription revenue business.

How to sell data to hedge funds


Step 1: Know your audience

Hedge funds are not all the same. Below are the main types.

Quantitative Investing (a.k.a. Quant, Systematic, Algorithmic)

Quant funds utilize automated trading strategies based on algorithms and data. They are not looking to be right on every trade. They just want to be right more than they’re wrong and trade a lot of securities.

They typically purchase data that

  • Applies to 100s or 1,000s of securities
  • Has a long history of data they can backtest
  • Is published frequently. 5 years of historical data published quarterly provides a weaker backtest than 5 years of historical data published daily
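
To make the point about publishing frequency concrete, here is a rough Python sketch of how many observations the same 5-year history yields at different frequencies. The periods-per-year values (including 252 trading days per year, a common convention) and all names in the code are assumptions for illustration, not figures from any particular fund.

```python
# Rough count of backtest observations for a history of a given length.
# The periods-per-year values are assumptions; 252 trading days per year
# is a common convention, not an exact figure.
PERIODS_PER_YEAR = {"quarterly": 4, "monthly": 12, "weekly": 52, "daily": 252}


def observation_count(years: int, frequency: str) -> int:
    """Number of data points a history of `years` contains at `frequency`."""
    return years * PERIODS_PER_YEAR[frequency]


quarterly_points = observation_count(5, "quarterly")
daily_points = observation_count(5, "daily")

print(f"quarterly: {quarterly_points} observations per security")
print(f"daily: {daily_points} observations per security")
```

Under these assumptions, the quarterly history gives a backtest only a few dozen points per security to work with, while the daily history gives it over a thousand.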

The best part about selling to quant funds is that their business model is based on data, so they employ professionals to speak to data vendors, understand their data and make compelling proposals when the data is valuable.

The difficult part about selling to quant funds is that if your dataset does not meet the above attributes, you’re probably not going to sell them anything. But you’ll learn that pretty fast.

More Info: Largest Quantitative Hedge Funds

Fundamental Investing (a.k.a. Discretionary, Stock-Picking)
Platform Funds (a.k.a. Pod, Multi-Manager, Multi-Strategy)

Platform Funds consist of many individual teams of PMs & Analysts who share centralized resources such as assets under management, trading & execution, compliance, data, office space, training and more.

Platform funds with discretionary strategies typically purchase data that

  • Is highly correlated with a company KPI (“Key Performance Indicator”) or key investor question of specific securities
  • Has a history of data they can backtest
  • Is unique

Since platform funds provide infrastructure shared among their investing teams (or “pods”), they often employ data sourcing professionals similar to Quant funds. These are also quick and pleasant conversations to have that require minimal sales infrastructure.

More info: Multi-Manager Funds

Long/Short Equity Hedge Funds

Long/Short Equity Hedge Funds pick stocks. They tend to purchase similar data to Platform Funds.

The Long/Short Equity Hedge Funds who spend the most on data:

  • Have a large amount of Assets Under Management (AUM)
  • Trade frequently. The more often they trade, the more data they want to look at. A rough proxy for this is “Turnover %” on a 13F database such as Whale Wisdom
  • Make concentrated bets. The larger their positions, the more they can spend. A proxy for this is “% of Portfolio” on Whale Wisdom

More Info: Institutional Investor’s Top 100 Hedge Funds (also includes Quant and Platform funds)

Event Driven Funds

These firms invest based on specific catalysts such as a merger, acquisition, bankruptcy, spinoff or legislation. If your dataset gives investors unique insights into key events, they may be a good match.

Other Types of Funds

The above designations are neither mutually exclusive nor collectively exhaustive. Many funds are combinations of the above or something else entirely, including:

  • Long Only/Mutual Funds: much longer term oriented investors with a different business model
  • Macro Funds: These firms invest across broader trends that affect a lot of stocks. If your data speaks to broader trends like inflation, currencies, weather, interest rates or global events they may be a good match for Macro Funds.
  • Credit Funds: invest in debt
  • Family Offices: manage funds of an individual family or group of families
  • Fund of Funds: invests in other funds
  • Sovereign Wealth Funds / Pension Funds: manages money of countries, endowments
  • Private Equity/Venture Capital: make large investments in mainly private companies

Step 2: Understand key use cases for your data 

For background on why hedge funds value alternative data, see Matt Turck’s excellent piece on the subject. At a high level, it helps to divide the institutional investor market into quantitative funds and discretionary funds, as the two have very different requirements and use cases for data.

To attract quant funds, your data should speak to a lot of companies and have a long time series. A good example is a panel of consumer transactions that touches many public companies and has a positive correlation with share prices. Once a quant fund has an understanding of your dataset, they can run a backtest to establish value.

To attract fundamental investors, it’s easier to start with a few case studies about specific public companies. Pick a few that your data can best speak to and run a correlation to their KPIs (e.g., Revenue, GMV, Gross Profit). The best companies have:

  • A stock price driven by a key metric or investor question your data can speak to
  • A large market cap / average trading volume
  • High volatility
  • Always nice: high hedge fund ownership (examples on page 5)

The right company will vary by dataset, but a few examples:

  • Panel of Mobile App Usage: Correlate observed app usage with reported DAUs for Snapchat, Facebook or Twitter
  • Panel of Consumer Transactions: Correlate to same store sales of retailers or GMV/sales of ecommerce companies
  • Social Sentiment: Correlate shifts in sentiment to revenue or share price of consumer / apparel brand companies

Step 3: Start with early adopters: quant funds and platform funds

Both types of organizations employ teams of people looking to speak with data owners, and both can move quickly and offer you a price for your data.

You can speak to these people directly and don’t need to work with a data broker or other intermediary, who can demand a revenue share of 50% or more.

Step 4: Sign up early adopters quickly with a limited distribution

Selling data is a multi-stage game. Price discovery and productization can take years. Don’t overthink your first contracts or hold out for the last dollar. Speak to a handful of early adopters and negotiate fair one-year contracts with a limited distribution, say 5-10 funds. You can decide the specific number based on conversations with the funds.

Do it quickly so you can then focus your time and resources on understanding exactly how investors are using your data and determine your strategy when renewals come around.

Step 5: Productize your data

Quant, platform and other large hedge funds employ data teams who can extract value from data in nearly any format. Selling into a broader audience of funds requires additional QA and analysis investments into your data product. For more information on developing this team see our post on How to Integrate Investment Analysts, Data Analysts and Engineers.

Productizing your data means providing additional QA and analysis so that you understand its value and so that a fund without a large data team can extract value from it.

Productizing your data, regardless of your customer distribution, will make it easier for more individuals at each customer to use, increasing its value.

Step 6: Determine the size of your eventual market

Once your first year contract is up, you will need to decide whether you want to expand the size of your distribution. The more funds you sell to, the less valuable your data will be to your customers.

Reasons you may be able to increase your distribution and sell to more types of funds:

  • It is the #1 dataset of its kind
  • Your dataset applies to lots and lots of companies
  • The data is granular and multi-dimensional. It’s more than a single datapoint and different types of investors use it to answer different types of questions
  • You can sell other services on top of the data
  • Compliance departments generally prefer datasets with a broader distribution
  • You can develop different data products for different customer types at different price points, so not everyone is getting the same experience
  • Diversify your revenue stream across more customers

Reasons you may want to maintain a limited distribution:

  • Reduce complexity
  • Reduce need for marketing, sales and customer service expenses
  • Enjoy higher ASP, higher margins
  • The primary use case of your data is a KPI estimate, which is more quickly commoditized than a granular dataset. Are you providing a revenue estimate or a way to understand an entire industry?

Frequently Asked Questions

How much money will I make?

Reasons you may have unrealistic expectations of the value of your dataset:

  • The value of your dataset will depend heavily on details such as accuracy, time series, compliance, release schedule
  • It will also depend on factors specific to the target company such as the existence of competitor datasets, the precision of sellside consensus, key investor questions, or the existence of legal/macro/regulatory overhangs
  • Investors will pay a large premium for the #1 dataset in a category. Are you #1?
  • Rumors of datasets commanding enormous premiums are more viral than ones about datasets nobody wants

What about contracts and compliance?

Hedge fund customers will demand certain representations about your dataset. The YipitData Master Services Agreement can give you a sense of what to expect.

You should consult a lawyer to help you with contracts relating to your specific dataset and to help you through your customer compliance reviews.

What should I never do?

Provide material, non-public information in violation of securities laws or personally identifiable information.

Provide misleading, doctored or “data-mined” historical correlations.

Conceal significant data outages or other issues that may affect your data’s accuracy.


We think a reasonable go-to-market strategy for hedge funds is:

  • Start with platform funds and quant funds
  • Set up one year contracts with a limited number (5-10) of buyers as soon as possible
  • Spend your next year productizing your data and learning about its use cases
  • Determine whether or not you want to expand the size of your distribution

If you believe you have data that may be of interest to hedge funds, we would be happy to speak with you, and if you are interested, refer you to hedge funds who have expressed interest in alternative data. Email me at: jim@alternativedata.org

James Moran is President/Co-Founder of YipitData, which analyzes web data to provide KPI estimates and answer key questions for investors.



Millennials Main Users Of Alternative Data

We completed a study published by Integrity Research looking at the age distribution of alternative data users vs. the entire buy-side equity analyst industry. We found that Millennials are 2.5x more likely to use alternative data than non-Millennial equity analysts.

82% of analysts who use alternative data are Millennials (20-38 year olds), highly concentrated between the ages of 26-34, based on YipitData’s user base. More generally, Millennials represent 65% of buy-side equity analysts, based on data from Ipreo.
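
The study's exact method is not spelled out here, but the 2.5x figure is consistent with a simple calculation from the two shares above. The Python sketch below shows one way to reproduce it. It assumes the YipitData and Ipreo populations are comparable, and the variable names are ours.

```python
# Shares quoted in the text above.
millennial_share_of_alt_users = 0.82  # share of alternative data users who are Millennials
millennial_share_of_analysts = 0.65   # share of all buy-side equity analysts who are Millennials

# By Bayes' rule, P(user | group) is proportional to P(group | user) / P(group).
# The overall adoption rate P(user) cancels when the two groups are compared.
millennial_rate = millennial_share_of_alt_users / millennial_share_of_analysts
non_millennial_rate = (1 - millennial_share_of_alt_users) / (1 - millennial_share_of_analysts)

relative_likelihood = millennial_rate / non_millennial_rate
print(f"Millennials are about {relative_likelihood:.1f}x as likely to use alternative data")
```

Because the overall adoption rate cancels out, the comparison needs only the two percentages and not the absolute number of alternative data users.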

Millennials Main Users of Alternative Data

Source: YipitData, IPREO; the scale of the alternative data users was enlarged to illustrate relative concentration.

According to YipitData, the average age of alternative data users is 33 years old, compared to 36 years for buy-side analysts generally.

It is likely that the core concentration of Millennials became investors 5-7 years ago, right at the time when alternative datasets started exploding. As a result, Millennials began incorporating alternative data as they were becoming investors. More experienced analysts and PMs have a steeper learning curve, having to adjust their investment style and rethink what’s possible.