The Download 17: Favorable Ruling For Web-Scraping, 4 New Datasets

Providers Added to the Database

  • Jettrack.io – Public data on corporate aircraft flight activity used to identify future corporate deals.
  • Datarama – Public data aggregated from specialised media sources on private and public companies in Southeast Asia and Greater China.
  • Scrapehero – Bespoke web scraping solutions.
  • SpaceKnow – Satellite provider with datasets covering autos, airlines, manufacturing, and construction.
  • See full public database of 280 data providers.
New Jobs
News & Insights
  • DC District Court is allowing a constitutional challenge to the CFAA to proceed, issuing language that web-scrapers may have First Amendment protection. (Techdirt, Boing Boing)
  • IHS Markit will incorporate web-scraped data into its AI data initiative. The company has advanced beyond AI for process automation into AI for product improvement, which signals that integration with other alternative data categories is likely. (Integrity Research, subscription required)
  • Study on Fed data leaks using taxi data hints at the wide range of use cases available for data collected by Uber and other ride-sharing services. (UChicago)
  • Alpha Architect summarizes high-level steps for performing sentiment analysis on Twitter. (Alpha Architect)
  • PreData, a social data analytics platform, uses volatility in social media along with other alternative data signals to predict North Korean missile launches and macro market trends alike. (AlleyWatch)
  • Decode project explores how blockchain may one day govern ownership of personal transaction data. A pilot is already incubating in Amsterdam and Barcelona. (The Guardian)



The Download 16: TSLA Production Tracker, YipitData Adds Email Receipt Panel, 8 New Datasets

New Datasets

  • YipitData integrates one of the largest and fastest growing email receipt panels into its data offering, expanding coverage and increasing product granularity and accuracy. (Full announcement)

Providers Added to the Database

  • Omney Data – Web data tracking retailer pricing and promotional activity/discounting.
  • Bloomberg Tesla Tracker – Tracks Model 3 production by counting VIN sequences across National Highway Traffic Safety Administration registrations, social media, and user submissions.
  • Verbatim Advisory Group – Survey data focused on business services, consumer products, retail & restaurants, industrials & energy, and TMT.
  • Caserta – Builds alternative data implementation systems for hedge funds and tier-1 banks.
  • Moody’s Data Alliance – Data portal tracking commercial and industrial loans to private companies. (Integrity Research, subscription required)
  • AlphaLetters – Research provider that manually reviews and summarizes top academic papers on quant investment strategy.
  • See full public database of 277 data providers.


  • VisibleAlpha, a research interface, announces investment from HSBC, joining a group of several other sell-side investors. (Integrity Research, subscription required)
  • Neudata, a data consultant, launches a use-case research service for investors to apply alternative datasets for different investment narratives. (Neudata)
  • Autotrader (LON:AUTO) – Tracks listing counts, dealer counts, key product penetration rates, and UK used car transactions. (YipitData)
New Jobs
Consulting Projects (NEW)
  • Consulting project for transaction data experts: Projects for data analysts or scientists with experience making KPI estimates using email receipt and/or credit card data. Requires knowledge of unbiasing and modeling techniques beyond simple growth rate analysis. Competitive compensation. Contact jobs@alternativedata.org for more information.
News & Insights
  • A report from Morgan Stanley and Oliver Wyman estimates that 40% of employees on the buy-side will require fundamental retraining to improve data and analytics capabilities. (Oliver Wyman, long read)
  • SAE, the quant investment arm of BlackRock, uses alternative datasets to turn out tools for the asset manager’s traditional investment groups. (Financial Times, subscription required)
    • Economic gauges used across investment teams are created by SAE using internet searches, online invoices, and traffic patterns.
    • Over 37% of SAE employees are PhDs in computer science, physics, and engineering.
  • Datasets used to track Tesla Model 3 production include Bloomberg’s new VIN tracker, U.S. import records, and imagery of factory lots. (Bloomberg)
  • Earnest Research, a credit card data provider, reports that HelloFresh surpassed Blue Apron in share of the $5bn American meal-kit market. (Recode)
  • Funds that access public filing information see 1.5% higher returns in the following month than funds using no public information. (Bloomberg Brief, Full Paper)
    • The research had its flaws, however, in identifying hedge funds accurately. (Integrity Research, subscription required)
  • Insights from survey at JPMorgan quant conference in Asia. (@RobinWigg)
    • 53% say sentiment data is most promising data type.
    • Lack of talent and high fixed cost are top barriers to entry in using big data.
  • James Rosseau, Chief Commercial Officer of LegalShield, discusses data generation, application, and backtesting of Law Index dataset, which uses legal activity to forecast economic conditions. (SeekingAlpha)



The Download 15: Data On YELP, 16 New Datasets, Sector Data Specialist Job

New Datasets

  • YELP Dataset – Tracks Paying Advertiser Accounts, Request-a-Quote penetration, and major accounts by category and geography. (YipitData)
  • Cryptocurrency Dataset – Tracks investor sentiment on nearly 100 cryptocurrencies using post data from over 40,000 monthly investors. (Integrity Research, subscription required)

Providers Added to the Database

  • Bridg – Credit card data provider for restaurant industry.
  • Venpath – Geo-location provider sourcing from 212 apps and 61m unique monthly devices.
  • Safegraph – Geo-location data from over 50m mobile devices, tracked to 15m POI and 1000 brands. Has raised $16m in funding.
  • X-Mode – Geo-location data on 30m monthly active users, obtained from 300+ apps.
  • Anonymous Provider – Data on 20% of US household moves, available 4-8 weeks before the move event. Insight into retail (regional demographic shifts), insurance, cable/internet, and banking. Contact data@alternativedata.org for more information.
  • Drillinginfo – Data on exploration & production, oilfield services, midstream, and financial services.
  • Rigdata – Drilling activity data with over 25 years experience covering US, Gulf of Mexico, and Western Canada oil & gas industry.
  • MarketCheck – Auto data provider with active inventory for over 35k US car dealers.
  • TVeyes – Data on brand placements in TV and radio, including logo and object recognition.
  • PriceStats – Data on online prices tracking inflation in 22 economies. (Institutional Investor)
  • Optimum Complexity – Risk analysis data for assessing tickers based on organizational complexity.
  • Legis – Prediction data on Congressional bill outcomes. (Economist)
  • Associated Press – Ticker level data on text archives, real time news, and human-curated database of 140k upcoming potentially newsworthy events.
  • Alpha Hat – Visualization platform focused on geolocation data with plans to expand to multiple datasets.
  • See full public database of 270 data providers.


  • Datastreamx, a data broker, announces blockchain-based network for decentralized data access. Smart contracts will set rules for data usage and payment. Network will launch in April 2018, but no ICO date is set. (Medium)
  • Crux Informatics, a data infrastructure provider, announces investment from Citi, increasing total funding to $21m. (PRnewswire)
New Jobs
News & Insights
  • Funds with in-house alternative data processing capabilities, including State Street and MSCI, are building related ETFs to sell to the competition. They are betting that smaller firms will get broad alternative data exposure through the ETFs, rather than the expensive process of building alternative data capabilities themselves. (Institutional Investor)
  • Schroders data insights team does not believe that wide access to the same alternative data sources arbitrages away alpha. Rather, they claim that the more data sources teams incorporate, the more possible permutations of analyses emerge, suggesting that ubiquitous alternative datasets still have unique value for each firm. (Morningstar)
  • Pledge to enact new cyber security standards for fintech firms may impact how Yodlee can access financial data. (Financial Times, subscription required)
  • Foursquare, a geo-location provider, reports that “mall death” is overstated, and that attendance at high-end luxury malls is actually on the rise. (Yahoo Finance)
  • Two data providers predict probability of bills passing in Congress.
    • Skopos Labs provides both outcome predictions and valuation impact forecasts on the ticker level. (Skopos Labs)
    • Legis has correctly forecasted the outcome for the first 44 bills for which it has issued predictions. (Economist)
  • Key takeaways from the Augvest-hosted geo-location panel:
    • Cell tower data, the most common geo-location data, is far less accurate than GPS or wifi.
    • Geo-location providers are differentiated by the app types that they work with (and the resulting bias of their sample) and whether they provide useful analyses/products on top of the raw data. Reveal and X-Mode provide raw data, while Safegraph and Cuebiq deliver features layered on.
  • Point72 names Kirk McKeown director of proprietary research. He will oversee Aperio, Data Sourcing and Strategy, and Point of the Spear, with the goal of aligning communication between investment managers and data scientists. (Financial Adviser – Private Wealth)
  • Marine Traffic, a ship-tracking provider, gives insight into the whereabouts of a yacht seized by the FBI off the coast of Bali in relation to the 1MDB scandal. (WSJ, subscription required)
Upcoming Events




The Download 14: Data on Airbnb 2017 Growth, 10 New Datasets, 6 New Jobs, 5 Requests for Data

New Datasets

  • Connotate – Web scraping, data collection, and monitoring services.
  • Mozenda – Web scraping software that integrates with databases and BI toolkits.
  • Business Intelligence Advisors – Founded by CIA employees; analyzes earnings calls and other management commentary.
  • DAR Partners – Sales consultants for alternative data providers.
  • Infotrie – News analytics and sentiment data provider with data on 50k stocks, topics, people, and commodities.
  • RootMetrics – Mobile network performance data; subsidiary of IHS Markit.
  • OpenSignal – Crowdsourced mobile network performance data.


Requests for Data

The AlternativeData.org network has hundreds of multibillion L/S and Long-only funds looking for very targeted datasets. Below are some specific Requests for Data from a selection of these funds. Please reach out to data@alternativedata.org if you have datasets on any of the following:

  • LinkedIn – Profile data on employees of top ~25k companies.
  • Merchant Acquirers – Market share data of companies involved in credit card routing and exclusivity agreements (e.g. First Data Corp, WorldPay).
  • Amazon – Earnings power, industry expansion, country regulations, digital advertising revenue, etc.
  • Any alternative data on Canon or The New York Times
New Jobs
News & Insights
  • Airbnb grew ~50% in 2017 according to alternative data sources: 
    • YipitData tracked global listings up 40% and room-nights stayed up 50% YoY. (YipitData)
    • 1010Data, a credit card data provider, found that US bookings grew 49% YoY, significantly outpacing the hotel industry average of 28%. (MediaPost)
  • AlternativeData.org published an analysis of alternative data full-time employees (FTEs) on the buy-side. 
    • The number of alternative data FTEs has grown ~450% in last 5 years.
    • Most alternative data FTEs have 11+ years experience and do not have graduate degrees.
    • Tech, Academia, and Data Providers are quickly becoming main channels for sourcing alternative data FTEs.
    • Cost of an alternative data team starts at $1.5 – $2.5m.
  • Funds are paying total compensation of nearly $180k for the average engineer/quant role. (efinancialcareers)
  • The Chartered Alternative Data Analyst Institute launches with plans to develop an exam-based curriculum for standardizing best practices in alternative data analytics. (FinAlternatives, Integrity Research)
  • J.P. Morgan attributes climb in Institutional Investor sell-side research rankings to alternative data incorporation. In a statement, Sunil Garg, J.P. Morgan’s head of international equity research in Asia and EMEA, says the bank has worked to ensure that its research coverage footprint is among the largest of all sell-side research houses by using alternative-data analysis techniques. (Institutional Investor)
    • UBS also credited use of alternative data through UBS Evidence Lab for their top research ranking. (Institutional Investor)
  • Sentieo, a data interface, correctly predicts the Twitter, Grubhub, Skechers, and Sodastream earnings beats using data from Google Trends, Alexa, and Twitter mentions. (Sentieo)
  • Schroders incorporates alternative data not to implement quantitative approaches, but to augment its fundamental analyses. Having hired their first data scientist in September 2014, the fund now employs 27 such employees. “The data that does not fit into our analysts’ spreadsheets is the gap that we are trying to fill.” – Mark Ainsworth, Head of Data Insights, Schroders. (MarketsMedia)
  • Automakers explore monetizing the data collected by smarter cars. “Hedge funds probing the health of the economy want anonymized trunk sensor data to see if you bought anything when you went to the mall.” (Bloomberg)

Other reading:

  • T-Mobile claims that RootMetrics rankings, the widely cited reports on mobile coverage, are biased in favor of Verizon due to a dataset of “paid consultants.” T-Mobile points to OpenSignal, a crowdsourced mobile coverage dataset, as an alternative source that provides unbiased data. According to OpenSignal, T-Mobile is on equal ground with Verizon in coverage and speed. (Android Authority)
  • ARLnow.com, an Arlington, VA news source, reports large traffic volume from an “internal Amazon.com page devoted to its HQ2 search,” leading to speculation that Arlington is in the final mix for new Amazon HQ. The report does not disclose the referral URL or identification methods, leaving the possibility that a crawler or bot using Amazon Web Services is being mistaken for employee traffic. (ARLnow, Business Insider)




Takeaways from Battlefin Miami

Battlefin brought together 107 asset managers ($760bn AUM), 94 data providers, and ~100 other industry professionals in Miami from January 30-31. Format was productive with packed, short presentations in the morning followed by an afternoon of back-to-back 15 minute one-on-one meetings. AlternativeData.org was a media partner for the event, from which we highlight new datasets, updates, and key takeaways below.

Twenty-four new datasets:

  • Consumer Edge Insights – Credit card transaction panel of over 15mm users from hundreds of US banks. Also has merchant scanner data, Amazon basket tracking with 100k opt-in panel, and survey data.
  • Standard Media Index – Ad spend data sourced directly from booking and invoice systems of media holding partners. Data is aggregated monthly.
  • Epsilon – Marketing company with ~130mm US users’ credit card transaction data.
  • Rystad Energy – Tracks 1,000 companies in the oil and gas industry, providing metrics on exploration, production, oilfield servicing, and North American shale.
  • BizQualify – Tracks company employee benefit plans using IRS and Department of Labor filings.
  • TMT Analysis – Mobile device data provider with metrics tracking unique ad-cookie IDs, IMEI data, and number portability.
  • EPFR – Daily fund flows data, showing the fund origin and destination of moving assets.
  • FeatureX – Satellite analytics provider. API allows for natural language querying.
  • Drawbridge – Data on cross-device consumer attribution.
  • Edison – Real-time data on user purchases and product demand, sourced directly from Edison’s mail app. Covers 11,000 brands. Acquired Return Path’s Consumer Insights business.
  • Dodge – Construction data provider with information on projects and bidding.
  • Linkup – Global job listing provider with 150mm jobs tracked since 2007. Provides both raw data and insights.
  • Sequentum – Web scraping software and solutions.
  • GovSpend – Data on government spending, filterable by products, companies, or people.
  • aWhere – Agriculture data provider with global coverage of key predictors including weather, pest, and disease risk.
  • Vigilant – Public records data provider with real-time alerts across courts, lobbying records, business filings, and campaign financing, among others.
  • Amenity Analytics – Text analytics platform for analyzing unstructured data. Customizes reports for earning call transcripts, regulatory filings, broker research, news, and more.
  • ListenFirst – Tracks social data across organic & paid channels to create a full picture of a company’s social presence.
  • Shareablee – Aggregates all social pages to assess social presence for brands and companies.
  • MKT Mediastat – Unique signals from company media coverage, including measurements of unexpected news coverage, rate of agreement across media sources, and linkages between companies.
  • QL2 – Public data on travel, retail, and automotive companies. Covers ~150 public and ~150 private companies.
  • Sustainalytics – Environment, social, and governance (ESG) score data provider. Provides the ESG scores shown on Yahoo Finance.
  • Owl Analytics – Data on environment, social, and governance (ESG) metrics. Mission is for investors to be able to maintain strategy but point their capital toward companies that have positive social and environmental impact.
  • ISS Analytics – Data on governance metrics as an indicator of company performance.


  • Main difference between top two web-traffic data providers:
    • Jumpshot – Created to monetize the data from antivirus software Avast. Has more reliable cohorts (people don’t uninstall antivirus software often) but has more panel bias.
    • SimilarWeb – Based on browser extensions, manages their panel bias better (given broad distribution of users) but suffers from higher cohort turnover.
  • AppAnnie, a mobile app usage provider, now has a dedicated professional services team that provides custom data and analytics from their dataset.
  • Enigma, a public data and infrastructure provider, uses data to measure new wells and operations of oil production. Correlates with revenue.
  • Ursa, a satellite data provider, says China dataset on oil storage is their most robust dataset. Ursa provides total storage and flows 2-3 months prior to government reports.
  • GroundTruth, a geolocation data provider, has a separate company called “Skymap” (200 employees) that is entirely devoted to “geo-fencing”, associating each location with a given place of business and keeping track of changes over time.
  • Cuebiq, a geolocation data provider, has ~72mm MAUs in US (one-third of smartphones).
  • Reveal Mobile, a geolocation data provider, has just started selling to institutional investors and has 125mm phones in US.
  • Thinknum, a web data aggregator, tracks FB check-ins. Their customer base is 20% sell-side. They have a tool that correlates a given data point with a stock price.
  • Thasos, a geolocation data provider, has 2.5 years of history and provides weekly delivery of over 400 KPIs. Best KPI to forecast is sales.
Key takeaways from presentations and discussions

Common theme throughout the conference was that access to certain data sources is no longer the main source of alpha, but rather the ability to process that data well and reach the best insights the fastest.

Nobody has figured out how to automate the data cleaning process. It remains heavily manual and labor-intensive at every firm. Philip Brittain presented the CRUX model to make data “Available, Accurate, and Actionable”. The model focuses on data engineering rather than data analysis to develop a process that maintains “data in motion”, providing a stream of answers while addressing maintenance and irregularities.

  • Elements of Data Engineering: ingestion, extraction, validation, structuring/storing, cleaning, normalization, mapping/standardizing, tagging/enriching, joining, de-duping.
  • Machine learning should theoretically be able to help automate a lot of this work.
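A few of the stages named above (validation, normalization, de-duping) can be sketched as small functions chained into one pipeline. This is only an illustrative sketch, not the CRUX model itself: the record shape (`ticker`, `date`, `value`) and all function names are hypothetical assumptions.

```python
# Hypothetical record shape: {"ticker": str, "date": str, "value": number}

def validate(records):
    """Validation: drop records with missing fields or wrongly typed values."""
    return [
        r for r in records
        if isinstance(r.get("ticker"), str)
        and isinstance(r.get("date"), str)
        and isinstance(r.get("value"), (int, float))
    ]

def normalize(records):
    """Normalization: trim and upper-case tickers so variants map together."""
    return [{**r, "ticker": r["ticker"].strip().upper()} for r in records]

def dedupe(records):
    """De-duping: keep the first record seen for each (ticker, date) pair."""
    seen = set()
    unique = []
    for r in records:
        key = (r["ticker"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def run_pipeline(records, stages=(validate, normalize, dedupe)):
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        records = stage(records)
    return records

raw = [
    {"ticker": " grub ", "date": "2018-01-05", "value": 10.0},
    {"ticker": "GRUB", "date": "2018-01-05", "value": 10.0},  # duplicate after normalization
    {"ticker": "YELP", "date": "2018-01-05"},                 # missing value, dropped
    {"ticker": "yelp", "date": "2018-01-05", "value": 3},
]
clean = run_pipeline(raw)
```

Keeping each stage as a separate function makes it easier to add the other listed steps (mapping, tagging, joining) or to swap in a learned model for one stage later.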

Integrating different alternative data sources requires a firm grasp of the investment questions around a particular ticker. YipitData demonstrated how it created 7 different datasets from 3 data sources to develop a very granular product that addressed key investor questions on GRUB. Here’s how:

  • Start with the key investor questions for a particular name.
  • Search for the data sets that speak specifically to those questions.
    • If a dataset doesn’t address a key investor question, make sure you have confidence in the data provider’s ability to dig into their data and create something new.
  • Focus on one data set first and then build from there.
    • YipitData started scraping just GRUB’s restaurant locations, but as the investment narrative on GRUB evolved, they layered on additional datasets that build upon one another.

Many data providers emphasized they are receiving increased attention from quant funds in the past 6 months. There seems to be a trend of the major quants starting to incorporate more traditionally fundamental-oriented alternative datasets. Common quant needs include:

  • High time granularity and delivery frequency (at least weekly).
  • Coverage across many tickers (100+) for a given metric.
  • Long time series (3+ years) for a given metric.
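The three quant requirements above can be expressed as a simple screen over a dataset's basic profile. This is a minimal sketch: the `DatasetProfile` class, its field names, and the example providers are hypothetical, and the default thresholds simply restate the bullets (at least weekly delivery, 100+ tickers, 3+ years of history).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetProfile:
    """Hypothetical summary of a dataset for one metric."""
    name: str
    delivery_interval_days: int  # days between deliveries (7 = weekly)
    ticker_count: int            # tickers covered for the metric
    history_years: float         # length of the available time series

def unmet_quant_needs(profile: DatasetProfile,
                      max_interval_days: int = 7,
                      min_tickers: int = 100,
                      min_history_years: float = 3.0) -> List[str]:
    """Return the requirements the dataset fails; an empty list means it passes."""
    unmet = []
    if profile.delivery_interval_days > max_interval_days:
        unmet.append("delivery frequency")
    if profile.ticker_count < min_tickers:
        unmet.append("ticker coverage")
    if profile.history_years < min_history_years:
        unmet.append("history length")
    return unmet

weekly_panel = DatasetProfile("example weekly panel", 7, 150, 4.0)
monthly_niche = DatasetProfile("example monthly niche set", 30, 40, 5.0)
```

Here `unmet_quant_needs(weekly_panel)` returns an empty list, while the monthly dataset fails on both delivery frequency and ticker coverage.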

Chris Petrescu, ex Data Strategy at WorldQuant, emphasized the importance of having a dedicated data analysis team with an engineer that is focused on answering the main questions on the data.

  • It can be exciting to work with data owners that have no finance experience and offer a valuable raw product, but most analysts often underestimate the amount of work required to turn that into valuable insights.
  • Alpha is found in stitching datasets together and drawing broader conclusions from them, not looking at one standalone.

Challenges for geolocation data providers:

  • Getting a highly specific location (confusing a spot with its next door location).
  • Differentiating between customers and employees.
    • Ability to measure “cross visitation” vs. simply aggregate footfall is an advantage over satellite data, but is very hard to attribute.
  • Changes by Apple/Google to their OS (location services APIs) require a lot of oversight and testing to adapt SDKs and ensure consistency.
    • Past few years have shown significant reduction of SDKs that can exist in-app, so data providers using SDKs now need to show clear value to keep high penetration.

Satellite imagery is best suited for restaurant, home improvement, and specialty store sectors, according to backtest of RS Metrics data from Wolfe Research. The satellite provider evaluation found that industries with more concentrated peak hours of operations have the most success in capturing traffic.

  • Best performing sub-industry: restaurants, home improvement retail, specialty stores, department stores, home furnishing retail.
  • Tickers with highest correlation: LOW, CMG, HD, JCP, BWLD, TGT, ROST, LL, BIG, TSCO.
  • Still, credit card and foot traffic data can be better predictors for these sectors, depending on geographic bias and percent of customers paying with cash.

Observations on satellite data:

  • Frequency and resolution of satellite imagery are expected to improve drastically over the next 5 years as we move toward real-time visual analytics.
  • Satellite data for Asian markets is often less reliable due to the higher cloud cover/air pollution levels.

StockTwits could be used as a source of sentiment data for cryptocurrencies. 25% of all engagement and communications on the 1.5mm user social network is now cryptocurrency related.




Buy-side Alternative Data Employee Analysis

We compiled a dataset of alternative data full-time employees (FTEs) on the buy-side to analyze the various recruiting trends impacting institutional investors. As competition for data talent heats up, it is essential to understand the landscape, background, and cost of these professionals.

Key Takeaways:

  • The number of alternative data FTEs has grown ~450% in last 5 years.
  • Most alternative data FTEs have 11+ years experience and do not have graduate degrees.
  • Tech, Academia, and Data Providers are quickly becoming main channels for sourcing alternative data FTEs.
  • Cost of an alternative data team starts at $1.5 – $2.5m.
Building the employee database.

Our methodology leveraged LinkedIn, IPREO, and the AlternativeData.org network to scan through the 14k buy-side funds to find individuals that are focused on alternative data initiatives full time. We first identified all data-focused individuals within discretionary funds and then screened for all false positives, including employees that work with traditional datasets (e.g. macro, business, market, etc.). We then reviewed each individual’s profile to confirm their focus on alternative data and arrived at a final database of 163 funds that employ a total of 340 alternative data FTEs (Figure 1).

Figure 1. Building the dataset.

This methodology has some limitations, suggesting that the actual number of data FTEs may be even higher:

  • Various alternative data FTEs are not on LinkedIn, IPREO, or our network
  • People’s titles don’t always reflect their responsibilities/focus
  • Most people don’t highlight “alternative data” in their profiles
  • People don’t update their LinkedIn profiles very often
How quickly is the landscape evolving?

We charted the growth in alternative data FTEs over time, capturing the acceleration of this skillset over the last five years (Figure 2). While both the total number of employees and the total number of funds employing alternative data FTEs are increasing, the total employee count is increasing at a faster rate. This suggests that funds are increasingly hiring more data talent and building out entire teams.


Figure 2. 4x Growth of alternative data FTEs in last 5 years.

We compared growth in alternative data FTEs to growth in alternative data providers and found that, while providers followed a correlated trend, their inflection point for growth occurred roughly four years earlier than that of FTEs (Figure 3). This suggests that between 2009 and 2012, funds realized that they could no longer outsource (or avoid) the need to analyze and integrate new sources of alternative data.


Figure 3. Funds are playing catch-up building out their alternative data teams.

What is the composition of alternative data FTEs?

We investigated the different roles that comprise the data FTE sample to understand its composition (Figure 4). We grouped various different job titles into 6 major categories to better identify trends across each function. We found that 59% of FTEs are in Data Analyst and Data Scientist positions. These are also the functions that have been growing the fastest, at 3x the rate of the other data categories (Figure 5).

Figure 4. Majority of Buy-side alternative data FTEs are Data Analysts and Data Scientists. Note: Data Scout refers to roles in which the primary responsibility is data sourcing.


Figure 5. Data Analyst and Data Scientist have the highest growth rate of major alternative data FTE functions.

Not just hedge funds in this game.

An important takeaway from looking at the types of funds in the dataset was that hedge funds are not the only ones that have been adding alternative data FTEs. Long-only funds are adding considerable numbers of these employees. We identified several long-only funds that have built full data teams or have many alternative data FTEs, including Schroders, Fidelity, Capital Group, Neuberger Berman, T. Rowe Price, and Invesco.

Given long-only investors experience much longer investment cycles (~5 years) than their hedge fund counterparts (quarterly), it is reasonable to assume that they have not yet seen and validated the impact of alternative data in their investment decisions. As a result, we expect that several more long-only funds will commit to building dedicated data teams in the near future as more alternative data is incorporated and its ROI is demonstrated.

Backgrounds of Alternative Data FTEs.

We examined the dataset to identify profile characteristics and trends that would help in recruiting alternative data skill sets. We first looked at the educational concentration and previous employer type of the four main functions to understand the general background of each function. Most functions had relatively high concentrations of STEM backgrounds, except for Heads of Data, who were almost entirely from traditional investment backgrounds (Figure 6). We expect that the background of Heads of Data will diversify with time, as Data Analysts and Data Scientists with STEM backgrounds progress into leadership roles. 

Figure 6. Heads of Data have more traditional buy-side backgrounds than other alternative data functions.

How do these employees differ from typical buy-side talent?
When compared to the average educational background on the buy-side, alternative data FTEs hold significantly more STEM degrees, but fewer Ivy League degrees and MBAs (Figure 7). It is evident that the industry profile will change significantly in the coming years. Recruiters will need to adapt, as few will have extensive networks of referrals for this skillset.

Figure 7. Alternative data FTEs have more STEM degrees than the average buy-side employee. Much lower concentration of Ivy League or MBA degrees for these roles.

Did most alternative data professionals attend graduate school?

We also examined education levels across the different functions and found that graduate degrees are concentrated in Data Scientist positions (Figure 8). Over 40% of Data Scientists hold a graduate degree. While we expected this given the technical sophistication of that role, one could also conclude that you probably do not need to hire a PhD or graduate student for the majority of these roles.


Figure 8. Only Data Scientists have a high concentration of graduate degrees amongst alternative data FTEs. Most roles don’t require a graduate degree.

Where to find alternative data talent?

In 2012, most talent came from other funds (69%) or the sell side (20%). In 2017, the sell-side share has remained largely the same (19%), but sourcing from other funds has decreased substantially (to 48%) (Figure 9). Over the last 5 years, funds have substantially increased their hiring from tech companies, academia, and data providers. We expect these channels to continue diversifying and growing as the industry seeks to fill the increasing demand for the alternative data skillset.

Figure 9. Funds are increasingly sourcing Alternative Data FTEs from tech, academia, and data providers.

How experienced are these professionals?

We looked at work experience and found that the majority of funds hired individuals with 11+ years of experience (Figure 10). Few funds are currently building out these teams with recent college graduates. This will change over time, as the alternative data skillset is only around seven years old. As the use cases for alternative data grow, we expect funds to invest more in hiring and training younger talent for roles on their data teams.


Figure 10. The majority of alternative data employees have 11+ years of experience.

How much will this cost?

Finally, we gathered compensation figures to estimate the cost of building a small but complete data team at a fund (Figure 11). We estimated that entry-level compensation for a team comprising each of the functions and three Data Analysts would start at $1.5m – $2.5m. After accounting for insurance, benefits, overhead, etc., the true cost could be twice as much. Moreover, judging from the size of some teams and anecdotal research, several top funds are already spending over $10m on alternative data teams.


Figure 11. Alternative data FTE team compensation starts at $1.5m – $2.5m.
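
The arithmetic behind the "twice as much" estimate can be sketched in a few lines of Python. This is only an illustration: the function name `loaded_cost_range` is hypothetical, and the 2.0 multiplier is simply the upper bound suggested in the text above, not a measured overhead rate.

```python
def loaded_cost_range(base_low, base_high, overhead_multiplier=2.0):
    """Scale a base compensation range by an assumed overhead multiplier.

    overhead_multiplier=2.0 reflects the article's "could be twice as much"
    remark (insurance, benefits, overhead); it is an assumption, not data.
    """
    if base_low > base_high:
        raise ValueError("base_low must not exceed base_high")
    return (base_low * overhead_multiplier, base_high * overhead_multiplier)


# Entry-level team compensation range quoted in the article, in dollars.
low, high = loaded_cost_range(1_500_000, 2_500_000)
print(f"Fully loaded estimate: ${low / 1e6:.1f}m - ${high / 1e6:.1f}m")
```

Under that assumption the fully loaded range comes out at roughly $3m to $5m for an introductory team.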

We are just getting started.

Competition for alternative data talent on the buy-side is escalating and has yet to hit full stride. As data sources grow, integration improves, ROI manifests, and more long-only funds begin building their data teams, we expect demand for this skill set to accelerate. One clear takeaway is that there is not enough alternative data talent within the institutional investor industry to sustain the growing demand. We expect to see funds training younger candidates and increasingly competing with tech and other industries to hire top talent. As demand grows, the cost of attracting top talent out of other fields will increase as well.

See our article How to Integrate Data Analysts, Data Engineers, and Research Analysts to learn more.


The Download 13: Using Alternative Data to Win Board Seats, 7 New Datasets


New Datasets

  • Slingshot Aerospace – Data provider with satellite, aerial, and drone capabilities.
  • Dun & Bradstreet (PAYDEX) – Tracks company health with dollar-weighted score for how promptly a business pays its bills.
  • Gyana – UK geolocation data provider.
  • h2o – Machine learning platform allowing insights without data science expertise.
  • Brain Company – Sentiment data provider using public data.
  • Endor – Predictive engine generating results from questions asked in plain language; has applications in consumer research.
  • Bitvore – Identifies ticker-level price inflections from analyzing the news.


  • Nowcast, a Japanese transaction data provider, teamed up with CCC Marketing to forecast company sales based on T-card data, a popular rewards card generating $63 billion in annual purchases. (Bloomberg)
  • Thasos Group, a geolocation provider, formed an agreement with the Chinese government to assess GDP growth at the district level.
  • ExtractAlpha, a social/sentiment provider, added new ClosingBell dataset providing crowd-sourced buy/sell ratings from a collaborative trading app.
  • See full public database of 218 data providers.
  • D.E. Shaw used alternative data to win a seat on the board of Lowe’s. Using satellite, census, and survey data, the fund made its case that an additional $8 billion in annual revenue was left on the table. (WSJ – subscription required)
  • Competition for alternative data analyst talent is escalating across hedge funds and long-onlies. Full analysis to be published on AlternativeData.org. (Financial Times – subscription required)
  • Integrity Research went deep into the history of legal challenges to alternative data usage and best practices for compliance with trading regulations. (Integrity Research – long read)
  • Electronic Frontier Foundation, Internet Archive, and DuckDuckGo filed an amicus brief supporting HiQ and attacking LinkedIn’s suggestion that the Computer Fraud and Abuse Act should be used to limit web scraping of publicly available data. (Integrity Research – subscription required)
  • Three articles discussed the challenges faced in using quantitative techniques to make investment decisions.
    • “We have yet to find quantitative techniques that can anticipate human behavior. . . . Our fundamental process exists to . . . understand those changes on the margin. . . . Our quantitative group is really responsible for helping us establish accurate starting points.” – Ryan Caldwell, Co-Founder and Chief Investment Officer, Chiron Investment Management. (Forbes)
    • “To me the biggest challenge is processing — being able to build that pipeline. The actual code to derive the alpha part is so much smaller than the whole wrapper around it which is focused on cleaning and processing, reconciliation and post-trade processes.” – Mansi Singhal, Co-Founder, qplum. (finextra)
    • A case study on Voleon fund showed how quant strategies are harder in practice than in theory. (WSJ – subscription required)
  • Earnest Research, a credit card transaction provider, published findings that AMZN received 89% of all holiday spending across Walmart, Best Buy, Target, and itself. (Bloomberg)
  • Sentieo, a data interface, showed that analyst sentiment moved favorably on utilities last quarter, while manager sentiment trended negatively. (Forbes)

Takeaways from Quandl’s Conference

Quandl assembled 400 buy-side investors, data providers, and sell-side professionals at their second Alternative Data Conference on January 18, up 143% from last year. Presentations focused on 5 main themes: A.I., resources for success in using alternative data, quantamental analysis, presenting new datasets, and compliance.

Six new datasets were introduced:

  • Legal Shield – leverages legal information from a network of 1.7m subscribers, 6.9k broker clients, and 34 law firms to predict macro indicators: consumer confidence, housing starts, total bankruptcies, foreclosure starts, and existing home sales.
  • Quandl M&A Insights – uses aviation industry partnerships to track daily activity of 43k private jets, unmasking FAA block list.
  • SimilarWeb – web traffic and app usage data provider. Highly accurate on Williams Sonoma e-commerce sales.
  • Dun & Bradstreet – provides metrics that measure the health of private companies.
    • PCC Indicator (monthly) tracks ~400 businesses for each of the 43k zip codes in the US. Highly correlated to US GDP at a regional level.
    • PAYDEX is a dollar-weighted score for how promptly a company pays its bills.
    • This data was collected from a trade program over 30 years.
  • S&P Market Intelligence – using language analytics to understand sentiment from earnings calls. Has data on 8,300 companies with history back to 2004.
  • Wolfe Research – partnered with Quandl, using its Dodge US Construction dataset to develop 30+ construction factors with forecasting power on asset returns.

Tools mentioned throughout the conference:

  • Jupyter – open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
  • Trust.mit.edu – info on creating a secure multi-party system that enables interaction with a protected dataset without sharing it.
  • MapD – data integration and visualization platform. Uses a GPU-accelerated (really fast) database and runs on AWS. Has an open source offering called MapD Core.
  • Gunning Fog Index – open source “readability” test, similar to what is used by language analysis providers.
  • Vin.place – database of car ownership, sourced from VINs.
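
For readers unfamiliar with the Gunning Fog Index mentioned above, the standard formula is 0.4 × (average words per sentence + percentage of "complex" words), where complex words have three or more syllables. The sketch below is a simplified illustration, not the method used by any provider at the conference: the syllable counter is a rough vowel-group heuristic, and the usual exclusions (proper nouns, compound words, common suffixes) are deliberately omitted.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Simplified Gunning Fog Index: 0.4 * (words/sentence + 100 * complex/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    # "Complex" here means three or more estimated syllables; no exclusions applied.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)


print(round(gunning_fog("The cat sat. The dog ran."), 2))
```

Higher scores indicate text that is harder to read; because of the heuristic syllable count, scores from this sketch will differ somewhat from those of a dictionary-based implementation.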

You can predict human behavior better by understanding groups and tribes rather than individuals – Alex Pentland, Professor at MIT.

  • 90-95% of human behavior is predictable when analyzing groups. The character of that 5-10% unpredictable behavior can be a leading indicator of trends and performance.

Top safety recommendations for data practitioners from Alex Pentland:

  • Do not put all your data in one place – move/share answers, not the data itself.
  • Create an auditable Q&A system – use blockchain to log your process.
  • Never decrypt the data – use a secure multi-party platform. (More details at trust.mit.edu)
    • Data owners should think about charging for access to data and not for the data itself.

New dataset on buy-side alternative data full-time employees (FTEs) was created to analyze how funds are recruiting and compensating their teams. James Moran, Co-founder and President of YipitData, presented the major hiring trends and integration best practices from the dataset. Some highlights:

  • Number of alternative data FTEs at funds has doubled every year over the last 5 years.
  • Significant demand for data talent also coming from Long-only investors.
  • Educational background at funds likely to shift significantly toward STEM majors (41% of alt. data FTEs), primarily Mathematics and Computer Science.
  • Tech Companies, Data Providers, and Academia accelerate as FTE sourcing targets.
  • An introductory alternative data team likely to cost more than $3mm with many funds already paying well over $10mm on alternative data FTEs.
  • Full analysis to be published on AlternativeData.org.

Matthew Rothman, MIT Professor, was hired to lead Goldman’s “Data as a Service” offering. He left the details of this offering TBD.

  • He’s an “alternative data contrarian” who believes we should be focused on addressing 1st order data problems (internal/proprietary data that already exists) instead of 3rd order data challenges from third-party alternative data sources.
    • Pushed the audience to use technology and analytics to find value in their own internal data and current processes.
  • Provided a couple of examples of data applications:
    • Measuring garbage as a better productivity indicator.
    • Sports car ownership as a sensation-seeking indicator for leadership risk tolerance, habits, and attitudes. (Can use vin.place)

Top line performance numbers don’t really tell the full story. Analysts should use alternative data as a fundamental snapshot of what is happening under the hood and to understand category- and unit-level trends. – Michael Recce, Chief Data Scientist at Neuberger Berman.

  • People don’t realize how much information they leave online. Cookies can be tied to a particular IP address, even once you delete your cookies.
    • This data can be aggregated, sold, and used to see what specific companies (e.g. funds) are over-indexing in search terms and content.
  • Funds must place engineers and technologists at the center of the investment idea process.

Schroders launched its data insights unit in late 2014 and now has a team of 22 engineers and data scientists. Led by Mark Ainsworth in London, the unit just added its first New York team member.

  • They successfully used satellite imagery to predict the outcome of M&A activity in the UK.

2017 Engagement Highlights

AlternativeData.org saw a lot of interest in 2017. We’d like to share some of the most highly engaged content in case you missed it. Please email us with any comments or content recommendations for the year ahead.


Below is a distribution of clicks generated by each data category in our Provider Database during 2017.


  • Subscribers are most interested in niche datasets. ‘Emerging Data Categories’ had the highest concentration of clicks and includes niche datasets that are usually industry sector specific and/or private company exhaust data (e.g. transportation, video game, and retail).
  • Weighted by provider count, Credit/Debit Card data had the highest CTR at 90% above the average CTR of other data source categories.
    • Web Data and App Usage both received strong engagement at 20% above average CTR.
  • While Social/Sentiment received a large number of clicks (10% of total), its weighted CTR was 40% lower than average given the large number of providers in the category.
    • Web Traffic and Sell-side were among the lowest weighted CTRs, more than 30% below average CTR.
  • Data Brokers, Infrastructure/Interface, and Consultants received a combined 31% of all clicks, highlighting the challenges of discovery fatigue and data analysis/integration. 
  • These were the newly discovered data providers that received the most clicks:
    • One Click Retail (Emerging/Consumer) – Includes AMZN dataset.
    • Random Walk (Geo-location) – Consumer foot traffic and email receipts.
    • Re-analytics (Web Data) – Fashion, retail, consumer, and travel.
    • Dawex (Data Broker) – Open alternative data marketplace.
    • EEDAR (Emerging/Video Games) – Data on over 127K game products.
    • Broughton Capital (Emerging/Transportation) – Trucking, rail, and airfreight.
    • BayStreet Research (App Usage) – Smartphone, tablet, and wearables.
    • Mavrx (Satellite) – Satellite, aerial, and infrared data for agriculture.
    • FaunaDB (Infrastructure/Interface) – Enterprise data warehousing.
  • When Silicon Valley came to Wall Street. (Financial Times – subscription required)
  • UBS wins Institutional Investor #1 in equities thanks to their investment in data team, Evidence Lab. (Institutional Investor)
  • Two academics publish their process for using sentiment data to predict iPhone X success. (insideBIGDATA)
  • Alpha Architect, a small asset manager, published a primer on machine learning for investors. (Alpha Architect)
  • 74% of hedge funds plan to increase spending on alternative data, based on a survey of 50 hedge funds by Greenwich Associates and Arcadia Data. (Greenwich Associates – subscription required)
  • Market size for alternative data estimated between $183mm – $200mm, and projected to double in 4 years. (Value Walk – subscription required, Quartz)
  • Web scraped listings help predict a decline in US retail employment. (Financial Times – subscription required)
  • A federal court ruled against LinkedIn, confirming that a startup can scrape its publicly available data – a potentially precedent-setting ruling in favor of web scraping based analytics. (Ars Technica, WSJ – subscription required)
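
As a rough illustration of the provider-count weighting used in the click analysis above, the sketch below computes clicks per provider for each category and compares it with the mean of the other categories. The article does not spell out the exact calculation, so this definition is an assumption, the function name `relative_weighted_ctr` is hypothetical, and the category labels and numbers are made up purely for the example.

```python
from statistics import mean


def relative_weighted_ctr(clicks, providers):
    """Clicks per provider for each category, relative to the mean of the
    other categories. A value of 0.9 means 90% above that average.

    This is an assumed definition of "weighted by provider count".
    """
    if len(clicks) < 2:
        raise ValueError("need at least two categories to compare")
    per_provider = {cat: clicks[cat] / providers[cat] for cat in clicks}
    result = {}
    for cat, rate in per_provider.items():
        others = [r for c, r in per_provider.items() if c != cat]
        result[cat] = rate / mean(others) - 1.0
    return result


# Hypothetical inputs, NOT figures from the provider database.
clicks = {"Category A": 190, "Category B": 100, "Category C": 100}
providers = {"Category A": 10, "Category B": 10, "Category C": 10}

for cat, rel in relative_weighted_ctr(clicks, providers).items():
    print(f"{cat}: {rel:+.0%} vs. average of other categories")
```

Normalizing by provider count in this way keeps a category with many listed providers from looking more engaging simply because it has more links to click.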

Number of alternative data providers: 212
Discretionary funds using alternative data: 163
Alternative data full-time employees at funds: 340

[Figure: Growth of Alternative Data Providers, as of 01/25/18]


The Download 12: 7 New Datasets, Google Maps vs. Apple Maps, BattleFin Discount

New Datasets
  • Jiguang – App usage provider on over 800mm Chinese Android devices.

  • IPqwery – Collects patent and IP ownership data from multiple public records offices.

  • Broughton Capital – Has seven transportation data sets across trucking, rail, and airfreight.

  • FNGO – Provides Korean export data through partnership with Korea Customs Service. High correlation with product revenue for Samsung, Hyundai, and many more.

  • ThinkTopic – Analytics tools for satellite imagery.

  • TruValue Labs – Provides ESG metrics as an indicator of company performance. Recently benchmarked its ESG scores on a group of equities, outperforming the S&P 500 by 3-5% over the past five years. (prsnewwire)


  • Growth in alternative data sources combined with MiFID II consolidation (and the resulting increase in sell-side research competition) likely to create more research products using alternative data in 2018.
    • “The history of Wall Street has been about copying and emulation … Having 35 analysts cover the same stock in the same way is just not going to cut it anymore.” – Barry Hurewitz, Global COO UBS Research. (Integrity Research – subscription required)
    • “The use of alternative data to make investment decisions will only intensify in the coming year.” – Bjørn Sibbern, NASDAQ. (Markets Media)
  • Deloitte published a white paper evaluating risk/reward in alternative data implementation. (Deloitte)
  • CTOs and CIOs among most in-demand roles for asset managers. “Larger houses on the buy side … are looking to hire a CTO or CIO or VP of IT with cloud and data analytics experience and leadership skills, being able to bring strategies to the table and communicate to business leaders effectively.”  – Emmeline Kuhn, Leathwaite. (efinancialcareers)
  • Google Maps far superior to Apple Maps because of its combined use of satellite and street-view imagery. (justinobeirne.com – long read)

Miami Beach, FL  |  January 30-31, 2018
BattleFin’s Discovery Day is a 2-day event of pre-arranged one-on-one meetings that connects alternative data companies with investment firms looking to integrate alternative data into their investment process. BattleFin is targeting 100 alternative data companies in the satellite imagery, geolocation, sentiment, web scraping, social and other categories. The event will also cover sourcing, technology and compliance topics. More on the agenda here and event logistics here.

Subscribers receive a 20% discount by using code: AltData20


To RSVP, fill out the contact form, which will then take you to the ticket purchasing page.