Battlefin brought together 107 asset managers ($760bn AUM), 94 data providers, and ~100 other industry professionals in Miami on January 30-31. The format was productive: short, packed presentations in the morning, followed by an afternoon of back-to-back 15-minute one-on-one meetings. We were a media partner for the event, and we highlight new datasets, updates, and key takeaways below.
Twenty-four new datasets:
  • Consumer Edge Insights - Credit card transaction panel of over 15mm users from hundreds of US banks. Also has merchant scanner data, Amazon basket tracking with 100k opt-in panel, and survey data.
  • Standard Media Index - Ad spend data sourced directly from booking and invoice systems of media holding partners. Data is aggregated monthly.
  • Epsilon - Marketing company with ~130mm US users’ credit card transaction data.
  • Rystad Energy - Tracks 1,000 companies in the oil and gas industry, providing metrics on exploration, production, oilfield servicing, and North American shale.
  • BizQualify - Tracks company employee benefit plans using IRS and Department of Labor filings.
  • TMT Analysis - Mobile device data provider with metrics tracking unique ad-cookie IDs, IMEI data, and number portability.
  • EPFR - Daily fund flows data, showing the fund origin and destination of moving assets.
  • FeatureX - Satellite analytics provider. API allows for natural language querying.
  • Drawbridge - Data on cross-device consumer attribution.
  • Edison - Real-time data on user purchases and product demand, sourced directly from Edison’s mail app. Covers 11,000 brands. Acquired Return Path’s Consumer Insights business.
  • Dodge - Construction data provider with information on projects and bidding.
  • Linkup - Global job listing provider with 150mm jobs tracked since 2007. Provides both raw data and insights.
  • Sequentum - Web scraping software and solutions.
  • GovSpend - Data on government spending, filterable by products, companies, or people.
  • aWhere - Agriculture data provider with global coverage of key predictors including weather, pest, and disease risk.
  • Vigilant - Public records data provider with real-time alerts across courts, lobbying records, business filings, and campaign financing, among others.
  • Amenity Analytics - Text analytics platform for analyzing unstructured data. Customizes reports for earnings call transcripts, regulatory filings, broker research, news, and more.
  • ListenFirst - Tracks social data across organic & paid channels to create a full picture of a company's social presence.
  • Shareablee - Aggregates all social pages to assess social presence for brands and companies.
  • MKT Mediastat - Unique signals from company media coverage, including measurements of unexpected news coverage, rate of agreement across media sources, and linkages between companies.
  • QL2 - Public data on travel, retail, and automotive companies. Covers ~150 public and ~150 private companies.
  • Sustainalytics - Environment, social, and governance (ESG) score data provider. Provides the ESG scores shown on Yahoo Finance.
  • Owl Analytics - Data on environment, social, and governance (ESG) metrics. Mission is for investors to be able to maintain strategy but point their capital toward companies that have positive social and environmental impact.
  • ISS Analytics - Data on governance metrics as an indicator of company performance.
  • Main difference between top two web-traffic data providers:
    • Jumpshot - Created to monetize the data from antivirus software Avast. Has more reliable cohorts (people don’t uninstall antivirus software often) but has more panel bias.
    • SimilarWeb - Based on browser extensions. Manages its panel bias better (given its broad distribution of users) but suffers from higher cohort turnover.
  • AppAnnie, a mobile app usage provider, now has a dedicated professional services team that provides custom data and analytics from their dataset.
  • Enigma, a public data and infrastructure provider, uses data to measure new wells and operations of oil production. Correlates with revenue.
  • Ursa, a satellite data provider, says its China oil storage dataset is its most robust. Ursa provides total storage and flows 2-3 months ahead of government reports.
  • GroundTruth, a geolocation data provider, has a separate company called “Skymap” (200 employees) that is entirely devoted to “geo-fencing”, associating each location with a given place of business and keeping track of changes over time.
  • Cuebiq, a geolocation data provider, has ~72mm MAUs in US (one-third of smartphones).
  • Reveal Mobile, a geolocation data provider, has just started selling to institutional investors and has 125mm phones in US.
  • Thinknum, a web data aggregator, tracks FB check-ins. Their customer base is 20% sell-side. They have a tool that correlates a given data point with a stock price.
  • Thasos, a geolocation data provider, has 2.5 years of history and provides weekly delivery of over 400 KPIs. Best KPI to forecast is sales.
Key takeaways from presentations and discussions
A common theme throughout the conference was that access to particular data sources is no longer the main source of alpha; the edge now comes from processing that data well and reaching the best insights fastest. Nobody has figured out how to automate data cleaning, which remains a heavily manual, labor-intensive process everywhere. Philip Brittain presented the CRUX model for making data “Available, Accurate, and Actionable”: focus on data engineering rather than data analysis, and develop a process that keeps “data in motion”, providing a stream of answers while handling maintenance and irregularities.
  • Elements of Data Engineering: ingestion, extraction, validation, structuring/storing, cleaning, normalization, mapping/standardizing, tagging/enriching, joining, de-duping.
  • Machine learning should theoretically be able to help automate a lot of this work.
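As a rough illustration of three of the elements listed above (validation, normalization, and de-duping), here is a minimal Python sketch. It is not the CRUX model or any provider's actual pipeline; the record fields (`ticker`, `date`, `value`), the function names, and the sample rows are all hypothetical and chosen only for illustration.

```python
from datetime import date


def validate(record):
    """Validation: keep records with a non-empty ticker, an ISO date, and a non-negative value."""
    try:
        date.fromisoformat(record["date"])
        return bool(record["ticker"].strip()) and float(record["value"]) >= 0
    except (KeyError, ValueError, AttributeError, TypeError):
        return False


def normalize(record):
    """Normalization: trim and upper-case the ticker, cast the value to float."""
    return {
        "ticker": record["ticker"].strip().upper(),
        "date": record["date"],
        "value": float(record["value"]),
    }


def dedupe(records):
    """De-duping: keep one record per (ticker, date); the last one seen wins."""
    unique = {}
    for record in records:
        unique[(record["ticker"], record["date"])] = record
    return list(unique.values())


def run_pipeline(raw_records):
    """Chain the three steps: validate, then normalize, then de-dupe."""
    valid = [r for r in raw_records if validate(r)]
    return dedupe(normalize(r) for r in valid)


# Hypothetical raw rows: one duplicate, one bad date, one missing ticker.
raw_rows = [
    {"ticker": " grub ", "date": "2018-01-30", "value": "12"},
    {"ticker": "GRUB", "date": "2018-01-30", "value": "12"},
    {"ticker": "GRUB", "date": "not-a-date", "value": "5"},
    {"ticker": "", "date": "2018-01-31", "value": "7"},
]
clean_rows = run_pipeline(raw_rows)
```

Even in a toy like this, most of the code is rule-writing for edge cases, which is consistent with the point that cleaning remains largely manual.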
Integrating multiple alternative data sources requires a firm grasp of the investment questions around a particular ticker. YipitData demonstrated how it created 7 different datasets from 3 data sources to develop a very granular product that addressed key investor questions on GRUB. Here’s how:
  • Start with the key investor questions for a particular name.
  • Search for the data sets that speak specifically to those questions.
    • If a dataset doesn’t address a key investor question, make sure you have confidence in the data provider’s ability to dig into their data and create something new.
  • Focus on one data set first and then build from there.
    • YipitData started scraping just GRUB’s restaurant locations, but as the investment narrative on GRUB evolved, they layered on additional datasets that build upon one another.
Many data providers emphasized that they have received increased attention from quant funds over the past 6 months. The major quants appear to be starting to incorporate more traditionally fundamental-oriented alternative datasets. Common quant needs include:
  • High time granularity and delivery frequency (at least weekly).
  • Coverage across many tickers (100+) for a given metric.
  • Long time series (3+ years) for a given metric.
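The three needs above can be read as a simple screening checklist. The sketch below is one hypothetical way to encode them in Python; the `DatasetProfile` fields, the function name, and the example feeds are assumptions made for illustration, and the thresholds read “at least weekly” as a delivery interval of 7 days or less, “100+” as at least 100 tickers, and “3+ years” as at least 3 years.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical summary of a dataset's delivery and coverage."""
    name: str
    delivery_interval_days: int  # days between deliveries
    tickers_covered: int         # tickers covered for the metric of interest
    history_years: float         # length of the available time series


def unmet_quant_needs(profile):
    """Return the list of needs the dataset fails; an empty list means it passes."""
    gaps = []
    if profile.delivery_interval_days > 7:
        gaps.append("delivery less frequent than weekly")
    if profile.tickers_covered < 100:
        gaps.append("fewer than 100 tickers")
    if profile.history_years < 3:
        gaps.append("less than 3 years of history")
    return gaps


# Two made-up feeds, not real providers.
weekly_feed = DatasetProfile("hypothetical_weekly_feed", 7, 250, 5.0)
monthly_feed = DatasetProfile("hypothetical_monthly_feed", 30, 40, 5.0)

weekly_gaps = unmet_quant_needs(weekly_feed)
monthly_gaps = unmet_quant_needs(monthly_feed)
```

Returning the list of gaps, rather than a single pass/fail flag, makes it clear which requirement a provider would need to address.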
Chris Petrescu, formerly of Data Strategy at WorldQuant, emphasized the importance of having a dedicated data analysis team, including an engineer, that is focused on answering the main questions about the data.
  • It can be exciting to work with data owners that have no finance experience and offer a valuable raw product, but analysts often underestimate the amount of work required to turn that into valuable insights.
  • Alpha is found in stitching datasets together and drawing broader conclusions from them, not in looking at any one dataset in isolation.
Challenges for geolocation data providers:
  • Pinpointing a highly specific location (a visit can easily be attributed to the place next door).
  • Differentiating between customers and employees.
    • The ability to measure “cross visitation”, rather than just aggregate footfall, is an advantage over satellite data, but visits are very hard to attribute.
  • Changes by Apple/Google to their OS (location services APIs) require a lot of oversight and testing to adapt SDKs and ensure consistency.
    • The past few years have seen a significant reduction in the number of SDKs that can exist in-app, so data providers using SDKs now need to show clear value to keep high penetration.
Satellite imagery is best suited for the restaurant, home improvement, and specialty store sectors, according to a Wolfe Research backtest of RS Metrics data. The satellite provider evaluation found that industries with more concentrated peak hours of operation have the most success in capturing traffic.
  • Best-performing sub-industries: restaurants, home improvement retail, specialty stores, department stores, home furnishing retail.
  • Tickers with highest correlation: LOW, CMG, HD, JCP, BWLD, TGT, ROST, LL, BIG, TSCO.
  • Still, credit card and foot traffic data can be better predictors for these sectors, depending on geographic bias and percent of customers paying with cash.
Observations on satellite data:
  • Frequency and resolution of satellite imagery are expected to improve drastically over the next 5 years as we move toward real-time visual analytics.
  • Satellite data for Asian markets is often less reliable due to higher cloud cover and air pollution levels.
StockTwits could be used as a source of sentiment data for cryptocurrencies. 25% of all engagement and communications on the 1.5mm-user social network is now cryptocurrency related.