2019.9 - Data Release Log

Overview

This month featured updates to the property schema hoping to help customers know when the current status occurred.

Record Counts

The following are the updated record counts for each vertical:

  • Business data increased by 2.2 million to 92.3 million
  • People data remains at 9.5 million
  • Product data increased by 11.6 million to 140.9 million
  • Property data increased by 2.6 million to 34.8 million

New Sources

The following data types have received new sources:

  • 2 Business data sources
  • 1 Product data source
  • 1 Property data source

Schema Changes

  • Property data receives a new field mostRecentStatusDate and the accompanying nested field value statuses.date which describes the date a status took place.
  • imageURLs will only contain the most recent set of links from a particular domain versus continuing to compile new links as they are gathered.
  • We're still continuing our transition from crawlResultFiles to crawlIDs, Product data is the remaining data source being transitioned.
  • geoLocation continues to be implemented across our database for Business and Property data
  • dateSeen to a single value for many nested JSON object fields continues to be transitioned for all data types.
  • The new UPC and EAN fields along with the container field called gtins continues to be implemented for all Product data.

Fixes

  • Categories and features were being populated with bad seller ranking data, removal is still in-progress.
  • Address values have begun being normalized into USPS provided values. These values will be uploaded here in the coming days to assist with searching.
  • Updated 5 Business data source, 3 Property data sources, and 1 Product data sources