How Property Records Are Merged

For each record we collect, we generate one or more keys. Each key is built from a different unique identifier available in the record's data. If we see another record that shares one or more of the same key values, we merge the two records.

For example, we may generate a property record like this when crawling a web page:

{
  "address": "123 Anywhere St",
  "city": "Austin",
  "province": "TX",
  "country": "US",
  "numBedroom": 3,
  "numBathroom": 3
}

This record will generate the following keys:

"keys": [
  "US/TX/Austin/123AnywhereSt"
]
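As a rough illustration, the address key in this example can be reproduced by joining country, province, city, and street address with slashes. This is a hypothetical sketch: the function name `address_key` is ours, and the assumption that normalization only strips spaces from the street address is inferred from this single example key, not from a documented rule.

```python
def address_key(record: dict) -> str:
    """Build a country/province/city/address key (hypothetical sketch).

    Assumption: the only normalization is removing spaces from the street
    address, as in the example key; real rules may differ.
    """
    street = record["address"].replace(" ", "")
    return "/".join([record["country"], record["province"], record["city"], street])


record = {
    "address": "123 Anywhere St",
    "city": "Austin",
    "province": "TX",
    "country": "US",
    "numBedroom": 3,
    "numBathroom": 3,
}
print(address_key(record))  # US/TX/Austin/123AnywhereSt
```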

Let's say we then crawl another web page for the same property and generate this data:

{
  "address": "123 Anywhere St",
  "city": "Austin",
  "province": "TX",
  "country": "US",
  "neighborhoods": [
    "Rolling Hills"
  ]
}

This record generates the same key as the previous record, so the two records are merged. The resulting record will be:

{
  "address": "123 Anywhere St",
  "city": "Austin",
  "province": "TX",
  "country": "US",
  "neighborhoods": [
    "Rolling Hills"
  ],
  "numBathroom": 3,
  "numBedroom": 3
}
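A minimal sketch of this merge, assuming it takes the union of the two records' fields: fields found in only one record are copied over, and list fields are combined without duplicates. `merge_records` is a hypothetical name, and keeping the incoming value when both records hold different scalar values is a placeholder assumption; Datafiniti's actual conflict handling depends on the field.

```python
def merge_records(existing: dict, incoming: dict) -> dict:
    """Hypothetical merge: union of fields from two records that share a key."""
    merged = dict(existing)
    for field, value in incoming.items():
        current = merged.get(field)
        if field not in merged:
            # Field only exists in the incoming record: copy it over.
            merged[field] = value
        elif isinstance(current, list) and isinstance(value, list):
            # Both records have a list: add items not already present.
            merged[field] = current + [item for item in value if item not in current]
        else:
            # Assumption: on a scalar conflict, keep the incoming value.
            merged[field] = value
    return merged


first = {
    "address": "123 Anywhere St",
    "city": "Austin",
    "province": "TX",
    "country": "US",
    "numBedroom": 3,
    "numBathroom": 3,
}
second = {
    "address": "123 Anywhere St",
    "city": "Austin",
    "province": "TX",
    "country": "US",
    "neighborhoods": ["Rolling Hills"],
}
merged = merge_records(first, second)
print(merged)
```

Because the two records only overlap on identical values here, the assumed scalar-conflict rule never comes into play, and the result matches the merged record above.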

Property records use the following fields to generate keys:

  • address
  • city
  • country
  • province
  • taxID
  • mlsNumber

taxID and mlsNumber are used in conjunction with province.
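The province scoping can be sketched as follows. The key format `province/field/value` and the sample identifier values are hypothetical, since the actual key format for taxID and mlsNumber is not documented here. The sketch only shows that the same MLS number in two different provinces yields two different keys, and that two records merge when their key sets overlap.

```python
def identifier_keys(record: dict) -> set:
    """Hypothetical taxID/mlsNumber keys, each scoped by province."""
    province = record.get("province")
    if not province:
        # Assumption: without a province these identifiers produce no key.
        return set()
    keys = set()
    for field in ("taxID", "mlsNumber"):
        value = record.get(field)
        if value:
            # Hypothetical key format: province/field/value.
            keys.add(f"{province}/{field}/{value}")
    return keys


def should_merge(a: dict, b: dict) -> bool:
    """Two records merge when they share at least one key."""
    return bool(identifier_keys(a) & identifier_keys(b))


# Hypothetical sample records; identifier values are placeholders.
listing_tx = {"province": "TX", "mlsNumber": "A100"}
listing_tx_again = {"province": "TX", "mlsNumber": "A100", "taxID": "T200"}
listing_ca = {"province": "CA", "mlsNumber": "A100"}

print(should_merge(listing_tx, listing_tx_again))  # True
print(should_merge(listing_tx, listing_ca))        # False
```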

What Happens When Datafiniti Finds Conflicting Data

Datafiniti is always looking for the most up-to-date property data. In doing so, we may find new or updated data from the source of a property listing. Sometimes this data differs from the existing record in Datafiniti's database. In this case, a data validator determines which data is appended and which is overwritten. This is usually determined by the nature of the top-level schema field.

Appended Fields

Appended fields serve as a history of data about the property itself. When Datafiniti finds a conflict showing that the data source has updated information about the property, we append the new data to the array in each of the following schema fields:

  • brokers
  • deposits
  • descriptions
  • domains
  • features
  • fees
  • imageURLs
  • languagesSpoken
  • leasingTerms
  • managedBy
  • parking
  • paymentTypes
  • people
  • phones
  • prices
  • propertyTaxes
  • reviews
  • statuses

Example:

When Datafiniti's scraper detects a change in the price of the 123 Anywhere St record, Datafiniti appends the new price, along with its isSold status, to the prices array.

Record before the price change:

{
  "address": "123 Anywhere St",
  "country": "US",
  "dateAdded": "2022-04-26T05:45:22Z",
  "dateUpdated": "2022-06-25T23:59:53Z",
  "prices": [
    {
      "amountMax": 389000,
      "amountMin": 389000,
      "availability": "true",
      "currency": "USD",
      "dateSeen": [
        "2022-04-26T05:45:22Z"
      ],
      "isSold": "false"
    }
  ]
}

Record after the price change:

{
  "address": "123 Anywhere St",
  "country": "US",
  "dateAdded": "2022-04-26T05:45:22Z",
  "dateUpdated": "2022-07-30T05:45:22Z",
  "prices": [
    {
      "amountMax": 389000,
      "amountMin": 389000,
      "availability": "true",
      "currency": "USD",
      "dateSeen": [
        "2022-07-30T05:45:22Z"
      ],
      "isSold": "false"
    },
    {
      "amountMax": 500000,
      "amountMin": 500000,
      "availability": "true",
      "currency": "USD",
      "dateSeen": [
        "2022-07-30T05:45:22Z"
      ],
      "isSold": "false"
    }
  ]
}
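A minimal sketch of the append behavior for an array field such as prices, assuming two entries count as the same when every field except dateSeen matches. `append_new_entries`, `entry_signature`, and that matching rule are assumptions for illustration. The sketch leaves existing entries untouched, so it does not model any dateSeen changes Datafiniti may make to an entry it has already recorded.

```python
def entry_signature(entry: dict) -> tuple:
    """Identify an entry by every field except dateSeen (an assumption)."""
    return tuple(sorted((k, v) for k, v in entry.items() if k != "dateSeen"))


def append_new_entries(existing: list, incoming: list) -> list:
    """Append incoming entries that are not already in the history."""
    result = list(existing)
    known = {entry_signature(e) for e in existing}
    for entry in incoming:
        sig = entry_signature(entry)
        if sig not in known:
            result.append(entry)
            known.add(sig)
    return result


old_prices = [
    {"amountMax": 389000, "amountMin": 389000, "availability": "true",
     "currency": "USD", "dateSeen": ["2022-04-26T05:45:22Z"], "isSold": "false"},
]
scraped_prices = [
    {"amountMax": 500000, "amountMin": 500000, "availability": "true",
     "currency": "USD", "dateSeen": ["2022-07-30T05:45:22Z"], "isSold": "false"},
]
prices = append_new_entries(old_prices, scraped_prices)
print([p["amountMax"] for p in prices])  # [389000, 500000]
```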


dateUpdated

Please note that any change, whether appended or overwritten, updates the record's dateUpdated field to indicate when the change took place.

Overwritten Fields

Overwritten fields provide a quick, convenient view of values that change frequently. Their values are taken from the most recent scrape of the record's source. When Datafiniti has determined that the source data is correct, we update the following fields where possible:

  • dateUpdated
  • floorSizeValue
  • listingName
  • lotSizeValue
  • mostRecentStatus
  • mostRecentStatusDate
  • mostRecentStatusFirstDateSeen
  • numBathroom
  • numBedroom
  • numFloor
  • numPeople
  • numUnit
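The overwrite rule, together with the dateUpdated note above, can be sketched like this. It assumes that "where possible" means a field is overwritten only when the latest scrape actually contains it, and that dateUpdated uses the same UTC timestamp format as the examples. The function name `apply_overwrites` is hypothetical, and the field set mirrors the list above minus dateUpdated, which is set separately.

```python
from datetime import datetime, timezone

# Mirrors the overwritten fields listed above, except dateUpdated,
# which is set separately whenever something changes.
OVERWRITTEN_FIELDS = (
    "floorSizeValue", "listingName", "lotSizeValue", "mostRecentStatus",
    "mostRecentStatusDate", "mostRecentStatusFirstDateSeen", "numBathroom",
    "numBedroom", "numFloor", "numPeople", "numUnit",
)


def apply_overwrites(record: dict, scraped: dict, now: datetime) -> dict:
    """Hypothetical overwrite step; bumps dateUpdated only if a value changed."""
    updated = dict(record)
    changed = False
    for field in OVERWRITTEN_FIELDS:
        # Assumption: "where possible" means the scrape contains the field.
        if field in scraped and scraped[field] != updated.get(field):
            updated[field] = scraped[field]
            changed = True
    if changed:
        stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        updated["dateUpdated"] = stamp
    return updated


record = {"address": "123 Anywhere St", "numBedroom": 3,
          "dateUpdated": "2022-06-25T23:59:53Z"}
scrape_time = datetime(2022, 7, 30, 5, 45, 22, tzinfo=timezone.utc)
result = apply_overwrites(record, {"numBedroom": 4}, scrape_time)
print(result["numBedroom"], result["dateUpdated"])  # 4 2022-07-30T05:45:22Z
```

Leaving dateUpdated alone when nothing changed keeps it meaningful as the time of the last real change, in line with the note above.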