Build a database of wines

Introduction

You can use Datafiniti to quickly collect and generate large databases of product information. That data can be as general or as specific as you want it to be.

In this example, we'll build a variety of databases for wines.

Finding the right categorical search

Let's start with something really simple:

{
  "query": "categories:wine"
}

This is a very generic search on the categories field, which collects all the raw categorical information we know about a product. It returns the following (truncated for brevity):

{
    "num_found": 959865,
    "records": [
      {
        "brand": "MASON SHAKER",
        "categories": [
          "Kitchen Equipment",
          "Home",
          "Ice/Wine Bucket"
            ],
        "colors": [
          "GREY"
        ],
        "dateAdded": "2022-09-09T19:29:18Z",
        "dateUpdated": "2023-03-31T18:45:56Z",
        "mostRecentPriceAmount": 16,
        "mostRecentPriceCurrency": "USD",
        "mostRecentPriceDate": "2023-03-31T18:45:56.360Z",
        "mostRecentPriceFirstDateSeen": "2023-03-31T18:45:56.360Z",
        "name": "MASON SHAKER W&P DESIGN Collins Ice Cube Tray"
      }
    ]
}

So that's not exactly what we want. Let's try something else:

{
  "query": "categories:wines"
}

This returns:

{
    "num_found": 21192,
    "records": [
      {
        "asins": "B01N001860",
        "brand": "Josh Cellars",
        "categories": [
          "Grocery",
          "Wine",
          "All White Wines",
          "Wine, Beer & Liquor",
          "White Wines",
          "Alcohol",
          "Alcoholic Beverages",
          "Beer, Wine Spirits",
          "Adult Beverages",
          "Grocery & Gourmet Food",
          "White",
          "Food & Beverage",
          "Beer, Wine, & Spirits",
          "Beer, Wine & Spirits",
          "Beverages",
          "Food"
        ],
        "dateAdded": "2017-07-24T23:58:51Z",
        "dateUpdated": "2023-03-31T09:44:23Z",
        "name": "Sauvignon Blanc",
        "primaryCategories": [
          "Food Beverages & Tobacco"
        ],
        "secondaryCategories": [
          "Alcoholic Beverages"
        ],
        "taxonomy": [
          "grocery & gourmet food > alcoholic beverages > wine > red",
          "grocery & gourmet food > alcoholic beverages > wine > non-alcoholic wine",
          "grocery & gourmet food > home brewing & winemaking > winemaking ingredients > ingredient kits"
        ]
      }
    ]
}

Ok, this looks a little better, and now we're seeing some additional data that has more structured categories. Let's try searching on those.

{
  "query": "taxonomy:\"grocery & gourmet food > alcoholic beverages > wine\""
}

This returns:

{
    "num_found": 22403,
    "records": [
      {
        "brand": "Whitehaven",
        "categories": [
          "Grocery",
          "Wine",
          "Wine, Beer & Liquor",
          "Food & Beverage",
          "Beer, Wine, & Spirits",
          "Beverages",
          "Adult Beverages"
        ],
        "dateAdded": "2014-02-18T01:27:14Z",
        "dateUpdated": "2023-03-31T09:45:44Z",
        "manufacturer": "E.&J. Gallo Winery",
        "manufacturerNumber": "14765624",
        "mostRecentPriceAmount": 16.99,
        "mostRecentPriceCurrency": "USD",
        "mostRecentPriceDate": "2023-03-26T17:09:20.972Z",
        "name": "New Zealand Sauvignon Blanc White Wine",
        "primaryCategories": [
          "Food Beverages & Tobacco"
        ],
        "taxonomy": [
          "grocery & gourmet food > home brewing & winemaking > winemaking ingredients > ingredient kits",
          "grocery & gourmet food > alcoholic beverages > wine"
        ]
      }
    ]
}
  

That's looking pretty good. We can see that we have just over 22,000 matches for wine products in the database. We can download the entire database by doing the following:

{
  "query": "taxonomy:\"grocery & gourmet food > alcoholic beverages > wine\"",
  "download": true,
  "format": "CSV"
}

This will download every matching record in a CSV format. We can also download it in JSON if we want. You should also note that while the example records above show a truncated form, the actual records you download will have all the fields available in our product schema.

More advanced filtering

We might want to build a more specific database. We can use any combination of filters on the product data fields to accomplish this. Here are just a few examples:

Searching for wines that cost more than $100

{
  "query": "taxonomy:\"grocery & gourmet food > alcoholic beverages > wine\" AND {prices.amountMin:>100 AND prices.currency:USD}"
}

Searching for wines with reviews

{
  "query": "taxonomy:\"grocery & gourmet food > alcoholic beverages > wine\" AND reviews:*"
}

Searching for wines from New Zealand

{
  "query": "taxonomy:\"grocery & gourmet food > alcoholic beverages > wine\" AND {features.key:\"Country of Origin\" AND features.value:\"New Zealand\"}"
}

Example files

Here are example bulk download files of our previous query:

Conclusion

With the knowledge of how to pull records with taxonomy, you can target specific types of products to quickly collect and generate large databases of product information.