Open Grocery Database Project

Since there are plenty of online retailers selling hundreds of thousands of grocery and drugstore products, one would hope that brands and manufacturers would come together to make the information related to their products freely available. In reality, it is extremely hard to find a single reliable, extensive, standardized, easy-to-use, open-source grocery database. This project aims to change this bizarre reality and give everyone free and unrestricted access to simple downloadable database files containing UPC-centric information about hundreds of thousands of grocery products.

We start with a small installment. Our first file contains a little over 100K grocery products with the following data points: grp_id, upc14, upc12, brand and product_name. We are not including other data points in this file to conserve space and make the handling of the data manageable.

In the next installment we will include information such as detailed product descriptions, product attributes, image file URLs, category, manufacturer and distributor information. Please note the grp_id (grocery product ID), which will serve as a key to our database alongside grb_id and similar unique IDs for brand names, category names, manufacturers, etc.

I recommend that we focus on ‘upc12’ for all the products (grp_id is an internal identifier, use it to join tables and create your version of this database). However, UPC12 is not always available, and for many products in this database it was converted from longer or shorter versions of the UPC.

Anyone dealing with UPC lists knows that despite their stated “standard” status, UPC numbers come in all shapes and forms. Our database was built from a variety of UPC lengths gathered from many sources, but we did our best to convert them to the 12-digit format.

We have used a variety of techniques, from simple formatting (adding leading zeros) to calculating check digits for 10- and 11-digit source UPCs using this Excel / Access formula:

=MOD(10-MOD(SUMPRODUCT(--(MID(A2,{1,3,5,7,9,11},1)))*3+SUMPRODUCT(--(MID(A2,{2,4,6,8,10},1))),10),10)
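
For readers working outside Excel, the same check-digit arithmetic can be sketched in Python. This is only an illustration, not the exact process used to build the file: the function names are made up for this example, and the handling of 10-digit codes (pad with one leading zero, then append the check digit) is an assumption about what "adding leading zeros" means here.

```python
def upc_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    # Same arithmetic as the spreadsheet formula: odd positions weigh 3.
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def to_upc12(code: str) -> str:
    """Convert a 10-, 11- or 12-digit code to a 12-digit UPC (hypothetical helper).

    Assumption: 10- and 11-digit inputs are missing the check digit, and
    10-digit inputs are also missing a leading zero.
    """
    if not (code.isascii() and code.isdigit()):
        raise ValueError("code must contain digits only")
    if len(code) == 12:
        return code  # already 12 digits; returned unchanged, not validated
    if len(code) in (10, 11):
        first11 = code.zfill(11)  # add a leading zero to 10-digit codes
        return first11 + str(upc_check_digit(first11))
    raise ValueError("unsupported length: " + str(len(code)))
```

For example, to_upc12("3600029145") pads the input to 03600029145 and appends the check digit 2, giving 036000291452. An 11-digit source code that already includes its check digit but lacks the leading zero would be converted incorrectly by this sketch, so the source of each list still needs to be checked.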

Should anyone have a better system, or cleaner data, or ideas, please share your thoughts and data on this page. We can accommodate small requests for formatting or combining data points if it can be useful to the public. Feel free to make suggestions, requests and share your information by commenting on this page.  — thanks and enjoy, BenD

UPDATE

Just uploaded a list of brands and manufacturers, including a count of how many products each has.

 

Download File

2017-01-24T13:18:34+00:00 Categories: Data, News, Products|17 Comments

17 Comments

  1. Spice Dune March 20, 2017 at 11:35 am - Reply

    On a related note: grocerybear has a grocery price API that has prices for products updated daily from 13 cities across the US. This API is super easy to use

  2. Edward Woolery March 30, 2016 at 8:19 pm - Reply

    Great work. I’m starting a project and I will be glad to share the database that I create. So far I have not heard anyone talk about my idea :) Good for me! I’m happy to see many other database apps/projects in the comments. I also think I will expand to Android after I finish the web application (simple front end, database and then a complex backend to help with future expansion or workloads). /thumbsup

  3. Rafeek March 26, 2016 at 10:19 am - Reply

    Very helpful for the beginners who are looking for UPC codes. Appreciate everyone’s effort behind building the system.

  4. Dave January 11, 2016 at 10:20 pm - Reply

    when is the “next installment” coming out?

  5. Oliver Batchelor August 10, 2015 at 5:27 am - Reply

    Hi there,
    Looks like a nice resource. I’ve been trying to use it to scrape product images by searching for the product names. This seems to work some of the time, but unfortunately the abbreviations in the database aren’t found, i.e. Google Images does not know that Ssg = sausage or that itln = Italian (there are a lot of them, too many to fix manually, I think).

    Thanks!

  6. Amandeep Dhaliwal June 24, 2015 at 8:19 pm - Reply

    Hi,

    This is very useful information. Thanks for uploading.

    I have a quick question about grp_id. I am not able to match the UPC and Brand files using this key; e.g. grp_id 632 is used for different brands in the two files. Can you please correct me if I am misinterpreting the data?

    Thanks,
    Amandeep

    • ArtR September 15, 2015 at 4:48 am - Reply

      grp_id and grb_id appear to be the (internal to grocery.com) unique “Primary Key” in each file and have nothing to do with each other. If you want to relate Grocery_UPC and Grocery_Brands, use the “Brand” column in each table. I know it is a longer “text” field, but that is how they have it set up right now, which allows downloaders to use their choice of database engine to manage their data.
      In truth, Grocery_UPC should probably use the grb_id of the Grocery_Brands table in its “Brand” field but for those who only want the UPC file and NOT the brands file, this seems to be the best format. Seems to have been a typo in their description above because you can’t join on those “id” fields. As they say, they will be expanding this – someday…
      Hope this helps.

    • ArtR September 15, 2015 at 5:14 am - Reply

      Example:
      SELECT tblUPC_Items.upc14, tblUPC_Items.upc12, tblUPC_Items.name, tblUPC_Brands.Brand, tblUPC_Brands.Manufacturer
      FROM tblUPC_Items INNER JOIN tblUPC_Brands ON tblUPC_Items.brand = tblUPC_Brands.Brand;

      Hope this helps,
      Art

  7. Kairos March 27, 2015 at 4:47 pm - Reply

    Hi,

    I only want to let you know about my app for Android, where I’ve implemented your DB and it works really well. It’s a shopping list app that saves all your purchases and gives you reports, statistics and more.

    It’d be interesting to add images to DB. If you want to mention my app on your website I’ll appreciate it.

    You can download it from:

    http://slideme.org/application/shoppingtrack

    Thank you.

  8. Jeremy Denais February 16, 2015 at 12:29 am - Reply

    Hi,

    I am also interested in participating in the project of building a useful database of products. My angle is recycling. I created a database of products, and for each product I attach a number of elements (parts) with the attribute recyclable or not.

    I am downloading your UPC database as I write and hope to integrate it in my app. The app allows users to add products to the database and attach elements to it. If my app gets any love and the database grows, I’ll let you know so we can keep things up to date.

    You can download my app here (Android only) :
    https://play.google.com/store/apps/details?id=com.apps4better.recycle4better

    Good luck.
    Jeremy

    • Jeremy Denais February 16, 2015 at 1:16 am - Reply

      Quick update, I successfully imported your database in mine.

      I have a question: Why do you bother having UPC14 and UPC12, and why do you convert other formats into UPC12? For my app, I just keep the barcode number as it appears on the product. It is easier for me and, I think, safer for matching the barcode with the database.

      Am I missing something?
      Thanks anyway for putting together this database.

      Jeremy

  9. C Marshall February 7, 2015 at 11:14 pm - Reply

    This is really great! When will you be updating with the next installment? The image urls will be very helpful. For that matter, the ingredients and nutritional info would be huge as well. Thanks very much for your thoughtfulness and help launching this project! We appreciate you!

  10. Simone Casciaroli January 7, 2015 at 6:35 pm - Reply

    Are you also going to share nutritional information for the groceries? It would be very handy to have such information for a large dataset.

  11. Rachel Anzelmo October 5, 2014 at 7:55 pm - Reply

    I’m not sure what to do with the information that I downloaded. Currently, I am using an app that scans the grocery item using the upc and then allows me to price the item. I have not been successful in motivating friends and family to contribute to the database. Anyway, I continue to scan and load the $ amount so that I can find the store that has it either on sale or at a consistent price. I also have upc for products that Costco and BJ carry along with prices. Please forward any information about getting prices for products. Thanks for the database. R

    • Editor October 24, 2014 at 4:41 pm - Reply

      Can you share the app that you are using here? Soon we will upload pricing information for most of the 100,000 products in our database.

    • Berly February 17, 2017 at 12:52 am - Reply

      Would you mind sharing your price database
