Open Grocery Database Project

Since there are plenty of online retailers selling hundreds of thousands of grocery and drugstore products, one would hope that brands and manufacturers would come together to make their product information freely available. In reality, it is extremely hard to find a single grocery database that is reliable, extensive, standardized, easy to use and open source. This project aims to change this bizarre reality and give everyone free and unrestricted access to simple downloadable database files containing UPC-centric information about hundreds of thousands of grocery products.

We start with a small installment. Our first file contains a little over 100K grocery products with the following data points: grp_id, upc14, upc12, brand and product_name. We are not including other data points in this file to conserve space and make the handling of the data manageable.

In the next installment we will include information such as detailed product descriptions, product attributes, image file URLs, category, manufacturer and distributor information. Please note the grp_id (grocery product ID), which will serve as the key to our database alongside grb_id and similar unique IDs for brand names, category names, manufacturers, etc.

I recommend that we focus on ‘upc12’ for all the products (grp_id is an internal identifier, use it to join tables and create your version of this database). However, UPC12 is not always available, and for many products in this database it was converted from longer or shorter versions of the UPC.
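As a rough illustration of using grp_id as the join key, here is a minimal Python sketch with an in-memory SQLite database. The `products` columns follow the data points listed above; the `details` table, its `category` column, and every row value are hypothetical placeholders, not data from the actual files. UPCs are stored as text so that leading zeros are preserved.

```python
import sqlite3

# Placeholder rows for illustration only; they are NOT taken from the real file.
products = [
    (1, "00012345678905", "012345678905", "Example Brand", "Example Salsa 16 oz"),
    (2, "00036000291452", "036000291452", "Another Brand", "Example Tissue"),
]
# Hypothetical second table keyed by grp_id (the real schema may differ).
details = [(1, "Condiments"), (2, "Household")]

conn = sqlite3.connect(":memory:")
# UPC columns are TEXT so leading zeros are not lost.
conn.execute(
    "CREATE TABLE products ("
    "grp_id INTEGER PRIMARY KEY, upc14 TEXT, upc12 TEXT, "
    "brand TEXT, product_name TEXT)"
)
conn.execute("CREATE TABLE details (grp_id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", products)
conn.executemany("INSERT INTO details VALUES (?, ?)", details)

# Join on the internal identifier, then report the public-facing upc12.
rows = conn.execute(
    "SELECT p.upc12, p.product_name, d.category "
    "FROM products AS p JOIN details AS d ON d.grp_id = p.grp_id "
    "ORDER BY p.grp_id"
).fetchall()

for upc12, name, category in rows:
    print(upc12, name, category)
```

The same pattern should apply to any additional table that carries grp_id, whatever tool you load the files into.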

Anyone dealing with UPC lists knows that despite their stated “standard” status, UPC numbers come in all shapes and forms. Our database was built from a variety of UPC lengths gathered from many sources, but we did our best to convert them to the 12-digit format.

We have used a variety of techniques, from simple formatting (adding leading zeros) to calculating check-sum digits for 10- and 11-digit source UPCs using this Excel / Access formula:

=MOD(10-MOD(SUMPRODUCT(--(MID(A2,{1,3,5,7,9,11},1)))*3+SUMPRODUCT(--(MID(A2,{2,4,6,8,10},1))),10),10)
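For anyone working outside Excel, the same check-digit arithmetic can be sketched in Python. This is a minimal sketch, not the exact process used to build the file: the function names are made up for illustration, and treating a 10-digit source as an 11-digit body with one leading zero added is an assumption.

```python
def upc_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(ch) for ch in first11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, 8, 10
    # Same arithmetic as the spreadsheet formula: odd positions weigh 3.
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def to_upc12(source: str) -> str:
    """Build a 12-digit UPC from a 10- or 11-digit source string.

    Assumption: a 10-digit source is only missing a leading zero and the
    check digit; an 11-digit source is only missing the check digit.
    """
    if not (source.isascii() and source.isdigit()) or len(source) not in (10, 11):
        raise ValueError("expected a 10- or 11-digit string")
    body = source.zfill(11)  # add the leading zero for 10-digit sources
    return body + str(upc_check_digit(body))


print(to_upc12("03600029145"))  # 036000291452
print(to_upc12("1234567890"))   # 012345678905
```

Keeping the codes as strings throughout avoids losing leading zeros, which is the usual failure when UPC columns are read as numbers.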

Should anyone have a better system, or cleaner data, or ideas, please share your thoughts and data on this page. We can accommodate small requests for formatting or combining data points if it can be useful to the public. Feel free to make suggestions, requests and share your information by commenting on this page.  — thanks and enjoy, BenD

UPDATE

Just uploaded a list of brands and manufacturers, including a count of how many products they each have.

 

Download File

Comments

48 responses to “Open Grocery Database Project”

  1. Galen

    Hey Jeremy – are you still working on this project? I’m working on developing a scoring system for evaluating/comparing grocery products for health and sustainability. Packaging and recyclability are obviously an important piece when it comes to a product’s environmental impact, and an area I’m struggling to compare at scale.

    1. William Joseph Odom

      Anyone know of other sources for grocery product dimensional information? Would love dimension, UPC, image, etc.

    2. Alex Hodgson

      Hi Galen,

      I came at this from a similar angle, but just wanting a single database where the packaging recyclability is graded (based on the recyclability labeling on the packages themselves) so it’s easier to choose products and supermarkets based on recyclability.

      How far have you got with your project?

      1. Oliver Jones

        Hey Galen/Alex,

        Did you two connect on this? I’m working on a similar project in the UK and would love to discuss. Contact me on LinkedIn /olivermaggsjones

  2. Don Johnson

    Is there a place to go to learn what products sell the most? Like what salsa has the largest sales in Kansas City, and who is second and third?

    1. Editor

      There are two companies that provide data based on scan activity and various other methods. Google “IRI data” and “Nielsen data”.

  3. Ravi

    This is an amazing effort. Is there any updated version of this available with images?

  4. Jaysan

    can’t wait to see product details with images. Hope it’s coming soon

  5. Ryan

    This is fantastic! When will the next installment with product details, images, etc be released? Is there a way I can get a preview of this? Thanks!

  6. Jose

    Looking for a UPC Database with picture of items

  7. Rosen

    We are working on a price database with min and max by generic unit of measure

  8. Leo Moore

    I’m doing something similar in the UK. I would be interested in hearing how you got on. Do you have a website?

    1. Mike

      Contact me if you’re still working on this

  9. Hi, how can we get additional sets and pix?
    Thanks

  10. Spice Dune

    On a related note: grocerybear has a grocery price API that has prices for products updated daily from 13 cities across the US. This API is super easy to use

    1. Galen

      Any idea how grocerybear is getting their pricing info?
