Since there are plenty of online retailers selling hundreds of thousands of grocery and drugstore products, one would hope that brands and manufacturers would come together to make information about their products freely available. In reality, it is extremely hard to find a single reliable, extensive, standardized, easy-to-use, open-source grocery database. This project aims to change this bizarre reality and give everyone free, unrestricted access to simple downloadable database files containing UPC-centric information about hundreds of thousands of grocery products.
We start with a small installment. Our first file contains a little over 100K grocery products with the following data points: grp_id, upc14, upc12, brand and product_name. We are leaving other data points out of this file to conserve space and keep the data manageable.
In the next installment we will include information such as detailed product descriptions, product attributes, image file URLs, category, and manufacturer and distributor information. Please note the grp_id (grocery product ID), which will serve as the key to our database alongside grb_id and similar fields (unique IDs for brand names, category names, manufacturers, etc.).
I recommend focusing on ‘upc12’ for all products (grp_id is an internal identifier; use it to join tables and create your own version of this database). However, a UPC-12 is not always available, and for many products in this database it was converted from a longer or shorter version of the UPC.
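A minimal sketch of loading the product file and joining it to another table on grp_id, assuming the download is a CSV with a header row naming the five columns listed above. The sample rows and the `details` table are hypothetical stand-ins for your own data. Every field is kept as a string so that leading zeros in upc12 and upc14 are not lost, which is what happens when UPCs are parsed as numbers.

```python
import csv
import io

# Hypothetical sample in the assumed CSV layout (header + rows); in practice
# you would open the downloaded file instead of using this string.
PRODUCTS_CSV = """grp_id,upc14,upc12,brand,product_name
1,00012345678905,012345678905,Example Brand,Example Crackers 8 oz
2,00098765432105,098765432105,Other Brand,Example Tissues 3 ct
"""


def load_products(text):
    """Read rows keyed by grp_id; every field stays a string so leading zeros survive."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["grp_id"]: row for row in reader}


def join_on_grp_id(products, details):
    """Inner-join the product rows with another table keyed by the same grp_id."""
    return [
        {**products[grp_id], **extra}
        for grp_id, extra in details.items()
        if grp_id in products
    ]


products = load_products(PRODUCTS_CSV)
# Hypothetical table you might build yourself (attributes, categories, etc.).
details = {"1": {"category": "Snacks"}}
joined = join_on_grp_id(products, details)
print(joined[0]["upc12"], joined[0]["category"])  # 012345678905 Snacks
```

Keying on grp_id rather than a UPC column keeps the join stable even when a product's UPC had to be converted or corrected.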
Anyone dealing with UPC lists knows that despite their stated “standard” status, UPC numbers come in all shapes and forms. Our database was built from UPCs of various lengths gathered from many sources, but we did our best to convert them all to the 12-digit format.
We used a variety of techniques, from simple formatting (adding leading zeros) to calculating check digits for 10- and 11-digit source UPCs using this Excel / Access formula:
=MOD(10-MOD(SUMPRODUCT(--(MID(A2,{1,3,5,7,9,11},1)))*3+SUMPRODUCT(--(MID(A2,{2,4,6,8,10},1))),10),10)
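The formula above is the standard UPC-A check digit: the digits in positions 1, 3, 5, 7, 9 and 11 are weighted by 3, the digits in positions 2, 4, 6, 8 and 10 are weighted by 1, and the check digit is whatever brings the total up to a multiple of 10. As written, it expects A2 to hold all 11 digits as text, so a 10-digit code (or one whose leading zero Excel dropped) needs to be padded first. Below is a minimal Python sketch of the same calculation plus one possible normalization routine. The length rules in the hypothetical `normalize_upc12` are assumptions modeled on the description above (10- and 11-digit codes are treated as missing their check digit; 13- and 14-digit codes are accepted only when the extra leading digits are zeros). They are not necessarily the exact rules used to build this database.

```python
def upc_check_digit(first11: str) -> int:
    """Check digit for an 11-digit UPC-A body (same math as the Excel formula)."""
    if len(first11) != 11 or not first11.isdigit():
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first11]
    odd = sum(digits[0::2])   # positions 1,3,5,7,9,11 -> weight 3
    even = sum(digits[1::2])  # positions 2,4,6,8,10 -> weight 1
    return (10 - (odd * 3 + even) % 10) % 10


def is_valid_upc12(upc: str) -> bool:
    """True if a 12-digit string carries a correct check digit."""
    return (
        len(upc) == 12
        and upc.isdigit()
        and upc_check_digit(upc[:11]) == int(upc[11])
    )


def normalize_upc12(raw: str) -> str:
    """Convert a UPC of length 10-14 to 12 digits (assumed heuristic, see lead-in)."""
    code = "".join(c for c in raw if c.isdigit())
    if len(code) in (13, 14):
        # Zero-padded GTIN-13/14: the last 12 digits are the UPC-A, and leading
        # zeros do not change the check digit. Non-zero prefixes are rejected.
        if code[:-12].strip("0"):
            raise ValueError(f"{raw!r} is not a zero-padded UPC-A")
        code = code[-12:]
    elif len(code) in (10, 11):
        # Assumption: the check digit is missing. Pad to 11 digits, then append it.
        body = code.zfill(11)
        code = body + str(upc_check_digit(body))
    elif len(code) != 12:
        raise ValueError(f"unsupported length {len(code)} for {raw!r}")
    if not is_valid_upc12(code):
        raise ValueError(f"{raw!r} fails the check-digit test")
    return code


print(normalize_upc12("1234567890"))      # 012345678905
print(normalize_upc12("00012345678905"))  # 012345678905
```

Validating every result with `is_valid_upc12` catches codes that were mistyped or mis-padded instead of silently passing them through. Note that an 11-digit code that is really a 12-digit UPC with its leading zero stripped would be mishandled by the 10/11-digit rule, so it may be worth checking whether `"0" + code` already validates before appending a new check digit.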
Should anyone have a better system, cleaner data, or ideas, please share your thoughts and data on this page. We can accommodate small requests for formatting or combining data points if the result could be useful to the public. Feel free to make suggestions and requests, and to share your information, by commenting on this page. Thanks and enjoy, BenD
UPDATE
Just uploaded a list of brands and manufacturers, including a count of how many products each one has.
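Per-brand counts like the ones in that list can be reproduced from the product file itself. A minimal sketch using `collections.Counter`, assuming rows shaped like the first file (with a `brand` column); the sample rows are hypothetical. Brand strings are only trimmed of surrounding whitespace before counting, which is an assumption: real brand names often need more cleanup (case, punctuation, abbreviations) before counts are meaningful.

```python
from collections import Counter


def count_by_field(rows, field="brand"):
    """Count how many products share each value of `field` (blank values skipped)."""
    counts = Counter()
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Hypothetical rows in the shape of the first product file.
rows = [
    {"grp_id": "1", "brand": "Example Brand", "product_name": "Crackers 8 oz"},
    {"grp_id": "2", "brand": "Example Brand ", "product_name": "Crackers 16 oz"},
    {"grp_id": "3", "brand": "Other Brand", "product_name": "Tissues 3 ct"},
    {"grp_id": "4", "brand": "", "product_name": "Unbranded item"},
]

# Largest brands first.
for brand, n in count_by_field(rows).most_common():
    print(f"{brand}\t{n}")
```

The same function would produce manufacturer counts by passing a manufacturer column name, once a file with that column is available.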
Hey Jeremy, are you still working on this project? I’m working on developing a scoring system for evaluating and comparing grocery products on health and sustainability. Packaging and recyclability are obviously an important piece of a product’s environmental impact, and an area I’m struggling to compare at scale.
Does anyone know of other sources for grocery product dimensions? I would love dimensions, UPC, images, etc.
Hi Galen,
I came at this from a similar angle, but I just wanted a single database where packaging recyclability is graded (based on the recyclability labeling on the packages themselves), so it’s easier to choose products and supermarkets based on recyclability.
How far have you got with your project?
Hey Galen/Alex,
Did you two connect on this? I’m working on a similar project in the UK and would love to discuss. Contact me on LinkedIn /olivermaggsjones
Is there a place to learn which products sell the most? For example, which salsa has the largest sales in Kansas City, and which are second and third?
There are two companies that provide data based on scan activity and various other methods. Google “IRI data” and “Nielsen data”.
This is an amazing effort. Is there an updated version of this available with images?
Can’t wait to see product details with images. Hope it’s coming soon!
This is fantastic! When will the next installment with product details, images, etc be released? Is there a way I can get a preview of this? Thanks!
Looking for a UPC database with pictures of items.
We are working on a price database with min and max prices by generic unit of measure.
I’m doing something similar in the UK. I would be interested in hearing how you got on. Do you have a website?
Have a look at https://uk.openfoodfacts.org
Contact me if you’re still working on this
Hi, how can we get additional sets and pix?
Thanks
On a related note: grocerybear has a grocery price API with prices for products updated daily from 13 cities across the US. This API is super easy to use.
Any idea how grocerybear is getting their pricing info?