Since there are plenty of online retailers selling hundreds of thousands of grocery and drugstore products, one would hope that brands and manufacturers would come together to make information about their products freely available. In reality, it is extremely hard to find a single reliable, extensive, standardized, easy-to-use and open-source grocery database. This project aims to change this bizarre reality and give everyone free, unrestricted access to simple downloadable database files containing UPC-centric information about hundreds of thousands of grocery products.
We are starting with a small installment. Our first file contains a little over 100K grocery products with the following data points: grp_id, upc14, upc12, brand and product_name. We have left out other data points to conserve space and keep the data manageable.
In the next installment we will include information such as detailed product descriptions, product attributes, image file URLs, category, and manufacturer and distributor information. Please note the grp_id (grocery product id), which will serve as the key to our database alongside grb_id and similar fields (unique IDs for brand names, category names, manufacturers, etc.).
I recommend focusing on ‘upc12’ for all products (grp_id is an internal identifier; use it to join tables and build your own version of this database). However, a UPC12 is not always available, and for many products in this database it was converted from a longer or shorter version of the UPC.
Anyone who deals with UPC lists knows that, despite their supposedly “standard” status, UPC numbers come in all shapes and forms. Our database was built from UPCs of various lengths gathered from many sources, but we did our best to convert them all to the 12-digit format.
We have used a variety of techniques, from simple formatting (adding leading zeros) to calculating check digits for 10- and 11-digit source UPCs with this Excel / Access formula:
=MOD(10-MOD(SUMPRODUCT(--(MID(A2,{1,3,5,7,9,11},1)))*3+SUMPRODUCT(--(MID(A2,{2,4,6,8,10},1))),10),10)
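For readers who prefer code to spreadsheets, here is a minimal Python sketch of the same check-digit arithmetic (digits in odd positions weighted by 3, even positions by 1), plus a hypothetical `to_upc12` helper. The conversion rules in `to_upc12` are assumptions, not a description of how this database was actually built: 10- and 11-digit inputs are treated as missing their check digit (10-digit ones are first padded with a leading zero), and 13- or 14-digit inputs are accepted only when the extra leading digits are zeros.

```python
def upc_check_digit(first11: str) -> int:
    """Check digit for the first 11 digits of a UPC-A, mirroring the Excel formula."""
    if len(first11) != 11 or not first11.isdigit():
        raise ValueError("expected exactly 11 digits")
    odd_sum = sum(int(d) for d in first11[0::2])   # positions 1, 3, 5, 7, 9, 11
    even_sum = sum(int(d) for d in first11[1::2])  # positions 2, 4, 6, 8, 10
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def to_upc12(raw: str) -> str:
    """Hypothetical normalizer to 12 digits (assumed rules, see lead-in)."""
    digits = raw.strip()
    if not digits.isdigit():
        raise ValueError(f"not a numeric code: {raw!r}")
    if len(digits) in (10, 11):
        # Assumption: the check digit is missing; 10-digit codes also lack a leading zero.
        body = digits.zfill(11)
        return body + str(upc_check_digit(body))
    if len(digits) == 12:
        return digits
    if len(digits) in (13, 14) and set(digits[:-12]) == {"0"}:
        # EAN-13 / GTIN-14 forms that only add leading zeros in front of a UPC-A.
        return digits[-12:]
    raise ValueError(f"cannot convert {raw!r} to UPC-12")
```

For example, `to_upc12("03600029145")` appends the computed check digit 2, giving `036000291452`, the same value the spreadsheet formula produces for that cell.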
Should anyone have a better system, cleaner data or ideas, please share your thoughts and data on this page. We can accommodate small requests for formatting or combining data points if they would be useful to the public. Feel free to make suggestions and requests and to share your information by commenting on this page. Thanks and enjoy, BenD
UPDATE
Just uploaded a list of brands and manufacturers, including a count of how many products each one has.
Great work! I’m starting a project, and I will be glad to share the database I create. So far I have not heard anyone talk about my idea :) Good for me! I’m also happy to see many other database apps/projects in the comments. I think I will expand to Android after I finish the web application (a simple front end, a database, and then a complex backend to help with future expansion and workloads). /thumbsup
Maybe my idea is the same as yours.
Very helpful for beginners who are looking for UPC codes. I appreciate everyone’s effort in building the system.
When is the “next installment” coming out?
Hi there,
Looks like a nice resource. I’ve been trying to use it to scrape product images by searching for the product names. This works some of the time; unfortunately, the abbreviations in the database aren’t recognized, e.g. Google Images does not know that Ssg = sausage or that itln = Italian (there are a lot of them, too many to fix manually, I think).
Thanks!
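One way to attack the abbreviation problem is to expand known abbreviations before searching. The Python sketch below is a minimal illustration, not part of the database: `ABBREVIATIONS` and `expand_abbreviations` are hypothetical names, only the Ssg and itln entries come from the comment above, and building a full table is the real work.

```python
import re

# Hypothetical abbreviation table: the two entries come from the comment above;
# a usable table would need many more.
ABBREVIATIONS = {
    "ssg": "sausage",
    "itln": "Italian",
}


def expand_abbreviations(name: str, table: dict[str, str]) -> str:
    """Replace every alphabetic token found in the table (case-insensitive lookup)."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return table.get(token.lower(), token)  # unknown tokens are kept as-is

    return re.sub(r"[A-Za-z]+", replace, name)
```

Matching alphabetic runs rather than whitespace-separated words means a token like `Ssg,` or the `oz` in `16oz` is still looked up cleanly, while digits and punctuation pass through unchanged.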
Hi,
This is very useful information. Thanks for uploading.
I have a quick question about grp_id. I am not able to match the UPC and Brand files using this key. For example, grp_id 632 is used for different brands in the two files. Can you please correct me if I am misinterpreting the data?
Thanks,
Amandeep
grp_id and grb_id appear to be the unique “Primary Keys” (internal to grocery.com) of their respective files and have nothing to do with each other. If you want to relate Grocery_UPC and Grocery_Brands, use the “Brand” column in each table. I know it is a longer “text” field, but that is how they have it set up right now, and it allows downloaders to use their choice of database engine to manage their data.
In truth, Grocery_UPC should probably use the grb_id of the Grocery_Brands table in its “Brand” field, but for those who only want the UPC file and NOT the brands file, this seems to be the best format. The description above seems to contain a typo, because you can’t join on those “id” fields. As they say, they will be expanding this someday…
Hope this helps.
Example:
-- Join items to brands on the Brand text column; grp_id and grb_id are unrelated keys.
SELECT tblUPC_Items.upc14, tblUPC_Items.upc12, tblUPC_Items.name, tblUPC_Brands.Brand, tblUPC_Brands.Manufacturer
FROM tblUPC_Items
INNER JOIN tblUPC_Brands ON tblUPC_Items.brand = tblUPC_Brands.Brand;
Hope this helps,
Art
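Art’s query relies on the brand text column rather than the id fields. The Python sketch below runs the same kind of join with the standard-library sqlite3 module; the table names (`grocery_upc`, `grocery_brands`), their column layouts and the placeholder rows are hypothetical stand-ins for however you import the downloaded files. The demo’s grp_id and grb_id values differ on purpose: only the brand text links the two rows.

```python
import sqlite3


def join_items_to_brands(conn: sqlite3.Connection) -> list[tuple]:
    """Relate UPC rows to brand rows through the shared brand text column."""
    return conn.execute(
        """
        SELECT i.upc12, i.product_name, b.brand, b.manufacturer
        FROM grocery_upc AS i
        INNER JOIN grocery_brands AS b ON i.brand = b.brand
        """
    ).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with hypothetical tables and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE grocery_upc (grp_id INTEGER, upc12 TEXT, brand TEXT, product_name TEXT)"
    )
    conn.execute("CREATE TABLE grocery_brands (grb_id INTEGER, brand TEXT, manufacturer TEXT)")
    # Placeholder data: the ids do not match, only the brand text does.
    conn.execute(
        "INSERT INTO grocery_upc VALUES (632, '000000000000', 'Example Brand', 'Example Product')"
    )
    conn.execute("INSERT INTO grocery_brands VALUES (17, 'Example Brand', 'Example Manufacturer')")
    return conn


print(join_items_to_brands(build_demo_db()))
```

Because the join key is free text, small spelling differences between the two files would silently drop rows; an index on the brand column in both tables also helps once the full 100K-row file is loaded.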
Hi,
I just wanted to let you know about my Android app, in which I’ve implemented your DB, and it works really well. It’s a shopping list app that saves all your purchases and gives you reports, statistics and more.
It would be interesting to add images to the DB. If you’d like to mention my app on your website, I’d appreciate it.
You can download it from:
http://slideme.org/application/shoppingtrack
Thank you.
Hi,
I am also interested in participating in the project of building a useful database of products. My angle is recycling. I created a database of products, and to each product I attach a number of elements (parts), each marked as recyclable or not.
I am downloading your UPC database as I write and hope to integrate it into my app. The app allows users to add products to the database and attach elements to them. If my app gets any love and the database grows, I’ll let you know so we can keep things up to date.
You can download my app here (Android only):
https://play.google.com/store/apps/details?id=com.apps4better.recycle4better
Good luck.
Jeremy
Quick update: I successfully imported your database into mine.
I have a question: why do you bother having both UPC14 and UPC12, and why do you convert other formats into UPC12? For my app, I just keep the barcode number as it appears on the product. It is easier for me, and I think safer for matching the barcode against the database.
Am I missing something?
Thanks anyway for putting together this database.
Jeremy
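One possible reason to keep a 14-digit column: scanners may report the same item as a 12-digit UPC-A or a 13-digit EAN-13, and left-padding either with zeros to 14 digits yields a single key that compares equal and keeps the check digit valid (GTIN check-digit weights are counted from the right, so leading zeros add nothing). The sketch below illustrates that idea; `match_key` and `gtin_check_digit_ok` are hypothetical helpers, not part of the database, and 8-digit UPC-E codes are deliberately rejected because they must be expanded rather than padded.

```python
def gtin_check_digit_ok(code: str) -> bool:
    """Validate a 12/13/14-digit GTIN: weights 3, 1, 3, ... from the right of the body."""
    if not code.isdigit() or len(code) not in (12, 13, 14):
        return False
    body, check = code[:-1], int(code[-1])
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def match_key(scanned: str) -> str:
    """Normalize a scanned UPC-A / EAN-13 / GTIN-14 to a 14-digit lookup key."""
    digits = scanned.strip()
    if not gtin_check_digit_ok(digits):
        raise ValueError(f"unrecognized or invalid barcode: {scanned!r}")
    return digits.zfill(14)  # leading zeros leave the check digit valid
```

With this approach, storing the barcode exactly as printed still works; the padded key is only used for comparisons, so `036000291452` and `0036000291452` find the same row.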
This is really great! When will you be updating with the next installment? The image urls will be very helpful. For that matter, the ingredients and nutritional info would be huge as well. Thanks very much for your thoughtfulness and help launching this project! We appreciate you!
Are you also going to share nutritional information for the groceries? It would be very handy to have such information for a large dataset.
The USDA has a nutritional database here: http://ndb.nal.usda.gov/ (then click “About the Database”).
I’m not sure what to do with the information that I downloaded. Currently, I am using an app that scans a grocery item’s UPC and then lets me enter the item’s price. I have not been successful in motivating friends and family to contribute to the database. Anyway, I continue to scan items and enter the $ amount so that I can find the store that has each one either on sale or at a consistent price. I also have UPCs for products that Costco and BJ carry, along with prices. Please forward any information about getting prices for products. Thanks for the database. R
Can you share the app that you are using here? Soon we will upload pricing information for most of the 100,000 products in our database.
Would you mind sharing your price database?
Hello. I need a database of grocery stores in the USA and Dubai. Can you provide one?