Since there are plenty of online retailers selling hundreds of thousands of grocery and drugstore products, one would hope that brands and manufacturers would come together to make information about their products freely available. In reality, it is extremely hard to find a single reliable, extensive, standardized, easy-to-use and open-source grocery database. This project aims to change this bizarre reality and give everyone free and unrestricted access to simple, downloadable database files containing UPC-centric information about hundreds of thousands of grocery products.
We are starting with a small installment. Our first file contains a little over 100K grocery products with the following data points: grp_id, upc14, upc12, brand and product_name. We are leaving other data points out of this file to conserve space and keep the data manageable.
In the next installment we will include information such as a detailed product description, product attributes, image file URLs, category, and manufacturer and distributor information. Please note the grp_id (grocery product ID), which will serve as a key to our database alongside grb_id and similar fields (unique IDs for brand names, category names, manufacturers, etc.).
I recommend that we focus on ‘upc12’ for all products (grp_id is an internal identifier; use it to join tables and create your own version of this database). However, a UPC12 is not always available, and for many products in this database it was converted from a longer or shorter version of the UPC.
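As a minimal sketch of keying products on upc12, the Python below reads the file after it has been exported to CSV (an assumption; the column names are the ones listed above) and indexes the rows by upc12. It keeps every UPC as text, because a spreadsheet that treats UPCs as numbers drops their leading zeros. The function name load_by_upc12 and the sample rows are hypothetical.

```python
import csv
import io


def load_by_upc12(csv_text: str) -> dict[str, dict[str, str]]:
    """Index rows of the UPC file by upc12 (hypothetical loader for a CSV export)."""
    products: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        upc12 = (row.get("upc12") or "").strip()
        if not upc12:
            continue  # no usable UPC12 for this product; don't invent a key
        # csv returns text, so leading zeros survive; zfill(12) restores any that a
        # spreadsheet already stripped from this (already 12-digit) column.
        products[upc12.zfill(12)] = row
    return products


# Made-up sample rows in the column layout listed in the post.
sample = (
    "grp_id,upc14,upc12,brand,product_name\n"
    "1,00036000291452,36000291452,Sample Brand,Sample Product\n"
    "2,,,Sample Brand,Product Without UPC\n"
)
index = load_by_upc12(sample)
print(index["036000291452"]["product_name"])  # Sample Product
```

The zfill(12) step is only safe on a column that already holds 12-digit UPCs; shorter source codes that lack a check digit need the conversion described next.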
Anyone dealing with UPC lists knows that despite their stated “standard” status, UPC numbers come in all shapes and forms. Our database was built from UPCs of various lengths gathered from many sources, but we did our best to convert them to the 12-digit format.
We used a variety of techniques, from simple formatting (adding leading zeros) to calculating check digits for 10- and 11-digit source UPCs using this Excel / Access formula:
=MOD(10-MOD(SUMPRODUCT(--(MID(A2,{1,3,5,7,9,11},1)))*3+SUMPRODUCT(--(MID(A2,{2,4,6,8,10},1))),10),10)
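For anyone not working in Excel, the same check-digit arithmetic can be sketched in Python: digits in odd positions (1, 3, ..., 11) are weighted by 3, digits in even positions by 1, and the check digit is whatever brings the total up to a multiple of 10. The 10-digit branch assumes the missing digit is a leading zero; that is an assumption for illustration, not necessarily how every source in this database was converted. The function names are hypothetical.

```python
def upc_check_digit(body11: str) -> int:
    """Check digit for the first 11 digits of a UPC-A; same arithmetic as the formula above."""
    if len(body11) != 11 or not body11.isdigit():
        raise ValueError(f"expected 11 digits, got {body11!r}")
    digits = [int(c) for c in body11]
    odd = sum(digits[0::2])   # positions 1, 3, 5, 7, 9, 11 (weighted by 3)
    even = sum(digits[1::2])  # positions 2, 4, 6, 8, 10 (weighted by 1)
    return (10 - (odd * 3 + even) % 10) % 10


def to_upc12(source: str) -> str:
    """Convert a 10- or 11-digit source UPC (no check digit) to a 12-digit UPC-A."""
    s = source.strip()
    if len(s) == 10:
        # Assumption: the missing digit is a leading zero, e.g. one dropped by a spreadsheet.
        s = "0" + s
    return s + str(upc_check_digit(s))  # raises ValueError for any other length


print(to_upc12("03600029145"))  # 036000291452
print(to_upc12("3600029145"))   # 036000291452
```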
Should anyone have a better system, cleaner data, or ideas, please share your thoughts and data on this page. We can accommodate small requests for formatting or combining data points if it would be useful to the public. Feel free to make suggestions and requests and to share your information by commenting on this page. Thanks and enjoy, BenD
UPDATE
Just uploaded a list of brands and manufacturers, including a count of how many products each of them has.
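For anyone who wants to recompute per-brand product counts from the UPC file, a minimal sketch follows. It assumes the file has been loaded as dicts with a brand column (as csv.DictReader would produce) and simply tallies rows per brand; it is not necessarily how the published list was generated. The function name and sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def products_per_brand(rows: Iterable[Mapping[str, Optional[str]]]) -> Counter:
    """Count products per brand; rows are dicts with a 'brand' field."""
    counts: Counter = Counter()
    for row in rows:
        brand = (row.get("brand") or "").strip()
        if brand:  # skip rows with no brand rather than counting them under ""
            counts[brand] += 1
    return counts


rows = [
    {"brand": "Brand A", "product_name": "Item 1"},
    {"brand": "Brand A", "product_name": "Item 2"},
    {"brand": "Brand B", "product_name": "Item 3"},
]
print(products_per_brand(rows).most_common())  # [('Brand A', 2), ('Brand B', 1)]
```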
Does anyone know about data available for India?
Did you get one?
Hi Gaurav,
Did you get this database? If so, please share it at manoj.km20@gmail.com. Thanks in advance.
Rgds,
MK
Hi,
Great project! I was wondering how you are progressing with the UK grocery database and whether you have found different issues compared with the US.
Many thanks!
Edoardo
Hi Ben,
I am looking for a grocery database for Saudi Arabia. Can I get anything on this?
Best Regards,
Rafeeq
Two things:
1. I wonder if a CSV format would be more directly useful than XLS.
2. Just curious how often this file is/will be updated?
Does anyone have a category database? I am looking for a database that contains categories for items sold.
I would be interested to hear from anyone who knows anything about margins per food type. That basically means information on buy_price, sell_price, waste, and item or category.
Here, waste means throwing away expired produce.
I suspect this information is a trade secret and hard to come by, but I'm hoping somebody has gone down the margins route.
This looks to be the start of a fantastic project! I’m working on a meal planning application and (like others have said) I am looking forward to seeing what sort of prices you’re gathering and how you’re organizing them. In the meantime I’m gathering some on my own. I’d love to talk with you about this project, but can’t find an email address on the site. Other than this comment thread, is there an email address I can contact you at?
Hey Jeremy, are you still working on this project? I’m developing a scoring system for evaluating and comparing grocery products on health and sustainability. Packaging and recyclability are obviously an important piece of a product’s environmental impact, and an area I’m struggling to compare at scale.
Does anyone know of other sources for grocery product dimensional information? I would love dimensions, UPC, images, etc.
Hi Galen,
I came at this from a similar angle, but I just want a single database where packaging recyclability is graded (based on the recyclability labeling on the packages themselves), so it’s easier to choose products and supermarkets based on recyclability.
How far have you got with your project?
Hey Galen/Alex,
Did you two connect on this? I’m working on a similar project in the UK and would love to discuss. Contact me on LinkedIn /olivermaggsjones
Is there a place to go to learn which products sell the most? For example, which salsa has the largest sales in Kansas City, and which are second and third?
There are two companies that provide data based on scan activity and various other methods. Google “IRI data” and “Nielsen data”.
This is an amazing effort. Is there an updated version of this available with images?
Can’t wait to see product details with images. Hope it’s coming soon.
This is fantastic! When will the next installment with product details, images, etc be released? Is there a way I can get a preview of this? Thanks!
Looking for a UPC database with pictures of items.
We are working on a price database with min and max by generic unit of measure
I’m doing something similar in the UK. I would be interested in hearing how you got on. Do you have a website?
have a look at https://uk.openfoodfacts.org
Contact me if you’re still working on this
Hi, how can we get additional sets and pictures?
Thanks
On a related note: grocerybear has a grocery price API with product prices, updated daily, from 13 cities across the US. This API is super easy to use.
Any idea how grocerybear is getting their pricing info?
Great work! I’m starting a project and I will be glad to share the database that I create. So far I have not heard anyone talk about my idea :) good for me! I’m happy to see many other database apps/projects in the comments. I also think I will expand to Android after I finish the web application (simple front end, database, and then a complex backend to help with future expansion or workloads). /thumbsup
Maybe my idea is the same as yours.
Very helpful for the beginners who are looking for UPC codes. Appreciate everyone’s effort behind building the system.
When is the “next installment” coming out?
Hi there,
Looks like a nice resource. I’ve been trying to use it to scrape product images by searching for the product names. This works some of the time; unfortunately, the abbreviations in the database aren’t recognized, e.g. Google Images does not know that Ssg = sausage or that itln = Italian (there are a lot of them, too many to fix manually, I think).
Thanks!
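One way to work around abbreviations like these is to expand them before searching. The sketch below uses a hypothetical lookup table: only the two entries mentioned above (Ssg, itln) come from this comment, and the rest of a real table would have to be compiled by reviewing the product names. The function name is also hypothetical.

```python
import re

# Hypothetical lookup table. Only "ssg" and "itln" come from the comment above;
# the rest of the table would have to be built by reviewing product names.
ABBREVIATIONS = {
    "ssg": "sausage",
    "itln": "Italian",
}


def expand_abbreviations(product_name: str, table: dict[str, str] = ABBREVIATIONS) -> str:
    """Replace abbreviated letter runs (matched case-insensitively) before searching."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        return table.get(word.lower(), word)

    # Only runs of letters are treated as words; digits and punctuation pass through.
    return re.sub(r"[A-Za-z]+", replace, product_name)


print(expand_abbreviations("Itln Ssg Links 16 oz"))  # Italian sausage Links 16 oz
```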
Hi,
This is very useful information. Thanks for uploading.
I have a quick question about grp_id. I am not able to match the UPC and Brand files using this key, e.g. grp_id 632 is used for different brands in the two files. Can you please correct me if I am misinterpreting the data?
Thanks,
Amandeep
grp_id and grb_id appear to be the (internal to grocery.com) unique “primary keys” of their respective files and have nothing to do with each other. If you want to relate Grocery_UPC and Grocery_Brands, use the “Brand” column in each table. I know it is a longer text field, but that is how they have it set up right now, which allows downloaders to use their choice of database engine to manage their data.
In truth, Grocery_UPC should probably use the grb_id of the Grocery_Brands table in its “Brand” field, but for those who only want the UPC file and NOT the brands file, this seems to be the best format. There seems to be a typo in their description above, because you can’t join on those “id” fields. As they say, they will be expanding this, someday…
Example:
SELECT tblUPC_Items.upc14, tblUPC_Items.upc12, tblUPC_Items.name, tblUPC_Brands.Brand, tblUPC_Brands.Manufacturer
FROM tblUPC_Items INNER JOIN tblUPC_Brands ON tblUPC_Items.brand = tblUPC_Brands.Brand;
Hope this helps,
Art
Hi,
I only want to let you know about my app for Android, where I’ve implemented your DB, and it works really well. It’s a shopping list app that saves all your purchases and gives you reports, statistics and more.
It’d be interesting to add images to the DB. If you want to mention my app on your website, I’d appreciate it.
You can download it from:
http://slideme.org/application/shoppingtrack
Thank you.
Hi,
I am also interested in participating in the project of building a useful database of products. My angle is recycling. I created a database of products, and for each product I attach a number of elements (parts), each marked as recyclable or not.
I am downloading your UPC database as I write and hope to integrate it in my app. The app allows users to add products to the database and attach elements to it. If my app gets any love and the database grows, I’ll let you know so we can keep things up to date.
You can download my app here (Android only) :
https://play.google.com/store/apps/details?id=com.apps4better.recycle4better
Good luck.
Jeremy
Quick update: I successfully imported your database into mine.
I have a question: why do you bother having both UPC14 and UPC12, and why do you convert other formats into UPC12? For my app, I just keep the barcode number as it appears on the product. It is easier for me, and I think safer for matching the barcode with the database.
Am I missing something?
Thanks anyway for putting together this database.
Jeremy
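One practical reason to store fixed-width codes, sketched under our own assumptions: the same product can be read as a 12-digit UPC-A, a 13-digit EAN-13 with a leading zero, or a 14-digit GTIN. Because the mod-10 check digit is weighted from the right, padding with leading zeros does not change it, so all three normalize to one 14-digit key, and a key starting with “00” maps back to the UPC12. The helper names are hypothetical, and the sketch deliberately rejects 8-digit UPC-E codes, which must be expanded rather than zero-padded.

```python
from typing import Optional


def gtin_is_valid(code: str) -> bool:
    """Mod-10 check shared by UPC-A (12 digits), EAN-13 and GTIN-14."""
    if not code.isdigit() or len(code) not in (12, 13, 14):
        return False
    *body, check = (int(c) for c in code)
    # Weights alternate 3, 1, 3, ... starting with the digit left of the check digit,
    # so leading zeros never change the result.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def normalize_barcode(scanned: str) -> Optional[str]:
    """Return a 14-digit key for a valid 12/13/14-digit code, else None."""
    code = scanned.strip()
    if not gtin_is_valid(code):
        return None
    return code.zfill(14)


def upc14_to_upc12(key: str) -> Optional[str]:
    """A 14-digit key maps back to a UPC-A only if it starts with two zeros."""
    return key[2:] if len(key) == 14 and key.startswith("00") else None


# The same product as reported by three different scanners or sources:
for scanned in ("036000291452", "0036000291452", "00036000291452"):
    key = normalize_barcode(scanned)
    print(scanned, "->", key, "->", upc14_to_upc12(key))
```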
This is really great! When will you be updating with the next installment? The image urls will be very helpful. For that matter, the ingredients and nutritional info would be huge as well. Thanks very much for your thoughtfulness and help launching this project! We appreciate you!
Are you also going to share nutritional information for the groceries? It would be very handy to have such information for a large dataset.
The USDA has a nutritional database here: http://ndb.nal.usda.gov/ Then click “About the Database”.
I’m not sure what to do with the information that I downloaded. Currently, I am using an app that scans the grocery item using the UPC and then allows me to price the item. I have not been successful in motivating friends and family to contribute to the database. Anyway, I continue to scan and load the $ amount so that I can find the store that has it either on sale or at a consistent price. I also have UPCs for products that Costco and BJ carry, along with prices. Please forward any information about getting prices for products. Thanks for the database. R
Can you share the app that you are using here? Soon we will upload pricing information for most of the 100,000 products in our database.
Would you mind sharing your price database?
Hello. I need a database of USA and Dubai grocery stores. Can you provide one?